Classifiers, e.g., those based on Naive Bayes, support vector machines, or even neural networks, are highly susceptible to data-poisoning attacks, whose objective is to degrade classification accuracy by covertly embedding malicious (labeled) samples into the training set. Such attacks can be mounted by an insider, through an outsourced data-acquisition or training process, or conceivably during active learning. In some cases, a very small amount of poisoning suffices to cause a dramatic drop in classification accuracy. Data-poisoning attacks succeed mainly because the injected samples significantly skew the data distribution of the corrupted class; such attack samples are typically data outliers and in principle separable from the clean samples. We propose a generalized, scalable, dynamic, data-driven defense system that: 1) uses a mixture model both to fit the (potentially multi-modal) data well and to isolate potential attack samples within a small subset of the mixture components; and 2) performs hypothesis testing to decide both which components and which samples within those components are poisoned, purging the identified samples from the training set. Our approach addresses the scenario in which the adversarial samples are an unknown subset embedded in the initial training set, and it can be used to sanitize the data prior to training any type of classifier. Promising experimental results on the TREC05 spam corpus and the Amazon reviews polarity dataset demonstrate the effectiveness of our defense strategy.
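The two-stage idea, fitting a mixture model and then purging components judged to be poisoned, can be sketched on toy data. The example below is only an illustration, not the paper's actual method: it fits a one-dimensional two-component Gaussian mixture by EM, and in place of the paper's hypothesis test it uses a crude heuristic (the component with the smaller mixing weight is treated as the poisoned one). All data, names, and thresholds here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy training set: clean samples plus a small set of
# injected "poison" samples drawn from a shifted distribution.
clean = rng.normal(0.0, 1.0, size=200)
poison = rng.normal(6.0, 0.5, size=30)
x = np.concatenate([clean, poison])

def fit_gmm_1d(x, k=2, iters=100):
    """Minimal EM for a 1-D, k-component Gaussian mixture (illustrative only)."""
    mu = np.linspace(x.min(), x.max(), k)      # spread initial means across the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)                    # mixing weights
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        d = (x[:, None] - mu[None, :]) ** 2
        logp = -0.5 * (np.log(2 * np.pi * var) + d / var) + np.log(w)
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and weights
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / n, 1e-6)
        w = n / len(x)
    return mu, var, w, r

mu, var, w, r = fit_gmm_1d(x)

# Crude stand-in for the paper's hypothesis test: treat the component with
# the smaller mixing weight as suspect and purge the samples assigned to it.
suspect = np.argmin(w)
assign = r.argmax(axis=1)
keep = x[assign != suspect]
```

On this well-separated toy data the purged set `keep` recovers essentially the clean samples; the paper's actual defense replaces the smallest-weight heuristic with a principled per-component and per-sample hypothesis test.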