A mixture model and EM algorithm for robust classification, outlier rejection, and class discovery

David Jonathan Miller, John Browning

    Research output: Contribution to journal, Conference article

    1 Scopus citation

    Abstract

    Several authors have addressed learning a classifier given a mixed labeled/unlabeled training set. These works assume each unlabeled sample originates from one of the (known) classes. Here, we consider the scenario in which unlabeled points may belong either to known (predefined) classes or to heretofore undiscovered classes. There are several practical situations where such data may arise. We propose a novel statistical mixture model which views as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each point. Two types of mixture components are posited to explain label presence/absence. "Predefined" components generate both labeled and unlabeled points and assume labels are missing at random. "Nonpredefined" components only generate unlabeled points; thus, in localized regions, they capture data subsets that are exclusively unlabeled. Such subsets may represent an outlier distribution, or new classes. The components' predefined/nonpredefined natures are data-driven, learned along with the other parameters via an algorithm based on expectation-maximization (EM). There are three natural applications: 1) robust classifier design, given a mixed training set with outliers; 2) classification with rejections; 3) identification of the unlabeled points (and their representative components) that originate from unknown classes, i.e., new class discovery. We evaluate our method and alternative approaches on both synthetic and real-world data sets.
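
The idea of treating label presence/absence as observed data can be illustrated with a small EM sketch. This is not the authors' algorithm: the abstract describes components whose type is either predefined or nonpredefined, whereas the sketch below makes the simplifying assumption that each component carries a continuous label-presence probability `rho`, so that a component with `rho` near zero behaves like a nonpredefined one. Diagonal Gaussian components, the `-1` convention for "unlabeled", and every name here (`fit_label_aware_mixture`, `rho`, `beta`, `init_means`) are assumptions made for illustration only.

```python
import numpy as np

def fit_label_aware_mixture(X, y, n_components, n_classes,
                            n_iter=100, init_means=None, seed=0):
    """EM for a diagonal-Gaussian mixture that also models label presence.

    X: (n, d) features. y: (n,) integer labels, with -1 marking "unlabeled".
    Returns a dict of fitted parameters (all names are hypothetical).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(int)
    n, d = X.shape
    K, eps = n_components, 1e-12
    labeled = y >= 0
    onehot = np.eye(n_classes)[y[labeled]]           # (n_labeled, n_classes)

    if init_means is None:
        means = X[rng.choice(n, K, replace=False)]   # random data points
    else:
        means = np.array(init_means, dtype=float)
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))      # per-dimension variances
    alpha = np.full(K, 1.0 / K)                      # mixing weights
    rho = np.full(K, labeled.mean())                 # P(label present | component)
    beta = np.full((K, n_classes), 1.0 / n_classes)  # P(class | component, labeled)

    for _ in range(n_iter):
        # E-step: log of the joint over (features, label status, label, component)
        diff = X[:, None, :] - means[None, :, :]
        log_pdf = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                          + np.sum(np.log(2 * np.pi * var), axis=1))
        log_joint = np.log(alpha) + log_pdf
        log_joint[labeled] += (np.log(rho + eps)
                               + np.log(beta[:, y[labeled]].T + eps))
        log_joint[~labeled] += np.log(1.0 - rho + eps)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted updates
        nk = resp.sum(axis=0) + eps
        nk_lab = resp[labeled].sum(axis=0)
        alpha = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
        rho = np.clip(nk_lab / nk, 0.0, 1.0)
        beta = ((resp[labeled].T @ onehot + eps)
                / (nk_lab[:, None] + n_classes * eps))

    return {"alpha": alpha, "means": means, "var": var, "rho": rho, "beta": beta}
```

Under these assumptions, components whose fitted `rho` is close to zero explain regions that contain only unlabeled points, so they are natural candidates for outlier or new-class components, while the remaining components can be used for classification through `beta`. How the thresholding on `rho` should be done is left open here; the paper learns the component types directly.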

    Original language: English (US)
    Pages (from-to): 809-812
    Number of pages: 4
    Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
    Volume: 2
    State: Published - Jan 1 2003
    Event: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing - Hong Kong, Hong Kong
    Duration: Apr 6 2003 to Apr 10 2003

    All Science Journal Classification (ASJC) codes

    • Software
    • Signal Processing
    • Electrical and Electronic Engineering