Transductive methods for the distributed ensemble classification problem

David Jonathan Miller, Siddharth Pal

    Research output: Contribution to journal › Article

    7 Citations (Scopus)

    Abstract

    We consider ensemble classification for the case where there is no common labeled training data for jointly designing the individual classifiers and the function that aggregates their decisions. This problem, which we call distributed ensemble classification, arises when individual classifiers operate (perhaps remotely) on different sensing modalities and when proprietary or legacy classifiers are combined. The conventional wisdom in this case is to apply fixed rules of combination, such as voting methods or rules for aggregating probabilities. Alternatively, we take a transductive approach, optimizing the combining rule for an objective function measured on the unlabeled batch of test data. We propose maximum likelihood (ML) objectives that are shown to yield well-known forms of probabilistic aggregation, albeit with iterative, expectation-maximization-based adjustment to account for mismatch between the class priors used by the individual classifiers and those reflected in the new data batch. These methods extend, to the ensemble case, the work of Saerens, Latinne, and Decaestecker (2002). We also propose an information-theoretic method that generally outperforms the ML methods, better handles classifier redundancies, addresses some scenarios where the ML methods are not applicable, and handles the case of classes missing from the test batch. On UC Irvine benchmark data, all our methods improve classification accuracy over fixed rules when there is prior mismatch.
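
    The expectation-maximization prior adjustment mentioned in the abstract builds on Saerens, Latinne, and Decaestecker (2002), who correct a single classifier's posteriors for a shift in class priors using only the unlabeled test batch. As a rough illustration of that building block (not the authors' ensemble algorithm), the following Python sketch implements the single-classifier EM; the function name, defaults, and array conventions are assumptions made for this example.

        import numpy as np

        def em_prior_correction(posteriors, train_priors, n_iters=100, tol=1e-8):
            """EM adjustment of classifier posteriors to the class priors of an
            unlabeled test batch (Saerens, Latinne, & Decaestecker, 2002).

            posteriors   : (N, K) array of P_train(class k | x_i) produced by the
                           trained classifier on the N unlabeled test points.
            train_priors : (K,) class priors in effect during training.
            Returns the estimated test-batch priors and corrected posteriors.
            """
            test_priors = train_priors.copy()  # initialize at the training priors
            for _ in range(n_iters):
                # E-step: reweight each posterior by the ratio of the current
                # estimated test priors to the training priors, then renormalize
                # each sample's posterior to sum to one.
                corrected = posteriors * (test_priors / train_priors)
                corrected /= corrected.sum(axis=1, keepdims=True)
                # M-step: re-estimate the test priors as the mean corrected
                # posterior over the batch.
                new_priors = corrected.mean(axis=0)
                if np.max(np.abs(new_priors - test_priors)) < tol:
                    test_priors = new_priors
                    break
                test_priors = new_priors
            return test_priors, corrected

    In the ensemble setting, the paper couples this kind of prior adjustment with the aggregation rule itself; applying a fixed product or sum rule to independently corrected posteriors is only a crude stand-in for that joint optimization.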

    Original language: English (US)
    Pages (from-to): 856-884
    Number of pages: 29
    Journal: Neural Computation
    Volume: 19
    Issue number: 3
    DOI: 10.1162/neco.2007.19.3.856
    State: Published - Mar 1, 2007

    All Science Journal Classification (ASJC) codes

    • Arts and Humanities (miscellaneous)
    • Cognitive Neuroscience

    Cite this

    @article{5660e5b800c64147b494fc2304a58995,
    title = "Transductive methods for the distributed ensemble classification problem",
    abstract = "We consider ensemble classification for the case where there is no common labeled training data for jointly designing the individual classifiers and the function that aggregates their decisions. This problem, which we call distributed ensemble classification, applies when individual classifiers operate (perhaps remotely) on different sensing modalities and when combining proprietary or legacy classifiers. The conventional wisdom in this case is to apply fixed rules of combination such as voting methods or rules for aggregating probabilities. Alternatively, we take a transductive approach, optimizing the combining rule for an objective function measured on the unlabeled batch of test data. We propose maximum likelihood (ML) objectives that are shown to yield well-known forms of probabilistic aggregation, albeit with iterative, expectation-maximization-based adjustment to account for mismatch between class priors used by individual classifiers and those reflected in the new data batch. These methods are extensions, for the ensemble case, of the work of Saerens, Latinne, and Decaestecker (2002). We also propose an information-theoretic method that generally outperforms the ML methods, better handles classifier redundancies, and addresses some scenarios where the ML methods are not applicable. This method also well handles the case of missing classes in the test batch. On UC Irvine benchmark data, all our methods give improvements in classification accuracy over the use of fixed rules when there is prior mismatch.",
    author = "Miller, {David Jonathan} and Siddharth Pal",
    year = "2007",
    month = "3",
    day = "1",
    doi = "10.1162/neco.2007.19.3.856",
    language = "English (US)",
    volume = "19",
    pages = "856--884",
    journal = "Neural Computation",
    issn = "0899-7667",
    publisher = "MIT Press Journals",
    number = "3",

    }
