Approximate maximum entropy joint feature inference consistent with arbitrary lower-order probability constraints: Application to statistical classification

    Research output: Contribution to journal › Article

    8 Citations (Scopus)

    Abstract

    We propose a new learning method for discrete space statistical classifiers. Similar to Chow and Liu (1968) and Cheeseman (1983), we cast classification/inference within the more general framework of estimating the joint probability mass function (p.m.f.) for the (feature vector, class label) pair. Cheeseman's proposal to build the maximum entropy (ME) joint p.m.f. consistent with general lower-order probability constraints is in principle powerful, allowing general dependencies between features. However, enormous learning complexity has severely limited the use of this approach. Alternative models such as Bayesian networks (BNs) require explicit determination of conditional independencies. These may be difficult to assess given limited data. Here we propose an approximate ME method, which, like previous methods, incorporates general constraints while retaining quite tractable learning. The new method restricts joint p.m.f. support during learning to a small subset of the full feature space. Classification gains are realized over dependence trees, tree-augmented naive Bayes networks, BNs trained by the Kutato algorithm, and multilayer perceptrons. Extensions to more general inference problems are indicated. We also propose a novel exact inference method when there are several missing features.
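    To make the underlying framework concrete, the sketch below illustrates the general idea of fitting a joint p.m.f. over (feature vector, class label) that is consistent with lower-order (here, pairwise) empirical probability constraints, and then classifying by arg max_c P(c | x). This is only an illustration using standard iterative proportional fitting over a brute-force enumeration of a tiny binary feature space; it is not the paper's approximate ME algorithm, whose point is precisely to avoid such exhaustive enumeration by restricting the p.m.f. support. All function names and the toy data are hypothetical.

    # Illustrative sketch (not the paper's algorithm): fit a joint p.m.f. over
    # (binary feature vector, binary class label) matching all empirical pairwise
    # marginals via iterative proportional fitting (IPF), then classify by
    # arg max_c P(c | x). Feasible only for a handful of features.
    import itertools
    import numpy as np

    def fit_pairwise_me(X, y, n_iters=50):
        """X: (n, d) array of binary features, y: (n,) array of binary labels."""
        n, d = X.shape
        data = np.hstack([X, y[:, None]])          # treat the label as one more variable
        full = np.array(list(itertools.product([0, 1], repeat=d + 1)))  # all configurations
        p = np.full(len(full), 1.0 / len(full))    # start from the uniform (max-entropy) p.m.f.

        # Empirical pairwise marginals P(v_i = a, v_j = b) for every variable pair.
        pairs = list(itertools.combinations(range(d + 1), 2))
        emp = {(i, j): np.array([[np.mean((data[:, i] == a) & (data[:, j] == b))
                                  for b in (0, 1)] for a in (0, 1)])
               for (i, j) in pairs}

        # IPF: repeatedly rescale p so each pairwise marginal matches its empirical value.
        for _ in range(n_iters):
            for (i, j) in pairs:
                for a in (0, 1):
                    for b in (0, 1):
                        mask = (full[:, i] == a) & (full[:, j] == b)
                        cur = p[mask].sum()
                        if cur > 0:
                            p[mask] *= emp[(i, j)][a, b] / cur
            p /= p.sum()
        return full, p

    def classify(full, p, x):
        """Return arg max_c P(c | x) under the fitted joint p.m.f."""
        mask = np.all(full[:, :-1] == x, axis=1)
        post = p[mask]          # unnormalized P(c, x); the label is the last, fastest-varying
        return int(np.argmax(post))  # variable, so post[c] corresponds to class c

    # Toy usage: 3 binary features, label equal to feature 0 with 10% label noise.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 3))
    y = X[:, 0] ^ (rng.random(200) < 0.1).astype(int)
    full, p = fit_pairwise_me(X, y)
    print(classify(full, p, np.array([1, 0, 1])))  # typically predicts class 1

    The brute-force table above has 2^(d+1) entries, which is exactly the learning-complexity bottleneck the abstract refers to; the paper's approximate method instead confines the support of the joint p.m.f. to a small subset of the full feature space.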

    Original language: English (US)
    Pages (from-to): 2175-2207
    Number of pages: 33
    Journal: Neural Computation
    Volume: 12
    Issue number: 9
    DOI: 10.1162/089976600300015105
    State: Published - Jan 1 2000

    All Science Journal Classification (ASJC) codes

    • Arts and Humanities (miscellaneous)
    • Cognitive Neuroscience

    Cite this

    @article{81d5ff5824f849c795d7ae8748a6a096,
    title = "Approximate maximum entropy joint feature inference consistent with arbitrary lower-order probability constraints: Application to statistical classification",
    abstract = "We propose a new learning method for discrete space statistical classifiers. Similar to Chow and Liu (1968) and Cheeseman (1983), we cast classification/inference within the more general framework of estimating the joint probability mass function (p.m.f.) for the (feature vector, class label) pair. Cheeseman's proposal to build the maximum entropy (ME) joint p.m.f. consistent with general lower-order probability constraints is in principle powerful, allowing general dependencies between features. However, enormous learning complexity has severely limited the use of this approach. Alternative models such as Bayesian networks (BNs) require explicit determination of conditional independencies. These may be difficult to assess given limited data. Here we propose an approximate ME method, which, like previous methods, incorporates general constraints while retaining quite tractable learning. The new method restricts joint p.m.f. support during learning to a small subset of the full feature space. Classification gains are realized over dependence trees, tree-augmented naive Bayes networks, BNs trained by the Kutato algorithm, and multilayer perceptrons. Extensions to more general inference problems are indicated. We also propose a novel exact inference method when there are several missing features.",
    author = "Miller, {David Jonathan} and Lian Yan",
    year = "2000",
    month = "1",
    day = "1",
    doi = "10.1162/089976600300015105",
    language = "English (US)",
    volume = "12",
    pages = "2175--2207",
    journal = "Neural Computation",
    issn = "0899-7667",
    publisher = "MIT Press Journals",
    number = "9",

    }

    TY - JOUR

    T1 - Approximate maximum entropy joint feature inference consistent with arbitrary lower-order probability constraints

    T2 - Application to statistical classification

    AU - Miller, David Jonathan

    AU - Yan, Lian

    PY - 2000/1/1

    Y1 - 2000/1/1


    UR - http://www.scopus.com/inward/record.url?scp=0001435073&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0001435073&partnerID=8YFLogxK

    U2 - 10.1162/089976600300015105

    DO - 10.1162/089976600300015105

    M3 - Article

    AN - SCOPUS:0001435073

    VL - 12

    SP - 2175

    EP - 2207

    JO - Neural Computation

    JF - Neural Computation

    SN - 0899-7667

    IS - 9

    ER -