General statistical inference by an approximate application of the maximum entropy principle

Lian Yan, David J. Miller

    Research output: Contribution to conference › Paper

    1 Citation (Scopus)

    Abstract

    We propose a new learning method for building a general statistical inference engine operating on discrete feature spaces. Such a model allows inference on any feature given values for the other features (or for a feature subset). Bayesian networks (BNs) are versatile tools that possess this inference capability. However, while the BN's explicit representation of conditional independencies is informative, this structure is not easily learned: typical learning methods for BNs rely on (sub-optimal) greedy search, and these models also face a difficult overfitting problem. Alternatively, in 1983 Cheeseman proposed finding the maximum entropy (ME) joint pmf consistent with arbitrary lower-order probability constraints. This approach has some potential advantages over BNs, but the huge complexity of learning the joint pmf has severely limited its use until now. Here we propose an approximate ME method which also allows incorporation of arbitrary lower-order constraints while retaining tractable learning complexity. The new method approximates the joint feature pmf (during learning) on a subgrid of the full feature space grid. Experimental results on data sets from the UC Irvine repository reveal significant performance gains over two BN approaches: Chow and Liu's dependence trees and Herskovits and Cooper's Kutato algorithm. Several extensions of our approach are indicated.
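
    As background for the constrained ME formulation the abstract refers to, the sketch below (our illustration, not the authors' subgrid method) fits a maximum entropy joint pmf over three binary features subject to two pairwise marginal constraints. It uses iterative proportional fitting (IPF) started from the uniform pmf, which is known to converge to the ME joint pmf consistent with the imposed lower-order marginals, and then answers a conditional query by marginalization. The target marginal tables are made-up but mutually consistent.

    import numpy as np

    n = 3                                # three binary features X0, X1, X2
    p = np.full((2,) * n, 1.0 / 2 ** n)  # start from the uniform joint pmf

    # Hypothetical pairwise marginal targets; each table sums to 1, and the
    # two tables agree on the shared X1 marginal, so they are consistent.
    targets = {
        (0, 1): np.array([[0.30, 0.20], [0.10, 0.40]]),
        (1, 2): np.array([[0.25, 0.15], [0.20, 0.40]]),
    }

    for _ in range(100):  # IPF sweeps; this small problem converges quickly
        for (i, j), target in targets.items():
            other = tuple(k for k in range(n) if k not in (i, j))
            marginal = p.sum(axis=other)   # current marginal over (Xi, Xj)
            ratio = target / marginal      # stays positive from a uniform start
            # Broadcast the per-cell scaling factors back over the full joint.
            expand = tuple(slice(None) if k in (i, j) else None for k in range(n))
            p = p * ratio[expand]

    # The fitted joint supports inference on any feature given the others,
    # e.g. P(X2 = 1 | X0 = 1), by summing out the unobserved feature X1:
    print(p[1, :, 1].sum() / p[1, :, :].sum())

    Note that the table holding the joint pmf has 2^n cells, so this direct construction is exponential in the number of features; that complexity barrier is what the paper's subgrid approximation of the joint pmf is designed to avoid.
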

    Original language: English (US)
    Pages: 112-121
    Number of pages: 10
    State: Published - Dec 1, 1999
    Event: Proceedings of the 1999 9th IEEE Workshop on Neural Networks for Signal Processing (NNSP'99) - Madison, WI, USA
    Duration: Aug 23, 1999 - Aug 25, 1999


    Fingerprint

    Bayesian networks
    Entropy
    Maximum entropy methods
    Inference engines

    All Science Journal Classification (ASJC) codes

    • Signal Processing
    • Software
    • Electrical and Electronic Engineering

    Cite this

    Yan, L., & Miller, D. J. (1999). General statistical inference by an approximate application of the maximum entropy principle. 112-121. Paper presented at the 9th IEEE Workshop on Neural Networks for Signal Processing (NNSP'99), Madison, WI, USA.