Recently, we proposed new methods for approximately learning the maximum entropy (ME) joint pmf for discrete feature spaces. Our approximate techniques overcome the intractability that plagues most ME learning methods when given a general set of constraints. The resulting models are useful for classification as well as for more general inference. Our method has been demonstrated to yield strong performance in comparison with Bayesian networks, dependence trees, tree-augmented naive Bayes models, and multilayer perceptrons. After first reviewing our method, we provide insight into why it works and how it is related to, albeit distinct from, naive Bayes classification (NBC). The connection to NBC then naturally leads us to suggest a simple method for parsimoniously choosing the set of constraints to encode when forming the model. Finally, we provide new experimental comparisons for our method, with decision trees and support vector machines.
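The connection between ME modeling and NBC mentioned above can be illustrated with a small toy sketch (this is an illustrative example, not the paper's method): among all joint pmfs that match a given class prior and the class-conditional single-feature marginals, the naive Bayes joint, which assumes conditional independence of the features given the class, is the maximum entropy solution. The hand-picked `true_joint` below is a hypothetical distribution chosen only for demonstration.

```python
from itertools import product
import math

# Hypothetical true joint pmf over (class c, features x1, x2), all binary.
true_joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.30,
}

def entropy(p):
    """Shannon entropy (nats) of a pmf given as a dict of probabilities."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

# Constraints we encode: the class prior P(c) and the class-conditional
# single-feature marginals P(x1 | c) and P(x2 | c).
prior = {c: sum(v for (cc, _, _), v in true_joint.items() if cc == c)
         for c in (0, 1)}
m1 = {(c, x): sum(v for (cc, x1, _), v in true_joint.items()
                  if cc == c and x1 == x) / prior[c]
      for c in (0, 1) for x in (0, 1)}
m2 = {(c, x): sum(v for (cc, _, x2), v in true_joint.items()
                  if cc == c and x2 == x) / prior[c]
      for c in (0, 1) for x in (0, 1)}

# Naive Bayes joint: P(c) P(x1|c) P(x2|c) -- conditional independence given c.
nb_joint = {(c, x1, x2): prior[c] * m1[(c, x1)] * m2[(c, x2)]
            for c, x1, x2 in product((0, 1), repeat=3)}

# The NB joint is a valid pmf matching the encoded marginal constraints,
# and its entropy is at least that of the true joint, as the ME solution
# over those constraints must satisfy.
assert abs(sum(nb_joint.values()) - 1.0) < 1e-12
assert entropy(nb_joint) >= entropy(true_joint) - 1e-12
```

Encoding richer constraints (e.g. selected pairwise feature statistics) moves the model away from naive Bayes toward the true joint, which is the sense in which constraint selection trades parsimony against fidelity.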
Original language: English (US)
Number of pages: 10
State: Published - Dec 1 2001
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering