Improved generative semisupervised learning based on finely grained component-conditional class labeling

David Jonathan Miller, Jayaram Raghuram, George Kesidis, Christopher M. Collins

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We introduce new inductive, generative semisupervised mixtures with more finely grained class label generation mechanisms than in previous work. Our models combine the advantages of semisupervised mixtures, which achieve label extrapolation over a component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples or prototypes. For our NN-based method, we propose a novel two-stage stochastic data generation, with all samples first generated using a standard finite mixture and then all class labels generated, conditioned on the samples and their components of origin. This mechanism entails an underlying Markov random field, specific to each mixture component or cluster. We invoke the pseudo-likelihood formulation, which forms the basis for an approximate generalized expectation-maximization model learning algorithm. Our NP-based model overcomes a problem with the NN-based model that manifests at very low labeled fractions. Both models are advantageous when within-component class proportions are not constant over the feature space region "owned by" a component. The practicality of this scenario is borne out by experiments on UC Irvine data sets, which demonstrate significant gains in classification accuracy over previous semisupervised mixtures and also overall gains over KNN classification. Moreover, for very small labeled fractions, our methods overall outperform supervised linear and nonlinear kernel support vector machines.
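
As a reading aid (the equations below are our own illustrative notation; the abstract itself does not give the model), the standard semisupervised mixture this work builds on, e.g., in the style of Miller and Uyar's earlier model, ties each class label to its component of origin alone, whereas the finer-grained mechanism described above also conditions on the sample itself and on neighboring labels within the component. A hedged sketch, with the component count M, the neighborhood sets N(i), and the form of the local conditional all assumptions for illustration:

    % Standard semisupervised mixture: the label c is conditionally
    % independent of the feature vector x given the component j, so a
    % single label distribution beta_{c|j} extrapolates over the entire
    % region "owned by" component j.
    p(x, c) = \sum_{j=1}^{M} \alpha_j \, f(x \mid \theta_j) \, \beta_{c \mid j}

    % Finer-grained component-conditional labeling (illustrative form):
    % conditioned on the samples and their components of origin, the
    % labels within each component form a Markov random field; the
    % intractable joint label likelihood is replaced by Besag's
    % pseudo-likelihood, a product of local conditionals:
    \mathrm{PL} = \prod_{i} P\bigl( c_i \,\big|\, x_i,\ \{ c_k : k \in \mathcal{N}(i) \},\ j_i \bigr)

Because the pseudo-likelihood couples labels across samples, closed-form M-steps are generally unavailable; this is why a generalized EM procedure, which merely increases the objective at each step rather than maximizing it, is the natural fit for the "approximate generalized expectation-maximization" learning the abstract mentions.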

Original language: English (US)
Pages (from-to): 1926-1966
Number of pages: 41
Journal: Neural Computation
Volume: 24
Issue number: 7
DOIs: 10.1162/NECO_a_00284
State: Published - Dec 1 2012

All Science Journal Classification (ASJC) codes

  • Arts and Humanities (miscellaneous)
  • Cognitive Neuroscience

Cite this

@article{5d2ebe08fa314c5fbb1ce36d9ab13bf1,
title = "Improved generative semisupervised learning based on finely grained component-conditional class labeling",
abstract = "We introduce new inductive, generative semisupervised mixtures with more finely grained class label generation mechanisms than in previous work. Our models combine advantages of semisupervised mixtures, which achieve label extrapolation over a component, and nearestneighbor (NN)/nearest-prototype (NP) classification, which achieve accurate classification in the vicinity of labeled samples or prototypes. For our NN-based method, we propose a novel two-stage stochastic data generation, with all samples first generated using a standard finite mixture and then all class labels generated, conditioned on the samples and their components of origin. This mechanism entails an underlying Markov random field, specific to each mixture component or cluster. We invoke the pseudo-likelihood formulation, which forms the basis for an approximate generalized expectation-maximization model learning algorithm. Our NP-based model overcomes a problem with the NN-based model that manifests at very low labeled fractions. Both models are advantageous when within-component class proportions are not constant over the feature space region {"}owned by{"} a component. The practicality of this scenario is borne out by experiments on UC Irvine data sets, which demonstrate significant gains in classification accuracy over previous semisupervised mixtures and also overall gains, over KNN classification. Moreover, for very small labeled fractions, our methods overall outperform supervised linear and nonlinear kernel support vector machines.",
author = "Miller, {David Jonathan} and Jayaram Raghuram and George Kesidis and Collins, {Christopher M.}",
year = "2012",
month = "12",
day = "1",
doi = "10.1162/NECO_a_00284",
language = "English (US)",
volume = "24",
pages = "1926--1966",
journal = "Neural Computation",
issn = "0899-7667",
publisher = "MIT Press Journals",
number = "7",

}

Improved generative semisupervised learning based on finely grained component-conditional class labeling. / Miller, David Jonathan; Raghuram, Jayaram; Kesidis, George; Collins, Christopher M.

In: Neural Computation, Vol. 24, No. 7, 01.12.2012, p. 1926-1966.

TY - JOUR

T1 - Improved generative semisupervised learning based on finely grained component-conditional class labeling

AU - Miller, David Jonathan

AU - Raghuram, Jayaram

AU - Kesidis, George

AU - Collins, Christopher M.

PY - 2012/12/1

Y1 - 2012/12/1

N2 - We introduce new inductive, generative semisupervised mixtures with more finely grained class label generation mechanisms than in previous work. Our models combine the advantages of semisupervised mixtures, which achieve label extrapolation over a component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples or prototypes. For our NN-based method, we propose a novel two-stage stochastic data generation, with all samples first generated using a standard finite mixture and then all class labels generated, conditioned on the samples and their components of origin. This mechanism entails an underlying Markov random field, specific to each mixture component or cluster. We invoke the pseudo-likelihood formulation, which forms the basis for an approximate generalized expectation-maximization model learning algorithm. Our NP-based model overcomes a problem with the NN-based model that manifests at very low labeled fractions. Both models are advantageous when within-component class proportions are not constant over the feature space region "owned by" a component. The practicality of this scenario is borne out by experiments on UC Irvine data sets, which demonstrate significant gains in classification accuracy over previous semisupervised mixtures and also overall gains over KNN classification. Moreover, for very small labeled fractions, our methods overall outperform supervised linear and nonlinear kernel support vector machines.

AB - We introduce new inductive, generative semisupervised mixtures with more finely grained class label generation mechanisms than in previous work. Our models combine the advantages of semisupervised mixtures, which achieve label extrapolation over a component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples or prototypes. For our NN-based method, we propose a novel two-stage stochastic data generation, with all samples first generated using a standard finite mixture and then all class labels generated, conditioned on the samples and their components of origin. This mechanism entails an underlying Markov random field, specific to each mixture component or cluster. We invoke the pseudo-likelihood formulation, which forms the basis for an approximate generalized expectation-maximization model learning algorithm. Our NP-based model overcomes a problem with the NN-based model that manifests at very low labeled fractions. Both models are advantageous when within-component class proportions are not constant over the feature space region "owned by" a component. The practicality of this scenario is borne out by experiments on UC Irvine data sets, which demonstrate significant gains in classification accuracy over previous semisupervised mixtures and also overall gains over KNN classification. Moreover, for very small labeled fractions, our methods overall outperform supervised linear and nonlinear kernel support vector machines.

UR - http://www.scopus.com/inward/record.url?scp=84874044177&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874044177&partnerID=8YFLogxK

U2 - 10.1162/NECO_a_00284

DO - 10.1162/NECO_a_00284

M3 - Article

AN - SCOPUS:84874044177

VL - 24

SP - 1926

EP - 1966

JO - Neural Computation

JF - Neural Computation

SN - 0899-7667

IS - 7

ER -