Exploiting the value of class labels in topic models for semi-supervised document classification

Hossein Soleimani, David J. Miller

Research output: Chapter in Book/Report/Conference proceeding (conference contribution)

1 Scopus citation

Abstract

We propose a mixture of class-conditioned topic models for classifying text documents using both labeled and unlabeled training documents in a semi-supervised fashion. Most topic models incorporate documents' class labels by generating them after generating the word space. In these models, the training class labels have a relatively small effect on the estimated topics, because the likelihood function is dominated by the word space, whose size dwarfs the single class label per document. In this paper, we propose to increase the influence of class labels on model parameters by generating the word space in each document conditioned on the class label. We show that our specific generative process improves classification performance while maintaining the ability of the model to discover topics from the word space. Within our framework, we also provide a principled mechanism to control the contribution of the class labels and the word space to the likelihood function. Experimental results show that our approach achieves better classification performance than some standard semi-supervised and supervised topic models. We provide the code required to replicate our experiments at https://github.com/hsoleimani/MCCTM.
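The idea of generating the word space conditioned on the class label can be illustrated with a small sampling sketch. The abstract does not specify the distributions involved, so everything here is an assumption made for illustration: the class-specific Dirichlet priors (`class_alpha`), the shared topic-word table (`topic_word`), and the function names are hypothetical and are not taken from the authors' MCCTM code. The sketch only shows the ordering that matters: the label is drawn first, and the topic proportions and words then depend on it.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(class_prior, class_alpha, topic_word, n_words, rng):
    """Return (label, words): the label is drawn first, words are drawn given it."""
    n_classes = len(class_prior)
    label = rng.choices(range(n_classes), weights=class_prior)[0]
    # Assumption: the class influences the words through a class-specific
    # Dirichlet prior over topic proportions.
    theta = sample_dirichlet(class_alpha[label], rng)
    vocab = range(len(topic_word[0]))
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        words.append(rng.choices(vocab, weights=topic_word[topic])[0])
    return label, words


# Toy parameters (hypothetical): 2 classes, 2 topics, 4 vocabulary words.
rng = random.Random(0)
class_prior = [0.5, 0.5]
class_alpha = [[5.0, 0.1], [0.1, 5.0]]  # class 0 favors topic 0, class 1 topic 1
topic_word = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0 mostly emits words 0 and 1
    [0.05, 0.05, 0.45, 0.45],  # topic 1 mostly emits words 2 and 3
]
label, words = generate_document(class_prior, class_alpha, topic_word, 20, rng)
print(label, words)
```

In a model where the label is instead generated after the words, the label contributes a single term to the likelihood next to one term per word. In the sketch above, every word draw depends on the label through `theta`, which is the kind of increased influence the abstract describes.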

Original language: English (US)
Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4025-4031
Number of pages: 7
ISBN (Electronic): 9781509006199
State: Published - Oct 31 2016
Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: Jul 24 2016 to Jul 29 2016

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 2016-October

Other

Other: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Country/Territory: Canada
City: Vancouver
Period: 7/24/16 to 7/29/16

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
