TY - GEN
T1 - Exploiting the value of class labels in topic models for semi-supervised document classification
AU - Soleimani, Hossein
AU - Miller, David J.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - We propose a mixture of class-conditioned topic models for classifying text documents using both labeled and unlabeled training documents in a semi-supervised fashion. Most topic models incorporate documents' class labels by generating them after generating the word space. In these models, the training class labels have a relatively small effect on the estimated topics, as the likelihood function is mostly dominated by the word space, whose size dwarfs a single class label per document. In this paper, we propose to increase the influence of class labels on model parameters by generating the word space in each document conditioned on the class label. We show that our specific generative process improves classification performance while maintaining the ability of the model to discover topics from the word space. Within our framework, we also provide a principled mechanism to control the contributions of the class labels and the word space to the likelihood function. Experimental results show that our approach achieves better classification performance than some standard semi-supervised and supervised topic models. We provide the code required to replicate our experiments at https://github.com/hsoleimani/MCCTM.
AB - We propose a mixture of class-conditioned topic models for classifying text documents using both labeled and unlabeled training documents in a semi-supervised fashion. Most topic models incorporate documents' class labels by generating them after generating the word space. In these models, the training class labels have a relatively small effect on the estimated topics, as the likelihood function is mostly dominated by the word space, whose size dwarfs a single class label per document. In this paper, we propose to increase the influence of class labels on model parameters by generating the word space in each document conditioned on the class label. We show that our specific generative process improves classification performance while maintaining the ability of the model to discover topics from the word space. Within our framework, we also provide a principled mechanism to control the contributions of the class labels and the word space to the likelihood function. Experimental results show that our approach achieves better classification performance than some standard semi-supervised and supervised topic models. We provide the code required to replicate our experiments at https://github.com/hsoleimani/MCCTM.
UR - http://www.scopus.com/inward/record.url?scp=85007248190&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85007248190&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2016.7727723
DO - 10.1109/IJCNN.2016.7727723
M3 - Conference contribution
AN - SCOPUS:85007248190
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 4025
EP - 4031
BT - 2016 International Joint Conference on Neural Networks, IJCNN 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 International Joint Conference on Neural Networks, IJCNN 2016
Y2 - 24 July 2016 through 29 July 2016
ER -