Exploiting the value of class labels in topic models for semi-supervised document classification

Hossein Soleimani, David J. Miller

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    We propose a mixture of class-conditioned topic models for classifying text documents using both labeled and unlabeled training documents in a semi-supervised fashion. Most topic models incorporate documents' class labels by generating them after generating the word space. In these models, the training class labels have a relatively small effect on the estimated topics, because the likelihood function is dominated by the word space, whose size dwarfs a single class label per document. In this paper, we propose to increase the influence of class labels on model parameters by generating the word space in each document conditioned on the class label. We show that our specific generative process improves classification performance while maintaining the ability of the model to discover topics from the word space. Within our framework, we also provide a principled mechanism to control the contribution of the class labels and the word space to the likelihood function. Experimental results show that our approach achieves better classification performance than some standard semi-supervised and supervised topic models. We provide the required code to replicate our experiments at https://github.com/hsoleimani/MCCTM.
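
The ordering described in the abstract (class label first, words conditioned on it) can be illustrated with a toy sketch. This is not the authors' model or the code in their repository: it assumes fixed per-class topic proportions instead of a full topic model, and every name in it (`generate_document`, `weighted_log_joint`, `label_weight`, the toy parameter tables) is hypothetical. The `label_weight` factor is only one simple way to picture "controlling the contribution" of the label term relative to the word term; the abstract does not state the form the paper uses.

```python
import math
import random


def generate_document(class_prior, class_topic_probs, topic_word_probs, n_words, rng):
    """Draw the class label first, then draw every word conditioned on it."""
    label = rng.choices(range(len(class_prior)), weights=class_prior)[0]
    words = []
    for _ in range(n_words):
        # Topic is drawn from the proportions attached to the sampled class.
        topic = rng.choices(range(len(topic_word_probs)),
                            weights=class_topic_probs[label])[0]
        # Word is drawn from the chosen topic's word distribution.
        word = rng.choices(range(len(topic_word_probs[topic])),
                           weights=topic_word_probs[topic])[0]
        words.append(word)
    return label, words


def word_log_likelihood(words, label, class_topic_probs, topic_word_probs):
    """log p(words | label), summing out the topic of each word."""
    total = 0.0
    for w in words:
        p = sum(class_topic_probs[label][k] * topic_word_probs[k][w]
                for k in range(len(topic_word_probs)))
        total += math.log(p)
    return total


def weighted_log_joint(words, label, class_prior, class_topic_probs,
                       topic_word_probs, label_weight=1.0):
    """Assumed weighting: scale the label term against the word term."""
    return (label_weight * math.log(class_prior[label])
            + word_log_likelihood(words, label, class_topic_probs, topic_word_probs))


def classify(words, class_prior, class_topic_probs, topic_word_probs, label_weight=1.0):
    """Pick the class with the highest weighted log joint score."""
    scores = [weighted_log_joint(words, c, class_prior, class_topic_probs,
                                 topic_word_probs, label_weight)
              for c in range(len(class_prior))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy parameters (made up for illustration): 2 classes, 2 topics, 3 words.
CLASS_PRIOR = [0.5, 0.5]
CLASS_TOPIC_PROBS = [[0.9, 0.1], [0.1, 0.9]]
TOPIC_WORD_PROBS = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

label, words = generate_document(CLASS_PRIOR, CLASS_TOPIC_PROBS,
                                 TOPIC_WORD_PROBS, 20, random.Random(0))
predicted = classify(words, CLASS_PRIOR, CLASS_TOPIC_PROBS, TOPIC_WORD_PROBS)
```

In this sketch an unlabeled document would be scored under every class, as `classify` does, while a labeled one would contribute its known label to the score; raising or lowering `label_weight` shifts how much the single label counts against the many word terms.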

    Original language: English (US)
    Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 4025-4031
    Number of pages: 7
    ISBN (Electronic): 9781509006199
    State: Published - Oct 31 2016
    Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
    Duration: Jul 24 2016 to Jul 29 2016

    Publication series

    Name: Proceedings of the International Joint Conference on Neural Networks
    Volume: 2016-October

    Other

    Other: 2016 International Joint Conference on Neural Networks, IJCNN 2016
    Country: Canada
    City: Vancouver
    Period: 7/24/16 to 7/29/16

    All Science Journal Classification (ASJC) codes

    • Software
    • Artificial Intelligence
