Sparse topic models by parameter sharing

Hossein Soleimani, David J. Miller

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We propose a sparse Bayesian topic model, based on parameter sharing, for modeling text corpora. In Latent Dirichlet Allocation (LDA), each topic models all words, even though many words are not topic-specific, i.e., they have similar occurrence frequencies across different topics. We propose a sparser approach by introducing a universal shared model, used by each topic to model the subset of words that are not topic-specific. A Bernoulli random variable is associated with each word under every topic, determining whether that word is modeled topic-specifically, with a free parameter, or by the shared model, with a common parameter. Our experiments show that our model achieves sparser topic presence in documents and higher test likelihood than LDA.
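
    To make the generative mechanism in the abstract concrete, here is a minimal numpy sketch of an LDA-style model with a per-(topic, word) Bernoulli switch between topic-specific and shared word parameters. All variable names (phi_shared, phi_topic, u, rho) and hyperparameter values are illustrative assumptions, not the paper's notation, and renormalizing the switched parameters is one plausible way to form valid topic distributions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    V, K = 1000, 10   # vocabulary size, number of topics (assumed values)
    alpha = 0.1       # Dirichlet prior on per-document topic proportions
    beta = 0.01       # Dirichlet prior on word distributions
    rho = 0.2         # prior probability that a word is topic-specific

    # Shared model: one distribution over the vocabulary, common to all topics.
    phi_shared = rng.dirichlet(beta * np.ones(V))

    # Per-topic free parameters, used only where the switch selects them.
    phi_topic = rng.dirichlet(beta * np.ones(V), size=K)   # shape (K, V)

    # Bernoulli switch: u[k, v] = 1 means word v is modeled topic-specifically
    # under topic k; u[k, v] = 0 means topic k defers to the shared model.
    u = rng.random((K, V)) < rho

    # Effective per-topic word distributions: topic-specific where u = 1,
    # shared elsewhere, renormalized to sum to one over the vocabulary.
    phi = np.where(u, phi_topic, phi_shared)
    phi /= phi.sum(axis=1, keepdims=True)

    def generate_document(n_words: int) -> np.ndarray:
        """Generate one document, LDA-style, from the sparse topics phi."""
        theta = rng.dirichlet(alpha * np.ones(K))   # topic proportions
        z = rng.choice(K, size=n_words, p=theta)    # per-word topic labels
        return np.array([rng.choice(V, p=phi[k]) for k in z])

    doc = generate_document(200)
    print(f"fraction of topic-specific (topic, word) pairs: {u.mean():.2f}")
    ```

    Words with u[k, v] = 0 contribute no free parameters under topic k, which is the source of the sparsity: most of the vocabulary can be explained once, by the shared model, rather than K separate times.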

    Original language: English (US)
    Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
    Editors: Tulay Adali, Jan Larsen, Mamadou Mboup, Eric Moreau
    Publisher: IEEE Computer Society
    ISBN (Electronic): 9781479936946
    State: Published - Nov 14 2014
    Event: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 - Reims, France
    Duration: Sep 21 2014 - Sep 24 2014

    Publication series

    Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
    ISSN (Print): 2161-0363
    ISSN (Electronic): 2161-0371


    All Science Journal Classification (ASJC) codes

    • Human-Computer Interaction
    • Signal Processing

