Semisupervised, multilabel, multi-instance learning for structured data

Hossein Soleimani, David J. Miller

    Research output: Contribution to journal › Letter › peer-review

    10 Scopus citations


    Many classification tasks require both labeling objects and determining label associations for parts of each object. Example applications include labeling segments of images or determining relevant parts of a text document when the training labels are available only at the image or document level. This task is usually referred to as multi-instance (MI) learning, where the learner typically receives a collection of labeled (or sometimes unlabeled) bags, each containing several segments (instances). We propose a semisupervised MI learning method for multilabel classification. Most MI learning methods treat instances in each bag as independent and identically distributed samples. However, in many practical applications, instances are related to each other and should not be considered independent. Our model discovers a latent low-dimensional space that captures structure within each bag. Further, unlike many other MI learning methods, which are primarily developed for binary classification, we model multiple classes jointly, thus also capturing possible dependencies between different classes. We develop our model within a semisupervised framework, which leverages both labeled and, typically, a larger set of unlabeled bags for training. We develop several efficient inference methods for our model. We first introduce a Markov chain Monte Carlo method for inference, which can handle arbitrary relations between bag labels and instance labels, including the standard hard-max MI assumption. We also develop an extension of our model that uses stochastic variational Bayes methods for inference, and thus scales better to massive data sets. Experiments show that our approach outperforms several MI learning and standard classification methods on both bag-level and instance-level label prediction. All code for replicating our experiments is available from
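As a minimal illustration of the standard hard-max MI assumption mentioned in the abstract: a bag carries a given class label exactly when at least one of its instances carries that label. The sketch below applies this rule per class in the multilabel setting. The function name `bag_labels_hard_max` and the toy data are hypothetical and are not taken from the authors' code; the paper's model also supports other bag-instance relations, which this sketch does not cover.

```python
from typing import List


def bag_labels_hard_max(instance_labels: List[List[int]]) -> List[int]:
    """Derive multilabel bag labels from binary instance labels.

    instance_labels[i][c] is 1 if instance i has class c, else 0.
    Under the hard-max assumption, the bag has class c exactly when
    the maximum over its instances for class c is 1.
    """
    if not instance_labels:
        raise ValueError("a bag must contain at least one instance")
    num_classes = len(instance_labels[0])
    return [
        max(row[c] for row in instance_labels)
        for c in range(num_classes)
    ]


# Toy bag (assumed data): 3 instances, 3 classes.
bag = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
print(bag_labels_hard_max(bag))  # [1, 1, 0]
```

In the training setting the abstract describes, only the bag-level vector is observed, so the learner has to infer which instances are responsible for each bag label.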

    Original language: English (US)
    Pages (from-to): 1053-1102
    Number of pages: 50
    Journal: Neural Computation
    Issue number: 4
    State: Published - Apr 1 2017

    All Science Journal Classification (ASJC) codes

    • Arts and Humanities (miscellaneous)
    • Cognitive Neuroscience

