Multi-modal hierarchical Dirichlet process model for predicting image annotation and image-object label correspondence

Oksana Yakhnenko, Vasant Honavar

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

11 Citations (Scopus)

Abstract

Many real-world applications call for learning predictive relationships from multi-modal data. In particular, in multimedia and web applications, given a dataset of images and their associated captions, one might want to construct a predictive model that not only predicts a caption for an image but also labels the individual objects in it. We address this problem using a multi-modal hierarchical Dirichlet process model (MoM-HDP), a stochastic process for modeling multi-modal data. MoM-HDP is the analog of multi-modal Latent Dirichlet Allocation (MoM-LDA) with an infinite number of mixture components. MoM-HDP therefore removes the need to choose the number of mixture components a priori and avoids the computational expense of model selection. During training, the model has access to an unsegmented image and its caption, but not to the labels of the individual objects in the image. The trained model is then used to predict the label of each region of interest in a segmented image. The model parameters are estimated efficiently using variational inference. On two large benchmark datasets, we compare the performance of the proposed MoM-HDP model with that of the MoM-LDA model and of two simple alternatives: Naive Bayes and Logistic Regression classifiers, obtained by formulating the image annotation and image-label correspondence problems as one-against-all classification. Our experimental results show that, unlike that of MoM-LDA, the performance of MoM-HDP is invariant to the number of mixture components. Furthermore, the generalization performance of MoM-HDP is superior to that of MoM-LDA as well as to that of the one-against-all Naive Bayes and Logistic Regression classifiers.
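The "infinite number of mixture components" in a Dirichlet-process model is commonly represented by the stick-breaking (GEM) construction of the mixing weights, which variational methods truncate at a finite level T. The sketch below is a minimal, hypothetical illustration of that truncated construction only; it is not the authors' implementation, and the function names, the concentration `alpha`, and the truncation levels are assumptions chosen for illustration. It does not model the per-image level of the hierarchy or the two modalities.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int,
                           rng: random.Random) -> List[float]:
    """Hypothetical helper: draw truncated GEM(alpha) stick-breaking weights.

    For k < truncation - 1, a fraction v_k ~ Beta(1, alpha) of the remaining
    stick is broken off; the last component takes the whole remainder, so the
    `truncation` weights sum to one (the usual truncation used in
    variational treatments of Dirichlet processes).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if truncation < 1:
        raise ValueError("truncation must be at least 1")
    weights: List[float] = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        last = k == truncation - 1
        v = 1.0 if last else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def effective_components(weights: List[float], mass: float = 0.99) -> int:
    """Hypothetical helper: fewest largest weights whose sum reaches `mass`."""
    total = 0.0
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        total += w
        if total >= mass:
            return count
    return len(weights)


if __name__ == "__main__":
    rng = random.Random(0)
    for truncation in (20, 50, 100):
        w = stick_breaking_weights(alpha=1.0, truncation=truncation, rng=rng)
        print(truncation, round(sum(w), 6), effective_components(w))
```

Because each stick fraction is drawn from Beta(1, alpha), the expected weights decay geometrically, so for a fixed `alpha` most of the mass sits on a handful of components and raising `truncation` mostly adds near-zero weights. Under the stated assumptions, this illustrates why a truncated DP is less sensitive to the chosen number of components than a finite mixture with a fixed K; it does not reproduce the paper's experiments.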

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages: 280-290
Number of pages: 11
State: Published - Dec 1 2009
Event: 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: Apr 30 2009 - May 2 2009

Publication series

Name: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume: 1

Other

Other: 9th SIAM International Conference on Data Mining 2009, SDM 2009
Country: United States
City: Sparks, NV
Period: 4/30/09 - 5/2/09

Fingerprint

Image Annotation
Dirichlet Process
Process Model
Labels
Correspondence
Naive Bayes
Logistic Regression
Classifier
Correspondence Problem
Predict
Model
Multimedia Applications
Predictive Model
Region of Interest
Object
Real-world Applications
Web Application
Experimental Evaluation

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
  • Applied Mathematics

Cite this

Yakhnenko, O., & Honavar, V. (2009). Multi-modal hierarchical dirichlet process model for predicting image annotation and image-object label correspondence. In Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133 (pp. 280-290). (Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics; Vol. 1).
@inproceedings{01a3ca90076d4b2e921b0625bb368fdd,
title = "Multi-modal hierarchical dirichlet process model for predicting image annotation and image-object label correspondence",
author = "Oksana Yakhnenko and Vasant Honavar",
year = "2009",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781615671090",
series = "Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics",
pages = "280--290",
booktitle = "Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133",

}

TY - GEN

T1 - Multi-modal hierarchical dirichlet process model for predicting image annotation and image-object label correspondence

AU - Yakhnenko, Oksana

AU - Honavar, Vasant

PY - 2009/12/1

Y1 - 2009/12/1

UR - http://www.scopus.com/inward/record.url?scp=72849143525&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72849143525&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:72849143525

SN - 9781615671090

T3 - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

SP - 280

EP - 290

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133

ER -