Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Yan Hua, Shuhui Wang, Siyuan Liu, Anni Cai, Qingming Huang

Research output: Contribution to journalArticlepeer-review

32 Scopus citations

Abstract

With the explosive growth of web data, effective and efficient technologies are in urgent need for retrieving semantically relevant contents of heterogeneous modalities. Previous studies devote efforts to modeling simple cross-modal statistical dependencies, and globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot appropriately adapt to diverse contents, and the naturally existing multilevel semantic relation in web data is ignored. We study the problem of semantic coherent retrieval, where documents from different modalities should be ranked by the semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method by adaptive hierarchical semantic aggregation. First, by joint modeling of content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, with a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation, i.e., local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize the structure risk objective function that involves semantic coherence measurement, local projection consistency, and the complexity penalty of local projections. Compared to existing approaches, a better bias-variance tradeoff is achieved by TINA in real-world cross-modal correlation learning tasks. Extensive experiments on widely used NUS-WIDE and ICML-Challenge for image-Text retrieval demonstrate that TINA better adapts to the multilevel semantic relation and content divergence, and, thus, outperforms state of the art with better semantic coherence.

Original languageEnglish (US)
Article number7422147
Pages (from-to)1201-1216
Number of pages16
JournalIEEE Transactions on Multimedia
Volume18
Issue number6
DOIs
StatePublished - Jun 2016

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering

Fingerprint Dive into the research topics of 'Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation'. Together they form a unique fingerprint.

Cite this