Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Yan Hua, Shuhui Wang, Siyuan Liu, Anni Cai, Qingming Huang

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

With the explosive growth of web data, effective and efficient technologies are urgently needed for retrieving semantically relevant content across heterogeneous modalities. Previous studies have focused on modeling simple cross-modal statistical dependencies and on globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot adapt appropriately to diverse content, and they ignore the multilevel semantic relations that naturally exist in web data. We study the problem of semantically coherent retrieval, in which documents from different modalities should be ranked by their semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method based on adaptive hierarchical semantic aggregation. First, by jointly modeling content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, using a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation: local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize a structural risk objective function that involves a semantic coherence measure, local projection consistency, and a complexity penalty on the local projections. Compared with existing approaches, TINA achieves a better bias-variance tradeoff in real-world cross-modal correlation learning tasks. Extensive image-text retrieval experiments on the widely used NUS-WIDE and ICML-Challenge datasets demonstrate that TINA adapts better to multilevel semantic relations and content divergence and thus outperforms the state of the art with better semantic coherence.
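The abstract names two ways of combining local experts but gives no formulas for them. The sketch below only illustrates the general difference between the two paradigms, under loudly stated assumptions that are not taken from the paper: each expert k has a hypothetical image projection `Wx[k]` and text projection `Wy[k]`, membership probabilities come from a Gaussian kernel around hypothetical per-expert centers `cx`/`cy` normalized with a softmax, distances are squared Euclidean, and distance aggregation weights each expert by the normalized product of the image-side and text-side memberships. The function names (`memberships`, `projection_aggregation`, `distance_aggregation`) are illustrative only and are not TINA's actual definitions.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (input_dim, r)


def project(x: Sequence[float], W: Matrix) -> Vector:
    """Compute W^T x for a row-major W of shape (len(x), r)."""
    r = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(r)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def memberships(x: Sequence[float], centers: List[Vector],
                sigma: float = 1.0) -> Vector:
    """Soft assignment of x to K local experts.

    Assumption: Gaussian kernel around hypothetical expert centers,
    normalized with a softmax; the paper's membership function may differ.
    """
    logits = [-sq_dist(x, c) / (2.0 * sigma ** 2) for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def projection_aggregation(x: Sequence[float], y: Sequence[float],
                           Wx: List[Matrix], Wy: List[Matrix],
                           cx: List[Vector], cy: List[Vector]) -> float:
    """Blend the local projections first, then measure a single distance."""
    px, py = memberships(x, cx), memberships(y, cy)
    r = len(Wx[0][0])
    zx = [0.0] * r
    zy = [0.0] * r
    for k in range(len(Wx)):
        ux, uy = project(x, Wx[k]), project(y, Wy[k])
        for j in range(r):
            zx[j] += px[k] * ux[j]
            zy[j] += py[k] * uy[j]
    return sq_dist(zx, zy)


def distance_aggregation(x: Sequence[float], y: Sequence[float],
                         Wx: List[Matrix], Wy: List[Matrix],
                         cx: List[Vector], cy: List[Vector]) -> float:
    """Measure a distance in each local space, then blend the distances.

    Assumption: expert k is weighted by p_k(x) * p_k(y), renormalized.
    """
    px, py = memberships(x, cx), memberships(y, cy)
    joint = [a * b for a, b in zip(px, py)]
    total = sum(joint)
    return sum(
        (w / total) * sq_dist(project(x, Wx[k]), project(y, Wy[k]))
        for k, w in enumerate(joint)
    )
```

With a single expert (K = 1) the two functions return the same value; they diverge only when the memberships spread over several experts. Projection aggregation mixes the projected coordinates into one shared space before comparing, while distance aggregation keeps each local comparison separate and mixes only the resulting distances.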

Original language: English (US)
Article number: 7422147
Pages (from-to): 1201-1216
Number of pages: 16
Journal: IEEE Transactions on Multimedia
Volume: 18
Issue number: 6
DOI: 10.1109/TMM.2016.2535864
State: Published - Jun 1 2016

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Hua, Yan ; Wang, Shuhui ; Liu, Siyuan ; Cai, Anni ; Huang, Qingming. / Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation. In: IEEE Transactions on Multimedia. 2016 ; Vol. 18, No. 6. pp. 1201-1216.
@article{c3980909a21546ef8508e243796b87dc,
title = "Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation",
author = "Yan Hua and Shuhui Wang and Siyuan Liu and Anni Cai and Qingming Huang",
year = "2016",
month = jun,
day = "1",
doi = "10.1109/TMM.2016.2535864",
language = "English (US)",
volume = "18",
pages = "1201--1216",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation. / Hua, Yan; Wang, Shuhui; Liu, Siyuan; Cai, Anni; Huang, Qingming.

In: IEEE Transactions on Multimedia, Vol. 18, No. 6, 7422147, 01.06.2016, p. 1201-1216.

TY  - JOUR
T1  - Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation
AU  - Hua, Yan
AU  - Wang, Shuhui
AU  - Liu, Siyuan
AU  - Cai, Anni
AU  - Huang, Qingming
PY  - 2016/6/1
Y1  - 2016/6/1
UR  - http://www.scopus.com/inward/record.url?scp=84971280330&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84971280330&partnerID=8YFLogxK
U2  - 10.1109/TMM.2016.2535864
DO  - 10.1109/TMM.2016.2535864
M3  - Article
AN  - SCOPUS:84971280330
VL  - 18
SP  - 1201
EP  - 1216
JO  - IEEE Transactions on Multimedia
JF  - IEEE Transactions on Multimedia
SN  - 1520-9210
IS  - 6
M1  - 7422147
ER  - 