A survey of current datasets for vision and language research

Francis Ferraro, Nasrin Mostafazadeh, Kenneth Huang, Lucy Vanderwende, Jacob Devlin, Michel Galley, Margaret Mitchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

25 Citations (Scopus)

Abstract

Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts, however, there are different strengths and weaknesses in each.

Original language: English (US)
Title of host publication: Conference Proceedings - EMNLP 2015
Subtitle of host publication: Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics (ACL)
Pages: 207-213
Number of pages: 7
ISBN (Electronic): 9781941643327
State: Published - Jan 1 2015
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
Duration: Sep 17 2015 to Sep 21 2015

Publication series

Name: Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015
Country: Portugal
City: Lisbon
Period: 9/17/15 to 9/21/15

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Ferraro, F., Mostafazadeh, N., Huang, K., Vanderwende, L., Devlin, J., Galley, M., & Mitchell, M. (2015). A survey of current datasets for vision and language research. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 207-213). (Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing). Association for Computational Linguistics (ACL).
Ferraro, Francis ; Mostafazadeh, Nasrin ; Huang, Kenneth ; Vanderwende, Lucy ; Devlin, Jacob ; Galley, Michel ; Mitchell, Margaret. / A survey of current datasets for vision and language research. Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2015. pp. 207-213 (Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing).
@inproceedings{a38bc565b53c49c4b229bf4019b29dd6,
title = "A survey of current datasets for vision and language research",
abstract = "Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts, however, there are different strengths and weaknesses in each.",
author = "Francis Ferraro and Nasrin Mostafazadeh and Kenneth Huang and Lucy Vanderwende and Jacob Devlin and Michel Galley and Margaret Mitchell",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
series = "Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing",
publisher = "Association for Computational Linguistics (ACL)",
pages = "207--213",
booktitle = "Conference Proceedings - EMNLP 2015",
}

Ferraro, F, Mostafazadeh, N, Huang, K, Vanderwende, L, Devlin, J, Galley, M & Mitchell, M 2015, A survey of current datasets for vision and language research. in Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics (ACL), pp. 207-213, Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 9/17/15.

A survey of current datasets for vision and language research. / Ferraro, Francis; Mostafazadeh, Nasrin; Huang, Kenneth; Vanderwende, Lucy; Devlin, Jacob; Galley, Michel; Mitchell, Margaret.

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2015. p. 207-213 (Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing).

TY - GEN
T1 - A survey of current datasets for vision and language research
AU - Ferraro, Francis
AU - Mostafazadeh, Nasrin
AU - Huang, Kenneth
AU - Vanderwende, Lucy
AU - Devlin, Jacob
AU - Galley, Michel
AU - Mitchell, Margaret
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts, however, there are different strengths and weaknesses in each.
AB - Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts, however, there are different strengths and weaknesses in each.
UR - http://www.scopus.com/inward/record.url?scp=84959904882&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959904882&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84959904882
T3 - Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing
SP - 207
EP - 213
BT - Conference Proceedings - EMNLP 2015
PB - Association for Computational Linguistics (ACL)
ER -

Ferraro F, Mostafazadeh N, Huang K, Vanderwende L, Devlin J, Galley M et al. A survey of current datasets for vision and language research. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL). 2015. p. 207-213. (Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing).