TY - GEN
T1 - Translating videos to natural language using deep recurrent neural networks
AU - Venugopalan, Subhashini
AU - Xu, Huijuan
AU - Donahue, Jeff
AU - Rohrbach, Marcus
AU - Mooney, Raymond
AU - Saenko, Kate
N1 - Publisher Copyright:
© 2015 Association for Computational Linguistics.
PY - 2015
Y1 - 2015
AB - Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
UR - http://www.scopus.com/inward/record.url?scp=84959876769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959876769&partnerID=8YFLogxK
U2 - 10.3115/v1/n15-1173
DO - 10.3115/v1/n15-1173
M3 - Conference contribution
AN - SCOPUS:84959876769
T3 - NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
SP - 1494
EP - 1504
BT - NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Y2 - 31 May 2015 through 5 June 2015
ER -