Improving low-resource cross-lingual document retrieval by reranking with deep bilingual representations

Rui Zhang, Caitlin Westerfield, Sungrok Shim, Garrett Bingham, Alexander Fabbri, William Hu, Neha Verma, Dragomir Radev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the MATERIAL dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.
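The reranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the learned term interaction-based networks are replaced here by a fixed max-cosine pooling over a query-document similarity matrix, and the names `cosine_interaction`, `match_score`, `rerank`, and the mixing parameter `weight` are hypothetical. It assumes query and document terms are already embedded as nonzero vectors in a shared cross-lingual space and that each candidate comes with a query likelihood score from a first-stage retriever.

```python
import numpy as np


def cosine_interaction(query_vecs, doc_vecs):
    """Return the |query| x |doc| matrix of cosine similarities between term vectors.

    Assumes every row is a nonzero embedding from a shared cross-lingual space.
    """
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return q @ d.T


def match_score(query_vecs, doc_vecs):
    """Toy stand-in for a learned matching network.

    For each query term, take its best-matching document term, then average
    over query terms.
    """
    interaction = cosine_interaction(query_vecs, doc_vecs)
    return float(interaction.max(axis=1).mean())


def rerank(query_vecs, candidates, weight=0.5):
    """Rerank first-stage candidates.

    candidates: list of (doc_id, doc_vecs, query_likelihood_score).
    weight: hypothetical fixed mixing weight; a trained model would learn how
    to combine the matching features with the query likelihood feature.
    """
    scored = [
        (doc_id, weight * match_score(query_vecs, doc_vecs) + (1.0 - weight) * ql)
        for doc_id, doc_vecs, ql in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = np.array([[1.0, 0.0], [0.0, 1.0]])
    candidates = [
        ("A", np.array([[1.0, 0.0], [0.0, 1.0]]), 0.2),
        ("B", np.array([[1.0, 0.0]]), 0.4),
    ]
    for doc_id, score in rerank(query, candidates):
        print(doc_id, round(score, 3))
```

In this toy setup the embedding-based score can move a document above one that the query likelihood score alone would have preferred, which is the sense in which the reranker uses both signals.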

Original language: English (US)
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 3173-3179
Number of pages: 7
ISBN (Electronic): 9781950737482
State: Published - 2020
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: Jul 28 2019 → Aug 2 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country: Italy
City: Florence
Period: 7/28/19 → 8/2/19

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science (all)
  • Linguistics and Language
