Solving the “Who's Mark Johnson” puzzle: Information extraction based cross document coreference

Jian Huang, Sarah M. Taylor, Jonathan L. Smith, Konstantinos A. Fotiadis, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cross Document Coreference (CDC) is the problem of resolving the underlying identity of entities across multiple documents and is a major step for document understanding. We develop a framework to efficiently determine the identity of a person based on extracted information, which includes unary properties such as gender and title, as well as binary relationships with other named entities such as co-occurrence and geo-locations. At the heart of our approach is a suite of similarity functions (specialists) for matching relationships and a relational density-based clustering algorithm that delineates name clusters based on pairwise similarity. We demonstrate the effectiveness of our methods on the WePS benchmark datasets and point out future research directions.
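The two ingredients named in the abstract, per-relationship similarity functions and density-based clustering over pairwise similarity, can be illustrated with a minimal sketch. This is not the authors' implementation. The mention fields (`gender`, `cooccurring`, `locations`), the unweighted averaging of specialists, the thresholds `eps` and `min_pts`, and the use of a DBSCAN-style procedure are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Mention = Dict[str, object]                      # extracted info for one document's mention
Specialist = Callable[[Mention, Mention], float]  # similarity in [0, 1]


def gender_sim(a: Mention, b: Mention) -> float:
    """Unary property: 1.0 if genders agree, 0.0 if they conflict, 0.5 if unknown."""
    if a.get("gender") is None or b.get("gender") is None:
        return 0.5
    return 1.0 if a["gender"] == b["gender"] else 0.0


def jaccard(field: str) -> Specialist:
    """Binary relationship: set overlap of related entities stored under `field`."""
    def sim(a: Mention, b: Mention) -> float:
        x, y = set(a.get(field, ())), set(b.get(field, ()))
        if not x and not y:
            return 0.5  # no evidence either way
        return len(x & y) / len(x | y)
    return sim


# Hypothetical specialist suite; the paper's actual functions are not given here.
SPECIALISTS: List[Specialist] = [gender_sim, jaccard("cooccurring"), jaccard("locations")]


def combined_sim(a: Mention, b: Mention) -> float:
    """Assumption: combine specialists by a plain average."""
    return sum(s(a, b) for s in SPECIALISTS) / len(SPECIALISTS)


def density_cluster(mentions: List[Mention], eps: float = 0.6, min_pts: int = 2) -> List[int]:
    """DBSCAN-style clustering on pairwise similarity; returns a label per mention, -1 for noise."""
    n = len(mentions)
    # A mention's neighborhood is itself plus every mention at least `eps` similar to it.
    neighbors = [
        [j for j in range(n) if j == i or combined_sim(mentions[i], mentions[j]) >= eps]
        for i in range(n)
    ]
    labels: List[object] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, so keep expanding
        cluster += 1
    return labels
```

Each resulting cluster stands for one underlying person who shares the ambiguous name, and mentions labeled -1 are left unassigned. A density-based method is convenient in this setting because the number of distinct people behind a name does not have to be fixed in advance.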

Original language: English (US)
Title of host publication: NAACL-HLT 2009 - Human Language Technologies
Subtitle of host publication: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop and Doctoral Consortium
Editors: Ulrich Germann, Chirag Shah, Svetlana Stoyanchev, Carolyn Penstein Rose, Anoop Sarkar
Publisher: Association for Computational Linguistics (ACL)
Pages: 7-12
Number of pages: 6
ISBN (Electronic): 9781932432428
State: Published - 2009
Event: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2009 - Boulder, United States
Duration: Jun 1 2009 → …

Publication series

Name: NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop and Doctoral Consortium

Conference

Conference: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2009
Country/Territory: United States
City: Boulder
Period: 6/1/09 → …

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
