Large scale author name disambiguation in digital libraries

Madian Khabsa, Pucktada Treeratpituk, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceedingConference contribution

17 Scopus citations

Abstract

Person name disambiguation is essential to distinguish between persons that share the same name where unique identifiers are not present. In many domains this is a common problem including digital libraries where the same name can refer to multiple unique authors. Correctly attributing work and citations requires the digital library's database to be disambiguated. In this work we describe a large scale framework for disambiguating author names efficiently and effectively. The framework uses a density based clustering algorithm with a random forest based distance function to clusters unique authors. Effective use of blocking functions allows the clustering algorithm to be run in parallel. In our experiments we show that the framework disambiguates authors of more than 4 million papers in 24 hours.

Original languageEnglish (US)
Title of host publicationProceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
EditorsWo Chang, Jun Huan, Nick Cercone, Saumyadipta Pyne, Vasant Honavar, Jimmy Lin, Xiaohua Tony Hu, Charu Aggarwal, Bamshad Mobasher, Jian Pei, Raghunath Nambiar
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages41-42
Number of pages2
ISBN (Electronic)9781479956654
DOIs
StatePublished - Jan 7 2015
Event2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: Oct 27 2014Oct 30 2014

Publication series

NameProceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Other

Other2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country/TerritoryUnited States
CityWashington
Period10/27/1410/30/14

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems

Fingerprint

Dive into the research topics of 'Large scale author name disambiguation in digital libraries'. Together they form a unique fingerprint.

Cite this