In this paper, we consider the problem of ambiguous author names in bibliographic citations and compare alternative approaches for identifying and correcting such name variants (e.g., "Vannevar Bush" and "V. Bush"). Our study is based on a scalable two-step framework: step 1 substantially reduces the number of candidate names via blocking, and step 2 measures the distance between two names using coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results and identify combinations that are both scalable and effective for disambiguating author names in citations.
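The two-step framework described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific blocking methods or distance measures, so the choices here are assumptions made purely for illustration: a hypothetical blocking key (first initial plus last name) and Jaccard distance over coauthor sets. The function names, the threshold, and the toy data are likewise hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name):
    """Hypothetical blocking key: first initial plus last token, lowercased."""
    tokens = name.replace(".", " ").split()
    return (tokens[0][0] + " " + tokens[-1]).lower()


def jaccard_distance(a, b):
    """1 - |intersection| / |union|; treated as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def candidate_variants(coauthors, threshold=0.5):
    """Return (name_x, name_y, distance) for likely variants of one author.

    coauthors maps an author name string to the set of that name's coauthors.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    # Step 1 (blocking): group names so that only names sharing a key
    # are ever compared, which avoids comparing all pairs.
    blocks = defaultdict(list)
    for name in coauthors:
        blocks[block_key(name)].append(name)

    # Step 2 (distance): within each block, compare coauthor sets.
    pairs = []
    for names in blocks.values():
        for x, y in combinations(sorted(names), 2):
            d = jaccard_distance(coauthors[x], coauthors[y])
            if d <= threshold:
                pairs.append((x, y, d))
    return pairs


# Toy data, invented for this example only.
coauthors = {
    "Vannevar Bush": {"C. Shannon", "N. Wiener"},
    "V. Bush": {"C. Shannon", "N. Wiener", "F. Terman"},
    "Vivian Bush": {"A. Turing"},
    "Claude Shannon": {"V. Bush"},
}
result = candidate_variants(coauthors)
print(result)
```

In this toy run, blocking places the three "Bush" names in one block and "Claude Shannon" in another, so only three pairs are compared instead of six. Of those, only "V. Bush" and "Vannevar Bush" share enough coauthors to fall under the threshold.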
|Original language||English (US)|
|Number of pages||10|
|Journal||Proceedings of the ACM/IEEE Joint Conference on Digital Libraries|
|State||Published - Nov 10 2005|
|Event||5th ACM/IEEE Joint Conference on Digital Libraries - Digital Libraries: Cyberinfrastructure for Research and Education - Denver, CO, United States|
Duration: Jun 7 2005 → Jun 11 2005