Generative models for name disambiguation

Yang Song, Jian Huang, Isaac G. Councill, Jia Li, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we present an efficient framework by using two novel topic-based models, extended from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. Experiments indicate that our approach consistently outperforms other unsupervised methods including spectral and DBSCAN clustering. Scalability is addressed by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.
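
The abstract does not give the models' equations, so the following is only an illustrative sketch, not the authors' formulation. Standard PLSA models a document-word pair as P(d, w) = sum_z P(z) P(d|z) P(w|z). The sketch assumes that "introducing a variable for persons" can be approximated by adding a third observed variable, so that a latent topic z generates a document d, a person name a, and a word w independently, and it fits that model with EM. The function names (fit_aspect_model, assign_topics), the (doc, person, word) triple format, and the toy data are all hypothetical.

```python
import math
import random
from collections import defaultdict


def _random_dist(keys, rng):
    """Return a random probability distribution over keys."""
    raw = {k: rng.random() + 0.1 for k in keys}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def fit_aspect_model(triples, n_topics, n_iter=50, seed=0):
    """Fit P(d, a, w) = sum_z P(z) P(d|z) P(a|z) P(w|z) with EM.

    triples: list of (doc, person, word) observations (assumed format).
    Returns (p_z, p_d, p_a, p_w, log_likelihood_history).
    """
    rng = random.Random(seed)
    vocab = [sorted({t[i] for t in triples}) for i in range(3)]
    p_z = [1.0 / n_topics] * n_topics
    # cond[i][z] is P(x_i | z) for docs (i=0), persons (i=1), words (i=2).
    cond = [[_random_dist(keys, rng) for _ in range(n_topics)] for keys in vocab]
    history = []
    for _ in range(n_iter):
        n_z = [0.0] * n_topics
        counts = [[defaultdict(float) for _ in range(n_topics)] for _ in range(3)]
        log_lik = 0.0
        for triple in triples:
            # E-step: posterior over topics for this observation.
            joint = [
                p_z[z] * math.prod(cond[i][z][triple[i]] for i in range(3))
                for z in range(n_topics)
            ]
            total = sum(joint)
            if total <= 0.0:
                continue  # guard against numerical underflow
            log_lik += math.log(total)
            for z in range(n_topics):
                r = joint[z] / total
                n_z[z] += r
                for i in range(3):
                    counts[i][z][triple[i]] += r
        history.append(log_lik)
        # M-step: re-estimate all distributions from the expected counts.
        n_total = sum(n_z)
        if n_total == 0.0:
            break
        for z in range(n_topics):
            p_z[z] = n_z[z] / n_total
            if n_z[z] > 0.0:
                for i in range(3):
                    cond[i][z] = {k: counts[i][z][k] / n_z[z] for k in vocab[i]}
    return p_z, cond[0], cond[1], cond[2], history


def assign_topics(p_z, p_d):
    """Label each document with argmax_z P(z) P(d|z), i.e. argmax_z P(z|d)."""
    return {
        d: max(range(len(p_z)), key=lambda z: p_z[z] * p_d[z][d])
        for d in p_d[0]
    }


# Toy data (invented for illustration): two groups of papers with
# disjoint person names and vocabulary.
triples = [
    ("doc1", "j. smith", "clustering"), ("doc1", "a. lee", "topic"),
    ("doc2", "a. lee", "clustering"), ("doc2", "j. smith", "topic"),
    ("doc3", "m. chen", "protein"), ("doc3", "r. patel", "genome"),
    ("doc4", "r. patel", "protein"), ("doc4", "m. chen", "genome"),
]
p_z, p_d, p_a, p_w, history = fit_aspect_model(triples, n_topics=2)
labels = assign_topics(p_z, p_d)
print(labels)
```

In this sketch, documents that receive the same topic label would be treated as candidates for the same underlying person. EM only reaches a local optimum, so the labels can depend on the seed; the log-likelihood history is returned so that convergence can be inspected.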

Original language: English (US)
Title of host publication: 16th International World Wide Web Conference, WWW2007
Pages: 1163-1164
Number of pages: 2
DOI: https://doi.org/10.1145/1242572.1242746
State: Published - Oct 22 2007
Event: 16th International World Wide Web Conference, WWW2007 - Banff, AB, Canada
Duration: May 8 2007 to May 12 2007

Publication series

Name: 16th International World Wide Web Conference, WWW2007

Other

Other: 16th International World Wide Web Conference, WWW2007
Country: Canada
City: Banff, AB
Period: 5/8/07 to 5/12/07

Fingerprint

Scalability
Semantics
Experiments
Uncertainty

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software

Cite this

Song, Y., Huang, J., Councill, I. G., Li, J., & Giles, C. L. (2007). Generative models for name disambiguation. In 16th International World Wide Web Conference, WWW2007 (pp. 1163-1164). (16th International World Wide Web Conference, WWW2007). https://doi.org/10.1145/1242572.1242746
Song, Yang ; Huang, Jian ; Councill, Isaac G. ; Li, Jia ; Giles, C. Lee. / Generative models for name disambiguation. 16th International World Wide Web Conference, WWW2007. 2007. pp. 1163-1164 (16th International World Wide Web Conference, WWW2007).
@inproceedings{3186b55f47ae4e2d9758187d4009f72b,
title = "Generative models for name disambiguation",
abstract = "Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we present an efficient framework by using two novel topic-based models, extended from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. Experiments indicate that our approach consistently outperforms other unsupervised methods including spectral and DBSCAN clustering. Scalability is addressed by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.",
author = "Yang Song and Jian Huang and Councill, {Isaac G.} and Jia Li and Giles, {C. Lee}",
year = "2007",
month = "10",
day = "22",
doi = "10.1145/1242572.1242746",
language = "English (US)",
isbn = "1595936548",
series = "16th International World Wide Web Conference, WWW2007",
pages = "1163--1164",
booktitle = "16th International World Wide Web Conference, WWW2007",

}

Song, Y, Huang, J, Councill, IG, Li, J & Giles, CL 2007, Generative models for name disambiguation. in 16th International World Wide Web Conference, WWW2007. 16th International World Wide Web Conference, WWW2007, pp. 1163-1164, 16th International World Wide Web Conference, WWW2007, Banff, AB, Canada, 5/8/07. https://doi.org/10.1145/1242572.1242746

Generative models for name disambiguation. / Song, Yang; Huang, Jian; Councill, Isaac G.; Li, Jia; Giles, C. Lee.

16th International World Wide Web Conference, WWW2007. 2007. p. 1163-1164 (16th International World Wide Web Conference, WWW2007).

TY - GEN

T1 - Generative models for name disambiguation

AU - Song, Yang

AU - Huang, Jian

AU - Councill, Isaac G.

AU - Li, Jia

AU - Giles, C. Lee

PY - 2007/10/22

Y1 - 2007/10/22

N2 - Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we present an efficient framework by using two novel topic-based models, extended from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. Experiments indicate that our approach consistently outperforms other unsupervised methods including spectral and DBSCAN clustering. Scalability is addressed by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

UR - http://www.scopus.com/inward/record.url?scp=35348824296&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35348824296&partnerID=8YFLogxK

U2 - 10.1145/1242572.1242746

DO - 10.1145/1242572.1242746

M3 - Conference contribution

AN - SCOPUS:35348824296

SN - 1595936548

SN - 9781595936547

T3 - 16th International World Wide Web Conference, WWW2007

SP - 1163

EP - 1164

BT - 16th International World Wide Web Conference, WWW2007

ER -

Song Y, Huang J, Councill IG, Li J, Giles CL. Generative models for name disambiguation. In 16th International World Wide Web Conference, WWW2007. 2007. p. 1163-1164. (16th International World Wide Web Conference, WWW2007). https://doi.org/10.1145/1242572.1242746