Entity resolution using search engine results

Madian Khabsa, Pucktada Treeratpituk, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Given a set of automatically extracted entities E of size n, we would like to cluster together all the name variants that refer to the same canonical entity. The variants of each entity include acronyms, full names, and informal naming conventions. We propose using search engine results to cluster the variants of each entity based on the URLs appearing in those results. For each entity e ∈ E, we query the search engine, create a cluster C for each top search result returned, and assign e to C. Our experiments on a manually created dataset show that our approach achieves higher precision and recall than a string-matching algorithm and hierarchical-clustering-based disambiguation methods.
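The clustering step summarized in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `search` is a hypothetical pluggable callable (no particular search engine API is implied), each cluster is keyed on only the single top-ranked URL after a hypothetical normalization (`normalize_url`), and the output is scored with pairwise precision and recall, which is one common entity-resolution metric and not necessarily the one used in the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase host, drop scheme, 'www.', query and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def cluster_by_top_result(
    entities: Sequence[str],
    search: Callable[[str], List[str]],
) -> Dict[str, Set[str]]:
    """Assign each entity name to the cluster keyed by its top result URL.

    `search` is a hypothetical callable that returns ranked result URLs for a
    query string. Names whose top results normalize to the same URL end up in
    the same cluster; a name with no results gets its own singleton cluster.
    """
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for name in entities:
        results = search(name)
        key = normalize_url(results[0]) if results else "no-result:" + name
        clusters[key].add(name)
    return dict(clusters)


def pairwise_precision_recall(
    predicted: Iterable[Set[str]], gold: Iterable[Set[str]]
) -> Tuple[float, float]:
    """Pairwise P/R: a pair of names is 'linked' if both sit in one cluster."""

    def linked_pairs(clusters: Iterable[Set[str]]) -> Set[Tuple[str, str]]:
        pairs: Set[Tuple[str, str]] = set()
        for members in clusters:
            # Sorting makes (a, b) and (b, a) the same pair.
            pairs.update(combinations(sorted(members), 2))
        return pairs

    pred, true = linked_pairs(predicted), linked_pairs(gold)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(true) if true else 1.0
    return precision, recall


if __name__ == "__main__":
    # Made-up search results standing in for a real search engine.
    fake_results = {
        "IBM": ["https://www.ibm.com/"],
        "International Business Machines": ["http://ibm.com"],
        "Penn State": ["https://www.psu.edu/"],
        "Pennsylvania State University": ["https://www.psu.edu"],
        "PSU": ["https://www.portlandstate.edu/"],
    }
    gold = [
        {"IBM", "International Business Machines"},
        {"Penn State", "Pennsylvania State University", "PSU"},
    ]
    clusters = cluster_by_top_result(list(fake_results), lambda q: fake_results.get(q, []))
    for url, names in sorted(clusters.items()):
        print(url, sorted(names))
    print(pairwise_precision_recall(clusters.values(), gold))  # (1.0, 0.5)
```

With these invented results, `IBM` and `International Business Machines` share the cluster `ibm.com`, while `PSU` lands in its own cluster because its (invented) top hit differs, giving precision 1.0 and recall 0.5 against the hand-made gold clusters. Keying on one URL needs only one query per name, but a single off-target hit splits an entity; considering more than one result per query might reduce that, at the cost of extra merge decisions.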

Original language: English (US)
Title of host publication: CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages: 2363-2366
Number of pages: 4
DOI: 10.1145/2396761.2398641
State: Published - Dec 19 2012
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: Oct 29 2012 to Nov 2 2012

Publication series

Name: ACM International Conference Proceeding Series

Fingerprint

String searching algorithms
Search engines
Websites
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Khabsa, M., Treeratpituk, P., & Giles, C. L. (2012). Entity resolution using search engine results. In CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 2363-2366). (ACM International Conference Proceeding Series). https://doi.org/10.1145/2396761.2398641
@inproceedings{989c79b10a79406ea557f359f66e470e,
title = "Entity resolution using search engine results",
author = "Madian Khabsa and Pucktada Treeratpituk and Giles, {C. Lee}",
year = "2012",
month = "12",
day = "19",
doi = "10.1145/2396761.2398641",
language = "English (US)",
isbn = "9781450311564",
series = "ACM International Conference Proceeding Series",
pages = "2363--2366",
booktitle = "CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management",

}
