A Web Service for Author Name Disambiguation in Scholarly Databases

Kunho Kim, Athar Sefid, Bruce A. Weinberg, Clyde Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g. calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for various applications (e.g. author search) or research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based queries and record-based queries, which serve different purposes. Attribute-based queries retrieve authors with the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve authors that are most likely to have written a query publication provided by a user. To accelerate record-based queries, we develop a novel algorithm that has a fast record-to-cluster match. We show that our algorithm can accelerate the query by a factor of 4.01 compared to a baseline naive approach.
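
To make the difference between the two query types concrete, here is a minimal Python sketch over a toy in-memory set of author clusters. It is an illustration only: the `AuthorCluster` class, the field names, the function names, the Jaccard-based scoring, and the sample data are all assumptions made for this example. It does not reproduce the paper's RESTful API or its record-to-cluster matching algorithm; the scoring here is closer in spirit to a naive scan over candidate clusters.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorCluster:
    """Hypothetical disambiguated author: one cluster of publication records."""
    author_id: str
    name: str
    affiliation: str
    # Words pooled from the titles of the cluster's publications (assumed feature).
    title_words: set = field(default_factory=set)


def attribute_query(clusters, **attrs):
    """Attribute-based query: return clusters whose stored attributes match exactly."""
    return [c for c in clusters
            if all(getattr(c, key) == value for key, value in attrs.items())]


def record_query(clusters, author_name, title, top_k=1):
    """Record-based query: among clusters sharing the queried name, rank by how
    well the query publication's title overlaps the cluster (Jaccard score)."""
    query_words = set(title.lower().split())
    scored = []
    for c in clusters:
        if c.name != author_name:
            continue
        union = query_words | c.title_words
        score = len(query_words & c.title_words) / len(union) if union else 0.0
        scored.append((score, c.author_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [author_id for _, author_id in scored[:top_k]]


# Made-up sample data, not taken from PubMed.
clusters = [
    AuthorCluster("a1", "J Smith", "Univ A", {"protein", "folding", "kinetics"}),
    AuthorCluster("a2", "J Smith", "Univ B", {"web", "service", "search"}),
    AuthorCluster("a3", "K Lee", "Univ A", {"author", "name", "clustering"}),
]
```

With this toy data, `attribute_query(clusters, affiliation="Univ A")` returns the clusters `a1` and `a3`, while `record_query(clusters, "J Smith", "Fast web search service")` returns `["a2"]`, the "J Smith" cluster whose title words best overlap the query publication.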

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 265-273
Number of pages: 9
ISBN (Print): 9781538672471
DOIs: https://doi.org/10.1109/ICWS.2018.00041
State: Published - Sep 5 2018
Event: 25th IEEE International Conference on Web Services, ICWS 2018 - San Francisco, United States
Duration: Jul 2 2018 to Jul 7 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services

Other

Other: 25th IEEE International Conference on Web Services, ICWS 2018
Country: United States
City: San Francisco
Period: 7/2/18 to 7/7/18

Fingerprint

Web services
Search engines
Application programming interfaces (API)
Data base
Query
Processing

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management

Cite this

Kim, K., Sefid, A., Weinberg, B. A., & Giles, C. L. (2018). A Web Service for Author Name Disambiguation in Scholarly Databases. In Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services (pp. 265-273). [8456358] (Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICWS.2018.00041
Kim, Kunho ; Sefid, Athar ; Weinberg, Bruce A. ; Giles, Clyde Lee. / A Web Service for Author Name Disambiguation in Scholarly Databases. Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 265-273 (Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services).
@inproceedings{53559218587a4e9fba75d166d218c915,
title = "A Web Service for Author Name Disambiguation in Scholarly Databases",
abstract = "Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g. calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for various applications (e.g. author search) or research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based queries and record-based queries, which serve different purposes. Attribute-based queries retrieve authors with the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve authors that are most likely to have written a query publication provided by a user. To accelerate record-based queries, we develop a novel algorithm that has a fast record-to-cluster match. We show that our algorithm can accelerate the query by a factor of 4.01 compared to a baseline naive approach.",
author = "Kunho Kim and Athar Sefid and Weinberg, {Bruce A.} and Giles, {Clyde Lee}",
year = "2018",
month = "9",
day = "5",
doi = "10.1109/ICWS.2018.00041",
language = "English (US)",
isbn = "9781538672471",
series = "Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "265--273",
booktitle = "Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services",
address = "United States",

}

Kim, K, Sefid, A, Weinberg, BA & Giles, CL 2018, A Web Service for Author Name Disambiguation in Scholarly Databases. in Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services, 8456358, Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services, Institute of Electrical and Electronics Engineers Inc., pp. 265-273, 25th IEEE International Conference on Web Services, ICWS 2018, San Francisco, United States, 7/2/18. https://doi.org/10.1109/ICWS.2018.00041

A Web Service for Author Name Disambiguation in Scholarly Databases. / Kim, Kunho; Sefid, Athar; Weinberg, Bruce A.; Giles, Clyde Lee.

Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services. Institute of Electrical and Electronics Engineers Inc., 2018. p. 265-273 8456358 (Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services).

TY - GEN

T1 - A Web Service for Author Name Disambiguation in Scholarly Databases

AU - Kim, Kunho

AU - Sefid, Athar

AU - Weinberg, Bruce A.

AU - Giles, Clyde Lee

PY - 2018/9/5

Y1 - 2018/9/5

AB - Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g. calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for various applications (e.g. author search) or research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based queries and record-based queries, which serve different purposes. Attribute-based queries retrieve authors with the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve authors that are most likely to have written a query publication provided by a user. To accelerate record-based queries, we develop a novel algorithm that has a fast record-to-cluster match. We show that our algorithm can accelerate the query by a factor of 4.01 compared to a baseline naive approach.

UR - http://www.scopus.com/inward/record.url?scp=85054007073&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054007073&partnerID=8YFLogxK

U2 - 10.1109/ICWS.2018.00041

DO - 10.1109/ICWS.2018.00041

M3 - Conference contribution

SN - 9781538672471

T3 - Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services

SP - 265

EP - 273

BT - Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Kim K, Sefid A, Weinberg BA, Giles CL. A Web Service for Author Name Disambiguation in Scholarly Databases. In Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services. Institute of Electrical and Electronics Engineers Inc. 2018. p. 265-273. 8456358. (Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services). https://doi.org/10.1109/ICWS.2018.00041