TY - GEN
T1 - A Web Service for Author Name Disambiguation in Scholarly Databases
AU - Kim, Kunho
AU - Sefid, Athar
AU - Weinberg, Bruce A.
AU - Giles, C. Lee
N1 - Funding Information:
The main contribution of this paper is to provide two types of author queries that serve different purposes: one is attribute-based and the other is record-based. Attribute-based queries use an internal resource (an attribute) to query authors, which is supported by indexing records with attributes. An example is querying an author with the name "Jane Doe". For record-based queries, users provide their own resource to query authors. An example is finding author and publication records from a dissertation record that does not exist in the database. We discuss how to accelerate record-based queries using our proposed record-to-cluster pairwise classification.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/5
Y1 - 2018/9/5
AB - Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g., calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for applications (e.g., author search) or for research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based queries and record-based queries, which serve different purposes. Attribute-based queries retrieve authors using the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve the authors most likely to have written a query publication provided by the user. To accelerate record-based queries, we develop a novel algorithm that performs fast record-to-cluster matching. We show that our algorithm can accelerate the query by a factor of 4.01 compared to a naive baseline approach.
UR - http://www.scopus.com/inward/record.url?scp=85054007073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054007073&partnerID=8YFLogxK
U2 - 10.1109/ICWS.2018.00041
DO - 10.1109/ICWS.2018.00041
M3 - Conference contribution
AN - SCOPUS:85054007073
SN - 9781538672471
T3 - Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services
SP - 265
EP - 273
BT - Proceedings - 2018 IEEE International Conference on Web Services, ICWS 2018 - Part of the 2018 IEEE World Congress on Services
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Web Services, ICWS 2018
Y2 - 2 July 2018 through 7 July 2018
ER -