TY - JOUR
T1 - CiteSeerX
T2 - AI in a digital library search engine
AU - Wu, Jian
AU - Williams, Kyle
AU - Chen, Hung Hsuan
AU - Khabsa, Madian
AU - Caragea, Cornelia
AU - Tuarob, Suppawong
AU - Ororbia, Alexander
AU - Jordan, Douglas
AU - Mitra, Prasenjit
AU - Giles, C. Lee
N1 - Funding Information:
We acknowledge partial support from the National Science Foundation and suggestions from Robert Neches.
Publisher Copyright:
Copyright © 2015, Association for the Advancement of Artificial Intelligence. All rights reserved.
PY - 2015/9/1
Y1 - 2015/9/1
AB - CiteSeerX is a digital library search engine that provides access to more than 5 million scholarly documents, with nearly a million users and millions of hits per day. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. We also present AI technologies implemented in table search and algorithm search, which are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and search engines.
UR - http://www.scopus.com/inward/record.url?scp=84975034118&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84975034118&partnerID=8YFLogxK
U2 - 10.1609/aimag.v36i3.2601
DO - 10.1609/aimag.v36i3.2601
M3 - Article
AN - SCOPUS:84975034118
VL - 36
SP - 35
EP - 48
JO - AI Magazine
JF - AI Magazine
SN - 0738-4602
IS - 3
ER -