Keyphrase Extraction in Scholarly Digital Library Search Engines

Krutarth Patel, Cornelia Caragea, Jian Wu, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Scholarly digital libraries provide access to scientific publications and are useful resources for researchers who search for literature in specific subject areas. CiteSeerX is one such digital library search engine: it provides access to more than 10 million academic documents and has nearly one million users and three million hits per day. Artificial Intelligence (AI) technologies are used in many components of CiteSeerX, including Web crawling, document ingestion, and metadata extraction. CiteSeerX also uses an unsupervised algorithm called noun phrase chunking (NP-Chunking) to extract keyphrases from documents. However, NP-Chunking often extracts many unimportant noun phrases. In this paper, we investigate and contrast three supervised keyphrase extraction models with a view to deploying them in CiteSeerX to extract high-quality keyphrases. To enable user evaluation of the keyphrases predicted by the different models, we integrate a voting interface into CiteSeerX. We describe the development and deployment of the keyphrase extraction models and their maintenance requirements.
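The abstract does not say how CiteSeerX implements NP-Chunking. As a minimal sketch only, assuming tokens already carry Penn Treebank part-of-speech tags and assuming a simple chunk pattern of optional adjectives followed by one or more nouns, a noun phrase chunker could look like the following. The function name `np_chunk`, the tag sets, and the example sentence are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

# Assumed Penn Treebank tags treated as nouns and adjectives in this sketch.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}

# Hypothetical pre-tagged input: the first sentence of the abstract.
EXAMPLE = [
    ("Scholarly", "JJ"), ("digital", "JJ"), ("libraries", "NNS"),
    ("provide", "VBP"), ("access", "NN"), ("to", "TO"),
    ("scientific", "JJ"), ("publications", "NNS"),
]


def np_chunk(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return lowercased spans matching the pattern (ADJ)* (NOUN)+."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Consume a (possibly empty) run of adjectives.
        while j < n and tagged[j][1] in ADJ_TAGS:
            j += 1
        k = j
        # Then consume a run of nouns; at least one is required.
        while k < n and tagged[k][1] in NOUN_TAGS:
            k += 1
        if k > j:
            phrases.append(" ".join(w.lower() for w, _ in tagged[i:k]))
            i = k  # continue scanning after the chunk
        else:
            i += 1  # no noun phrase starts here; move on
    return phrases


if __name__ == "__main__":
    print(np_chunk(EXAMPLE))
    # prints ['scholarly digital libraries', 'access', 'scientific publications']
```

Even on this short sentence the chunker returns a generic phrase such as "access" alongside the meaningful ones. This illustrates the kind of low-value candidate that motivates replacing pure NP-Chunking with supervised models.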

Original language: English (US)
Title of host publication: Web Services – ICWS 2020 - 27th International Conference, Held as Part of the Services Conference Federation, SCF 2020, Proceedings
Editors: Wei-Shinn Ku, Yasuhiko Kanemasa, Mohamed Adel Serhani, Liang-Jie Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 179-196
Number of pages: 18
ISBN (Print): 9783030596170
State: Published - 2020
Event: 27th International Conference on Web Services, ICWS 2020, held as part of the Services Conference Federation, SCF 2020 - Honolulu, United States
Duration: Sep 18, 2020 to Sep 20, 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12406 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Web Services, ICWS 2020, held as part of the Services Conference Federation, SCF 2020
Country: United States
City: Honolulu
Period: 9/18/20 to 9/20/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

