Information extraction for scholarly digital libraries

Kyle Williams, Jian Wu, Zhaohui Wu, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Scopus citations


Scholarly documents contain many data entities, such as titles, authors, affiliations, figures, and tables. These entities can be used to enhance digital library services through enhanced metadata and enable the development of new services and tools for interacting with and exploring scholarly data. However, in a world of scholarly big data, extracting these entities in a scalable, efficient and accurate manner can be challenging. In this tutorial, we introduce the broad field of information extraction for scholarly digital libraries. Drawing on our experience in running the CiteSeerX digital library, which has performed information extraction on over 7 million academic documents, we argue for the need for automatic information extraction, describe different approaches for performing information extraction, present tools and datasets that are readily available, and describe best practices and areas of research interest.

Original language: English (US)
Title of host publication: JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 2
ISBN (Electronic): 9781450342292
State: Published - Sep 1 2016
Event: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016 - Newark, United States
Duration: Jun 19 2016 to Jun 23 2016

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996


Other: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Engineering(all)

