A metadata generation system for scanned scientific volumes

Xiaonan Lu, Brewster Kahle, James Z. Wang, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

Large-scale digitization projects have been conducted at digital libraries to preserve cultural artifacts and to provide permanent access. The increasing amount of digitized resources, including scanned books and scientific publications, requires the development of tools and methods that efficiently analyze and manage large collections of digitized resources. In this work, we tackle the problem of extracting metadata from scanned volumes of journals. Our goal is to extract information describing the internal structure and content of scanned volumes, which is necessary for providing effective content access functionalities to digital library users. We propose methods for automatically generating volume-level, issue-level, and article-level metadata based on format and text features extracted from OCRed text. We show the performance of our system on scanned bound historical documents nearly two centuries old. We have developed the system and integrated it into an operational digital library, the Internet Archive, for real-world usage.

Original language: English (US)
Title of host publication: JCDL'08
Subtitle of host publication: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008
Pages: 167-176
Number of pages: 10
State: Published - 2008
Event: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08 - Pittsburgh, PA, United States
Duration: Jun 16 2008 to Jun 20 2008

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Other

Other: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08
Country: United States
City: Pittsburgh, PA
Period: 6/16/08 to 6/20/08

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
