A figure search engine architecture for a chemistry digital library

Sagnik Ray Choudhury, Suppawong Tuarob, Prasenjit Mitra, Lior Rokach, Andi Kirk, Silvia Szep, Donald Pellegrino, Sue Jones, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Academic papers contain multiple figures representing important findings and experimental results. We present a search engine specifically focused on figures in academic documents. This search engine allows users to search on figures in approximately 150,000 chemistry journal articles, though the method is easily extendable to other domains. Our system indexes figure captions and mentions extracted from the PDFs of the documents using a custom-built extractor. Recall and precision of the extracted figures are in the 80 to 90% range. We give the framework for the extraction algorithm, architecture, and ranking function.
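The abstract describes indexing figure captions and mentions and ranking figures against a query, but it does not give the details. The following is a minimal sketch of that general idea only, not the authors' extractor or ranking function. The `Figure` record, the `FigureIndex` class, and the field weights are all hypothetical, chosen purely for illustration.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Figure:
    """A figure record; the field names are hypothetical, not from the paper."""
    figure_id: str
    caption: str
    mentions: list[str]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FigureIndex:
    # Assumed weights: a caption match counts more than a mention match.
    WEIGHTS = {"caption": 2.0, "mention": 1.0}

    def __init__(self) -> None:
        # term -> figure_id -> weighted term frequency
        self.postings: dict[str, dict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )

    def add(self, fig: Figure) -> None:
        """Index the caption and every mention of one figure."""
        for term, tf in Counter(tokenize(fig.caption)).items():
            self.postings[term][fig.figure_id] += self.WEIGHTS["caption"] * tf
        for mention in fig.mentions:
            for term, tf in Counter(tokenize(mention)).items():
                self.postings[term][fig.figure_id] += self.WEIGHTS["mention"] * tf

    def search(self, query: str) -> list[tuple[str, float]]:
        """Return (figure_id, score) pairs, highest score first."""
        scores: dict[str, float] = defaultdict(float)
        for term in tokenize(query):
            for fig_id, weight in self.postings.get(term, {}).items():
                scores[fig_id] += weight
        # Break score ties by figure_id so the ordering is deterministic.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Summing weighted term frequencies is the simplest possible scoring choice. A production system would more plausibly use a search library with per-field boosts and length normalization.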

Original language: English (US)
Title of host publication: JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Pages: 369-370
Number of pages: 2
DOIs: https://doi.org/10.1145/2467696.2467757
State: Published - Aug 23 2013
Event: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013 - Indianapolis, IN, United States
Duration: Jul 22 2013 to Jul 26 2013

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013
Country: United States
City: Indianapolis, IN
Period: 7/22/13 to 7/26/13

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Choudhury, S. R., Tuarob, S., Mitra, P., Rokach, L., Kirk, A., Szep, S., Pellegrino, D., Jones, S., & Giles, C. L. (2013). A figure search engine architecture for a chemistry digital library. In JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 369-370). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). https://doi.org/10.1145/2467696.2467757