Scalable algorithms for scholarly figure mining and semantics

Sagnik Ray Choudhury, Shuting Wang, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Most scholarly papers contain one or more figures. Often these figures show experimental results, e.g., line graphs are used to compare various methods. Compared to the text of the paper, figures and their semantics have received relatively little attention. This has significantly limited semantic search capabilities in scholarly search engines. Here, we report scalable algorithms for generating semantic metadata for figures. Our system has four sequential modules: 1. Extraction of figure, caption and mention; 2. Binary classification of figures as compound (contains sub-figures) or not; 3. Three-class classification of non-compound figures as line graph, bar graph or others; and 4. Automatic processing of line graphs to generate a textual summary. Each step generates a metadata file that carries richer information than the previous one. The algorithms are scalable, yet each individual step has an accuracy greater than 80%.
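
The four sequential modules listed in the abstract can be pictured as a chain of functions, each returning a metadata record that extends the previous one. The following is only an illustrative sketch under stated assumptions, not the authors' implementation: every function name, the dictionary-based metadata format, and the keyword-based stand-ins for the classifiers and the summarizer are hypothetical.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]  # hypothetical in-memory stand-in for a metadata file


def extract(figure_id: str, caption: str, mentions: List[str]) -> Metadata:
    """Module 1: record the figure, its caption and its in-text mentions."""
    return {"figure_id": figure_id, "caption": caption, "mentions": list(mentions)}


def classify_compound(meta: Metadata, is_compound: Callable[[Metadata], bool]) -> Metadata:
    """Module 2: binary label, compound (contains sub-figures) or not."""
    return {**meta, "compound": is_compound(meta)}


def classify_type(meta: Metadata, figure_type: Callable[[Metadata], str]) -> Metadata:
    """Module 3: only non-compound figures receive a type label."""
    if meta["compound"]:
        return dict(meta)
    return {**meta, "figure_type": figure_type(meta)}


def summarize(meta: Metadata, summarizer: Callable[[Metadata], str]) -> Metadata:
    """Module 4: only line graphs receive a textual summary."""
    if meta.get("figure_type") != "line graph":
        return dict(meta)
    return {**meta, "summary": summarizer(meta)}


def run_pipeline(
    figure_id: str,
    caption: str,
    mentions: List[str],
    is_compound: Callable[[Metadata], bool],
    figure_type: Callable[[Metadata], str],
    summarizer: Callable[[Metadata], str],
) -> List[Metadata]:
    """Run the four modules in order and return the metadata after each step."""
    step1 = extract(figure_id, caption, mentions)
    step2 = classify_compound(step1, is_compound)
    step3 = classify_type(step2, figure_type)
    step4 = summarize(step3, summarizer)
    return [step1, step2, step3, step4]


# Toy stand-ins (assumptions): a real system would use trained models here.
def toy_is_compound(meta: Metadata) -> bool:
    return "(a)" in str(meta["caption"])


def toy_figure_type(meta: Metadata) -> str:
    caption = str(meta["caption"]).lower()
    if "line" in caption or "curve" in caption:
        return "line graph"
    if "bar" in caption:
        return "bar graph"
    return "other"


def toy_summarizer(meta: Metadata) -> str:
    return "Line graph described as: " + str(meta["caption"])


snapshots = run_pipeline(
    "fig1",
    "Precision-recall curves for three methods.",
    ["Figure 1 compares the methods."],
    toy_is_compound,
    toy_figure_type,
    toy_summarizer,
)
for step, snapshot in enumerate(snapshots, start=1):
    print(step, sorted(snapshot))
```

In this sketch a compound figure stops gaining fields after the second step, since the later modules apply only to non-compound figures and to line graphs respectively.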

Original language: English (US)
Title of host publication: Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
Editors: Le Gruenwald, Sven Groppe
Publisher: Association for Computing Machinery
ISBN (Print): 9781450342995
DOI: https://doi.org/10.1145/2928294.2928305
State: Published - Jun 26 2016
Event: 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference - San Francisco, United States
Duration: Jul 1 2016 → …

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
Country: United States
City: San Francisco
Period: 7/1/16 → …

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Choudhury, S. R., Wang, S., & Giles, C. L. (2016). Scalable algorithms for scholarly figure mining and semantics. In L. Gruenwald, & S. Groppe (Eds.), Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference [a1] (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/2928294.2928305