Scalable algorithms for scholarly figure mining and semantics

Sagnik Ray Choudhury, Shuting Wang, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Most scholarly papers contain one or multiple figures. Often these figures show experimental results, e.g., line graphs are used to compare various methods. Compared to the text of the paper, figures and their semantics have received relatively less attention. This has significantly limited semantic search capabilities in scholarly search engines. Here, we report scalable algorithms for generating semantic metadata for figures. Our system has four sequential modules: 1. Extraction of figure, caption and mention; 2. Binary classification of figures as compound (contains sub-figures) or not; 3. Three-class classification of non-compound figures as line graph, bar graph or others; and 4. Automatic processing of line graphs to generate a textual summary. In each step a metadata file is generated, each having richer information than the previous one. The algorithms are scalable, yet each individual step has an accuracy greater than 80%.
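The four sequential modules described in the abstract can be pictured as a chain of stages that each add fields to a shared metadata record. The following is only a minimal sketch under stated assumptions: the names FigureRecord, STAGES and run_pipeline are hypothetical, and the keyword heuristics stand in for the classifiers and the line-graph processing reported in the paper, which are not reproduced here.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FigureRecord:
    """Hypothetical container for one extracted figure."""
    figure_id: str
    caption: str
    mentions: List[str] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)


def extract(rec: FigureRecord) -> bool:
    # Module 1: record the caption and the in-text mentions.
    rec.metadata["caption"] = rec.caption
    rec.metadata["mentions"] = list(rec.mentions)
    return True


def classify_compound(rec: FigureRecord) -> bool:
    # Module 2: placeholder heuristic, NOT the paper's classifier.
    is_compound = "(a)" in rec.caption and "(b)" in rec.caption
    rec.metadata["is_compound"] = is_compound
    return not is_compound  # only non-compound figures continue


def classify_type(rec: FigureRecord) -> bool:
    # Module 3: placeholder three-class decision from caption keywords.
    text = rec.caption.lower()
    if "curve" in text or "line graph" in text:
        figure_type = "line_graph"
    elif "bar" in text:
        figure_type = "bar_graph"
    else:
        figure_type = "other"
    rec.metadata["figure_type"] = figure_type
    return figure_type == "line_graph"  # only line graphs are summarized


def summarize_line_graph(rec: FigureRecord) -> bool:
    # Module 4: placeholder summary; the real system analyzes the plot itself.
    rec.metadata["summary"] = f"Line graph: {rec.caption}"
    return True


STAGES: List[Callable[[FigureRecord], bool]] = [
    extract, classify_compound, classify_type, summarize_line_graph,
]


def run_pipeline(rec: FigureRecord) -> List[str]:
    """Run the stages in order; return one JSON metadata snapshot per stage."""
    snapshots = []
    for stage in STAGES:
        proceed = stage(rec)
        snapshots.append(json.dumps(rec.metadata, sort_keys=True))
        if not proceed:
            break
    return snapshots
```

Each snapshot plays the role of the per-step metadata file: a later snapshot contains every field of the earlier ones plus whatever the latest stage added, and figures that are compound or not line graphs simply stop early.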

Original language: English (US)
Title of host publication: Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
Editors: Le Gruenwald, Sven Groppe
Publisher: Association for Computing Machinery
ISBN (Print): 9781450342995
DOIs: https://doi.org/10.1145/2928294.2928305
State: Published - Jun 26 2016
Event: 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference - San Francisco, United States
Duration: Jul 1 2016 → …

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
Country: United States
City: San Francisco
Period: 7/1/16 → …

Fingerprint

Semantics
Metadata
Search engines
Processing

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Choudhury, S. R., Wang, S., & Giles, C. L. (2016). Scalable algorithms for scholarly figure mining and semantics. In L. Gruenwald, & S. Groppe (Eds.), Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference [a1] (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/2928294.2928305
Choudhury, Sagnik Ray ; Wang, Shuting ; Giles, C. Lee. / Scalable algorithms for scholarly figure mining and semantics. Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference. editor / Le Gruenwald ; Sven Groppe. Association for Computing Machinery, 2016. (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{885d9faffc714f77956d90af6e0a1e40,
title = "Scalable algorithms for scholarly figure mining and semantics",
abstract = "Most scholarly papers contain one or multiple figures. Often these figures show experimental results, e.g., line graphs are used to compare various methods. Compared to the text of the paper, figures and their semantics have received relatively less attention. This has significantly limited semantic search capabilities in scholarly search engines. Here, we report scalable algorithms for generating semantic metadata for figures. Our system has four sequential modules: 1. Extraction of figure, caption and mention; 2. Binary classification of figures as compound (contains sub-figures) or not; 3. Three-class classification of non-compound figures as line graph, bar graph or others; and 4. Automatic processing of line graphs to generate a textual summary. In each step a metadata file is generated, each having richer information than the previous one. The algorithms are scalable, yet each individual step has an accuracy greater than 80{\%}.",
author = "Choudhury, {Sagnik Ray} and Wang, Shuting and Giles, {C. Lee}",
year = "2016",
month = "6",
day = "26",
doi = "10.1145/2928294.2928305",
language = "English (US)",
isbn = "9781450342995",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",
editor = "Le Gruenwald and Sven Groppe",
booktitle = "Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference",
}

Choudhury, SR, Wang, S & Giles, CL 2016, Scalable algorithms for scholarly figure mining and semantics. in L Gruenwald & S Groppe (eds), Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference., a1, Proceedings of the ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference, San Francisco, United States, 7/1/16. https://doi.org/10.1145/2928294.2928305

Scalable algorithms for scholarly figure mining and semantics. / Choudhury, Sagnik Ray; Wang, Shuting; Giles, C. Lee.

Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference. ed. / Le Gruenwald; Sven Groppe. Association for Computing Machinery, 2016. a1 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

TY  - GEN
T1  - Scalable algorithms for scholarly figure mining and semantics
AU  - Choudhury, Sagnik Ray
AU  - Wang, Shuting
AU  - Giles, C. Lee
PY  - 2016/6/26
Y1  - 2016/6/26
N2  - Most scholarly papers contain one or multiple figures. Often these figures show experimental results, e.g., line graphs are used to compare various methods. Compared to the text of the paper, figures and their semantics have received relatively less attention. This has significantly limited semantic search capabilities in scholarly search engines. Here, we report scalable algorithms for generating semantic metadata for figures. Our system has four sequential modules: 1. Extraction of figure, caption and mention; 2. Binary classification of figures as compound (contains sub-figures) or not; 3. Three-class classification of non-compound figures as line graph, bar graph or others; and 4. Automatic processing of line graphs to generate a textual summary. In each step a metadata file is generated, each having richer information than the previous one. The algorithms are scalable, yet each individual step has an accuracy greater than 80%.
AB  - Most scholarly papers contain one or multiple figures. Often these figures show experimental results, e.g., line graphs are used to compare various methods. Compared to the text of the paper, figures and their semantics have received relatively less attention. This has significantly limited semantic search capabilities in scholarly search engines. Here, we report scalable algorithms for generating semantic metadata for figures. Our system has four sequential modules: 1. Extraction of figure, caption and mention; 2. Binary classification of figures as compound (contains sub-figures) or not; 3. Three-class classification of non-compound figures as line graph, bar graph or others; and 4. Automatic processing of line graphs to generate a textual summary. In each step a metadata file is generated, each having richer information than the previous one. The algorithms are scalable, yet each individual step has an accuracy greater than 80%.
UR  - http://www.scopus.com/inward/record.url?scp=85045211577&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85045211577&partnerID=8YFLogxK
U2  - 10.1145/2928294.2928305
DO  - 10.1145/2928294.2928305
M3  - Conference contribution
AN  - SCOPUS:85045211577
SN  - 9781450342995
T3  - Proceedings of the ACM SIGMOD International Conference on Management of Data
BT  - Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
A2  - Gruenwald, Le
A2  - Groppe, Sven
PB  - Association for Computing Machinery
ER  -

Choudhury SR, Wang S, Giles CL. Scalable algorithms for scholarly figure mining and semantics. In Gruenwald L, Groppe S, editors, Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference. Association for Computing Machinery. 2016. a1. (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/2928294.2928305