Automatic extraction of data from bar charts

Rabah A. Al-Zaidy, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Scientific charts are an effective tool to visualize numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts in the web has made it inevitable for search engines to include them as indexed content. However, the queries based on only the textual data used to tag the images can limit query results. Many studies exist to address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of the charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.
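
The abstract does not spell out which heuristics the authors use. Purely as an illustration of the kind of geometric reasoning a bar chart allows, the sketch below assumes that two y-axis tick labels have already been read (for example by text recognition) together with their pixel rows, and that the pixel row of each bar's top edge is known. It then maps bar tops to data values by linear interpolation. The function names `pixel_to_value` and `bar_values`, and all the numbers, are hypothetical and are not taken from the paper.

```python
def pixel_to_value(y_pixel, tick_a, tick_b):
    """Map a pixel row to a data value on a linear y-axis.

    tick_a and tick_b are (pixel_row, value) pairs for two labelled
    axis ticks. This assumes a linear axis, which is an assumption of
    this sketch and not a claim about the paper's method.
    """
    (ya, va), (yb, vb) = tick_a, tick_b
    if ya == yb:
        raise ValueError("calibration ticks must lie on different pixel rows")
    # Linear interpolation between the two calibration ticks.
    return va + (y_pixel - ya) * (vb - va) / (yb - ya)


def bar_values(bar_tops, tick_a, tick_b):
    """Convert the pixel row of each bar's top edge to a data value."""
    return [pixel_to_value(y, tick_a, tick_b) for y in bar_tops]


if __name__ == "__main__":
    # Hypothetical chart: the tick labelled 0 sits at row 400 and the
    # tick labelled 30 at row 100 (image rows grow downwards).
    print(bar_values([250, 100, 400], (400, 0.0), (100, 30.0)))
```

Image rows grow downwards, so a taller bar has a smaller pixel row; taking the calibration from two labelled ticks handles that inversion without a special case.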

Original language: English (US)
Title of host publication: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450338493
DOIs: https://doi.org/10.1145/2815833.2816956
State: Published - Oct 7 2015
Event: 8th International Conference on Knowledge Capture, K-CAP 2015 - Palisades, United States
Duration: Oct 7 2015 to Oct 10 2015

Publication series

Name: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015

Other

Other: 8th International Conference on Knowledge Capture, K-CAP 2015
Country: United States
City: Palisades
Period: 10/7/15 to 10/10/15

Fingerprint

Search engines
World Wide Web
Labeling
Image processing
Semantics
Industry

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Information Systems
  • Software

Cite this

Al-Zaidy, R. A., & Giles, C. L. (2015). Automatic extraction of data from bar charts. In Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015 [30] (Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015). Association for Computing Machinery, Inc. https://doi.org/10.1145/2815833.2816956
Al-Zaidy, Rabah A. ; Giles, C. Lee. / Automatic extraction of data from bar charts. Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015. Association for Computing Machinery, Inc, 2015. (Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015).
@inproceedings{34fce327e7334e47a4183095f758cc27,
title = "Automatic extraction of data from bar charts",
abstract = "Scientific charts are an effective tool to visualize numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts in the web has made it inevitable for search engines to include them as indexed content. However, the queries based on only the textual data used to tag the images can limit query results. Many studies exist to address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of the charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.",
author = "Al-Zaidy, {Rabah A.} and Giles, {C. Lee}",
year = "2015",
month = "10",
day = "7",
doi = "10.1145/2815833.2816956",
language = "English (US)",
series = "Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015",
publisher = "Association for Computing Machinery, Inc",
booktitle = "Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015",

}

Al-Zaidy, RA & Giles, CL 2015, Automatic extraction of data from bar charts. in Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015., 30, Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015, Association for Computing Machinery, Inc, 8th International Conference on Knowledge Capture, K-CAP 2015, Palisades, United States, 10/7/15. https://doi.org/10.1145/2815833.2816956

Automatic extraction of data from bar charts. / Al-Zaidy, Rabah A.; Giles, C. Lee.

Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015. Association for Computing Machinery, Inc, 2015. 30 (Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015).

TY - GEN

T1 - Automatic extraction of data from bar charts

AU - Al-Zaidy, Rabah A.

AU - Giles, C. Lee

PY - 2015/10/7

Y1 - 2015/10/7

N2 - Scientific charts are an effective tool to visualize numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts in the web has made it inevitable for search engines to include them as indexed content. However, the queries based on only the textual data used to tag the images can limit query results. Many studies exist to address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of the charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.

UR - http://www.scopus.com/inward/record.url?scp=84997523778&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84997523778&partnerID=8YFLogxK

U2 - 10.1145/2815833.2816956

DO - 10.1145/2815833.2816956

M3 - Conference contribution

AN - SCOPUS:84997523778

T3 - Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015

BT - Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015

PB - Association for Computing Machinery, Inc

ER -

Al-Zaidy RA, Giles CL. Automatic extraction of data from bar charts. In Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015. Association for Computing Machinery, Inc. 2015. 30. (Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015). https://doi.org/10.1145/2815833.2816956