Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents

Saurabh Kataria, William Browuer, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Two-dimensional (2-D) plots in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, thus eliminating a time-consuming manual process. Our information extraction algorithm identifies the axes of the figures, extracts text blocks such as axis labels and legends, and identifies data points in the figure. It also extracts the units appearing in the axis labels and segments the legends to identify the different lines in the legend, the different symbols, and their associated text explanations. Our algorithm also performs the challenging task of separating overlapping text and data points effectively. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.
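The abstract does not say how the axes or data points are located, so the following is only an illustrative sketch and not the authors' algorithm. It assumes the plot has already been binarized into a grid of 0/1 pixels, takes the row and column with the longest unbroken ink runs as the x- and y-axis, and treats any remaining ink inside the plot region as candidate data points. The names `longest_run`, `find_axes`, `candidate_points`, and the `toy` image are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background (assumed binarized input)


def longest_run(values: List[int]) -> int:
    """Length of the longest consecutive run of 1s in a sequence."""
    best = current = 0
    for v in values:
        current = current + 1 if v else 0
        best = max(best, current)
    return best


def find_axes(img: Image) -> Tuple[int, int]:
    """Return (x_axis_row, y_axis_col): the row and column with the longest ink runs."""
    rows = range(len(img))
    cols = range(len(img[0]))
    x_axis_row = max(rows, key=lambda r: longest_run(img[r]))
    y_axis_col = max(cols, key=lambda c: longest_run([row[c] for row in img]))
    return x_axis_row, y_axis_col


def candidate_points(img: Image, x_axis_row: int, y_axis_col: int) -> List[Tuple[int, int]]:
    """Ink pixels strictly above the x-axis and strictly right of the y-axis."""
    return [
        (r, c)
        for r in range(x_axis_row)
        for c in range(y_axis_col + 1, len(img[0]))
        if img[r][c]
    ]


# Toy 6x6 plot: y-axis in column 1, x-axis in row 4, three marks inside the plot area.
toy = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
axes = find_axes(toy)
points = candidate_points(toy, *axes)
```

On real figures this heuristic would need further steps that the abstract mentions but this sketch omits, such as separating text blocks from data marks and handling overlaps between them.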

Original language: English (US)
Title of host publication: AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference
Pages: 1169-1174
Number of pages: 6
State: Published - Dec 24 2008
Event: 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08 - Chicago, IL, United States
Duration: Jul 13 2008 to Jul 17 2008

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 2

Other

Other: 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08
Country: United States
City: Chicago, IL
Period: 7/13/08 to 7/17/08

Fingerprint

Labels
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Cite this

Kataria, S., Browuer, W., Mitra, P., & Giles, C. L. (2008). Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents. In AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference (pp. 1169-1174). (Proceedings of the National Conference on Artificial Intelligence; Vol. 2).
Kataria, Saurabh ; Browuer, William ; Mitra, Prasenjit ; Giles, C. Lee. / Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents. AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference. 2008. pp. 1169-1174 (Proceedings of the National Conference on Artificial Intelligence).
@inproceedings{648b1d63b12041ad81b8363866563894,
title = "Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents",
abstract = "Two dimensional plots (2-D) in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, thus eliminating a time consuming manual process. Our information extraction algorithm identifies the axes of the figures, extracts text blocks like axes-labels and legends and identifies data points in the figure. It also extracts the units appearing in the axes labels and segments the legends to identify the different lines in the legend, the different symbols and their associated text explanations. Our algorithm also performs the challenging task of separating out overlapping text and data points effectively. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.",
author = "Saurabh Kataria and William Browuer and Prasenjit Mitra and Giles, {C. Lee}",
year = "2008",
month = "12",
day = "24",
language = "English (US)",
isbn = "9781577353683",
series = "Proceedings of the National Conference on Artificial Intelligence",
pages = "1169--1174",
booktitle = "AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference",

}

Kataria, S, Browuer, W, Mitra, P & Giles, CL 2008, Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents. in AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference. Proceedings of the National Conference on Artificial Intelligence, vol. 2, pp. 1169-1174, 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, Chicago, IL, United States, 7/13/08.

TY - GEN

T1 - Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents

AU - Kataria, Saurabh

AU - Browuer, William

AU - Mitra, Prasenjit

AU - Giles, C. Lee

PY - 2008/12/24

Y1 - 2008/12/24

N2 - Two dimensional plots (2-D) in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can be extracted automatically from these 2-D plots, thus eliminating a time consuming manual process. Our information extraction algorithm identifies the axes of the figures, extracts text blocks like axes-labels and legends and identifies data points in the figure. It also extracts the units appearing in the axes labels and segments the legends to identify the different lines in the legend, the different symbols and their associated text explanations. Our algorithm also performs the challenging task of separating out overlapping text and data points effectively. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.

UR - http://www.scopus.com/inward/record.url?scp=57749191459&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57749191459&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:57749191459

SN - 9781577353683

T3 - Proceedings of the National Conference on Artificial Intelligence

SP - 1169

EP - 1174

BT - AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference

ER -

Kataria S, Browuer W, Mitra P, Giles CL. Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents. In AAAI-08/IAAI-08 Proceedings - 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference. 2008. p. 1169-1174. (Proceedings of the National Conference on Artificial Intelligence).