Automatic summary generation for scientific data charts

Rabah A. Al-Zaidy, Sagnik Ray Choudhury, Clyde Lee Giles

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

3 Citations (Scopus)

Abstract

Scientific charts on the web, whether as images or embedded in digital documents, contain valuable information that is not fully available to information retrieval tools. The information used to describe these charts is typically extracted from the image metadata rather than from the information the graphic was originally designed to express. This study focuses on understanding digital charts found in scholarly documents and inferring useful textual information from their graphical components. We present an approach that automatically reads the chart data, specifically for bar charts, and provides the user with a textual summary of the chart. The proposed method follows a knowledge discovery approach that relies on a versatile graph representation of the chart. This representation is derived by analyzing the chart's original data values, from which useful features are extracted. These data features are in turn used to construct a semantic graph. To generate a summary, the semantic graph of the chart is mapped to appropriately crafted protoforms, which are constructs based on fuzzy logic. We verify the effectiveness of our framework by conducting experiments on bar charts extracted from over 1,000 PDF documents. Our preliminary results show that, under certain assumptions, 83% of the produced summaries provide plausible descriptions of the bar charts.
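The abstract only outlines how protoforms turn chart data into text. As a rough illustration, the sketch below scores Zadeh-style linguistic summaries of the form "Q of the bars are S" (for example, "most of the bars are high") from a bar chart's data values, using the truth degree mu_Q(mean of mu_S(y_i)). It assumes the bar values have already been read from the chart. The fuzzy set shapes, the quantifiers, the 0.7 threshold, and every name (`normalize`, `ramp`, `truth`, `summarize`) are hypothetical choices for illustration, not the authors' protoforms or their semantic-graph construction.

```python
from typing import Callable, Dict, List, Tuple


def normalize(values: List[float]) -> List[float]:
    """Min-max scale bar heights to [0, 1] (assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ramp(x: float, a: float, b: float) -> float:
    """Increasing linear membership: 0 at or below a, 1 at or above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical fuzzy summarizers over normalized bar heights.
SUMMARIZERS: Dict[str, Callable[[float], float]] = {
    "high": lambda x: ramp(x, 0.5, 0.8),
    "low": lambda x: 1.0 - ramp(x, 0.2, 0.5),
}

# Hypothetical relative fuzzy quantifiers (piecewise linear, Zadeh-style).
QUANTIFIERS: Dict[str, Callable[[float], float]] = {
    "most": lambda r: ramp(r, 0.3, 0.8),
    "few": lambda r: 1.0 - ramp(r, 0.1, 0.4),
}


def truth(values: List[float], quantifier: str, summarizer: str) -> float:
    """Truth degree of 'Q of the bars are S' = mu_Q(mean of mu_S(y_i))."""
    ys = normalize(values)
    ratio = sum(SUMMARIZERS[summarizer](y) for y in ys) / len(ys)
    return QUANTIFIERS[quantifier](ratio)


def summarize(bars: Dict[str, float], min_truth: float = 0.7) -> List[str]:
    """Return sentences whose truth degree passes the threshold, most certain first."""
    labels = list(bars)
    values = [bars[k] for k in labels]
    sentences: List[Tuple[float, str]] = []
    for q in QUANTIFIERS:
        for s in SUMMARIZERS:
            t = truth(values, q, s)
            if t >= min_truth:
                sentences.append((t, f"{q.capitalize()} of the bars are {s} (truth {t:.2f})."))
    # A crisp extremum feature, standing in for richer semantic-graph features.
    top = max(labels, key=bars.__getitem__)
    sentences.append((1.0, f"The highest bar is '{top}' ({bars[top]:g})."))
    return [text for _, text in sorted(sentences, reverse=True)]
```

For a chart with bars `{"A": 10, "B": 95, "C": 90, "D": 100}`, three of four bars are fully "high", so "most" evaluates to about 0.9 and the sketch reports that sentence alongside the highest bar. Scoring every quantifier and summarizer pair and filtering by truth degree is one simple way to select protoforms; a real system would also need trend, comparison, and axis-label features.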

Original language: English (US)
Title of host publication: WS-16-01
Subtitle of host publication: Artificial Intelligence Applied to Assistive Technologies and Smart Environments; WS-16-02: AI, Ethics, and Society; WS-16-03: Artificial Intelligence for Cyber Security; WS-16-04: Artificial Intelligence for Smart Grids and Smart Buildings; WS-16-05: Beyond NP; WS-16-06: Computer Poker and Imperfect Information Games; WS-16-07: Declarative Learning Based Programming; WS-16-08: Expanding the Boundaries of Health Informatics Using AI; WS-16-09: Incentives and Trust in Electronic Communities; WS-16-10: Knowledge Extraction from Text; WS-16-11: Multiagent Interaction without Prior Coordination; WS-16-12: Planning for Hybrid Systems; WS-16-13: Scholarly Big Data: AI Perspectives, Challenges, and Ideas; WS-16-14: Symbiotic Cognitive Systems; WS-16-15: World Wide Web and Population Health Intelligence
Publisher: AI Access Foundation
Pages: 658-663
Number of pages: 6
Volume: WS-16-01 - WS-16-15
ISBN (Electronic): 9781577357599
State: Published, Jan 1 2016
Event: 30th AAAI Conference on Artificial Intelligence, AAAI 2016, Phoenix, United States
Duration: Feb 12 2016 to Feb 17 2016

Fingerprint

Semantics
Metadata
Information retrieval
Fuzzy logic
Data mining
Experiments

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Al-Zaidy, R. A., Choudhury, S. R., & Giles, C. L. (2016). Automatic summary generation for scientific data charts. In WS-16-01: Artificial Intelligence Applied to Assistive Technologies and Smart Environments; WS-16-02: AI, Ethics, and Society; WS-16-03: Artificial Intelligence for Cyber Security; WS-16-04: Artificial Intelligence for Smart Grids and Smart Buildings; WS-16-05: Beyond NP; WS-16-06: Computer Poker and Imperfect Information Games; WS-16-07: Declarative Learning Based Programming; WS-16-08: Expanding the Boundaries of Health Informatics Using AI; WS-16-09: Incentives and Trust in Electronic Communities; WS-16-10: Knowledge Extraction from Text; WS-16-11: Multiagent Interaction without Prior Coordination; WS-16-12: Planning for Hybrid Systems; WS-16-13: Scholarly Big Data: AI Perspectives, Challenges, and Ideas; WS-16-14: Symbiotic Cognitive Systems; WS-16-15: World Wide Web and Population Health Intelligence (Vol. WS-16-01 - WS-16-15, pp. 658-663). AI Access Foundation.
@inproceedings{a137243fb10e471f8ac3e81039a1289f,
title = "Automatic summary generation for scientific data charts",
author = "Al-Zaidy, {Rabah A.} and Choudhury, {Sagnik Ray} and Giles, {Clyde Lee}",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
volume = "WS-16-01 - WS-16-15",
pages = "658--663",
booktitle = "WS-16-01",
publisher = "AI Access Foundation",
address = "United States",

}

TY - GEN

T1 - Automatic summary generation for scientific data charts

AU - Al-Zaidy, Rabah A.

AU - Choudhury, Sagnik Ray

AU - Giles, Clyde Lee

PY - 2016/1/1

Y1 - 2016/1/1

UR - http://www.scopus.com/inward/record.url?scp=84974575964&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84974575964&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84974575964

VL - WS-16-01 - WS-16-15

SP - 658

EP - 663

BT - WS-16-01

PB - AI Access Foundation

ER -