Automatic theory generation from analyst text files using coherence networks

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value sub networks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
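
The three phases described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the triples are hand-written stand-ins for phase 1 output (no natural language processing is performed), the constraint and evidence weights are invented, the scoring follows a generic constraint-satisfaction view of coherence, and the genetic algorithm (tournament selection, single-point crossover, bit-flip mutation, elitism) is a textbook variant rather than the author's implementation. The names `TRIPLES`, `CONSTRAINTS`, `EVIDENCE`, `coherence`, `evolve`, `exhaustive_best`, and `accepted_graph` are all hypothetical.

```python
import random
from itertools import product

# Phase 1 stand-in: hypothetical subject-predicate-object triples.
TRIPLES = [
    ("storm", "causes", "flooding"),
    ("flooding", "closes", "bridge"),
    ("bridge", "is", "open"),
    ("storm", "delays", "convoy"),
]

# Assumed pairwise constraints between triples (by index). A positive weight
# means the pair coheres (accept or reject together); a negative weight means
# the pair conflicts (accept one, reject the other).
CONSTRAINTS = {(0, 1): 1.0, (1, 2): -1.0, (0, 3): 0.5, (2, 3): -0.5}

# Assumed bonus for accepting triples treated as observed evidence.
EVIDENCE = {0: 0.5}


def coherence(assignment, constraints, evidence):
    """Sum the weights of satisfied constraints plus accepted-evidence bonuses."""
    total = 0.0
    for (i, j), weight in constraints.items():
        same = assignment[i] == assignment[j]
        if (weight > 0) == same:
            total += abs(weight)
    for i, bonus in evidence.items():
        if assignment[i]:
            total += bonus
    return total


def evolve(n_nodes, constraints, evidence, pop_size=30, generations=50,
           mutation_rate=0.1, seed=0):
    """Phase 2 sketch: search accept/reject assignments with a simple GA."""
    rng = random.Random(seed)

    def fitness(assignment):
        return coherence(assignment, constraints, evidence)

    population = [tuple(rng.random() < 0.5 for _ in range(n_nodes))
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = [max(population, key=fitness)]  # elitism: keep the best
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n_nodes)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = tuple((not g) if rng.random() < mutation_rate else g
                          for g in child)
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def exhaustive_best(n_nodes, constraints, evidence):
    """Brute-force optimum score, feasible only for very small networks."""
    return max(coherence(a, constraints, evidence)
               for a in product([True, False], repeat=n_nodes))


def accepted_graph(triples, assignment):
    """Phase 3 sketch: adjacency map built from the accepted triples only."""
    graph = {}
    for (subj, pred, obj), keep in zip(triples, assignment):
        if keep:
            graph.setdefault(subj, []).append((pred, obj))
    return graph


if __name__ == "__main__":
    best, score = evolve(len(TRIPLES), CONSTRAINTS, EVIDENCE)
    print(best, score)
    print(accepted_graph(TRIPLES, best))
```

In this toy network the conflicting triple ("bridge", "is", "open") is the one a coherence-maximizing assignment rejects, so it is left out of the resulting graph. The sketch keeps a single accepted set, whereas the abstract speaks of selecting the highest-value sub networks, so a fuller version would presumably rank several candidate groupings.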

Original language: English (US)
Title of host publication: Next-Generation Analyst II
Publisher: SPIE
Volume: 9122
ISBN (Print): 9781628410594
DOIs: https://doi.org/10.1117/12.2049528
State: Published - Jan 1 2014
Event: Next-Generation Analyst II - Baltimore, MD, United States
Duration: May 6 2014 → May 6 2014

Other

Other: Next-Generation Analyst II
Country: United States
City: Baltimore, MD
Period: 5/6/14 → 5/6/14

Fingerprint

Electric network analysis
files
natural language processing
Tuning
Genetic algorithms
Semantics
Display devices
network analysis
semantics
Processing
genetic algorithms
Semantic Network
counters
Wikipedia
tuning
Network Analysis
Predicate
Natural Language
optimization
Display

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

Cite this

@inproceedings{fb266a323bee48ebb65a9b7c6d642c2c,
title = "Automatic theory generation from analyst text files using coherence networks",
abstract = "This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value sub networks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.",
author = "Steven Shaffer",
year = "2014",
month = "1",
day = "1",
doi = "10.1117/12.2049528",
language = "English (US)",
isbn = "9781628410594",
volume = "9122",
booktitle = "Next-Generation Analyst II",
publisher = "SPIE",
address = "United States",
}

Shaffer, S 2014, Automatic theory generation from analyst text files using coherence networks. in Next-Generation Analyst II. vol. 9122, 912202, SPIE, Next-Generation Analyst II, Baltimore, MD, United States, 5/6/14. https://doi.org/10.1117/12.2049528

Automatic theory generation from analyst text files using coherence networks. / Shaffer, Steven.

Next-Generation Analyst II. Vol. 9122 SPIE, 2014. 912202.

TY  - GEN
T1  - Automatic theory generation from analyst text files using coherence networks
AU  - Shaffer, Steven
PY  - 2014/1/1
Y1  - 2014/1/1
N2  - This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value sub networks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
AB  - This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value sub networks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
UR  - http://www.scopus.com/inward/record.url?scp=84906337260&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84906337260&partnerID=8YFLogxK
U2  - 10.1117/12.2049528
DO  - 10.1117/12.2049528
M3  - Conference contribution
SN  - 9781628410594
VL  - 9122
BT  - Next-Generation Analyst II
PB  - SPIE
ER  -