NegAIT

A new parser for medical text simplification using morphological, sentential and double negation

Partha Mukherjee, Gondy Leroy, David Kauchak, Srinidhi Rajanarayanan, Damian Y. Romero Diaz, Nicole P. Yuan, T. Gail Pritchard, Sonia Colina

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation, and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human-annotated gold standard containing 500 Wikipedia sentences and achieved 95%, 89% and 67% precision with 100%, 80%, and 67% recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4 K sentences), Cochrane reviews (91 K sentences), PubMed abstracts (20 K sentences), clinical trial texts (48 K sentences), and English and Simple English Wikipedia articles for different medical topics (60 K and 6 K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia and clinical trials) contained significantly (p < 0.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Naïve Bayes classifier achieved the highest accuracy at 77% (9% higher than the majority baseline).

Original language: English (US)
Pages (from-to): 55-62
Number of pages: 8
Journal: Journal of Biomedical Informatics
Volume: 69
DOIs: https://doi.org/10.1016/j.jbi.2017.03.014
State: Published - May 1 2017

Fingerprint

Classifiers
PubMed
Blogging
Clinical Trials
Decision Trees
Blogs
Linear regression
Logistics
Linear Models
Logistic Models
Statistics
Predictive analytics

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Health Informatics

Cite this

Mukherjee, Partha ; Leroy, Gondy ; Kauchak, David ; Rajanarayanan, Srinidhi ; Romero Diaz, Damian Y. ; Yuan, Nicole P. ; Pritchard, T. Gail ; Colina, Sonia. / NegAIT : A new parser for medical text simplification using morphological, sentential and double negation. In: Journal of Biomedical Informatics. 2017 ; Vol. 69. pp. 55-62.
@article{444f0164f71a4356bc77b69d8acf6959,
title = "NegAIT: A new parser for medical text simplification using morphological, sentential and double negation",
abstract = "Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation, and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human-annotated gold standard containing 500 Wikipedia sentences and achieved 95{\%}, 89{\%} and 67{\%} precision with 100{\%}, 80{\%}, and 67{\%} recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4 K sentences), Cochrane reviews (91 K sentences), PubMed abstracts (20 K sentences), clinical trial texts (48 K sentences), and English and Simple English Wikipedia articles for different medical topics (60 K and 6 K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia and clinical trials) contained significantly (p < 0.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Na{\"i}ve Bayes, SVM, decision tree, logistic regression and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Na{\"i}ve Bayes classifier achieved the highest accuracy at 77{\%} (9{\%} higher than the majority baseline).",
author = "Partha Mukherjee and Gondy Leroy and David Kauchak and Srinidhi Rajanarayanan and {Romero Diaz}, {Damian Y.} and Yuan, {Nicole P.} and Pritchard, {T. Gail} and Sonia Colina",
year = "2017",
month = "5",
day = "1",
doi = "10.1016/j.jbi.2017.03.014",
language = "English (US)",
volume = "69",
pages = "55--62",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",

}

Mukherjee, P, Leroy, G, Kauchak, D, Rajanarayanan, S, Romero Diaz, DY, Yuan, NP, Pritchard, TG & Colina, S 2017, 'NegAIT: A new parser for medical text simplification using morphological, sentential and double negation', Journal of Biomedical Informatics, vol. 69, pp. 55-62. https://doi.org/10.1016/j.jbi.2017.03.014

NegAIT : A new parser for medical text simplification using morphological, sentential and double negation. / Mukherjee, Partha; Leroy, Gondy; Kauchak, David; Rajanarayanan, Srinidhi; Romero Diaz, Damian Y.; Yuan, Nicole P.; Pritchard, T. Gail; Colina, Sonia.

In: Journal of Biomedical Informatics, Vol. 69, 01.05.2017, p. 55-62.

TY - JOUR

T1 - NegAIT

T2 - A new parser for medical text simplification using morphological, sentential and double negation

AU - Mukherjee, Partha

AU - Leroy, Gondy

AU - Kauchak, David

AU - Rajanarayanan, Srinidhi

AU - Romero Diaz, Damian Y.

AU - Yuan, Nicole P.

AU - Pritchard, T. Gail

AU - Colina, Sonia

PY - 2017/5/1

Y1 - 2017/5/1

AB - Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation, and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human-annotated gold standard containing 500 Wikipedia sentences and achieved 95%, 89% and 67% precision with 100%, 80%, and 67% recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4 K sentences), Cochrane reviews (91 K sentences), PubMed abstracts (20 K sentences), clinical trial texts (48 K sentences), and English and Simple English Wikipedia articles for different medical topics (60 K and 6 K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia and clinical trials) contained significantly (p < 0.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Naïve Bayes classifier achieved the highest accuracy at 77% (9% higher than the majority baseline).

UR - http://www.scopus.com/inward/record.url?scp=85016487612&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016487612&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2017.03.014

DO - 10.1016/j.jbi.2017.03.014

M3 - Article

VL - 69

SP - 55

EP - 62

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

ER -