Identifying valuable information from Twitter during natural disasters

Brandon Truong, Cornelia Caragea, Anna Squicciarini, Andrea H. Tapia

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Social media is a vital source of information during any major event, especially natural disasters. However, as the volume of social media data grows exponentially, so does the volume of conversational data that provides no valuable information, especially in the context of disaster events. This diminishes people's ability to find the information they need to organize relief efforts, find help, and potentially save lives. This project focuses on the development of a Bayesian approach to the classification of tweets (posts on Twitter) during Hurricane Sandy in order to distinguish "informational" from "conversational" tweets. We designed an effective set of features and used them as input to Naïve Bayes classifiers. In comparison to a "bag of words" approach, the new feature set provides similar results in the classification of tweets. However, the designed feature set contains only 9 features, compared with more than 3000 features for "bag of words." When the feature set is combined with "bag of words", accuracy reaches 85.2914%. If integrated into disaster-related systems, our approach can serve as a boon to any person or organization seeking to extract useful information in the midst of a natural disaster.

Original language: English (US)
Journal: Proceedings of the ASIST Annual Meeting
Volume: 51
Issue number: 1
DOI: 10.1002/meet.2014.14505101162
State: Published - Jan 1 2014

Fingerprint

twitter
Disasters
natural disaster
social media
disaster
major event
source of information
Hurricanes
organization
Classifiers
human being
event
ability

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Library and Information Sciences

Cite this

@article{65819e39869247b0a459fa9135dd4c92,
title = "Identifying valuable information from Twitter during natural disasters",
abstract = "Social media is a vital source of information during any major event, especially natural disasters. However, with the exponential increase in volume of social media data, so comes the increase in conversational data that does not provide valuable information, especially in the context of disaster events, thus, diminishing peoples' ability to find the information that they need in order to organize relief efforts, find help, and potentially save lives. This project focuses on the development of a Bayesian approach to the classification of tweets (posts on Twitter) during Hurricane Sandy in order to distinguish {"}informational{"} from {"}conversational{"} tweets. We designed an effective set of features and used them as input to Na{\"i}ve Bayes classifiers. In comparison to a {"}bag of words{"} approach, the new feature set provides similar results in the classification of tweets. However, the designed feature set contains only 9 features compared with more than 3000 features for {"}bag of words.{"} When the feature set is combined with {"}bag of words{"}, accuracy achieves 85.2914{\%}. If integrated into disaster-related systems, our approach can serve as a boon to any person or organization seeking to extract useful information in the midst of a natural disaster.",
author = "Truong, Brandon and Caragea, Cornelia and Squicciarini, Anna and Tapia, Andrea H.",
year = "2014",
month = "1",
day = "1",
doi = "10.1002/meet.2014.14505101162",
language = "English (US)",
volume = "51",
journal = "Proceedings of the ASIST Annual Meeting",
issn = "1550-8390",
publisher = "Learned Information",
number = "1",
}

Identifying valuable information from Twitter during natural disasters. / Truong, Brandon; Caragea, Cornelia; Squicciarini, Anna; Tapia, Andrea H.

In: Proceedings of the ASIST Annual Meeting, Vol. 51, No. 1, 01.01.2014.

TY  - JOUR
T1  - Identifying valuable information from Twitter during natural disasters
AU  - Truong, Brandon
AU  - Caragea, Cornelia
AU  - Squicciarini, Anna
AU  - Tapia, Andrea H.
PY  - 2014/1/1
Y1  - 2014/1/1
AB  - Social media is a vital source of information during any major event, especially natural disasters. However, with the exponential increase in volume of social media data, so comes the increase in conversational data that does not provide valuable information, especially in the context of disaster events, thus, diminishing peoples' ability to find the information that they need in order to organize relief efforts, find help, and potentially save lives. This project focuses on the development of a Bayesian approach to the classification of tweets (posts on Twitter) during Hurricane Sandy in order to distinguish "informational" from "conversational" tweets. We designed an effective set of features and used them as input to Naïve Bayes classifiers. In comparison to a "bag of words" approach, the new feature set provides similar results in the classification of tweets. However, the designed feature set contains only 9 features compared with more than 3000 features for "bag of words." When the feature set is combined with "bag of words", accuracy achieves 85.2914%. If integrated into disaster-related systems, our approach can serve as a boon to any person or organization seeking to extract useful information in the midst of a natural disaster.
UR  - http://www.scopus.com/inward/record.url?scp=84961634327&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84961634327&partnerID=8YFLogxK
U2  - 10.1002/meet.2014.14505101162
DO  - 10.1002/meet.2014.14505101162
M3  - Article
AN  - SCOPUS:84961634327
VL  - 51
JO  - Proceedings of the ASIST Annual Meeting
JF  - Proceedings of the ASIST Annual Meeting
SN  - 1550-8390
IS  - 1
ER  -