TY - JOUR
T1 - Identifying valuable information from Twitter during natural disasters
AU - Truong, Brandon
AU - Caragea, Cornelia
AU - Squicciarini, Anna
AU - Tapia, Andrea H.
PY - 2014
Y1 - 2014
N2 - Social media is a vital source of information during any major event, especially natural disasters. However, the exponential increase in the volume of social media data brings with it an increase in conversational data that does not provide valuable information, especially in the context of disaster events, thus diminishing people's ability to find the information that they need in order to organize relief efforts, find help, and potentially save lives. This project focuses on the development of a Bayesian approach to the classification of tweets (posts on Twitter) during Hurricane Sandy in order to distinguish "informational" from "conversational" tweets. We designed an effective set of features and used them as input to Naïve Bayes classifiers. In comparison to a "bag of words" approach, the new feature set provides similar results in the classification of tweets. However, the designed feature set contains only 9 features compared with more than 3000 features for "bag of words." When the feature set is combined with "bag of words", accuracy reaches 85.2914%. If integrated into disaster-related systems, our approach can serve as a boon to any person or organization seeking to extract useful information in the midst of a natural disaster.
AB - Social media is a vital source of information during any major event, especially natural disasters. However, the exponential increase in the volume of social media data brings with it an increase in conversational data that does not provide valuable information, especially in the context of disaster events, thus diminishing people's ability to find the information that they need in order to organize relief efforts, find help, and potentially save lives. This project focuses on the development of a Bayesian approach to the classification of tweets (posts on Twitter) during Hurricane Sandy in order to distinguish "informational" from "conversational" tweets. We designed an effective set of features and used them as input to Naïve Bayes classifiers. In comparison to a "bag of words" approach, the new feature set provides similar results in the classification of tweets. However, the designed feature set contains only 9 features compared with more than 3000 features for "bag of words." When the feature set is combined with "bag of words", accuracy reaches 85.2914%. If integrated into disaster-related systems, our approach can serve as a boon to any person or organization seeking to extract useful information in the midst of a natural disaster.
UR - http://www.scopus.com/inward/record.url?scp=84961634327&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961634327&partnerID=8YFLogxK
U2 - 10.1002/meet.2014.14505101162
DO - 10.1002/meet.2014.14505101162
M3 - Article
AN - SCOPUS:84961634327
VL - 51
JO - Proceedings of the ASIST Annual Meeting
JF - Proceedings of the ASIST Annual Meeting
SN - 1550-8390
IS - 1
ER -