Abstract
Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted on YouTube and Facebook videos among a dataset of 137,098 comments from an online news media organization. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both types and targets of hateful comments, and 2) experimenting with machine learning, including Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Linear SVM, to generate a multiclass, multilabel classification model that automatically detects and categorizes hateful comments in the context of online news media. We find that the best-performing model is Linear SVM, with an average F1 score of 0.79 using TF-IDF features. We validate the model by testing its predictive ability and, relatedly, provide insights into distinct types of hate speech taking place on social media.
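The pipeline named in the abstract (TF-IDF features feeding a Linear SVM that assigns several labels per comment) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the multilabel output uses a one-vs-rest (binary relevance) decomposition, the SVM is trained with Pegasos-style stochastic subgradient descent rather than a library solver, the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 that scikit-learn applies by default, and "average F1" is assumed to mean the unweighted (macro) mean of per-label F1. The function names, the toy comments, and the labels `insult` and `media` are hypothetical and are not taken from the paper's taxonomy or dataset.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split only.
    return text.lower().split()

def fit_tfidf(docs):
    """Build a vocabulary and smoothed IDF weights: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """Sparse, L2-normalised TF-IDF vector {feature_index: weight}; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in vocab)
    vec = {vocab[t]: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {i: w / norm for i, w in vec.items()}

def dot(w, x):
    return sum(w.get(i, 0.0) * v for i, v in x.items())

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no bias term) trained with Pegasos-style
    stochastic subgradient descent; the optional projection step is omitted.
    y holds +1 / -1 labels."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[j] * dot(w, X[j]) < 1  # margin check uses the current weights
            for k in w:  # shrink step from the L2 regulariser
                w[k] *= 1.0 - eta * lam
            if violated:
                for i, v in X[j].items():
                    w[i] = w.get(i, 0.0) + eta * y[j] * v
    return w

def train_one_vs_rest(X, label_sets, labels, **kwargs):
    """Binary relevance: one independent linear SVM per label, giving multilabel output."""
    return {
        lab: train_linear_svm(X, [1 if lab in s else -1 for s in label_sets], **kwargs)
        for lab in labels
    }

def predict(models, x):
    return {lab for lab, w in models.items() if dot(w, x) > 0}

def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of per-label F1 = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    scores = []
    for lab in labels:
        pairs = [(lab in t, lab in p) for t, p in zip(true_sets, pred_sets)]
        tp = sum(1 for t, p in pairs if t and p)
        fp = sum(1 for t, p in pairs if not t and p)
        fn = sum(1 for t, p in pairs if t and not p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy comments and labels (not from the paper's dataset).
LABELS = ["insult", "media"]
DOCS = [
    "you are an idiot",
    "idiot reporters spreading lies",
    "the media lies again",
    "great video thanks",
    "nice report thanks",
    "fake news media",
]
GOLD = [{"insult"}, {"insult", "media"}, {"media"}, set(), set(), {"media"}]

vocab, idf = fit_tfidf(DOCS)
X = [transform(d, vocab, idf) for d in DOCS]
models = train_one_vs_rest(X, GOLD, LABELS)
preds = [predict(models, x) for x in X]
print(preds)
print("macro-F1 on training data:", macro_f1(GOLD, preds, LABELS))
```

Run as a script, the sketch prints the predicted label set for each toy comment and a training-set macro-F1. A score on the training data is optimistic; an evaluation comparable to the one the abstract reports would need held-out data or cross-validation. In practice, scikit-learn's `TfidfVectorizer` combined with `OneVsRestClassifier(LinearSVC())` is the usual library route to the same kind of model.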
Original language | English (US) |
---|---|
Title of host publication | 12th International AAAI Conference on Web and Social Media, ICWSM 2018 |
Publisher | AAAI press |
Pages | 330-339 |
Number of pages | 10 |
ISBN (Electronic) | 9781577357988 |
State | Published - Jan 1 2018 |
Event | 12th International AAAI Conference on Web and Social Media, ICWSM 2018 - Palo Alto, United States; Duration: Jun 25 2018 → Jun 28 2018 |
Publication series
Name | 12th International AAAI Conference on Web and Social Media, ICWSM 2018 |
---|---|
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
Cite this
Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media. / Salminen, Joni; Almerekhi, Hind; Milenković, Milica; Jung, Soon Gyo; An, Jisun; Kwak, Haewoon; Jansen, Bernard James.
12th International AAAI Conference on Web and Social Media, ICWSM 2018. AAAI press, 2018. p. 330-339 (12th International AAAI Conference on Web and Social Media, ICWSM 2018).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Anatomy of online hate
T2 - Developing a taxonomy and machine learning models for identifying and classifying hate in online news media
AU - Salminen, Joni
AU - Almerekhi, Hind
AU - Milenković, Milica
AU - Jung, Soon Gyo
AU - An, Jisun
AU - Kwak, Haewoon
AU - Jansen, Bernard James
PY - 2018/1/1
Y1 - 2018/1/1
UR - http://www.scopus.com/inward/record.url?scp=85050643679&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050643679&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85050643679
T3 - 12th International AAAI Conference on Web and Social Media, ICWSM 2018
SP - 330
EP - 339
BT - 12th International AAAI Conference on Web and Social Media, ICWSM 2018
PB - AAAI press
ER -