Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media

Joni Salminen, Hind Almerekhi, Milica Milenković, Soon Gyo Jung, Jisun An, Haewoon Kwak, Bernard James Jansen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media outlet. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both types and targets of hateful comments, and 2) experimenting with machine learning, including Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Linear SVM, to generate a multiclass, multilabel classification model that automatically detects and categorizes hateful comments in the context of online news media. We find that the best performing model is Linear SVM, with an average F1 score of 0.79 using TF-IDF features. We validate the model by testing its predictive ability, and, relatedly, provide insights on distinct types of hate speech taking place on social media.
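To make the two quantities named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and of an F1 score averaged over labels in a multilabel setting. It is not the authors' pipeline: the abstract does not state the tokenizer, the TF-IDF variant, or how the F1 score was averaged, so whitespace tokenization, a smoothed IDF, and macro-averaging are all assumptions here, and the label names are hypothetical placeholders rather than categories from the taxonomy.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumptions: lowercase whitespace tokenization, relative term
    frequency, and a smoothed IDF of ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def f1_per_label(y_true, y_pred, labels):
    """Per-label F1; y_true and y_pred are lists of label sets."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores (macro-averaging, an assumption)."""
    scores = f1_per_label(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


if __name__ == "__main__":
    # Neutral toy comments and placeholder labels, for illustration only.
    comments = ["alpha beta beta", "alpha gamma"]
    print(tfidf(comments))
    truth = [{"label_a"}, {"label_a", "label_b"}, set()]
    predicted = [{"label_a"}, {"label_b"}, {"label_b"}]
    print(macro_f1(truth, predicted, ["label_a", "label_b"]))
```

In a multilabel setting each comment can carry several labels at once, which is why the score is computed per label and then averaged. A classifier such as a linear SVM would consume the TF-IDF vectors as features; that training step is omitted from this sketch.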

Original language: English (US)
Title of host publication: 12th International AAAI Conference on Web and Social Media, ICWSM 2018
Publisher: AAAI press
Pages: 330-339
Number of pages: 10
ISBN (Electronic): 9781577357988
State: Published - Jan 1 2018
Event: 12th International AAAI Conference on Web and Social Media, ICWSM 2018 - Palo Alto, United States
Duration: Jun 25 2018 to Jun 28 2018

Publication series

Name: 12th International AAAI Conference on Web and Social Media, ICWSM 2018

Other

Other: 12th International AAAI Conference on Web and Social Media, ICWSM 2018
Country: United States
City: Palo Alto
Period: 6/25/18 to 6/28/18

Fingerprint

Taxonomies
Learning systems
Adaptive boosting
Decision trees
Logistics
Labels
Health
Testing

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Cite this

Salminen, J., Almerekhi, H., Milenković, M., Jung, S. G., An, J., Kwak, H., & Jansen, B. J. (2018). Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media. In 12th International AAAI Conference on Web and Social Media, ICWSM 2018 (pp. 330-339). (12th International AAAI Conference on Web and Social Media, ICWSM 2018). AAAI press.
@inproceedings{e361892af6bf424ab010d5c9e39bfeb5,
title = "Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media",
author = "Joni Salminen and Hind Almerekhi and Milica Milenković and Jung, {Soon Gyo} and Jisun An and Haewoon Kwak and Jansen, {Bernard James}",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
series = "12th International AAAI Conference on Web and Social Media, ICWSM 2018",
publisher = "AAAI press",
pages = "330--339",
booktitle = "12th International AAAI Conference on Web and Social Media, ICWSM 2018",

}

TY - GEN

T1 - Anatomy of online hate

T2 - Developing a taxonomy and machine learning models for identifying and classifying hate in online news media

AU - Salminen, Joni

AU - Almerekhi, Hind

AU - Milenković, Milica

AU - Jung, Soon Gyo

AU - An, Jisun

AU - Kwak, Haewoon

AU - Jansen, Bernard James

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85050643679&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050643679&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85050643679

T3 - 12th International AAAI Conference on Web and Social Media, ICWSM 2018

SP - 330

EP - 339

BT - 12th International AAAI Conference on Web and Social Media, ICWSM 2018

PB - AAAI press

ER -
