Filtering offensive language in online communities using grammatical relations

Zhi Xu, Sencun Zhu

Research output: Contribution to conference › Paper

19 Citations (Scopus)

Abstract

Offensive language has become a major threat to the health of both online communities and their users. For an online community, the spread of offensive language undermines its reputation, drives users away, and even directly affects its growth. For users, viewing offensive language negatively affects their mental health, especially for children and youth. When offensive language is detected in a user message, the question arises of how it should be removed, i.e., the offensive language filtering problem. Manual filtering is known to produce the best filtering results, but it is costly in time and labor and thus cannot be widely applied. In this paper, we analyze the offensive language in text messages posted in online communities and propose a new automatic sentence-level filtering approach that semantically removes the offensive language by utilizing the grammatical relations among words. Compared with existing automatic filtering approaches, the proposed approach produces results much closer to those of manual filtering. To evaluate our work, we created a dataset by manually filtering over 11,000 text comments from the YouTube website. Experiments on this dataset show over 90% agreement in filtered results between the proposed approach and manual filtering. Moreover, we show that the overhead of applying the proposed approach to user comment filtering is reasonable, making it practical to adopt in real-life applications.
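As a rough illustration of what filtering by grammatical relations could look like, the sketch below drops an offensive word together with the words that grammatically depend on it. It is not the authors' algorithm: the `OFFENSIVE` lexicon, the `Token` type, the hand-assigned head indices, and the `subtree` and `filter_sentence` helpers are all hypothetical, and a real system would presumably obtain the relations from a dependency parser rather than by hand.

```python
from dataclasses import dataclass

# Hypothetical lexicon; a real filter would use a curated word list.
OFFENSIVE = {"stupid"}


@dataclass
class Token:
    text: str
    head: int  # index of the governing word; -1 marks the sentence root


def subtree(tokens, root):
    """Return the indices of `root` and every token that depends on it."""
    keep = {root}
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(tokens):
            if i not in keep and tok.head in keep:
                keep.add(i)
                changed = True
    return keep


def filter_sentence(tokens):
    """Drop each offensive word together with its grammatical dependents."""
    removed = set()
    for i, tok in enumerate(tokens):
        if tok.text.lower() in OFFENSIVE:
            removed |= subtree(tokens, i)
    return " ".join(t.text for i, t in enumerate(tokens) if i not in removed)


# "I like this really stupid video": heads assigned by hand for illustration.
sentence = [
    Token("I", 1),       # subject of "like"
    Token("like", -1),   # root of the sentence
    Token("this", 5),    # determiner of "video"
    Token("really", 4),  # adverb modifying "stupid"
    Token("stupid", 5),  # adjective modifying "video"
    Token("video", 1),   # object of "like"
]
print(filter_sentence(sentence))
```

In this toy case the adverb "really" is removed along with the word it modifies, leaving "I like this video", whereas deleting only the keyword would leave a dangling "really" behind.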

Original language: English (US)
State: Published - Jan 1 2010
Event: 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010 - Redmond, WA, United States
Duration: Jul 13 2010 → Jul 14 2010

Other

Other: 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010
Country: United States
City: Redmond, WA
Period: 7/13/10 → 7/14/10

Fingerprint

Health
Websites
Personnel
Experiments

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Xu, Z., & Zhu, S. (2010). Filtering offensive language in online communities using grammatical relations. Paper presented at 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010, Redmond, WA, United States.
Xu, Zhi ; Zhu, Sencun. / Filtering offensive language in online communities using grammatical relations. Paper presented at 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010, Redmond, WA, United States.
@conference{9daf02ab0e5645a297a20fc6718204cf,
title = "Filtering offensive language in online communities using grammatical relations",
author = "Zhi Xu and Sencun Zhu",
year = "2010",
month = "1",
day = "1",
language = "English (US)",
note = "7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010 ; Conference date: 13-07-2010 Through 14-07-2010",

}

Xu, Z & Zhu, S 2010, 'Filtering offensive language in online communities using grammatical relations', Paper presented at 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010, Redmond, WA, United States, 7/13/10 - 7/14/10.

TY - CONF

T1 - Filtering offensive language in online communities using grammatical relations

AU - Xu, Zhi

AU - Zhu, Sencun

PY - 2010/1/1

Y1 - 2010/1/1

UR - http://www.scopus.com/inward/record.url?scp=84904790471&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904790471&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84904790471

ER -

Xu Z, Zhu S. Filtering offensive language in online communities using grammatical relations. 2010. Paper presented at 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010, Redmond, WA, United States.