Filtering offensive language in online communities using grammatical relations

Zhi Xu, Sencun Zhu

Research output: Contribution to conference › Paper

24 Scopus citations

Abstract

Offensive language has become a major threat to the health of both online communities and their users. For an online community, the spread of offensive language undermines its reputation, drives users away, and can even directly affect its growth. For users, viewing offensive language harms their mental health, especially in the case of children and youth. When offensive language is detected in a user message, the question arises of how it should be removed, i.e., the offensive language filtering problem. Manual filtering is known to produce the best filtering results, but it is costly in time and labor and thus cannot be widely applied. In this paper, we analyze the offensive language in text messages posted in online communities and propose a new automatic sentence-level filtering approach that semantically removes offensive language by utilizing the grammatical relations among words. Compared with existing automatic filtering approaches, the proposed approach produces results much closer to those of manual filtering. To demonstrate our work, we created a dataset by manually filtering over 11,000 text comments from the YouTube website. Experiments on this dataset show over 90% agreement in filtered results between the proposed approach and manual filtering. Moreover, we show that the overhead of applying the proposed approach to filtering user comments is reasonable, making it practical to adopt in real-life applications.

Original language: English (US)
State: Published - Jan 1 2010
Event: 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010 - Redmond, WA, United States
Duration: Jul 13 2010 → Jul 14 2010

All Science Journal Classification (ASJC) codes

  • Software
