Offensive language has become a serious threat to the health of both online communities and their users. For an online community, the spread of offensive language undermines its reputation, drives users away, and can directly impede its growth. For users, exposure to offensive language harms mental health, especially among children and adolescents. When offensive language is detected in a user message, the question arises of how it should be removed, i.e., the offensive language filtering problem. Manual filtering is known to produce the best filtering results, but it is costly in time and labor and therefore cannot be applied at scale. In this paper, we analyze the offensive language in text messages posted in online communities and propose a new automatic sentence-level filtering approach that semantically removes offensive language by exploiting the grammatical relations among words. Compared with existing automatic filtering approaches, the proposed approach yields results much closer to those of manual filtering. To evaluate our work, we created a dataset by manually filtering over 11,000 text comments from the YouTube website. Experiments on this dataset show over 90% agreement between the filtered results of the proposed approach and manual filtering. Moreover, we show that the overhead of applying the proposed approach to user-comment filtering is reasonable, making it practical for real-life applications.
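The abstract only outlines the approach, but its core idea — removing an offensive word together with the words that grammatically depend on it, so the remaining sentence stays coherent — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the hand-coded dependency heads, the token list, and the offensive-word set below are all hypothetical.

```python
# Illustrative sketch of dependency-based filtering: remove each offensive
# token together with its syntactic subtree. In practice the head indices
# would come from a dependency parser; here they are hard-coded assumptions.

def collect_subtree(head_of, root):
    """Return the indices of `root` and all tokens that transitively depend on it."""
    subtree = {root}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(head_of):
            if h in subtree and i not in subtree:
                subtree.add(i)
                changed = True
    return subtree

def filter_sentence(tokens, head_of, offensive):
    """Drop every offensive token along with its dependent subtree."""
    to_remove = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in offensive:
            to_remove |= collect_subtree(head_of, i)
    return " ".join(t for i, t in enumerate(tokens) if i not in to_remove)

# Toy example: in "you are a damn fool", "a" and "damn" depend on "fool".
tokens = ["you", "are", "a", "damn", "fool"]
head_of = [1, -1, 4, 4, 1]   # index of each token's syntactic head (-1 = root)
print(filter_sentence(tokens, head_of, {"damn"}))   # removes only "damn"
print(filter_sentence(tokens, head_of, {"fool"}))   # removes "a damn fool"
```

Removing the whole subtree, rather than just the flagged word, is what keeps the filtered sentence grammatical: deleting "fool" alone would strand its modifiers "a damn".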
|Original language||English (US)|
|State||Published - Jan 1 2010|
|Event||7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, CEAS 2010 - Redmond, WA, United States|
Duration: Jul 13 2010 → Jul 14 2010