Classifying objectionable websites based on image content

James Wang, Jia Li, Gio Wiederhold, Oscar Firschein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper describes IBCOW (Image-based Classification of Objectionable Websites), a system that classifies a website as objectionable or benign based on its image content. The system combines WIPE™ (Wavelet Image Pornography Elimination) with statistical analysis to provide robust classification of objectionable World Wide Web sites. Within the WIPE module, semantically meaningful feature vectors are matched so that a given on-line image can be compared efficiently and effectively against training-set images marked as “objectionable” or “benign”. If more than a certain number of the images sampled from a site are found to be objectionable, the site is considered objectionable. The paper gives the statistical analysis used to determine the size of the image sample and the threshold number of objectionable images. The system is practical for real-world applications: on a Pentium Pro PC it classifies a Web site in less than 2 minutes, including the time to compute the feature vectors for the images downloaded from the site. Besides its speed, it has demonstrated 97% sensitivity and 97% specificity in classifying a Web site based solely on images. Both the sensitivity and the specificity are expected to be higher in real-world applications because our performance evaluation is relatively conservative and surrounding text can be used to assist the classification process.
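The site-level decision rule in the abstract (flag a site when more than a threshold number of sampled images are classified as objectionable) can be illustrated with a simple binomial model. The sketch below is not the paper's analysis. It assumes that images are classified independently, and the function names, the per-image flag rates, the sample size, and the threshold are all hypothetical values chosen for illustration.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X > k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


def classify_site(flags: list[bool], threshold: int) -> bool:
    """Site is objectionable if more than `threshold` sampled images were flagged."""
    return sum(flags) > threshold


def site_rates(n: int, threshold: int, p_obj: float, p_benign: float) -> tuple[float, float]:
    """Site-level (sensitivity, specificity) under the independence assumption.

    p_obj:    assumed chance that an image sampled from an objectionable site is flagged
    p_benign: assumed chance that an image sampled from a benign site is flagged
    """
    sensitivity = binom_tail(n, threshold, p_obj)
    specificity = 1.0 - binom_tail(n, threshold, p_benign)
    return sensitivity, specificity


if __name__ == "__main__":
    # Illustrative numbers only; none of them are taken from the paper.
    sens, spec = site_rates(n=20, threshold=5, p_obj=0.6, p_benign=0.1)
    print(f"site-level sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under this model, raising the threshold trades sensitivity for specificity, and a larger sample makes both easier to achieve at once, which is why the sample size and the threshold have to be chosen together.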

Original language: English (US)
Title of host publication: Interactive Distributed Multimedia Systems and Telecommunication Services - 5th International Workshop, IDMS 1998, Proceedings
Publisher: Springer Verlag
Pages: 114-124
Number of pages: 11
ISBN (Print): 3540649557, 9783540649557
State: Published - Jan 1 1998
Event: 5th International Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services, IDMS 1998 - Oslo, Norway
Duration: Sep 8 1998 → Sep 11 1998

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1483
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services, IDMS 1998
Country: Norway
City: Oslo
Period: 9/8/98 → 9/11/98

Fingerprint

Websites
Real-world Applications
Feature Vector
World Wide Web
Specificity
Statistical methods
Statistics
Statistical Analysis
Performance Evaluation
Elimination
Wavelets
Module

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Wang, J., Li, J., Wiederhold, G., & Firschein, O. (1998). Classifying objectionable websites based on image content. In Interactive Distributed Multimedia Systems and Telecommunication Services - 5th International Workshop, IDMS 1998, Proceedings (pp. 114-124). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1483). Springer Verlag.
@inproceedings{6ef74451cb8f43dfb68232095398e10a,
title = "Classifying objectionable websites based on image content",
author = "James Wang and Jia Li and Gio Wiederhold and Oscar Firschein",
year = "1998",
month = "1",
day = "1",
language = "English (US)",
isbn = "3540649557",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "114--124",
booktitle = "Interactive Distributed Multimedia Systems and Telecommunication Services - 5th International Workshop, IDMS 1998, Proceedings",
address = "Germany",

}

TY - GEN

T1 - Classifying objectionable websites based on image content

AU - Wang, James

AU - Li, Jia

AU - Wiederhold, Gio

AU - Firschein, Oscar

PY - 1998/1/1

Y1 - 1998/1/1

UR - http://www.scopus.com/inward/record.url?scp=84947266634&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947266634&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540649557

SN - 9783540649557

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 114

EP - 124

BT - Interactive Distributed Multimedia Systems and Telecommunication Services - 5th International Workshop, IDMS 1998, Proceedings

PB - Springer Verlag

ER -
