Visual methods for examining SVM classifiers

Doina Caragea, Dianne Cook, Hadley Wickham, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Support vector machines (SVM) offer a theoretically well-founded approach to automated learning of pattern classifiers. They have been shown to give highly accurate results in complex classification problems, for example, gene expression analysis. The SVM algorithm is also quite intuitive, with a few inputs to vary in the fitting process and several outputs that are interesting to study. For many data mining tasks (e.g., cancer prediction), finding classifiers with good predictive accuracy is important, but understanding the classifier is equally important. By studying the classifier outputs we may be able to produce a simpler classifier, learn which variables are the important discriminators between classes, and find the samples that are problematic for the classification. Visual methods for exploratory data analysis can help us study the outputs and complement automated classification algorithms in data mining. We present the use of tour-based methods to plot aspects of the SVM classifier. This approach provides insights into the cluster structure in the data, the nature of boundaries between clusters, and problematic outliers. Furthermore, tours can be used to assess variable importance. We show how visual methods can complement cross-validation methods in finding good SVM input parameters for a particular data set.
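
As a rough illustration of the kind of classifier outputs the abstract refers to, the following Python sketch assumes a linear SVM that has already been fitted. The weight vector `w`, the offset `b`, the toy data, and all function names are hypothetical and are not taken from the chapter. The sketch computes decision values, flags samples that fall inside the margin, ranks variables by weight magnitude, and draws one random two-dimensional projection frame. A real tour interpolates smoothly between many such frames, which this sketch does not attempt.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_values(points, w, b):
    """Linear SVM output f(x) = w.x + b for each point."""
    return [dot(w, x) + b for x in points]


def inside_margin(values):
    """Indices of points with |f(x)| < 1, i.e. inside the margin."""
    return [i for i, f in enumerate(values) if abs(f) < 1.0]


def rank_variables(w):
    """Variable indices ordered by |w_j|, largest first.

    Only meaningful if the variables are on comparable scales.
    """
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


def random_frame(p, rng):
    """Two orthonormal p-vectors spanning a random 2-D projection plane."""
    a = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm_a = math.sqrt(dot(a, a))
    a = [ai / norm_a for ai in a]
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    overlap = dot(a, b)
    b = [bi - overlap * ai for ai, bi in zip(a, b)]  # Gram-Schmidt step
    norm_b = math.sqrt(dot(b, b))
    b = [bi / norm_b for bi in b]
    return a, b


def project(points, frame):
    """Project p-dimensional points onto the 2-D frame."""
    a, b = frame
    return [(dot(x, a), dot(x, b)) for x in points]


# Hypothetical toy data: four samples, three variables.
points = [[2.0, 0.1, 1.0], [0.2, 0.0, -0.1], [-1.5, 0.3, -2.0], [-0.1, -0.2, 0.3]]
w, b = [1.0, 0.05, 0.5], 0.0  # assumed weights of an already fitted linear SVM

values = decision_values(points, w, b)
near_boundary = inside_margin(values)      # samples worth a closer look
variable_order = rank_variables(w)         # candidate important discriminators
frame = random_frame(3, random.Random(0))  # one frame of a would-be tour
xy = project(points, frame)                # 2-D coordinates ready for plotting
```

Plotting `xy` with the samples in `near_boundary` highlighted gives a single static view; stepping through a sequence of frames is what would turn this into a tour.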

Original language: English (US)
Title of host publication: Visual Data Mining - Theory, Techniques and Tools for Visual Analytics
Editors: Simeon J. Simoff, Michael H. Bohlen, Arturas Mazeika
Pages: 136-153
Number of pages: 18
DOI: https://doi.org/10.1007/978-3-540-71080-6_10
State: Published - Aug 29 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4404 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Support vector machines
Classifiers
Data mining
Output
Complement
Gene Expression Analysis
Exploratory Data Analysis
Discriminators
Classification Algorithm
Gene expression
Cross-validation
Classification Problems
Outlier
Vision
Intuitive
Cancer
Vary

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Caragea, D., Cook, D., Wickham, H., & Honavar, V. (2008). Visual methods for examining SVM classifiers. In S. J. Simoff, M. H. Bohlen, & A. Mazeika (Eds.), Visual Data Mining - Theory, Techniques and Tools for Visual Analytics (pp. 136-153). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4404 LNCS). https://doi.org/10.1007/978-3-540-71080-6_10
@inproceedings{d7689b4e6453427c881c5770925f6505,
title = "Visual methods for examining SVM classifiers",
author = "Doina Caragea and Dianne Cook and Hadley Wickham and Vasant Honavar",
year = "2008",
month = "8",
day = "29",
doi = "10.1007/978-3-540-71080-6_10",
language = "English (US)",
isbn = "3540710795",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "136--153",
editor = "Simoff, {Simeon J.} and Bohlen, {Michael H.} and Mazeika, Arturas",
booktitle = "Visual Data Mining - Theory, Techniques and Tools for Visual Analytics",

}

TY - GEN

T1 - Visual methods for examining SVM classifiers

AU - Caragea, Doina

AU - Cook, Dianne

AU - Wickham, Hadley

AU - Honavar, Vasant

PY - 2008/8/29

Y1 - 2008/8/29

AB - Support vector machines (SVM) offer a theoretically well-founded approach to automated learning of pattern classifiers. They have been shown to give highly accurate results in complex classification problems, for example, gene expression analysis. The SVM algorithm is also quite intuitive, with a few inputs to vary in the fitting process and several outputs that are interesting to study. For many data mining tasks (e.g., cancer prediction), finding classifiers with good predictive accuracy is important, but understanding the classifier is equally important. By studying the classifier outputs we may be able to produce a simpler classifier, learn which variables are the important discriminators between classes, and find the samples that are problematic for the classification. Visual methods for exploratory data analysis can help us study the outputs and complement automated classification algorithms in data mining. We present the use of tour-based methods to plot aspects of the SVM classifier. This approach provides insights into the cluster structure in the data, the nature of boundaries between clusters, and problematic outliers. Furthermore, tours can be used to assess variable importance. We show how visual methods can complement cross-validation methods in finding good SVM input parameters for a particular data set.

UR - http://www.scopus.com/inward/record.url?scp=50149121547&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50149121547&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-71080-6_10

DO - 10.1007/978-3-540-71080-6_10

M3 - Conference contribution

AN - SCOPUS:50149121547

SN - 3540710795

SN - 9783540710790

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 136

EP - 153

BT - Visual Data Mining - Theory, Techniques and Tools for Visual Analytics

A2 - Simoff, Simeon J.

A2 - Bohlen, Michael H.

A2 - Mazeika, Arturas

ER -
