Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers

Dae Ki Kang, Adrian Silvescu, Jun Zhang, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Attribute Value Taxonomies (AVTs) have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed AVTs are unavailable. We introduce AVT-Learner, an algorithm for automated construction of attribute value taxonomies from data. AVT-Learner uses Hierarchical Agglomerative Clustering (HAC) to cluster attribute values based on the distribution of classes that co-occur with the values. We describe experiments on UCI data sets that compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL) applied to the original data set. Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL, and the resulting classifiers are significantly more compact than those generated by NBL.
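The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`learn_avt`, `js_divergence`, `normalize`) are hypothetical, and the use of Jensen-Shannon divergence as the dissimilarity between class distributions is an assumption, since the abstract does not name the measure. The sketch builds a binary taxonomy over the values of one attribute by repeatedly merging the two clusters whose class distributions are most similar.

```python
from collections import Counter, defaultdict
from math import log2


def normalize(counts, classes):
    """Turn class counts into a probability vector over a fixed class order."""
    total = sum(counts.values())
    return [counts[c] / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Assumption: the abstract does not specify the divergence; JS is used
    here only because it is symmetric and always finite.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def learn_avt(values, labels):
    """Build a binary taxonomy (nested tuples) over the attribute values."""
    classes = sorted(set(labels))

    # Count how often each class co-occurs with each attribute value.
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1

    # Each cluster is (subtree, class counts); leaves are raw values.
    clusters = sorted(counts.items(), key=lambda kv: kv[0])

    while len(clusters) > 1:
        # Pick the pair of clusters with the most similar class distributions.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda ij: js_divergence(
                normalize(clusters[ij[0]][1], classes),
                normalize(clusters[ij[1]][1], classes),
            ),
        )
        (tree_i, counts_i), (tree_j, counts_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), counts_i + counts_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return clusters[0][0]


if __name__ == "__main__":
    # Toy data (hypothetical): colour values labelled with a warm/cool class.
    values = ["red", "red", "crimson", "crimson", "blue", "blue", "navy", "navy"]
    labels = ["warm"] * 4 + ["cool"] * 4
    print(learn_avt(values, labels))
```

In this toy data, values that always appear with the same class end up as siblings, so the resulting tree groups the "cool" values apart from the "warm" ones. A learner guided by such a taxonomy could then estimate parameters for an internal node instead of each leaf value, which is one way the compactness described in the abstract might arise.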

Original language: English (US)
Title of host publication: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Editors: R. Rastogi, K. Morik, M. Bramer, X. Wu
Pages: 130-137
Number of pages: 8
State: Published - Dec 1 2004
Event: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004 - Brighton, United Kingdom
Duration: Nov 1 2004 to Nov 4 2004

Publication series

Name: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004

Other

Other: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Country: United Kingdom
City: Brighton
Period: 11/1/04 to 11/4/04

Fingerprint

Taxonomies
Classifiers

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Kang, D. K., Silvescu, A., Zhang, J., & Honavar, V. (2004). Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers. In R. Rastogi, K. Morik, M. Bramer, & X. Wu (Eds.), Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004 (pp. 130-137). (Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004).
Kang, Dae Ki ; Silvescu, Adrian ; Zhang, Jun ; Honavar, Vasant. / Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers. Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004. editor / R. Rastogi ; K. Morik ; M. Bramer ; X. Wu. 2004. pp. 130-137 (Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004).
@inproceedings{f75eae833d0149cbb400d6507e6d4c7e,
title = "Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers",
abstract = "Attribute Value Taxonomies (AVT) have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed AVTs are unavailable. We introduce AVT-Learner, an algorithm for automated construction of attribute value taxonomies from data. AVT-Learner uses Hierarchical Agglomerative Clustering (HAC) to cluster attribute values based on the distribution of classes that cooccur with the values. We describe experiments on UCI data sets that compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL) applied to the original data set. Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL; and the resulting classifiers are significantly more compact than those generated by NBL.",
author = "Kang, {Dae Ki} and Adrian Silvescu and Jun Zhang and Vasant Honavar",
year = "2004",
month = "12",
day = "1",
language = "English (US)",
isbn = "0769521428",
series = "Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004",
pages = "130--137",
editor = "R. Rastogi and K. Morik and M. Bramer and X. Wu",
booktitle = "Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004",

}

Kang, DK, Silvescu, A, Zhang, J & Honavar, V 2004, Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers. in R Rastogi, K Morik, M Bramer & X Wu (eds), Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004. Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004, pp. 130-137, Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004, Brighton, United Kingdom, 11/1/04.

Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers. / Kang, Dae Ki; Silvescu, Adrian; Zhang, Jun; Honavar, Vasant.

Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004. ed. / R. Rastogi; K. Morik; M. Bramer; X. Wu. 2004. p. 130-137 (Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004).

TY - GEN

T1 - Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers

AU - Kang, Dae Ki

AU - Silvescu, Adrian

AU - Zhang, Jun

AU - Honavar, Vasant

PY - 2004/12/1

Y1 - 2004/12/1

N2 - Attribute Value Taxonomies (AVT) have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed AVTs are unavailable. We introduce AVT-Learner, an algorithm for automated construction of attribute value taxonomies from data. AVT-Learner uses Hierarchical Agglomerative Clustering (HAC) to cluster attribute values based on the distribution of classes that cooccur with the values. We describe experiments on UCI data sets that compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL) applied to the original data set. Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL; and the resulting classifiers are significantly more compact than those generated by NBL.

UR - http://www.scopus.com/inward/record.url?scp=19544364462&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=19544364462&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:19544364462

SN - 0769521428

T3 - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004

SP - 130

EP - 137

BT - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004

A2 - Rastogi, R.

A2 - Morik, K.

A2 - Bramer, M.

A2 - Wu, X.

ER -

Kang DK, Silvescu A, Zhang J, Honavar V. Generation of attribute value taxonomies from data for data-driven construction of accurate and compact classifiers. In Rastogi R, Morik K, Bramer M, Wu X, editors, Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004. 2004. p. 130-137. (Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004).