A source coding approach to classification by vector quantization and the principle of minimum description length

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data $\{(X_i, Y_i)\}_{i=1}^{n}$, which are independent samples from a joint distribution $P_{XY}$. Based on the principle of minimum description length (MDL), a statistical model that approximates the distribution $P_{XY}$ ought to enable efficient coding of X and Y. Conversely, a system that encodes (X, Y) efficiently is expected to provide ample information about the distribution $P_{XY}$. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X, and a Huffman code is formed for Y conditioned on each quantized value of X. Optimizing the encoder is equivalent to designing a vector quantizer with an objective function that jointly penalizes quantization error and misclassification rate. This vector quantizer provides an estimate of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. The algorithm, named discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART® on a number of data sets; DVQ outperforms the other two on several of them. The relation between DVQ, density estimation, and regression is also discussed.
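
For illustration, below is a minimal Python sketch of the classification step the abstract describes: quantize X, estimate the conditional distribution of Y within each quantizer cell, and classify by the plug-in approximation to the Bayes rule. It substitutes a plain Lloyd (k-means) codebook for the paper's jointly optimized two-stage quantizer and an ideal entropy code length for the explicit Huffman code; all function names and the trade-off weight `lam` are illustrative assumptions, not the paper's implementation.

import numpy as np

def lloyd_codebook(X, k, iters=50, seed=0):
    """Plain Lloyd (k-means) codebook for X. This stands in for the
    paper's two-stage vector quantizer, whose design also penalizes
    misclassification (not reproduced here)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Nearest-codeword assignment under squared Euclidean distortion.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cells = dist.argmin(axis=1)
        # Centroid update; empty cells keep their previous codeword.
        for j in range(k):
            if np.any(cells == j):
                centers[j] = X[cells == j].mean(axis=0)
    return centers

def quantize(X, centers):
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), dist

def fit_dvq_like(X, Y, k, n_classes):
    """Estimate P(Y | quantized X) via Laplace-smoothed class counts
    per quantizer cell -- the plug-in approximation to the Bayes rule."""
    centers = lloyd_codebook(X, k)
    cells, _ = quantize(X, centers)
    counts = np.ones((k, n_classes))          # add-one smoothing
    np.add.at(counts, (cells, Y), 1)
    cond = counts / counts.sum(axis=1, keepdims=True)
    return centers, cond

def predict(X, centers, cond):
    cells, _ = quantize(X, centers)
    return cond[cells].argmax(axis=1)         # argmax over y of P(y | cell)

def description_length(X, Y, centers, cond, lam=1.0):
    """Two-part score in the spirit of the abstract: quantization
    distortion for X plus the ideal code length (bits) for Y given the
    quantized X. The entropy code length replaces an explicit Huffman
    code; the trade-off weight `lam` is an assumption of this sketch."""
    cells, dist = quantize(X, centers)
    distortion = dist[np.arange(len(X)), cells].sum()
    bits_for_y = -np.log2(cond[cells, Y]).sum()
    return distortion + lam * bits_for_y

if __name__ == "__main__":
    # Toy two-class Gaussian problem.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)),
                   rng.normal(+1.0, 1.0, (200, 2))])
    Y = np.repeat([0, 1], 200)
    centers, cond = fit_dvq_like(X, Y, k=8, n_classes=2)
    print("train accuracy:", (predict(X, centers, cond) == Y).mean())
    print("description length:", description_length(X, Y, centers, cond))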

Original language: English (US)
Title of host publication: Proceedings - DCC 2002
Subtitle of host publication: Data Compression Conference
Editors: James A. Storer, Martin Cohn
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 382-391
Number of pages: 10
ISBN (Electronic): 0769514774
DOIs: https://doi.org/10.1109/DCC.2002.999978
State: Published - Jan 1 2002
Event: Data Compression Conference, DCC 2002 - Snowbird, United States
Duration: Apr 2 2002 – Apr 4 2002

Publication series

Name: Data Compression Conference Proceedings
Volume: 2002-January
ISSN (Print): 1068-0314

Other

Other: Data Compression Conference, DCC 2002
Country: United States
City: Snowbird
Period: 4/2/02 – 4/4/02

Fingerprint

  • Vector quantization
  • Entropy

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Cite this

Li, J. (2002). A source coding approach to classification by vector quantization and the principle of minimum description length. In J. A. Storer & M. Cohn (Eds.), Proceedings - DCC 2002: Data Compression Conference (pp. 382-391). [999978] (Data Compression Conference Proceedings; Vol. 2002-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DCC.2002.999978
Li, Jia. / A source coding approach to classification by vector quantization and the principle of minimum description length. Proceedings - DCC 2002: Data Compression Conference. editor / James A. Storer ; Martin Cohn. Institute of Electrical and Electronics Engineers Inc., 2002. pp. 382-391 (Data Compression Conference Proceedings).
@inproceedings{78e1cc1042d64768b692a9d8afea6fda,
title = "A source coding approach to classification by vector quantization and the principle of minimum description length",
author = "Jia Li",
year = "2002",
month = "1",
day = "1",
doi = "10.1109/DCC.2002.999978",
language = "English (US)",
series = "Data Compression Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "382--391",
editor = "Storer, {James A.} and Martin Cohn",
booktitle = "Proceedings - DCC 2002",
address = "United States",

}

Li, J 2002, A source coding approach to classification by vector quantization and the principle of minimum description length. in JA Storer & M Cohn (eds), Proceedings - DCC 2002: Data Compression Conference., 999978, Data Compression Conference Proceedings, vol. 2002-January, Institute of Electrical and Electronics Engineers Inc., pp. 382-391, Data Compression Conference, DCC 2002, Snowbird, United States, 4/2/02. https://doi.org/10.1109/DCC.2002.999978

A source coding approach to classification by vector quantization and the principle of minimum description length. / Li, Jia.

Proceedings - DCC 2002: Data Compression Conference. ed. / James A. Storer; Martin Cohn. Institute of Electrical and Electronics Engineers Inc., 2002. p. 382-391 999978 (Data Compression Conference Proceedings; Vol. 2002-January).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - A source coding approach to classification by vector quantization and the principle of minimum description length

AU - Li, Jia

PY - 2002/1/1

Y1 - 2002/1/1

UR - http://www.scopus.com/inward/record.url?scp=84863345028&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863345028&partnerID=8YFLogxK

U2 - 10.1109/DCC.2002.999978

DO - 10.1109/DCC.2002.999978

M3 - Conference contribution

AN - SCOPUS:84863345028

T3 - Data Compression Conference Proceedings

SP - 382

EP - 391

BT - Proceedings - DCC 2002

A2 - Storer, James A.

A2 - Cohn, Martin

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Li J. A source coding approach to classification by vector quantization and the principle of minimum description length. In Storer JA, Cohn M, editors, Proceedings - DCC 2002: Data Compression Conference. Institute of Electrical and Electronics Engineers Inc. 2002. p. 382-391. 999978. (Data Compression Conference Proceedings). https://doi.org/10.1109/DCC.2002.999978