### Abstract

An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X_{i}, Y_{i})}_{i=1}^{n}, which are independent samples from a joint distribution P_{XY}. By the principle of minimum description length (MDL), a statistical model that approximates P_{XY} well ought to enable efficient coding of X and Y; conversely, a system that encodes (X, Y) efficiently is expected to provide ample information about P_{XY}. This information can then be used to classify X, i.e., to predict the corresponding Y from X. To encode both X and Y, a two-stage vector quantizer is applied to X, and a Huffman code is formed for Y conditioned on each quantized value of X. Optimizing the encoder is equivalent to designing a vector quantizer whose objective function reflects the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimate of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. The algorithm, termed discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART® on a number of data sets, and outperforms the other two on several of them. The relation between DVQ, density estimation, and regression is also discussed.
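The core idea — quantize X, estimate the conditional distribution of Y within each quantizer cell, and classify by its mode as a plug-in approximation to the Bayes rule — can be illustrated with a much-simplified, hypothetical sketch. The code below is not the paper's algorithm: it substitutes plain Lloyd's k-means for the two-stage quantizer and omits the Huffman coding and the joint rate/misclassification objective; all function names are invented for illustration.

```python
import numpy as np

def fit_dvq_sketch(X, Y, k=4, iters=20, seed=0):
    """Quantize X with plain Lloyd's k-means, then estimate P(Y | cell)
    from the empirical class frequencies inside each quantizer cell."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each sample to its nearest codeword (quantizer cell).
        cells = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(cells == j):
                centers[j] = X[cells == j].mean(axis=0)
    labels = np.unique(Y)
    # Uniform fallback for cells that ended up empty.
    cond = np.full((k, len(labels)), 1.0 / len(labels))
    for j in range(k):
        mask = cells == j
        if mask.any():
            cond[j] = [(Y[mask] == lab).mean() for lab in labels]
    return centers, cond, labels

def dvq_predict(X, centers, cond, labels):
    """Classify by the most probable label of the cell X falls into,
    a plug-in approximation to the Bayes rule."""
    cells = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
    return labels[np.argmax(cond[cells], axis=1)]
```

On two well-separated Gaussian clusters, fitting with k=2 recovers one codeword per class and the per-cell conditionals become nearly degenerate, so prediction reduces to nearest-codeword labeling.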

Original language | English (US)
---|---
Title of host publication | Proceedings - DCC 2002
Subtitle of host publication | Data Compression Conference
Editors | James A. Storer, Martin Cohn
Publisher | Institute of Electrical and Electronics Engineers Inc.
Pages | 382-391
Number of pages | 10
ISBN (Electronic) | 0769514774
DOIs | https://doi.org/10.1109/DCC.2002.999978
State | Published - Jan 1 2002
Event | Data Compression Conference, DCC 2002 - Snowbird, United States. Duration: Apr 2 2002 → Apr 4 2002

### Publication series

Name | Data Compression Conference Proceedings
---|---
Volume | 2002-January
ISSN (Print) | 1068-0314

### Other

Other | Data Compression Conference, DCC 2002
---|---
Country | United States
City | Snowbird
Period | 4/2/02 → 4/4/02

### All Science Journal Classification (ASJC) codes

- Computer Networks and Communications

### Cite this

Li, Jia (2002). **A source coding approach to classification by vector quantization and the principle of minimum description length.** In J. A. Storer & M. Cohn (Eds.), *Proceedings - DCC 2002: Data Compression Conference* (pp. 382-391). [999978] (Data Compression Conference Proceedings; Vol. 2002-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DCC.2002.999978

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - A source coding approach to classification by vector quantization and the principle of minimum description length

AU - Li, Jia

PY - 2002/1/1

Y1 - 2002/1/1

N2 - An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X_i, Y_i)}_{i=1}^{n}, which are independent samples from a joint distribution P_{XY}. Based on the principle of minimum description length (MDL), a statistical model that approximates the distribution P_{XY} ought to enable efficient coding of X and Y. On the other hand, we expect a system that encodes (X, Y) efficiently to provide ample information on the distribution P_{XY}. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function reflecting the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimation of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. This algorithm, namely discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART® on a number of data sets. DVQ outperforms the other two on several data sets. The relation between DVQ, density estimation, and regression is also discussed.

UR - http://www.scopus.com/inward/record.url?scp=84863345028&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863345028&partnerID=8YFLogxK

U2 - 10.1109/DCC.2002.999978

DO - 10.1109/DCC.2002.999978

M3 - Conference contribution

AN - SCOPUS:84863345028

T3 - Data Compression Conference Proceedings

SP - 382

EP - 391

BT - Proceedings - DCC 2002

A2 - Storer, James A.

A2 - Cohn, Martin

PB - Institute of Electrical and Electronics Engineers Inc.

ER -