Variable selection for clustering by separability based on ridgelines

Hyangmin Lee, Jia Li

Research output: Contribution to journal, Article

11 Scopus citations

Abstract

A new variable selection algorithm is developed for clustering based on mode association. In conventional mixture-model-based clustering, each mixture component is treated as one cluster, and the separation between clusters is usually measured by the ratio of between-component to within-component dispersion. In this article, we allow one cluster to contain several components if they merge into one mode. The extent of separation between clusters is quantified using critical points on the ridgeline between two modes, which reflects the exact geometry of the density function. The computational foundation consists of the recently developed modal expectation-maximization (MEM) algorithm, which finds the modes of a Gaussian mixture density, and the ridgeline expectation-maximization (REM) algorithm, which computes the ridgeline passing through the critical points of the mixed density of two unimodal clusters. Forward selection is used to find a subset of variables that maximizes an aggregated index of pairwise cluster separability. A theoretical analysis of the procedure is provided. We experiment with both simulated and real datasets and compare the method with several state-of-the-art variable selection algorithms. Supplemental materials, including an R package, datasets, and appendices with proofs, are available online.
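The abstract names two building blocks, MEM mode finding and ridgelines between modes. The Python sketch below illustrates both for full-covariance Gaussian components; it is not the authors' R package. MEM alternates an E-step (posterior component weights p_k at the current point) with a closed-form M-step, x <- (sum_k p_k Sigma_k^-1)^-1 sum_k p_k Sigma_k^-1 mu_k. For two Gaussian components, every critical point of the mixture density lies on the ridgeline x*(alpha) = [(1-alpha) Sigma_1^-1 + alpha Sigma_2^-1]^-1 [(1-alpha) Sigma_1^-1 mu_1 + alpha Sigma_2^-1 mu_2], alpha in [0, 1]. The function names and the separability score (one minus the valley density divided by the lower of the two peak densities along the ridgeline, and 0 when only one peak exists) are hypothetical choices made for illustration. The paper's aggregated index, and REM for clusters that are themselves mixtures, may differ.

```python
import numpy as np


def gauss_pdf(x, mean, cov):
    """Density of the multivariate normal N(mean, cov) evaluated at point x."""
    x, mean, cov = np.asarray(x, float), np.asarray(mean, float), np.asarray(cov, float)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)


def mixture_pdf(x, weights, means, covs):
    """Density of a Gaussian mixture at point x."""
    return sum(w * gauss_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


def mem_mode(x0, weights, means, covs, tol=1e-10, max_iter=1000):
    """Modal EM: hill-climb from x0 to a local mode of a Gaussian mixture."""
    x = np.asarray(x0, float)
    precs = [np.linalg.inv(np.asarray(c, float)) for c in covs]
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current point.
        dens = np.array([w * gauss_pdf(x, m, c) for w, m, c in zip(weights, means, covs)])
        p = dens / dens.sum()
        # M-step: maximize sum_k p_k log phi_k(y) over y (closed form for Gaussians).
        A = sum(pk * P for pk, P in zip(p, precs))
        b = sum(pk * P @ np.asarray(m, float) for pk, P, m in zip(p, precs, means))
        x_new = np.linalg.solve(A, b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def ridgeline(alpha, mean1, cov1, mean2, cov2):
    """Point x*(alpha) on the ridgeline of two Gaussian components, alpha in [0, 1]."""
    P1 = np.linalg.inv(np.asarray(cov1, float))
    P2 = np.linalg.inv(np.asarray(cov2, float))
    A = (1 - alpha) * P1 + alpha * P2
    b = (1 - alpha) * P1 @ np.asarray(mean1, float) + alpha * P2 @ np.asarray(mean2, float)
    return np.linalg.solve(A, b)


def ridgeline_separability(weights, means, covs, n_grid=2001):
    """Hypothetical index: 1 - valley density / lower peak density along the ridgeline."""
    alphas = np.linspace(0.0, 1.0, n_grid)
    prof = np.array([
        mixture_pdf(ridgeline(a, means[0], covs[0], means[1], covs[1]), weights, means, covs)
        for a in alphas
    ])
    # Grid indices that are local maxima of the density profile (endpoints included).
    peaks = [i for i in range(n_grid)
             if (i == 0 or prof[i] >= prof[i - 1])
             and (i == n_grid - 1 or prof[i] >= prof[i + 1])]
    if len(peaks) < 2:
        return 0.0  # one peak: the two components merge into a single mode
    lo, hi = peaks[0], peaks[-1]
    valley = prof[lo:hi + 1].min()
    return 1.0 - valley / min(prof[lo], prof[hi])


if __name__ == "__main__":
    w = [0.5, 0.5]
    covs = [np.eye(2), np.eye(2)]
    far = [np.array([0.0, 0.0]), np.array([6.0, 0.0])]
    near = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
    print(mem_mode(near[0], w, near, covs))       # climbs to about (0.5, 0)
    print(ridgeline_separability(w, far, covs))   # close to 1: two well-separated modes
    print(ridgeline_separability(w, near, covs))  # components merge into one mode, so 0
```

In the "near" example, MEM started from either component mean reaches the same mode. This is the situation in which the approach treats two components as a single cluster. A forward-selection loop would then repeatedly add the variable that most increases an aggregate of such pairwise scores.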

Original language: English (US)
Pages (from-to): 315-336
Number of pages: 22
Journal: Journal of Computational and Graphical Statistics
Volume: 21
Issue number: 2
State: Published - Jun 2012

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty
