Pairwise constrained clustering for sparse and high dimensional feature spaces

Su Yan, Hai Wang, Dongwon Lee, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Scopus citations

Abstract

Clustering high-dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high-dimensional space. To address this challenge, we propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster membership constraints. In particular, we project high-dimensional data onto a subspace of much lower dimension, in which the rough clustering structure defined by the prior knowledge is strengthened. Metric learning is then performed on the subspace to construct more informative pairwise distances. We also propose propagating constraints locally to make pairwise distances more informative. Evaluated on two real benchmark data sets, the new methods show substantial improvement using only limited prior knowledge.
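The project-then-learn-a-metric pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the projection or the metric learner, so the sketch substitutes a plain SVD projection and an RCA-style whitening of must-link differences as stand-ins, and it omits constraint propagation. The function names `project`, `learn_metric`, and `pairwise_distances`, and the `reg` parameter, are hypothetical.

```python
import numpy as np

def project(X, dim):
    """Project rows of X onto the top `dim` right singular vectors.

    Assumption: plain SVD stands in for the paper's constraint-guided projection.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def learn_metric(Z, must_link, reg=1e-3):
    """RCA-style metric (an assumption, not the paper's learner).

    Returns the inverse of the regularized covariance of must-link
    difference vectors, so directions in which must-linked points
    differ a lot are down-weighted.
    """
    D = np.array([Z[i] - Z[j] for i, j in must_link])
    C = D.T @ D / len(D) + reg * np.eye(Z.shape[1])
    return np.linalg.inv(C)

def pairwise_distances(Z, M):
    """Mahalanobis distances d(i, j) = sqrt((z_i - z_j)^T M (z_i - z_j))."""
    diff = Z[:, None, :] - Z[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, M, diff)
    return np.sqrt(np.maximum(sq, 0.0))
```

The resulting distance matrix could then be passed to any distance-based clustering routine. Passing the identity matrix as `M` recovers ordinary Euclidean distances in the subspace, which gives a baseline for comparison.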

Original language: English (US)
Title of host publication: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Pages: 620-627
Number of pages: 8
State: Published - 2009
Event: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 - Bangkok, Thailand
Duration: Apr 27 2009 to Apr 30 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5476 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Country/Territory: Thailand
City: Bangkok
Period: 4/27/09 to 4/30/09

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
