A new Mallows distance based metric for comparing clusterings

Ding Zhou, Jia Li, Hongyuan Zha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

41 Scopus citations

Abstract

Despite the large number of algorithms developed for clustering, work on comparing clustering results is limited. In this paper, we propose a measure for comparing clustering results that tackles two issues insufficiently addressed or even overlooked by existing methods: (a) taking into account the distance between cluster representatives when assessing the similarity of clustering results; and (b) constructing a unified framework for defining a distance based on either hard or soft clustering while ensuring that the definition satisfies the triangle inequality. Our measure is derived from a complete and globally optimal matching between the clusters in two clustering results. We show that the distance is an instance of the Mallows distance, a metric between probability distributions in statistics. As a result, the defined distance inherits desirable properties from the Mallows distance. Experiments show that our clustering distance measure successfully handles cases that are difficult for other measures.
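
The abstract does not state the formula, so the following is only a sketch of the standard discrete form of the Mallows distance, written under assumptions: the symbols z_i and z'_j (cluster representatives), alpha_i and beta_j (cluster weights, for example the fraction of points in each cluster), the ground distance d, and the exponent p are illustrative names, not notation taken from the paper.

```latex
% Sketch under stated assumptions: clustering C has representatives z_1..z_m
% with weights \alpha_i, clustering C' has representatives z'_1..z'_n with
% weights \beta_j, and both weight vectors sum to 1.
\[
  D_p(\mathcal{C}, \mathcal{C}')^{p}
  \;=\;
  \min_{w}\; \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij}\, d(z_i, z'_j)^{p}
\]
\[
  \text{subject to}\quad
  w_{ij} \ge 0,\qquad
  \sum_{j=1}^{n} w_{ij} = \alpha_i,\qquad
  \sum_{i=1}^{m} w_{ij} = \beta_j,\qquad
  \sum_{i=1}^{m} \alpha_i = \sum_{j=1}^{n} \beta_j = 1 .
\]
```

In this form the matching weights w_ij play the role of the complete, globally optimal matching between clusters, and the ground distance d carries the distance between cluster representatives. When d is a metric and p >= 1, D_p satisfies the triangle inequality, which is consistent with the property the abstract says is inherited from the Mallows distance.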

Original language: English (US)
Title of host publication: ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
Editors: L. Raedt, S. Wrobel
Pages: 1033-1040
Number of pages: 8
State: Published - Dec 1 2005
Event: ICML 2005: 22nd International Conference on Machine Learning - Bonn, Germany
Duration: Aug 7 2005 → Aug 11 2005

Publication series

Name: ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

Other

Other: ICML 2005: 22nd International Conference on Machine Learning
Country: Germany
City: Bonn
Period: 8/7/05 → 8/11/05

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Zhou, D., Li, J., & Zha, H. (2005). A new Mallows distance based metric for comparing clusterings. In L. Raedt, & S. Wrobel (Eds.), ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning (pp. 1033-1040). (ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning).