Robust clustering

Amit Banerjee, Rajesh N. Davé

Research output: Contribution to journal / Review article

14 Citations (Scopus)

Abstract

Historical and recent developments in the field of robust clustering and their applications are reviewed. The discussion focuses on the different strategies that have been developed to reduce the sensitivity of clustering methods to outliers in data, while pointing out the need for both efficient partitioning and simultaneous robust model fitting. Although all clustering methods and algorithms have good partitioning capabilities when data are clean and free of outliers, they break down when outliers are present. This is because classical development in the field of clustering has rested on the assumptions that the data are free of noise and well distributed. Robust model fitting, while retaining the partitioning power, involves the development of methods and algorithms that reject these classical assumptions, either by explicitly incorporating robust statistical methods (often regression based) or by recasting the clustering problem in a way that does so implicitly. In this review, the robust model fitting aspect is identified in pertinent methodological and algorithmic advances and tied to related developments in robust statistics wherever possible. The paper also includes representative samples of various applications of robust clustering methods to both synthetic and real-world datasets.

Original language: English (US)
Pages (from-to): 29-59
Number of pages: 31
Journal: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume: 2
Issue number: 1
DOI: 10.1002/widm.49
State: Published - Dec 1 2012

Fingerprint

Statistical methods
Statistics

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Cite this

@article{2f2cf03c90574d4a9c585155837968f6,
title = "Robust clustering",
abstract = "Historical and recent developments in the field of robust clustering and their applications are reviewed. The discussion focuses on the different strategies that have been developed to reduce the sensitivity of clustering methods to outliers in data, while pointing out the need for both efficient partitioning and simultaneous robust model fitting. Although all clustering methods and algorithms have good partitioning capabilities when data are clean and free of outliers, they break down when outliers are present. This is because classical development in the field of clustering has rested on the assumptions that the data are free of noise and well distributed. Robust model fitting, while retaining the partitioning power, involves the development of methods and algorithms that reject these classical assumptions, either by explicitly incorporating robust statistical methods (often regression based) or by recasting the clustering problem in a way that does so implicitly. In this review, the robust model fitting aspect is identified in pertinent methodological and algorithmic advances and tied to related developments in robust statistics wherever possible. The paper also includes representative samples of various applications of robust clustering methods to both synthetic and real-world datasets.",
author = "Banerjee, Amit and Dav{\'e}, Rajesh N.",
year = "2012",
month = "12",
day = "1",
doi = "10.1002/widm.49",
language = "English (US)",
volume = "2",
pages = "29--59",
journal = "Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery",
issn = "1942-4787",
publisher = "John Wiley and Sons Inc.",
number = "1",

}

Robust clustering. / Banerjee, Amit; Davé, Rajesh N.

In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 2, No. 1, 01.12.2012, p. 29-59.

TY - JOUR

T1 - Robust clustering

AU - Banerjee, Amit

AU - Davé, Rajesh N.

PY - 2012/12/1

Y1 - 2012/12/1

AB - Historical and recent developments in the field of robust clustering and their applications are reviewed. The discussion focuses on the different strategies that have been developed to reduce the sensitivity of clustering methods to outliers in data, while pointing out the need for both efficient partitioning and simultaneous robust model fitting. Although all clustering methods and algorithms have good partitioning capabilities when data are clean and free of outliers, they break down when outliers are present. This is because classical development in the field of clustering has rested on the assumptions that the data are free of noise and well distributed. Robust model fitting, while retaining the partitioning power, involves the development of methods and algorithms that reject these classical assumptions, either by explicitly incorporating robust statistical methods (often regression based) or by recasting the clustering problem in a way that does so implicitly. In this review, the robust model fitting aspect is identified in pertinent methodological and algorithmic advances and tied to related developments in robust statistics wherever possible. The paper also includes representative samples of various applications of robust clustering methods to both synthetic and real-world datasets.

UR - http://www.scopus.com/inward/record.url?scp=84873252175&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873252175&partnerID=8YFLogxK

U2 - 10.1002/widm.49

DO - 10.1002/widm.49

M3 - Review article

AN - SCOPUS:84873252175

VL - 2

SP - 29

EP - 59

JO - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

JF - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

SN - 1942-4787

IS - 1

ER -