CPS analysis: Self-contained validation of biomedical data clustering

Lixiang Zhang, Lin Lin, Jia Li

Research output: Contribution to journal > Article > peer-review

Abstract

Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community.

Results: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.

Contact: lzz46@psu.edu or jiali@psu.edu

Original language: English (US)
Pages (from-to): 3516-3521
Number of pages: 6
Journal: Bioinformatics
Volume: 36
Issue number: 11
State: Published - Jun 1 2020

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
