CPS analysis: Self-contained validation of biomedical data clustering

Lixiang Zhang, Lin Lin, Jia Li

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Because true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem that the research community has barely addressed.

Results: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. By applying CPS analysis to three usage scenarios for biomedical data, we demonstrate that it is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.

Contact: lzz46@psu.edu or jiali@psu.edu
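The abstract does not specify how CPS analysis is computed, so the sketch below does not implement it. It only illustrates, under our own assumptions, the general idea of quantifying uncertainty for an individual cluster: re-cluster perturbed versions of the data, match the cluster of interest to its best counterpart in each perturbed run by Jaccard similarity, and summarize the resulting scores. Jaccard-based matching of this kind is a common cluster-stability heuristic; the function names `jaccard`, `labels_to_clusters`, and `cluster_stability` are hypothetical, and the perturbed runs are assumed to assign a label to every point.

```python
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| between two sets of point indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def labels_to_clusters(labels):
    """Group point indices by cluster label, e.g. [0, 0, 1] -> {0: {0, 1}, 1: {2}}."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, set()).add(idx)
    return clusters


def cluster_stability(reference, perturbed_partitions):
    """Hypothetical helper, not the CPS method.

    reference: set of point indices forming one cluster of interest.
    perturbed_partitions: non-empty list of label sequences, one per perturbed
    run (assumed to label every point, e.g. re-clustering with added noise).

    Returns the best-match Jaccard score for each run and their mean; a mean
    near 1 suggests the cluster is reproduced consistently across runs.
    """
    scores = []
    for labels in perturbed_partitions:
        clusters = labels_to_clusters(labels).values()
        # Cluster labels are arbitrary across runs, so match by best overlap.
        scores.append(max(jaccard(reference, c) for c in clusters))
    return scores, mean(scores)


# Usage: cluster {0, 1, 2} is recovered exactly in runs 1 and 3 (labels
# permuted in run 3) and partially in run 2.
scores, avg = cluster_stability(
    {0, 1, 2},
    [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [2, 2, 2, 0, 0, 0]],
)
print(scores, avg)
```

Matching by best overlap rather than by label value matters because clustering algorithms assign arbitrary label numbers in each run.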

Original language: English (US)
Pages (from-to): 3516-3521
Number of pages: 6
Issue number: 11
State: Published - Jun 1 2020

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics


