Feature-guided clustering of multi-dimensional flow cytometry datasets

Qing T. Zeng, Juan Pablo Pratt, Jane Pak, Dino Ravnic, Harold Huss, Steven J. Mentzer

Research output: Contribution to journal › Article

35 Citations (Scopus)

Abstract

Background: Flow cytometry produces large multi-dimensional datasets describing the physical and molecular characteristics of individual cells. The objective of this study was to simplify these datasets by clustering "objects" (cells) into a smaller number of relatively homogeneous groups (clusters) on the basis of inter-object similarities and dissimilarities.

Results: The algorithm was designed to be driven by histogram features; that is, the relevant single-parameter histogram features were used to guide multi-dimensional k-means clustering without an a priori estimate of the number of clusters. To test this approach, we simulated cell-derived datasets using protein-coated microspheres (artificial "cells"). The microspheres were constructed to provide 119 populations in 40 samples. The feature-guided (FG) approach accurately identified 100% of the predetermined cluster combinations. In contrast, an approach based on the partition index (PI) cluster validity measure accurately identified 83.2% of the clusters. Direct comparisons of the two methods indicated that the FG method was significantly more accurate than PI in identifying both the number of clusters and the number of objects within each cluster (p < 0.0001).

Conclusion: We conclude that parameter feature analysis can effectively guide k-means clustering of flow cytometry datasets.
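The abstract does not spell out how single-parameter histogram features are turned into a cluster count, so the following is only a minimal sketch of one possible reading, written with the Python standard library. It assumes that each parameter's histogram is split into modes (runs of occupied bins), that every combination of per-parameter modes seeds one k-means centroid, and that seeds attracting no events are dropped. The function names (`histogram_modes`, `feature_guided_kmeans`), the bin count and the `min_frac` threshold are hypothetical and do not come from the paper; this is not the authors' published algorithm.

```python
import itertools
import math

# Hypothetical sketch of feature-guided k-means; NOT the authors' published
# algorithm. The bin count, min_frac and the "mode = run of occupied bins"
# rule are illustrative assumptions.


def histogram_modes(values, bins=32, min_frac=0.05):
    """Return one centre per mode of a single-parameter histogram.

    A mode is taken to be a maximal run of adjacent bins whose count is at
    least max(1, min_frac * tallest bin); its centre is the count-weighted
    mean of the bin centres in the run.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant parameter
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    floor = max(1, min_frac * max(counts))
    modes, run = [], []
    for i, c in enumerate(counts + [0]):  # trailing 0 closes the last run
        if c >= floor:
            run.append(i)
        elif run:
            total = sum(counts[j] for j in run)
            modes.append(sum(counts[j] * (lo + (j + 0.5) * width) for j in run) / total)
            run = []
    return modes


def nearest(x, centroids):
    """Index of the centroid closest to event x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def kmeans(data, seeds, iters=100):
    """Plain Lloyd iterations started from explicit seeds."""
    centroids = [list(s) for s in seeds]
    for _ in range(iters):
        labels = [nearest(x, centroids) for x in data]
        updated = []
        for k, c in enumerate(centroids):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(x, centroids) for x in data]


def feature_guided_kmeans(data, bins=32, min_frac=0.05):
    """Seed k-means with every combination of per-parameter histogram modes."""
    per_param = [histogram_modes([row[j] for row in data], bins, min_frac)
                 for j in range(len(data[0]))]
    seeds = list(itertools.product(*per_param))  # k is never chosen by hand
    centroids, labels = kmeans(data, seeds)
    used = set(labels)
    # Drop mode combinations that attracted no events, then refit.
    return kmeans(data, [c for k, c in enumerate(centroids) if k in used])


if __name__ == "__main__":
    # Synthetic events: three populations on a 5 x 5 jitter grid (not the paper's data).
    jitter = [-0.4, -0.2, 0.0, 0.2, 0.4]
    populations = [(2.0, 2.0), (8.0, 2.0), (8.0, 8.0)]
    data = [(cx + dx, cy + dy) for cx, cy in populations for dx in jitter for dy in jitter]
    centroids, labels = feature_guided_kmeans(data)
    print(len(centroids), [labels.count(k) for k in range(len(centroids))])
    # prints: 3 [25, 25, 25]
```

In the synthetic demo each of the two parameters shows two histogram modes, which gives four candidate combinations. The low/high combination contains no events, so it is discarded and three clusters of 25 events remain. Seeding from mode combinations is what lets the number of clusters emerge from the data rather than being fixed in advance, which is the property the abstract emphasizes.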

Original language: English (US)
Pages (from-to): 325-331
Number of pages: 7
Journal: Journal of Biomedical Informatics
Volume: 40
Issue number: 3
DOI: 10.1016/j.jbi.2006.06.005
State: Published, June 1, 2007

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Health Informatics

Cite this

Zeng, Qing T. ; Pratt, Juan Pablo ; Pak, Jane ; Ravnic, Dino ; Huss, Harold ; Mentzer, Steven J. / Feature-guided clustering of multi-dimensional flow cytometry datasets. In: Journal of Biomedical Informatics. 2007 ; Vol. 40, No. 3. pp. 325-331.
@article{ae77035648444a1389b3e3328827c34a,
title = "Feature-guided clustering of multi-dimensional flow cytometry datasets",
abstract = "Background: Flow cytometry produces large multi-dimensional datasets of the physical and molecular characteristics of individual cells. The objective of this study was to simplify the cytometry datasets by arranging or clustering {"}objects{"} (cells) into a smaller number of relatively homogeneous groups (clusters) on the basis of interobject similarities and dissimilarities. Results: The algorithm was designed to be driven by histogram features; that is, the relevant single parameter histogram features were used to guide multidimensional k-means clustering without an a priori estimate of cluster number. To test this approach, we simulated cell-derived datasets using protein-coated microspheres (artificial {"}cells{"}). The microspheres were constructed to provide 119 populations in 40 samples. The feature-guided (FG) approach accurately identified 100{\%} of the predetermined cluster combinations. In contrast, an approach based on the partition index (PI) cluster validity measure accurately identified 83.2{\%} of the clusters. Direct comparisons of the two methods indicated that the FG method was significantly more accurate than PI in identifying both the number of clusters and the number of objects within the clusters (p < .0001). Conclusion: We conclude that parameter feature analysis can be used to effectively guide k-means clustering of flow cytometry datasets.",
author = "Zeng, {Qing T.} and Pratt, {Juan Pablo} and Jane Pak and Dino Ravnic and Harold Huss and Mentzer, {Steven J.}",
year = "2007",
month = "6",
day = "1",
doi = "10.1016/j.jbi.2006.06.005",
language = "English (US)",
volume = "40",
pages = "325--331",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "3",

}
