Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples

Andrew Cron, Cécile Gouttefangeas, Jacob Frelinger, Lin Lin, Satwinder K. Singh, Cedrik M. Britten, Marij J.P. Welters, Sjoerd H. van der Burg, Mike West, Cliburn Chan

Research output: Contribution to journal, Article

47 Citations (Scopus)

Abstract

Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU-acceleration to perform the numerically-demanding computations. 
We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.

Original language: English (US)
Article number: e1003130
Journal: PLoS Computational Biology
Volume: 9
Issue number: 7
DOI: https://doi.org/10.1371/journal.pcbi.1003130
State: Published - Jul 1 2013

Fingerprint

Flow Cytometry
Hierarchical Modeling
Event Detection
Rare Events
Antigens
T-cells
Alignment
Subset
Cell
Modeling
T-Lymphocytes
Sampling
Blood

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

Cron, Andrew ; Gouttefangeas, Cécile ; Frelinger, Jacob ; Lin, Lin ; Singh, Satwinder K. ; Britten, Cedrik M. ; Welters, Marij J.P. ; van der Burg, Sjoerd H. ; West, Mike ; Chan, Cliburn. / Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples. In: PLoS computational biology. 2013 ; Vol. 9, No. 7.
@article{6f3940a401b64b6e8142033571c9557d,
title = "Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples",
abstract = "Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1{\%} or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU-acceleration to perform the numerically-demanding computations. 
We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.",
author = "Cron, Andrew and Gouttefangeas, C{\'e}cile and Frelinger, Jacob and Lin, Lin and Singh, {Satwinder K.} and Britten, {Cedrik M.} and Welters, {Marij J.P.} and {van der Burg}, {Sjoerd H.} and West, Mike and Chan, Cliburn",
year = "2013",
month = jul,
day = "1",
doi = "10.1371/journal.pcbi.1003130",
language = "English (US)",
volume = "9",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "7",
pages = "e1003130",
}

Cron, A, Gouttefangeas, C, Frelinger, J, Lin, L, Singh, SK, Britten, CM, Welters, MJP, van der Burg, SH, West, M & Chan, C 2013, 'Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples', PLoS computational biology, vol. 9, no. 7, e1003130. https://doi.org/10.1371/journal.pcbi.1003130

Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples. / Cron, Andrew; Gouttefangeas, Cécile; Frelinger, Jacob; Lin, Lin; Singh, Satwinder K.; Britten, Cedrik M.; Welters, Marij J.P.; van der Burg, Sjoerd H.; West, Mike; Chan, Cliburn.

In: PLoS computational biology, Vol. 9, No. 7, e1003130, 01.07.2013.

TY - JOUR
T1 - Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples
AU - Cron, Andrew
AU - Gouttefangeas, Cécile
AU - Frelinger, Jacob
AU - Lin, Lin
AU - Singh, Satwinder K.
AU - Britten, Cedrik M.
AU - Welters, Marij J.P.
AU - van der Burg, Sjoerd H.
AU - West, Mike
AU - Chan, Cliburn
PY - 2013/7/1
Y1 - 2013/7/1

AB - Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU-acceleration to perform the numerically-demanding computations. 
We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.

UR - http://www.scopus.com/inward/record.url?scp=84880849822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880849822&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1003130
DO - 10.1371/journal.pcbi.1003130
M3 - Article
C2 - 23874174
AN - SCOPUS:84880849822
VL - 9
JO - PLoS Computational Biology
JF - PLoS Computational Biology
SN - 1553-734X
IS - 7
M1 - e1003130
ER -