A regression framework for assessing covariate effects on the reproducibility of high-throughput experiments

Qunhua Li, Feipeng Zhang

Research output: Contribution to journal › Article

Abstract

The outcome of high-throughput biological experiments is affected by many operational factors in the experimental and data-analytical procedures. Understanding how these factors affect the reproducibility of the outcome is critical for establishing workflows that produce replicable discoveries. In this article, we propose a regression framework, based on a novel cumulative link model, to assess the covariate effects of operational factors on the reproducibility of findings from high-throughput experiments. In contrast to existing graphical approaches, our method allows one to succinctly characterize the simultaneous and independent effects of covariates on reproducibility and to compare reproducibility while controlling for potential confounding variables. We also establish a connection between our model and certain Archimedean copula models. This connection not only gives our regression framework an interpretation in terms of copula models, but also provides guidance on choosing the functional forms of the regression. Furthermore, it opens a new way to interpret and utilize these copulas in the context of reproducibility. Using simulations, we show that our method produces calibrated type I error and is more powerful in detecting differences in reproducibility than existing measures of agreement. We illustrate the usefulness of our method using a ChIP-seq study and a microarray study.

Original language: English (US)
Pages (from-to): 803-813
Number of pages: 11
Journal: Biometrics
Volume: 74
Issue number: 3
DOI: 10.1111/biom.12832
State: Published - Sep 2018

Fingerprint

Reproducibility
High Throughput
Covariates
Regression
Throughput
Experiment
Copula Models
Experiments
Confounding Factors (Epidemiology)
Workflow
Microarrays
Reproducibility of Results
Archimedean Copula
Confounding
Type I error
Copula
Microarray
Work Flow
Guidance

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics

Cite this

@article{c39cc79948aa492e838a4dd6becd1032,
title = "A regression framework for assessing covariate effects on the reproducibility of high-throughput experiments",
abstract = "The outcome of high-throughput biological experiments is affected by many operational factors in the experimental and data-analytical procedures. Understanding how these factors affect the reproducibility of the outcome is critical for establishing workflows that produce replicable discoveries. In this article, we propose a regression framework, based on a novel cumulative link model, to assess the covariate effects of operational factors on the reproducibility of findings from high-throughput experiments. In contrast to existing graphical approaches, our method allows one to succinctly characterize the simultaneous and independent effects of covariates on reproducibility and to compare reproducibility while controlling for potential confounding variables. We also establish a connection between our model and certain Archimedean copula models. This connection not only offers our regression framework an interpretation in copula models, but also provides guidance on choosing the functional forms of the regression. Furthermore, it also opens a new way to interpret and utilize these copulas in the context of reproducibility. Using simulations, we show that our method produces calibrated type I error and is more powerful in detecting difference in reproducibility than existing measures of agreement. We illustrate the usefulness of our method using a ChIP-seq study and a microarray study.",
author = "Qunhua Li and Feipeng Zhang",
year = "2018",
month = "9",
doi = "10.1111/biom.12832",
language = "English (US)",
volume = "74",
pages = "803--813",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "3",

}

A regression framework for assessing covariate effects on the reproducibility of high-throughput experiments. / Li, Qunhua; Zhang, Feipeng.

In: Biometrics, Vol. 74, No. 3, 09.2018, p. 803-813.

TY - JOUR

T1 - A regression framework for assessing covariate effects on the reproducibility of high-throughput experiments

AU - Li, Qunhua

AU - Zhang, Feipeng

PY - 2018/9

Y1 - 2018/9

N2 - The outcome of high-throughput biological experiments is affected by many operational factors in the experimental and data-analytical procedures. Understanding how these factors affect the reproducibility of the outcome is critical for establishing workflows that produce replicable discoveries. In this article, we propose a regression framework, based on a novel cumulative link model, to assess the covariate effects of operational factors on the reproducibility of findings from high-throughput experiments. In contrast to existing graphical approaches, our method allows one to succinctly characterize the simultaneous and independent effects of covariates on reproducibility and to compare reproducibility while controlling for potential confounding variables. We also establish a connection between our model and certain Archimedean copula models. This connection not only offers our regression framework an interpretation in copula models, but also provides guidance on choosing the functional forms of the regression. Furthermore, it also opens a new way to interpret and utilize these copulas in the context of reproducibility. Using simulations, we show that our method produces calibrated type I error and is more powerful in detecting difference in reproducibility than existing measures of agreement. We illustrate the usefulness of our method using a ChIP-seq study and a microarray study.

UR - http://www.scopus.com/inward/record.url?scp=85036518457&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85036518457&partnerID=8YFLogxK

U2 - 10.1111/biom.12832

DO - 10.1111/biom.12832

M3 - Article

C2 - 29192968

AN - SCOPUS:85036518457

VL - 74

SP - 803

EP - 813

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 3

ER -