Maximum Rank Reproducibility: A Nonparametric Approach to Assessing Reproducibility in Replicate Experiments

Daisy Lahaina Philtron, Yafei Lyu, Qunhua Li, Debashis Ghosh

Research output: Contribution to journal, Article

Abstract

The identification of reproducible signals from the results of replicate high-throughput experiments is an important part of modern biological research. Often little is known about the dependence structure and the marginal distribution of the data, motivating the development of a nonparametric approach to assess reproducibility. The procedure, which we call the maximum rank reproducibility (MaRR) procedure, uses a maximum rank statistic to parse reproducible signals from noise without making assumptions about the distribution of reproducible signals. Because it uses the rank scale, this procedure can be easily applied to a variety of data types. One application is to assess the reproducibility of RNA-seq technology using data produced by the sequencing quality control (SEQC) consortium, which coordinated a multi-laboratory effort to assess reproducibility across three RNA-seq platforms. Our results on simulations and SEQC data show that the MaRR procedure effectively controls false discovery rates, has desirable power properties, and compares well to existing methods. Supplementary materials for this article are available online.
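As a rough illustration of the idea of a maximum rank statistic, the sketch below ranks each signal within two replicates and keeps the larger (worse) of its two ranks, so that a signal scores well only if it ranks highly in both replicates. This is a minimal sketch under stated assumptions, not the authors' MaRR procedure: the function names `ranks_descending` and `max_rank_statistic`, the toy data, the choice that rank 1 means the strongest signal, and the absence of ties are all assumptions made for illustration, and the step that separates reproducible signals from noise while controlling the false discovery rate is not shown.

```python
def ranks_descending(values):
    """Return ranks where rank 1 is the largest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def max_rank_statistic(rep1, rep2):
    """For each signal, take the larger of its ranks in the two replicates."""
    if len(rep1) != len(rep2):
        raise ValueError("replicates must measure the same set of signals")
    r1 = ranks_descending(rep1)
    r2 = ranks_descending(rep2)
    return [max(a, b) for a, b in zip(r1, r2)]


# Hypothetical measurements for five signals in two replicate experiments.
rep1 = [9.1, 7.4, 0.3, 2.2, 5.0]
rep2 = [8.7, 6.9, 3.1, 0.2, 1.5]

max_ranks = max_rank_statistic(rep1, rep2)
print(max_ranks)
```

Because only ranks enter the calculation, the same sketch applies to any data type that can be ordered within a replicate, which is the property the abstract attributes to working on the rank scale.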

Original language: English (US)
Pages (from-to): 1028-1039
Number of pages: 12
Journal: Journal of the American Statistical Association
Volume: 113
Issue number: 523
DOIs: 10.1080/01621459.2017.1397521
State: Published - Jul 3 2018

Fingerprint

Reproducibility
Experiment
Quality Control
Sequencing
Rank Statistics
Dependence Structure
Marginal Distribution
High Throughput
Simulation

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{1f2aded060cb40ca8de5bcc6f658472d,
title = "Maximum Rank Reproducibility: A Nonparametric Approach to Assessing Reproducibility in Replicate Experiments",
author = "Philtron, {Daisy Lahaina} and Yafei Lyu and Qunhua Li and Debashis Ghosh",
year = "2018",
month = "7",
day = "3",
doi = "10.1080/01621459.2017.1397521",
language = "English (US)",
volume = "113",
pages = "1028--1039",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "523",

}

Maximum Rank Reproducibility: A Nonparametric Approach to Assessing Reproducibility in Replicate Experiments. / Philtron, Daisy Lahaina; Lyu, Yafei; Li, Qunhua; Ghosh, Debashis.

In: Journal of the American Statistical Association, Vol. 113, No. 523, 03.07.2018, p. 1028-1039.

TY - JOUR

T1 - Maximum Rank Reproducibility

T2 - A Nonparametric Approach to Assessing Reproducibility in Replicate Experiments

AU - Philtron, Daisy Lahaina

AU - Lyu, Yafei

AU - Li, Qunhua

AU - Ghosh, Debashis

PY - 2018/7/3

Y1 - 2018/7/3

UR - http://www.scopus.com/inward/record.url?scp=85054642697&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054642697&partnerID=8YFLogxK

U2 - 10.1080/01621459.2017.1397521

DO - 10.1080/01621459.2017.1397521

M3 - Article

C2 - 31249430

AN - SCOPUS:85054642697

VL - 113

SP - 1028

EP - 1039

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 523

ER -