A statistical approach to selecting and confirming validation targets in -omics experiments

Jeffrey T. Leek, Margaret A. Taub, Jason Laurence Rasgon

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

Background: Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets.

Results: Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result.

Conclusions: For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results.

Original language: English (US)
Article number: 150
Journal: BMC Bioinformatics
Volume: 13
Issue number: 1
DOI: 10.1186/1471-2105-13-150
State: Published - Jun 27 2012

Fingerprint

Genes
Technology
RNA Sequence Analysis
Target
Validation Studies
Statistical methods
Genomics
Throughput
Proteomics
Experiment
Experiments
Research Personnel
Genome
Microarrays
RNA
Costs and Cost Analysis
Personnel
Proteins
Statistical method
High Throughput

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{97b99d1041884e838e85ceb63fcfe470,
title = "A statistical approach to selecting and confirming validation targets in -omics experiments",
abstract = "Background: Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results: Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions: For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results.",
author = "Leek, {Jeffrey T.} and Taub, {Margaret A.} and Rasgon, {Jason Laurence}",
year = "2012",
month = "6",
day = "27",
doi = "10.1186/1471-2105-13-150",
language = "English (US)",
volume = "13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
pages = "150",
}

A statistical approach to selecting and confirming validation targets in -omics experiments. / Leek, Jeffrey T.; Taub, Margaret A.; Rasgon, Jason Laurence.

In: BMC Bioinformatics, Vol. 13, No. 1, 150, 27.06.2012.

TY  - JOUR
T1  - A statistical approach to selecting and confirming validation targets in -omics experiments
AU  - Leek, Jeffrey T.
AU  - Taub, Margaret A.
AU  - Rasgon, Jason Laurence
PY  - 2012/6/27
Y1  - 2012/6/27
AB  - Background: Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results: Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions: For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results.
UR  - http://www.scopus.com/inward/record.url?scp=84862736361&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84862736361&partnerID=8YFLogxK
U2  - 10.1186/1471-2105-13-150
DO  - 10.1186/1471-2105-13-150
M3  - Article
C2  - 22738145
AN  - SCOPUS:84862736361
VL  - 13
JO  - BMC Bioinformatics
JF  - BMC Bioinformatics
SN  - 1471-2105
IS  - 1
M1  - 150
ER  -