Diverse convergent evidence in the genetic analysis of complex disease

Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors

Timothy H. Ciesielski, Sarah A. Pendergrass, Marquitta J. White, Nuri Kodaman, Rafal S. Sobota, Minjun Huang, Jacquelaine Bartlett, Jing Li, Qinxin Pan, Jiang Gui, Scott Brian Selleck, Christopher I. Amos, Marylyn Deriggi Ritchie, Jason H. Moore, Scott M. Williams

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

In omic research, such as genome-wide association studies, researchers seek to replicate their results in other datasets to reduce false-positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false-positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: how can we increase the amount of knowledge gained from high-throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false-negative and false-positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence (DiCE) approach we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that remains useful when etiologic heterogeneity, variable risk-factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof-of-principle examples in which DiCE identified strong evidence for associations of established biological importance that standard validation methods alone did not support. If used as an adjunct to standard validation methods, this approach can leverage multiple distinct data types to improve genetic risk factor discovery and validation, promote effective science communication, and guide future research directions.
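To make the idea of convergent evidence concrete, the Python sketch below tallies hypothetical lines of evidence by source domain and counts how many distinct domains support an association. It is a toy illustration under stated assumptions, not the DiCE scoring scheme described in the article: the domain labels, the `Evidence` class, and the `convergence_summary` function are hypothetical names introduced here, and the abstract does not specify how DiCE weights or combines individual pieces of evidence.

```python
from dataclasses import dataclass

# Hypothetical sketch; NOT the published DiCE scoring method.
# The three domains mirror the source types named in the abstract.
DOMAINS = ("omic", "informatic", "experimental")


@dataclass(frozen=True)
class Evidence:
    domain: str        # one of DOMAINS
    description: str   # free-text note (illustrative only)
    supports: bool     # True if this line of evidence corroborates the association


def convergence_summary(evidence: list[Evidence]) -> dict[str, int]:
    """Count supporting lines per domain and how many distinct domains agree."""
    counts = {d: 0 for d in DOMAINS}
    for e in evidence:
        if e.domain not in counts:
            raise ValueError(f"unknown domain: {e.domain}")
        if e.supports:
            counts[e.domain] += 1
    # A domain "agrees" if it contributes at least one supporting line.
    counts["distinct_domains"] = sum(1 for d in DOMAINS if counts[d] > 0)
    return counts


# Made-up example: the omic signal does not replicate in a second cohort,
# but supporting evidence is present in all three domains.
items = [
    Evidence("omic", "association in discovery GWAS", True),
    Evidence("omic", "no replication in second cohort", False),
    Evidence("informatic", "gene lies in a relevant pathway", True),
    Evidence("experimental", "knockdown alters phenotype in a model system", True),
]
print(convergence_summary(items))
# {'omic': 1, 'informatic': 1, 'experimental': 1, 'distinct_domains': 3}
```

In this invented case a replication-only rule would set the association aside, while the tally shows support converging from three distinct data types, which is the kind of situation the abstract describes. A real evidence metric would also need to weight evidence quality and account for contradicting results, which this sketch deliberately ignores.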

Original language: English (US)
Article number: 10
Journal: BioData Mining
Volume: 7
Issue number: 1
DOI: 10.1186/1756-0381-7-10
State: Published - Jun 30 2014

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
  • Genetics
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Ciesielski, Timothy H. ; Pendergrass, Sarah A. ; White, Marquitta J. ; Kodaman, Nuri ; Sobota, Rafal S. ; Huang, Minjun ; Bartlett, Jacquelaine ; Li, Jing ; Pan, Qinxin ; Gui, Jiang ; Selleck, Scott Brian ; Amos, Christopher I. ; Ritchie, Marylyn Deriggi ; Moore, Jason H. ; Williams, Scott M. / Diverse convergent evidence in the genetic analysis of complex disease : Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors. In: BioData Mining. 2014 ; Vol. 7, No. 1.
@article{6dc2f7a064614ac4b01216ab189b3b63,
title = "Diverse convergent evidence in the genetic analysis of complex disease: Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors",
author = "Ciesielski, {Timothy H.} and Pendergrass, {Sarah A.} and White, {Marquitta J.} and Nuri Kodaman and Sobota, {Rafal S.} and Minjun Huang and Jacquelaine Bartlett and Jing Li and Qinxin Pan and Jiang Gui and Selleck, {Scott Brian} and Amos, {Christopher I.} and Ritchie, {Marylyn Deriggi} and Moore, {Jason H.} and Williams, {Scott M.}",
year = "2014",
month = "6",
day = "30",
doi = "10.1186/1756-0381-7-10",
language = "English (US)",
volume = "7",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central",
number = "1",

}

Ciesielski, TH, Pendergrass, SA, White, MJ, Kodaman, N, Sobota, RS, Huang, M, Bartlett, J, Li, J, Pan, Q, Gui, J, Selleck, SB, Amos, CI, Ritchie, MD, Moore, JH & Williams, SM 2014, 'Diverse convergent evidence in the genetic analysis of complex disease: Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors', BioData Mining, vol. 7, no. 1, 10. https://doi.org/10.1186/1756-0381-7-10

Diverse convergent evidence in the genetic analysis of complex disease : Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors. / Ciesielski, Timothy H.; Pendergrass, Sarah A.; White, Marquitta J.; Kodaman, Nuri; Sobota, Rafal S.; Huang, Minjun; Bartlett, Jacquelaine; Li, Jing; Pan, Qinxin; Gui, Jiang; Selleck, Scott Brian; Amos, Christopher I.; Ritchie, Marylyn Deriggi; Moore, Jason H.; Williams, Scott M.

In: BioData Mining, Vol. 7, No. 1, 10, 30.06.2014.

TY - JOUR

T1 - Diverse convergent evidence in the genetic analysis of complex disease

T2 - Coordinating omic, informatic, and experimental evidence to better identify and validate risk factors

AU - Ciesielski, Timothy H.

AU - Pendergrass, Sarah A.

AU - White, Marquitta J.

AU - Kodaman, Nuri

AU - Sobota, Rafal S.

AU - Huang, Minjun

AU - Bartlett, Jacquelaine

AU - Li, Jing

AU - Pan, Qinxin

AU - Gui, Jiang

AU - Selleck, Scott Brian

AU - Amos, Christopher I.

AU - Ritchie, Marylyn Deriggi

AU - Moore, Jason H.

AU - Williams, Scott M.

PY - 2014/6/30

Y1 - 2014/6/30

UR - http://www.scopus.com/inward/record.url?scp=84903357135&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903357135&partnerID=8YFLogxK

U2 - 10.1186/1756-0381-7-10

DO - 10.1186/1756-0381-7-10

M3 - Article

VL - 7

JO - BioData Mining

JF - BioData Mining

SN - 1756-0381

IS - 1

M1 - 10

ER -