The effects of linkage disequilibrium in large scale SNP datasets for MDR

Benjamin J. Grady, Eric S. Torstenson, Marylyn Deriggi Ritchie

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Background: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When assessing the sensitivity of such methods, linkage disequilibrium (LD) has so far been a confounding factor. In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.

Results: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios in which the position of the functional SNP(s) within LD blocks varied. The simulated data were analyzed with MDR to determine the sensitivity of the method in different contexts, where sensitivity was gauged as the number of times out of 100 that the method identified the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops, but its sensitivity to detect the disease signal and find an indirect association increases.

Conclusions: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; they do not, however, affect the ability to detect an indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block, though the locus identified is not always the functional SNP. As such, the results of MDR analysis in datasets with LD should be examined carefully in light of the underlying LD structure of the dataset.
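The sensitivity measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or part of the MDR package. The function name `sensitivity_counts`, the SNP and block labels, and the rule used for an "indirect" hit (every locus in the best model lies in the same LD block as a functional SNP) are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

Model = FrozenSet[str]  # a model is a set of SNP identifiers


def sensitivity_counts(
    best_models: List[Model],
    functional: Model,
    ld_block: Dict[str, str],
) -> Tuple[int, int]:
    """Count direct and indirect hits over a list of per-dataset best models.

    direct:   the best model is exactly the functional SNP set.
    indirect: the best model's loci fall in the same LD blocks as the
              functional SNPs (assumed definition; direct hits count too).
    """
    target_blocks = sorted(ld_block[snp] for snp in functional)
    direct = 0
    indirect = 0
    for model in best_models:
        if model == functional:
            direct += 1
        if sorted(ld_block[snp] for snp in model) == target_blocks:
            indirect += 1
    return direct, indirect


# Hypothetical one-locus scenario: snp5 is functional and sits in block B1.
ld_block = {"snp4": "B1", "snp5": "B1", "snp6": "B1", "snp9": "B2"}
functional = frozenset({"snp5"})
best_models = [
    frozenset({"snp5"}),  # direct hit
    frozenset({"snp4"}),  # same block, not the functional SNP
    frozenset({"snp9"}),  # wrong block
    frozenset({"snp6"}),  # same block, not the functional SNP
]
direct, indirect = sensitivity_counts(best_models, functional, ld_block)
```

With 100 simulated datasets per scenario, as in the study, the `direct` count would correspond to the reported sensitivity for the functional SNP, and the `indirect` count to detection of the signal somewhere in the correct LD block.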

Original language: English (US)
Article number: 11
Journal: BioData Mining
Volume: 4
Issue number: 1
DOI: 10.1186/1756-0381-4-11
State: Published - May 9, 2011

Fingerprint

Multifactor Dimensionality Reduction
Linkage Disequilibrium
Dimensionality Reduction
Single Nucleotide Polymorphism
Locus
Association reactions
Analytical Methods
Datasets
Software packages
Confounding
Predictive Model
Software Package
Genomics
Software

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
  • Genetics
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Grady, Benjamin J. ; Torstenson, Eric S. ; Ritchie, Marylyn Deriggi. / The effects of linkage disequilibrium in large scale SNP datasets for MDR. In: BioData Mining. 2011 ; Vol. 4, No. 1.
@article{a4165dff032148d494e048ec12f86dab,
title = "The effects of linkage disequilibrium in large scale SNP datasets for MDR",
author = "Grady, {Benjamin J.} and Torstenson, {Eric S.} and Ritchie, {Marylyn Deriggi}",
year = "2011",
month = "5",
day = "9",
doi = "10.1186/1756-0381-4-11",
language = "English (US)",
volume = "4",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - The effects of linkage disequilibrium in large scale SNP datasets for MDR

AU - Grady, Benjamin J.

AU - Torstenson, Eric S.

AU - Ritchie, Marylyn Deriggi

PY - 2011/5/9

Y1 - 2011/5/9

UR - http://www.scopus.com/inward/record.url?scp=79955564727&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955564727&partnerID=8YFLogxK

U2 - 10.1186/1756-0381-4-11

DO - 10.1186/1756-0381-4-11

M3 - Article

C2 - 21545716

AN - SCOPUS:79955564727

VL - 4

JO - BioData Mining

JF - BioData Mining

SN - 1756-0381

IS - 1

M1 - 11

ER -