COE: A general approach for efficient genome-wide two-locus epistasis test in disease association study

Xiang Zhang, Feng Pan, Yuying Xie, Fei Zou, Wei Wang

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

The availability of high-density single nucleotide polymorphisms (SNPs) data has made genome-wide association study computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this article, we propose a general approach, COE, for efficient large scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in two-locus association test, we can develop an upper bound of the test value. Such an upper bound only depends on single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation test. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.
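The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the bound derived in the paper: it assumes a simplified 2x2 table (one genotype combination present/absent versus case/control) with fixed margins, where the Pearson chi-square statistic is a convex (quadratic) function of a single cell count and is therefore maximized at an endpoint of the feasible range. The function names and the `(a, r1, c1, n)` tuple layout are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (a, r1, c1, n) where
#   a  = observed count in the top-left cell,
#   r1 = first row total, c1 = first column total, n = grand total.
Table = Tuple[int, int, int, int]


def chi2_2x2(a: int, r1: int, c1: int, n: int) -> float:
    """Pearson chi-square of a 2x2 table with fixed margins.

    With margins fixed, a*d - b*c simplifies to a*n - r1*c1, so the
    statistic is a convex quadratic in the single free cell count a.
    Assumes 0 < r1 < n and 0 < c1 < n.
    """
    r2, c2 = n - r1, n - c1
    return n * (a * n - r1 * c1) ** 2 / (r1 * r2 * c1 * c2)


def chi2_upper_bound(r1: int, c1: int, n: int) -> float:
    """Upper bound that needs only the margins, not the cell count.

    A convex function on an interval attains its maximum at an endpoint,
    so it suffices to evaluate the two extreme feasible values of a.
    """
    lo = max(0, r1 + c1 - n)
    hi = min(r1, c1)
    return max(chi2_2x2(lo, r1, c1, n), chi2_2x2(hi, r1, c1, n))


def max_chi2_pruned(tables: Iterable[Table]) -> Tuple[float, int]:
    """Return (largest statistic, number of exact evaluations performed)."""
    ranked = sorted(
        tables,
        key=lambda t: chi2_upper_bound(t[1], t[2], t[3]),
        reverse=True,
    )
    best, evaluated = 0.0, 0
    for a, r1, c1, n in ranked:
        # Every remaining table has a bound no larger than this one,
        # so none of them can beat the current best: stop early.
        if chi2_upper_bound(r1, c1, n) <= best:
            break
        best = max(best, chi2_2x2(a, r1, c1, n))
        evaluated += 1
    return best, evaluated


if __name__ == "__main__":
    tables = [(30, 40, 50, 100), (5, 10, 50, 100),
              (20, 40, 50, 100), (2, 4, 50, 100)]
    print(max_chi2_pruned(tables))
```

Because candidates are visited in decreasing order of their bound, the search can stop at the first candidate whose bound does not exceed the best exact value found so far, and the result matches an exhaustive scan.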

Original language: English (US)
Pages (from-to): 401-415
Number of pages: 15
Journal: Journal of Computational Biology
Volume: 17
Issue number: 3
DOI: 10.1089/cmb.2009.0155
State: Published - Mar 1 2010

Fingerprint

Epistasis
Single Nucleotide Polymorphism
Locus
Nucleotides
Genome
Polymorphism
Genes
Gene
Upper bound
Genotype
Indexing
Statistics
Permutation Test
Genome-Wide Association Study
Interaction
Convex function
Optimality
Availability

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

@article{8faa9a14241c43489d6046f6a8d3cc75,
title = "COE: A general approach for efficient genome-wide two-locus epistasis test in disease association study",
abstract = "The availability of high-density single nucleotide polymorphisms (SNPs) data has made genome-wide association study computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this article, we propose a general approach, COE, for efficient large scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in two-locus association test, we can develop an upper bound of the test value. Such an upper bound only depends on single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation test. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.",
author = "Xiang Zhang and Feng Pan and Yuying Xie and Fei Zou and Wei Wang",
year = "2010",
month = "3",
day = "1",
doi = "10.1089/cmb.2009.0155",
language = "English (US)",
volume = "17",
pages = "401--415",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "3",
}

COE: A general approach for efficient genome-wide two-locus epistasis test in disease association study. / Zhang, Xiang; Pan, Feng; Xie, Yuying; Zou, Fei; Wang, Wei.

In: Journal of Computational Biology, Vol. 17, No. 3, 01.03.2010, p. 401-415.


TY  - JOUR
T1  - COE
T2  - A general approach for efficient genome-wide two-locus epistasis test in disease association study
AU  - Zhang, Xiang
AU  - Pan, Feng
AU  - Xie, Yuying
AU  - Zou, Fei
AU  - Wang, Wei
PY  - 2010/3/1
Y1  - 2010/3/1
N2  - The availability of high-density single nucleotide polymorphisms (SNPs) data has made genome-wide association study computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this article, we propose a general approach, COE, for efficient large scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in two-locus association test, we can develop an upper bound of the test value. Such an upper bound only depends on single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation test. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.
AB  - The availability of high-density single nucleotide polymorphisms (SNPs) data has made genome-wide association study computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this article, we propose a general approach, COE, for efficient large scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in two-locus association test, we can develop an upper bound of the test value. Such an upper bound only depends on single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation test. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.
UR  - http://www.scopus.com/inward/record.url?scp=77950798630&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77950798630&partnerID=8YFLogxK
U2  - 10.1089/cmb.2009.0155
DO  - 10.1089/cmb.2009.0155
M3  - Article
C2  - 20377453
AN  - SCOPUS:77950798630
VL  - 17
SP  - 401
EP  - 415
JO  - Journal of Computational Biology
JF  - Journal of Computational Biology
SN  - 1066-5277
IS  - 3
ER  - 