COE: A general approach for efficient genome-wide two-locus epistasis test in disease association study

Xiang Zhang, Feng Pan, Yuying Xie, Fei Zou, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)

Abstract

The availability of high-density single nucleotide polymorphism (SNP) data has made genome-wide association studies computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this paper, we propose a general approach, COE, for efficient large-scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in a two-locus association test, we can develop an upper bound of the test value. Such an upper bound depends only on the single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation tests. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.
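
The pruning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes binary SNPs and a binary phenotype, uses the Pearson chi-square test as the convex statistic, and omits the genotype index entirely. All function names (`exact_chi2`, `upper_bound_chi2`, `best_pair_brute`, `best_pair_pruned`) are hypothetical. The bound relies on one observation: once the pair genotype counts and the single-locus counts of the first SNP against the phenotype are fixed, each slice of the table has a single free cell, the statistic is a convex function of that cell, and so its maximum over the feasible range is attained at an endpoint.

```python
from collections import Counter
from itertools import combinations, product


def exact_chi2(xi, xj, y):
    """Pearson chi-square of the 4x2 table: (Xi, Xj) genotype pair vs. phenotype."""
    n = len(y)
    obs = Counter(zip(xi, xj, y))
    n_ab = Counter(zip(xi, xj))   # pair genotype counts
    n_c = Counter(y)              # phenotype counts
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        expected = n_ab[a, b] * n_c[c] / n
        if expected > 0:
            total += (obs[a, b, c] - expected) ** 2 / expected
    return total


def _slice_chi2(x, a, n_ab, n_ac, n_c, n):
    """Chi-square contribution of the slice Xi == a when the free cell O[a,1,1] is x."""
    cells = {
        (1, 1): x,
        (1, 0): n_ab[a, 1] - x,
        (0, 1): n_ac[a, 1] - x,
        (0, 0): n_ab[a, 0] - n_ac[a, 1] + x,
    }
    total = 0.0
    for (b, c), observed in cells.items():
        expected = n_ab[a, b] * n_c[c] / n
        if expected > 0:
            total += (observed - expected) ** 2 / expected
    return total


def upper_bound_chi2(xi, xj, y):
    """Upper bound on exact_chi2 that never looks at the joint (Xi, Xj, Y) counts."""
    n = len(y)
    n_ab = Counter(zip(xi, xj))   # genotype of the SNP pair
    n_ac = Counter(zip(xi, y))    # single-locus table of Xi vs. phenotype
    n_c = Counter(y)
    bound = 0.0
    for a in (0, 1):
        # Feasible range of the free cell given the fixed margins of the slice.
        lo = max(0, n_ac[a, 1] - n_ab[a, 0])
        hi = min(n_ab[a, 1], n_ac[a, 1])
        # Convex in x, so the maximum over [lo, hi] sits at an endpoint.
        bound += max(_slice_chi2(lo, a, n_ab, n_ac, n_c, n),
                     _slice_chi2(hi, a, n_ab, n_ac, n_c, n))
    return bound


def best_pair_brute(snps, y):
    """Evaluate every SNP pair exactly; returns (best value, (i, j))."""
    best = (-1.0, None)
    for i, j in combinations(range(len(snps)), 2):
        value = exact_chi2(snps[i], snps[j], y)
        if value > best[0]:
            best = (value, (i, j))
    return best


def best_pair_pruned(snps, y):
    """Same result as best_pair_brute, skipping pairs whose bound cannot win."""
    best = (-1.0, None)
    for i, j in combinations(range(len(snps)), 2):
        if upper_bound_chi2(snps[i], snps[j], y) < best[0]:
            continue  # pruned: the exact value cannot exceed the current best
        value = exact_chi2(snps[i], snps[j], y)
        if value > best[0]:
            best = (value, (i, j))
    return best
```

In this sketch the bound is recomputed for every pair, so it shows only why pruning preserves the optimal result, not where the speedup comes from. Per the abstract, the efficiency gain comes from grouping and indexing SNP-pairs by genotype so that pairs sharing a bound can be handled together, which is particularly useful when the phenotype is permuted many times.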

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 13th Annual International Conference, RECOMB 2009, Proceedings
Pages: 253-269
Number of pages: 17
DOIs: https://doi.org/10.1007/978-3-642-02008-7_19
State: Published - Jul 17 2009
Event: 13th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2009 - Tucson, AZ, United States
Duration: May 18 2009 to May 21 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5541 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2009
Country: United States
City: Tucson, AZ
Period: 5/18/09 to 5/21/09

Fingerprint

Epistasis
Single nucleotide Polymorphism
Nucleotides
Polymorphism
Locus
Genome
Genes
Upper bound
Genotype
Indexing
Gene
Permutation Test
Interaction
Convex function
Optimality
Availability
Statistics
Range of data
Demonstrate
Experiment

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Zhang, X., Pan, F., Xie, Y., Zou, F., & Wang, W. (2009). COE: A general approach for efficient genome-wide two-locus epistasis test in disease association study. In Research in Computational Molecular Biology - 13th Annual International Conference, RECOMB 2009, Proceedings (pp. 253-269). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5541 LNBI). https://doi.org/10.1007/978-3-642-02008-7_19
@inproceedings{2b1f0e0442b84e209b2761e215581ffe,
title = "COE: A general approach for efficient genome-wide two-locus epistasis test in disease association study",
abstract = "The availability of high-density single nucleotide polymorphism (SNP) data has made genome-wide association studies computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this paper, we propose a general approach, COE, for efficient large-scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in a two-locus association test, we can develop an upper bound of the test value. Such an upper bound depends only on the single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation tests. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.",
author = "Xiang Zhang and Feng Pan and Yuying Xie and Fei Zou and Wei Wang",
year = "2009",
month = "7",
day = "17",
doi = "10.1007/978-3-642-02008-7_19",
language = "English (US)",
isbn = "9783642020070",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "253--269",
booktitle = "Research in Computational Molecular Biology - 13th Annual International Conference, RECOMB 2009, Proceedings",

}

TY - GEN

T1 - COE

T2 - A general approach for efficient genome-wide two-locus epistasis test in disease association study

AU - Zhang, Xiang

AU - Pan, Feng

AU - Xie, Yuying

AU - Zou, Fei

AU - Wang, Wei

PY - 2009/7/17

Y1 - 2009/7/17

AB - The availability of high-density single nucleotide polymorphism (SNP) data has made genome-wide association studies computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for genetic analysis of complex diseases. In this paper, we propose a general approach, COE, for efficient large-scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in a two-locus association test, we can develop an upper bound of the test value. Such an upper bound depends only on the single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation tests. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute force approach.

UR - http://www.scopus.com/inward/record.url?scp=67650308781&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650308781&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-02008-7_19

DO - 10.1007/978-3-642-02008-7_19

M3 - Conference contribution

AN - SCOPUS:67650308781

SN - 9783642020070

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 253

EP - 269

BT - Research in Computational Molecular Biology - 13th Annual International Conference, RECOMB 2009, Proceedings

ER -
