FastANOVA: An efficient algorithm for genome-wide association study

Xiang Zhang, Fei Zou, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

30 Citations (Scopus)

Abstract

Studying the association between quantitative phenotypes (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand the underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. The ANOVA (analysis of variance) test is routinely used in association studies. Important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can reach the millions, so evaluating joint effects of SNPs is a challenging task even for SNP-pairs. Moreover, because large numbers of SNPs are correlated, a permutation procedure is preferred over simple Bonferroni correction for properly controlling the family-wise error rate and retaining mapping power, which dramatically increases the computational cost of an association study. In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in a batch mode, which also supports large permutation tests. We derive an upper bound on the SNP-pair ANOVA test statistic, which can be expressed as the sum of two terms. The first term is based on the single-SNP ANOVA test. The second term is based on the SNPs alone and is independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs, without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP pairs.
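
To make the baseline concrete, here is a minimal sketch of the brute-force procedure the abstract compares against: an ANOVA F-statistic for every SNP pair, plus a max-statistic permutation threshold for family-wise error control. It is not the FastANOVA algorithm and does not implement its upper bound, whose exact form the abstract does not give. The function names, the modelling of the pair test as a one-way ANOVA over joint genotype groups, and the toy data are all assumptions made for illustration.

```python
import itertools
import random


def f_statistic(groups, phenotype):
    """One-way ANOVA F-statistic of phenotype values split by group label."""
    n = len(phenotype)
    by_group = {}
    for label, y in zip(groups, phenotype):
        by_group.setdefault(label, []).append(y)
    g = len(by_group)
    if g < 2 or n <= g:
        return 0.0  # test undefined here; treated as "no evidence" (assumption)
    grand_mean = sum(phenotype) / n
    ss_between = 0.0
    ss_within = 0.0
    for ys in by_group.values():
        mean = sum(ys) / len(ys)
        ss_between += len(ys) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in ys)
    if ss_within == 0.0:
        return float("inf")
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def pair_scores(snps, phenotype):
    """Brute force: F-statistic for every SNP pair, grouped by joint genotype."""
    scores = {}
    for i, j in itertools.combinations(range(len(snps)), 2):
        joint = list(zip(snps[i], snps[j]))
        scores[(i, j)] = f_statistic(joint, phenotype)
    return scores


def permutation_threshold(snps, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum pair statistic
    over random phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(pair_scores(snps, shuffled).values()))
    maxima.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return maxima[index]


if __name__ == "__main__":
    # Hypothetical toy data: 3 binary SNPs, 8 samples, XOR-like effect of SNPs 0 and 1.
    snps = [
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
    ]
    phenotype = [0.1, -0.1, 10.2, 9.8, 10.1, 9.9, 0.2, -0.2]
    scores = pair_scores(snps, phenotype)
    threshold = permutation_threshold(snps, phenotype)
    for pair, score in sorted(scores.items()):
        print(pair, round(score, 2), "significant" if score > threshold else "")
```

Every permutation repeats the full scan over all pairs, so the cost grows with the number of pairs times the number of permutations. That is the expense the abstract says the shared upper bounds and pruning are meant to avoid.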

Original language: English (US)
Title of host publication: KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining
Pages: 821-829
Number of pages: 9
DOIs: https://doi.org/10.1145/1401890.1401988
State: Published - 2008
Event: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008 to Aug 27 2008

Other

Other: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country: United States
City: Las Vegas, NV
Period: 8/24/08 to 8/27/08

Fingerprint

Nucleotides
Polymorphism
Genes
Analysis of variance (ANOVA)

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Zhang, X., Zou, F., & Wang, W. (2008). FastANOVA: An efficient algorithm for genome-wide association study. In KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining (pp. 821-829). https://doi.org/10.1145/1401890.1401988
Zhang, Xiang; Zou, Fei; Wang, Wei. / FastANOVA: An efficient algorithm for genome-wide association study. KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining. 2008. pp. 821-829.
@inproceedings{bec0a322405b4f82b67d8afae2bc3827,
title = "FastANOVA: An efficient algorithm for genome-wide association study",
abstract = "Studying the association between quantitative phenotype (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. ANOVA (analysis of variance) test is routinely used in association study. Important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can be up to millions. Evaluating joint effects of SNPs is a challenging task even for SNP-pairs. Moreover, with large number of SNPs correlated, permutation procedure is preferred over simple Bonferroni correction for properly controlling family-wise error rate and retaining mapping power, which dramatically increases the computational cost of association study. In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in a batch mode, which also supports large permutation test. We derive an upper bound of SNP-pair ANOVA test, which can be expressed as the sum of two terms. The first term is based on single-SNP ANOVA test. The second term is based on the SNPs and independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP pairs.",
author = "Xiang Zhang and Fei Zou and Wei Wang",
year = "2008",
doi = "10.1145/1401890.1401988",
language = "English (US)",
isbn = "9781605581934",
pages = "821--829",
booktitle = "KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining",

}

Zhang, X, Zou, F & Wang, W 2008, FastANOVA: An efficient algorithm for genome-wide association study. in KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining. pp. 821-829, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, Las Vegas, NV, United States, 8/24/08. https://doi.org/10.1145/1401890.1401988

TY - GEN

T1 - FastANOVA

T2 - An efficient algorithm for genome-wide association study

AU - Zhang, Xiang

AU - Zou, Fei

AU - Wang, Wei

PY - 2008

Y1 - 2008

AB - Studying the association between quantitative phenotype (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. ANOVA (analysis of variance) test is routinely used in association study. Important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can be up to millions. Evaluating joint effects of SNPs is a challenging task even for SNP-pairs. Moreover, with large number of SNPs correlated, permutation procedure is preferred over simple Bonferroni correction for properly controlling family-wise error rate and retaining mapping power, which dramatically increases the computational cost of association study. In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in a batch mode, which also supports large permutation test. We derive an upper bound of SNP-pair ANOVA test, which can be expressed as the sum of two terms. The first term is based on single-SNP ANOVA test. The second term is based on the SNPs and independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP pairs.

UR - http://www.scopus.com/inward/record.url?scp=65449155592&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65449155592&partnerID=8YFLogxK

U2 - 10.1145/1401890.1401988

DO - 10.1145/1401890.1401988

M3 - Conference contribution

AN - SCOPUS:65449155592

SN - 9781605581934

SP - 821

EP - 829

BT - KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining

ER -

Zhang X, Zou F, Wang W. FastANOVA: An efficient algorithm for genome-wide association study. In KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining. 2008. p. 821-829 https://doi.org/10.1145/1401890.1401988