Over the past few decades, genome-wide association studies analyzed by efficient statistical procedures have successfully identified single-nucleotide polymorphisms (SNPs) that are associated with complex traits or human diseases. However, due to the overwhelming number of SNPs, most approaches have focused on additive genetic model without genome-wide SNP-SNP interactions. In this study, we propose an efficient statistical procedure in a genetic model-free framework for detecting SNPs exhibiting main genetic effects as well as epistatic interactions. Specifically, the association between phenotype and genotype is characterized by an unknown function to be estimated using nonparametric techniques, and a two-stage non-parametric independence screening procedure is proposed to sequentially identify potentially important main genetic effects and interactions. Finally, the subset of genetic predictors implied by two-stage non-parametric independence screening is analyzed by penalized regressions such as LASSO, and a final model is identified. In this framework, specific genetic model is not assumed and interactions are not only amongmarginally important SNPs.Therefore, SNPs that are involved in genetic regulatory networks but missed by previous studies are expected to be recognized. In simulation studies, we show that the procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as SNP^SNP interactions. A real data analysis further indicates the importance of epistatic interactions in explaining body mass index.
All Science Journal Classification (ASJC) codes
- Information Systems
- Molecular Biology