TY - JOUR
T1 - Comparison of approaches for machine-learning optimization of neural networks for detecting gene-gene interactions in genetic epidemiology
AU - Motsinger-Reif, Alison A.
AU - Dudek, Scott M.
AU - Hahn, Lance W.
AU - Ritchie, Marylyn D.
PY - 2008/5
Y1 - 2008/5
AB - The detection of genotypes that predict common, complex disease is a challenge for human geneticists. The phenomenon of epistasis, or gene-gene interactions, is particularly problematic for traditional statistical techniques. Additionally, the explosion of genetic information makes exhaustive searches of multilocus combinations computationally infeasible. To address these challenges, neural networks (NN), a pattern recognition method, have been used. One limitation of the NN approach is that its success depends on the architecture of the network. To solve this, machine-learning approaches have been suggested to evolve the best NN architecture for a particular data set. In this study we provide a detailed technical description of the use of grammatical evolution to optimize neural networks (GENN) for use in genetic association studies. We compare the performance of GENN to that of a previous machine-learning NN application, genetic programming neural networks (GPNN), in both simulated and real data. We show that GENN greatly outperforms GPNN in data sets with a large number of single nucleotide polymorphisms (SNPs). Additionally, we demonstrate that GENN has high power to detect disease-risk loci in a range of high-order epistatic models. Finally, we demonstrate the scalability of the GENN method with increasing numbers of variables, up to as many as 500,000 SNPs.
UR - http://www.scopus.com/inward/record.url?scp=43249108568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43249108568&partnerID=8YFLogxK
U2 - 10.1002/gepi.20307
DO - 10.1002/gepi.20307
M3 - Article
C2 - 18265411
AN - SCOPUS:43249108568
SN - 0741-0395
VL - 32
SP - 325
EP - 340
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 4
ER -