Comparative analysis of methods for detecting interacting loci

Li Chen, Guoqiang Yu, Carl D. Langefeld, David J. Miller, Richard T. Guy, Jayaram Raghuram, Xiguo Yuan, David M. Herrington, Yue Wang

    Research output: Contribution to journal (Article)

    25 Citations (Scopus)

    Abstract

    Background: Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, very few studies have focused strictly on comparing existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of the performance and limitations of available interaction detection methods is warranted.

    Results: We report a comparison of eight representative methods. Seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), and the eighth is a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR), were compared on a large number of simulated data sets, each of which, consistent with complex disease models, embeds multiple sets of interacting SNPs under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to compare them fairly. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs.

    Conclusion: This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.
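
To make the information gain (IG) criterion named in the Results more concrete, here is a minimal Python sketch of one common formulation of pairwise interaction information, IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), where A and B are SNP genotypes and C is case/control status. It is an illustration under stated assumptions, not the implementation evaluated in the study: the function names, the genotype coding, and the toy XOR data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical frequencies."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(snp_a, snp_b, status):
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C).

    A positive value means the pair carries information about disease
    status beyond what the two SNPs carry individually.
    """
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, status)
            - mutual_information(snp_a, status)
            - mutual_information(snp_b, status))


# Hypothetical toy data: binary genotype codes and a pure XOR disease model,
# in which neither SNP has a marginal (main) effect on status.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
status = [a ^ b for a, b in zip(snp_a, snp_b)]

print(interaction_gain(snp_a, snp_b, status))  # 1.0 bit for this toy XOR model
```

In the toy data each SNP alone is uninformative about status, so a main-effect test would see nothing, while the pair jointly determines status. Real genotype data would typically use three genotype classes per SNP and would need a significance assessment on top of the raw score.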

    Original language: English (US)
    Article number: 344
    Journal: BMC Genomics
    Volume: 12
    DOI: https://doi.org/10.1186/1471-2164-12-344
    State: Published - Jul 5 2011

    Fingerprint

    Single Nucleotide Polymorphism
    Entropy
    Multifactor Dimensionality Reduction
    Logistic Models
    Gene-Environment Interaction
    Genetic Loci
    Penetrance
    Information Storage and Retrieval
    Linkage Disequilibrium
    Gene Frequency
    Genes

    All Science Journal Classification (ASJC) codes

    • Biotechnology
    • Genetics

    Cite this

    Chen, L., Yu, G., Langefeld, C. D., Miller, D. J., Guy, R. T., Raghuram, J., ... Wang, Y. (2011). Comparative analysis of methods for detecting interacting loci. BMC genomics, 12, [344]. https://doi.org/10.1186/1471-2164-12-344
    @article{8b40bf7e9a504e37834791154a72d0a1,
    title = "Comparative analysis of methods for detecting interacting loci",
    author = "Li Chen and Guoqiang Yu and Langefeld, {Carl D.} and Miller, {David J.} and Guy, {Richard T.} and Jayaram Raghuram and Xiguo Yuan and Herrington, {David M.} and Yue Wang",
    year = "2011",
    month = "7",
    day = "5",
    doi = "10.1186/1471-2164-12-344",
    language = "English (US)",
    volume = "12",
    journal = "BMC Genomics",
    issn = "1471-2164",
    publisher = "BioMed Central",

    }

    TY - JOUR

    T1 - Comparative analysis of methods for detecting interacting loci

    AU - Chen, Li

    AU - Yu, Guoqiang

    AU - Langefeld, Carl D.

    AU - Miller, David J.

    AU - Guy, Richard T.

    AU - Raghuram, Jayaram

    AU - Yuan, Xiguo

    AU - Herrington, David M.

    AU - Wang, Yue

    PY - 2011/7/5

    Y1 - 2011/7/5

    UR - http://www.scopus.com/inward/record.url?scp=79959838298&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=79959838298&partnerID=8YFLogxK

    U2 - 10.1186/1471-2164-12-344

    DO - 10.1186/1471-2164-12-344

    M3 - Article

    C2 - 21729295

    AN - SCOPUS:79959838298

    VL - 12

    JO - BMC Genomics

    JF - BMC Genomics

    SN - 1471-2164

    M1 - 344

    ER -