Next-generation analysis of cataracts: Determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the PhenX Toolkit

Sarah A. Pendergrass, Shefali S. Verma, Molly Hall, Emily R. Holzinger, Carrie B. Moore, John R. Wallace, Scott M. Dudek, Wayne Huggins, Terrie Kitchner, Carol Waudby, Richard Berg, Catherine A. McCarty, Marylyn Deriggi Ritchie

Research output: Contribution to journal › Conference article

16 Citations (Scopus)

Abstract

Investigating the association between biobank-derived genomic data and the information in linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits; in this setting, cases and controls are defined by electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and its linked EHR; PMRP is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data, using 527,953 and 527,936 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1% for the gene-gene and gene-environment analyses, respectively, in order to look for higher-level associations with cataract risk beyond single SNP-phenotype associations. To build our SNP-SNP interaction models, we used a prior-knowledge-driven filtering method called Biofilter to reduce the multiple testing burden of exploring the vast number of interaction models possible from so many SNPs. Using Biofilter, we developed 57,376 prior-knowledge-directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models with an interaction p-value < 1×10⁻⁴ and an overall model p-value < 0.01 for association with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use, all of which have previously been associated with cataract formation.
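The SNP-SNP filtering step above rests on a simple scale argument: pairing all 527,953 SNPs exhaustively would mean about 1.4×10¹¹ tests, versus the 57,376 knowledge-directed models. The sketch below illustrates the general idea of knowledge-driven filtering under stated assumptions, not Biofilter's actual implementation or data: each hypothetical source lists gene groups (such as pathways), a gene pair is scored by how many distinct sources place both genes in a shared group, pairs below a minimum source count are dropped (the abstract's "6 sources" is treated here as a minimum), and surviving gene pairs are expanded into cross-gene SNP pairs. All source names, gene names, and SNP IDs are invented. The last lines compare the exhaustive pair count with the 57,376 models, and divide 0.05 by each count purely for scale; the abstract reports nominal cutoffs, not a Bonferroni correction.

```python
from itertools import combinations
from math import comb

# Hypothetical toy knowledge base: source -> list of gene groups (e.g. pathways).
# Source names, genes, and memberships are invented for illustration only.
KNOWLEDGE = {f"source_{i}": [["GENE_A", "GENE_B"]] for i in range(6)}
KNOWLEDGE["source_0"].append(["GENE_A", "GENE_C", "GENE_D"])
KNOWLEDGE["source_1"].append(["GENE_C", "GENE_D"])

# Hypothetical SNP-to-gene mapping (real pipelines map SNPs to genes by position).
SNPS_BY_GENE = {"GENE_A": ["rs_a1", "rs_a2"], "GENE_B": ["rs_b1"], "GENE_C": ["rs_c1"]}


def supported_gene_pairs(knowledge, min_sources):
    """Count distinct sources that put both genes of a pair in a shared group;
    keep pairs supported by at least `min_sources` sources."""
    support = {}
    for groups in knowledge.values():
        # A set, so a pair found in several groups of one source counts once.
        pairs_in_source = set()
        for group in groups:
            pairs_in_source.update(combinations(sorted(set(group)), 2))
        for pair in pairs_in_source:
            support[pair] = support.get(pair, 0) + 1
    return {pair: n for pair, n in support.items() if n >= min_sources}


def snp_pairs(gene_pairs, snps_by_gene):
    """Expand each retained gene pair into all cross-gene SNP-SNP models."""
    return [
        (s1, s2)
        for a, b in sorted(gene_pairs)
        for s1 in snps_by_gene.get(a, [])
        for s2 in snps_by_gene.get(b, [])
    ]


if __name__ == "__main__":
    kept = supported_gene_pairs(KNOWLEDGE, min_sources=6)
    print(kept)  # {('GENE_A', 'GENE_B'): 6}
    print(snp_pairs(kept, SNPS_BY_GENE))  # [('rs_a1', 'rs_b1'), ('rs_a2', 'rs_b1')]

    # Scale of the exhaustive alternative for the gene-gene SNP set in the abstract.
    exhaustive = comb(527_953, 2)
    print(exhaustive)  # 139366921128
    print(f"{0.05 / exhaustive:.1e} vs {0.05 / 57_376:.1e}")  # 3.6e-13 vs 8.7e-07
```

Raising the source threshold shrinks the number of models to test, at the cost of discarding interactions between genes that are poorly annotated in existing knowledge sources.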
We found a total of 782 gene-environment models with an interaction p-value < 1×10⁻⁴ for association with cataract status. Our results show that these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for seeking such explanatory models of the etiology of complex diseases and outcomes such as cataracts.
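The abstract does not state which statistical test produced the interaction and overall-model p-values for the SNP-SNP and gene-environment models described above. As a minimal sketch, assuming a logistic regression of case status on two predictors (an additively coded SNP dosage 0/1/2, and either a second SNP or a binary exposure such as smoking) plus their product, the code below computes a 1-df likelihood-ratio p-value for the product term and a 3-df p-value for the full model against an intercept-only model. The function names, coding scheme, and simulated data are hypothetical, and the plain Newton-Raphson fitter stands in for whatever software the authors actually used.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # Bernoulli log-likelihood: sum(y * eta - log(1 + exp(eta))).
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square(1) statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square(3) statistic (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def interaction_lrt(y, a, b):
    """Return (interaction p-value, overall model p-value) for y ~ a + b + a*b."""
    ones = np.ones(len(y))
    _, ll_null = fit_logistic(ones[:, None], y)
    _, ll_main = fit_logistic(np.column_stack([ones, a, b]), y)
    _, ll_full = fit_logistic(np.column_stack([ones, a, b, a * b]), y)
    # Clamp at 0 to absorb tiny negative values from numerical noise.
    p_interaction = chi2_sf_1df(max(2.0 * (ll_full - ll_main), 0.0))
    p_overall = chi2_sf_3df(max(2.0 * (ll_full - ll_null), 0.0))
    return p_interaction, p_overall


if __name__ == "__main__":
    # Simulated data only: SNP dosage (0/1/2), binary exposure, and a true interaction.
    rng = np.random.default_rng(0)
    n = 5000
    snp = rng.binomial(2, 0.3, size=n).astype(float)
    exposure = rng.binomial(1, 0.4, size=n).astype(float)
    logit = -1.0 + 0.1 * snp + 0.2 * exposure + 0.8 * snp * exposure
    status = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    print(interaction_lrt(status, snp, exposure))
```

The same function covers both analyses: pass a second SNP as `b` for a SNP-SNP model, or an exposure for a gene-environment model. A real analysis would also adjust for covariates and handle missing data and complete separation, which this sketch omits.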

Original language: English (US)
Pages (from-to): 147-158
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
State: Published - Jan 1 2013
Event: 18th Pacific Symposium on Biocomputing, PSB 2013 - Kohala Coast, United States
Duration: Jan 3 2013 to Jan 7 2013

Fingerprint

Gene-Environment Interaction
Cataract
Single Nucleotide Polymorphism
Electronic Health Records
Genes
Genome-Wide Association Study
National Human Genome Research Institute (U.S.)
Precision Medicine
Information Storage and Retrieval
Genomics
Research
Gene Frequency
Case-Control Studies
Smoking
Alcohols
Phenotype

All Science Journal Classification (ASJC) codes

  • Medicine (all)

Cite this

Pendergrass, Sarah A. ; Verma, Shefali S. ; Hall, Molly ; Holzinger, Emily R. ; Moore, Carrie B. ; Wallace, John R. ; Dudek, Scott M. ; Huggins, Wayne ; Kitchner, Terrie ; Waudby, Carol ; Berg, Richard ; McCarty, Catherine A. ; Ritchie, Marylyn Deriggi. / Next-generation analysis of cataracts : Determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the PhenX Toolkit. In: Pacific Symposium on Biocomputing. 2013 ; pp. 147-158.
@article{2d0fe038a625439dae1dc6aae1c26f0d,
title = "Next-generation analysis of cataracts: Determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the PhenX Toolkit",
author = "Pendergrass, {Sarah A.} and Verma, {Shefali S.} and Molly Hall and Holzinger, {Emily R.} and Moore, {Carrie B.} and Wallace, {John R.} and Dudek, {Scott M.} and Wayne Huggins and Terrie Kitchner and Carol Waudby and Richard Berg and McCarty, {Catherine A.} and Ritchie, {Marylyn Deriggi}",
year = "2013",
month = "1",
day = "1",
language = "English (US)",
pages = "147--158",
journal = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",
issn = "2335-6936",

}

Pendergrass, SA, Verma, SS, Hall, M, Holzinger, ER, Moore, CB, Wallace, JR, Dudek, SM, Huggins, W, Kitchner, T, Waudby, C, Berg, R, McCarty, CA & Ritchie, MD 2013, 'Next-generation analysis of cataracts: Determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the PhenX Toolkit', Pacific Symposium on Biocomputing, pp. 147-158.

Next-generation analysis of cataracts : Determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the PhenX Toolkit. / Pendergrass, Sarah A.; Verma, Shefali S.; Hall, Molly; Holzinger, Emily R.; Moore, Carrie B.; Wallace, John R.; Dudek, Scott M.; Huggins, Wayne; Kitchner, Terrie; Waudby, Carol; Berg, Richard; McCarty, Catherine A.; Ritchie, Marylyn Deriggi.

In: Pacific Symposium on Biocomputing, 01.01.2013, p. 147-158.

TY - JOUR

T1 - Next-generation analysis of cataracts

T2 - Determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the PhenX Toolkit

AU - Pendergrass, Sarah A.

AU - Verma, Shefali S.

AU - Hall, Molly

AU - Holzinger, Emily R.

AU - Moore, Carrie B.

AU - Wallace, John R.

AU - Dudek, Scott M.

AU - Huggins, Wayne

AU - Kitchner, Terrie

AU - Waudby, Carol

AU - Berg, Richard

AU - McCarty, Catherine A.

AU - Ritchie, Marylyn Deriggi

PY - 2013/1/1

Y1 - 2013/1/1

UR - http://www.scopus.com/inward/record.url?scp=84891471451&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84891471451&partnerID=8YFLogxK

M3 - Conference article

C2 - 23424120

AN - SCOPUS:84891471451

SP - 147

EP - 158

JO - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

JF - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

SN - 2335-6936

ER -