PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations

Joshua C. Denny, Marylyn Deriggi Ritchie, Melissa A. Basford, Jill M. Pulley, Lisa Bastarache, Kristin Brown-Gentry, Deede Wang, Dan R. Masys, Dan M. Roden, Dana C. Crawford

Research output: Contribution to journal › Article

382 Citations (Scopus)

Abstract

Motivation: Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease-gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Disease (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated case and control populations across all ICD9 code groups for each of these five SNPs, and disease-SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP-disease associations for these SNPs.

Results: The PheWAS algorithm replicated four of the seven known SNP-disease associations, with P-values between 2.8 × 10⁻⁶ and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance.

Availability: The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.

Contact: josh.denny@vanderbilt.edu.
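The scan described in the abstract (group ICD9 codes into phenotypes, split patients into cases and controls per phenotype, then test each phenotype against a SNP) can be sketched as follows. This is only an illustrative sketch, not the authors' software: the three-entry `CODE_GROUPS` table, the function names, the rule that anyone without a matching code is a control, and the choice of an allelic chi-square test are all assumptions made for the example.

```python
import math

# Hypothetical, heavily abbreviated stand-in for a code translation table:
# ICD9 code (or three-digit category) -> phenotype group.
CODE_GROUPS = {
    "427.31": "atrial fibrillation",
    "555": "Crohn's disease",
    "340": "multiple sclerosis",
}


def phenotype_groups(icd9_codes):
    """Return the set of phenotype groups matched by a patient's ICD9 codes."""
    groups = set()
    for code in icd9_codes:
        for prefix, group in CODE_GROUPS.items():
            # Match the exact code or any more specific sub-code (e.g. 555.9).
            if code == prefix or code.startswith(prefix + "."):
                groups.add(group)
    return groups


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    n = case_minor + case_major + ctrl_minor + ctrl_major
    denom = (
        (case_minor + case_major)
        * (ctrl_minor + ctrl_major)
        * (case_minor + ctrl_minor)
        * (case_major + ctrl_major)
    )
    if denom == 0:
        # An empty row or column means the test is undefined; report no signal.
        return 0.0, 1.0
    chi2 = n * (case_minor * ctrl_major - case_major * ctrl_minor) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def phewas_scan(patients):
    """Scan one SNP across all phenotype groups.

    patients: list of (icd9_codes, minor_allele_count), count in {0, 1, 2}.
    Assumption: a patient with no code in a group is a control for that group.
    """
    results = {}
    for group in sorted(set(CODE_GROUPS.values())):
        counts = {"case": [0, 0], "ctrl": [0, 0]}  # [minor, major] alleles
        for codes, minor in patients:
            key = "case" if group in phenotype_groups(codes) else "ctrl"
            counts[key][0] += minor
            counts[key][1] += 2 - minor
        results[group] = allelic_chi_square(*counts["case"], *counts["ctrl"])
    return results
```

Calling `phewas_scan` with a list of `(codes, minor_allele_count)` pairs returns one `(chi2, p)` pair per phenotype group. On the question of thresholds, a plain Bonferroni correction over 776 phenotypes at a 0.05 family-wise level would require roughly P < 6.4 × 10⁻⁵. The abstract reports associations at P < 0.01 and leaves the appropriate threshold open.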

Original language: English (US)
Article number: btq126
Pages (from-to): 1205-1210
Number of pages: 6
Journal: Bioinformatics
Volume: 26
Issue number: 9
DOI: https://doi.org/10.1093/bioinformatics/btq126
State: Published - Mar 24 2010

Fingerprint

Single Nucleotide Polymorphism
Genes
Nucleotides
Polymorphism
Electronic Health Records
Electronic medical equipment
Software
Electronics
Population Control
Carotid Stenosis
International Classification of Diseases
Crohn Disease
Table
Systemic Lupus Erythematosus
Atrial Fibrillation
Multiple Sclerosis
Coronary Artery Disease
Rheumatoid Arthritis

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Denny, J. C., Ritchie, M. D., Basford, M. A., Pulley, J. M., Bastarache, L., Brown-Gentry, K., ... Crawford, D. C. (2010). PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics, 26(9), 1205-1210. [btq126]. https://doi.org/10.1093/bioinformatics/btq126
Denny, Joshua C.; Ritchie, Marylyn Deriggi; Basford, Melissa A.; Pulley, Jill M.; Bastarache, Lisa; Brown-Gentry, Kristin; Wang, Deede; Masys, Dan R.; Roden, Dan M.; Crawford, Dana C. / PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. In: Bioinformatics. 2010; Vol. 26, No. 9. pp. 1205-1210.
@article{eb47b7306d184e66bd133af68534e57d,
title = "PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations",
author = "Denny, {Joshua C.} and Ritchie, {Marylyn Deriggi} and Basford, {Melissa A.} and Pulley, {Jill M.} and Bastarache, Lisa and Brown-Gentry, Kristin and Wang, Deede and Masys, {Dan R.} and Roden, {Dan M.} and Crawford, {Dana C.}",
year = "2010",
month = "3",
day = "24",
doi = "10.1093/bioinformatics/btq126",
language = "English (US)",
volume = "26",
pages = "1205--1210",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "9",

}

Denny, JC, Ritchie, MD, Basford, MA, Pulley, JM, Bastarache, L, Brown-Gentry, K, Wang, D, Masys, DR, Roden, DM & Crawford, DC 2010, 'PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations', Bioinformatics, vol. 26, no. 9, btq126, pp. 1205-1210. https://doi.org/10.1093/bioinformatics/btq126

PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. / Denny, Joshua C.; Ritchie, Marylyn Deriggi; Basford, Melissa A.; Pulley, Jill M.; Bastarache, Lisa; Brown-Gentry, Kristin; Wang, Deede; Masys, Dan R.; Roden, Dan M.; Crawford, Dana C.

In: Bioinformatics, Vol. 26, No. 9, btq126, 24.03.2010, p. 1205-1210.

TY - JOUR

T1 - PheWAS

T2 - Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations

AU - Denny, Joshua C.

AU - Ritchie, Marylyn Deriggi

AU - Basford, Melissa A.

AU - Pulley, Jill M.

AU - Bastarache, Lisa

AU - Brown-Gentry, Kristin

AU - Wang, Deede

AU - Masys, Dan R.

AU - Roden, Dan M.

AU - Crawford, Dana C.

PY - 2010/3/24

Y1 - 2010/3/24

AB - Motivation: Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease-gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Disease (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated case and control populations across all ICD9 code groups for each of these five SNPs, and disease-SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP-disease associations for these SNPs. Results: The PheWAS algorithm replicated four of the seven known SNP-disease associations, with P-values between 2.8 × 10⁻⁶ and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance. Availability: The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research. Contact: josh.denny@vanderbilt.edu.

UR - http://www.scopus.com/inward/record.url?scp=77952822074&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952822074&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btq126

DO - 10.1093/bioinformatics/btq126

M3 - Article

C2 - 20335276

AN - SCOPUS:77952822074

VL - 26

SP - 1205

EP - 1210

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 9

M1 - btq126

ER -

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K et al. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 Mar 24;26(9):1205-1210. btq126. https://doi.org/10.1093/bioinformatics/btq126