We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1,317 cases and 5,053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] = 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1,616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
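To help read the odds ratios reported above, here is a minimal sketch of how an allelic odds ratio and a Wald 95% confidence interval can be computed from a 2 × 2 table of allele counts. It is not the study's analysis, which the abstract does not describe in detail. The function name `allelic_odds_ratio` and the counts in the usage lines are hypothetical and chosen only for illustration. An OR below 1, as reported for rs7850258, means the counted allele is less frequent in cases than in controls.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 allele-count table.

    The interval is a Wald interval computed on the log-odds scale;
    z=1.96 corresponds to roughly 95% coverage.
    All four counts must be positive.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, NOT taken from the study.
or_, lo, hi = allelic_odds_ratio(
    case_alt=300, case_ref=700, ctrl_alt=400, ctrl_ref=600
)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A genome-wide analysis would typically fit a regression model with covariates such as age, sex, and ancestry rather than use raw counts, so this sketch only shows what the reported quantity measures.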