Evaluation of PrediXcan for prioritizing GWAS associations and predicting gene expression

Binglan Li, Shefali S. Verma, Yogasudha C. Veturi, Anurag Verma, Yuki Bradford, David W. Haas, Marylyn Deriggi Ritchie

Research output: Contribution to journal > Conference article

4 Citations (Scopus)

Abstract

Genome-wide association studies (GWAS) have been successful in facilitating the understanding of the genetic architecture behind human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect sizes, GWAS requires very large sample sizes, which can be computationally burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built-in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants down to several thousand genes. Most importantly, the identified associations can reveal the genes that are under regulation of eQTLs and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing the predicted and observed gene expression levels, and also examined some potentially influential factors and a filter criterion with the aim of improving PrediXcan performance. For GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and background regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs in the prediction did not improve the prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives. Still, PrediXcan was able to prioritize GWAS associations that were below the genome-wide significance threshold in GWAS, while retaining GWAS significant results. This study suggests several ways to consider PrediXcan’s performance that will be of value to eQTL and complex human disease research.
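The ideas in the abstract can be illustrated with a minimal sketch. It is not PrediXcan's code or its trained models: the gene names, SNP IDs, weights, genotypes, observed values, and test counts below are all hypothetical, and the weighted sum of allele dosages is only an assumed simplification of a SNP-expression model. The sketch predicts expression from genotypes, compares predicted with observed expression using a Pearson correlation, and shows how a Bonferroni threshold relaxes when testing thousands of genes instead of millions of variants.

```python
import math

# Hypothetical SNP-expression weights per gene (not real PrediXcan weights).
WEIGHTS = {
    "GENE_A": {"rs1": 0.5, "rs2": -0.2},
    "GENE_B": {"rs3": 0.3},
}

def predict_expression(dosages, weights):
    """Assumed model: weighted sum of allele dosages (0, 1 or 2) over a gene's SNPs."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Toy genotypes for four individuals (hypothetical data).
samples = [
    {"rs1": 0, "rs2": 2, "rs3": 1},
    {"rs1": 1, "rs2": 1, "rs3": 0},
    {"rs1": 2, "rs2": 0, "rs3": 2},
    {"rs1": 1, "rs2": 0, "rs3": 1},
]
# Hypothetical measured expression of GENE_A in the same individuals.
observed_gene_a = [-0.3, 0.5, 0.9, 0.6]

predicted_gene_a = [predict_expression(s, WEIGHTS["GENE_A"]) for s in samples]
r = pearson_r(predicted_gene_a, observed_gene_a)

print("predicted:", predicted_gene_a)
print("prediction accuracy (Pearson r):", round(r, 3))
print("variant-level threshold:", bonferroni_threshold(1_000_000))
print("gene-level threshold:", bonferroni_threshold(5_000))
```

With the assumed counts, the gene-level threshold is less stringent than the variant-level one, which is one way to picture how associations below genome-wide significance in GWAS could still be prioritized at the gene level.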

Original language: English (US)
Pages (from-to): 448-459
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Volume: 0
Issue number: 212669
DOI: https://doi.org/10.1142/9789813235533_0041
State: Published - Jan 1 2018
Event: 23rd Pacific Symposium on Biocomputing, PSB 2018 - Kohala Coast, United States
Duration: Jan 3 2018 to Jan 7 2018

Fingerprint

Genome-Wide Association Study
Genes
Gene Expression
Quantitative Trait Loci
Gene Expression Regulation
Sample Size
Single Nucleotide Polymorphism
Genome
Research
Blood

All Science Journal Classification (ASJC) codes

  • Medicine (all)

Cite this

Li, B., Verma, S. S., Veturi, Y. C., Verma, A., Bradford, Y., Haas, D. W., & Ritchie, M. D. (2018). Evaluation of PrediXcan for prioritizing GWAS associations and predicting gene expression. Pacific Symposium on Biocomputing, 0(212669), 448-459. https://doi.org/10.1142/9789813235533_0041
@article{0d0571062cce4334bf35d3a9ae8df276,
title = "Evaluation of {PrediXcan} for prioritizing {GWAS} associations and predicting gene expression",
author = "Binglan Li and Verma, {Shefali S.} and Veturi, {Yogasudha C.} and Anurag Verma and Yuki Bradford and Haas, {David W.} and Ritchie, {Marylyn Deriggi}",
year = "2018",
month = "1",
day = "1",
doi = "10.1142/9789813235533_0041",
language = "English (US)",
volume = "0",
pages = "448--459",
journal = "Pacific Symposium on Biocomputing",
issn = "2335-6936",
number = "212669",

}

TY - JOUR

T1 - Evaluation of PrediXcan for prioritizing GWAS associations and predicting gene expression

AU - Li, Binglan

AU - Verma, Shefali S.

AU - Veturi, Yogasudha C.

AU - Verma, Anurag

AU - Bradford, Yuki

AU - Haas, David W.

AU - Ritchie, Marylyn Deriggi

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85048461666&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048461666&partnerID=8YFLogxK

U2 - 10.1142/9789813235533_0041

DO - 10.1142/9789813235533_0041

M3 - Conference article

C2 - 29218904

AN - SCOPUS:85048461666

VL - 0

SP - 448

EP - 459

JO - Pacific Symposium on Biocomputing

JF - Pacific Symposium on Biocomputing

SN - 2335-6936

IS - 212669

ER -