Genome-wide association studies (GWAS) have identified many disease loci, most of which, however, lie in non-coding regions. Understanding the function of non-coding variants is an important but challenging problem. We propose new statistical methods that use large-scale functional annotation data from all available human cell types to address the following specific problems. Solving them will substantially improve our ability to understand the functional roles of non-coding variants in a gene- and cell type-aware context:

1. We will develop new methods that integrate existing functional annotation results across all available human cell types to fine-map causal disease mutations. Simultaneously, we will predict the functions disrupted by each mutation and the cell types through which the mutations most likely affect disease risk.
2. We will further use RNA-seq data from multiple human cell types to identify the target genes affected by disease mutations. Complementary to eQTL studies, our method does not depend on the availability of eQTL data and hence can be broadly applied to predict cell type-specific target genes, especially in cell types for which eQTLs are not available.

The proposed methods are innovative in that they not only integrate multi-cell-type functional information to predict causal disease mutations, but also provide quantitative estimates of functional effects and cell-type effects on disease risk. More importantly, the methods work with small data, such as diseases with only a handful of significant GWAS hits. Unlike big-data algorithms, we approach the question at a small scale to achieve interpretability and broad applicability. This will allow us to study disease mutations in a large number of complex traits with limited data, such as data extracted from tables in published papers.
Consequently, we can comprehensively compare the functional enrichments, effector cell types, and target genes shared among all available diseases, yielding new insights into disease regulatory mechanisms that are unattainable with existing approaches. We propose to apply our methods to all complex traits in the GWAS Catalog and will implement them in a software tool for the community.
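The abstract does not spell out the models, so the following is only a hedged illustration of the general idea behind the first problem: annotation-informed fine-mapping with quantitative estimates of functional effects. It uses a standard, swapped-in technique rather than the project's actual method. The technique combines Wakefield-style approximate Bayes factors computed from GWAS summary statistics, an assumption of exactly one causal variant per locus, and a log-linear prior over annotations whose weights are fit by ridge-penalized gradient ascent. All function names, the prior effect-size standard deviation of 0.2, the learning rate, and the penalty are hypothetical choices for this sketch.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format: one locus = (betas, standard errors, annotation rows),
# where annotation row j holds (assumed) cell type-specific features of variant j.
Locus = Tuple[Sequence[float], Sequence[float], Sequence[Sequence[float]]]


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Wakefield-style log approximate Bayes factor (causal vs. null) for one variant."""
    z2 = (beta / se) ** 2
    r = prior_sd ** 2 / (se ** 2 + prior_sd ** 2)  # shrinkage factor W / (V + W)
    return 0.5 * (math.log(1.0 - r) + r * z2)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def annotation_prior(annot: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Prior causal probability of each variant, proportional to exp(annotations . weights)."""
    return softmax([sum(a * w for a, w in zip(row, weights)) for row in annot])


def locus_pips(betas, ses, annot, weights, prior_sd: float = 0.2) -> List[float]:
    """Posterior inclusion probabilities, assuming exactly one causal variant in the locus."""
    log_prior = [sum(a * w for a, w in zip(row, weights)) for row in annot]  # unnormalized
    log_bf = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    return softmax([p + l for p, l in zip(log_prior, log_bf)])


def fit_weights(loci: Sequence[Locus], n_annot: int, lr: float = 0.1,
                l2: float = 0.1, n_iter: int = 500) -> List[float]:
    """Estimate annotation weights (functional effects) by gradient ascent on
    sum over loci of log(sum_j prior_j * BF_j), minus a ridge penalty l2/2 * ||w||^2.
    The penalty keeps estimates finite when only a handful of loci are available."""
    w = [0.0] * n_annot
    for _ in range(n_iter):
        grad = [-l2 * wk for wk in w]  # gradient of the ridge penalty
        for betas, ses, annot in loci:
            pip = locus_pips(betas, ses, annot, w)
            prior = annotation_prior(annot, w)
            # d/dw of log sum_j prior_j * BF_j equals sum_j (PIP_j - prior_j) * a_j
            for j, row in enumerate(annot):
                for k in range(n_annot):
                    grad[k] += (pip[j] - prior[j]) * row[k]
        w = [wk + lr * g for wk, g in zip(w, grad)]
    return w
```

For example, take two variants in perfect LD with identical summary statistics. A weight of log 4 on a single binary annotation (say, a hypothetical "overlaps an enhancer in cell type X" indicator) makes the annotated variant four times as likely as its partner to be causal. Fitting one weight per cell type-specific annotation is one simple way to turn enrichment into an interpretable effect estimate.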
Effective start/end date: 6/1/18 → 3/31/20
- National Institutes of Health: $195,274.00
Genome-Wide Association Study