Project Summary: An essential problem in molecular biology is understanding how proteins and DNA interact to regulate gene expression and influence phenotypes. Advanced sequencing technologies have rapidly generated massive amounts of genetic, epigenetic, and genomic data. Exploiting hundreds of genome-wide data sets across many samples gives us an unprecedented opportunity to study the interplay among regulatory marks and their impact on gene expression. By comparing genome-wide features across samples, we can identify key regulators that function in specific cell types with substantial power and resolution. New hypotheses about the mechanisms of gene regulation during cell differentiation can then be derived and tested, shedding light on previously intractable questions in the genetics of disease susceptibility. Although numerous computational efforts have studied epigenetic dynamics and pinpointed the locations of epigenetic events, there has been a lack of a unified and powerful framework for analyzing multiple genomes jointly in a way that accounts for both the position and the cell type specificity of these events.
We recently introduced a Bayesian method called IDEAS (integrative and discriminative epigenome annotation system) that addresses this need, and using independent experimental data we have demonstrated that it outperforms existing state-of-the-art algorithms. In this project, we aim to substantially expand the scope and applicability of IDEAS and to develop a powerful software tool for public use. Specifically, we propose to 1) segment genomes that have missing data tracks without imputing them, and integrate results across studies; 2) model covariate effects and detect epigenomic associations; 3) infer fine-grained, local relationships among cell types; and 4) integrate chromatin conformation data to improve segmentation. In collaboration with Dr. Hardison (co-I), we will experimentally evaluate the accuracy of a subset of our predictions. The success of this project will benefit method development, generate new resources, and, importantly, advance our capability for large-scale data integration toward understanding the roles of (epi)genetics in gene regulation and complex disease.
Effective start/end date: 8/1/17 → 7/31/21
- National Institute of General Medical Sciences: $342,350.00
- National Institute of General Medical Sciences: $342,685.00