Statistical methods for improving reproducibility and utility of sequencing data

Project: Research project

Project Details


DESCRIPTION (provided by applicant): Sequencing-based assays have become the technology of choice for studying genome-wide protein-DNA interactions and chromatin states defined by histone modifications (both assayed by ChIP-seq), as well as transcriptomes (assayed by RNA-seq). Despite their widespread use, many experimental and data-analytical challenges must still be overcome before the data can support reliable and reproducible biological interpretations. The small sample size of each individual study further limits the power and reliability of data analyses. When replicate samples, or similar samples from different studies, are available, reproducibility across them indicates the fidelity of the identifications, and it can potentially be used to detect reproducible signals that are too modest to be detected reliably in any individual sample. We propose to develop a suite of new statistical methods that use the reproducibility information provided by replicate samples to assess the quality of experiments, select reliable identifications, and optimize operational parameters in the experimental design.
Aim 1 will develop statistical methods to assess the reproducibility of identifications, and to select identifications by their reproducibility, in several sequencing-based analyses. This reproducibility-based selection criterion complements the usual single-sample measure of significance, with the added benefit of being comparable across data sets, platforms, and different measures of significance.
Aim 2 will develop a regression framework to assess how operational parameters in the experimental and data-analytical procedures affect the reproducibility of ChIP-seq and RNA-seq experiments. The framework will allow one to characterize the simultaneous and independent effects of covariates on the reproducibility of the assays, and to compare the reproducibility of protocols while controlling for potential confounding variables.
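The idea behind Aim 1, judging identifications by how consistently they rank across replicates, can be illustrated with a minimal sketch of an empirical correspondence curve, in the spirit of the curve used in irreproducible discovery rate (IDR) analysis. Psi(t) is the fraction of all items that fall in the top t-proportion on both replicates; it equals t under perfect agreement and is roughly t**2 when the replicates are independent. This is not the project's proposed method. It assumes two aligned score vectors (item i is the same peak or gene in both replicates) with higher scores meaning stronger evidence, and `select_reproducible` is a hypothetical, deliberately simple selection rule.

```python
import math
from typing import List, Sequence


def rank_desc(scores: Sequence[float]) -> List[int]:
    """Rank items so that 1 = highest score; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def top_k(n: int, t: float) -> int:
    """Number of items in the top t-proportion of n items.

    The small epsilon guards against t * n landing just below an integer
    because of floating-point rounding.
    """
    return math.floor(t * n + 1e-9)


def correspondence_curve(scores1: Sequence[float], scores2: Sequence[float],
                         ts: Sequence[float]) -> List[float]:
    """Empirical Psi(t): fraction of all n items ranked in the top
    t-proportion on BOTH replicates, evaluated at each t in ts."""
    if len(scores1) != len(scores2):
        raise ValueError("replicates must score the same items")
    n = len(scores1)
    r1, r2 = rank_desc(scores1), rank_desc(scores2)
    curve = []
    for t in ts:
        k = top_k(n, t)
        both = sum(1 for a, b in zip(r1, r2) if a <= k and b <= k)
        curve.append(both / n)
    return curve


def select_reproducible(scores1: Sequence[float], scores2: Sequence[float],
                        t: float) -> List[int]:
    """Hypothetical rule: keep indices ranked in the top t-proportion on both replicates."""
    if len(scores1) != len(scores2):
        raise ValueError("replicates must score the same items")
    k = top_k(len(scores1), t)
    r1, r2 = rank_desc(scores1), rank_desc(scores2)
    return [i for i, (a, b) in enumerate(zip(r1, r2)) if a <= k and b <= k]
```

For example, with identical score vectors of length 10, Psi(0.5) is 0.5 (the perfect-agreement value), while reversing one replicate's scores drops it to 0.0. Working on ranks rather than raw scores is what makes such a criterion comparable across platforms and different significance measures.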
Aim 3 will develop semi-parametric, rank-based meta-analysis methods for integrating RNA-seq-based transcriptome analyses from different sources. The proposed methods will account for heterogeneity across data sources and will incorporate the study goals into the meta-analysis.
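Rank-based meta-analysis can be illustrated with the rank product, a well-known statistic that combines per-study ranks by their geometric mean. This sketch is a stand-in for the general idea only, not the semi-parametric method proposed in Aim 3: it does not model heterogeneity between data sources or incorporate study goals. It assumes every study scores the same genes in the same order and that higher scores indicate stronger evidence.

```python
import math
from typing import List, Sequence


def rank_desc(scores: Sequence[float]) -> List[int]:
    """Rank items so that 1 = highest score; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def rank_product(studies: Sequence[Sequence[float]]) -> List[float]:
    """Geometric mean of each gene's per-study ranks (1 = strongest).

    Smaller values mean a gene is ranked consistently high across studies.
    Summing log-ranks avoids overflow when many studies are combined.
    """
    if not studies:
        raise ValueError("need at least one study")
    n = len(studies[0])
    if any(len(s) != n for s in studies):
        raise ValueError("all studies must score the same genes in the same order")
    log_sum = [0.0] * n
    for scores in studies:
        for i, r in enumerate(rank_desc(scores)):
            log_sum[i] += math.log(r)
    k = len(studies)
    return [math.exp(s / k) for s in log_sum]
```

For example, two studies scoring three genes as `[5, 1, 3]` and `[4, 2, 3]` rank them identically, so the rank products are approximately `[1, 3, 2]`. Because only ranks enter the statistic, studies with different normalizations or scoring scales can be combined without rescaling their raw scores.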
Effective start/end date: 9/1/13 to 7/31/14


  • National Institute of General Medical Sciences: $271,736.00
