Motivation: Next-generation sequencing (NGS) technologies have enabled whole-genome discovery and analysis of genetic variants in many species of interest. Individuals are often sequenced at low coverage for detecting novel variants, phasing haplotypes and inferring population structures. Although several tools have been developed for SNP and genotype calling in NGS data, haplotype phasing is often done separately on the called genotypes.Results: We propose a dynamic Bayesian Markov model (DBM) for simultaneous genotype calling and haplotype phasing in low-coverage NGS data of unrelated individuals. Our method is fully probabilistic that produces consistent inference of genotypes, haplotypes and recombination probabilities. Using data from the 1000 Genomes Project, we demonstrate that DBM not only yields more accurate results than some popular methods, but also provides novel characterization of haplotype structures at the individual level for visualization, interpretation and comparison in downstream analysis. DBM is a powerful and flexible tool that can be applied to many sequencing studies. Its statistical framework can also be extended to accommodate broader scopes of data.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics