We develop techniques to estimate the statistical significance of gap-free alignments between two genomic DNA sequences, using human-mouse alignments as an example. The sequences are assumed to be sufficiently similar that some but not all of the neutrally evolving regions (i.e., those under no evolutionary constraint) can be reliably aligned. Our goal is to model the situation in which the neutral rate of evolution, and hence the extent of the aligning intervals, varies across the genome. In some cases, this permits the weaker of two matches to be judged as less likely to have arisen by chance, provided it lies in a genomic interval with a high level of background divergence. We employ a hidden Markov model to capture variations in divergence rates and assign probability values to gap-free alignments using techniques of Dembo and Karlin, which are related to those used for the same purpose by BLAST. Our methods are illustrated in detail using a 1.49 Mb genomic region. Results obtained from the analysis of human chromosome 22 using these techniques are also provided.
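The claim that a weaker match can be the more significant one can be illustrated with a minimal sketch. It is a simplified stand-in, not the method of the paper: it assumes a +1/-1 match/mismatch scoring scheme and a null model in which each aligned column matches independently with probability `match_prob`, a hypothetical parameter standing in for the local level of neutral similarity. It also ignores the prefactor K and the search-space size, which a real calculation would need. Under these assumptions the Karlin-Altschul scale parameter λ is the positive root of p·e^(λ·match) + (1−p)·e^(λ·mismatch) = 1, and the tail probability of a score S decays roughly as e^(−λS).

```python
import math

def karlin_lambda(match_prob, match=1.0, mismatch=-1.0):
    """Positive root of p*exp(lam*match) + (1-p)*exp(lam*mismatch) = 1.

    Assumed toy null model: columns match independently with
    probability match_prob. Solved by bisection.
    """
    p = match_prob
    if p * match + (1.0 - p) * mismatch >= 0:
        raise ValueError("expected score per column must be negative")

    def f(lam):
        return p * math.exp(lam * match) + (1.0 - p) * math.exp(lam * mismatch) - 1.0

    lo, hi = 1e-6, 1.0          # f(lo) < 0 just above the trivial root at 0
    while f(hi) <= 0:           # expand until the positive root is bracketed
        hi *= 2.0
    for _ in range(200):        # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def normalized_score(score, match_prob):
    """lambda * S: larger means a smaller chance tail, all else being equal."""
    return karlin_lambda(match_prob) * score

# Hypothetical numbers: a stronger match (score 30) in a slowly diverging
# region versus a weaker match (score 20) in a highly diverged region.
strong_in_similar_region = normalized_score(30, match_prob=0.40)
weak_in_diverged_region = normalized_score(20, match_prob=0.25)
print(strong_in_similar_region, weak_in_diverged_region)
```

With these illustrative values the score-20 match receives the larger normalized score, so under the toy model it is the less likely of the two to have arisen by chance, even though its raw score is lower.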
All Science Journal Classification (ASJC) codes
- Modeling and Simulation
- Molecular Biology
- Computational Mathematics
- Computational Theory and Mathematics