Scoring pairwise genomic sequence alignments.

Francesca Chiaromonte, V. B. Yap, W. Miller

Research output: Contribution to journalArticle

124 Scopus citations

Abstract

The parameters by which alignments are scored can strongly affect sensitivity and specificity of alignment procedures. While appropriate parameter choices are well understood for protein alignments, much less is known for genomic DNA sequences. We describe a straightforward approach to scoring nucleotide substitutions in genomic sequence alignments, especially human-mouse comparisons. Scores are obtained from relative frequencies of aligned nucleotides observed in alignments of non-coding, non-repetitive genomic regions, and can be theoretically motivated through substitution models. Additional accuracy can be attained by down-weighting alignments characterized by low compositional complexity. We also describe an evaluation protocol that is relevant when alignments are intended to identify all and only the orthologous positions. One particular scoring matrix, called HOXD70, has proven to be generally effective for human-mouse comparisons, and has been used by the PipMaker server since July, 2000. We discuss but leave open the problem of effectively scoring regions of strongly biased nucleotide composition, such as low G + C content.

Original languageEnglish (US)
Pages (from-to)115-126
Number of pages12
JournalPacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
StatePublished - Jan 1 2002

    Fingerprint

All Science Journal Classification (ASJC) codes

  • Medicine(all)

Cite this