Scoring pairwise genomic sequence alignments.

F. Chiaromonte, V. B. Yap, W. Miller

Research output: Contribution to journal › Article › peer-review

128 Scopus citations

Abstract

The parameters by which alignments are scored can strongly affect sensitivity and specificity of alignment procedures. While appropriate parameter choices are well understood for protein alignments, much less is known for genomic DNA sequences. We describe a straightforward approach to scoring nucleotide substitutions in genomic sequence alignments, especially human-mouse comparisons. Scores are obtained from relative frequencies of aligned nucleotides observed in alignments of non-coding, non-repetitive genomic regions, and can be theoretically motivated through substitution models. Additional accuracy can be attained by down-weighting alignments characterized by low compositional complexity. We also describe an evaluation protocol that is relevant when alignments are intended to identify all and only the orthologous positions. One particular scoring matrix, called HOXD70, has proven to be generally effective for human-mouse comparisons, and has been used by the PipMaker server since July, 2000. We discuss but leave open the problem of effectively scoring regions of strongly biased nucleotide composition, such as low G + C content.
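The abstract says scores are derived from the relative frequencies of aligned nucleotides. One common way to do that, assumed here and not confirmed by the abstract, is a log-odds score: the log of the observed pair frequency divided by the frequency expected if the two nucleotides were paired at random. The sketch below illustrates that idea only. The function name `log_odds_scores`, the `scale` parameter, and the counts are hypothetical; the counts are invented for illustration and are not data from the paper, and the output is not the HOXD70 matrix.

```python
import math

NUCLEOTIDES = "ACGT"


def log_odds_scores(pair_counts, scale=1.0):
    """Return log-odds scores from counts of aligned nucleotide pairs.

    pair_counts[(a, b)] is the number of alignment columns with nucleotide
    a in the first sequence and b in the second. Every pair must have a
    nonzero count, otherwise the logarithm is undefined.
    """
    total = sum(pair_counts.values())
    # Joint relative frequency of each aligned pair.
    joint = {pair: n / total for pair, n in pair_counts.items()}
    # Background frequency of each nucleotide in each sequence.
    bg_first = {a: sum(joint[(a, b)] for b in NUCLEOTIDES) for a in NUCLEOTIDES}
    bg_second = {b: sum(joint[(a, b)] for a in NUCLEOTIDES) for b in NUCLEOTIDES}
    return {
        (a, b): scale * math.log(joint[(a, b)] / (bg_first[a] * bg_second[b]))
        for a in NUCLEOTIDES
        for b in NUCLEOTIDES
    }


# Hypothetical counts: matches are most common, transitions (A<->G, C<->T)
# less common, transversions rarest.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
counts = {}
for a in NUCLEOTIDES:
    for b in NUCLEOTIDES:
        if a == b:
            counts[(a, b)] = 200
        elif (a, b) in TRANSITIONS:
            counts[(a, b)] = 30
        else:
            counts[(a, b)] = 10

scores = log_odds_scores(counts)
for a in NUCLEOTIDES:
    print(a, " ".join(f"{scores[(a, b)]:7.3f}" for b in NUCLEOTIDES))
```

With counts of this shape, matches score positive, transitions mildly negative, and transversions more strongly negative. A practical matrix would typically be rescaled and rounded to integers, a step this sketch omits.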

Original language: English (US)
Pages (from-to): 115-126
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
State: Published - 2002

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Computational Theory and Mathematics
