Alignment seeding strategies using contiguous pyrimidine purine matches

Minmei Hou, Louxin Zhang, Robert Scott Harris

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Large-scale genomic pairwise aligners usually start with a seeding procedure, which scans two sequences to obtain base matches (called hits) that follow a certain pattern (called a seed). The seed pattern and size determine the sensitivity and specificity of the seeding procedure and greatly affect alignment accuracy and computational efficiency. Much effort has focused on finding an optimal spaced seed, or set of spaced seeds, to improve sensitivity. However, specificity also becomes a major concern when aligning very long genomic sequences. We present a seeding strategy that identifies contiguous pyrimidine purine (py · pu) matches. This model may improve sensitivity and specificity simultaneously compared to a contiguous base match model. We further present a seeding strategy that identifies contiguous py · pu matches containing at least a certain number of contiguous base matches. This model significantly improves sensitivity and specificity simultaneously compared to the base match model. When the transition-to-transversion ratio is high, it can also achieve better sensitivity than an optimal spaced seed without loss of specificity. Our examination of the 2 Mb CFTR region in human and mouse shows that this new model can achieve very high specificity without much loss of sensitivity compared to an optimal spaced seed. Based on the characteristics of alignments between human and other mammals (e.g., sequence similarity, the transition-to-transversion ratio, and the lengths of gapless alignments), the new seeding strategies are promising for improving alignment quality across a wide selection of species pairs. This paper also lays the groundwork for future work on applying spaced patterns to these seeding strategies.
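The two seeding models in the abstract can be illustrated with a short sketch. A py · pu match occurs when both aligned bases are purines (A, G) or both are pyrimidines (C, T), that is, an identity or a transition; a transversion breaks the seed. Under an i.i.d. uniform background, a random position is a py · pu match with probability 1/2 versus 1/4 for an exact match, so a py · pu seed of length 2k has the same probability of a random hit, (1/2)^(2k) = (1/4)^k, as an exact seed of length k. The sketch below is hypothetical and is not the authors' implementation. It assumes hits are counted along a single diagonal (the two sequences are compared position by position with no offset), that the second model's "certain number of contiguous base matches" must lie inside the seed window, and that bases other than A, C, G, T never match. The names `ry_class`, `pypu_match`, `longest_exact_run` and `seed_hits` are illustrative only.

```python
# Hypothetical illustration of py.pu seeding; not the authors' implementation.
# Assumption: sequences are compared along one diagonal (no offset).

PURINES = set("AG")
PYRIMIDINES = set("CT")


def ry_class(base: str) -> str:
    """Map a nucleotide to 'R' (purine), 'Y' (pyrimidine) or 'N' (other)."""
    b = base.upper()
    if b in PURINES:
        return "R"
    if b in PYRIMIDINES:
        return "Y"
    return "N"  # assumption: ambiguous/unknown bases never match


def pypu_match(a: str, b: str) -> bool:
    """True if a and b are both purines or both pyrimidines (identity or transition)."""
    ca = ry_class(a)
    return ca != "N" and ca == ry_class(b)


def longest_exact_run(x: str, y: str) -> int:
    """Length of the longest run of identical A/C/G/T bases in aligned strings x, y."""
    best = run = 0
    for a, b in zip(x, y):
        same = a.upper() == b.upper() and a.upper() in "ACGT"
        run = run + 1 if same else 0
        best = max(best, run)
    return best


def seed_hits(s: str, t: str, k: int, min_exact: int = 0) -> list[int]:
    """Start positions i where s[i:i+k] and t[i:i+k] are k contiguous py.pu
    matches that also contain at least min_exact contiguous identical bases.
    min_exact=0 gives the plain contiguous py.pu model."""
    n = min(len(s), len(t))
    hits = []
    for i in range(n - k + 1):
        ws, wt = s[i:i + k], t[i:i + k]
        if all(pypu_match(a, b) for a, b in zip(ws, wt)):
            if longest_exact_run(ws, wt) >= min_exact:
                hits.append(i)
    return hits


if __name__ == "__main__":
    s, t = "ACGTACGT", "GCATACGT"  # transitions (A/G, G/A) at positions 0 and 2
    print(seed_hits(s, t, 4))               # [0, 1, 2, 3, 4]
    print(seed_hits(s, t, 4, min_exact=3))  # [2, 3, 4]
```

Re-checking the exact-run condition inside each window costs O(k) work per position and keeps the sketch readable; a scanner for multi-megabase sequences would more likely track py · pu and exact run lengths incrementally in a single pass, and would index one sequence rather than walk a single diagonal.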

Original language: English (US)
Title of host publication: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
Pages: 384-391
Number of pages: 8
DOI: 10.1145/2382936.2382985
State: Published - Nov 26 2012
Event: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012 - Orlando, FL, United States
Duration: Oct 7 2012 to Oct 10 2012

Publication series

Name: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Information Management

Cite this

Hou, M., Zhang, L., & Harris, R. S. (2012). Alignment seeding strategies using contiguous pyrimidine purine matches. In 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012 (pp. 384-391). (2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012). https://doi.org/10.1145/2382936.2382985