Recent Developments in Linear-Space Alignment Methods: A Survey

Kun Mao Chao, Ross Cameron Hardison, Webb Miller

Research output: Contribution to journal (Article)

40 Citations (Scopus)

Abstract

A dynamic-programming strategy for sequence alignment first proposed in 1975 by Dan Hirschberg can be adapted to yield a number of extremely space-efficient algorithms. Specifically, these algorithms align two sequences using only “linear space,” i.e., an amount of computer memory that is proportional to the sum of the lengths of the two sequences being aligned. This paper begins by reviewing the basic idea, as it applies to the global (i.e., end-to-end) alignment of two DNA or protein sequences. Three of our recent extensions of the technique are then outlined. The first extension computes an optimal alignment subject to the constraint that each position, i, of the first sequence must be aligned somewhere between positions L[i] and U[i] of the second sequence, for given values of L and U. The second finds all aligned position pairs (i.e., potential columns of the alignment) that occur in an alignment whose score exceeds a given threshold. The third treats the case where each of the two sequences is allowed to be an alignment (e.g., a sequence of aligned pairs), using a sensitive scoring scheme. We also describe two linear-space methods for computing k best local (i.e., involving only a part of each sequence) alignments, where k ≥ 1. One is a linear-space version of the algorithm of Waterman and Eggert (1987), and the other is based on the strategy proposed by Wilbur and Lipman (1983). Finally, we describe programs that implement various combinations of these techniques to provide a multisequence alignment method that is especially suited to handling a few very long sequences. The utility of these programs is illustrated by analysis of the locus control region of the β-like globin gene cluster of several mammals.
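The basic idea reviewed in the abstract, global alignment in linear space by divide and conquer, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`last_row`, `hirschberg`, `alignment_score`) and the scoring values (match +1, mismatch -1, a fixed per-symbol gap penalty of -1) are assumptions chosen for illustration, and the surveyed methods use more refined scoring schemes. The sketch computes only one row of the dynamic-programming matrix at a time, finds where an optimal alignment crosses the middle row of the first sequence, and recurses on the two halves.

```python
def last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment scores of `a` against every prefix of `b`.

    Only two rows of the dynamic-programming matrix are held at once,
    so memory use is proportional to len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Return one optimal global alignment of `a` and `b` as two gapped strings."""
    if len(a) == 0:
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single symbol with its best partner in b,
        # unless leaving it unpaired (two extra gaps) scores higher.
        best_j = max(range(len(b)),
                     key=lambda j: match if b[j] == a else mismatch)
        sub = match if b[best_j] == a else mismatch
        if sub >= 2 * gap:
            return "-" * best_j + a + "-" * (len(b) - best_j - 1), b
        return a + "-" * len(b), "-" + b

    mid = len(a) // 2
    # Forward pass: first half of a against prefixes of b.
    left = last_row(a[:mid], b, match, mismatch, gap)
    # Backward pass: second half of a against suffixes of b (both reversed).
    right = last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    # An optimal alignment splits b where the two partial scores sum highest.
    split = max(range(len(b) + 1),
                key=lambda j: left[j] + right[len(b) - j])

    left_a, left_b = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    right_a, right_b = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return left_a + right_a, left_b + right_b


def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score two equal-length gapped strings under the same scheme."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        elif p == q:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    top, bottom = hirschberg("AGTACGCA", "TATGC")
    print(top)
    print(bottom)
    print(alignment_score(top, bottom))
```

The score of the returned alignment can be checked against `last_row(a, b)[-1]`, which is the optimal global score computed without recovering the alignment itself.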

Original language: English (US)
Pages (from-to): 271-291
Number of pages: 21
Journal: Journal of Computational Biology
Volume: 1
Issue number: 4
DOI: 10.1089/cmb.1994.1.271
State: Published, Jan 1 1994

Fingerprint

Linear Space
Alignment
Sequence Alignment
Locus Control Region
Globins
Multigene Family
Mammals
Multisequences
DNA
Protein Sequence
Surveys and Questionnaires
Scoring
DNA Sequence
Proteins
Dynamic Programming
Locus
Exceed
Efficient Algorithms
Directly proportional

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

@article{59464357f27c423e81e4df3814f2cbef,
title = "Recent Developments in Linear-Space Alignment Methods: A Survey",
author = "Chao, {Kun Mao} and Hardison, {Ross Cameron} and Miller, Webb",
year = "1994",
month = "1",
day = "1",
doi = "10.1089/cmb.1994.1.271",
language = "English (US)",
volume = "1",
pages = "271--291",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "4",
}

Recent Developments in Linear-Space Alignment Methods: A Survey. / Chao, Kun Mao; Hardison, Ross Cameron; Miller, Webb.

In: Journal of Computational Biology, Vol. 1, No. 4, 01.01.1994, p. 271-291.

TY - JOUR

T1 - Recent Developments in Linear-Space Alignment Methods

T2 - A Survey

AU - Chao, Kun Mao

AU - Hardison, Ross Cameron

AU - Miller, Webb

PY - 1994/1/1

Y1 - 1994/1/1

AB - A dynamic-programming strategy for sequence alignment first proposed in 1975 by Dan Hirschberg can be adapted to yield a number of extremely space-efficient algorithms. Specifically, these algorithms align two sequences using only “linear space,” i.e., an amount of computer memory that is proportional to the sum of the lengths of the two sequences being aligned. This paper begins by reviewing the basic idea, as it applies to the global (i.e., end-to-end) alignment of two DNA or protein sequences. Three of our recent extensions of the technique are then outlined. The first extension computes an optimal alignment subject to the constraint that each position, i, of the first sequence must be aligned somewhere between positions L[i] and U[i] of the second sequence, for given values of L and U. The second finds all aligned position pairs (i.e., potential columns of the alignment) that occur in an alignment whose score exceeds a given threshold. The third treats the case where each of the two sequences is allowed to be an alignment (e.g., a sequence of aligned pairs), using a sensitive scoring scheme. We also describe two linear-space methods for computing k best local (i.e., involving only a part of each sequence) alignments, where k ≥ 1. One is a linear-space version of the algorithm of Waterman and Eggert (1987), and the other is based on the strategy proposed by Wilbur and Lipman (1983). Finally, we describe programs that implement various combinations of these techniques to provide a multisequence alignment method that is especially suited to handling a few very long sequences. The utility of these programs is illustrated by analysis of the locus control region of the β-like globin gene cluster of several mammals.

UR - http://www.scopus.com/inward/record.url?scp=0028679375&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028679375&partnerID=8YFLogxK

U2 - 10.1089/cmb.1994.1.271

DO - 10.1089/cmb.1994.1.271

M3 - Article

VL - 1

SP - 271

EP - 291

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 4

ER -