SAM: String-based sequence search algorithm for mitochondrial DNA database queries

Alexander Röck, Jodi Irwin, Arne Dür, Thomas Parsons, Walther Parson

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as nucleotide strings. Traditionally, they are presented in a difference-coded, position-based format relative to the corrected version of the first sequenced mtDNA (the revised Cambridge Reference Sequence, rCRS). This convention depends on recommendations for standardized sequence alignment, and alignment practice is known to vary between scientific disciplines and even between laboratories. As a consequence, database searches, which are vital for the interpretation of mtDNA data, can return biased results when query and database haplotypes are annotated differently. In the forensic context, this would usually lead to underestimation of the absolute and relative frequencies of a haplotype. To address this issue, we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. Simply applying a BLAST algorithm would not be a sufficient remedy, as it uses a heuristic approach and does not address properties specific to mtDNA, such as insertion and deletion events, some of which are phylogenetically stable while others evolve rapidly. The software presented here also provides the flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that could refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org).
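The core idea described in the abstract, comparing haplotypes as reconstructed nucleotide strings rather than as lists of position-based codes, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the nine-base toy reference stands in for a stretch of the rCRS, and the variant notation and the function names `to_string` and `query` are hypothetical. They are not SAM's actual input format or implementation. The example mimics the situation in homopolymeric stretches, where the same insertion or deletion can be written at several different positions.

```python
"""Minimal sketch of string-based haplotype matching (hypothetical, not SAM itself)."""

# Hypothetical toy reference standing in for a short stretch of the rCRS;
# positions are numbered 1..9 as in difference-coded notation.
REFERENCE = "GACCCCTAG"


def to_string(variants, reference=REFERENCE):
    """Convert difference-coded variants into a position-free nucleotide string.

    Assumed (illustrative) notation: '8G' substitutes position 8 with G,
    '6.1C' inserts C as the first extra base after position 6, and
    '3-' deletes the base at position 3.
    """
    bases = list(reference)
    inserts = {}  # reference position -> {insertion index: inserted base}
    for code in variants:
        position, allele = code[:-1], code[-1]
        if "." in position:
            pos, index = position.split(".")
            inserts.setdefault(int(pos), {})[int(index)] = allele
        else:
            bases[int(position) - 1] = "" if allele == "-" else allele
    out = []
    for pos, base in enumerate(bases, start=1):
        out.append(base)  # empty string for a deleted position
        for index in sorted(inserts.get(pos, {})):
            out.append(inserts[pos][index])
    return "".join(out)


def query(query_variants, database):
    """Return (absolute count, relative frequency) of string-identical haplotypes."""
    target = to_string(query_variants)
    count = sum(1 for entry in database if to_string(entry) == target)
    return count, count / len(database)


# Toy database: '6.1C' and '2.1C' write the same C insertion at different
# positions around one C-stretch; '3-' and '6-' do the same for a deletion.
database = [["6.1C"], ["2.1C"], ["8G"], ["3-"], ["6-"]]

print(query(["6.1C"], database))                          # (2, 0.4)
print(sum(1 for entry in database if entry == ["6.1C"]))  # 1: naive code comparison
```

In this toy case the two differently annotated insertions reconstruct the same string, so the string-based query finds two matches where a direct comparison of the codes finds only one, which is the kind of frequency underestimation the abstract describes. Because the comparison is exact string equality rather than a heuristic search, the result is deterministic; ambiguous bases, missing data and differing sequence ranges are deliberately left out of this sketch.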

Original language: English (US)
Pages (from-to): 126-132
Number of pages: 7
Journal: Forensic Science International: Genetics
Volume: 5
Issue number: 2
DOI: 10.1016/j.fsigen.2010.10.006
State: Published - Mar 1 2011

Fingerprint

Nucleic Acid Databases
Mitochondrial DNA
Databases
Haplotypes
Software
Nucleotides
Forensic Genetics
Mitochondrial Genome
Manuscripts
Sequence Alignment
Haploidy
Population Genetics
Mutation Rate

All Science Journal Classification (ASJC) codes

  • Pathology and Forensic Medicine
  • Genetics

Cite this

Röck, Alexander ; Irwin, Jodi ; Dür, Arne ; Parsons, Thomas ; Parson, Walther. / SAM : String-based sequence search algorithm for mitochondrial DNA database queries. In: Forensic Science International: Genetics. 2011 ; Vol. 5, No. 2. pp. 126-132.
@article{1fcb0664f0064828aab3e287e9896f03,
title = "SAM: String-based sequence search algorithm for mitochondrial DNA database queries",
author = "Alexander R{\"o}ck and Jodi Irwin and Arne D{\"u}r and Thomas Parsons and Walther Parson",
year = "2011",
month = "3",
day = "1",
doi = "10.1016/j.fsigen.2010.10.006",
language = "English (US)",
volume = "5",
pages = "126--132",
journal = "Forensic Science International: Genetics",
issn = "1872-4973",
publisher = "Elsevier Ireland Ltd",
number = "2",

}

SAM : String-based sequence search algorithm for mitochondrial DNA database queries. / Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther.

In: Forensic Science International: Genetics, Vol. 5, No. 2, 01.03.2011, p. 126-132.

TY - JOUR

T1 - SAM

T2 - String-based sequence search algorithm for mitochondrial DNA database queries

AU - Röck, Alexander

AU - Irwin, Jodi

AU - Dür, Arne

AU - Parsons, Thomas

AU - Parson, Walther

PY - 2011/3/1

Y1 - 2011/3/1

UR - http://www.scopus.com/inward/record.url?scp=79952708826&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952708826&partnerID=8YFLogxK

U2 - 10.1016/j.fsigen.2010.10.006

DO - 10.1016/j.fsigen.2010.10.006

M3 - Article

C2 - 21056022

AN - SCOPUS:79952708826

VL - 5

SP - 126

EP - 132

JO - Forensic Science International: Genetics

JF - Forensic Science International: Genetics

SN - 1872-4973

IS - 2

ER -