Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm

Ha Minh Lam, Oliver Ratmann, Maciej F. Boni

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn^3) to O(mn^2), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
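The exact statistic behind these P values can be illustrated with a small computation. The sketch below is not the authors' O(mn^2) algorithm. It assumes, following the 2007 3SEQ method, that the m and n informative sites are encoded as +1 and -1 steps of a random walk, that the test statistic is the walk's maximum descent (its largest drop below a previous high point), and that under the null hypothesis of no recombination all C(m+n, m) orderings of the steps are equally likely. The function name max_descent_tail is hypothetical. It uses dynamic programming over (up-steps used, down-steps used, current descent) to count the orderings whose descent never reaches k, so a single tail probability costs O(m·n·k) time.

```python
from fractions import Fraction
from math import comb


def max_descent_tail(m: int, n: int, k: int) -> Fraction:
    """Exact P(max descent >= k) for a uniformly random ordering of
    m up-steps (+1) and n down-steps (-1).

    Hypothetical helper for illustration only (assumed null model: all
    orderings equally likely); not the 3SEQ implementation.
    """
    if k <= 0:
        return Fraction(1)  # the max descent is always >= 0
    if k > n:
        return Fraction(0)  # a descent can never exceed the number of down-steps
    # dp[i][j][d]: number of prefixes with i up-steps and j down-steps whose
    # current descent (running maximum minus current height) is d and whose
    # descent has stayed below k at every earlier step.
    dp = [[[0] * k for _ in range(n + 1)] for _ in range(m + 1)]
    dp[0][0][0] = 1
    for i in range(m + 1):
        for j in range(n + 1):
            for d in range(k):
                count = dp[i][j][d]
                if count == 0:
                    continue
                if i < m:
                    # Up-step: move back toward (or set a new) running maximum.
                    dp[i + 1][j][max(d - 1, 0)] += count
                if j < n and d + 1 < k:
                    # Down-step that keeps the descent below k; paths that
                    # would reach k are dropped (they contribute to the tail).
                    dp[i][j + 1][d + 1] += count
    below_k = sum(dp[m][n])
    return 1 - Fraction(below_k, comb(m + n, m))
```

For example, with m = 1 and n = 2 the orderings are +--, -+- and --+; only -+- keeps its descent below 2, so max_descent_tail(1, 2, 2) returns Fraction(2, 3). Exact rational arithmetic keeps very small tail probabilities precise, in the spirit of the abstract's point about multiple-comparison correction, at the cost of speed on large inputs.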

Original language: English (US)
Pages (from-to): 247-251
Number of pages: 5
Journal: Molecular Biology and Evolution
Volume: 35
Issue number: 1
DOI: 10.1093/molbev/msx263
State: Published - Jan 1 2018

Fingerprint

Genetic Recombination
recombination
sequence alignment
Databases
Benchmarking
Mosaicism
statistical analysis
statistics
Sequence Alignment
Viral Genome
Nonparametric Statistics
genome
genomics
testing
detection
comparison
alignment
analysis
genetic databases
test

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

@article{f82dd29fcb554317b5db7381f87d08d1,
title = "Improved Algorithmic Complexity for the {3SEQ} Recombination Detection Algorithm",
abstract = "Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn$^3$) to O(mn$^2$), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.",
author = "Lam, {Ha Minh} and Oliver Ratmann and Boni, {Maciej F.}",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/molbev/msx263",
language = "English (US)",
volume = "35",
pages = "247--251",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "1",

}

Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm. / Lam, Ha Minh; Ratmann, Oliver; Boni, Maciej F.

In: Molecular Biology and Evolution, Vol. 35, No. 1, 01.01.2018, p. 247-251.


TY  - JOUR
T1  - Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm
AU  - Lam, Ha Minh
AU  - Ratmann, Oliver
AU  - Boni, Maciej F.
PY  - 2018/1/1
Y1  - 2018/1/1
N2  - Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn^3) to O(mn^2), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
AB  - Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn^3) to O(mn^2), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
UR  - http://www.scopus.com/inward/record.url?scp=85040551882&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85040551882&partnerID=8YFLogxK
U2  - 10.1093/molbev/msx263
DO  - 10.1093/molbev/msx263
M3  - Article
C2  - 29029186
AN  - SCOPUS:85040551882
VL  - 35
SP  - 247
EP  - 251
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
SN  - 0737-4038
IS  - 1
ER  - 