Alignathon: A competitive assessment of whole-genome alignment methods

Dent Earl, Ngan Nguyen, Glenn Hickey, Robert S. Harris, Stephen Fitzgerald, Kathryn Beal, Igor Seledtsov, Vladimir Molodtsov, Brian J. Raney, Hiram Clawson, Jaebum Kim, Carsten Kemena, Jia Ming Chang, Ionas Erb, Alexander Poliakov, Minmei Hou, Javier Herrero, William James Kent, Victor Solovyev, Aaron E. Darling, Jian Ma, Cedric Notredame, Michael Brudno, Inna Dubchak, David Haussler, Benedict Paten

Research output: Contribution to journal › Article

36 Citations (Scopus)

Abstract

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

Original language: English (US)
Pages (from-to): 2077-2089
Number of pages: 13
Journal: Genome Research
Volume: 24
Issue number: 12
DOI: https://doi.org/10.1101/gr.174920.114
State: Published - Dec 1 2014

Fingerprint

Benchmarking
Sequence Alignment
Genome
Phylogeny
Diptera
Primates
Datasets
Proteins

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

Cite this

Earl, D., Nguyen, N., Hickey, G., Harris, R. S., Fitzgerald, S., Beal, K., ... Paten, B. (2014). Alignathon: A competitive assessment of whole-genome alignment methods. Genome Research, 24(12), 2077-2089. https://doi.org/10.1101/gr.174920.114
@article{1691ffc5d56b491787ed39b1f6b87e3f,
title = "Alignathon: A competitive assessment of whole-genome alignment methods",
author = "Dent Earl and Ngan Nguyen and Glenn Hickey and Harris, {Robert S.} and Stephen Fitzgerald and Kathryn Beal and Igor Seledtsov and Vladimir Molodtsov and Raney, {Brian J.} and Hiram Clawson and Jaebum Kim and Carsten Kemena and Chang, {Jia Ming} and Ionas Erb and Alexander Poliakov and Minmei Hou and Javier Herrero and Kent, {William James} and Victor Solovyev and Darling, {Aaron E.} and Jian Ma and Cedric Notredame and Michael Brudno and Inna Dubchak and David Haussler and Benedict Paten",
year = "2014",
month = "12",
day = "1",
doi = "10.1101/gr.174920.114",
language = "English (US)",
volume = "24",
pages = "2077--2089",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "12",

}

TY - JOUR

T1 - Alignathon

T2 - A competitive assessment of whole-genome alignment methods

AU - Earl, Dent

AU - Nguyen, Ngan

AU - Hickey, Glenn

AU - Harris, Robert S.

AU - Fitzgerald, Stephen

AU - Beal, Kathryn

AU - Seledtsov, Igor

AU - Molodtsov, Vladimir

AU - Raney, Brian J.

AU - Clawson, Hiram

AU - Kim, Jaebum

AU - Kemena, Carsten

AU - Chang, Jia Ming

AU - Erb, Ionas

AU - Poliakov, Alexander

AU - Hou, Minmei

AU - Herrero, Javier

AU - Kent, William James

AU - Solovyev, Victor

AU - Darling, Aaron E.

AU - Ma, Jian

AU - Notredame, Cedric

AU - Brudno, Michael

AU - Dubchak, Inna

AU - Haussler, David

AU - Paten, Benedict

PY - 2014/12/1

Y1 - 2014/12/1

UR - http://www.scopus.com/inward/record.url?scp=84913533708&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84913533708&partnerID=8YFLogxK

U2 - 10.1101/gr.174920.114

DO - 10.1101/gr.174920.114

M3 - Article

C2 - 25273068

AN - SCOPUS:84913533708

VL - 24

SP - 2077

EP - 2089

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 12

ER -