TY - JOUR
T1 - Safe and Complete Contig Assembly Through Omnitigs
AU - Tomescu, Alexandru I.
AU - Medvedev, Paul
N1 - Funding Information:
This work was supported, in part, by NSF awards DBI-1356529, IIS-1453527, and IIS-1421908 to P.M. and by Academy of Finland grant 274977 to A.I.T.
Publisher Copyright:
© 2017, Mary Ann Liebert, Inc.
PY - 2017/6
Y1 - 2017/6
AB - Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs: a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question remains: given a genome graph G (e.g., a de Bruijn graph or a string graph), what are all the strings that can be safely reported from G as contigs? In this article, we answer this question using a model in which the genome is a circular covering walk. We also give a polynomial-time algorithm to find such strings, which we call omnitigs. Our experiments show that omnitigs are 66%-82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.
UR - http://www.scopus.com/inward/record.url?scp=85020414373&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020414373&partnerID=8YFLogxK
U2 - 10.1089/cmb.2016.0141
DO - 10.1089/cmb.2016.0141
M3 - Article
C2 - 27749096
AN - SCOPUS:85020414373
SN - 1066-5277
VL - 24
SP - 590
EP - 602
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 6
ER -