Optimal omnitig listing for safe and complete contig assembly

Massimo Cairo, Paul Medvedev, Nidia Obscura Acosta, Romeo Rizzi, Alexandru I. Tomescu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Genome assembly is the problem of reconstructing a genome sequence from a set of reads from a sequencing experiment. Typical formulations of the assembly problem admit in practice many genomic reconstructions, and actual genome assemblers usually output contigs, namely substrings that are promised to occur in the genome. To bridge the theory and practice, Tomescu and Medvedev [RECOMB 2016] reformulated contig assembly as finding all substrings common to all genomic reconstructions. They also gave a characterization of those walks (omnitigs) that are common to all closed edge-covering walks of a (directed) graph, a typical notion of genomic reconstruction. An algorithm for listing all maximal omnitigs was also proposed, by launching an exhaustive visit from every edge. In this paper, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Ω(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev [RECOMB 2016].
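As an illustration of the notion of genomic reconstruction used in the abstract, the following minimal Python sketch checks whether a sequence of edges is a closed edge-covering walk of a directed graph. It is not the paper's omnitig-listing algorithm. The function name `is_closed_edge_covering_walk` and the representation of edges as `(tail, head)` pairs are assumptions made here for illustration, and parallel edges are not modeled.

```python
from typing import Hashable, Iterable, Sequence, Tuple

# Assumption: an edge is a (tail, head) pair; parallel edges are not modeled.
Edge = Tuple[Hashable, Hashable]


def is_closed_edge_covering_walk(edges: Iterable[Edge], walk: Sequence[Edge]) -> bool:
    """Return True if `walk` is a closed walk that uses every graph edge at least once."""
    edge_set = set(edges)
    if not walk:
        return False
    # Every step of the walk must be an edge of the graph.
    if any(step not in edge_set for step in walk):
        return False
    # Consecutive edges must chain head-to-tail; the modulo wraps the last
    # edge back to the first, which enforces that the walk is closed.
    for i, (_, head) in enumerate(walk):
        next_tail = walk[(i + 1) % len(walk)][0]
        if head != next_tail:
            return False
    # Edge-covering: each edge of the graph appears somewhere in the walk.
    return set(walk) == edge_set


graph_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a")]
covering = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "b"), ("b", "a")]
cycle_only = [("a", "b"), ("b", "c"), ("c", "a")]
print(is_closed_edge_covering_walk(graph_edges, covering))
print(is_closed_edge_covering_walk(graph_edges, cycle_only))
```

In this toy graph the first walk repeats the edge a→b so that it can also traverse b→a, which is allowed because a walk may reuse edges. The second walk is closed but misses b→a, so it is not edge-covering.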

Original language: English (US)
Title of host publication: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Editors: Juha Kärkkäinen, Jakub Radoszewski, Wojciech Rytter
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959770392
State: Published - Jul 1 2017
Event: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017 - Warsaw, Poland
Duration: Jul 4 2017 to Jul 6 2017

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 78
ISSN (Print): 1868-8969

Other

Other: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017
Country: Poland
City: Warsaw
Period: 7/4/17 to 7/6/17

All Science Journal Classification (ASJC) codes

  • Software
