Ab initio whole genome shotgun assembly with mated short reads

Paul Medvedev, Michael Brudno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Scopus citations

Abstract

Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown whether they can be used for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy counts with extremely high accuracy, while assembling long contigs.

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 12th Annual International Conference, RECOMB 2008, Proceedings
Pages: 50-64
Number of pages: 15
DOI: https://doi.org/10.1007/978-3-540-78839-3_5
State: Published - Jul 21 2008
Event: "12th Annual International Conference on REsearch in COmputational Molecular Biology, RECOMB 2008" - Singapore, Singapore
Duration: Mar 30 2008 to Apr 2 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4955 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: "12th Annual International Conference on REsearch in COmputational Molecular Biology, RECOMB 2008"
Country: Singapore
City: Singapore
Period: 3/30/08 to 4/2/08

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Medvedev, P., & Brudno, M. (2008). Ab initio whole genome shotgun assembly with mated short reads. In Research in Computational Molecular Biology - 12th Annual International Conference, RECOMB 2008, Proceedings (pp. 50-64). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4955 LNBI). https://doi.org/10.1007/978-3-540-78839-3_5