Accurate assembly of multi-end RNA-seq data with Scallop2

Qimin Zhang, Qian Shi, Mingfu Shao

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Modern RNA-sequencing (RNA-seq) protocols can produce multi-end data, where multiple reads originating from the same transcript are attached to the same barcode. The long-range information in the multi-end reads is beneficial in phasing complicated spliced isoforms, but assembly algorithms that leverage such information are lacking. Here we introduce Scallop2, a reference-based assembler optimized for multi-end RNA-seq data. The algorithmic core of Scallop2 consists of three steps: (1) using an algorithm to ‘bridge’ multi-end reads into single-end phasing paths in the context of a splice graph, (2) employing a method to refine erroneous splice graphs by utilizing multi-end reads that fail to bridge and (3) piping the refined splice graph and bridged phasing paths into an algorithm that integrates multiple phase-preserving decompositions. Tested on 561 cells in two Smart-seq3 datasets and on ten Illumina paired-end RNA-seq samples, Scallop2 substantially improves the assembly accuracy compared with two popular assemblers (StringTie2 and Scallop).
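The 'bridging' idea in step (1) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a splice graph stored as a weighted DAG whose vertices are numbered in genomic order, and it assumes, purely for illustration, that the preferred connecting path is the one whose weakest edge has the highest coverage. The names `Graph`, `bridge` and `toy` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy splice graph: vertex -> {successor: edge weight (read coverage)}.
# Assumption: vertices are numbered in topological (genomic) order,
# so every edge goes from a smaller to a larger vertex id.
Graph = Dict[int, Dict[int, float]]


def bridge(graph: Graph, end1: List[int], end2: List[int]) -> Optional[List[int]]:
    """Join two read ends into one path, or return None if they cannot be bridged.

    end1 and end2 are the vertex paths covered by the two ends of a fragment.
    Among all paths from the last vertex of end1 to the first vertex of end2,
    pick one whose minimum edge weight (its bottleneck) is largest.
    """
    src, dst = end1[-1], end2[0]
    if src > dst:
        return None  # out-of-order ends are not handled in this sketch
    # best[v] = (bottleneck of the best path src..v, predecessor of v on it)
    best: Dict[int, Tuple[float, Optional[int]]] = {src: (float("inf"), None)}
    for u in sorted(graph):  # topological order by the numbering assumption
        if u not in best or u >= dst:
            continue
        for v, w in graph[u].items():
            cand = min(best[u][0], w)
            if v <= dst and (v not in best or cand > best[v][0]):
                best[v] = (cand, u)
    if dst not in best:
        return None  # no connecting path: the fragment fails to bridge
    # Walk predecessors back from dst to recover the connecting path.
    middle: List[int] = []
    v: Optional[int] = dst
    while v is not None:
        middle.append(v)
        v = best[v][1]
    middle.reverse()  # now src ... dst
    return end1[:-1] + middle + end2[1:]


# Two alternative routes from vertex 0 to vertex 3: via 1 (well covered)
# or via 2 (weakly covered).
toy: Graph = {0: {1: 10, 2: 3}, 1: {3: 8}, 2: {3: 3}, 3: {4: 11}, 4: {}}
print(bridge(toy, [0], [3, 4]))  # prints [0, 1, 3, 4]
```

In this sketch, a fragment whose ends cannot be connected returns `None`. Fragments of that kind are what step (2) of the abstract uses to refine the splice graph.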

Original language: English (US)
Pages (from-to): 148-152
Number of pages: 5
Journal: Nature Computational Science
Volume: 2
Issue number: 3
State: Published - Mar 2022

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Science Applications
  • Computer Networks and Communications
