HECTOR: A parallel multistage homopolymer spectrum based error corrector for 454 sequencing data

Adrianto Wirawan, Robert Scott Harris, Yongchao Liu, Bertil Schmidt, Jan Schröder

Research output: Contribution to journal (Article)

20 Citations (Scopus)

Abstract

Background: Current-generation sequencing technologies are able to produce low-cost, high-throughput reads. However, the produced reads are imperfect and may contain various sequencing errors. Although many error correction methods have been developed in recent years, none explicitly targets homopolymer-length errors in 454 sequencing reads.

Results: We present HECTOR, a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. In this algorithm, we have investigated, for the first time, a novel homopolymer spectrum based approach to handle homopolymer insertions or deletions, which are the dominant sequencing errors in 454 pyrosequencing reads. We have evaluated the performance of HECTOR, in terms of correction quality, runtime and parallel scalability, using both simulated and real pyrosequencing datasets. This performance has been further compared to that of Coral, a state-of-the-art error corrector based on multiple sequence alignment, and Acacia, a recently published error corrector for amplicon pyrosequences. Our evaluations reveal that HECTOR demonstrates comparable correction quality to Coral, but runs 3.7× faster on average. In addition, HECTOR performs well even when the coverage of the dataset is low.

Conclusion: Our homopolymer spectrum based approach is theoretically capable of processing homopolymer-length errors of arbitrary length, with linear time complexity. HECTOR employs a multi-threaded design based on a master-slave computing model. Our experimental results show that HECTOR is a practical 454 pyrosequencing read error corrector which is competitive in terms of both correction quality and speed. The source code and all simulated data are available at: http://hector454.sourceforge.net.
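The abstract does not spell out how a homopolymer spectrum is built, so the following is only a minimal sketch under stated assumptions, not HECTOR's implementation. It assumes a read is collapsed into (base, run length) pairs and that the spectrum counts windows of k consecutive pairs. The names `to_homopolymers`, `khopo_spectrum` and `base_only` are hypothetical. Each read is scanned once, which matches the linear time behaviour the abstract claims for the approach.

```python
from collections import Counter
from itertools import groupby


def to_homopolymers(read):
    """Collapse a read into (base, run_length) pairs in one linear pass."""
    return [(base, sum(1 for _ in run)) for base, run in groupby(read)]


def khopo_spectrum(reads, k):
    """Count every window of k consecutive homopolymers across all reads.

    Assumption: this windowed count stands in for the 'homopolymer
    spectrum' named in the abstract; the paper's definition may differ.
    """
    spectrum = Counter()
    for read in reads:
        hopos = to_homopolymers(read)
        for i in range(len(hopos) - k + 1):
            spectrum[tuple(hopos[i:i + k])] += 1
    return spectrum


def base_only(window):
    """Drop run lengths, keeping only the base order of a window."""
    return tuple(base for base, _ in window)


# Toy data: the third read carries a shortened C run (a homopolymer deletion).
reads = ["ACCCGT", "ACCCGT", "ACCGT"]
spectrum = khopo_spectrum(reads, k=3)

common = (("A", 1), ("C", 3), ("G", 1))
rare = (("A", 1), ("C", 2), ("G", 1))
print(spectrum[common], spectrum[rare])  # prints "2 1"
```

In this toy input the two windows share the same base order and differ only in the length of the C run, so a rare window sitting next to a frequent one with identical `base_only` output is a natural candidate for a homopolymer-length correction. How HECTOR actually selects and applies corrections is not described in this record.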

Original language: English (US)
Article number: 131
Journal: BMC Bioinformatics
Volume: 15
Issue number: 1
DOIs: 10.1186/1471-2105-15-131
State: Published - May 6 2014

Fingerprint

Anthozoa
Corrector
Homopolymerization
Sequencing
Acacia
Sequence Alignment
Technology
Costs and Cost Analysis
Multiple Sequence Alignment
Datasets
Linear Complexity
Error correction
Imperfect
Deletion
High Throughput
Time Complexity
Insertion
Scalability
Linear Time

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Wirawan, Adrianto; Harris, Robert Scott; Liu, Yongchao; Schmidt, Bertil; Schröder, Jan. / HECTOR: A parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. In: BMC Bioinformatics. 2014; Vol. 15, No. 1.
@article{a162bfe743894db9bfc967d9a14a8fd8,
title = "HECTOR: A parallel multistage homopolymer spectrum based error corrector for 454 sequencing data",
author = "Adrianto Wirawan and Harris, {Robert Scott} and Yongchao Liu and Bertil Schmidt and Jan Schr{\"o}der",
year = "2014",
month = "5",
day = "6",
doi = "10.1186/1471-2105-15-131",
language = "English (US)",
volume = "15",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

HECTOR: A parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. / Wirawan, Adrianto; Harris, Robert Scott; Liu, Yongchao; Schmidt, Bertil; Schröder, Jan.

In: BMC bioinformatics, Vol. 15, No. 1, 131, 06.05.2014.

TY - JOUR

T1 - HECTOR

T2 - A parallel multistage homopolymer spectrum based error corrector for 454 sequencing data

AU - Wirawan, Adrianto

AU - Harris, Robert Scott

AU - Liu, Yongchao

AU - Schmidt, Bertil

AU - Schröder, Jan

PY - 2014/5/6

Y1 - 2014/5/6

UR - http://www.scopus.com/inward/record.url?scp=84900865525&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900865525&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-15-131

DO - 10.1186/1471-2105-15-131

M3 - Article

C2 - 24885381

AN - SCOPUS:84900865525

VL - 15

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 131

ER -