Comparison of next generation sequencing technologies for transcriptome characterization

P. Kerr Wall, Jim Leebens-Mack, André S. Chanderbali, Abdelali Barakat, Erik Wolcott, Haiying Liang, Lena Landherr Sheaffer, Lynn P. Tomsho, Yi Hu, John Edward Carlson, Hong Ma, Stephan C. Schuster, Douglas E. Soltis, Pamela S. Soltis, Naomi S. Altman, Claude Walker Depamphilis

Research output: Contribution to journal (Article)

156 Citations (Scopus)

Abstract

Background: We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra-high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis.

Results: The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTRs. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than sequencing of non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics.

Conclusion: NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but it also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
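The idea of simulating additional rounds of sequencing to see how many genes get tagged can be illustrated with a toy model. The sketch below is not the authors' simulator and does not reproduce ESTcalc; it only assumes that each read comes from a gene drawn in proportion to that gene's transcript abundance. The function name `simulate_genes_tagged`, the gene count, and both abundance profiles (a Zipf-like one standing in for a non-normalized library, a flat one for an idealized normalized library) are hypothetical choices made for illustration.

```python
import random


def simulate_genes_tagged(abundances, n_reads, seed=0):
    """Return how many distinct genes are hit by n_reads simulated reads.

    Each read is assigned to a gene with probability proportional to
    that gene's abundance (an assumption of this toy model).
    """
    rng = random.Random(seed)
    gene_ids = range(len(abundances))
    hits = rng.choices(gene_ids, weights=abundances, k=n_reads)
    return len(set(hits))


# Hypothetical abundance profiles, not taken from the paper.
n_genes = 20_000
skewed = [1.0 / (rank + 1) for rank in range(n_genes)]  # non-normalized-like
flat = [1.0] * n_genes                                  # idealized normalized

if __name__ == "__main__":
    for n_reads in (10_000, 100_000):
        print(
            n_reads,
            "reads: skewed =", simulate_genes_tagged(skewed, n_reads),
            "flat =", simulate_genes_tagged(flat, n_reads),
        )
```

Under these assumptions the flat profile tags more distinct genes than the skewed one at the same read depth, which matches the qualitative observation in the abstract that normalized libraries tagged more genes. Read length, sequencing errors, and per-platform costs are deliberately left out.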

Original language: English (US)
Article number: 347
Journal: BMC Genomics
Volume: 10
DOI: https://doi.org/10.1186/1471-2164-10-347
State: Published - Aug 1 2009

Fingerprint

Eschscholzia
Transcriptome
Arabidopsis
Persea
Untranslated Regions
Complementary DNA
Technology
Costs and Cost Analysis
Introns
Exons
Expressed Sequence Tags
Genes
Clone Cells
Genome
Gene Expression

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Genetics

Cite this

Wall, P. K., Leebens-Mack, J., Chanderbali, A. S., Barakat, A., Wolcott, E., Liang, H., ... Depamphilis, C. W. (2009). Comparison of next generation sequencing technologies for transcriptome characterization. BMC genomics, 10, [347]. https://doi.org/10.1186/1471-2164-10-347
@article{4a8f4af8dd354ce686167c96b55ac158,
title = "Comparison of next generation sequencing technologies for transcriptome characterization",
author = "Wall, {P. Kerr} and Jim Leebens-Mack and Chanderbali, {Andr{\'e} S.} and Abdelali Barakat and Erik Wolcott and Haiying Liang and Sheaffer, {Lena Landherr} and Tomsho, {Lynn P.} and Yi Hu and Carlson, {John Edward} and Hong Ma and Schuster, {Stephan C.} and Soltis, {Douglas E.} and Soltis, {Pamela S.} and Altman, {Naomi S.} and Depamphilis, {Claude Walker}",
year = "2009",
month = "8",
day = "1",
doi = "10.1186/1471-2164-10-347",
language = "English (US)",
volume = "10",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Comparison of next generation sequencing technologies for transcriptome characterization

AU - Wall, P. Kerr

AU - Leebens-Mack, Jim

AU - Chanderbali, André S.

AU - Barakat, Abdelali

AU - Wolcott, Erik

AU - Liang, Haiying

AU - Sheaffer, Lena Landherr

AU - Tomsho, Lynn P.

AU - Hu, Yi

AU - Carlson, John Edward

AU - Ma, Hong

AU - Schuster, Stephan C.

AU - Soltis, Douglas E.

AU - Soltis, Pamela S.

AU - Altman, Naomi S.

AU - Depamphilis, Claude Walker

PY - 2009/8/1

Y1 - 2009/8/1

UR - http://www.scopus.com/inward/record.url?scp=69449099392&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69449099392&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-10-347

DO - 10.1186/1471-2164-10-347

M3 - Article

C2 - 19646272

AN - SCOPUS:69449099392

VL - 10

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 347

ER -
