TY - JOUR
T1 - Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads
T2 - Evaluation of Effective Study Designs
AU - Mizrahi-Man, Orna
AU - Davenport, Emily R.
AU - Gilad, Yoav
N1 - Funding Information:
We thank members of the Gilad lab for helpful discussions and C. Ober and Z. Gauhar for comments on the manuscript. This work was supported by NIH grant HL092206 to YG, and by a pilot and feasibility DDRCC grant to YG, funded by P30DK42086.
PY - 2013/1/15
Y1 - 2013/1/15
N2 - Massively parallel, high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies present a predicament, however: although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ~8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short-read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.
AB - Massively parallel, high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies present a predicament, however: although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ~8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short-read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.
UR - http://www.scopus.com/inward/record.url?scp=84872132821&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872132821&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0053608
DO - 10.1371/journal.pone.0053608
M3 - Article
C2 - 23308262
AN - SCOPUS:84872132821
VL - 8
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 1
M1 - e53608
ER -