TY - JOUR
T1 - Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution
AU - Fang, Han
AU - Huang, Yi Fei
AU - Radhakrishnan, Aditya
AU - Siepel, Adam
AU - Lyon, Gholson J.
AU - Schatz, Michael C.
N1 - Funding Information:
We would like to thank Allen Buskirk, Rachel Green, and Fritz Sedlazeck for providing constructive comments on the manuscript. We also want to thank Rob Patro, Noah Dukler, and Max Doerfel for helpful discussions. This project was supported in part by the US NIH (R01-HG006677) and US National Science Foundation (DBI-1350041) to M.C.S., the Cold Spring Harbor Laboratory (CSHL) Cancer Center Support Grant (5P30CA045508), and the NIH (NIGMS) grant GM102192 to A.S. G.J.L. serves on advisory boards for GenePeeks, Omicia, and Seven Bridges Genomics, is a consultant to Genos, and previously served as a consultant to Good Start Genetics.
Publisher Copyright:
© 2017 Elsevier Inc.
PY - 2018/2/28
Y1 - 2018/2/28
N2 - Ribosome profiling (Ribo-seq) is a powerful technique for measuring protein translation; however, sampling errors and biological biases are prevalent and poorly understood. Addressing these issues, we present Scikit-ribo (https://github.com/schatzlab/scikit-ribo), an open-source analysis package for accurate genome-wide A-site prediction and translation efficiency (TE) estimation from Ribo-seq and RNA sequencing data. Scikit-ribo accurately identifies A-site locations and reproduces codon elongation rates using several digestion protocols (r = 0.99). Next, we show that the commonly used TE estimation derived from reads per kilobase of transcript per million mapped reads (RPKM) is prone to biases, especially for low-abundance genes. Scikit-ribo introduces a codon-level generalized linear model with a ridge penalty that correctly estimates TE, while accommodating variable codon elongation rates and mRNA secondary structure. This corrects the TE errors for over 2,000 genes in S. cerevisiae, which we validate using mass spectrometry of protein abundances (r = 0.81), and allows us to determine the Kozak-like sequence directly from Ribo-seq. We conclude with an analysis of the coverage required for robust codon-level analysis and quantify the artifacts that can occur from cycloheximide treatment. A new open-source statistical learning software package enables accurate analysis of translational efficiency from Ribo-seq and RNA-seq data. Using it corrects the biases for thousands of genes in S. cerevisiae, which enables improved estimates of relative protein abundances and the discovery of the Kozak-like regulatory sequence in yeast from Ribo-seq data.
UR - http://www.scopus.com/inward/record.url?scp=85040635351&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040635351&partnerID=8YFLogxK
U2 - 10.1016/j.cels.2017.12.007
DO - 10.1016/j.cels.2017.12.007
M3 - Article
C2 - 29361467
AN - SCOPUS:85040635351
SN - 2405-4712
VL - 6
SP - 180-191.e4
JO - Cell Systems
JF - Cell Systems
IS - 2
ER -