Prediction of RNA binding sites in proteins from amino acid sequence

Michael Terribilini, Jae Hyung Lee, Changhui Yan, Robert L. Jernigan, Vasant Honavar, Drena Dobbs

Research output: Contribution to journal › Article

136 Citations (Scopus)

Abstract

RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.) Published by Cold Spring Harbor Laboratory Press.
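The approach summarized in the abstract (a Naive Bayes classifier over primary sequence, evaluated by leave-one-out cross-validation, with a user-adjustable trade-off between sensitivity and specificity) can be illustrated with a minimal sketch. This is not the RNABindR implementation. The window length, Laplace smoothing, log-odds threshold, holding out one protein at a time, the toy sequences and labels, and every function and class name below are assumptions made for illustration. Sensitivity and specificity use the standard definitions TP/(TP+FN) and TN/(TN+FP), which the abstract does not spell out and which may differ from the paper's.

```python
import math
from collections import defaultdict

WINDOW = 5  # assumed odd window length; not taken from the source


def windows(seq, size=WINDOW, pad="-"):
    """Return one window per residue, centred on it and padded at the ends."""
    half = size // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + size] for i in range(len(seq))]


class WindowNaiveBayes:
    """Naive Bayes over window positions: each position is treated as an
    independent categorical feature given the class (1 = interface)."""

    def __init__(self, size=WINDOW, alpha=1.0):
        self.size = size
        self.alpha = alpha  # Laplace smoothing constant (assumed)
        self.alphabet = set()
        self.class_counts = {0: 0, 1: 0}
        # counts[label][position][residue] -> number of occurrences
        self.counts = {c: [defaultdict(int) for _ in range(size)] for c in (0, 1)}

    def fit(self, sequences, labels):
        for seq, lab in zip(sequences, labels):
            for win, y in zip(windows(seq, self.size), lab):
                y = int(y)
                self.class_counts[y] += 1
                for pos, aa in enumerate(win):
                    self.counts[y][pos][aa] += 1
                    self.alphabet.add(aa)
        return self

    def _log_joint(self, win, c):
        k = len(self.alphabet)
        total = self.class_counts[0] + self.class_counts[1]
        lp = math.log((self.class_counts[c] + self.alpha) / (total + 2 * self.alpha))
        for pos, aa in enumerate(win):
            seen = self.counts[c][pos].get(aa, 0)
            lp += math.log((seen + self.alpha) / (self.class_counts[c] + self.alpha * k))
        return lp

    def log_odds(self, win):
        """log P(interface, window) - log P(non-interface, window)."""
        return self._log_joint(win, 1) - self._log_joint(win, 0)

    def predict(self, seq, threshold=0.0):
        """Raising the threshold favours specificity; lowering it favours sensitivity."""
        return [int(self.log_odds(w) > threshold) for w in windows(seq, self.size)]


def leave_one_out(data, threshold=0.0):
    """Hold out each protein in turn; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for i, (seq, lab) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        model = WindowNaiveBayes().fit([s for s, _ in rest], [l for _, l in rest])
        for pred, true in zip(model.predict(seq, threshold), lab):
            true = int(true)
            tp += pred == 1 and true == 1
            fn += pred == 0 and true == 1
            tn += pred == 0 and true == 0
            fp += pred == 1 and true == 0
    return tp / (tp + fn), tn / (tn + fp)


# Toy data invented for this sketch: (sequence, per-residue labels, '1' = interface).
TOY = [
    ("MKRRGKSA", "00111100"),
    ("AKRKLLSD", "01110000"),
    ("GSRKRAEE", "00111000"),
]

if __name__ == "__main__":
    for t in (-2.0, 0.0, 2.0):
        sens, spec = leave_one_out(TOY, threshold=t)
        print(f"threshold={t:+.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sweeping the threshold in this sketch is one simple way to move between a high-sensitivity and a high-specificity operating point; a real evaluation would need a nonredundant data set of structurally defined interfaces rather than the invented sequences used here.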

Original language: English (US)
Pages (from-to): 1450-1462
Number of pages: 13
Journal: RNA
Volume: 12
Issue number: 8
DOIs: https://doi.org/10.1261/rna.2197306
State: Published - Aug 3 2006

Fingerprint

Amino Acid Sequence
Binding Sites
RNA
Proteins
RNA-Binding Proteins
Biological Phenomena
Virus Assembly
Amino Acids
Telomerase
Gene Expression Regulation
Protein Sorting Signals
Databases

All Science Journal Classification (ASJC) codes

  • Molecular Biology

Cite this

Terribilini, M., Lee, J. H., Yan, C., Jernigan, R. L., Honavar, V., & Dobbs, D. (2006). Prediction of RNA binding sites in proteins from amino acid sequence. RNA, 12(8), 1450-1462. https://doi.org/10.1261/rna.2197306
Terribilini, Michael ; Lee, Jae Hyung ; Yan, Changhui ; Jernigan, Robert L. ; Honavar, Vasant ; Dobbs, Drena. / Prediction of RNA binding sites in proteins from amino acid sequence. In: RNA. 2006 ; Vol. 12, No. 8. pp. 1450-1462.
@article{793f090932f84aa1b9e84b9e0ba9616f,
title = "Prediction of RNA binding sites in proteins from amino acid sequence",
abstract = "RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85{\%} overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.) Published by Cold Spring Harbor Laboratory Press.",
author = "Michael Terribilini and Lee, {Jae Hyung} and Changhui Yan and Jernigan, {Robert L.} and Vasant Honavar and Drena Dobbs",
year = "2006",
month = "8",
day = "3",
doi = "10.1261/rna.2197306",
language = "English (US)",
volume = "12",
pages = "1450--1462",
journal = "RNA",
issn = "1355-8382",
publisher = "Cold Spring Harbor Laboratory Press",
number = "8",

}

Terribilini, M, Lee, JH, Yan, C, Jernigan, RL, Honavar, V & Dobbs, D 2006, 'Prediction of RNA binding sites in proteins from amino acid sequence', RNA, vol. 12, no. 8, pp. 1450-1462. https://doi.org/10.1261/rna.2197306

Prediction of RNA binding sites in proteins from amino acid sequence. / Terribilini, Michael; Lee, Jae Hyung; Yan, Changhui; Jernigan, Robert L.; Honavar, Vasant; Dobbs, Drena.

In: RNA, Vol. 12, No. 8, 03.08.2006, p. 1450-1462.

TY - JOUR

T1 - Prediction of RNA binding sites in proteins from amino acid sequence

AU - Terribilini, Michael

AU - Lee, Jae Hyung

AU - Yan, Changhui

AU - Jernigan, Robert L.

AU - Honavar, Vasant

AU - Dobbs, Drena

PY - 2006/8/3

Y1 - 2006/8/3

AB - RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.) Published by Cold Spring Harbor Laboratory Press.

UR - http://www.scopus.com/inward/record.url?scp=33746526551&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746526551&partnerID=8YFLogxK

U2 - 10.1261/rna.2197306

DO - 10.1261/rna.2197306

M3 - Article

C2 - 16790841

AN - SCOPUS:33746526551

VL - 12

SP - 1450

EP - 1462

JO - RNA

JF - RNA

SN - 1355-8382

IS - 8

ER -

Terribilini M, Lee JH, Yan C, Jernigan RL, Honavar V, Dobbs D. Prediction of RNA binding sites in proteins from amino acid sequence. RNA. 2006 Aug 3;12(8):1450-1462. https://doi.org/10.1261/rna.2197306