Glycosylation site prediction using ensembles of Support Vector Machine classifiers

Cornelia Caragea, Jivko Sinapov, Adrian Silvescu, Drena Dobbs, Vasant Honavar

Research output: Contribution to journal › Article

120 Citations (Scopus)

Abstract

Background: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences.

Results: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction.

Conclusion: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.
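The idea of predicting from "the target amino acid residue and its sequence neighbors" and combining several classifiers can be illustrated with a minimal sketch. The abstract does not state the window size, the feature encoding, or how ensemble members are combined, so everything below is an assumption made for illustration: the half-width of 3, the `-` padding symbol, the one-hot encoding, the simple majority vote, and all function names are hypothetical. The ensemble members are plain callables standing in for trained Support Vector Machine classifiers; this is not the EnsembleGly implementation.

```python
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed padding symbol for positions beyond the sequence ends


def extract_windows(sequence: str, targets: str = "ST",
                    half_width: int = 3) -> List[Tuple[int, str]]:
    """Return (position, window) pairs centred on each candidate residue.

    The default targets "ST" cover serine/threonine, the usual O-linked
    candidates; the half_width default is an illustrative assumption.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Residue i of the sequence sits at index i + half_width in padded,
            # so the slice below is centred on it.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def one_hot(window: str) -> List[int]:
    """Encode a window as a flat 0/1 vector, 20 slots per position.

    Padding (or any non-standard symbol) becomes an all-zero block.
    """
    vector: List[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def majority_vote(classifiers: Sequence[Callable[[List[int]], int]],
                  features: List[int]) -> int:
    """Combine binary (0/1) predictions from ensemble members by majority."""
    votes = sum(clf(features) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0
```

In practice each callable would wrap a trained classifier's prediction function, and the windows would be extracted separately for each glycosylation type (for example with `targets="N"` for N-linked candidates).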

Original language: English (US)
Article number: 438
Journal: BMC Bioinformatics
Volume: 8
DOI: 10.1186/1471-2105-8-438
State: Published - Nov 9 2007

Fingerprint

Glycosylation
Support vector machines
Support Vector Machine
Ensemble
Classifiers
Classifier
Prediction
Amino acids
Cell
Proteins
Protein
Biological Phenomena
Protein folding
Protein Databases
Glycoproteins
Glycoprotein
Web Server
Eukaryotic Cells

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Caragea, Cornelia ; Sinapov, Jivko ; Silvescu, Adrian ; Dobbs, Drena ; Honavar, Vasant. / Glycosylation site prediction using ensembles of Support Vector Machine classifiers. In: BMC bioinformatics. 2007 ; Vol. 8.
@article{22e0e474609349cb82ab8ae6a40f7ade,
title = "Glycosylation site prediction using ensembles of Support Vector Machine classifiers",
author = "Cornelia Caragea and Jivko Sinapov and Adrian Silvescu and Drena Dobbs and Vasant Honavar",
year = "2007",
month = "11",
day = "9",
doi = "10.1186/1471-2105-8-438",
language = "English (US)",
volume = "8",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

Glycosylation site prediction using ensembles of Support Vector Machine classifiers. / Caragea, Cornelia; Sinapov, Jivko; Silvescu, Adrian; Dobbs, Drena; Honavar, Vasant.

In: BMC bioinformatics, Vol. 8, 438, 09.11.2007.

TY - JOUR

T1 - Glycosylation site prediction using ensembles of Support Vector Machine classifiers

AU - Caragea, Cornelia

AU - Sinapov, Jivko

AU - Silvescu, Adrian

AU - Dobbs, Drena

AU - Honavar, Vasant

PY - 2007/11/9

Y1 - 2007/11/9

UR - http://www.scopus.com/inward/record.url?scp=38849163717&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38849163717&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-8-438

DO - 10.1186/1471-2105-8-438

M3 - Article

C2 - 17996106

AN - SCOPUS:38849163717

VL - 8

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 438

ER -