Predicting protective bacterial antigens using Random Forest classifiers

Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Scopus citations

Abstract

Identifying protective antigens from bacterial pathogens is important for developing vaccines. Most computational methods for predicting protein antigenicity rely on sequence similarity between a query protein sequence and at least one known antigen. Such methods limit our ability to predict novel antigens (i.e., antigens that are not homologous to any known antigen). Therefore, there is an urgent need for alignment-free computational methods for reliable prediction of protective antigens. We evaluated the discriminative power of four different amino acid composition-derived feature representations using three classification methods (Logistic Regression, Support Vector Machine, and Random Forest) on a cross-validation data set of 193 protective bacterial antigens and 193 non-antigenic bacterial proteins. Our results show that, with all four data representations, Random Forest classifiers consistently outperform the other classifiers. We compared HRF50, one of the best-performing Random Forest classifiers, with VaxiJen and SignalP on independent test sets derived from the Chlamydia trachomatis and Bartonella proteomes. Our results show that the HRF50 predictor outperforms VaxiJen and is competitive with SignalP and ANTIGENpro in predicting protective antigens. We further showed that combining SignalP with HRF50 yields a method, which we call BacGen, whose performance is comparable to or better than that of ANTIGENpro in predicting antigens in bacterial sequences. We conclude that features derived from amino acid sequence composition can be used effectively to design alignment-free methods for predicting protein antigenicity with Random Forest classifiers.
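As a rough illustration of what an alignment-free, composition-derived representation can look like, the sketch below computes a plain amino acid composition vector (the fraction of each of the 20 standard residues). This is an assumption made for illustration only: the abstract does not specify the four representations that were evaluated, and the function name and toy sequence are hypothetical.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative only: not necessarily one of the representations
    used in the paper. Non-standard characters are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One value per residue, always in the same order, so every protein
    # maps to a vector of length 20 regardless of its length.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from the paper's data set.
features = amino_acid_composition("MKKLLAAG")
```

Because every sequence maps to a vector of the same length, such vectors could be passed directly to a standard classifier (for example, a Random Forest implementation such as scikit-learn's `RandomForestClassifier`) without aligning the query against known antigens.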

Original language: English (US)
Title of host publication: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
Pages: 426-433
Number of pages: 8
DOI: https://doi.org/10.1145/2382936.2382991
State: Published - Nov 26 2012
Event: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012 - Orlando, FL, United States
Duration: Oct 7 2012 - Oct 10 2012

Publication series

Name: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012

Other

Other: 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
Country: United States
City: Orlando, FL
Period: 10/7/12 - 10/10/12

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Information Management

Cite this

El-Manzalawy, Y., Dobbs, D., & Honavar, V. (2012). Predicting protective bacterial antigens using Random Forest classifiers. In 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012 (pp. 426-433). (2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012). https://doi.org/10.1145/2382936.2382991