Using global sequence similarity to enhance biological sequence labeling

Cornelia Caragea, Drena Dobbs, Jivko Sinapov, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Identifying functionally important sites in biological sequences, formulated as a biological sequence labeling problem, has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. In this paper, we present an approach to biological sequence labeling that takes into account the global similarity between biological sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture-of-experts model, using spectral clustering to learn the hierarchical structure of the model and Bayesian approaches to combine the predictions of the experts. We evaluate our approach on two important biological sequence labeling problems: RNA-protein and DNA-protein interface prediction. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biological sequence data.
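
The pipeline summarized in the abstract (cluster sequences by pairwise similarity, train one expert per cluster, combine expert predictions with similarity-based weights) can be illustrated with a small sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the toy sequences and labels, the `similarity` function (a Jaccard overlap of residue sets standing in for a real global similarity measure), the single spectral bipartition (applying it recursively would give a hierarchy), the per-residue frequency experts, and the normalized-similarity gate that stands in for a Bayesian posterior over clusters.

```python
from collections import defaultdict
import numpy as np

def similarity(a, b):
    """Hypothetical stand-in for a global similarity: Jaccard overlap of residue sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def spectral_bipartition(S):
    """Split items in two by the sign of the second eigenvector of the
    symmetric normalized graph Laplacian built from similarity matrix S."""
    scale = 1.0 / np.sqrt(S.sum(axis=1))
    lap = np.eye(len(S)) - scale[:, None] * S * scale[None, :]
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)    # cluster ids 0/1 (arbitrary naming)

def train_experts(seqs, labels, clusters):
    """One expert per cluster: counts of [positives, total] for each residue."""
    experts = {}
    for seq, lab, c in zip(seqs, labels, clusters):
        table = experts.setdefault(int(c), defaultdict(lambda: [0, 0]))
        for res, y in zip(seq, lab):
            table[res][0] += int(y)
            table[res][1] += 1
    return experts

def expert_prob(table, res):
    """Laplace-smoothed estimate of P(label = 1 | residue) for one expert."""
    pos, tot = table.get(res, (0, 0))
    return (pos + 1) / (tot + 2)

def predict(query, seqs, clusters, experts):
    """Weight each expert by the query's mean similarity to that expert's cluster."""
    sims = np.array([similarity(query, s) for s in seqs])
    ids = sorted(experts)
    gate = np.array([sims[clusters == c].mean() for c in ids])
    gate = gate / gate.sum()               # assumed proxy for P(cluster | query)
    probs = [sum(g * expert_prob(experts[c], r) for g, c in zip(gate, ids))
             for r in query]
    return [int(p > 0.5) for p in probs]

# Toy data (invented): residue G is labeled 0 in the first family, 1 in the second.
seqs   = ["AKRGKR", "KRAGAK", "RGAKRK", "DELGDE", "ELDGLD", "LGDEEL"]
labels = ["011011", "110001", "100111", "001100", "010110", "110001"]

S = np.array([[similarity(a, b) for b in seqs] for a in seqs])
clusters = spectral_bipartition(S)
experts = train_experts(seqs, labels, clusters)

print(clusters)
print(predict("AKGR", seqs, clusters, experts))
print(predict("DGLE", seqs, clusters, experts))
```

In this toy setup the shared residue G carries opposite labels in the two sequence families, so a single classifier pooled over all sequences would be undecided about it, while the similarity-weighted mixture leans toward the expert trained on the family that the query resembles.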

Original language: English (US)
Title of host publication: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Pages: 104-111
Number of pages: 8
DOI: https://doi.org/10.1109/BIBM.2008.54
State: Published - Dec 1 2008
Event: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 - Philadelphia, PA, United States
Duration: Nov 3 2008 to Nov 5 2008

Publication series

Name: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008

Other

Other: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Country: United States
City: Philadelphia, PA
Period: 11/3/08 to 11/5/08

Fingerprint

Labeling
Bayes Theorem
Drug Design
Proteins
Cluster Analysis
Signal Transduction
Unsupervised learning
Supervised learning
Learning
RNA
Labels
DNA
Classifiers
Experiments

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Information Systems
  • Biomedical Engineering

Cite this

Caragea, C., Dobbs, D., Sinapov, J., & Honavar, V. (2008). Using global sequence similarity to enhance biological sequence labeling. In Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 (pp. 104-111). [4684880] (Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008). https://doi.org/10.1109/BIBM.2008.54
@inproceedings{0c135ca7aa57467f97e45753f170051c,
title = "Using global sequence similarity to enhance biological sequence labeling",
abstract = "Identifying functionally important sites in biological sequences, formulated as a biological sequence labeling problem, has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. In this paper, we present an approach to biological sequence labeling that takes into account the global similarity between biological sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture-of-experts model, using spectral clustering to learn the hierarchical structure of the model and {B}ayesian approaches to combine the predictions of the experts. We evaluate our approach on two important biological sequence labeling problems: RNA-protein and DNA-protein interface prediction. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biological sequence data.",
author = "Cornelia Caragea and Drena Dobbs and Jivko Sinapov and Vasant Honavar",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/BIBM.2008.54",
language = "English (US)",
isbn = "9780769534527",
series = "Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008",
pages = "104--111",
booktitle = "Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008",

}
