Semi-supervised sequence classification using Abstraction Augmented Markov Models

Cornelia Caragea, Adrian Silvescu, Doina Caragea, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Supervised methods for learning sequence classifiers rely on the availability of large amounts of labeled data. However, in many applications, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in semi-supervised methods that can exploit large amounts of unlabeled data together with small amounts of labeled data. In this paper, we introduce a novel Abstraction Augmented Markov Model (AAMM) based approach to semi-supervised learning. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) the Markov models (MMs) (which do not take advantage of unlabeled data); and (ii) an expectation maximization (EM) based approach to semi-supervised training of MMs (that makes use of unlabeled data). The results of our experiments on three protein subcellular localization prediction tasks show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; and (ii) are more accurate than both the MMs and the EM based semi-supervised MMs.
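The EM-based semi-supervised Markov model baseline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not implement AAMMs, whose details are not given on this page. It assumes a k-th order Markov model per class with Laplace smoothing, ignores the probability of the first k symbols, and uses hypothetical names throughout (`ALPHABET`, `train`, `posterior`, `em_train`, `predict`).

```python
import math
from collections import defaultdict

# Assumption: sequences are over the 20 standard amino acids.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def train(seqs, weights, classes, k=1, alpha=1.0):
    """Fit class priors and k-th order transition counts.
    weights[i][c] is the (possibly fractional) membership of seqs[i] in class c."""
    prior = {c: alpha for c in classes}
    counts = {c: defaultdict(lambda: defaultdict(float)) for c in classes}
    for seq, w in zip(seqs, weights):
        for c in classes:
            prior[c] += w[c]
            for i in range(k, len(seq)):
                counts[c][seq[i - k:i]][seq[i]] += w[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    return log_prior, counts

def log_joint(seq, c, model, k, alpha):
    """log P(c) + sum of smoothed log transition probabilities under class c."""
    log_prior, counts = model
    score = log_prior[c]
    for i in range(k, len(seq)):
        ctx = counts[c].get(seq[i - k:i], {})
        num = ctx.get(seq[i], 0.0) + alpha
        den = sum(ctx.values()) + alpha * len(ALPHABET)
        score += math.log(num / den)
    return score

def posterior(seq, model, classes, k=1, alpha=1.0):
    """Normalized class posterior, computed stably in log space."""
    scores = {c: log_joint(seq, c, model, k, alpha) for c in classes}
    m = max(scores.values())
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def em_train(labeled, unlabeled, classes, k=1, alpha=1.0, iters=10):
    """Start from the labeled data, then alternate E and M steps."""
    seqs = [s for s, _ in labeled]
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train(seqs, hard, classes, k, alpha)
    for _ in range(iters):
        # E-step: soft class memberships for the unlabeled sequences.
        soft = [posterior(s, model, classes, k, alpha) for s in unlabeled]
        # M-step: refit on labeled (hard) plus unlabeled (soft) sequences.
        model = train(seqs + unlabeled, hard + soft, classes, k, alpha)
    return model

def predict(seq, model, classes, k=1, alpha=1.0):
    """Return the class with the highest posterior probability."""
    post = posterior(seq, model, classes, k, alpha)
    return max(post, key=post.get)
```

Setting `iters=0` gives the purely supervised Markov model, which uses only the labeled sequences. The same `predict` function therefore covers both baselines the abstract compares against.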

Original language: English (US)
Title of host publication: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
Pages: 257-264
Number of pages: 8
DOIs: https://doi.org/10.1145/1854776.1854813
State: Published - Oct 25 2010
Event: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010 - Niagara Falls, NY, United States
Duration: Aug 2 2010 → Aug 4 2010

Publication series

Name: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010

Other

Other: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
Country: United States
City: Niagara Falls, NY
Period: 8/2/10 → 8/4/10

Fingerprint

Learning
Costs and Cost Analysis
Supervised learning
Proteins
Labeling
Costs
Experiments
Supervised Machine Learning

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Information Management

Cite this

Caragea, C., Silvescu, A., Caragea, D., & Honavar, V. (2010). Semi-supervised sequence classification using Abstraction Augmented Markov Models. In 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010 (pp. 257-264). (2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010). https://doi.org/10.1145/1854776.1854813
@inproceedings{fcddba48d06c4afeb53ee36280834a12,
title = "Semi-supervised sequence classification using Abstraction Augmented Markov Models",
abstract = "Supervised methods for learning sequence classifiers rely on the availability of large amounts of labeled data. However, in many applications, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in semi-supervised methods that can exploit large amounts of unlabeled data together with small amounts of labeled data. In this paper, we introduce a novel Abstraction Augmented Markov Model (AAMM) based approach to semi-supervised learning. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) the Markov models (MMs) (which do not take advantage of unlabeled data); and (ii) an expectation maximization (EM) based approach to semi-supervised training of MMs (that makes use of unlabeled data). The results of our experiments on three protein subcellular localization prediction tasks show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; and (ii) are more accurate than both the MMs and the EM based semi-supervised MMs.",
author = "Cornelia Caragea and Adrian Silvescu and Doina Caragea and Vasant Honavar",
year = "2010",
month = "10",
day = "25",
doi = "10.1145/1854776.1854813",
language = "English (US)",
isbn = "9781450304382",
series = "2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010",
pages = "257--264",
booktitle = "2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010",

}
