Splice site prediction using support vector machines with a Bayes kernel

Ya Zhang, Chao Hsien Chu, Yixin Chen, Hongyuan Zha, Xiang Ji

Research output: Contribution to journal (Article)

41 Citations (Scopus)

Abstract

One of the most important tasks in correctly annotating genes in higher organisms is to accurately locate the DNA splice sites. Although existing methods achieve relatively high accuracy, most of them are computationally expensive. Because of the enormous number of DNA sequences to be processed, computational speed is an important consideration. In this paper, we present a new machine learning method for predicting DNA splice sites, which first applies a Bayes feature mapping (kernel) to project the data into a new feature space and then uses a linear Support Vector Machine (SVM) as a classifier to recognize the true splice sites. The computation time is linear in the number of sequences tested, while classification accuracy, precision, and recall are notably improved compared with the Naive Bayes classifier. Our classification results are also comparable to those obtained by SVMs with polynomial kernels, while our proposed method is significantly faster. This is a notable improvement in computational modeling, given the huge number of DNA sequences to be processed.
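
The abstract does not define the Bayes feature mapping, so the following is only a minimal sketch of the two-stage idea under a stated assumption: each base in a fixed-length window is replaced by a position-specific log-odds score estimated from labelled training windows, and a linear SVM is then trained on those scores. The function names, the pseudocount, the subgradient-descent trainer, and the toy sequences are all hypothetical and are not taken from the paper.

```python
import math

ALPHABET = "ACGT"

def position_log_odds(pos_seqs, neg_seqs, pseudocount=1.0):
    """Assumed mapping: log P(base | true site) / P(base | decoy) per position."""
    table = []
    for i in range(len(pos_seqs[0])):
        row = {}
        for base in ALPHABET:
            p = (sum(s[i] == base for s in pos_seqs) + pseudocount) / (len(pos_seqs) + 4 * pseudocount)
            q = (sum(s[i] == base for s in neg_seqs) + pseudocount) / (len(neg_seqs) + 4 * pseudocount)
            row[base] = math.log(p / q)
        table.append(row)
    return table

def bayes_features(seq, table):
    """Map a window to one real-valued feature per position (table lookups only)."""
    return [table[i][base] for i, base in enumerate(seq)]

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=300):
    """Plain subgradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]      # shrink (regulariser)
            if margin < 1.0:                              # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(seq, table, w, b):
    """Return +1 for a predicted true site, -1 otherwise."""
    x = bayes_features(seq, table)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Made-up toy windows (not real data): "true" sites carry GT at positions 3-4.
true_sites = ["AAGGTAAG", "CAGGTGAG", "AAGGTAAG", "CTGGTAAG", "AAGGTGAG", "TAGGTAAG"]
decoys = ["AAGCTAAG", "CAGGAGAG", "ACGTTACG", "CTGACAAG", "TTTTTTTT", "GGCCAATT"]

table = position_log_odds(true_sites, decoys)
X = [bayes_features(s, table) for s in true_sites + decoys]
y = [1] * len(true_sites) + [-1] * len(decoys)
w, b = train_linear_svm(X, y)

print(predict("GAGGTAAG", table, w, b))
```

In this sketch, scoring one window costs a table lookup per position plus one dot product, so total prediction time grows linearly with the number of windows tested, which matches the linear-time behaviour the abstract reports. Fixing all weights to 1 would reduce the score to a Naive Bayes log-odds sum; letting the SVM learn the weights is what distinguishes the two in this illustration.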

Original language: English (US)
Pages (from-to): 73-81
Number of pages: 9
Journal: Expert Systems With Applications
Volume: 30
Issue number: 1
DOIs: 10.1016/j.eswa.2005.09.052
State: Published - Jan 1 2006

Fingerprint

DNA sequences
Support vector machines
DNA
Classifiers
Learning systems
Genes
Polynomials

All Science Journal Classification (ASJC) codes

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Zhang, Ya ; Chu, Chao Hsien ; Chen, Yixin ; Zha, Hongyuan ; Ji, Xiang. / Splice site prediction using support vector machines with a Bayes kernel. In: Expert Systems With Applications. 2006 ; Vol. 30, No. 1. pp. 73-81.
@article{6865f2fb9c974d73b127a91a10c9b755,
title = "Splice site prediction using support vector machines with a Bayes kernel",
abstract = "One of the most important tasks in correctly annotating genes in higher organisms is to accurately locate the DNA splice sites. Although relatively high accuracy has been achieved by existing methods, most of these prediction methods are computationally extensive. Due to the enormous amount of DNA sequences to be processed, the computational speed is an important issue to consider. In this paper, we present a new machine learning method for predicting DNA splice sites, which first applies a Bayes feature mapping (kernel) to project the data into a new feature space and then uses a linear Support Vector Machine (SVM) as a classifier to recognize the true splice sites. The computation time is linear to the number of sequences tested, while the performance is notably improved compared with the Naive Bayes classifier in terms of classification accuracy, precision, and recall. Our classification results are also comparable to the solution quality obtained by the SVMs with polynomial kernels, while the speed of our proposed method is significantly faster. This is a notable improvement in computational modeling considering the huge amount of DNA sequences to be processed.",
author = "Ya Zhang and Chu, {Chao Hsien} and Yixin Chen and Hongyuan Zha and Xiang Ji",
year = "2006",
month = "1",
day = "1",
doi = "10.1016/j.eswa.2005.09.052",
language = "English (US)",
volume = "30",
pages = "73--81",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "1",

}

Splice site prediction using support vector machines with a Bayes kernel. / Zhang, Ya; Chu, Chao Hsien; Chen, Yixin; Zha, Hongyuan; Ji, Xiang.

In: Expert Systems With Applications, Vol. 30, No. 1, 01.01.2006, p. 73-81.

TY - JOUR

T1 - Splice site prediction using support vector machines with a Bayes kernel

AU - Zhang, Ya

AU - Chu, Chao Hsien

AU - Chen, Yixin

AU - Zha, Hongyuan

AU - Ji, Xiang

PY - 2006/1/1

Y1 - 2006/1/1

N2 - One of the most important tasks in correctly annotating genes in higher organisms is to accurately locate the DNA splice sites. Although relatively high accuracy has been achieved by existing methods, most of these prediction methods are computationally extensive. Due to the enormous amount of DNA sequences to be processed, the computational speed is an important issue to consider. In this paper, we present a new machine learning method for predicting DNA splice sites, which first applies a Bayes feature mapping (kernel) to project the data into a new feature space and then uses a linear Support Vector Machine (SVM) as a classifier to recognize the true splice sites. The computation time is linear to the number of sequences tested, while the performance is notably improved compared with the Naive Bayes classifier in terms of classification accuracy, precision, and recall. Our classification results are also comparable to the solution quality obtained by the SVMs with polynomial kernels, while the speed of our proposed method is significantly faster. This is a notable improvement in computational modeling considering the huge amount of DNA sequences to be processed.

UR - http://www.scopus.com/inward/record.url?scp=27844524581&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27844524581&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2005.09.052

DO - 10.1016/j.eswa.2005.09.052

M3 - Article

AN - SCOPUS:27844524581

VL - 30

SP - 73

EP - 81

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 1

ER -