ROC-based utility function maximization for feature selection and classification with applications to high-dimensional protease data

Zhenqiu Liu, Ming Tan

Research output: Contribution to journal, Article

13 Scopus citations

Abstract

In medical diagnosis, the diseased and nondiseased classes are usually unbalanced, and one class may be more important than the other depending on the purpose of the diagnosis. Most standard classification methods, however, are designed to maximize overall accuracy and cannot explicitly assign different costs to different classes. In this article, we propose a novel nonparametric method to directly maximize the weighted specificity and sensitivity of the receiver operating characteristic curve. Combining advances in machine learning, optimization theory, and statistics, the proposed method has excellent generalization properties and explicitly assigns different error costs to different classes. We present experiments that compare the proposed algorithm with support vector machines and regularized logistic regression using data from a study on HIV-1 protease as well as six publicly available datasets. Our main conclusion is that the performance of the proposed algorithm is significantly better in most cases than that of the other classifiers tested. A software package in MATLAB is available upon request.
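To make the objective concrete, the following is a minimal Python sketch of a weighted sensitivity and specificity utility, evaluated over candidate thresholds on fixed scores. It is an illustration only, not the authors' nonparametric method or their MATLAB package. The function names, the weight parametrization (w on sensitivity, 1 - w on specificity), and the toy data are all assumptions made for this example.

```python
import numpy as np

def weighted_utility(y_true, scores, threshold, w=0.5):
    """Return w * sensitivity + (1 - w) * specificity at a score threshold.

    Assumed parametrization: w weights the diseased (positive) class.
    """
    y_true = np.asarray(y_true)
    pred = np.asarray(scores) >= threshold   # predicted positive
    pos = y_true == 1
    neg = ~pos
    sensitivity = np.mean(pred[pos])         # true positive rate
    specificity = np.mean(~pred[neg])        # true negative rate
    return w * sensitivity + (1 - w) * specificity

def best_threshold(y_true, scores, w=0.5):
    """Pick the observed score that maximizes the weighted utility."""
    candidates = np.unique(scores)
    utilities = [weighted_utility(y_true, scores, t, w) for t in candidates]
    i = int(np.argmax(utilities))
    return candidates[i], utilities[i]

# Toy, made-up data: four nondiseased (0) and two diseased (1) cases.
y = [0, 0, 0, 0, 1, 1]
s = [0.1, 0.2, 0.4, 0.6, 0.5, 0.9]
t, u = best_threshold(y, s, w=0.7)
```

With w above 0.5, a missed diseased case costs more than a false alarm, so the selected threshold shifts toward higher sensitivity. Unlike overall accuracy, this utility is not dominated by the majority class.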

Original language: English (US)
Pages (from-to): 1155-1161
Number of pages: 7
Journal: Biometrics
Volume: 64
Issue number: 4
State: Published - Dec 2008

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Agricultural and Biological Sciences(all)
  • Applied Mathematics
