Regularized F-measure maximization for feature selection and classification

Zhenqiu Liu, Ming Tan, Feng Jiang

Research output: Contribution to journal (Article)

13 Citations (Scopus)

Abstract

Receiver operating characteristic (ROC) analysis is a common tool for assessing the performance of classifiers. It has gained much popularity in medicine and other fields, including the evaluation of biological markers and diagnostic tests. This is particularly because misclassification costs are not known in real-world problems, so the ROC curve and related utility functions such as the F-measure can be more meaningful performance measures. The F-measure combines recall and precision into a single global measure. In this paper, we propose a novel classification method based on regularized F-measure maximization. The proposed method assigns different costs to positive and negative samples and performs feature selection and prediction simultaneously through an L1 penalty. It is especially useful when the data set is highly unbalanced or when the labels for negative (or positive) samples are missing. Our limited experiments with benchmark, methylation, and high-dimensional microarray data show that the proposed algorithm performs as well as or better than other popular classifiers.
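The abstract does not state the formula it uses, so the following is only a minimal sketch assuming the standard F-beta definition, F = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The function name `f_measure` and the confusion counts are hypothetical and chosen for illustration. They are not taken from the paper, and the sketch does not implement the paper's regularized maximization.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-beta score from confusion counts.

    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical unbalanced data set: 10 positives among 1000 samples.
# The classifier finds 5 positives and raises 5 false alarms.
tp, fp, fn, tn = 5, 5, 5, 985

accuracy = (tp + tn) / (tp + fp + fn + tn)  # dominated by the negatives
f1 = f_measure(tp, fp, fn)                  # ignores true negatives

print(f"accuracy = {accuracy:.2f}, F1 = {f1:.2f}")
```

With these illustrative counts the accuracy is high because true negatives dominate, while the F-measure reflects that only half of the positives are found and only half of the positive predictions are correct. This is the kind of unbalanced setting the abstract refers to.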

Original language: English (US)
Article number: 617946
Journal: Journal of Biomedicine and Biotechnology
Volume: 2009
DOI: 10.1155/2009/617946
State: Published - May 28 2009

Fingerprint

Feature extraction
ROC Curve
Methylation
Microarrays
Labels
Costs
Costs and Cost Analysis
Benchmarking
Classifiers
Experiments
Routine Diagnostic Tests
Biomarkers
Datasets

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Molecular Medicine
  • Molecular Biology
  • Genetics
  • Health, Toxicology and Mutagenesis

Cite this

@article{80d4066c6a854cf1b98f473963bfec3d,
title = "Regularized F-measure maximization for feature selection and classification",
author = "Zhenqiu Liu and Ming Tan and Feng Jiang",
year = "2009",
month = "5",
day = "28",
doi = "10.1155/2009/617946",
language = "English (US)",
volume = "2009",
journal = "BioMed Research International",
issn = "2314-6133",
publisher = "Hindawi Publishing Corporation",

}

Regularized F-measure maximization for feature selection and classification. / Liu, Zhenqiu; Tan, Ming; Jiang, Feng.

In: Journal of Biomedicine and Biotechnology, Vol. 2009, 617946, 28.05.2009.

TY - JOUR

T1 - Regularized F-measure maximization for feature selection and classification

AU - Liu, Zhenqiu

AU - Tan, Ming

AU - Jiang, Feng

PY - 2009/5/28

Y1 - 2009/5/28

UR - http://www.scopus.com/inward/record.url?scp=65649149784&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65649149784&partnerID=8YFLogxK

U2 - 10.1155/2009/617946

DO - 10.1155/2009/617946

M3 - Article

C2 - 19421401

AN - SCOPUS:65649149784

VL - 2009

JO - BioMed Research International

JF - BioMed Research International

SN - 2314-6133

M1 - 617946

ER -