Sparse support vector machines with L0 approximation for ultra-high dimensional omics data

Zhenqiu Liu, David Elashoff, Steven Piantadosi

Research output: Contribution to journal › Article

Abstract

Omics data usually have ultra-high dimension (p) and small sample size (n). Standard support vector machines (SVMs), which minimize the L2 norm of the primal variables, lead to sparse solutions only in the dual variables. L1-based SVMs, which directly minimize the L1 norm, have been used for feature selection with omics data. However, most current methods solve the primal formulation of the problem directly, which is not computationally scalable: the computational complexity increases with the number of features. In addition, the L1 norm is known to be asymptotically biased and not consistent for feature selection. In this paper, we develop an efficient method for sparse support vector machines with L0 norm approximation. The proposed method approximates L0 minimization by solving a series of L2 optimization problems, which can be formulated in terms of the dual variables. It finds the optimal solution for the p primal variables by estimating n dual variables, which is more efficient as long as the sample size is small. The L0 approximation leads to sparsity in both dual and primal variables and can be used for both feature and sample selection. The proposed method identifies far fewer features while achieving similar performance in simulations. We apply the proposed method to feature selection with metagenomic sequencing and gene expression data. It can identify biologically important genes and taxa efficiently.
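The idea of approximating L0 minimization by a series of L2 problems solved through n dual variables can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps the SVM hinge loss for a squared loss so that each L2 subproblem has a closed form, and it assumes a common reweighting scheme in which the penalty on coefficient w_j is scaled by 1 / w_j^2 from the previous iterate. The function name `l0_approx_dual_ridge` and the parameters `lam`, `n_iter`, `tol`, and `zero_tol` are hypothetical. Each pass solves an n x n linear system instead of a p x p one, which is what makes the dual route attractive when n is much smaller than p.

```python
import numpy as np


def l0_approx_dual_ridge(X, y, lam=1.0, n_iter=100, tol=1e-8, zero_tol=1e-8):
    """Iteratively reweighted L2 sketch of an L0-style penalty (squared loss).

    Each pass solves the weighted ridge problem
        min_w ||y - X w||^2 + lam * sum_j w_j^2 / eta_j,
    with eta_j = w_j^2 taken from the previous iterate, through its dual:
        alpha = (X D X^T + lam I)^{-1} y,   w = D X^T alpha,   D = diag(eta).
    Only an n x n system is solved, never a p x p one.
    """
    n, p = X.shape
    eta = np.ones(p)        # assumption: the first pass is ordinary ridge
    w = np.zeros(p)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        # X * eta scales column j by eta_j, so this is X D X^T + lam I.
        K = (X * eta) @ X.T + lam * np.eye(n)
        alpha = np.linalg.solve(K, y)       # n dual variables
        w_new = eta * (X.T @ alpha)         # recover p primal variables
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
        eta = w ** 2                        # reweight: small w_j shrink further
    # Coefficients that have decayed to numerical noise are set to exactly zero.
    w = np.where(np.abs(w) < zero_tol, 0.0, w)
    return w, alpha


# Synthetic illustration (made-up data): n = 40 samples, p = 500 features,
# labels in {-1, +1} that depend on the first two features only.
rng = np.random.default_rng(0)
n, p = 40, 500
X = rng.standard_normal((n, p))
y = np.sign(X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(n))
w, alpha = l0_approx_dual_ridge(X, y, lam=1.0)
print("features kept:", np.count_nonzero(w), "of", p)
```

Because the sketch uses a squared loss, it yields sparsity only in the primal coefficients `w`; the sparsity in the dual variables described in the abstract depends on the SVM formulation and is not reproduced here.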

Original language: English (US)
Pages (from-to): 134-141
Number of pages: 8
Journal: Artificial Intelligence in Medicine
Volume: 96
DOIs: 10.1016/j.artmed.2019.04.004
State: Published - May 1 2019

Fingerprint

Support vector machines
Feature extraction
Sample Size
Gene expression
Computational complexity
Metagenomics
Genes
Research Design

All Science Journal Classification (ASJC) codes

  • Medicine (miscellaneous)
  • Artificial Intelligence

Cite this

@article{51f6a185bb1c406793e00ade2fdb9253,
title = "Sparse support vector machines with L0 approximation for ultra-high dimensional omics data",
abstract = "Omics data usually have ultra-high dimension (p) and small sample size (n). Standard support vector machines (SVMs), which minimize the L2 norm of the primal variables, lead to sparse solutions only in the dual variables. L1-based SVMs, which directly minimize the L1 norm, have been used for feature selection with omics data. However, most current methods solve the primal formulation of the problem directly, which is not computationally scalable: the computational complexity increases with the number of features. In addition, the L1 norm is known to be asymptotically biased and not consistent for feature selection. In this paper, we develop an efficient method for sparse support vector machines with L0 norm approximation. The proposed method approximates L0 minimization by solving a series of L2 optimization problems, which can be formulated in terms of the dual variables. It finds the optimal solution for the p primal variables by estimating n dual variables, which is more efficient as long as the sample size is small. The L0 approximation leads to sparsity in both dual and primal variables and can be used for both feature and sample selection. The proposed method identifies far fewer features while achieving similar performance in simulations. We apply the proposed method to feature selection with metagenomic sequencing and gene expression data. It can identify biologically important genes and taxa efficiently.",
author = "Zhenqiu Liu and David Elashoff and Steven Piantadosi",
year = "2019",
month = "5",
day = "1",
doi = "10.1016/j.artmed.2019.04.004",
language = "English (US)",
volume = "96",
pages = "134--141",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",

}

Sparse support vector machines with L0 approximation for ultra-high dimensional omics data. / Liu, Zhenqiu; Elashoff, David; Piantadosi, Steven.

In: Artificial Intelligence in Medicine, Vol. 96, 01.05.2019, p. 134-141.

TY - JOUR

T1 - Sparse support vector machines with L0 approximation for ultra-high dimensional omics data

AU - Liu, Zhenqiu

AU - Elashoff, David

AU - Piantadosi, Steven

PY - 2019/5/1

Y1 - 2019/5/1

N2 - Omics data usually have ultra-high dimension (p) and small sample size (n). Standard support vector machines (SVMs), which minimize the L2 norm of the primal variables, lead to sparse solutions only in the dual variables. L1-based SVMs, which directly minimize the L1 norm, have been used for feature selection with omics data. However, most current methods solve the primal formulation of the problem directly, which is not computationally scalable: the computational complexity increases with the number of features. In addition, the L1 norm is known to be asymptotically biased and not consistent for feature selection. In this paper, we develop an efficient method for sparse support vector machines with L0 norm approximation. The proposed method approximates L0 minimization by solving a series of L2 optimization problems, which can be formulated in terms of the dual variables. It finds the optimal solution for the p primal variables by estimating n dual variables, which is more efficient as long as the sample size is small. The L0 approximation leads to sparsity in both dual and primal variables and can be used for both feature and sample selection. The proposed method identifies far fewer features while achieving similar performance in simulations. We apply the proposed method to feature selection with metagenomic sequencing and gene expression data. It can identify biologically important genes and taxa efficiently.

UR - http://www.scopus.com/inward/record.url?scp=85065186069&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065186069&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2019.04.004

DO - 10.1016/j.artmed.2019.04.004

M3 - Article

C2 - 31164207

AN - SCOPUS:85065186069

VL - 96

SP - 134

EP - 141

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

ER -