Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction

Zhenqiu Liu, Gang Li

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Variable selections for regression with high-dimensional big data have found many applications in bioinformatics and computational biology. One appealing approach is the L0 regularized regression which penalizes the number of nonzero features in the model directly. However, it is well known that L0 optimization is NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample size, DL0EM is efficient with high-dimensional (n≪m) data. They also provide a natural solution to all Lp, p ∈ [0,2], problems, including lasso with p=1 and elastic net with p ∈ [1,2]. The regularized parameter can be determined through cross validation or AIC and BIC. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than lasso, SCAD, and MC+, and L0 with AIC or BIC has similar performance as computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and constructing biologically important networks with high-dimensional big data.
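
To make the idea of "directly approximating" an L0 penalty concrete, the following is a minimal sketch of one generic technique, iteratively reweighted ridge regression. It is not taken from the paper and should not be read as the authors' L0EM or DL0EM algorithm; the function name `l0_reweighted_ridge`, its parameters, and the final zeroing threshold are all hypothetical choices made for illustration. The sketch relies on the fact that the weighted ridge penalty sum(b_j^2 / eta_j^2) equals the number of nonzero coefficients when eta = b, so alternating between fixing eta and solving a ridge problem drives small coefficients to zero.

```python
import numpy as np

def l0_reweighted_ridge(X, y, lam=1.0, n_iter=100, tol=1e-8, zero_tol=1e-6):
    """Approximate L0-penalized least squares (illustrative sketch only).

    Each pass fixes eta = current beta and minimizes
        ||y - X b||^2 + lam * sum_j b_j^2 / eta_j^2,
    whose solution satisfies (D X'X + lam I) b = D X'y with D = diag(eta^2).
    Writing the system this way avoids dividing by coefficients that are zero.
    """
    m = X.shape[1]
    identity = np.eye(m)
    # Start from an ordinary ridge estimate (an assumed initialization).
    beta = np.linalg.solve(X.T @ X + lam * identity, X.T @ y)
    for _ in range(n_iter):
        eta2 = beta ** 2
        Xw = X * eta2  # scales column j of X by eta_j^2, i.e. X @ diag(eta2)
        new_beta = np.linalg.solve(Xw.T @ X + lam * identity, Xw.T @ y)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Report negligible coefficients as exact zeros (hypothetical threshold).
    beta[np.abs(beta) < zero_tol] = 0.0
    return beta
```

In this sketch `lam` plays the role of the regularization parameter that the abstract says can be chosen by cross validation, AIC, or BIC; how those criteria map to a specific value is not stated here and is left to the paper.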

Original language: English (US)
Article number: 3456153
Journal: Computational and Mathematical Methods in Medicine
Volume: 2016
DOI: 10.1155/2016/3456153
State: Published - Jan 1 2016

Fingerprint

imidazole mustard
Variable Selection
Computational Biology
Penalty
Regression
Lasso
High-dimensional
Bioinformatics
Cross-validation
Sample Size
Elastic Net
Dual Algorithm
High-dimensional Data
Genomics
NP-complete problem
Optimization Problem
alachlor
Big data
Optimization
Demonstrate

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Applied Mathematics

Cite this

@article{576e11cdf95e48faa695a9996c57c030,
title = "Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction",
abstract = "Variable selections for regression with high-dimensional big data have found many applications in bioinformatics and computational biology. One appealing approach is the L0 regularized regression which penalizes the number of nonzero features in the model directly. However, it is well known that L0 optimization is NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample size, DL0EM is efficient with high-dimensional (n≪m) data. They also provide a natural solution to all Lp, p ∈ [0,2], problems, including lasso with p=1 and elastic net with p ∈ [1,2]. The regularized parameter can be determined through cross validation or AIC and BIC. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than lasso, SCAD, and MC+, and L0 with AIC or BIC has similar performance as computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and constructing biologically important networks with high-dimensional big data.",
author = "Zhenqiu Liu and Gang Li",
year = "2016",
month = "1",
day = "1",
doi = "10.1155/2016/3456153",
language = "English (US)",
volume = "2016",
journal = "Computational and Mathematical Methods in Medicine",
issn = "1748-670X",
publisher = "Hindawi Publishing Corporation",
}

TY  - JOUR
T1  - Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction
AU  - Liu, Zhenqiu
AU  - Li, Gang
PY  - 2016/1/1
Y1  - 2016/1/1
N2  - Variable selections for regression with high-dimensional big data have found many applications in bioinformatics and computational biology. One appealing approach is the L0 regularized regression which penalizes the number of nonzero features in the model directly. However, it is well known that L0 optimization is NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample size, DL0EM is efficient with high-dimensional (n≪m) data. They also provide a natural solution to all Lp, p ∈ [0,2], problems, including lasso with p=1 and elastic net with p ∈ [1,2]. The regularized parameter can be determined through cross validation or AIC and BIC. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than lasso, SCAD, and MC+, and L0 with AIC or BIC has similar performance as computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and constructing biologically important networks with high-dimensional big data.
AB  - Variable selections for regression with high-dimensional big data have found many applications in bioinformatics and computational biology. One appealing approach is the L0 regularized regression which penalizes the number of nonzero features in the model directly. However, it is well known that L0 optimization is NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample size, DL0EM is efficient with high-dimensional (n≪m) data. They also provide a natural solution to all Lp, p ∈ [0,2], problems, including lasso with p=1 and elastic net with p ∈ [1,2]. The regularized parameter can be determined through cross validation or AIC and BIC. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than lasso, SCAD, and MC+, and L0 with AIC or BIC has similar performance as computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and constructing biologically important networks with high-dimensional big data.
UR  - http://www.scopus.com/inward/record.url?scp=84994681086&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994681086&partnerID=8YFLogxK
U2  - 10.1155/2016/3456153
DO  - 10.1155/2016/3456153
M3  - Article
C2  - 27843486
AN  - SCOPUS:84994681086
VL  - 2016
JO  - Computational and Mathematical Methods in Medicine
JF  - Computational and Mathematical Methods in Medicine
SN  - 1748-670X
M1  - 3456153
ER  - 