A selective overview of feature screening for ultrahigh-dimensional data

Jing Yuan Liu, Wei Zhong, Run Ze Li

Research output: Contribution to journal › Review article

14 Citations (Scopus)

Abstract

High-dimensional data have frequently been collected in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of the data may grow exponentially with the sample size. Such data have been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening in specific models, and on the motivation for model-free feature screening procedures.
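To make the abstract's notion of a marginal utility concrete, below is a minimal sketch of sure independence screening (SIS)-style ranking, which scores each predictor by its absolute marginal Pearson correlation with the response and keeps the top d of them; this is one of the marginal utilities this kind of overview covers. The function name sis_screen, the synthetic data, and the default cutoff d = floor(n / log n) are illustrative assumptions, not details taken from this article.

# A minimal sketch of marginal-utility feature screening (SIS-style);
# illustrative only, not the article's exact procedure.
import numpy as np

def sis_screen(X, y, d=None):
    """Keep the d predictors with the largest absolute marginal
    correlation with y (default cutoff: d = floor(n / log n))."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    # Center the data, then compute |corr(X_j, y)| for every column j.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    utilities = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    # Return the indices of the d largest marginal utilities, best first.
    return np.argsort(utilities)[::-1][:d]

# Toy ultrahigh-dimensional example: p = 5000 predictors, n = 200 samples,
# and only the first three predictors actually drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))
y = X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2] + rng.standard_normal(200)
print(sis_screen(X, y)[:10])  # the true predictors 0, 1, 2 should rank near the top

Screening with such a utility reduces the dimension from the ultrahigh scale p to a moderate d, after which a penalized variable selection method of the kind mentioned above can be applied to the retained predictors.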

Original language: English (US)
Journal: Science China Mathematics
Volume: 58
Issue number: 10
DOI: 10.1007/s11425-015-5062-9
State: Published - Oct 29 2015

Fingerprint

Screening, High-dimensional Data, Variable Selection, Biomedical Imaging, Penalized Least Squares, Selection of Variables, Proteomics, Tomography, Sparsity, Finance, Feature Selection, Dimensionality, Genomics, Predictors, Tumor, Likelihood, Data analysis, Sample Size, Model, Estimate

All Science Journal Classification (ASJC) codes

  • Mathematics (all)

Cite this

@article{091fe91071254fd2b96fa109227ae572,
title = "A selective overview of feature screening for ultrahigh-dimensional data",
abstract = "High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of highdimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many highdimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of data may grow exponentially with the sample size. This has been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening on specific models and motivation for the need of model-free feature screening procedures.",
author = "Liu, {Jing Yuan} and Wei Zhong and Li, {Run Ze}",
year = "2015",
month = "10",
day = "29",
doi = "10.1007/s11425-015-5062-9",
language = "English (US)",
volume = "58",
journal = "Science China Mathematics",
issn = "1674-7283",
publisher = "Science in China Press",
number = "10",

}

TY - JOUR

T1 - A selective overview of feature screening for ultrahigh-dimensional data

AU - Liu, Jing Yuan

AU - Zhong, Wei

AU - Li, Run Ze

PY - 2015/10/29

Y1 - 2015/10/29

N2 - High-dimensional data have frequently been collected in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of the data may grow exponentially with the sample size. Such data have been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening in specific models, and on the motivation for model-free feature screening procedures.

UR - http://www.scopus.com/inward/record.url?scp=84942372194&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942372194&partnerID=8YFLogxK

U2 - 10.1007/s11425-015-5062-9

DO - 10.1007/s11425-015-5062-9

M3 - Review article

C2 - 26779257

AN - SCOPUS:84942372194

VL - 58

JO - Science China Mathematics

JF - Science China Mathematics

SN - 1674-7283

IS - 10

ER -