Statistical challenges with high dimensionality

Feature selection in knowledge discovery

Jianqing Fan, Runze Li

Research output: Contribution to conference (Paper)

153 Citations (Scopus)

Abstract

Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high-dimensionality arise in diverse fields of sciences and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model is known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high-dimensionality are also discussed.
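As a rough illustration of how the choice of penalty function shapes a penalized estimator, the sketch below solves the one-coefficient penalized least-squares problem that arises under an orthonormal design. The three penalties (L1, hard thresholding, and SCAD with a = 3.7) and all function names are assumptions chosen for illustration; the abstract does not name specific penalties, and this is not presented as the authors' implementation.

```python
# Illustrative sketch only: closed-form minimizers of
#     0.5 * (z - b)**2 + p_lambda(|b|)
# for a single coefficient b, where z is the unpenalized estimate.
# The penalties and names below are assumptions, not taken from the abstract.

def soft_threshold(z, lam):
    """L1 penalty: shrink z towards zero by lam; small values become zero."""
    magnitude = max(abs(z) - lam, 0.0)
    return magnitude if z >= 0 else -magnitude


def hard_threshold(z, lam):
    """Hard-thresholding penalty: keep z unchanged if |z| > lam, else zero."""
    return z if abs(z) > lam else 0.0


def scad_threshold(z, lam, a=3.7):
    """SCAD penalty (a > 2): soft-threshold small values, leave large ones alone."""
    if abs(z) <= 2 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= a * lam:
        sign = 1.0 if z >= 0 else -1.0
        # Linear interpolation between the soft-threshold zone and the identity.
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z


def select(z_values, rule, lam):
    """Apply a thresholding rule coordinate-wise; return estimates and kept indices."""
    estimates = [rule(z, lam) for z in z_values]
    selected = [j for j, b in enumerate(estimates) if b != 0.0]
    return estimates, selected


if __name__ == "__main__":
    z = [3.0, 0.2, -4.0, 0.5]  # hypothetical unpenalized estimates
    for rule in (soft_threshold, hard_threshold, scad_threshold):
        print(rule.__name__, select(z, rule, lam=1.0))
```

All three rules set small coefficients exactly to zero, which is what performs the variable selection. In this sketch, only the hard-thresholding and SCAD rules leave large coefficients unshrunk, which is one informal way to read the claim about estimating parameters as well as if the best model were known in advance.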

Original language: English (US)
Pages: 595-622
Number of pages: 28
State: Published - Dec 1 2006
Event: 25th International Congress of Mathematicians, ICM 2006 - Madrid, Spain
Duration: Aug 22 2006 → Aug 30 2006

Other

Other: 25th International Congress of Mathematicians, ICM 2006
Country: Spain
City: Madrid
Period: 8/22/06 → 8/30/06

Fingerprint

Knowledge Discovery
Feature Selection
Dimensionality
Feature Extraction
Selection of Variables
Penalized Likelihood
Likelihood Methods
Computational Biology
Penalty Function
Risk Management
Variable Selection
Persistence
Research and Development
Data analysis
Health
Availability
Engineering
Model
Estimate
Demonstrate

All Science Journal Classification (ASJC) codes

  • Mathematics(all)

Cite this

Fan, J., & Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. 595-622. Paper presented at 25th International Congress of Mathematicians, ICM 2006, Madrid, Spain.
Fan, Jianqing ; Li, Runze. / Statistical challenges with high dimensionality : Feature selection in knowledge discovery. Paper presented at 25th International Congress of Mathematicians, ICM 2006, Madrid, Spain. 28 p.
@conference{8cfeb44046904c67ae97bbcd0cb161e8,
title = "Statistical challenges with high dimensionality: Feature selection in knowledge discovery",
abstract = "Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high-dimensionality arise in diverse fields of sciences and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model is known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high-dimensionality are also discussed.",
author = "Jianqing Fan and Runze Li",
year = "2006",
month = "12",
day = "1",
language = "English (US)",
pages = "595--622",
note = "25th International Congress of Mathematicians, ICM 2006 ; Conference date: 22-08-2006 Through 30-08-2006",

}

Fan, J & Li, R 2006, 'Statistical challenges with high dimensionality: Feature selection in knowledge discovery' Paper presented at 25th International Congress of Mathematicians, ICM 2006, Madrid, Spain, 8/22/06 - 8/30/06, pp. 595-622.

Statistical challenges with high dimensionality : Feature selection in knowledge discovery. / Fan, Jianqing; Li, Runze.

2006. 595-622. Paper presented at 25th International Congress of Mathematicians, ICM 2006, Madrid, Spain.

TY - CONF

T1 - Statistical challenges with high dimensionality

T2 - Feature selection in knowledge discovery

AU - Fan, Jianqing

AU - Li, Runze

PY - 2006/12/1

Y1 - 2006/12/1

N2 - Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high-dimensionality arise in diverse fields of sciences and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model is known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high-dimensionality are also discussed.

UR - http://www.scopus.com/inward/record.url?scp=84878031768&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878031768&partnerID=8YFLogxK

M3 - Paper

SP - 595

EP - 622

ER -

Fan J, Li R. Statistical challenges with high dimensionality: Feature selection in knowledge discovery. 2006. Paper presented at 25th International Congress of Mathematicians, ICM 2006, Madrid, Spain.