Model selection for high-dimensional problems

Jing Zhi Huang, Zhan Shi, Wei Zhong

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

High-dimensional data analysis is increasingly important to both academics and practitioners in finance and economics, but it is also challenging because the number of variables or parameters associated with such data can exceed the sample size. Several variable selection approaches have recently been developed that select significant variables and construct a parsimonious model simultaneously. In this chapter, we first provide an overview of model selection approaches in the context of penalized least squares. We then review independence screening, a recently developed method for analyzing ultrahigh-dimensional data in which the number of variables or parameters can grow exponentially with the sample size. Finally, we discuss and advocate multistage procedures that combine independence screening and variable selection and that may be especially suitable for analyzing high-frequency financial data.

Penalized least squares seeks to keep important predictors in a model while penalizing the coefficients associated with irrelevant predictors. Under certain conditions, it can therefore yield a sparse solution for linear models and consistently separate relevant variables from irrelevant ones as the sample size grows. Independence screening selects relevant variables based on measures of marginal correlation between each candidate variable and the response.
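To make the multistage idea concrete, the following is a minimal Python sketch, not the chapter's own procedure. It assumes a linear model, uses absolute Pearson correlation as the marginal screening measure, and uses the lasso (an L1 penalty fitted by coordinate descent) as one instance of penalized least squares. The function names, the simulated data, the screening size d = n / log(n), and the penalty level `lam` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def screen(X, y, d):
    """Stage 1: keep the d predictors with the largest absolute
    marginal (Pearson) correlation with the response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:d])


def lasso_cd(X, y, lam, n_iter=200):
    """Stage 2: minimize (1/2n)||y - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # equals y - X @ beta while beta is 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def run_demo(seed=0, n=200, p=2000, lam=0.3):
    """Simulate p >> n data (hypothetical setup), screen, then fit the lasso."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Assumed truth: only the first three predictors matter
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 2.0 * X[:, 2] + 0.5 * rng.standard_normal(n)
    d = int(n / np.log(n))              # assumed screening size
    kept = screen(X, y, d)              # indices surviving stage 1
    beta = lasso_cd(X[:, kept], y, lam)
    selected = kept[beta != 0.0]        # indices surviving stage 2
    return kept, selected


kept, selected = run_demo()
print("kept after screening:", len(kept))
print("selected by lasso:", selected)
```

Screening is cheap because it evaluates each predictor separately, so it reduces the problem from p columns to d before the more expensive penalized fit runs. The trade-off is that a predictor that is only jointly, not marginally, related to the response can be dropped in the first stage.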

Original language: English (US)
Title of host publication: Handbook of Financial Econometrics and Statistics
Publisher: Springer New York
Pages: 2093-2118
Number of pages: 26
ISBN (Electronic): 9781461477501
ISBN (Print): 9781461477495
DOI: https://doi.org/10.1007/978-1-4614-7750-1_77
State: Published - Jan 1 2015

Fingerprint

Model Selection
High-dimensional
Penalized Least Squares
Screening
Variable Selection
Predictors
Sample Size
High-frequency Data
Financial Data
Several Variables
High-dimensional Data
Finance
Linear Model
Data analysis
Economics
Coefficient
Model
Independence
Least squares

All Science Journal Classification (ASJC) codes

  • Economics, Econometrics and Finance (all)
  • Business, Management and Accounting (all)
  • Mathematics (all)

Cite this

Huang, J. Z., Shi, Z., & Zhong, W. (2015). Model selection for high-dimensional problems. In Handbook of Financial Econometrics and Statistics (pp. 2093-2118). Springer New York. https://doi.org/10.1007/978-1-4614-7750-1_77
