CAREER: Model Selection for Semiparametric Regression Models in High Dimensional Modeling and its Oracle Properties

Project: Research project

Project Details


Proposal: DMS 0348869
PI: Runze Li
Institution: The Pennsylvania State University
Model Selection for Semiparametric Regression Models in High Dimensional Modeling and its Oracle Properties

Abstract

Model selection is fundamental to high-dimensional data analysis, and semiparametric regression models are potentially useful for analysis of high-dimensional data. Model selection for semiparametric regression models consists of two components: model selection (such as choice of smoothing parameters) for the nonparametric component, and variable selection for the parametric component. Traditional variable selection schemes, such as stepwise deletion and best subset variable selection, could be extended to semiparametric modeling, but they are computationally expensive because they require the smoothing parameters to be selected for each submodel. The objectives of this proposal are to develop new, widely applicable model selection procedures for three classes of semiparametric models, which together provide a unified framework for many existing semiparametric regression models in the literature. In this proposal, the PI (a) studies the asymptotic behavior of the proposed estimators, (b) demonstrates how the rate of convergence of the resulting estimator depends on the regularization parameter, (c) shows that the proposed procedures perform as well as the oracle procedure in variable selection for semiparametric regression models, and (d) addresses issues related to implementation of the proposed procedures. The PI also examines finite sample performance via extensive Monte Carlo simulation studies and applies the proposed procedures to the analysis of real data.

With modern data collection devices and vast data storage space, one can easily collect high-dimensional data, such as biotech data, financial data, satellite imagery and hyperspectral imagery. Analysis of high-dimensional data poses many challenges for statisticians and is becoming the most important research topic in statistics. This proposal (a) lays down a well-grounded and comprehensive framework for model selection for semiparametric regression modeling in high-dimensional data analysis, (b) has significant impact on future research in high-dimensional statistical modeling, and (c) significantly enhances the availability of statistical tools and software for high-dimensional statistical modeling. The proposed work is incorporated into a new topic course from which graduate students may directly benefit. The proposed work also benefits a broad range of scientists and researchers in various fields, including automotive engineering, medical studies, prevention studies, public health and social sciences.

Effective start/end date: 7/1/04 to 6/30/11


  • National Science Foundation: $440,000.00

