TY - JOUR
T1 - Variable selection for partially linear models via Bayesian subset modeling with diffusing prior
AU - Wang, Jia
AU - Cai, Xizhen
AU - Li, Runze
N1 - Funding Information:
The authors are grateful to the Editor-in-Chief, an Associate Editor and the referees for comments and suggestions that led to significant improvements. This research was supported by NSF, USA grants DMS 1820702, DMS 1953196, DMS 2015539 and NIH, USA grants R01CA229542 and R01 ES019672. The content is solely the responsibility of the authors and does not necessarily represent the official views of NSF and NIH.
Publisher Copyright:
© 2021 Elsevier Inc.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/5
Y1 - 2021/5
N2 - Most existing methods of variable selection in partially linear models (PLM) with ultrahigh dimensional covariates are based on partial residuals, which involve a two-step estimation procedure. The estimation error produced in the first step may affect the second step, and multicollinearity among predictors poses a further challenge for model selection. In this paper, we propose a new Bayesian variable selection approach for PLM. The proposal addresses both issues simultaneously: (1) it is a one-step method that selects variables in PLM even when the dimension of the covariates increases at an exponential rate with the sample size, and (2) it retains model selection consistency and outperforms existing methods when predictors are highly correlated. Unlike existing methods, the proposed procedure employs the difference-based method to reduce the impact of estimating the nonparametric component, and incorporates Bayesian subset modeling with diffusing prior (BSM-DP) to shrink the corresponding estimator in the linear component. Estimation is implemented by Gibbs sampling, and we prove that the posterior probability of selecting the true model converges to one asymptotically. Simulation studies support the theory and the efficiency of our method relative to existing ones, and the method is applied to a study of supermarket data.
UR - http://www.scopus.com/inward/record.url?scp=85101211387&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101211387&partnerID=8YFLogxK
U2 - 10.1016/j.jmva.2021.104733
DO - 10.1016/j.jmva.2021.104733
M3 - Article
AN - SCOPUS:85101211387
VL - 183
JO - Journal of Multivariate Analysis
JF - Journal of Multivariate Analysis
SN - 0047-259X
M1 - 104733
ER -