TY - JOUR
T1 - Assessing influence in variable selection problems
AU - Léger, Christian
AU - Altman, Naomi
N1 - Funding Information:
Christian Léger is Assistant Professor, IRO Department, University of Montreal, Quebec, Canada H3C 3J7. Naomi Altman is Assistant Professor, Biometrics Unit, Cornell University, Ithaca, NY 14853. Léger's work was supported by NSERC (Canada) and FCAR (Quebec). Altman's work was supported by Hatch Grant 151410 NYF. The authors thank C. W. Martin of the Hubbard Brook Experiment Station, USDA Forest Service, for supplying the Hubbard Brook Forest data, and S. Thurston for discussions of the analysis. They also thank A. Hadi for discussions that greatly clarified the strengths of various approaches to measuring influence.
PY - 1993/6
Y1 - 1993/6
N2 - Variable selection techniques are often used in combination with multiple linear regression to produce a parsimonious model that fits the data well. It is clearly undesirable for the final model to depend strongly on the inclusion of a few influential cases in the data set. This article discusses a measure of influence of single cases on the final model, based on a similar measure used in ordinary multiple regression. When variables are selected objectively, deletion of individual cases can strongly affect the choice of model. The influence of individual cases on the parameters of the selected model is often assessed as part of the model building process. However, such conditional measures fail to evaluate the influence of the cases on the variable selection process. Modern computing environments make it feasible to use an unconditional criterion to determine the influence of each case on the selection procedure. A number of examples are discussed to illustrate the differences between these approaches. Heuristics are developed to explain the examples. We conclude that, although the conditional approach gives valuable information about the selected model, the use of the unconditional approach can lead to greater insight about the influence of individual observations on the process of model selection.
AB - Variable selection techniques are often used in combination with multiple linear regression to produce a parsimonious model that fits the data well. It is clearly undesirable for the final model to depend strongly on the inclusion of a few influential cases in the data set. This article discusses a measure of influence of single cases on the final model, based on a similar measure used in ordinary multiple regression. When variables are selected objectively, deletion of individual cases can strongly affect the choice of model. The influence of individual cases on the parameters of the selected model is often assessed as part of the model building process. However, such conditional measures fail to evaluate the influence of the cases on the variable selection process. Modern computing environments make it feasible to use an unconditional criterion to determine the influence of each case on the selection procedure. A number of examples are discussed to illustrate the differences between these approaches. Heuristics are developed to explain the examples. We conclude that, although the conditional approach gives valuable information about the selected model, the use of the unconditional approach can lead to greater insight about the influence of individual observations on the process of model selection.
UR - http://www.scopus.com/inward/record.url?scp=21144473006&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=21144473006&partnerID=8YFLogxK
U2 - 10.1080/01621459.1993.10476306
DO - 10.1080/01621459.1993.10476306
M3 - Article
AN - SCOPUS:21144473006
VL - 88
SP - 547
EP - 556
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
IS - 422
ER -