Variable selection techniques are often used in combination with multiple linear regression to produce a parsimonious model that fits the data well. It is clearly undesirable for the final model to depend strongly on the inclusion of a few influential cases in the data set. This article discusses a measure of the influence of single cases on the final model, based on a similar measure used in ordinary multiple regression. When variables are selected objectively, deletion of individual cases can strongly affect the choice of model. The influence of individual cases on the parameters of the selected model is often assessed as part of the model building process. However, such conditional measures fail to evaluate the influence of the cases on the variable selection process itself. Modern computing environments make it feasible to use an unconditional criterion to determine the influence of each case on the selection procedure. A number of examples are discussed to illustrate the differences between these approaches. Heuristics are developed to explain the examples. We conclude that, although the conditional approach gives valuable information about the selected model, the use of the unconditional approach can lead to greater insight about the influence of individual observations on the process of model selection.
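The distinction between the two approaches can be illustrated with a minimal sketch. The abstract does not give the article's actual measure or selection procedure, so everything below is an assumption made for illustration: the selection rule (`select_subset`, an all-subsets search minimising BIC), the function names, and the Cook-type distance used for both measures. The conditional measure deletes a case and refits the model already chosen from the full data; the unconditional measure deletes the case and reruns the selection before refitting.

```python
import itertools
import numpy as np


def design_matrix(X, cols):
    """Intercept column followed by the chosen predictor columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])


def fit_ols(X, y, cols):
    """Least-squares coefficients for y on an intercept plus columns `cols`."""
    beta, *_ = np.linalg.lstsq(design_matrix(X, cols), y, rcond=None)
    return beta


def select_subset(X, y):
    """Hypothetical selection rule (an assumption, not the article's):
    all-subsets search returning the column tuple with the smallest BIC."""
    n, p = X.shape
    best_cols, best_bic = (), np.inf
    for k in range(p + 1):
        for cols in itertools.combinations(range(p), k):
            resid = y - design_matrix(X, cols) @ fit_ols(X, y, cols)
            bic = n * np.log(float(resid @ resid) / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_cols, best_bic = cols, bic
    return best_cols


def influence(X, y):
    """Return the full-data selection plus, for each case, a conditional and
    an unconditional Cook-type distance and a flag marking whether deleting
    the case changed the selected variables."""
    n = len(y)
    full_cols = select_subset(X, y)
    full_fit = design_matrix(X, full_cols) @ fit_ols(X, y, full_cols)
    resid = y - full_fit
    n_par = len(full_cols) + 1
    # Scale shared by both measures: (number of parameters) * residual variance.
    scale = n_par * float(resid @ resid) / (n - n_par)

    cond = np.empty(n)
    uncond = np.empty(n)
    changed = np.zeros(n, dtype=bool)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]

        # Conditional: the selected model is held fixed; only its
        # coefficients are re-estimated without case i.
        fit_c = design_matrix(X, full_cols) @ fit_ols(Xi, yi, full_cols)
        cond[i] = np.sum((full_fit - fit_c) ** 2) / scale

        # Unconditional: the selection procedure is rerun without case i,
        # so the set of variables itself may change.
        cols_i = select_subset(Xi, yi)
        fit_u = design_matrix(X, cols_i) @ fit_ols(Xi, yi, cols_i)
        uncond[i] = np.sum((full_fit - fit_u) ** 2) / scale
        changed[i] = cols_i != full_cols
    return full_cols, cond, uncond, changed
```

In this sketch the two measures coincide for any case whose deletion leaves the selected variables unchanged, and can differ only for cases flagged in `changed`. The all-subsets search is exponential in the number of predictors, so it is practical only for small problems; a stepwise rule could be substituted in `select_subset` without altering the rest.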
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty