In this note we give a probabilistic explanation of a phenomenon that is frequently observed but whose cause is not well understood. Namely, in a regression setting, the response (Y) is often highly correlated with the leading principal components of the predictor (X), even though there seems to be no logical reason for this connection. The phenomenon has long been noticed and discussed in the literature, and it has received renewed interest recently because of the need to regress Y on a very high-dimensional X, often with comparatively few sampling units; in that case it seems natural to regress on the first few principal components of X. This work grew out of a discussion of a recent paper by Cook (2007), which, among other developments, described the historical debate surrounding this phenomenon and the current interest in it.
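The phenomenon can be illustrated with a small simulation. This is a minimal sketch under assumptions chosen here for illustration, not the authors' derivation: a linear model Y = β'X + ε with Var(ε) = 1, a population covariance of X that is already diagonal (so the j-th principal component is simply the coordinate X_j, with variance λ_j), and a coefficient direction β drawn uniformly on the unit sphere independently of the covariance. Under a linear model, the squared correlation between Y and the j-th component is λ_j (β'v_j)² / Var(Y), so a randomly oriented β tends to give higher correlation with high-variance components. The function names and the eigenvalues used below are hypothetical.

```python
import random


def pc_squared_correlations(eigenvalues, beta, noise_var=1.0):
    """Population squared correlations between Y = beta'X + eps and each PC of X.

    Assumes Cov(X) is diagonal with the given eigenvalues, so the j-th
    principal component is the j-th coordinate X_j with variance eigenvalues[j].
    """
    var_y = sum(l * b * b for l, b in zip(eigenvalues, beta)) + noise_var
    # corr^2(Y, X_j) = Cov(Y, X_j)^2 / (Var(Y) Var(X_j)) = lambda_j * beta_j^2 / Var(Y)
    return [l * b * b / var_y for l, b in zip(eigenvalues, beta)]


def random_direction(p, rng):
    """Draw beta uniformly on the unit sphere by normalizing a Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = sum(v * v for v in z) ** 0.5
    return [v / norm for v in z]


def simulate(eigenvalues, n_draws=20000, seed=0):
    """Average squared correlation per PC over random beta directions, plus
    the fraction of draws in which PC1 beats the last PC."""
    rng = random.Random(seed)
    p = len(eigenvalues)
    mean_r2 = [0.0] * p
    first_beats_last = 0
    for _ in range(n_draws):
        r2 = pc_squared_correlations(eigenvalues, random_direction(p, rng))
        for j in range(p):
            mean_r2[j] += r2[j] / n_draws
        if r2[0] > r2[-1]:
            first_beats_last += 1
    return mean_r2, first_beats_last / n_draws


if __name__ == "__main__":
    lambdas = [10.0, 5.0, 2.0, 1.0, 0.5]  # hypothetical eigenvalues, largest first
    mean_r2, frac = simulate(lambdas)
    for j, r in enumerate(mean_r2, start=1):
        print(f"PC{j}: mean squared correlation {r:.3f}")
    print(f"Fraction of draws where PC1 beats PC5: {frac:.3f}")
```

Nothing in the setup links β to the eigenvectors, yet the average squared correlation falls as the eigenvalue falls. For these eigenvalues, the first component beats the last in roughly 86% of draws, since that event reduces to a ratio of two independent squared Gaussians.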
| Field | Value |
|---|---|
| Original language | English (US) |
| Number of pages | 9 |
| State | Published, Oct 1 2009 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty