On the predictive potential of kernel principal components

Ben Jones, Andreas Artemiou, Bing Li

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

We give a probabilistic analysis of a phenomenon in statistics which, until recently, has not received a convincing explanation: the leading principal components tend to possess more predictive power for a response variable than lower-ranking ones, even though the procedure is unsupervised. Our result, in its most general form, shows that the phenomenon goes far beyond the context of linear regression and classical principal components. If an arbitrary distribution for the predictor X and an arbitrary conditional distribution for Y | X are chosen, then any measurable function g(Y), subject to a mild condition, tends to be more correlated with the higher-ranking kernel principal components than with the lower-ranking ones. The “arbitrariness” is formulated in terms of unitary invariance, and the tendency is then explicitly quantified by exploring how unitary invariance relates to the Cauchy distribution. The most general results, for technical reasons, are shown for the case where the kernel space is finite dimensional. The occurrence of this tendency in real-world databases is also investigated to show that our results are consistent with observation.
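As a rough illustration of the quantities the abstract refers to, the following sketch computes kernel principal component scores and the squared correlation of a response with each of them. It is not the authors' procedure: the RBF kernel, the value of `gamma`, the simulated data, the choice g(Y) = Y, and the helper names `kernel_pc_scores` and `squared_correlations` are all assumptions made for the example.

```python
import numpy as np


def kernel_pc_scores(X, gamma=1.0, n_components=5):
    """Eigenvalues and scores of the leading kernel principal components.

    Assumes an RBF kernel k(x, z) = exp(-gamma * ||x - z||^2); the kernel
    choice is illustrative only.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the Gram matrix in feature space.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Scores of the training points on each kernel principal component.
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return eigvals, scores


def squared_correlations(scores, y):
    """Squared sample correlation between y and each column of scores."""
    yc = y - y.mean()
    sc = scores - scores.mean(axis=0)
    num = (sc.T @ yc) ** 2
    den = np.sum(sc ** 2, axis=0) * np.sum(yc ** 2)
    return num / den


# Simulated data (hypothetical): three predictors, response depends on the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * rng.normal(size=200)

eigvals, scores = kernel_pc_scores(X, gamma=0.5, n_components=5)
r2 = squared_correlations(scores, y)
for rank, (ev, r) in enumerate(zip(eigvals, r2), start=1):
    print(f"component {rank}: eigenvalue={ev:.3f}, squared correlation={r:.3f}")
```

A single simulated data set cannot confirm or refute the tendency described above, which is a statement about what happens on average over randomly chosen distributions; the sketch only shows how the per-component correlations could be computed.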

Original language: English (US)
Pages (from-to): 1-23
Number of pages: 23
Journal: Electronic Journal of Statistics
Volume: 14
Issue number: 1
State: Published - 2020

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
