Hypothesis testing using pairwise distances and associated kernels

Dino Sejdinovic, Arthur Gretton, Bharath Sriperumbudur, Kenji Fukumizu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

15 Scopus citations

Abstract

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
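The correspondence described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes the semimetric rho(x, y) = ||x - y||^q with 0 < q <= 2 (a standard example of negative type; the abstract does not name the family it studies), and it assumes the induced kernel k(x, y) = (rho(x, z0) + rho(y, z0) - rho(x, y)) / 2 centred at an arbitrary point z0. Under that normalisation the empirical energy distance equals twice the squared RKHS distance between the empirical embeddings; a different scaling of the kernel would change the constant. All function names are hypothetical.

```python
import math
import random


def rho(x, y, q=1.0):
    """Assumed semimetric: Euclidean distance raised to the power q."""
    return math.dist(x, y) ** q


def mean_pairwise(f, xs, ys):
    """Average of f(x, y) over all pairs (a V-statistic, diagonal included)."""
    return sum(f(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys, q=1.0):
    """Empirical energy distance: 2 E rho(X,Y) - E rho(X,X') - E rho(Y,Y')."""
    d = lambda a, b: rho(a, b, q)
    return (2 * mean_pairwise(d, xs, ys)
            - mean_pairwise(d, xs, xs)
            - mean_pairwise(d, ys, ys))


def induced_kernel(x, y, z0, q=1.0):
    """Kernel induced by rho, centred at z0 (normalisation is an assumption)."""
    return 0.5 * (rho(x, z0, q) + rho(y, z0, q) - rho(x, y, q))


def mmd_squared(xs, ys, z0, q=1.0):
    """Squared RKHS distance between the empirical embeddings of xs and ys."""
    k = lambda a, b: induced_kernel(a, b, z0, q)
    return (mean_pairwise(k, xs, xs)
            + mean_pairwise(k, ys, ys)
            - 2 * mean_pairwise(k, xs, ys))


# Synthetic samples for illustration only: two Gaussians with shifted means.
random.seed(0)
xs = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
ys = [[random.gauss(0.5, 1.0) for _ in range(3)] for _ in range(25)]
z0 = [0.0, 0.0, 0.0]

for q in (0.5, 1.0, 2.0):
    # The two printed values should agree up to floating-point error.
    print(q, energy_distance(xs, ys, q), 2 * mmd_squared(xs, ys, z0, q))
```

The terms involving z0 cancel in the RKHS distance, so the agreement does not depend on where the kernel is centred. Varying q gives a simple way to compare members of such a family in a test.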

Original language: English (US)
Title of host publication: Proceedings of the 29th International Conference on Machine Learning, ICML 2012
Pages: 1111-1118
Number of pages: 8
State: Published - Oct 10 2012
Event: 29th International Conference on Machine Learning, ICML 2012 - Edinburgh, United Kingdom
Duration: Jun 26 2012 to Jul 1 2012

Publication series

Name: Proceedings of the 29th International Conference on Machine Learning, ICML 2012
Volume: 2

Other

Other: 29th International Conference on Machine Learning, ICML 2012
Country: United Kingdom
City: Edinburgh
Period: 6/26/12 to 7/1/12

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Education
