Optimal kernel choice for large-scale two-sample tests

Arthur Gretton, Bharath Sriperumbudur, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

103 Citations (Scopus)

Abstract

Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
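The abstract describes a test statistic computable in time linear in the sample size. The sketch below illustrates one way such a linear-time MMD estimate can be formed: samples are consumed in non-overlapping pairs, so each observation is touched once and nothing beyond a running average need be stored. This is a minimal illustration, not the authors' implementation; the Gaussian kernel and the bandwidth `sigma=1.0` are assumptions for the example.

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    # Gaussian (RBF) kernel evaluated between paired rows of a and b
    return np.exp(-np.sum((a - b) ** 2, axis=1) / (2.0 * sigma ** 2))

def linear_time_mmd(x, y, sigma=1.0):
    """Linear-time MMD^2 estimate from non-overlapping sample pairs.

    x, y: arrays of shape (n, d). Cost is O(n): each sample is used once,
    so the statistic can be accumulated over a data stream.
    Returns the statistic and an empirical variance of the pair terms.
    """
    n = (min(x.shape[0], y.shape[0]) // 2) * 2  # use an even count
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]
    # h-term per pair: within-sample similarities minus cross-sample ones
    h = (gauss_kernel(x1, x2, sigma) + gauss_kernel(y1, y2, sigma)
         - gauss_kernel(x1, y2, sigma) - gauss_kernel(x2, y1, sigma))
    return h.mean(), h.var(ddof=1)

rng = np.random.default_rng(0)
# Same distribution: statistic should hover near zero
same, _ = linear_time_mmd(rng.normal(size=(2000, 1)),
                          rng.normal(size=(2000, 1)))
# Mean-shifted distribution: statistic should be clearly positive
diff, _ = linear_time_mmd(rng.normal(size=(2000, 1)),
                          rng.normal(2.0, 1.0, size=(2000, 1)))
```

The returned variance term is what makes kernel selection of the kind proposed here tractable: both the statistic and its variance come from the same O(n) pass, so a power criterion over kernel parameters can be evaluated at the same linear cost.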

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 25
Subtitle of host publication: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Pages: 1205-1213
Number of pages: 9
State: Published - Dec 1 2012
Event: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 - Lake Tahoe, NV, United States
Duration: Dec 3 2012 - Dec 6 2012

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 2
ISSN (Print): 1049-5258

Other

Other: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Country: United States
City: Lake Tahoe, NV
Period: 12/3/12 - 12/6/12

Fingerprint

  • Statistics
  • Hilbert spaces
  • Probability distributions
  • Data storage equipment
  • Costs
  • Experiments

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., & Fukumizu, K. (2012). Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 (pp. 1205-1213). (Advances in Neural Information Processing Systems; Vol. 2).
Gretton, Arthur ; Sriperumbudur, Bharath ; Sejdinovic, Dino ; Strathmann, Heiko ; Balakrishnan, Sivaraman ; Pontil, Massimiliano ; Fukumizu, Kenji. / Optimal kernel choice for large-scale two-sample tests. Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012. 2012. pp. 1205-1213 (Advances in Neural Information Processing Systems).
@inproceedings{338f6cbe2c234defad6e78eff9f36828,
title = "Optimal kernel choice for large-scale two-sample tests",
abstract = "Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.",
author = "Arthur Gretton and Bharath Sriperumbudur and Dino Sejdinovic and Heiko Strathmann and Sivaraman Balakrishnan and Massimiliano Pontil and Kenji Fukumizu",
year = "2012",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781627480031",
series = "Advances in Neural Information Processing Systems",
pages = "1205--1213",
booktitle = "Advances in Neural Information Processing Systems 25",

}

Gretton, A, Sriperumbudur, B, Sejdinovic, D, Strathmann, H, Balakrishnan, S, Pontil, M & Fukumizu, K 2012, Optimal kernel choice for large-scale two-sample tests. in Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012. Advances in Neural Information Processing Systems, vol. 2, pp. 1205-1213, 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012, Lake Tahoe, NV, United States, 12/3/12.

Optimal kernel choice for large-scale two-sample tests. / Gretton, Arthur; Sriperumbudur, Bharath; Sejdinovic, Dino; Strathmann, Heiko; Balakrishnan, Sivaraman; Pontil, Massimiliano; Fukumizu, Kenji.

Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012. 2012. p. 1205-1213 (Advances in Neural Information Processing Systems; Vol. 2).


TY - GEN

T1 - Optimal kernel choice for large-scale two-sample tests

AU - Gretton, Arthur

AU - Sriperumbudur, Bharath

AU - Sejdinovic, Dino

AU - Strathmann, Heiko

AU - Balakrishnan, Sivaraman

AU - Pontil, Massimiliano

AU - Fukumizu, Kenji

PY - 2012/12/1

Y1 - 2012/12/1

N2 - Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

AB - Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

UR - http://www.scopus.com/inward/record.url?scp=84877753617&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877753617&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84877753617

SN - 9781627480031

T3 - Advances in Neural Information Processing Systems

SP - 1205

EP - 1213

BT - Advances in Neural Information Processing Systems 25

ER -

Gretton A, Sriperumbudur B, Sejdinovic D, Strathmann H, Balakrishnan S, Pontil M et al. Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012. 2012. p. 1205-1213. (Advances in Neural Information Processing Systems).