TY - JOUR

T1 - On the empirical estimation of integral probability metrics

AU - Sriperumbudur, Bharath K.

AU - Fukumizu, Kenji

AU - Gretton, Arthur

AU - Schölkopf, Bernhard

AU - Lanckriet, Gert R.G.

N1 - Copyright: 2013 Elsevier B.V., All rights reserved.

PY - 2012

Y1 - 2012

N2 - Given two probability measures ℙ and ℚ defined on a measurable space S, the integral probability metric (IPM) is defined as $\gamma_{\EuScript{F}}(\mathbb{P},\mathbb{Q})=\sup\left\{\left\vert \int_{S}f\,d\mathbb{P}-\int_{S}f\,d\mathbb{Q}\right\vert\,:\,f\in\EuScript{F}\right\}$, where $\EuScript{F}$ is a class of real-valued bounded measurable functions on S. By appropriately choosing $\EuScript{F}$, various popular distances between ℙ and ℚ can be obtained, including the Kantorovich metric, the Fortet-Mourier metric, the dual-bounded Lipschitz distance (also called the Dudley metric), the total variation distance, and the kernel distance. In this paper, we consider the problem of estimating $\gamma_{\EuScript{F}}$ from finite random samples drawn i.i.d. from ℙ and ℚ. Although the above-mentioned distances cannot be computed in closed form for every ℙ and ℚ, we show that their empirical estimators are easily computable and strongly consistent (except for the total variation distance). We further analyze their rates of convergence. Based on these results, we discuss the advantages of certain choices of $\EuScript{F}$ (and therefore of the corresponding IPMs) over others. In particular, the kernel distance is shown to have three favorable properties compared with the other mentioned distances: it is computationally cheaper, its empirical estimate converges to the population value at a faster rate, and this rate of convergence is independent of the dimension d of the space (for $S=\mathbb{R}^d$). We also provide a novel interpretation of IPMs and their empirical estimators by relating them to the problem of binary classification: the IPM between class-conditional distributions is the negative of the optimal risk associated with a binary classifier, and the smoothness of an appropriate binary classifier (e.g., a support vector machine or a Lipschitz classifier) is inversely related to the empirical estimator of the IPM between these class-conditional distributions.

AB - Given two probability measures, ℙ and ℚ defined on a measurable space, S, the integral probability metric (IPM) is defined as \gamma_{\EuScript{F}}(\mathbb{P},\mathbb{Q})=\sup\left\{\left\vert \int_{S}f\,d\mathbb{P}-\int_{S}f\,d\mathbb{Q}\right\vert\,:\,f\in\EuScript{F}\right\}, where \EuScript{F} is a class of real-valued bounded measurable functions on S. By appropriately choosing \EuScript{F}, various popular distances between ℙ and ℚ, including the Kantorovich metric, Fortet-Mourier metric, dual-bounded Lipschitz distance (also called the Dudley metric), total variation distance, and kernel distance, can be obtained. In this paper, we consider the problem of estimating \gamma_{\EuScript{F}} from finite random samples drawn i.i.d. from ℙ and ℚ. Although the above mentioned distances cannot be computed in closed form for every ℙ and ℚ, we show their empirical estimators to be easily computable, and strongly consistent (except for the total-variation distance). We further analyze their rates of convergence. Based on these results, we discuss the advantages of certain choices of \EuScript{F} (and therefore the corresponding IPMs) over others-in particular, the kernel distance is shown to have three favorable properties compared with the other mentioned distances: it is computationally cheaper, the empirical estimate converges at a faster rate to the population value, and the rate of convergence is independent of the dimension d of the space (for S=ℝd). We also provide a novel interpretation of IPMs and their empirical estimators by relating them to the problem of binary classification: while the IPM between class-conditional distributions is the negative of the optimal risk associated with a binary classifier, the smoothness of an appropriate binary classifier (e.g., support vector machine, Lipschitz classifier, etc.) is inversely related to the empirical estimator of the IPM between these class-conditional distributions.

UR - http://www.scopus.com/inward/record.url?scp=84875150887&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875150887&partnerID=8YFLogxK

U2 - 10.1214/12-EJS722

DO - 10.1214/12-EJS722

M3 - Article

AN - SCOPUS:84875150887

VL - 6

SP - 1550

EP - 1599

JO - Electronic Journal of Statistics

JF - Electronic Journal of Statistics

SN - 1935-7524

ER -
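The abstract states that the kernel distance has an easily computable empirical estimator. The Python below is a minimal sketch under stated assumptions, not the authors' code. It computes the plug-in estimate of the kernel distance as the RKHS distance between the empirical mean embeddings of the two samples, using only kernel evaluations. The Gaussian kernel and its bandwidth `sigma` are assumed choices made for illustration. For contrast, it also computes the Kantorovich (Wasserstein-1) distance between two equal-size samples on the real line using the sorted (monotone) coupling. That one-dimensional shortcut is a well-known special case and does not cover the multivariate estimators discussed in the paper. All function names are hypothetical.

```python
import math
from typing import Sequence

# A sample point in R^d, represented as a sequence of floats.
Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2)).

    The kernel and the default bandwidth sigma=1.0 are assumptions
    for illustration, not choices taken from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def empirical_kernel_distance(
    xs: Sequence[Point], ys: Sequence[Point], sigma: float = 1.0
) -> float:
    """Plug-in kernel distance between the empirical measures of xs and ys.

    Computes || (1/m) sum_i k(., X_i) - (1/n) sum_j k(., Y_j) ||_H using
    O((m + n)^2) kernel evaluations; the number of evaluations does not
    depend on the dimension d of the points.
    """
    m, n = len(xs), len(ys)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    # The squared RKHS norm is non-negative; clamp tiny negative round-off.
    return math.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0))


def empirical_kantorovich_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kantorovich (Wasserstein-1) distance between two equal-size samples on R.

    On the real line, the optimal coupling of two empirical measures with the
    same number of atoms matches sorted values, so no optimization is needed.
    This shortcut does not extend to d > 1.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

For example, `empirical_kernel_distance([[0.0]], [[1.0]])` equals sqrt(2 - 2 exp(-1/2)) ≈ 0.887, and `empirical_kantorovich_1d([0.0, 1.0], [1.0, 2.0])` returns 1.0. The kernel estimator needs only pairwise kernel values, which illustrates why it is cheap to compute in any dimension.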