# On the empirical estimation of integral probability metrics

Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R.G. Lanckriet

Research output: Contribution to journal › Article › peer-review

62 Scopus citations

## Abstract

Given two probability measures, $\mathbb{P}$ and $\mathbb{Q}$, defined on a measurable space $S$, the integral probability metric (IPM) is defined as

$$\gamma_{\mathcal{F}}(\mathbb{P},\mathbb{Q})=\sup\left\{\left\vert \int_{S}f\,d\mathbb{P}-\int_{S}f\,d\mathbb{Q}\right\vert\,:\,f\in\mathcal{F}\right\},$$

where $\mathcal{F}$ is a class of real-valued bounded measurable functions on $S$. By appropriately choosing $\mathcal{F}$, various popular distances between $\mathbb{P}$ and $\mathbb{Q}$, including the Kantorovich metric, Fortet-Mourier metric, dual-bounded Lipschitz distance (also called the Dudley metric), total variation distance, and kernel distance, can be obtained. In this paper, we consider the problem of estimating $\gamma_{\mathcal{F}}$ from finite random samples drawn i.i.d. from $\mathbb{P}$ and $\mathbb{Q}$. Although the above-mentioned distances cannot be computed in closed form for every $\mathbb{P}$ and $\mathbb{Q}$, we show their empirical estimators to be easily computable and strongly consistent (except for the total variation distance). We further analyze their rates of convergence. Based on these results, we discuss the advantages of certain choices of $\mathcal{F}$ (and therefore the corresponding IPMs) over others. In particular, the kernel distance is shown to have three favorable properties compared with the other mentioned distances: it is computationally cheaper, the empirical estimate converges at a faster rate to the population value, and the rate of convergence is independent of the dimension $d$ of the space (for $S=\mathbb{R}^d$). We also provide a novel interpretation of IPMs and their empirical estimators by relating them to the problem of binary classification: while the IPM between class-conditional distributions is the negative of the optimal risk associated with a binary classifier, the smoothness of an appropriate binary classifier (e.g., support vector machine, Lipschitz classifier, etc.) is inversely related to the empirical estimator of the IPM between these class-conditional distributions.
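To make the claim that the kernel distance is easy to estimate concrete, here is a minimal sketch of a plug-in estimator from two samples. It rests on assumptions not stated in the abstract: a Gaussian kernel with a hand-picked `bandwidth`, and the standard biased (V-statistic) form in which both population measures are replaced by their empirical counterparts. The function names `gaussian_kernel` and `empirical_kernel_distance` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a (m, d) and rows of b (n, d).

    The Gaussian kernel and the bandwidth value are illustrative assumptions.
    """
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def empirical_kernel_distance(x, y, bandwidth=1.0):
    """Plug-in estimate of the kernel distance between two samples.

    x has shape (m, d) and y has shape (n, d). The squared distance is
    mean k(x_i, x_j) + mean k(y_i, y_j) - 2 * mean k(x_i, y_j),
    where each mean runs over all pairs (including i == j).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    squared = k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
    # Guard against a slightly negative value from round-off before the sqrt.
    return float(np.sqrt(max(squared, 0.0)))


# Usage: two samples in R^2 drawn from Gaussians with different means.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=(50, 2))
y = rng.normal(loc=3.0, size=(40, 2))
print(empirical_kernel_distance(x, y))
```

The cost is quadratic in the sample sizes and involves only kernel evaluations, with no optimization step. The bandwidth here is arbitrary; in practice it would need to be chosen for the data at hand.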

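| Field | Value |
| --- | --- |
| Original language | English (US) |
| Pages (from-to) | 1550-1599 |
| Number of pages | 50 |
| Journal | Electronic Journal of Statistics |
| Volume | 6 |
| DOI | https://doi.org/10.1214/12-EJS722 |
| State | Published - 2012 |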

## All Science Journal Classification (ASJC) codes

• Statistics and Probability
• Statistics, Probability and Uncertainty