Density estimation in infinite dimensional exponential families

Bharath Kumar Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

In this paper, we consider an infinite dimensional exponential family P of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space H, and show it to be quite rich in the sense that a broad class of densities on R^d can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in P. Motivated by this approximation property, the paper addresses the question of estimating an unknown density p_0 through an element in P. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between p_0 and P, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. We propose an estimator p_n based on minimizing the Fisher divergence J(p_0 || p) between p_0 and p ∈ P, which involves solving a simple finite-dimensional linear system. When p_0 ∈ P, we show that the proposed estimator is consistent, and provide a convergence rate of n^{-min{2/3, (2β+1)/(2β+2)}} in Fisher divergence under the smoothness assumption that log p_0 ∈ R(C^β) for some β ≥ 0, where C is a certain Hilbert-Schmidt operator on H and R(C^β) denotes the image of C^β. We also investigate the misspecified case of p_0 ∉ P and show that J(p_0 || p_n) → inf_{p ∈ P} J(p_0 || p) as n → ∞, and provide a rate for this convergence under a similar smoothness condition as above. Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage of the proposed estimator grows as d increases.
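For context, the Fisher divergence that the estimator minimizes is the expected squared distance between score functions,

\[
J(p_0 \,\|\, p) \;=\; \frac{1}{2} \int_{\mathbb{R}^d} p_0(x)\, \bigl\| \nabla_x \log p_0(x) - \nabla_x \log p(x) \bigr\|_2^2 \, dx ,
\]

and the reason it sidesteps the log-partition function is the standard score-matching identity (integration by parts under mild decay conditions), which yields an empirical objective that depends on the model only through derivatives of log p:

\[
\widehat{J}(p) \;=\; \frac{1}{n} \sum_{i=1}^{n} \sum_{a=1}^{d} \left[ \frac{\partial^2 \log p(X_i)}{\partial x_a^2} + \frac{1}{2} \left( \frac{\partial \log p(X_i)}{\partial x_a} \right)^{\!2} \right] + \text{const},
\]

where X_1, ..., X_n are samples from p_0 and the constant does not depend on p. The following is a minimal one-dimensional Python sketch of this idea: a model f(x) = Σ_i α_i k(X_i, x) with a Gaussian kernel is fitted by minimizing the empirical objective above plus an RKHS-norm penalty, which reduces to a single n × n linear solve. This is an illustrative simplification, not the paper's estimator (the paper treats d dimensions using kernel partial derivatives along every coordinate and analyzes a particular regularized solution); the function names, kernel bandwidth, and regularization constant below are assumptions of this sketch.

import numpy as np

def fit_kernel_expfam_1d(x, sigma=0.5, lam=1e-3):
    # Fit the log-density model f(y) = sum_i alpha_i k(x_i, y), with Gaussian
    # kernel k(u, v) = exp(-(u - v)^2 / (2 sigma^2)), by minimizing the
    # empirical score-matching objective
    #   (1/n) sum_j [ f''(x_j) + 0.5 * f'(x_j)^2 ]  +  (lam/2) ||f||_H^2,
    # a quadratic in alpha whose minimizer solves one n x n linear system.
    x = np.asarray(x, dtype=float)
    n = x.size
    diff = x[:, None] - x[None, :]             # diff[i, j] = x_i - x_j
    K = np.exp(-diff**2 / (2.0 * sigma**2))    # K[i, j] = k(x_i, x_j)
    # Derivatives of k(x_i, y) in y, evaluated at y = x_j:
    D = (K * diff / sigma**2).T                # D[j, i] = dk(x_i, y)/dy at y = x_j
    H = (K * (diff**2 / sigma**4 - 1.0 / sigma**2)).T   # second derivative
    # Stationarity: ((1/n) D^T D + lam K) alpha = -(1/n) H^T 1
    A = D.T @ D / n + lam * K + 1e-10 * np.eye(n)       # jitter for stability
    b = -H.sum(axis=0) / n
    alpha = np.linalg.solve(A, b)
    return x, alpha, sigma

def log_density_unnormalized(y, x, alpha, sigma):
    # Evaluate f(y) = sum_i alpha_i k(x_i, y) at query points y.
    d = x[:, None] - np.asarray(y, dtype=float)[None, :]
    return alpha @ np.exp(-d**2 / (2.0 * sigma**2))

# Usage: fit on Gaussian samples, normalize numerically on a grid (1-D only).
rng = np.random.default_rng(0)
samples = rng.normal(size=200)
x, alpha, sigma = fit_kernel_expfam_1d(samples)
grid = np.linspace(-4.0, 4.0, 400)
p = np.exp(log_density_unnormalized(grid, x, alpha, sigma))
p /= p.sum() * (grid[1] - grid[0])             # numerical normalization

Note that the linear solve never touches the normalizing constant: in one dimension the fitted unnormalized density can be normalized by simple numerical integration, as in the usage lines above, while in higher dimensions one works with the unnormalized model directly, which is precisely why avoiding the log-partition function matters.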

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 18
State: Published - Jul 1 2017

Fingerprint

Maximum likelihood estimation
Exponential Family
Density Estimation
Estimator
Sieves
Hilbert spaces
Kullback-Leibler Divergence
Linear systems
Smoothness
Divergence
Pseudo-maximum Likelihood
Computer simulation
Hilbert-Schmidt Operator
Kernel Density Estimator
Reproducing Kernel Hilbert Space
Approximation Property
Probability Density
Partition Function

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Sriperumbudur, Bharath Kumar; Fukumizu, Kenji; Gretton, Arthur; Hyvärinen, Aapo; Kumar, Revant. Density estimation in infinite dimensional exponential families. In: Journal of Machine Learning Research. 2017; Vol. 18.
@article{8b96d5b26cd64cc8a410d62b04257142,
title = "Density estimation in infinite dimensional exponential families",
author = "Sriperumbudur, {Bharath Kumar} and Kenji Fukumizu and Arthur Gretton and Aapo Hyv{\"a}rinen and Revant Kumar",
year = "2017",
month = "7",
day = "1",
language = "English (US)",
volume = "18",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

TY - JOUR

T1 - Density estimation in infinite dimensional exponential families

AU - Sriperumbudur, Bharath Kumar

AU - Fukumizu, Kenji

AU - Gretton, Arthur

AU - Hyvärinen, Aapo

AU - Kumar, Revant

PY - 2017/7/1

Y1 - 2017/7/1

UR - http://www.scopus.com/inward/record.url?scp=85025452858&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85025452858&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85025452858

VL - 18

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -