Optimal rates for random Fourier features

Research output: Contribution to journal › Conference article

22 Citations (Scopus)

Abstract

Kernel methods represent one of the most powerful tools in machine learning for tackling problems expressed in terms of function values and derivatives, owing to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized constructions that admit fast linear algorithms have recently been proposed in the literature. Random Fourier features (RFF) are among the most popular and widely applied of these constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis of the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in Lr (1 ≤ r < ∞) norms. We also propose an RFF approximation to derivatives of a kernel, with a theoretical study of its approximation quality.
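To make the construction concrete (this sketch is not part of the published record): for a shift-invariant kernel, Bochner's theorem lets one sample frequencies from the kernel's spectral density and build the feature map φ(x) = sqrt(2/m) [cos(ω_1ᵀx + b_1), ..., cos(ω_mᵀx + b_m)], so that φ(x)ᵀφ(y) approximates k(x, y). Below is a minimal NumPy sketch for the Gaussian kernel; the function names, bandwidth sigma, and feature dimension m are illustrative choices, not the paper's notation.

import numpy as np

def rff_features(X, omega, b):
    # Map inputs X of shape (n, d) to random Fourier features of shape (n, m):
    # phi(x) = sqrt(2/m) * cos(omega^T x + b).
    m = omega.shape[1]
    return np.sqrt(2.0 / m) * np.cos(X @ omega + b)

rng = np.random.default_rng(0)
d, m, sigma = 3, 5000, 1.0

# For the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)),
# the spectral density is N(0, sigma^{-2} I); b is uniform on [0, 2 pi].
omega = rng.normal(scale=1.0 / sigma, size=(d, m))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

x = rng.normal(size=(1, d))
y = rng.normal(size=(1, d))

exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
approx = (rff_features(x, omega, b) @ rff_features(y, omega, b).T).item()
print(exact, approx)  # agree up to O(m^{-1/2}) Monte Carlo error

# The derivative approximation studied in the paper can be sketched the same
# way: differentiating the feature map gives
# d phi(x) / dx_j = -sqrt(2/m) * omega_j * sin(omega^T x + b),
# so grad_x k(x, y) is approximated by (d phi(x)/dx) phi(y).
grad_feat = -np.sqrt(2.0 / m) * omega * np.sin(x @ omega + b)  # (d, m)
approx_grad = grad_feat @ rff_features(y, omega, b).ravel()    # (d,)
exact_grad = -((x - y).ravel() / sigma ** 2) * exact
print(exact_grad, approx_grad)

With m in the thousands, the printed pairs typically match to a few decimal places; the paper's contribution is to quantify exactly how fast this error decays, uniformly over growing sets, and to show that the rate is optimal in the RFF dimension.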

Original language: English (US)
Pages (from-to): 1144-1152
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2015-January
State: Published - Jan 1 2015
Event: 29th Annual Conference on Neural Information Processing Systems, NIPS 2015 - Montreal, Canada
Duration: Dec 7 2015 - Dec 12 2015

Fingerprint

Derivatives
Learning systems
Scalability

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

@article{279afd2f68b741e3a02200c997c1fcd9,
title = "Optimal rates for random Fourier features",
abstract = "Kernel methods represent one of the most powerful tools in machine learning for tackling problems expressed in terms of function values and derivatives, owing to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized constructions that admit fast linear algorithms have recently been proposed in the literature. Random Fourier features (RFF) are among the most popular and widely applied of these constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis of the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in Lr (1 ≤ r < ∞) norms. We also propose an RFF approximation to derivatives of a kernel, with a theoretical study of its approximation quality.",
author = "Sriperumbudur, {Bharath Kumar} and Zoltan Szabo",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
volume = "2015-January",
pages = "1144--1152",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}

Optimal rates for random Fourier features. / Sriperumbudur, Bharath Kumar; Szabo, Zoltan.

In: Advances in Neural Information Processing Systems, Vol. 2015-January, 01.01.2015, p. 1144-1152.

Research output: Contribution to journal › Conference article

TY - JOUR

T1 - Optimal rates for random Fourier features

AU - Sriperumbudur, Bharath Kumar

AU - Szabo, Zoltan

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Kernel methods represent one of the most powerful tools in machine learning for tackling problems expressed in terms of function values and derivatives, owing to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized constructions that admit fast linear algorithms have recently been proposed in the literature. Random Fourier features (RFF) are among the most popular and widely applied of these constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis of the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in Lr (1 ≤ r < ∞) norms. We also propose an RFF approximation to derivatives of a kernel, with a theoretical study of its approximation quality.

AB - Kernel methods represent one of the most powerful tools in machine learning for tackling problems expressed in terms of function values and derivatives, owing to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and scale poorly to large data, as they require operations on Gram matrices. To mitigate this serious computational limitation, randomized constructions that admit fast linear algorithms have recently been proposed in the literature. Random Fourier features (RFF) are among the most popular and widely applied of these constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis of the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in Lr (1 ≤ r < ∞) norms. We also propose an RFF approximation to derivatives of a kernel, with a theoretical study of its approximation quality.

UR - http://www.scopus.com/inward/record.url?scp=84965121722&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84965121722&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84965121722

VL - 2015-January

SP - 1144

EP - 1152

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -