Private convex empirical risk minimization and high-dimensional regression

Daniel Kifer, Adam Davison Smith, Abhradeep Thakurta

Research output: Contribution to journalConference article

24 Citations (Scopus)

Abstract

We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a)We significantly extend the analysis of the "objective perturbation" algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a weak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b)We give the first two private algorithms for sparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s; log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - A combination which is impossible for deterministic algorithms.

Original languageEnglish (US)
JournalJournal of Machine Learning Research
Volume23
StatePublished - Jan 1 2012
Event25th Annual Conference on Learning Theory, COLT 2012 - Edinburgh, United Kingdom
Duration: Jun 25 2012Jun 27 2012

Fingerprint

High-dimensional
Regression
Privacy
Linear regression
Collaborative Filtering
Dependent Data
Deterministic Algorithm
Coefficient
Randomized Algorithms
Collaborative filtering
Limit Theorems
Minimization Problem
Genomics
Economics
Perturbation
Converge
Output
Demonstrate

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Cite this

@article{60b687aafdc84e3d80834c6009577efd,
title = "Private convex empirical risk minimization and high-dimensional regression",
abstract = "We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a)We significantly extend the analysis of the {"}objective perturbation{"} algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a weak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b)We give the first two private algorithms for sparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s; log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - A combination which is impossible for deterministic algorithms.",
author = "Daniel Kifer and Smith, {Adam Davison} and Abhradeep Thakurta",
year = "2012",
month = "1",
day = "1",
language = "English (US)",
volume = "23",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}

Private convex empirical risk minimization and high-dimensional regression. / Kifer, Daniel; Smith, Adam Davison; Thakurta, Abhradeep.

In: Journal of Machine Learning Research, Vol. 23, 01.01.2012.

Research output: Contribution to journalConference article

TY - JOUR

T1 - Private convex empirical risk minimization and high-dimensional regression

AU - Kifer, Daniel

AU - Smith, Adam Davison

AU - Thakurta, Abhradeep

PY - 2012/1/1

Y1 - 2012/1/1

N2 - We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a)We significantly extend the analysis of the "objective perturbation" algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a weak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b)We give the first two private algorithms for sparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s; log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - A combination which is impossible for deterministic algorithms.

AB - We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend on the data of any individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a)We significantly extend the analysis of the "objective perturbation" algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a weak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b)We give the first two private algorithms for sparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s; log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate - A combination which is impossible for deterministic algorithms.

UR - http://www.scopus.com/inward/record.url?scp=84893149822&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893149822&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84893149822

VL - 23

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -