### Abstract

We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x, a measure of the variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals' privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instance-based noise in the context of data privacy. Our framework raises many interesting algorithmic questions; namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating the smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.
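The median is the simplest of the functions the abstract mentions, and its smooth sensitivity admits a direct O(n^2) computation. The sketch below is an illustration, not code from the paper: the function names, the `[0, lam]` data range, and the calibration `beta = eps / (2 ln(1/delta))` with Laplace noise are one admissible instantiation of the framework; out-of-range entries are clamped to the endpoints 0 and `lam`, and odd n is assumed for a unique median.

```python
import math
import random

def smooth_sensitivity_median(xs, beta, lam):
    """Smooth sensitivity of the median of values in [0, lam].

    Direct O(n^2) evaluation of
        S(x) = max_k e^{-k*beta} * max_t (x_{m+t} - x_{m+t-k-1}),
    where m is the median index and out-of-range entries are
    clamped to 0 and lam. Odd n is assumed for simplicity.
    """
    xs = sorted(xs)
    n = len(xs)
    m = (n - 1) // 2  # 0-based median index

    def val(i):
        # Clamp indices outside the database to the range endpoints.
        if i < 0:
            return 0.0
        if i >= n:
            return lam
        return float(xs[i])

    s = 0.0
    for k in range(n + 1):
        # Largest gap the median can move across when k entries change.
        width = max(val(m + t) - val(m + t - k - 1) for t in range(k + 2))
        s = max(s, math.exp(-k * beta) * width)
    return s

def release_median(xs, eps, delta, lam):
    """Release the median with Laplace noise scaled to smooth sensitivity.

    beta = eps / (2 ln(1/delta)) is one admissible smoothing parameter
    for (eps, delta)-differential privacy with Laplace noise.
    """
    beta = eps / (2.0 * math.log(1.0 / delta))
    s = smooth_sensitivity_median(xs, beta, lam)
    laplace = random.expovariate(1.0) - random.expovariate(1.0)  # Lap(1) sample
    return sorted(xs)[(len(xs) - 1) // 2] + (2.0 * s / eps) * laplace
```

Because the noise scale depends on the instance only through the smoothed quantity, nearby databases get similar noise magnitudes, which is what keeps the magnitude itself from leaking information.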

Original language | English (US)
---|---
Title of host publication | STOC'07
Subtitle of host publication | Proceedings of the 39th Annual ACM Symposium on Theory of Computing
Pages | 75-84
Number of pages | 10
DOIs | https://doi.org/10.1145/1250790.1250803
State | Published - Oct 30 2007
Event | STOC'07: 39th Annual ACM Symposium on Theory of Computing - San Diego, CA, United States. Duration: Jun 11 2007 → Jun 13 2007

### Publication series

Name | Proceedings of the Annual ACM Symposium on Theory of Computing
---|---
ISSN (Print) | 0737-8017

### Other

Other | STOC'07: 39th Annual ACM Symposium on Theory of Computing
---|---
Country | United States
City | San Diego, CA
Period | 6/11/07 → 6/13/07


### All Science Journal Classification (ASJC) codes

- Software

### Cite this

Nissim, K., Raskhodnikova, S., & Smith, A. (2007). **Smooth sensitivity and sampling in private data analysis.** In *STOC'07: Proceedings of the 39th Annual ACM Symposium on Theory of Computing* (pp. 75-84). (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/1250790.1250803

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Smooth sensitivity and sampling in private data analysis

AU - Nissim, Kobbi

AU - Raskhodnikova, Sofya

AU - Smith, Adam

PY - 2007/10/30

Y1 - 2007/10/30

N2 - We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x, a measure of the variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals' privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instance-based noise in the context of data privacy. Our framework raises many interesting algorithmic questions; namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating the smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.


UR - http://www.scopus.com/inward/record.url?scp=35448955271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35448955271&partnerID=8YFLogxK

U2 - 10.1145/1250790.1250803

DO - 10.1145/1250790.1250803

M3 - Conference contribution

AN - SCOPUS:35448955271

SN - 1595936319

SN - 9781595936318

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 75

EP - 84

BT - STOC'07

ER -