### Abstract

Marginal (contingency) tables are the method of choice for government agencies releasing statistical summaries of categorical data. In this paper, we derive lower bounds on how much distortion (noise) is necessary in these tables to ensure the privacy of sensitive data. We extend a line of recent work on impossibility results for private data analysis [9, 12, 13, 15] to a natural and important class of functionalities. Consider a database consisting of $n$ rows (one per individual), each row comprising $d$ binary attributes. For any subset $T$ of attributes of size $|T| = k$, the marginal table for $T$ has $2^{k}$ entries; each entry counts how many times a particular setting of these attributes occurs in the database. We provide lower bounds for releasing all $\binom{d}{k}$ $k$-attribute marginal tables under several different notions of privacy.

(1) We give efficient (polynomial-time) attacks that allow an adversary to reconstruct sensitive information given insufficiently perturbed marginal table releases. In particular, for constant $k$, we obtain a tight bound of $\tilde{\Omega}(\min\{\sqrt{n}, \sqrt{d^{k-1}}\})$ on the average distortion per entry for any mechanism that releases all $k$-attribute marginals while providing "attribute" privacy (a weak notion implied by most privacy definitions).

(2) Our reconstruction attacks require a new lower bound on the least singular value of a random matrix with correlated rows. Let $M^{(k)}$ be a matrix with $\binom{d}{k}$ rows formed by taking all possible $k$-way entry-wise products of an underlying set of $d$ random vectors from $\{0,1\}^{n}$. For constant $k$, we show that the least singular value of $M^{(k)}$ is $\tilde{\Omega}(\sqrt{d^{k}})$ with high probability (the same asymptotic bound as for independent rows).

(3) We obtain stronger lower bounds for marginal tables satisfying differential privacy. We give a lower bound of $\tilde{\Omega}(\min\{\sqrt{n}, \sqrt{d^{k}}\})$, which is tight for $n = \tilde{\Omega}(d^{k})$. We extend our analysis to obtain stronger results for mechanisms that add instance-independent noise and weaker results when $k$ is super-constant.
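To make the definition of a marginal table concrete, the following is a minimal sketch of how the exact (unperturbed) $k$-attribute marginals of a small binary database could be computed. The function names `marginal_table` and `all_marginals` and the toy database are illustrative assumptions, not taken from the paper; the sketch adds no noise and does not implement any of the privacy mechanisms or attacks discussed in the abstract.

```python
from collections import Counter
from itertools import combinations, product


def marginal_table(rows, attrs):
    """Count how often each 0/1 setting of the attributes in `attrs` occurs."""
    counts = Counter(tuple(row[j] for j in attrs) for row in rows)
    # List all 2^k settings explicitly so settings with zero count still appear.
    return {setting: counts.get(setting, 0)
            for setting in product((0, 1), repeat=len(attrs))}


def all_marginals(rows, k):
    """Return the marginal table for every k-subset of the d attributes."""
    d = len(rows[0])
    return {attrs: marginal_table(rows, attrs)
            for attrs in combinations(range(d), k)}


# Hypothetical toy database: n = 4 individuals, d = 3 binary attributes.
database = [
    (1, 0, 1),
    (1, 1, 0),
    (0, 0, 1),
    (1, 0, 1),
]

tables = all_marginals(database, k=2)
print(len(tables))     # one table per 2-attribute subset
print(tables[(0, 2)])  # counts for each setting of attributes 0 and 2
```

Each table has $2^{k}$ entries that sum to $n$, and there is one table per $k$-subset of the $d$ attributes, which matches the counts given in the abstract.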

Original language | English (US) |
---|---|
Title of host publication | STOC'10 - Proceedings of the 2010 ACM International Symposium on Theory of Computing |
Pages | 775-784 |
Number of pages | 10 |
DOIs | https://doi.org/10.1145/1806689.1806795 |
State | Published - Jul 23 2010 |
Event | 42nd ACM Symposium on Theory of Computing, STOC 2010 - Cambridge, MA, United States. Duration: Jun 5 2010 → Jun 8 2010 |

### Publication series

Name | Proceedings of the Annual ACM Symposium on Theory of Computing |
---|---|
ISSN (Print) | 0737-8017 |

### Other

Other | 42nd ACM Symposium on Theory of Computing, STOC 2010 |
---|---|
Country | United States |
City | Cambridge, MA |
Period | 6/5/10 → 6/8/10 |

### All Science Journal Classification (ASJC) codes

- Software

### Cite this

*STOC'10 - Proceedings of the 2010 ACM International Symposium on Theory of Computing* (pp. 775-784). (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/1806689.1806795