TY - GEN

T1 - Local, private, efficient protocols for succinct histograms

AU - Bassily, Raef

AU - Smith, Adam

N1 - Copyright 2016 Elsevier B.V., All rights reserved.

PY - 2015/6/14

Y1 - 2015/6/14

N2 - We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called "heavy hitters") along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are n users whose items come from a universe of size d, our protocols run in time polynomial in n and log(d). With high probability, they estimate the accuracy of every item up to error O(√(log(d)/(ε²n))). Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time Ω(d) or had much worse error (about (log(d)/(ε²n))^(1/6)), and the only known lower bound on error was Ω(1/√n). We also adapt a result of McGregor et al (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), the transformation preserves computational efficiency.

AB - We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called "heavy hitters") along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are n users whose items come from a universe of size d, our protocols run in time polynomial in n and log(d). With high probability, they estimate the accuracy of every item up to error O(√(log(d)/(ε²n))). Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time Ω(d) or had much worse error (about (log(d)/(ε²n))^(1/6)), and the only known lower bound on error was Ω(1/√n). We also adapt a result of McGregor et al (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), the transformation preserves computational efficiency.

UR - http://www.scopus.com/inward/record.url?scp=84958771236&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84958771236&partnerID=8YFLogxK

U2 - 10.1145/2746539.2746632

DO - 10.1145/2746539.2746632

M3 - Conference contribution

AN - SCOPUS:84958771236

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 127

EP - 135

BT - STOC 2015 - Proceedings of the 2015 ACM Symposium on Theory of Computing

PB - Association for Computing Machinery

T2 - 47th Annual ACM Symposium on Theory of Computing, STOC 2015

Y2 - 14 June 2015 through 17 June 2015

ER -