TY - GEN

T1 - Privacy-preserving statistical estimation with optimal convergence rates

AU - Smith, Adam Davison

PY - 2011/7/4

Y1 - 2011/7/4

N2 - Consider an analyst who wants to release aggregate statistics about a data set containing sensitive information. Using differentially private algorithms guarantees that the released statistics reveal very little about any particular record in the data set. In this paper we study the asymptotic properties of differentially private algorithms for statistical inference. We show that for a large class of statistical estimators T and input distributions P, there is a differentially private estimator A_T with the same asymptotic distribution as T. That is, the random variables A_T(X) and T(X) converge in distribution when X consists of an i.i.d. sample from P of increasing size. This implies that A_T(X) is essentially as good as the original statistic T(X) for statistical inference, for sufficiently large samples. Our technique applies to (almost) any pair T, P such that T is asymptotically normal on i.i.d. samples from P; in particular, to parametric maximum likelihood estimators and estimators for logistic and linear regression under standard regularity conditions. A consequence of our techniques is the existence of low-space streaming algorithms whose output converges to the same asymptotic distribution as a given estimator T (for the same class of estimators and input distributions as above).

UR - http://www.scopus.com/inward/record.url?scp=79959714549&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959714549&partnerID=8YFLogxK

DO - 10.1145/1993636.1993743

M3 - Conference contribution

AN - SCOPUS:79959714549

SN - 9781450306911

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 813

EP - 821

BT - STOC'11 - Proceedings of the 43rd ACM Symposium on Theory of Computing

T2 - 43rd ACM Symposium on Theory of Computing, STOC'11

Y2 - 6 June 2011 through 8 June 2011

ER -