Differentially private inference for binomial data

Research output: Contribution to journal; Article; peer-reviewed

Abstract

We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of differential privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a ‘Neyman-Pearson Lemma’ for binomial data under DP, where the DP-UMP test depends only on the sample sum. Our tests can also be stated as a post-processing of a DP summary statistic, whose distribution we call “Truncated-Uniform-Laplace” (Tulap), a generalization of the Staircase and discrete Laplace distributions. We show that by post-processing the Tulap statistic, we are able to obtain exact p-values corresponding to the DP-UMP test, uniformly most accurate (UMA) one-sided confidence intervals, optimal confidence distributions, uniformly most powerful unbiased (UMPU) two-sided tests, and uniformly most accurate unbiased (UMAU) two-sided confidence intervals. Because each of these quantities is a post-processing of the same summary statistic, including these additional results incurs no additional privacy cost, allowing for a complete statistical analysis at a fixed privacy cost. We also show that our results can be applied to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that all our tests have exact type I error and are more powerful than current techniques.
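
The post-processing idea in the abstract can be pictured with a minimal Python sketch. It assumes the pure ε-DP case (δ = 0), in which Tulap noise is taken to be discrete Laplace noise (a difference of two geometric draws) plus Uniform(−1/2, 1/2) noise. That construction, the closed-form CDF derived from it, and the function names `sample_tulap`, `tulap_cdf`, and `one_sided_p_value` are assumptions made for illustration; none of them is stated in the abstract.

```python
import math
import random


def sample_tulap(m, epsilon, rng):
    """Draw m + noise, with noise = G1 - G2 + U (assumed delta = 0 construction)."""
    b = math.exp(-epsilon)

    def geometric():
        # Inverse-transform draw with P(G = k) = (1 - b) * b**k for k = 0, 1, ...
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
        return math.floor(math.log(u) / math.log(b))

    return m + geometric() - geometric() + rng.uniform(-0.5, 0.5)


def tulap_cdf(x, epsilon):
    """CDF of the centred noise above, evaluated at x."""
    b = math.exp(-epsilon)
    if x > 0:
        # The noise is symmetric about zero.
        return 1.0 - tulap_cdf(-x, epsilon)
    k = round(x)  # nearest integer; the CDF is linear on [k - 1/2, k + 1/2]
    return b ** (-k) / (1 + b) * (b + (x - k + 0.5) * (1 - b))


def one_sided_p_value(z, n, theta0, epsilon):
    """P(X + N >= z) for X ~ Binomial(n, theta0) and N the centred noise."""
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * theta0 ** x * (1 - theta0) ** (n - x)
        # By symmetry of the noise, P(N >= z - x) = F(x - z).
        total += pmf * tulap_cdf(x - z, epsilon)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n, true_sum, epsilon = 30, 18, 1.0
    z = sample_tulap(true_sum, epsilon, rng)  # the only value that is released
    print("released statistic:", z)
    print("p-value for H0: theta <= 0.5:", one_sided_p_value(z, n, 0.5, epsilon))
```

In this sketch only `z` touches the confidential sum; the p-value is computed from `z` alone, so evaluating it for several null values uses no further privacy budget, which is the point the abstract makes about post-processing.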

Original language: English (US)
Pages (from-to): 1-40
Number of pages: 40
Journal: Journal of Privacy and Confidentiality
Volume: 10
Issue number: 1
State: Published - 2020

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Science (miscellaneous)
  • Statistics and Probability
