pMSE mechanism: Differentially private synthetic data with maximal distributional similarity

Joshua Snoke, Aleksandra Slavković

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We propose a method for the release of differentially private synthetic datasets. In many contexts, data contain sensitive values which cannot be released in their original form in order to protect individuals’ privacy. Synthetic data is a protection method that releases alternative values in place of the original ones, and differential privacy (DP) is a formal guarantee for quantifying the privacy loss. We propose a method that maximizes the distributional similarity of the synthetic data relative to the original data using a measure known as the pMSE, while guaranteeing ε-DP. We relax common DP assumptions concerning the distribution and boundedness of the original data. We prove theoretical results for the privacy guarantee and provide simulations for the empirical failure rate of the theoretical results under typical computational limitations. We give simulations for the accuracy of linear regression coefficients generated from the synthetic data compared with the accuracy of non-DP synthetic data and other DP methods. Additionally, our theoretical results extend a prior result for the sensitivity of the Gini Index to include continuous predictors.
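As a rough illustration of the similarity measure named in the abstract: the pMSE is usually defined by stacking the original and synthetic records, fitting a classifier that predicts which records are synthetic, and averaging the squared deviation of the predicted propensities from the synthetic share c, that is, pMSE = (1/N) Σ (p̂_i − c)². The sketch below assumes that definition. The one-split classifier `stump_propensities` and its `threshold` argument are hypothetical stand-ins chosen for brevity, not the model or mechanism used by the authors, and the code adds no differential privacy noise.

```python
from typing import List, Sequence, Tuple


def pmse(propensities: Sequence[float], synth_share: float) -> float:
    """Mean squared deviation of propensity scores from the synthetic share c."""
    return sum((p - synth_share) ** 2 for p in propensities) / len(propensities)


def stump_propensities(
    original: Sequence[float],
    synthetic: Sequence[float],
    threshold: float,
) -> Tuple[List[float], float]:
    """Hypothetical one-split classifier on a single numeric variable.

    Each record's propensity is the share of synthetic records that fall on
    the same side of `threshold`. Returns the propensities and the overall
    synthetic share c.
    """
    # Label original records 0 and synthetic records 1, then stack them.
    combined = [(x, 0) for x in original] + [(x, 1) for x in synthetic]
    overall = len(synthetic) / len(combined)

    left = [label for x, label in combined if x <= threshold]
    right = [label for x, label in combined if x > threshold]

    # Fall back to the overall share if one side of the split is empty.
    p_left = sum(left) / len(left) if left else overall
    p_right = sum(right) / len(right) if right else overall

    scores = [p_left if x <= threshold else p_right for x, _ in combined]
    return scores, overall


# Indistinguishable samples: every propensity equals c, so the pMSE is 0.
scores, c = stump_propensities([1, 2, 3, 4], [1, 2, 3, 4], threshold=2.5)
print(pmse(scores, c))  # 0.0

# Perfectly separable samples: propensities are 0 or 1 with c = 0.5.
scores, c = stump_propensities([1, 2, 3, 4], [5, 6, 7, 8], threshold=4.5)
print(pmse(scores, c))  # 0.25
```

A pMSE near zero means the classifier cannot tell synthetic from original records, which is the sense of "distributional similarity" here; larger values indicate that the two samples are easier to distinguish.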

Original language: English (US)
Title of host publication: Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2018, Proceedings
Editors: Francisco Montes, Josep Domingo-Ferrer
Publisher: Springer Verlag
Pages: 138-159
Number of pages: 22
ISBN (Print): 9783319997704
DOI: https://doi.org/10.1007/978-3-319-99771-1_10
State: Published - Jan 1 2018
Event: International Conference on Privacy in Statistical Databases, PSD 2018 - Valencia, Spain
Duration: Sep 26 2018 to Sep 28 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11126 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Conference on Privacy in Statistical Databases, PSD 2018
Country: Spain
City: Valencia
Period: 9/26/18 to 9/28/18

Fingerprint

Synthetic Data
Linear regression
Privacy
Gini Index
Similarity
Failure Rate
Regression Coefficient
Boundedness
Predictors
Simulation
Maximise
Alternatives

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Snoke, J., & Slavković, A. (2018). pMSE mechanism: Differentially private synthetic data with maximal distributional similarity. In F. Montes, & J. Domingo-Ferrer (Eds.), Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2018, Proceedings (pp. 138-159). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11126 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-99771-1_10