Information preservation in statistical privacy and Bayesian estimation of unattributed histograms

Bing Rong Lin, Daniel Kifer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)

Abstract

In statistical privacy, utility refers to two concepts: information preservation - how much statistical information is retained by a sanitizing algorithm, and usability - how (and with how much difficulty) does one extract this information to build statistical models, answer queries, etc. Some scenarios incentivize a separation between information preservation and usability, so that the data owner first chooses a sanitizing algorithm to maximize a measure of information preservation and, afterward, the data consumers process the sanitized output according to their needs [22, 46]. We analyze a variety of utility measures and show that the average (over possible outputs of the sanitizer) error of Bayesian decision makers forms the unique class of utility measures that satisfy three axioms related to information preservation. The axioms are agnostic to Bayesian concepts such as subjective probabilities and hence strengthen support for Bayesian views in privacy research. In particular, this result connects information preservation to aspects of usability - if the information preservation of a sanitizing algorithm should be measured as the average error of a Bayesian decision maker, shouldn't Bayesian decision theory be a good choice when it comes to using the sanitized outputs for various purposes? We put this idea to the test in the unattributed histogram problem where our decision-theoretic post-processing algorithm empirically outperforms previously proposed approaches.
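As a rough illustration of the quantity the abstract describes, the sketch below computes the average (over sanitizer outputs) expected loss of a Bayesian decision maker for a small discrete sanitizer. It is not taken from the paper: the function name `bayes_risk`, the representation of the sanitizer as a row-stochastic matrix, the uniform prior, and the 0-1 loss are all assumptions chosen for the example.

```python
def bayes_risk(prior, sanitizer, loss):
    """Average loss of a decision maker who picks the best action per output.

    prior[i]        : assumed prior probability of true input i
    sanitizer[i][o] : P(output o | input i), each row sums to 1
    loss[a][i]      : loss of taking action a when the true input is i
    """
    n_inputs = len(prior)
    n_outputs = len(sanitizer[0])
    total = 0.0
    for o in range(n_outputs):
        # Joint probability of (input i, output o); an unnormalized posterior.
        joint = [prior[i] * sanitizer[i][o] for i in range(n_inputs)]
        # The Bayesian decision maker takes the action with least expected loss.
        total += min(
            sum(joint[i] * row[i] for i in range(n_inputs)) for row in loss
        )
    return total


prior = [0.5, 0.5]                 # hypothetical uniform prior on a binary input
zero_one = [[0.0, 1.0], [1.0, 0.0]]  # action a is "guess that the input is a"

identity = [[1.0, 0.0], [0.0, 1.0]]        # releases the input unchanged
randomized = [[0.75, 0.25], [0.25, 0.75]]  # keeps the true bit with prob. 0.75
constant = [[1.0, 0.0], [1.0, 0.0]]        # output carries no information

risk_identity = bayes_risk(prior, identity, zero_one)
risk_randomized = bayes_risk(prior, randomized, zero_one)
risk_constant = bayes_risk(prior, constant, zero_one)
print(risk_identity, risk_randomized, risk_constant)
```

Under these assumptions, a lower value means the sanitized output lets the decision maker act with less error, so the three example sanitizers are ordered from most to least informative.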

Original language: English (US)
Title of host publication: SIGMOD 2013 - International Conference on Management of Data
Pages: 677-688
Number of pages: 12
DOI: https://doi.org/10.1145/2463676.2463721
State: Published - Jul 29 2013
Event: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013 - New York, NY, United States
Duration: Jun 22 2013 to Jun 27 2013

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Country: United States
City: New York, NY
Period: 6/22/13 to 6/27/13

Fingerprint

Decision theory
Processing
Statistical Models

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Lin, B. R., & Kifer, D. (2013). Information preservation in statistical privacy and Bayesian estimation of unattributed histograms. In SIGMOD 2013 - International Conference on Management of Data (pp. 677-688). (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/2463676.2463721
@inproceedings{c21b287732814caebd864ae47a4b4c6f,
title = "Information preservation in statistical privacy and Bayesian estimation of unattributed histograms",
author = "Lin, {Bing Rong} and Daniel Kifer",
year = "2013",
month = "7",
day = "29",
doi = "10.1145/2463676.2463721",
language = "English (US)",
isbn = "9781450320375",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
pages = "677--688",
booktitle = "SIGMOD 2013 - International Conference on Management of Data",

}
