Information measures in statistical privacy and data processing applications

Bing Rong Lin, Daniel Kifer

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

In statistical privacy, utility refers to two concepts: information preservation, how much statistical information is retained by a sanitizing algorithm, and usability, how (and with how much difficulty) one extracts this information to build statistical models, answer queries, and so forth. Some scenarios incentivize a separation between information preservation and usability, so that the data owner first chooses a sanitizing algorithm to maximize a measure of information preservation, and, afterward, the data consumers process the sanitized output according to their various individual needs [Ghosh et al. 2009; Williams and McSherry 2010]. We analyze the information-preserving properties of utility measures with a combination of two new and three existing utility axioms and study how violations of an axiom can be fixed. We show that the average (over possible outputs of the sanitizer) error of Bayesian decision makers forms the unique class of utility measures that satisfy all of the axioms. The axioms are agnostic to Bayesian concepts such as subjective probabilities and hence strengthen support for Bayesian views in privacy research. In particular, this result connects information preservation to aspects of usability - if the information preservation of a sanitizing algorithm should be measured as the average error of a Bayesian decision maker, shouldn't Bayesian decision theory be a good choice when it comes to using the sanitized outputs for various purposes? We put this idea to the test in the unattributed histogram problem where our decision-theoretic postprocessing algorithm empirically outperforms previously proposed approaches.
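
As a rough formalization of the measure singled out by the axioms (the notation here is illustrative, not taken from the paper): write $\mathcal{M}$ for the sanitizing algorithm, $P$ for a prior over datasets $D$, $\omega$ for a sanitized output, and $L(a, D)$ for the loss of taking action $a$ when the true dataset is $D$. The "average error of a Bayesian decision maker" can then be sketched as the Bayes risk

% Sketch only: M, P, L, a, omega are assumed symbols, not the paper's own notation.
\[
  \mathrm{err}(\mathcal{M})
  \;=\;
  \sum_{\omega} \Pr_{D \sim P}\!\bigl[\mathcal{M}(D) = \omega\bigr]\;
  \min_{a}\; \mathbb{E}_{D \sim P}\!\bigl[\, L(a, D) \,\bigm|\, \mathcal{M}(D) = \omega \,\bigr],
\]

where the inner minimization corresponds to the Bayes-optimal postprocessing step: the data consumer picks the action that minimizes posterior expected loss given the observed output $\omega$.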

Original language: English (US)
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 9
Issue number: 4
DOI: 10.1145/2700407
ISSN: 1556-4681
State: Published - Jun 1 2015

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Lin, B. R., & Kifer, D. (2015). Information measures in statistical privacy and data processing applications. ACM Transactions on Knowledge Discovery from Data, 9(4). https://doi.org/10.1145/2700407