Differentially private data cubes: Optimizing noise sources and consistency

Bolin Ding, Marianne Winslett, Jiawei Han, Zhenhui Li

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

85 Citations (Scopus)

Abstract

Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln|L| + 1)^2 of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 - 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.
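The publishing scheme described in the abstract (add noise to an initial cuboid, then derive the remaining cuboids from it) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the tiny fact table, the `DOMAINS` mapping, the helper names, and the assumptions of count cuboids, add/remove-one-row neighbours (sensitivity 1), and a single initial cuboid receiving the whole budget `epsilon` are all hypothetical choices made for illustration.

```python
import itertools
import math
import random


def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exact_cuboid(rows, dims, domains):
    """Count fact-table rows per cell, including empty cells of the domain."""
    cells = {cell: 0.0 for cell in itertools.product(*(domains[d] for d in dims))}
    for row in rows:
        cells[tuple(row[d] for d in dims)] += 1.0
    return cells


def noisy_cuboid(rows, dims, domains, epsilon, rng):
    """Initial cuboid: every cell (even empty ones) gets Laplace(1/epsilon) noise.

    Assumes one row changes exactly one cell count by 1 (sensitivity 1).
    """
    scale = 1.0 / epsilon
    return {cell: count + laplace(scale, rng)
            for cell, count in exact_cuboid(rows, dims, domains).items()}


def roll_up(cells, dims, keep):
    """Derive a coarser cuboid by summing a finer one over the dropped dimensions."""
    positions = [dims.index(d) for d in keep]
    coarser = {}
    for cell, value in cells.items():
        key = tuple(cell[p] for p in positions)
        coarser[key] = coarser.get(key, 0.0) + value
    return coarser


# Hypothetical two-dimensional fact table.
DOMAINS = {"sex": ["F", "M"], "dept": ["A", "B"]}
ROWS = [
    {"sex": "F", "dept": "A"},
    {"sex": "F", "dept": "B"},
    {"sex": "M", "dept": "A"},
    {"sex": "M", "dept": "A"},
]
DIMS = ["sex", "dept"]

rng = random.Random(0)
base = noisy_cuboid(ROWS, DIMS, DOMAINS, epsilon=1.0, rng=rng)  # initial cuboid
by_dept = roll_up(base, DIMS, ["dept"])  # derived cuboid, no extra privacy cost
total = roll_up(base, DIMS, [])          # apex cuboid (grand total)
print(by_dept, total)
```

Because the derived cuboids are computed only from the already-noisy base cuboid, they are consistent with it by construction, but each derived cell accumulates the noise of every base cell it sums. Releasing more initial cuboids would shorten those sums at the price of splitting the privacy budget among them, which is the kind of trade-off the selection problem in the abstract is about.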

Original language: English (US)
Title of host publication: Proceedings of SIGMOD 2011 and PODS 2011
Pages: 217-228
Number of pages: 12
DOIs: https://doi.org/10.1145/1989323.1989347
State: Published - Jul 11 2011
Event: 2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: Jun 12 2011 to Jun 16 2011

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2011 ACM SIGMOD and 30th PODS 2011 Conference
Country: Greece
City: Athens
Period: 6/12/11 to 6/16/11

Fingerprint

Wages
Polynomials

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Ding, B., Winslett, M., Han, J., & Li, Z. (2011). Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of SIGMOD 2011 and PODS 2011 (pp. 217-228). (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/1989323.1989347
Ding, Bolin ; Winslett, Marianne ; Han, Jiawei ; Li, Zhenhui. / Differentially private data cubes: Optimizing noise sources and consistency. Proceedings of SIGMOD 2011 and PODS 2011. 2011. pp. 217-228 (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{bde4fa3dfbc34f3687e54de28238e198,
title = "Differentially private data cubes: Optimizing noise sources and consistency",
abstract = "Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor $(\ln|L| + 1)^2$ of the optimal, where $|L|$ is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 - 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.",
author = "Bolin Ding and Marianne Winslett and Jiawei Han and Zhenhui Li",
year = "2011",
month = "7",
day = "11",
doi = "10.1145/1989323.1989347",
language = "English (US)",
isbn = "9781450306614",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
pages = "217--228",
booktitle = "Proceedings of SIGMOD 2011 and PODS 2011",

}

Ding, B, Winslett, M, Han, J & Li, Z 2011, Differentially private data cubes: Optimizing noise sources and consistency. in Proceedings of SIGMOD 2011 and PODS 2011. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 217-228, 2011 ACM SIGMOD and 30th PODS 2011 Conference, Athens, Greece, 6/12/11. https://doi.org/10.1145/1989323.1989347

TY - GEN

T1 - Differentially private data cubes

T2 - Optimizing noise sources and consistency

AU - Ding, Bolin

AU - Winslett, Marianne

AU - Han, Jiawei

AU - Li, Zhenhui

PY - 2011/7/11

Y1 - 2011/7/11

N2 - Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln|L| + 1)^2 of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 - 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.

UR - http://www.scopus.com/inward/record.url?scp=79959954388&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959954388&partnerID=8YFLogxK

U2 - 10.1145/1989323.1989347

DO - 10.1145/1989323.1989347

M3 - Conference contribution

AN - SCOPUS:79959954388

SN - 9781450306614

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 217

EP - 228

BT - Proceedings of SIGMOD 2011 and PODS 2011

ER -

Ding B, Winslett M, Han J, Li Z. Differentially private data cubes: Optimizing noise sources and consistency. In Proceedings of SIGMOD 2011 and PODS 2011. 2011. p. 217-228. (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/1989323.1989347