Discovering frequent patterns in sensitive data

Raghav Bhaskar, Srivatsan Laxman, Adam Davison Smith, Abhradeep Thakurta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

130 Citations (Scopus)

Abstract

Discovering frequent patterns from data is a popular exploratory technique in data mining. However, if the data are sensitive (e.g., patient health records, user behavior records) releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there. We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.
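The 'noisy list' idea in the abstract can be illustrated with a minimal sketch that adds Laplace noise to pattern support counts before ranking them. This is a generic illustration, not the paper's two algorithms. The names noisy_top_k, laplace_noise, candidates and noise_scale are hypothetical. The sketch assumes the candidate patterns are fixed in advance and independent of the data, and it does not calibrate noise_scale to any particular privacy level.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_top_k(transactions, candidates, k, noise_scale, seed=None):
    """Return the k candidates with the largest noisy support counts.

    transactions: iterable of item collections (the sensitive records).
    candidates:   fixed, data-independent list of patterns (tuples of items).
    noise_scale:  Laplace scale; a hypothetical knob, not calibrated here.
    """
    rng = random.Random(seed)
    records = [set(t) for t in transactions]
    noisy = []
    for pattern in candidates:
        # Support = number of records containing every item of the pattern.
        support = sum(1 for r in records if set(pattern) <= r)
        noisy.append((support + laplace_noise(noise_scale, rng), pattern))
    # Rank by noisy count, so the reported list is only close to the true top k.
    noisy.sort(key=lambda pair: pair[0], reverse=True)
    return [(pattern, count) for count, pattern in noisy[:k]]


if __name__ == "__main__":
    data = [["a", "b"], ["a", "b", "c"], ["a"], ["b", "c"]]
    patterns = [("a",), ("b",), ("c",), ("a", "b")]
    for pattern, count in noisy_top_k(data, patterns, k=2, noise_scale=1.0, seed=0):
        print(pattern, round(count, 2))
```

A larger noise_scale makes the reported list less accurate, which is the privacy versus utility trade-off the abstract describes. Python's random module is used only for illustration and is not suitable for a real privacy-preserving release.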

Original language: English (US)
Title of host publication: KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data
Pages: 503-512
Number of pages: 10
DOIs: https://doi.org/10.1145/1835804.1835869
State: Published - Sep 7 2010
Event: 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD-2010 - Washington, DC, United States
Duration: Jul 25 2010 to Jul 28 2010

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD-2010
Country: United States
City: Washington, DC
Period: 7/25/10 to 7/28/10

Fingerprint

Data mining
Health
Experiments
Uncertainty

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Bhaskar, R., Laxman, S., Smith, A. D., & Thakurta, A. (2010). Discovering frequent patterns in sensitive data. In KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data (pp. 503-512). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/1835804.1835869
Bhaskar, Raghav ; Laxman, Srivatsan ; Smith, Adam Davison ; Thakurta, Abhradeep. / Discovering frequent patterns in sensitive data. KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data. 2010. pp. 503-512 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{379f00759cf4474c8e169fc769a68b03,
title = "Discovering frequent patterns in sensitive data",
author = "Bhaskar, Raghav and Laxman, Srivatsan and Smith, {Adam Davison} and Thakurta, Abhradeep",
year = "2010",
month = "9",
day = "7",
doi = "10.1145/1835804.1835869",
language = "English (US)",
isbn = "9781450300551",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
pages = "503--512",
booktitle = "KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data",
}

Bhaskar, R, Laxman, S, Smith, AD & Thakurta, A 2010, Discovering frequent patterns in sensitive data. in KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 503-512, 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD-2010, Washington, DC, United States, 7/25/10. https://doi.org/10.1145/1835804.1835869

TY - GEN
T1 - Discovering frequent patterns in sensitive data
AU - Bhaskar, Raghav
AU - Laxman, Srivatsan
AU - Smith, Adam Davison
AU - Thakurta, Abhradeep
PY - 2010/9/7
Y1 - 2010/9/7
N2 - Discovering frequent patterns from data is a popular exploratory technique in data mining. However, if the data are sensitive (e.g., patient health records, user behavior records) releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there. We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.

UR - http://www.scopus.com/inward/record.url?scp=77956209107&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956209107&partnerID=8YFLogxK
U2 - 10.1145/1835804.1835869
DO - 10.1145/1835804.1835869
M3 - Conference contribution
SN - 9781450300551
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 503
EP - 512
BT - KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data
ER -

Bhaskar R, Laxman S, Smith AD, Thakurta A. Discovering frequent patterns in sensitive data. In KDD'10 - Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data. 2010. p. 503-512. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/1835804.1835869