Privacy preserving k-means clustering with chaotic distortion

Jie Li, Yong Xu, Chao Hsien Chu, Yunfeng Wang

Research output: Contribution to journal › Conference article

Abstract

Randomized data distortion is a popular method for masking data to preserve privacy, but its appropriateness has been questioned because the original data can potentially be disclosed. In this paper, a chaotic system, with its characteristic sensitivity to initial conditions and unpredictability, is advocated for distorting original data that contain sensitive information, for privacy-preserving k-means clustering. A chaotic distortion procedure is proposed, and three performance metrics specific to k-means clustering are developed. We use a large-scale experiment (with 4 real-world data sets and 40 corresponding reproduced data sets) to evaluate its performance. Our study shows that the proposed approach is effective: it not only protects individual privacy but also maintains the original information about cluster centers.
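The abstract does not say which chaotic map the authors use, how the distortion is applied, or how the three metrics are defined. The sketch below is therefore only an illustration of the general idea, under stated assumptions: it uses the logistic map (with hypothetical parameters `r`, `x0`, and `scale`) to generate zero-centred additive noise, and a small hand-written k-means to compare cluster centers before and after distortion. All function names are invented for this example and are not taken from the paper.

```python
import random


def logistic_sequence(x0, n, r=3.99, burn_in=100):
    """Return n values of the logistic map x -> r*x*(1-x) after a burn-in.

    Assumption: the logistic map stands in for the paper's chaos system.
    """
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out


def chaotic_distort(points, x0, scale):
    """Add chaotic noise, centred on its own mean, to every coordinate."""
    dim = len(points[0])
    noise = logistic_sequence(x0, len(points) * dim)
    mean = sum(noise) / len(noise)
    return [
        tuple(p[j] + scale * (noise[i * dim + j] - mean) for j in range(dim))
        for i, p in enumerate(points)
    ]


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def make_demo_points(seed=0, per_cluster=50):
    """Two well-separated synthetic 2-D clusters (not the paper's data)."""
    rng = random.Random(seed)
    pts = []
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]:
        for _ in range(per_cluster):
            pts.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
    return pts


if __name__ == "__main__":
    original = make_demo_points()
    distorted = chaotic_distort(original, x0=0.3, scale=1.0)
    print("centers (original): ", kmeans(original, [original[0], original[-1]]))
    print("centers (distorted):", kmeans(distorted, [distorted[0], distorted[-1]]))
```

In this toy setting, individual records are perturbed while the cluster centers move only slightly, because the centred noise largely averages out within each cluster. A slightly different `x0` yields a different noise sequence, which is the sensitivity to initial conditions the abstract refers to.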

Original language: English (US)
Pages (from-to): 61-67
Number of pages: 7
Journal: Proceedings of the International Conference on Electronic Business (ICEB)
State: Published - Dec 1 2007

Fingerprint

Chaos theory
Masks
Experiments
Privacy
Privacy preserving
K-means clustering
Chaos
Initial conditions
Performance metrics
Appropriateness
Experiment

All Science Journal Classification (ASJC) codes

  • Business, Management and Accounting(all)
  • Computer Science(all)

Cite this

@article{8d05a00d48ae4b21a6cb6321fb84606c,
title = "Privacy preserving k-means clustering with chaotic distortion",
abstract = "Randomized data distortion is a popular method used to mask the data for preserving the privacy. But the appropriateness of this method was questioned because of its possibility of disclosing original data. In this paper, the chaos system, with its unique characteristics of sensitivity on initial condition and unpredictability, is advocated to distort the original data with sensitive information for privacy preserving k-means clustering. The chaotic distortion procedure is proposed and three performance metrics specifically for k-means clustering are developed. We use a large scale experiment (with 4 real world data sets and corresponding reproduced 40 data sets) to evaluate its performance. Our study shows that the proposed approach is effective; it not only can protect individual privacy but also maintain original information of cluster centers.",
author = "Jie Li and Yong Xu and Chu, {Chao Hsien} and Yunfeng Wang",
year = "2007",
month = "12",
day = "1",
language = "English (US)",
pages = "61--67",
journal = "Proceedings of the International Conference on Electronic Business (ICEB)",
issn = "1683-0040",

}

Privacy preserving k-means clustering with chaotic distortion. / Li, Jie; Xu, Yong; Chu, Chao Hsien; Wang, Yunfeng.

In: Proceedings of the International Conference on Electronic Business (ICEB), 01.12.2007, p. 61-67.

TY - JOUR

T1 - Privacy preserving k-means clustering with chaotic distortion

AU - Li, Jie

AU - Xu, Yong

AU - Chu, Chao Hsien

AU - Wang, Yunfeng

PY - 2007/12/1

Y1 - 2007/12/1

AB - Randomized data distortion is a popular method used to mask the data for preserving the privacy. But the appropriateness of this method was questioned because of its possibility of disclosing original data. In this paper, the chaos system, with its unique characteristics of sensitivity on initial condition and unpredictability, is advocated to distort the original data with sensitive information for privacy preserving k-means clustering. The chaotic distortion procedure is proposed and three performance metrics specifically for k-means clustering are developed. We use a large scale experiment (with 4 real world data sets and corresponding reproduced 40 data sets) to evaluate its performance. Our study shows that the proposed approach is effective; it not only can protect individual privacy but also maintain original information of cluster centers.

UR - http://www.scopus.com/inward/record.url?scp=84873443638&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873443638&partnerID=8YFLogxK

M3 - Conference article

SP - 61

EP - 67

JO - Proceedings of the International Conference on Electronic Business (ICEB)

JF - Proceedings of the International Conference on Electronic Business (ICEB)

SN - 1683-0040

ER -