Privacy implications of database ranking

Farhadur Rahman, Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding › Chapter

3 Citations (Scopus)

Abstract

In recent years, there has been much research on the adoption of the ranked retrieval model (in addition to the Boolean retrieval model) in structured databases, especially those in a client-server environment (e.g., web databases). With this model, a search query returns the top-k tuples according to not just exact matches of selection conditions but also a suitable ranking function. While much research has gone into the design of ranking functions and the efficient processing of top-k queries, this paper studies a novel problem: the privacy implications of database ranking. The motivation is a novel yet serious privacy leak that we found in real-world web databases and that is caused by the design of the ranking function. Many such databases feature private attributes; e.g., a social network allows users to mark certain attributes as visible only to themselves and not to others. While these websites generally respect the privacy settings by not directly displaying private attribute values in search query answers, many of them nevertheless take such private attributes into account in the ranking function. The conventional belief might be that tuple ranks alone are not enough to reveal private attribute values. Our investigation, however, shows that this is not the case in practice. To address the problem, we introduce a taxonomy of the problem space with two dimensions: (1) the type of query interface and (2) the capability of adversaries. For each subspace, we develop a novel technique that either guarantees the successful inference of private attributes or does so for a significant portion of real-world tuples. We demonstrate the effectiveness and efficiency of our techniques through theoretical analysis, extensive experiments over real-world datasets, and successful online attacks over websites with tens to hundreds of millions of users, e.g., Amazon Goodreads and Renren.com.
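
To make the kind of leakage described above concrete, here is a minimal illustrative sketch. It is not the technique developed in the paper. It assumes a hypothetical ranking function that scores each tuple by the number of query conditions it matches, that counts matches on a private attribute, and that breaks ties by tuple id; it also assumes the search interface accepts a condition on the private attribute. The toy data, the attribute names, and the helpers `top_k` and `infer_private` are all invented for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Toy database (assumed): "city" is public, "age" is private and never displayed.
DB: List[Dict[str, object]] = [
    {"id": 1, "city": "Austin", "age": 30},
    {"id": 2, "city": "Austin", "age": 41},
    {"id": 3, "city": "Austin", "age": 25},
]
PRIVATE = {"age"}


def top_k(query: Dict[str, object], k: int) -> List[Dict[str, object]]:
    """Hypothetical ranked-retrieval interface.

    Score = number of query conditions a tuple matches, private ones included.
    Ties are broken by ascending id. Private attributes are stripped from the
    returned tuples, so only the ordering can leak information.
    """
    def score(t: Dict[str, object]) -> int:
        return sum(1 for attr, value in query.items() if t.get(attr) == value)

    ranked = sorted(DB, key=lambda t: (-score(t), t["id"]))
    return [{a: v for a, v in t.items() if a not in PRIVATE} for t in ranked[:k]]


def infer_private(victim_id: int,
                  public_query: Dict[str, object],
                  attr: str,
                  candidates: Iterable[object]) -> Optional[object]:
    """Probe candidate values and watch whether the victim jumps to rank 1."""
    if top_k(public_query, 1)[0]["id"] == victim_id:
        # The victim already ranks first, so this simple probe learns nothing.
        return None
    for value in candidates:
        probe = {**public_query, attr: value}
        if top_k(probe, 1)[0]["id"] == victim_id:
            return value  # only a match on the private attribute explains the jump
    return None


if __name__ == "__main__":
    print(infer_private(2, {"city": "Austin"}, "age", range(18, 60)))
```

In this toy setting the adversary never sees an `age` value in any answer, yet the change in rank order is enough to recover it for tuples that do not already rank first. Real ranking functions are more complex, which is why the probe here should be read only as an intuition for the problem.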

Original language: English (US)
Title of host publication: Proceedings of the VLDB Endowment
Publisher: Association for Computing Machinery
Pages: 1106-1117
Number of pages: 12
Edition: 10 10
DOIs: https://doi.org/10.14778/2794367.2794379
State: Published - Jan 1 2015
Event: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: Sep 11 2006 - Sep 11 2006

Publication series

Name: Proceedings of the VLDB Endowment
Number: 10 10
Volume: 8
ISSN (Electronic): 2150-8097

Other

Other: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006
Country: Korea, Republic of
City: Seoul
Period: 9/11/06 - 9/11/06

Fingerprint

Websites
Taxonomies
Servers
Processing
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Rahman, F., Liu, W., Thirumuruganathan, S., Zhang, N., & Das, G. (2015). Privacy implications of database ranking. In Proceedings of the VLDB Endowment (10 10 ed., pp. 1106-1117). (Proceedings of the VLDB Endowment; Vol. 8, No. 10 10). Association for Computing Machinery. https://doi.org/10.14778/2794367.2794379
Rahman, Farhadur ; Liu, Weimo ; Thirumuruganathan, Saravanan ; Zhang, Nan ; Das, Gautam. / Privacy implications of database ranking. Proceedings of the VLDB Endowment. 10 10. ed. Association for Computing Machinery, 2015. pp. 1106-1117 (Proceedings of the VLDB Endowment; 10 10).
@inbook{afc7a5b0421d4f26ad4a0a83a916592e,
title = "Privacy implications of database ranking",
author = "Farhadur Rahman and Weimo Liu and Saravanan Thirumuruganathan and Nan Zhang and Gautam Das",
year = "2015",
month = "1",
day = "1",
doi = "10.14778/2794367.2794379",
language = "English (US)",
series = "Proceedings of the VLDB Endowment",
publisher = "Association for Computing Machinery",
number = "10 10",
pages = "1106--1117",
booktitle = "Proceedings of the VLDB Endowment",
edition = "10 10",

}

Rahman, F, Liu, W, Thirumuruganathan, S, Zhang, N & Das, G 2015, Privacy implications of database ranking. in Proceedings of the VLDB Endowment. 10 10 edn, Proceedings of the VLDB Endowment, no. 10 10, vol. 8, Association for Computing Machinery, pp. 1106-1117, 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, Seoul, Korea, Republic of, 9/11/06. https://doi.org/10.14778/2794367.2794379

Privacy implications of database ranking. / Rahman, Farhadur; Liu, Weimo; Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam.

Proceedings of the VLDB Endowment. 10 10. ed. Association for Computing Machinery, 2015. p. 1106-1117 (Proceedings of the VLDB Endowment; Vol. 8, No. 10 10).

TY - CHAP

T1 - Privacy implications of database ranking

AU - Rahman, Farhadur

AU - Liu, Weimo

AU - Thirumuruganathan, Saravanan

AU - Zhang, Nan

AU - Das, Gautam

PY - 2015/1/1

Y1 - 2015/1/1

UR - http://www.scopus.com/inward/record.url?scp=84953931993&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84953931993&partnerID=8YFLogxK

U2 - 10.14778/2794367.2794379

DO - 10.14778/2794367.2794379

M3 - Chapter

AN - SCOPUS:84953931993

T3 - Proceedings of the VLDB Endowment

SP - 1106

EP - 1117

BT - Proceedings of the VLDB Endowment

PB - Association for Computing Machinery

ER -

Rahman F, Liu W, Thirumuruganathan S, Zhang N, Das G. Privacy implications of database ranking. In Proceedings of the VLDB Endowment. 10 10 ed. Association for Computing Machinery. 2015. p. 1106-1117. (Proceedings of the VLDB Endowment; 10 10). https://doi.org/10.14778/2794367.2794379