The power of linear reconstruction attacks

Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Davison Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

We consider the power of "linear reconstruction attacks" in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before [3, 6, 11, 1, 14] but have so far been applied only in settings with releases that are "obviously" linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). This setting is widely considered in the literature, partly since it arises with medical data. Specifically, we show one can mount linear reconstruction attacks based on any release that gives: 1. the fraction of records that satisfy a given nondegenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al. [11]) as well as more complex outputs like the error rate of classifiers such as decision trees; 2. any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes.
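As a rough illustration of the "obviously linear" case that the abstract takes as its starting point, the sketch below assumes each release is the noisy fraction of records whose nonsensitive attributes match a query and whose sensitive bit is 1. Such a release is a linear function of the secret bit vector, so an attacker can solve a least-squares problem and round the solution. The function names (`solve_least_squares`, `noisy_release`, `reconstruct`), the uniform noise model, and the plain normal-equations solver are assumptions made for this example; they are not taken from the paper, and the paper's transformation of nonlinear releases such as M-estimators into linear form is not reproduced here.

```python
import random


def solve_least_squares(A, y):
    """Solve min_x ||A x - y||_2 via the normal equations (A^T A) x = A^T y."""
    m, n = len(A), len(A[0])
    # Augmented n x (n + 1) system [A^T A | A^T y].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
        + [sum(A[r][i] * y[r] for r in range(m))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("queries do not determine the secret vector")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def noisy_release(A, secret, max_count_error, rng):
    """Hypothetical curator: for each query row, release the fraction of
    records that match the query and have sensitive bit 1, with the
    underlying count perturbed by at most max_count_error."""
    n = len(secret)
    return [
        (sum(a * s for a, s in zip(row, secret))
         + rng.uniform(-max_count_error, max_count_error)) / n
        for row in A
    ]


def reconstruct(A, released_fractions):
    """Attacker: rescale fractions to counts, solve least squares, round to bits."""
    n = len(A[0])
    counts = [n * f for f in released_fractions]
    estimate = solve_least_squares(A, counts)
    return [1 if v > 0.5 else 0 for v in estimate]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    secret = [rng.randint(0, 1) for _ in range(n)]
    # Toy query family: query i matches records 0..i (prefix queries).
    A = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
    released = noisy_release(A, secret, 0.2, rng)
    print(reconstruct(A, released) == secret)
```

With prefix queries, each secret bit is the difference of two consecutive counts, so a per-count error of at most 0.2 shifts each estimate by at most 0.4 and rounding recovers every bit. This toy query family was chosen only to make the mechanism easy to check by hand; it says nothing about the noise levels or distributional settings analyzed in the paper.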

Original language: English (US)
Title of host publication: Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
Pages: 1415-1433
Number of pages: 19
State: Published - Apr 16 2013
Event: 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013 - New Orleans, LA, United States
Duration: Jan 6 2013 to Jan 8 2013

Publication series

Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

Fingerprint

Attack
Attribute
Statistics
M-estimator
Data privacy
Boolean functions
Decision trees
Logistics
Classifiers
Polynomials
Output
Contingency Table
Reconstruction Algorithm
Logistic Regression
Linear regression
Polynomial-time Algorithm
Privacy

All Science Journal Classification (ASJC) codes

  • Software
  • Mathematics (all)

Cite this

Kasiviswanathan, S. P., Rudelson, M., & Smith, A. D. (2013). The power of linear reconstruction attacks. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013 (pp. 1415-1433). (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms).
Kasiviswanathan, Shiva Prasad ; Rudelson, Mark ; Smith, Adam Davison. / The power of linear reconstruction attacks. Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013. 2013. pp. 1415-1433 (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms).
@inproceedings{93fc4b275f674b0d9d54edc1fd980297,
title = "The power of linear reconstruction attacks",
abstract = "We consider the power of {"}linear reconstruction attacks{"} in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before [3, 6, 11, 1, 14] but have so far been applied only in settings with releases that are {"}obviously{"} linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). This setting is widely considered in the literature, partly since it arises with medical data. Specifically, we show one can mount linear reconstruction attacks based on any release that gives: 1. the fraction of records that satisfy a given nondegenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al. [11]) as well as more complex outputs like the error rate of classifiers such as decision trees; 2. any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes.",
author = "Kasiviswanathan, Shiva Prasad and Rudelson, Mark and Smith, Adam Davison",
year = "2013",
month = "4",
day = "16",
language = "English (US)",
isbn = "9781611972511",
series = "Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms",
pages = "1415--1433",
booktitle = "Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013",

}

Kasiviswanathan, SP, Rudelson, M & Smith, AD 2013, The power of linear reconstruction attacks. in Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013. Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1415-1433, 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, LA, United States, 1/6/13.

TY - GEN

T1 - The power of linear reconstruction attacks

AU - Kasiviswanathan, Shiva Prasad

AU - Rudelson, Mark

AU - Smith, Adam Davison

PY - 2013/4/16

Y1 - 2013/4/16

AB - We consider the power of "linear reconstruction attacks" in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before [3, 6, 11, 1, 14] but have so far been applied only in settings with releases that are "obviously" linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). This setting is widely considered in the literature, partly since it arises with medical data. Specifically, we show one can mount linear reconstruction attacks based on any release that gives: 1. the fraction of records that satisfy a given nondegenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al. [11]) as well as more complex outputs like the error rate of classifiers such as decision trees; 2. any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes.

UR - http://www.scopus.com/inward/record.url?scp=84876061488&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876061488&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84876061488

SN - 9781611972511

T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

SP - 1415

EP - 1433

BT - Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013

ER -

Kasiviswanathan SP, Rudelson M, Smith AD. The power of linear reconstruction attacks. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013. 2013. p. 1415-1433. (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms).