Logistic regression with variables subject to post randomization method

Yong Ming Jeffrey Woo, Aleksandra B. Slavkovic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The Post Randomization Method (PRAM) is a disclosure avoidance method in which values of categorical variables are perturbed via a known probability mechanism and only the perturbed data are released, thus raising issues regarding disclosure risk and data utility. In this paper, we develop and implement a number of EM algorithms to obtain unbiased estimates of the logistic regression model with data subject to PRAM, and thus effectively account for the effects of PRAM and preserve data utility. Three different cases are considered: (1) covariates subject to PRAM, (2) response variable subject to PRAM, and (3) both covariates and response variable subject to PRAM. The proposed techniques improve on current methodology by increasing the applicability of PRAM to a wider range of products and could be extended to other types of generalized linear models. The effects of the level of perturbation and sample size on the estimates are evaluated, and relevant standard error estimates are developed and reported.
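As a rough illustration of case (2), the sketch below applies PRAM to a binary response through a known transition matrix and then runs a generic EM loop to recover the logistic regression coefficients from the released data. It is not the authors' implementation: the function names, the transition matrix values, the simulated data, and the iteration counts are all assumptions made for this example, and standard errors are not computed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def apply_pram(values, transition, rng):
    """Perturb integer-coded categories.

    Row k of `transition` holds P(released = j | true = k) for each j.
    """
    cum = np.cumsum(transition, axis=1)
    u = rng.random(len(values))
    # Count how many cumulative thresholds each uniform draw exceeds.
    released = (u[:, None] > cum[values]).sum(axis=1)
    return np.minimum(released, transition.shape[1] - 1)

def em_logistic_pram_response(X, y_star, transition, n_iter=200, newton_steps=5):
    """EM for logistic regression when only the PRAMed response is observed."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: posterior probability that the true response is 1,
        # given the released value and the current coefficients.
        p = sigmoid(X @ beta)
        like1 = transition[1, y_star]  # P(released = y* | true = 1)
        like0 = transition[0, y_star]  # P(released = y* | true = 0)
        w = p * like1 / (p * like1 + (1.0 - p) * like0)
        # M-step: logistic regression on the fractional responses w,
        # solved with a few Newton-Raphson updates.
        for _step in range(newton_steps):
            p = sigmoid(X @ beta)
            grad = X.T @ (w - p)
            hess = (X * (p * (1.0 - p))[:, None]).T @ X
            beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(int)

# Known PRAM matrix: a true 0 is kept with probability 0.9,
# a true 1 is kept with probability 0.8.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y_star = apply_pram(y, P, rng)

beta_hat = em_logistic_pram_response(X, y_star, P)
print("true coefficients:", beta_true)
print("EM estimate from PRAMed response:", beta_hat)
```

Because the transition matrix is known to the analyst, the E-step can weight each record by how likely its released value is under each possible true value; fitting an ordinary logistic regression directly to `y_star` would ignore that information.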

Original language: English (US)
Title of host publication: Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings
Pages: 116-130
Number of pages: 15
DOIs: https://doi.org/10.1007/978-3-642-33627-0-10
State: Published - Oct 22 2012
Event: International Conference on Privacy in Statistical Databases, PSD 2012 - Palermo, Italy
Duration: Sep 26 2012 → Sep 28 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7556 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Conference on Privacy in Statistical Databases, PSD 2012
Country: Italy
City: Palermo
Period: 9/26/12 → 9/28/12

Fingerprint

Logistic Regression
Randomisation
Logistics
Disclosure
Covariates
Categorical variable
Logistic Regression Model
Generalized Linear Model
EM Algorithm
Standard error
Estimate
Error Estimates
Sample Size
Perturbation
Methodology
Range of data

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Woo, Y. M. J., & Slavkovic, A. B. (2012). Logistic regression with variables subject to post randomization method. In Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings (pp. 116-130). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7556 LNCS). https://doi.org/10.1007/978-3-642-33627-0-10
Woo, Yong Ming Jeffrey ; Slavkovic, Aleksandra B. / Logistic regression with variables subject to post randomization method. Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings. 2012. pp. 116-130 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{6106c40d7287476f827efcc767b20adf,
title = "Logistic regression with variables subject to post randomization method",
author = "Woo, {Yong Ming Jeffrey} and Slavkovic, {Aleksandra B.}",
year = "2012",
month = "10",
day = "22",
doi = "10.1007/978-3-642-33627-0-10",
language = "English (US)",
isbn = "9783642336263",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "116--130",
booktitle = "Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings",

}

Woo, YMJ & Slavkovic, AB 2012, Logistic regression with variables subject to post randomization method. in Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7556 LNCS, pp. 116-130, International Conference on Privacy in Statistical Databases, PSD 2012, Palermo, Italy, 9/26/12. https://doi.org/10.1007/978-3-642-33627-0-10

Logistic regression with variables subject to post randomization method. / Woo, Yong Ming Jeffrey; Slavkovic, Aleksandra B.

Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings. 2012. p. 116-130 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7556 LNCS).

TY - GEN

T1 - Logistic regression with variables subject to post randomization method

AU - Woo, Yong Ming Jeffrey

AU - Slavkovic, Aleksandra B.

PY - 2012/10/22

Y1 - 2012/10/22

UR - http://www.scopus.com/inward/record.url?scp=84867509153&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867509153&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-33627-0-10

DO - 10.1007/978-3-642-33627-0-10

M3 - Conference contribution

AN - SCOPUS:84867509153

SN - 9783642336263

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 116

EP - 130

BT - Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings

ER -

Woo YMJ, Slavkovic AB. Logistic regression with variables subject to post randomization method. In Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2012, Proceedings. 2012. p. 116-130. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-33627-0-10