TY - JOUR

T1 - Synthetic two-way contingency tables that preserve conditional frequencies

AU - Slavković, Aleksandra B.

AU - Lee, Juyoun

N1 - Funding Information:
Both authors were supported in part by NSF Grant SES-0532407 to the Department of Statistics, Pennsylvania State University.

PY - 2010/5

Y1 - 2010/5

N2 - In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risks of disclosure of sensitive information and at the same time maintaining the analytic utility of the data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table, since the latter support statistical inferences for a large set of parametric tests and models. Yet, not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to the traditionally released synthetic tables.

AB - In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risks of disclosure of sensitive information and at the same time maintaining the analytic utility of the data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table, since the latter support statistical inferences for a large set of parametric tests and models. Yet, not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to the traditionally released synthetic tables.

UR - http://www.scopus.com/inward/record.url?scp=77950918549&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950918549&partnerID=8YFLogxK

U2 - 10.1016/j.stamet.2009.11.002

DO - 10.1016/j.stamet.2009.11.002

M3 - Article

AN - SCOPUS:77950918549

VL - 7

SP - 225

EP - 239

JO - Statistical Methodology

JF - Statistical Methodology

SN - 1572-3127

IS - 3

ER -