TY - JOUR
T1 - Providing accurate models across private partitioned data
T2 - Secure maximum likelihood estimation
AU - Snoke, Joshua
AU - Brick, Timothy R.
AU - Slavković, Aleksandra
AU - Hunter, Michael D.
N1 - Funding Information:
Received November 2017; revised April 2018. Supported in part by NSF Grants BCS-0941553 and SES-1534433 to the Department of Statistics, Pennsylvania State University. Supported also in part by the U.S. Census Bureau. Supported in part by Grant HHS-2012-ACF-ACYF-CF-0510 awarded to NorthCare by the Children’s Bureau. Key words and phrases: partitioned data, privacy, secure multiparty computation, structural equation models, distributed maximum likelihood estimation.
Publisher Copyright:
© Institute of Mathematical Statistics, 2018.
PY - 2018/6
Y1 - 2018/6
N2 - This paper focuses on the privacy paradigm of giving researchers remote access to carry out analyses on sensitive data stored behind separate firewalls. We address the situation where the analysis demands data from multiple physically separate databases which cannot be combined. Motivating this work is a real model based on research data on kinship foster placement that came from multiple sources and could only be combined through a lengthy process with a trusted research network. We develop and demonstrate a method for accurate calculation of the multivariate normal likelihood, for a set of parameters given the partitioned data, which can then be maximized to obtain estimates. These estimates are achieved without sharing any data or any true intermediate statistics of the data across firewalls. We show that under a certain set of assumptions our method for estimation across these partitions achieves results identical to estimation with the full data. Privacy is maintained by adding noise at each partition. This ensures each party receives noisy statistics, such that the noise cannot be removed until the last step to obtain a single value, the true total log likelihood. Potential applications include all methods utilizing parameter estimation through maximizing the multivariate normal likelihood. We give detailed algorithms, along with available software, present simulations, and analyze the kinship foster placement data by estimating structural equation models (SEMs) with partitioned data.
UR - http://www.scopus.com/inward/record.url?scp=85050905020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050905020&partnerID=8YFLogxK
U2 - 10.1214/18-AOAS1171
DO - 10.1214/18-AOAS1171
M3 - Article
AN - SCOPUS:85050905020
VL - 12
SP - 877
EP - 914
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
SN - 1932-6157
IS - 2
ER -