TY - JOUR
T1 - How differential privacy will affect our understanding of health disparities in the United States
AU - Santos-Lozada, Alexis R.
AU - Howard, Jeffrey T.
AU - Verdery, Ashton M.
N1 - Funding Information:
ACKNOWLEDGMENTS. We thank the Integrated Public Use Microdata Series (IPUMS) and the National Historical Geographic Information System (NHGIS) teams for sharing the 2010 demonstration products in a format accessible for the data user community. This work was supported by the Population Research Institute (Grant R24-HD041025 and P2CHD041025), the Data Accelerator, and the Social Science Research Institute at the Pennsylvania State University. This work was also supported by the Center for Community Based and Applied Health Research at the University of Texas at San Antonio. We thank David Van Riper for his support in the early stages of this work. Finally, we thank the employees of the US Census Bureau for allowing the data-user community to provide timely feedback regarding this crucial issue.
Publisher Copyright:
© 2020 National Academy of Sciences. All rights reserved.
PY - 2020/6/16
Y1 - 2020/6/16
AB - The application of a currently proposed differential privacy algorithm to the 2020 United States Census data and additional data products may affect the usefulness of these data, the accuracy of estimates and rates derived from them, and critical knowledge about social phenomena such as health disparities. We test the ramifications of applying differential privacy to released data by studying estimates of US mortality rates for the overall population and three major racial/ethnic groups. We ask how changes in the denominators of these vital rates due to the implementation of differential privacy can lead to biased estimates. We situate where these changes are most likely to matter by disaggregating biases by population size, degree of urbanization, and adjacency to a metropolitan area. Our results suggest that differential privacy will more strongly affect mortality rate estimates for non-Hispanic blacks and Hispanics than estimates for non-Hispanic whites. We also find significant changes in estimated mortality rates for less populous areas, with more pronounced changes when stratified by race/ethnicity. We find larger changes in estimated mortality rates for areas with lower levels of urbanization or adjacency to metropolitan areas, with these changes being greater for non-Hispanic blacks and Hispanics. These findings highlight the consequences of implementing differential privacy, as proposed, for research examining population composition, particularly mortality disparities across racial/ethnic groups and along the urban/rural continuum. Overall, they demonstrate the challenges in using the data products derived from the proposed disclosure avoidance methods, while highlighting critical instances where scientific understandings may be negatively impacted.
UR - http://www.scopus.com/inward/record.url?scp=85086682034&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086682034&partnerID=8YFLogxK
U2 - 10.1073/pnas.2003714117
DO - 10.1073/pnas.2003714117
M3 - Article
C2 - 32467167
AN - SCOPUS:85086682034
SN - 0027-8424
VL - 117
SP - 13405
EP - 13412
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 24
ER -