Analyzing vocabulary intersections of expert annotations and topic models for data practices in privacy policies

Frederick Liu, Shomir Wilson, Florian Schaub, Norman Sadeh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations


Privacy policies are commonly used to inform users about the data collection and use practices of websites, mobile apps, and other products and services. However, the average Internet user struggles to understand the contents of these documents and generally does not read them. Natural language and machine learning techniques offer the promise of automatically extracting relevant statements from privacy policies to help generate succinct summaries, but current techniques require large amounts of annotated data. The highest quality annotations require law experts, but their efforts do not scale efficiently. In this paper, we present results on bridging the gap between privacy practice categories defined by law experts and topics learned from Non-negative Matrix Factorization (NMF). To do this, we investigate the intersections between vocabulary sets identified as most significant for each category, using a logistic regression model, and vocabulary sets identified by topic modeling. The intersections exhibit strong matches between some categories and topics, although other categories have weaker affinities with topics. Our results show a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.
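The vocabulary-intersection step described in the abstract can be illustrated with a minimal sketch. It assumes the two inputs already exist: one set of high-weight words per expert category (for example, taken from logistic regression coefficients) and one set of top words per NMF topic. The category labels, topic names, word lists, and the use of a raw shared-word count as the match score are all hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

Vocab = Dict[str, Set[str]]


def overlap_table(category_vocab: Vocab, topic_vocab: Vocab) -> Dict[Tuple[str, str], int]:
    """Count the words shared by every (category, topic) pair."""
    return {
        (cat, topic): len(cat_words & topic_words)
        for cat, cat_words in category_vocab.items()
        for topic, topic_words in topic_vocab.items()
    }


def best_topic_per_category(category_vocab: Vocab, topic_vocab: Vocab) -> Dict[str, Tuple[str, int]]:
    """For each category, pick the topic with the largest shared vocabulary."""
    table = overlap_table(category_vocab, topic_vocab)
    best = {}
    for cat in category_vocab:
        # Ties are resolved in favor of the first topic encountered.
        topic = max(topic_vocab, key=lambda t: table[(cat, t)])
        best[cat] = (topic, table[(cat, topic)])
    return best


# Hypothetical toy vocabularies (made up for this sketch, not real results).
category_vocab = {
    "category_retention": {"retain", "store", "period", "delete"},
    "category_sharing": {"share", "partner", "third", "advertiser"},
}
topic_vocab = {
    "topic_0": {"share", "partner", "affiliate", "third"},
    "topic_1": {"cookie", "browser", "track", "device"},
    "topic_2": {"retain", "delete", "account", "period"},
}

matches = best_topic_per_category(category_vocab, topic_vocab)
for category, (topic, shared) in matches.items():
    print(f"{category}: best match {topic} ({shared} shared words)")
```

A large shared count suggests a strong category-topic match, while a category whose best count stays small across all topics would correspond to the weaker affinities the abstract mentions. A normalized score such as Jaccard similarity could be swapped in if the vocabulary sets differ in size.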

Original language: English (US)
Title of host publication: FS-16-01
Subtitle of host publication: Artificial Intelligence for Human-Robot Interaction; FS-16-02: Cognitive Assistance in Government and Public Sector Applications; FS-16-03: Cross-Disciplinary Challenges for Autonomous Systems; FS-16-04: Privacy and Language Technologies; FS-16-05: Shared Autonomy in Research and Practice
Publisher: AI Access Foundation
Number of pages: 6
ISBN (Electronic): 9781577357759
State: Published - Jan 1 2016
Event: 2016 AAAI Fall Symposium - Arlington, United States
Duration: Nov 17 2016 to Nov 19 2016

Publication series

Name: AAAI Fall Symposium - Technical Report
Volume: FS-16-01 - FS-16-05


Conference: 2016 AAAI Fall Symposium
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Engineering(all)

