A machine learning approach for detecting third-party trackers on the web

Qianru Wu, Qixu Liu, Yuqing Zhang, Peng Liu, Guanxing Wen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Citations (Scopus)

Abstract

Privacy violations caused by third-party tracking have become a serious problem, yet the most effective defense against third-party tracking is still based on blacklists. Such a method depends heavily on the quality of the blacklist database, whose records need to be updated frequently. However, most records are curated manually and are very difficult to maintain. To generate blacklists efficiently, we propose a high-accuracy system, named DMTrackerDetector, that detects third-party trackers automatically. Existing methods to detect online tracking have two shortcomings. First, they treat first-party tracking and third-party tracking the same. Second, each focuses on one particular way of tracking and can therefore detect only a limited set of trackers. Since anti-tracking technology based on blacklists depends heavily on the coverage of the blacklist database, these methods cannot generate high-quality blacklists. To solve these problems, we first use structural hole theory to preserve first-party trackers, and then detect only third-party trackers using supervised machine learning, exploiting the fact that trackers and non-trackers call different JavaScript APIs for different purposes. The results show that 97.8% of the third-party trackers in our test set are correctly detected. The blacklist generated by our system not only covers almost all records in the Ghostery list (one of the most popular anti-tracking tools), but also contains 35 previously unrevealed trackers.
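The supervised step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier or features DMTrackerDetector uses, so everything here is an assumption made for illustration: a Bernoulli naive Bayes model is swapped in as a stand-in classifier, the JavaScript API names and the six labelled scripts are made-up toy data, and the structural-hole step that sets first-party trackers aside is not modelled at all.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): each third-party script is
# summarised by the set of JavaScript APIs it was observed calling.
# Label 1 = tracker, 0 = non-tracker.
TRAIN = [
    ({"document.cookie", "navigator.userAgent", "screen.width"}, 1),
    ({"document.cookie", "navigator.plugins", "canvas.toDataURL"}, 1),
    ({"navigator.userAgent", "localStorage.setItem", "screen.width"}, 1),
    ({"document.createElement", "element.appendChild"}, 0),
    ({"document.createElement", "window.requestAnimationFrame"}, 0),
    ({"element.appendChild", "element.addEventListener"}, 0),
]


def train_bernoulli_nb(samples):
    """Fit a Bernoulli naive Bayes model over API-presence features."""
    vocab = sorted(set().union(*(apis for apis, _ in samples)))
    class_counts = Counter(label for _, label in samples)
    api_counts = {label: Counter() for label in class_counts}
    for apis, label in samples:
        api_counts[label].update(apis)
    model = {"vocab": vocab, "log_prior": {}, "p_api": {}}
    for label, n in class_counts.items():
        model["log_prior"][label] = math.log(n / len(samples))
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        model["p_api"][label] = {
            api: (api_counts[label][api] + 1) / (n + 2) for api in vocab
        }
    return model


def predict(model, apis):
    """Return the label with the highest log-posterior for a set of API calls."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for api in model["vocab"]:
            p = model["p_api"][label][api]
            score += math.log(p if api in apis else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train_bernoulli_nb(TRAIN)
```

Calling `predict(MODEL, {"document.cookie", "screen.width"})` classifies a script from the APIs it calls; scripts labelled 1 would be candidates for a blacklist entry. A real system would need a far larger labelled corpus and features extracted from instrumented page loads.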

Original language: English (US)
Title of host publication: Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings
Editors: Sokratis Katsikas, Catherine Meadows, Ioannis Askoxylakis, Sotiris Ioannidis
Publisher: Springer Verlag
Pages: 238-258
Number of pages: 21
ISBN (Print): 9783319457437
DOIs: https://doi.org/10.1007/978-3-319-45744-4_12
State: Published - Jan 1 2016
Event: 21st European Symposium on Research in Computer Security, ESORICS 2016 - Heraklion, Greece
Duration: Sep 26 2016 to Sep 30 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9878 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st European Symposium on Research in Computer Security, ESORICS 2016
Country: Greece
City: Heraklion
Period: 9/26/16 to 9/30/16

Fingerprint

Learning systems
Machine Learning
Application programming interfaces (API)
JavaScript
Supervised Learning
Test Set
Privacy
High Accuracy
Coverage
Cover

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Wu, Q., Liu, Q., Zhang, Y., Liu, P., & Wen, G. (2016). A machine learning approach for detecting third-party trackers on the web. In S. Katsikas, C. Meadows, I. Askoxylakis, & S. Ioannidis (Eds.), Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings (pp. 238-258). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9878 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-45744-4_12
Wu, Qianru ; Liu, Qixu ; Zhang, Yuqing ; Liu, Peng ; Wen, Guanxing. / A machine learning approach for detecting third-party trackers on the web. Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings. editor / Sokratis Katsikas ; Catherine Meadows ; Ioannis Askoxylakis ; Sotiris Ioannidis. Springer Verlag, 2016. pp. 238-258 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{e3067230f0ea48f5bbee60b3c5ea9402,
title = "A machine learning approach for detecting third-party trackers on the web",
abstract = "Nowadays, privacy violation caused by third-party tracking has become a serious problem and yet the most effective method to defend against third-party tracking is based on blacklists. Such method highly depends on the quality of the blacklist database, whose records need to be updated frequently. However, most records are curated manually and very difficult to maintain. To efficiently generate blacklists, we propose a system with high accuracy, named DMTrackerDetector, to detect third-party trackers automatically. Existing methods to detect online tracking have two shortcomings. Firstly, they treat first-party tracking and third-party tracking the same. Secondly, they always focus on a certain way of tracking and can only detect limited trackers. Since anti-tracking technology based on blacklists highly depends on the coverage of the blacklist database, these methods cannot generate high-quality blacklists. To solve these problems, we firstly use the structural hole theory to preserve first-party trackers, and only detect third-party trackers based on supervised machine learning by exploiting the fact that trackers and non-trackers always call different JavaScript APIs for different purposes. The results show that 97.8{\%} of the third-party trackers in our test set can be correctly detected. The blacklist generated by our system not only covers almost all records in the Ghostery list (one of the most popular anti-tracking tools), but also detects 35 unrevealed trackers.",
author = "Qianru Wu and Qixu Liu and Yuqing Zhang and Peng Liu and Guanxing Wen",
year = "2016",
month = "1",
day = "1",
doi = "10.1007/978-3-319-45744-4_12",
language = "English (US)",
isbn = "9783319457437",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "238--258",
editor = "Sokratis Katsikas and Catherine Meadows and Ioannis Askoxylakis and Sotiris Ioannidis",
booktitle = "Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings",
address = "Germany",
volume = "9878",
}

Wu, Q, Liu, Q, Zhang, Y, Liu, P & Wen, G 2016, A machine learning approach for detecting third-party trackers on the web. in S Katsikas, C Meadows, I Askoxylakis & S Ioannidis (eds), Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9878 LNCS, Springer Verlag, pp. 238-258, 21st European Symposium on Research in Computer Security, ESORICS 2016, Heraklion, Greece, 9/26/16. https://doi.org/10.1007/978-3-319-45744-4_12

A machine learning approach for detecting third-party trackers on the web. / Wu, Qianru; Liu, Qixu; Zhang, Yuqing; Liu, Peng; Wen, Guanxing.

Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings. ed. / Sokratis Katsikas; Catherine Meadows; Ioannis Askoxylakis; Sotiris Ioannidis. Springer Verlag, 2016. p. 238-258 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9878 LNCS).

TY - GEN

T1 - A machine learning approach for detecting third-party trackers on the web

AU - Wu, Qianru

AU - Liu, Qixu

AU - Zhang, Yuqing

AU - Liu, Peng

AU - Wen, Guanxing

PY - 2016/1/1

Y1 - 2016/1/1

AB - Nowadays, privacy violation caused by third-party tracking has become a serious problem and yet the most effective method to defend against third-party tracking is based on blacklists. Such method highly depends on the quality of the blacklist database, whose records need to be updated frequently. However, most records are curated manually and very difficult to maintain. To efficiently generate blacklists, we propose a system with high accuracy, named DMTrackerDetector, to detect third-party trackers automatically. Existing methods to detect online tracking have two shortcomings. Firstly, they treat first-party tracking and third-party tracking the same. Secondly, they always focus on a certain way of tracking and can only detect limited trackers. Since anti-tracking technology based on blacklists highly depends on the coverage of the blacklist database, these methods cannot generate high-quality blacklists. To solve these problems, we firstly use the structural hole theory to preserve first-party trackers, and only detect third-party trackers based on supervised machine learning by exploiting the fact that trackers and non-trackers always call different JavaScript APIs for different purposes. The results show that 97.8% of the third-party trackers in our test set can be correctly detected. The blacklist generated by our system not only covers almost all records in the Ghostery list (one of the most popular anti-tracking tools), but also detects 35 unrevealed trackers.

UR - http://www.scopus.com/inward/record.url?scp=84990050965&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990050965&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-45744-4_12

DO - 10.1007/978-3-319-45744-4_12

M3 - Conference contribution

SN - 9783319457437

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 238

EP - 258

BT - Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings

A2 - Katsikas, Sokratis

A2 - Meadows, Catherine

A2 - Askoxylakis, Ioannis

A2 - Ioannidis, Sotiris

PB - Springer Verlag

ER -

Wu Q, Liu Q, Zhang Y, Liu P, Wen G. A machine learning approach for detecting third-party trackers on the web. In Katsikas S, Meadows C, Askoxylakis I, Ioannidis S, editors, Computer Security - 21st European Symposium on Research in Computer Security, ESORICS 2016, Proceedings. Springer Verlag. 2016. p. 238-258. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-45744-4_12