Opening the blackbox of VirusTotal: Analyzing online phishing scan engines

Peng Peng, Limin Yang, Linhai Song, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of its labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, scanning results are not immediately reflected on VirusTotal after a scan completes, and VirusTotal scan results are inconsistent with those of some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.
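Researchers who use VirusTotal as a labeling source usually collapse the per-vendor verdicts for a URL into one label, often by requiring that at least t vendors flag it; the "missed 30%" figure above is instead a per-vendor statistic computed over known phishing URLs. The minimal Python sketch below illustrates both computations. It assumes a hypothetical input format (a map from each ground-truth phishing URL to the set of vendors that flagged it), and the URLs, vendor names, and threshold t are placeholders. It is not the authors' analysis code and does not call the VirusTotal API.

```python
from typing import Dict, List, Set

# Hypothetical ground-truth data: every URL below is assumed to be a known
# phishing site, mapped to the set of vendors whose verdict flagged it.
# URLs and vendor names are placeholders, not data from the paper.
EXAMPLE_FLAGS: Dict[str, Set[str]] = {
    "http://login-a.example/": {"VendorA", "VendorB"},
    "http://login-b.example/": {"VendorA"},
    "http://login-c.example/": set(),
}
EXAMPLE_VENDORS: List[str] = ["VendorA", "VendorB", "VendorC"]


def vendor_miss_rates(flags: Dict[str, Set[str]], vendors: List[str]) -> Dict[str, float]:
    """Return, for each vendor, the fraction of known-phishing URLs it did not flag."""
    total = len(flags)
    if total == 0:
        raise ValueError("need at least one ground-truth phishing URL")
    rates: Dict[str, float] = {}
    for vendor in vendors:
        # Count the ground-truth URLs this vendor flagged.
        detected = sum(1 for flagged in flags.values() if vendor in flagged)
        rates[vendor] = (total - detected) / total
    return rates


def threshold_label(flagged: Set[str], t: int) -> bool:
    """Aggregate per-vendor verdicts: label a URL malicious if at least t vendors flag it."""
    return len(flagged) >= t


if __name__ == "__main__":
    for vendor, rate in vendor_miss_rates(EXAMPLE_FLAGS, EXAMPLE_VENDORS).items():
        print(f"{vendor}: missed {rate:.0%} of known phishing URLs")
    for url, flagged in EXAMPLE_FLAGS.items():
        print(url, "malicious" if threshold_label(flagged, t=2) else "benign")
```

With t=2, only the first placeholder URL receives a malicious label even though all three are phishing by assumption, which shows how individual vendor misses carry over into threshold-based labels.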

Original language: English (US)
Title of host publication: IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference
Publisher: Association for Computing Machinery
Pages: 478-485
Number of pages: 8
ISBN (Electronic): 9781450369480
DOI: https://doi.org/10.1145/3355369.3355585
State: Published - Oct 21 2019
Event: 19th ACM Internet Measurement Conference, IMC 2019 - Amsterdam, Netherlands
Duration: Oct 21 2019 to Oct 23 2019

Publication series

Name: Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC

Conference

Conference: 19th ACM Internet Measurement Conference, IMC 2019
Country: Netherlands
City: Amsterdam
Period: 10/21/19 to 10/23/19


All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Cite this

Peng, P., Yang, L., Song, L., & Wang, G. (2019). Opening the blackbox of VirusTotal: Analyzing online phishing scan engines. In IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference (pp. 478-485). (Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC). Association for Computing Machinery. https://doi.org/10.1145/3355369.3355585
Peng, Peng; Yang, Limin; Song, Linhai; Wang, Gang. / Opening the blackbox of VirusTotal: Analyzing online phishing scan engines. IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference. Association for Computing Machinery, 2019. pp. 478-485 (Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC).
@inproceedings{54ae0937f99442579e055d785bda7ce5,
title = "Opening the blackbox of {VirusTotal}: Analyzing online phishing scan engines",
abstract = "Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30{\%} of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.",
author = "Peng Peng and Limin Yang and Linhai Song and Gang Wang",
year = "2019",
month = "10",
day = "21",
doi = "10.1145/3355369.3355585",
language = "English (US)",
series = "Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC",
publisher = "Association for Computing Machinery",
pages = "478--485",
booktitle = "IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference",

}

Peng, P, Yang, L, Song, L & Wang, G 2019, Opening the blackbox of VirusTotal: Analyzing online phishing scan engines. in IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference. Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC, Association for Computing Machinery, pp. 478-485, 19th ACM Internet Measurement Conference, IMC 2019, Amsterdam, Netherlands, 10/21/19. https://doi.org/10.1145/3355369.3355585

Opening the blackbox of VirusTotal: Analyzing online phishing scan engines. / Peng, Peng; Yang, Limin; Song, Linhai; Wang, Gang.

IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference. Association for Computing Machinery, 2019. p. 478-485 (Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC).


TY - GEN

T1 - Opening the blackbox of VirusTotal

T2 - Analyzing online phishing scan engines

AU - Peng, Peng

AU - Yang, Limin

AU - Song, Linhai

AU - Wang, Gang

PY - 2019/10/21

Y1 - 2019/10/21

N2 - Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.

AB - Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.

UR - http://www.scopus.com/inward/record.url?scp=85074823687&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074823687&partnerID=8YFLogxK

U2 - 10.1145/3355369.3355585

DO - 10.1145/3355369.3355585

M3 - Conference contribution

AN - SCOPUS:85074823687

T3 - Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC

SP - 478

EP - 485

BT - IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference

PB - Association for Computing Machinery

ER -

Peng P, Yang L, Song L, Wang G. Opening the blackbox of VirusTotal: Analyzing online phishing scan engines. In IMC 2019 - Proceedings of the 2019 ACM Internet Measurement Conference. Association for Computing Machinery. 2019. p. 478-485. (Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC). https://doi.org/10.1145/3355369.3355585