Integration of static and dynamic code stylometry analysis for programmer de-anonymization

Ningfei Wang, Shouling Ji, Ting Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

De-anonymizing the authors of anonymous code (i.e., code stylometry) has significant privacy and security implications. Most existing code stylometry methods rely solely on static (e.g., lexical, layout, and syntactic) features extracted from source code, while neglecting its key difference from regular text – it is executable! In this paper, we present Sundae, a novel code de-anonymization framework that integrates both static and dynamic stylometry analysis. Sundae departs from existing solutions in significant ways: (i) it requires far fewer static, handcrafted features; (ii) it requires much less labeled data for training; and (iii) it can be readily extended to new programmers once their stylometry information becomes available. Through extensive evaluation on benchmark datasets, we demonstrate that Sundae delivers strong empirical performance. For example, under the setting of 229 programmers and 9 problems, it outperforms the state-of-the-art method by a margin of 45.65% on Python code de-anonymization. The empirical results highlight the integration of static and dynamic analysis as a promising direction for code stylometry research.

Original language: English (US)
Title of host publication: AISec 2018 - Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, co-located with CCS 2018
Publisher: Association for Computing Machinery
Pages: 74-84
Number of pages: 11
ISBN (Electronic): 9781450360043
State: Published - Oct 15 2018
Event: 11th ACM Workshop on Artificial Intelligence and Security, AISec 2018, co-located with CCS 2018 - Toronto, Canada
Duration: Oct 19 2018 → …

Publication series

Name: Proceedings of the ACM Conference on Computer and Communications Security
ISSN (Print): 1543-7221

Conference

Conference: 11th ACM Workshop on Artificial Intelligence and Security, AISec 2018, co-located with CCS 2018
Country: Canada
City: Toronto
Period: 10/19/18 → …

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
