Understanding and Conquering the Difficulties in Identifying Third-Party Libraries From Millions of Android Apps

Yanghua Zhang, Jice Wang, Hexiang Huang, Yuqing Zhang, Peng Liu

Research output: Contribution to journal › Article › peer-review

Abstract

As the Android ecosystem thrives, code is widely reused in Android apps in the form of third-party libraries. Recent research shows that emerging third-party libraries may introduce many privacy risks and other security threats. Nevertheless, current approaches to library identification fall well short of the accuracy and efficiency required. In this article, we present LibHawkeye, a new clustering-based technique to identify third-party libraries in millions of Android apps. Our approach uses four different kinds of dependencies inside Android apps to build intra-app dependency graphs, but discards package homogeny, on which most previous work relies heavily. In addition, we propose three refinement steps to eliminate as many false positives as possible from the initial result. An experiment on 1,000 apps shows that, compared to existing tools, LibHawkeye can precisely identify at least 26.5 percent more libraries. We also evaluate it on 3,987,206 Android apps published in Google Play, and the accuracy of libraries sampled from the clustering result is 93.25 percent. The results show that LibHawkeye significantly outperforms state-of-the-art tools without loss of scalability.
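
To make the idea of an intra-app dependency graph concrete, here is a minimal sketch, not LibHawkeye's actual algorithm. It assumes that dependencies have already been extracted as (source package, target package, kind) triples. The function name `candidate_modules`, the package names, and the kind labels `kind1` to `kind3` are all hypothetical placeholders, since the abstract does not enumerate the four dependency kinds. The sketch groups packages into connected components by dependency alone, without comparing package names, so an obfuscated package can still end up in the same candidate module as the code it depends on.

```python
from collections import defaultdict


def candidate_modules(edges):
    """Group packages into connected components of an undirected
    dependency graph (union-find). Each component is one candidate module.

    `edges` is an iterable of (src_package, dst_package, kind) triples.
    The `kind` label is carried along but not used in this sketch.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for src, dst, _kind in edges:
        union(src, dst)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    # Sort for a deterministic result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical input: package names and kind labels are made up.
edges = [
    ("com.example.app.ui", "com.example.app.net", "kind1"),
    ("com.vendor.ads.core", "com.vendor.ads.view", "kind2"),
    ("a.b.c", "com.vendor.ads.core", "kind3"),  # obfuscated package name
]
modules = candidate_modules(edges)
```

In this toy input, `a.b.c` shares no name prefix with `com.vendor.ads`, yet it is grouped with those packages because of the dependency edge. Any cross-app clustering and the three refinement steps mentioned in the abstract are outside the scope of this sketch.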

Original language: English (US)
Pages (from-to): 1511-1523
Number of pages: 13
Journal: IEEE Transactions on Big Data
Volume: 8
Issue number: 6
State: Published - Dec 1 2022

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management
