TY - JOUR
T1 - Understanding and Conquering the Difficulties in Identifying Third-Party Libraries From Millions of Android Apps
AU - Zhang, Yanghua
AU - Wang, Jice
AU - Huang, Hexiang
AU - Zhang, Yuqing
AU - Liu, Peng
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - With the growth of the Android ecosystem, code is widely reused in Android apps in the form of third-party libraries. Recent research shows that emerging third-party libraries may introduce privacy risks and other security threats. Nevertheless, current approaches to library identification fall well short of the required accuracy and efficiency. In this article, we present LibHawkeye, a new clustering-based technique for identifying third-party libraries in millions of Android apps. Our approach uses four different kinds of dependencies inside Android apps to build intra-app dependency graphs, but discards package homogeny, on which most previous work relies heavily. In addition, we propose three refinement steps to eliminate as many false positives as possible from the initial result. An experiment on 1,000 apps shows that, compared with existing tools, LibHawkeye precisely identifies at least 26.5 percent more libraries. We also evaluate it on 3,987,206 Android apps published on Google Play; the accuracy of libraries sampled from the clustering result is 93.25 percent. The results show that LibHawkeye significantly outperforms state-of-the-art tools without loss of scalability.
UR - http://www.scopus.com/inward/record.url?scp=85112193178&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112193178&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2021.3093244
DO - 10.1109/TBDATA.2021.3093244
M3 - Article
AN - SCOPUS:85112193178
SN - 2332-7790
VL - 8
SP - 1511
EP - 1523
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 6
ER -