Large-scale Third-party Library Detection in Android Markets

Menghao Li, Pei Wang, Wei Wang, Shuai Wang, Dinghao Wu, Jian Liu, Rui Xue, Wei Huo, Wei Zou

Research output: Contribution to journal › Article

Abstract

With the thriving of mobile app markets, third-party libraries are pervasively used in Android applications. The libraries provide functionality such as advertising, location, and social networking services, making app development much more productive. However, the spread of vulnerable and harmful third-party libraries can also hurt the mobile ecosystem, leading to various security problems. Therefore, third-party library identification has emerged as an important problem and the basis of many security applications such as repackaging detection, vulnerability identification, and malware analysis. Previously, we proposed a novel approach to identifying third-party Android libraries at a massive scale. Our method uses the internal code dependencies of an app to detect and classify library candidates. With a fine-grained feature hashing strategy, it can better handle code whose package and method names are obfuscated. We have developed a prototypical tool called LibD and evaluated it with an up-to-date and humongous dataset. Our experimental results on 1,427,395 apps show that compared to existing tools, LibD can better handle multi-package third-party libraries in the presence of name-based obfuscation, leading to significantly improved precision without the loss of scalability. In this paper, we extend our previous work by demonstrating that effective and scalable library detection can significantly improve the performance of large-scale app analyses in the real world. We show that the technique of LibD can be used to speed up whole-app Android vulnerability detection and quickly identify variants of vulnerable third-party libraries. The extension sheds light on the practical value of our previous work.

Original language: English (US)
Journal: IEEE Transactions on Software Engineering
DOI: 10.1109/TSE.2018.2872958
State: Accepted/In press - Jan 1 2018

Fingerprint

Application programs
Android (operating system)
Ecosystems
Scalability
Marketing

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Li, Menghao ; Wang, Pei ; Wang, Wei ; Wang, Shuai ; Wu, Dinghao ; Liu, Jian ; Xue, Rui ; Huo, Wei ; Zou, Wei. / Large-scale Third-party Library Detection in Android Markets. In: IEEE Transactions on Software Engineering. 2018.
@article{9232762ba0534a239e4f599ff0b20128,
title = "Large-scale Third-party Library Detection in Android Markets",
abstract = "With the thriving of mobile app markets, third-party libraries are pervasively used in Android applications. The libraries provide functionality such as advertising, location, and social networking services, making app development much more productive. However, the spread of vulnerable and harmful third-party libraries can also hurt the mobile ecosystem, leading to various security problems. Therefore, third-party library identification has emerged as an important problem and the basis of many security applications such as repackaging detection, vulnerability identification, and malware analysis. Previously, we proposed a novel approach to identifying third-party Android libraries at a massive scale. Our method uses the internal code dependencies of an app to detect and classify library candidates. With a fine-grained feature hashing strategy, it can better handle code whose package and method names are obfuscated. We have developed a prototypical tool called LibD and evaluated it with an up-to-date and humongous dataset. Our experimental results on 1,427,395 apps show that compared to existing tools, LibD can better handle multi-package third-party libraries in the presence of name-based obfuscation, leading to significantly improved precision without the loss of scalability. In this paper, we extend our previous work by demonstrating that effective and scalable library detection can significantly improve the performance of large-scale app analyses in the real world. We show that the technique of LibD can be used to speed up whole-app Android vulnerability detection and quickly identify variants of vulnerable third-party libraries. The extension sheds light on the practical value of our previous work.",
author = "Menghao Li and Pei Wang and Wei Wang and Shuai Wang and Dinghao Wu and Jian Liu and Rui Xue and Wei Huo and Wei Zou",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TSE.2018.2872958",
language = "English (US)",
journal = "IEEE Transactions on Software Engineering",
issn = "0098-5589",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

Large-scale Third-party Library Detection in Android Markets. / Li, Menghao; Wang, Pei; Wang, Wei; Wang, Shuai; Wu, Dinghao; Liu, Jian; Xue, Rui; Huo, Wei; Zou, Wei.

In: IEEE Transactions on Software Engineering, 01.01.2018.

TY - JOUR
T1 - Large-scale Third-party Library Detection in Android Markets
AU - Li, Menghao
AU - Wang, Pei
AU - Wang, Wei
AU - Wang, Shuai
AU - Wu, Dinghao
AU - Liu, Jian
AU - Xue, Rui
AU - Huo, Wei
AU - Zou, Wei
PY - 2018/1/1
Y1 - 2018/1/1

N2 - With the thriving of mobile app markets, third-party libraries are pervasively used in Android applications. The libraries provide functionality such as advertising, location, and social networking services, making app development much more productive. However, the spread of vulnerable and harmful third-party libraries can also hurt the mobile ecosystem, leading to various security problems. Therefore, third-party library identification has emerged as an important problem and the basis of many security applications such as repackaging detection, vulnerability identification, and malware analysis. Previously, we proposed a novel approach to identifying third-party Android libraries at a massive scale. Our method uses the internal code dependencies of an app to detect and classify library candidates. With a fine-grained feature hashing strategy, it can better handle code whose package and method names are obfuscated. We have developed a prototypical tool called LibD and evaluated it with an up-to-date and humongous dataset. Our experimental results on 1,427,395 apps show that compared to existing tools, LibD can better handle multi-package third-party libraries in the presence of name-based obfuscation, leading to significantly improved precision without the loss of scalability. In this paper, we extend our previous work by demonstrating that effective and scalable library detection can significantly improve the performance of large-scale app analyses in the real world. We show that the technique of LibD can be used to speed up whole-app Android vulnerability detection and quickly identify variants of vulnerable third-party libraries. The extension sheds light on the practical value of our previous work.

AB - With the thriving of mobile app markets, third-party libraries are pervasively used in Android applications. The libraries provide functionality such as advertising, location, and social networking services, making app development much more productive. However, the spread of vulnerable and harmful third-party libraries can also hurt the mobile ecosystem, leading to various security problems. Therefore, third-party library identification has emerged as an important problem and the basis of many security applications such as repackaging detection, vulnerability identification, and malware analysis. Previously, we proposed a novel approach to identifying third-party Android libraries at a massive scale. Our method uses the internal code dependencies of an app to detect and classify library candidates. With a fine-grained feature hashing strategy, it can better handle code whose package and method names are obfuscated. We have developed a prototypical tool called LibD and evaluated it with an up-to-date and humongous dataset. Our experimental results on 1,427,395 apps show that compared to existing tools, LibD can better handle multi-package third-party libraries in the presence of name-based obfuscation, leading to significantly improved precision without the loss of scalability. In this paper, we extend our previous work by demonstrating that effective and scalable library detection can significantly improve the performance of large-scale app analyses in the real world. We show that the technique of LibD can be used to speed up whole-app Android vulnerability detection and quickly identify variants of vulnerable third-party libraries. The extension sheds light on the practical value of our previous work.

UR - http://www.scopus.com/inward/record.url?scp=85054389625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054389625&partnerID=8YFLogxK
U2 - 10.1109/TSE.2018.2872958
DO - 10.1109/TSE.2018.2872958
M3 - Article
AN - SCOPUS:85054389625
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
SN - 0098-5589
ER -