Achieving accuracy and scalability simultaneously in detecting application clones on Android markets

Kai Chen, Peng Liu, Yingjun Zhang

Research output: Contribution to journal › Conference article

136 Citations (Scopus)

Abstract

Besides traditional problems such as potential bugs, (smartphone) application clones on Android markets bring new threats. That is, attackers clone the code from legitimate Android applications, assemble it with malicious code or advertisements, and publish these "purpose-added" app clones on the same or other markets for benefits. Three inherent and unique characteristics make app clones difficult to detect by existing techniques: a billion opcode problem caused by cross-market publishing, gap between code clones and app clones, and prevalent Type 2 and Type 3 clones. Existing techniques achieve either accuracy or scalability, but not both. To achieve both goals, we use a geometry characteristic, called centroid, of dependency graphs to measure the similarity between methods (code fragments) in two apps. Then we synthesize the method-level similarities and draw a Y/N conclusion on app (core functionality) cloning. The observed "centroid effect" and the inherent "monotonicity" property enable our approach to achieve both high accuracy and scalability. We implemented the app clone detection system and evaluated it on five whole Android markets (including 150,145 apps, 203 million methods and 26 billion opcodes). It takes less than one hour to perform cross-market app clone detection on the five markets after generating centroids only once.
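
The two-stage idea in the abstract (compare methods by a centroid of their dependency graphs, then combine the method-level results into a yes/no decision per app pair) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the three-coordinate node position, the per-node weight, the normalized per-axis distance, and both threshold values are hypothetical, as are the names Node, centroid, centroid_distance, methods_similar and is_app_clone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """One node of a method's dependency graph (hypothetical encoding)."""
    coords: Tuple[float, float, float]  # assumed non-negative 3-D position
    weight: float                       # assumed weight, e.g. a statement count


Method = List[Node]


def centroid(nodes: Method) -> Tuple[float, ...]:
    """Weighted mean position of all nodes in one method's graph."""
    total = sum(n.weight for n in nodes)
    if total == 0:
        raise ValueError("method has no weighted nodes")
    return tuple(
        sum(n.weight * n.coords[i] for n in nodes) / total for i in range(3)
    )


def centroid_distance(c1: Tuple[float, ...], c2: Tuple[float, ...]) -> float:
    """Largest normalized per-axis difference between two centroids (0 to 1)."""
    diffs = []
    for a, b in zip(c1, c2):
        denom = a + b
        diffs.append(abs(a - b) / denom if denom else 0.0)
    return max(diffs)


def methods_similar(m1: Method, m2: Method, threshold: float = 0.05) -> bool:
    """Treat two methods as similar when their centroids are close."""
    return centroid_distance(centroid(m1), centroid(m2)) <= threshold


def is_app_clone(app_a: List[Method], app_b: List[Method],
                 method_threshold: float = 0.05,
                 app_threshold: float = 0.7) -> bool:
    """Yes/no decision: enough of app_a's methods have a match in app_b."""
    if not app_a:
        return False
    matched = sum(
        1 for m in app_a
        if any(methods_similar(m, other, method_threshold) for other in app_b)
    )
    return matched / len(app_a) >= app_threshold


# Illustrative data: m1_copy stands for m1 after identifier renaming, which
# leaves the graph shape (and therefore the centroid) unchanged.
m1 = [Node((1, 1, 0), 2), Node((2, 0, 0), 1)]
m1_copy = [Node((1, 1, 0), 2), Node((2, 0, 0), 1)]
m3 = [Node((1, 2, 1), 5), Node((2, 1, 2), 5), Node((3, 0, 0), 1)]
```

Because each method is reduced to one small tuple, centroids can be computed once per app and reused for every later comparison, which is in the spirit of the abstract's remark that centroids are generated only once. The brute-force pairwise loop in is_app_clone is for readability only and makes no attempt at the scalability the paper reports.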

Original language: English (US)
Pages (from-to): 175-186
Number of pages: 12
Journal: Proceedings - International Conference on Software Engineering
Issue number: 1
DOIs: 10.1145/2568225.2568286
State: Published - May 31 2014
Event: 36th International Conference on Software Engineering, ICSE 2014 - Hyderabad, India
Duration: May 31 2014 to Jun 7 2014

Fingerprint

Application programs
Scalability
Cloning
Smartphones
Geometry

All Science Journal Classification (ASJC) codes

  • Software

Cite this

@article{f9f31446f0594898a1acb4564a4a8b8e,
title = "Achieving accuracy and scalability simultaneously in detecting application clones on Android markets",
abstract = "Besides traditional problems such as potential bugs, (smartphone) application clones on Android markets bring new threats. That is, attackers clone the code from legitimate Android applications, assemble it with malicious code or advertisements, and publish these ``purpose-added'' app clones on the same or other markets for benefits. Three inherent and unique characteristics make app clones difficult to detect by existing techniques: a billion opcode problem caused by cross-market publishing, gap between code clones and app clones, and prevalent Type 2 and Type 3 clones. Existing techniques achieve either accuracy or scalability, but not both. To achieve both goals, we use a geometry characteristic, called centroid, of dependency graphs to measure the similarity between methods (code fragments) in two apps. Then we synthesize the method-level similarities and draw a Y/N conclusion on app (core functionality) cloning. The observed ``centroid effect'' and the inherent ``monotonicity'' property enable our approach to achieve both high accuracy and scalability. We implemented the app clone detection system and evaluated it on five whole Android markets (including 150,145 apps, 203 million methods and 26 billion opcodes). It takes less than one hour to perform cross-market app clone detection on the five markets after generating centroids only once.",
author = "Kai Chen and Peng Liu and Yingjun Zhang",
year = "2014",
month = "5",
day = "31",
doi = "10.1145/2568225.2568286",
language = "English (US)",
pages = "175--186",
journal = "Proceedings - International Conference on Software Engineering",
issn = "0270-5257",
publisher = "IEEE Computer Society",
number = "1",

}

Achieving accuracy and scalability simultaneously in detecting application clones on Android markets. / Chen, Kai; Liu, Peng; Zhang, Yingjun.

In: Proceedings - International Conference on Software Engineering, No. 1, 31.05.2014, p. 175-186.

TY - JOUR

T1 - Achieving accuracy and scalability simultaneously in detecting application clones on Android markets

AU - Chen, Kai

AU - Liu, Peng

AU - Zhang, Yingjun

PY - 2014/5/31

Y1 - 2014/5/31

N2 - Besides traditional problems such as potential bugs, (smartphone) application clones on Android markets bring new threats. That is, attackers clone the code from legitimate Android applications, assemble it with malicious code or advertisements, and publish these "purpose-added" app clones on the same or other markets for benefits. Three inherent and unique characteristics make app clones difficult to detect by existing techniques: a billion opcode problem caused by cross-market publishing, gap between code clones and app clones, and prevalent Type 2 and Type 3 clones. Existing techniques achieve either accuracy or scalability, but not both. To achieve both goals, we use a geometry characteristic, called centroid, of dependency graphs to measure the similarity between methods (code fragments) in two apps. Then we synthesize the method-level similarities and draw a Y/N conclusion on app (core functionality) cloning. The observed "centroid effect" and the inherent "monotonicity" property enable our approach to achieve both high accuracy and scalability. We implemented the app clone detection system and evaluated it on five whole Android markets (including 150,145 apps, 203 million methods and 26 billion opcodes). It takes less than one hour to perform cross-market app clone detection on the five markets after generating centroids only once.

UR - http://www.scopus.com/inward/record.url?scp=84994101812&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994101812&partnerID=8YFLogxK

U2 - 10.1145/2568225.2568286

DO - 10.1145/2568225.2568286

M3 - Conference article

AN - SCOPUS:84994101812

SP - 175

EP - 186

JO - Proceedings - International Conference on Software Engineering

JF - Proceedings - International Conference on Software Engineering

SN - 0270-5257

IS - 1

ER -