Recent years have witnessed numerous attacks targeting widespread IoT devices, as well as malicious activities conducted by compromised IoT devices. After the source code of some notorious IoT malware was released, many new variants emerged, which are usually more powerful and stealthy. Although numerous existing studies have analyzed some of the exposed families, there is still no systematic study that makes full use of them, which would be a fundamental step toward provenance, triage, labeling, lineage analysis, and authorship attribution. The key challenge of conducting an IoT malware evolutionary study is how to collect sufficient and accurate information about malware and identify the relationships among them. In this paper, we take the first step to investigate IoT malware evolution by leveraging information from two sources that complement each other. First, we crawl online articles about IoT malware and employ Natural Language Processing techniques to extract the features of malware samples and their relationships with other malware families, which allows us to form the basic lineage graph. Second, we collect real malware samples through our widely deployed honeypots and design a new classifier to group them into families and identify the lineage relationships among them. These results are used to enhance the basic lineage graph. Finally, we construct the final lineage graph for 72 IoT malware families by correlating the information from both sources, which can help the research community better understand and fight IoT malware now and in the future. Our study has been incorporated into the threat awareness system of NSFOCUS.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Information Systems
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications