As a multitude of new malware instances and cyber threats arise every day, automatic analysis systems have assumed a pivotal role in the detection and characterization of malicious behaviours. Modern malware strains, however, increasingly adopt reconnaissance techniques to detect known artifacts that may indicate the presence of a monitoring system, and consequently conceal their harmful behaviour to avoid detection and analysis.
Researchers and security companies have thus invested in stealth analysis environments, typically based on hardware virtualization, full emulation, and bare-metal solutions. Unfortunately, building an analysis system that is indistinguishable from the victim machine is impossible in practice, because of the inevitable discrepancies introduced by the monitoring agents. Analysts have tried to patch such imperfections, but malware writers still often find unanticipated ways to defeat automatic analyses. The analyst is then left with the sole option of manually dissecting the malware samples in order to fully understand the anti-analysis technique employed, a process that is typically lengthy and complex.
We thus propose a methodology for identifying, and helping to disarm, unexpected malware evasion strategies. Our approach is unprecedented and rests on the following observation: although malware may adopt different anti-analysis strategies for different monitoring systems, both known and unforeseen, there are still identifiable patterns characterizing the points at which the environmental checks lead to exit sequences. The idea is to first use data-flow analysis to understand how the known checks are carried out; the information gained will then be used to guide dynamic taint propagation and fuzzing, with the goal of discovering novel evasive fingerprinting operations. We aim to boost the productivity of analysts who have to dismantle unforeseen evasions, and to contribute to making sandboxing systems more robust.
[Innovation]
The proposed project could contribute to rapid advancements in the field of malware analysis. In particular, designing emulated and virtualized sandboxes that remain robust against ever-evolving evasion strategies is an open challenge in the security industry. A thorough understanding of the anti-analysis techniques employed by malware is the first indispensable step in the fight against sandbox evasion attacks, since the insights gathered can in turn be used to harden sandboxing solutions against anti-analysis behaviour.
Moreover, identifying and understanding evasive checks is currently a manual process performed by analysts; such a process does not scale, as it is extremely time-consuming and resource-intensive. This clashes with the need for large-scale analysis, which is required to keep up with the staggering number of new malware samples surfacing every day. In light of these considerations, automatic ways of identifying and disarming novel evasive strategies are clearly needed.
A first step towards the automatic identification of malware evasion patterns was taken by [KV15], which proposes a system for extracting evasion signatures from evasive malware. The system uses data mining and data-flow analysis techniques to automate signature extraction, and leverages algorithms borrowed from bioinformatics to detect evasive behaviour in system call sequences. Although [KV15] has the merit of being the first and only work tackling the problem of automatically identifying evasion schemes, it suffers from a number of severe limitations that hinder its effectiveness and its applicability in practice.
Our ideas go one step further and address these problems in a practical manner, so the project could make important contributions to the state of the art in malware analysis research and practice. In [KV15], evasion patterns are extracted from the difference between the system call sequences observed when executing each sample in two environments: one in which the malware instance evades analysis and one in which it reveals its malicious behaviour. This approach is unlikely to be effective if the sample is equipped with anti-analysis techniques targeting each of the execution environments employed, especially if the checks are combined, because in this scenario no difference in the invoked system calls would be observed. Our approach does not suffer from this issue, as we use BluePill to force a malware sample to exhibit, on demand, anti-analysis behaviour against environments of our choice.
On top of that, the strategy of [KV15] requires executing a malware instance in multiple environments (e.g., QEMU, VirtualBox), which makes the whole analysis process cumbersome and heavy-duty. We instead use lighter program analyses and setups: we require only a virtual machine equipped with BluePill, which makes our strategy easier and less costly to adopt for security firms. Moreover, the evasion signatures of [KV15] are based solely on system calls, whereas we characterize each evasion pattern using the instruction sequences involved and build profiles from tainted memory locations and invoked functions. Lastly, [KV15] is vulnerable to time-stalling strategies, while we are immune to them thanks to BluePill's ability to selectively alter the outcome of these types of checks.
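To make the limitation of trace differencing concrete, the following minimal sketch aligns two system call traces and reports where they first diverge. It is an illustration only, not the algorithm of [KV15]: Python's standard difflib stands in for the bioinformatics-style alignment, the helper names first_divergence and candidate_window are hypothetical, and the traces are hand-written examples that use Windows native API names purely for flavour.

```python
from difflib import SequenceMatcher


def first_divergence(trace_a, trace_b):
    """Return (index_a, index_b) of the first non-matching region
    between two system call traces, or None if they are identical."""
    # autojunk=False disables difflib's popularity heuristic, which could
    # otherwise ignore very frequent calls in long traces.
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    for tag, i1, _i2, j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            return i1, j1
    return None


def candidate_window(trace, index, k=2):
    """Return the k calls that precede the divergence point; these are
    the candidates for the environmental check (hypothetical heuristic)."""
    return trace[max(0, index - k):index]


if __name__ == "__main__":
    # Illustrative traces: the sample queries a registry key, then either
    # exits (evading environment) or drops a file (revealing environment).
    evading = ["NtOpenKey", "NtQueryValueKey", "NtClose",
               "NtTerminateProcess"]
    revealing = ["NtOpenKey", "NtQueryValueKey", "NtClose",
                 "NtCreateFile", "NtWriteFile", "NtTerminateProcess"]

    point = first_divergence(evading, revealing)
    print("divergence:", point)
    if point is not None:
        print("candidate check:", candidate_window(evading, point[0]))

    # If the sample evades BOTH environments, the traces coincide and the
    # differential approach has nothing to work with.
    print("both evaded:", first_divergence(evading, evading))
```

When the two traces are identical, as happens if the sample evades both environments, first_divergence returns None and no signature can be derived. This is the scenario in which being able to trigger evasive behaviour on demand becomes useful.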
[Scientific Impact]
We hope to disseminate the results achieved by the present research project via publications in top-ranked security venues such as IEEE S&P, ACM CCS, NDSS, and USENIX Security.
In fact, in recent years, the problem of evasive malware has been a very topical one in academia, with flagship venues showing clear interest in works addressing this issue.
Moreover, given the rising popularity of the topic we investigate, we are ready to share our ideas with other research groups from all over the world, hoping to establish new successful collaborations like the one currently in place with King's College London.
Additional References
[KV15] Kirat et al. MalGene: Automatic Extraction of Malware Analysis Evasion Signature. CCS, 2015.
[KVK14] Kirat et al. BareCloud: Bare-metal Analysis-based Evasive Malware Detection. USENIX Security, 2014.