Automatic test generation is a crucial problem in computer science. It helps the developers in finding bugs (and vulnerabilities) early in the development, hopefully before release. The two main techniques used in this context are fuzzing and symbolic execution. Over the last two decades, both fuzzing and symbolic execution techniques have been extensively used both in academia and industry.
Fuzzing, in its simplest form, randomly mutates a set of seed inputs to generate new test cases, often using genetic algorithms. While it is very efficient and has been proven to be effective in generating test cases that cover a high percentage of the tested program, the technique struggles in bypassing checks against magic numbers and checksums.
Symbolic execution encodes the behavior of the program to first-order logic formulas and uses a satisfiability modulo theory (SMT) solver to reason about those formulas and generate new inputs (i.e., test cases) automatically. The main disadvantage of this technique lies in the difficulty that it has to scale on complex programs.
One crucial performance bottleneck in symbolic execution is the solver. The formulas built on paths of real-world applications may be arbitrary complex (e.g., due to a checksum), and the solver often struggles in trying to solve them.
The strength of fuzzers lies in their ability to produce test cases by mutating only a few bytes of the input: a small modification of the input is often sufficient to explore new code. This insight could be exploited in the context of symbolic execution: a small modification of the input may be sufficient to solve a query, without using an SMT solver.
The goal of this project is to devise a technique that can reason about SMT formulas, without using a solver. The idea is to trade accuracy (i.e., the technique cannot solve all queries) in exchange for better scalability.
Binary analysis, and in particular symbolic execution, is growing in popularity in the context of software testing, application security, and vulnerability detection. For instance, symbolic execution tools have been used by Microsoft since 2008 in the testing process of its applications. Microsoft announced that symbolic execution was able to reveal nearly 30% of all the bugs discovered during the development of Windows 7 [16]. The upsurge in the number of cyber-attacks and the increasingly complex applications they target call for automatic tools to help developers discover software vulnerabilities in the earliest stages of the development process.
From a theoretical perspective, exhaustive symbolic execution provides a sound and complete methodology for any decidable analysis; thus, it could be used to discover bugs and to reason about the reliability of software patches. In spite of the flexibility of the technique, which stems from maintaining a detailed view of the machine state across multi-path executions and using satisfiability modulo theory (SMT) solvers to reason over symbolic inputs, symbolic execution is fraught with many pitfalls that limit its practical applicability. Most prominently, it can be severely affected by time spent solving complex formulas using the SMT solver.
On the other hand, fuzzing has been proven to be an effective and efficient technique for automatic test case generation and vulnerability discovery: AFL [9], one of the most used fuzzer tool, discovered vulnerabilities in more than 600 programs [9]. Furthermore, Google started, in the last few years, two open-source projects focused on fuzzing: oss-fuzz [13] and fuzzbench [12]. In the first one, they use state-of-the-art fuzzing (and hybrid-fuzzing) tools to look for vulnerabilities in open-source projects considered crucial for the company. Google found thousands [13] of bugs in this way. In fuzzbench, they test the effectiveness of the fuzzing tools by comparing the code coverage reached on a set of benchmarks. It is worth noting that QSYM, the state-of-the-art hybrid-fuzzer, is among the tested tools.
In spite of the effectiveness of fuzzing, it still struggles in covering part of the tested programs that are protected by checks against magic numbers or checksums.
Motivated by the scalability issues of symbolic execution, the main goal of this research project is to build a library that can be used in place of an SMT solver in a symbolic execution engine. The library uses a fuzzing approach to solve formulas: the accuracy of the library (i.e., its capacity of solving SAT queries) is traded for performance so that the symbolic engine can spend much less time trying to solve queries. Indeed, we know from fuzzing that even a little mutation of the test case can bring the program to cover new code when using the mutated test case. We can exploit this fact for solving queries: we speculate that a small modification in the current assignment could make a branch condition SAT.
The development of new methodologies for the automatic generation of exploits is crucial in order to detect non-trivial software vulnerabilities. Indeed, hidden vulnerabilities may remain silent for months or even years, striking then back as harmful zero-day exploits that can severely affect both users and developers.