Rethinking Pointer Reasoning in Symbolic Execution

Inviato da Anonimo (non verificato) il Lun, 18/04/2022 - 18:55

Anno:

2017

Nome e qualifica del proponente del progetto:

sb_p_670667

Abstract:

Symbolic execution is a popular program analysis technique that allows seeking for bugs by reasoning over multiple alternative execution states at once. In the last decade, techniques for symbolic execution have evolved significantly, leading to major practical breakthroughs. In 2008, Microsoft has announced that symbolic execution has been able to reveal nearly 30% of all the bugs discovered during the development of Windows 7. In 2016, the DARPA Cyber Grand Challenge (CGC) hosted systems that can detect and fix vulnerabilities in unknown software with no human intervention. The event demonstrated that symbolic executors can be competitive with human experts, paving the road to unprecedented applications that have the potential to shape software reliability in the next decades.

In practice, as the number of states to explore may grow exponentially, a symbolic executor may quickly run out of space. For instance, a memory access to a symbolic address may potentially reference the entire address space, leading to a combinatorial explosion of the possible execution states. To cope with this issue, state-of-the-art executors concretize symbolic addresses, i.e., a single value for the address is chosen using a heuristic. Unfortunately, this could result in missing interesting states, e.g., where a bug arises.

The main goal of this project is to devise a new approach to symbolic memory that reduces the need for concretization, hence offering the opportunity for broader state explorations and more precise pointer reasoning. Rather than mapping address instances to data as previous tools do, our technique will map symbolic address expressions with data, maintaining the possible alternative states resulting from the memory referenced by a symbolic address in a compact, implicit form. To test our approach, we plan to implement it in ANGR, a state-of-the-art symbolic engine, and to perform an experimental investigation using benchmarks taken from the DARPA CGC.

Componenti gruppo di ricerca:

sb_cp_is_971984

Innovatività:

Techniques for symbolic execution have evolved significantly in the last decade, leading to major practical breakthroughs. In 2016, the DARPA Cyber Grand Challenge [D14] hosted systems that can detect and fix vulnerabilities in unknown software with no human intervention, such as ANGR [S16] and MAYHEM [C12], which won the $2M first prize. MAYHEM was also the first autonomous software to play the Capture-The-Flag contest at the DEFCON 24 hacker convention. The event demonstrated that tools for automatic exploit detection based on symbolic execution can be competitive with human experts, paving the road to unprecedented applications that have the potential to shape software reliability in the next decades.

Nonetheless, there is still a lot of work to be done before symbolic execution can be used in many industrial environments. Indeed, from a theoretical perspective, exhaustive symbolic execution provides a sound and complete methodology for any decidable analysis. Soundness prevents false negatives, e.g., all possible unsafe inputs are guaranteed to be found, while completeness prevents false positives, e.g., input values deemed as unsafe are actually unsafe. Unfortunately, exhaustive symbolic execution is unlikely to scale beyond small applications. Hence, in practice we often settle for less ambitious goals, e.g., by trading soundness for performance. However, tuning symbolic execution in order to be effective is not a trivial task. Many heuristics are used to reduce the path explosion problem. Often, these heuristics are domain-specific and thus not suitable in a general context. An aspect that typically affects the soundness is how memory accesses are handled by the symbolic executor. In the past, state-of-the-art symbolic engines have chosen to keep the design and the implementation of the symbolic store very simple. On the other hand, previous works (e.g., [T14]) that have instead tried to propose new ideas for handling symbolic accesses, have failed to keep their implementation easy to integrate into state-of-the-art symbolic executors and often not accounted for mainstream optimization techniques such as state merging [A14].

In this project we will devise a different perspective in the design of symbolic pointer reasoning: we will show how to compactly associate values with symbolic address expressions rather than concrete addresses, and investigate efficient implementations of a fully symbolic memory. Our approach will natively accounts for state merging [A14] -- a mainstream performance enabler in modern executors -- and will be integrated in the ANGR [S16] framework. We expect that our approach will allow ANGR for broader state exploration on prominent benchmarks, such as the DARPA Cyber Grand Challenge binaries [D14], revealing behaviors that previous techniques would miss or use too much resources to identify. To achieve this goal, we will build on top of three main ideas.
First, our symbolic store will keep a mapping between symbolic addresses and (concrete or symbolic) values. As a consequence, whenever there is a write operation, only one element in the symbolic store will need to be added or updated. On the other hand, whenever there is a read operation, our symbolic store will compute a "conflict formula" which will describe the value associated to a (possibly symbolic) address using an if-then-else expression.
Second, to make efficient the construction of the conflict formula, each address contained in the symbolic store will be bundled with two kinds of information: the lower value and upper value that the address can assume when it is added to the symbolic store. These can be obtained by querying a constraint solver during the symbolic exploration.
Third, each symbolic address contained inside the symbolic store will be indexed based on its upper and lower value using a data structure that supports range queries. A natural solution would be to implement an interval tree that supports fast and memory efficient cloning operations (which need to be performed whenever a forking operation is done on the current state due a branch in the program).

We expect that the impact of our approach could be crucial in many application scenarios. Indeed, there are many cases reported in literature [E08] where concretization has been shown to be a strong limitation for the effectiveness of symbolic execution. Two benefits of our technique are: (1) since our solution avoids eager concretization, it should be able to accurately handle many application scenarios where only few values of an address are actually interesting for the the program; (2) at any time during the symbolic exploration (e.g., in case of excessive resource consumption), our symbolic store can be "reduced" to a traditional symbolic store, allowing symbolic executors to trade soundness for performance. We believe that these two benefits should make it appealing to many state-of-the-art executors.

Codice Bando:

670667

Keywords:

name