With the rise of strong control-flow defenses such as Control-Flow Integrity (CFI), attackers will increasingly resort to data-only attacks that can be equally powerful. Earlier research demonstrated that data-only attacks can be as devastating as control-flow hijacking attacks. So far, constructing data-only attacks was cumbersome and required deep manual analysis. We introduce the idea of Block-Oriented Programming (BOP) where, based on a C-like programming language and the help of constraint solving, we automatically synthesize data-only exploits that run arbitrary payloads on host programs. The payloads are expressed as changes to the program state that can be injected into the program based on arbitrary read/write primitives. As input, BOP requires an exploit written in our C dialect, a host program, a location in the program with an arbitrary write primitive (i.e., the starting point of the exploit), and potentially the location of an arbitrary read primitive (i.e., to bypass ASLR). As output, our BOP compiler (BOPC) will produce (i) a set of program state modifications through the arbitrary write primitive to execute the program, (ii) prove that the exploit program is unsatisfiable, i.e., it cannot be synthesized on this program, or (iii) timeout.
We assume the host program is protected through common mitigations such as R^X, shadow stacks (or stack canaries), ASLR, and CFI. CFI is a strong stop the exploit defense that restricts the program's control flow to valid targets. While the control flow is guarded, the attacker is still free to make arbitrary changes to the program state. These program state changes allow the attacker to bend the control flow to unintended functionality. While CFI restricts execution to the program functionality, it does not restrict how this functionality is used. This gap allows the attacker to expose any functionality by bending control flow to that functionality and controlling its parameters. For example, an attacker may overwrite the is_admin variable to raise their privilege level. By corrupting program state, the attacker controls the targets of conditional branches, enabling them to reach unintended functionality. By corrupting function pointers with other valid targets (e.g., replacing a benign function pointer with another function pointer in the same equivalence set, for more details, check out the earlier CFI blog post) the attacker can stitch together different parts of the program.
SPL payload
Finding gadgets is already hard for regular ROP payloads but BOP payloads are orders of magnitude harder to find. We therefore introduce a programming language that allows an analyst to express BOP payloads in a high level language. On one hand, this increases the flexibility of the search as the language implements degrees of freedom that can be mapped to different constructs. On the other hand, the language simplifies the task of the analyst. We base our language on a C dialect that allows explicit use of virtual registers (every virtual register will be mapped to a machine register but the compiler may spill them or switch registers at arbitrary points during the payload execution).
An simple example of an execve payload is as follows:
void payload() { string prog = "/bin/sh\0"; int64 * argv = { &prog, 0x0 }; __r0 = &prog; __r1 = &argv; __r2 = 0; execve (__r0, __r1, __r2); }
Given the SPL payload, BOPC translates each SPL statement into a set of constraints. These constraints are then mapped to code blocks in the host program.
Finding BOP gadgets
BOP adapts the concept of ROP (or JOP) in an environment where changes to the control flow are highly restricted. Classic code reuse techniques such as ROP and JOP assume that control flow can be arbitrarily redirected to attacker-controlled locations. For BOP, we assume that indirect control flow transfers (both function returns and indirect jumps/calls) are highly restricted to benign targets. To execute arbitrary computation on top of a host program, the attacker now needs to (a) find parts of the program that implement the desired behavior and (b) stitch these program locations together based on a valid path in the program. In the first task, the attacker finds gadgets in the program. In the second task, the attacker connects these gadgets through a path, carefully controlling all side effects along that path through program state adjustments. A final BOP gadget then consists of a functional part that executes the desired behavior and a dispatcher part that bends execution to the next gadget. The exploit must adjust all program state to correctly execute both the functional part and the dispatcher part.
Functional blocks
Given the constraints from an SPL program, our compiler searches for so called functional blocks, i.e., code blocks that implement or satisfy a certain SPL statement. In this step, all code blocks are marked. Each SPL statement is assigned a set of code blocks that could be used to implement the underlying functionality.
At this step, if any SPL statement has no corresponding functional block we will return unsat and inform the analyst which statement is unsatisfiable.
Dispatcher blocks
Given the candidate functional blocks, we try to synthesize paths between functional blocks. The process starts with two functional blocks where we try to find a path that has satisfiable constraints. This path is then extended to three, four, five statements, gradually extending the path until we have constructed the complete program. The underlying problem is NP hard and similar to symbolic execution, the success rate depends on the underlying heuristics and strategy of which functional blocks are chosen and where we start with the path construction. If successful, this step returns a path that executes the SPL payload on top of the host program.
The image above shows the path synthesis. Starting from the blue entry point, any selection of a functional gadget candidate restricts the feasible paths for the second gadget. While many candidates exist, synthesizing paths is hard due to the requirement of a satisfiable path from the gadget to the next.
Implementation and evaluation
Please refer to the BOPC paper or the BOPC presentation (in the GitHub repository) for more detail. To play with the code, check out the readme in the GitHub page. Also, for more information on related work, please refer to the related work section in the paper and to Data-oriented programming and Automatic Generation of Data-Oriented Exploits, two earlier works on the same topic that inspired BOPC.
In the paper, we evaluate BOPC using a set of 10 real programs with real exploits. We assume that they are protected with a strong forward edge CFI mechanism and a shadow stack on the backward edge. Using 13 SPL exploit payloads, we evaluate if BOPC can synthesize the payloads based on existing CVEs as entry points. The full evaluation is again in the paper. We focus on nginx as a case study and successfully synthesize execve, loops, and if-else conditions.
If-else condition on nginx that allows an SPL program to update state depending on a simple conditional.
Complex arbitrary loop that allows the SPL program to, e.g., continuously read or write data, a stepping stone for a generic information leak if, e.g., execve is not available.
As always, we release the full BOPC code on GitHub and we invite you to play with the tools. A longer readme on the GitHub page explains how you can run our SPL payloads, describes the language, and allows you to play with different programs.
Summary
BOPC enables fully-automatic (or mostly automatic) synthesis of data-flow payloads that execute arbitrary computation on top of existing programs. Exploits are encoded using a simplified C dialect and the path synthesis leverages several heuristics to tame the NP-hard problem. Go ahead and have fun with the code!