With the rise of strong control-flow defenses such as Control-Flow Integrity
(CFI), attackers will increasingly resort to data-only attacks that can be
equally powerful. Earlier research demonstrated that data-only attacks can be as
devastating as control-flow hijacking attacks. So far, constructing data-only
attacks was cumbersome and required deep manual analysis. We introduce the idea
of Block-Oriented Programming (BOP) where, based on a C-like programming
language and the help of constraint solving, we automatically synthesize
data-only exploits that run arbitrary payloads on host programs. The payloads
are expressed as changes to the program state that can be injected into the
program based on arbitrary read/write primitives. As input, BOP requires an
exploit written in our C dialect, a host program, a location in the program
with an arbitrary write primitive (i.e., the starting point of the exploit),
and potentially the location of an arbitrary read primitive (i.e., to bypass
ASLR). As output, our BOP compiler (BOPC) will produce (i) a set of program
state modifications through the arbitrary write primitive to execute the
program, (ii) prove that the exploit program is unsatisfiable, i.e., it cannot
be synthesized on this program, or (iii) timeout.
We assume the host program is protected through common mitigations such as R^X,
shadow stacks (or stack canaries), ASLR, and CFI. CFI is a strong stop the
exploit defense that restricts the program's control flow to valid targets.
While the control flow is guarded, the attacker is still free to make arbitrary
changes to the program state. These program state changes allow the attacker to
bend the control flow to unintended functionality. While CFI restricts
execution to the program functionality, it does not restrict how this
functionality is used. This gap allows the attacker to expose any functionality
by bending control flow to that functionality and controlling its parameters.
For example, an attacker may overwrite the is_admin variable to raise their
privilege level. By corrupting program state, the attacker controls the targets
of conditional branches, enabling them to reach unintended functionality. By
corrupting function pointers with other valid targets (e.g., replacing a benign
function pointer with another function pointer in the same equivalence set, for
more details, check out the earlier CFI blog post) the attacker can stitch
together different parts of the program.
SPL payload
Finding gadgets is already hard for regular ROP payloads but BOP payloads are
orders of magnitude harder to find. We therefore introduce a programming
language that allows an analyst to express BOP payloads in a high level
language. On one hand, this increases the flexibility of the search as the
language implements degrees of freedom that can be mapped to different
constructs. On the other hand, the language simplifies the task of the analyst.
We base our language on a C dialect that allows explicit use of virtual
registers (every virtual register will be mapped to a machine register but the
compiler may spill them or switch registers at arbitrary points during the
payload execution).
An simple example of an execve payload is as follows:
void payload() {
string prog = "/bin/sh\0";
int64 * argv = { &prog, 0x0 };
__r0 = &prog;
__r1 = &argv;
__r2 = 0;
execve (__r0, __r1, __r2);
}
Given the SPL payload, BOPC translates each SPL statement into a set of
constraints. These constraints are then mapped to code blocks in the host
program.
Finding BOP gadgets
BOP adapts the concept of ROP (or JOP) in an environment where changes to the
control flow are highly restricted. Classic code reuse techniques such as ROP
and JOP assume that control flow can be arbitrarily redirected to
attacker-controlled locations. For BOP, we assume that indirect control flow
transfers (both function returns and indirect jumps/calls) are highly restricted
to benign targets. To execute arbitrary computation on top of a host program,
the attacker now needs to (a) find parts of the program that implement the desired
behavior and (b) stitch these program locations together based on a valid path
in the program. In the first task, the attacker finds gadgets in the program.
In the second task, the attacker connects these gadgets through a path,
carefully controlling all side effects along that path through program state
adjustments. A final BOP gadget then consists of a functional part that
executes the desired behavior and a dispatcher part that bends execution to
the next gadget. The exploit must adjust all program state to correctly execute
both the functional part and the dispatcher part.
Functional blocks
Given the constraints from an SPL program, our compiler searches for so called
functional blocks, i.e., code blocks that implement or satisfy a certain SPL
statement. In this step, all code blocks are marked. Each SPL statement is
assigned a set of code blocks that could be used to implement the underlying
functionality.
At this step, if any SPL statement has no corresponding functional block we
will return unsat and inform the analyst which statement is unsatisfiable.
Dispatcher blocks
Given the candidate functional blocks, we try to synthesize paths between
functional blocks. The process starts with two functional blocks where we try to
find a path that has satisfiable constraints. This path is then extended to
three, four, five statements, gradually extending the path until we have
constructed the complete program. The underlying problem is NP hard and similar
to symbolic execution, the success rate depends on the underlying heuristics and
strategy of which functional blocks are chosen and where we start with the path
construction. If successful, this step returns a path that executes the SPL
payload on top of the host program.
The image above shows the path synthesis. Starting from the blue entry point,
any selection of a functional gadget candidate restricts the feasible paths for
the second gadget. While many candidates exist, synthesizing paths is hard due
to the requirement of a satisfiable path from the gadget to the next.
Implementation and evaluation
Please refer to the BOPC paper or the BOPC presentation
(in the GitHub repository) for more detail. To play with the code, check out the
readme in the GitHub page. Also, for more
information on related work, please refer to the related work section in the
paper and to Data-oriented programming and Automatic Generation of
Data-Oriented Exploits,
two earlier works on the same topic that inspired BOPC.
In the paper, we evaluate BOPC using a set of 10 real programs with real
exploits. We assume that they are protected with a strong forward edge CFI
mechanism and a shadow stack on the backward edge. Using 13 SPL exploit
payloads, we evaluate if BOPC can synthesize the payloads based on existing CVEs
as entry points. The full evaluation is again in the paper. We focus on nginx as
a case study and successfully synthesize execve, loops, and if-else
conditions.
If-else condition on nginx that allows an SPL program to update state depending
on a simple conditional.
Complex arbitrary loop that allows the SPL program to, e.g., continuously read
or write data, a stepping stone for a generic information leak if, e.g., execve
is not available.
As always, we release the full BOPC code on GitHub and we invite you to play with the tools. A
longer readme on the GitHub page explains how you can run our SPL payloads,
describes the language, and allows you to play with different programs.
Summary
BOPC enables fully-automatic (or mostly automatic) synthesis of data-flow
payloads that execute arbitrary computation on top of existing programs.
Exploits are encoded using a simplified C dialect and the path synthesis
leverages several heuristics to tame the NP-hard problem. Go ahead and have fun
with the code!