Control-Flow Integrity

Defenses that leverage runtime monitors detect specific attack vectors (or bug classes) and flag an active exploitation attempt, usually terminating the underlying application (potentially collecting information for post-mortem debugging). Control-Flow Integrity (CFI) is such a runtime monitor that observes the execution of an application, comparing it to a well-known state. At a high level, CFI restricts the control-flow of an application to valid execution traces.

CFI is a defense mechanism that detects control-flow hijacking attacks by limiting the targets of control-flow transfers. In a control-flow hijack attack an attacker redirects the control-flow of the application to locations that would not be reached in a benign execution, e.g., to injected code or to code that is reused in an alternate context.

Since the initial idea for the [1] defense mechanism and the first (closed source) prototype were presented in 2005 a plethora of alternate CFI-style defenses were propose and implemented. While all these alternatives slightly change the underlying enforcement or analysis, they all try to implement the CFI policy. The goal of this blog post is neither to look at differences between individual CFI mechanisms as we did in [2] (see this paper for an exhaustive enumeration of related work in the area of CFI) nor to compare CFI against alternate mechanisms as we did in [3] but tries to explain what CFI is, what it can do, and where its limits are.

Any CFI mechanism consists of two abstract components: the (often static) analysis component that recovers the Control-Flow Graph (CFG) of the application and the dynamic enforcement mechanism that restricts control flows according to the generated CFG.

Control-Flow Transfer Primer

Instructions on an architecture can be grouped into control-flow transfer instructions and computational instructions. Computational instructions are executed in sequence (one after the other), while control-flow transfer instructions change control-flow in a specific way, conditionally or unconditionally redirecting control-flow to a specific location.

Control-flow transfer instructions can be further grouped into direct and indirect control-flow transfer instructions. The target of indirect control-flow transfers depends on the runtime value, e.g., of a register or a memory value compared to direct control-flow transfers where the target is usually encoded as immediate offset in the instruction itself.

Direct control-flow transfers are straight-forward to protect as, on most architectures, executable code is read-only and the target can therefore not be modified by an attacker (even with arbitrary memory corruption as the write protection bit must first be disabled). These direct control-flow transfers are therefore protected through the read-only permission of individual memory pages and the protection is enforced (at no overhead) by the underlying hardware (the memory management unit).

Indirect control-flow transfers are further divided into forward-edge control-flow transfers and backward-edge control-flow transfer. Forward-edge control flow transfers direct code forward to a new location and are used in indirect jump and indirect call instructions which are mapped at the source code level to, e.g., switch statements, indirect calls, or virtual calls. The backward-edge is used to return to a location that was used in a forward-edge earlier, e.g., when returning from a function call through a return instruction. For simplicity, we leave interrupts, interrupt returns, and exceptions out of the discussion.

Generating Control-Flow Graphs

A CFG is a graph that covers all valid executions of the program. Nodes in the graph are locations of control-flow transfers in the program and edges encode reachable targets. The CFG is an abstract concept and the different existing CFI mechanisms use different approaches to generate the underlying CFGs using both static and dynamic analysis, relying either on the binary or source code.

For forward edges, the CFG generation enumerates all possible targets, often leveraging information from the underlying source language. Switch statements in C/C++ are a good example as the different targets are statically known and the compiler can generate a fixed jump table and emit an indirect jump with a bound check to guarantee that the target used at runtime is one of the valid targets in the switch statement.

For indirect function calls through a function pointer, the underlying analysis becomes more complicated as the target may not be a-priori known. Common source-based analyses use a type-based approach and, looking at the function prototype of the function pointer that is used, enumerate all matching functions. Different CFI mechanisms use different forms of type equality, e.g., any valid function, functions with the same arity (number of arguments), or functions with the same types of arguments. At runtime, any of these functions with matching prototype is allowed. Just looking at function prototypes likely yields several collisions where functions are reachable that may never be called in practice. The analysis therefore over-approximates the valid set of targets. In practice, the compiler can check which functions are address taken, i.e., there is a source line that generates the address of the function and stores it.

For virtual calls, i.e., indirect calls in C++ that depend on the type of the object and the class relationship, the analysis can further leverage the type of the object to further restrict the valid functions (e.g., all object constructors have the same signature but only the subset constructors of related classes are feasible.

So far, the constructed CFGs are stateless, i.e., the context of the execution is not considered and each control-flow transfer is independent of all others. At runtime only one target is allowed for any possible transfer while the described CFG construction will over-approximate the number of valid targets with different granularities, depending on the precision of the analysis. Some mechanisms take path constraints into consideration and check (for a limited depth) if the path taken through the application is feasible by using a dynamic analysis approach that validates the current execution. So far, only few mechanisms look at the path context as this incurs dynamic tracking costs at runtime.

Enforcing CFI

CFI can be enforced at different levels and sometimes even the analysis phase (CFG construction) and enforcement phase overlap, e.g., when considering path constraints. Most mechanisms have two fundamental mechanisms, one for forward-edge transfers and one for backward-edge transfers.

For forward-edge transfers, the code is often instrumented with some form of equivalence check. The check ensures that the target observed at runtime is in the set of valid targets. This can be done through a full set check or a simple type comparison that, e.g., hashes function prototypes and checks if the hash for the current target equals the expected hash at the callsite. The hash for the function can be embedded inline in the function, before the function, or in an orthogonal metadata table.

Backward-edge transfers are harder to protect as, when using the same approach, the attacker may redirect the control-flow to any valid callsite when returning from a function. Strong backward-edge protections therefore leverage the context through the priorly called functions on the stack. A mechanism that enforces stack integrity ensures that any backward-edge transfers can only return to the most recent prior caller. This property can be enforced by storing the prior call sites in a shadow stack or guaranteeing memory safety on the stack, i.e., if the return instructions cannot be modified then stack integrity trivially holds.

Summary

CFI is a strong defense mechanism that restricts the freedom of an attacker. Attackers may still corrupt memory and data-only attacks are still in scope. For the forward-edge a strong mechanism must consider language-specific semantics to restrict the set of valid targets as much as possible. For the forward-edge, most mechanisms are stateless and allow an attacker to redirect control-flow to any valid location as identified by the CFG construction. It is therefore crucial to limit the size of the target sets as much as possible to constrain the attacker on the forward edge. For the backward edge, a context-sensitive approach that enforces stack integrity guarantees full protection.

[1]Control-Flow Integrity. Martin Abadi, Mihai Budiu, Ulfar Erlingsson, Jay Ligatti. In CCS'05
[2]Control-Flow Integrity: Precision, Security, And Performance.
[3]`TODO <>`_

Docutils System Messages

System Message: ERROR/3 (/opt/blog/nebelwelt-blog/content/20160911-ControlFlowIntegrity.rst, line 165); backlink

Unknown target name: "todo <>".

blogroll

social