2024 was an active year for the HexHive research group, marked by tireless
efforts to enhance the security of various complex systems. A key trend
throughout the year was the continued evolution of fuzzing research. Notably, we
observed a gradual shift away from general-purpose fuzzing as a primary research
focus, suggesting that this year may represent the peak of activity in this
area.
Over the past decade, fuzzing research has seen explosive growth, with many
researchers focusing on general-purpose techniques. This surge has led to the
discovery of countless bugs, turning fuzzing into a critical tool in software
security. However, the frontier of general-purpose fuzzing has largely been
explored, and the focus is transitioning from research to engineering.
Developers are increasingly integrating fuzzing into their standard workflows,
reflecting its maturation as a practice. Major software companies like Google,
Microsoft, and Meta now require developers to write fuzz drivers as part of the
software development lifecycle --- a testament to the enduring impact of
research-driven innovations over the past ten years.
The landscape of fuzzing tools is also consolidating. Effective new mechanisms
are being integrated into AFL++, which continues to thrive thanks to tireless
community maintenance. Modern alternatives, such as LibAFL (a reimplementation of
AFL++ in Rust), are also gaining traction. Despite these advancements, most
fuzzers in 2024 remain focused on detecting memory-safety errors. Consequently,
they are predominantly used to test code written in low-level languages like C
and C++. This focus limits the adoption of fuzzing in companies that favor
higher-level languages, presenting a potential avenue for future research.
This year, our research emphasized several key areas:
- Trusted system components: Strengthening foundational elements of secure systems.
- Browser security: Exploring innovative techniques to enhance the safety of web browsers.
- Oracle creation: Developing novel mechanisms to identify diverse types of software bugs.
- Android ecosystem analysis: Investigating the unique security challenges posed by Android's distinct development and deployment paradigms.
As we move forward, we anticipate new research directions and opportunities to
further refine and expand the impact of our work in system security.
General-purpose fuzzing
As discussed, general-purpose fuzzing has been extensively explored, and we are
approaching a Pareto-optimal balance between generating better inputs, enhancing
feedback mechanisms, and optimizing execution speeds. With Halo, we investigated two nuanced
aspects within this well-charted territory:
- Counter-example generation: By analyzing fuzzing campaigns, we
characterized the input space to bias the fuzzer toward generating more
effective inputs.
- Input space refinement: Negative examples --- inputs that neither trigger new
coverage nor cause crashes --- were used to improve the description of the input
space. This approach allows the fuzzer to "tighten" its input generation and
focus on more promising paths.
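As a rough illustration of the negative-example idea (a toy sketch, not Halo's actual algorithm), the snippet below keeps a weight per input token and lowers the weight of every token that appears in an input which produced neither new coverage nor a crash. The token names, `decay`, and `floor` are assumptions made up for this example.

```python
import random

def refine_weights(weights, negative_inputs, decay=0.5, floor=0.01):
    """Down-weight tokens that appear in negative examples."""
    refined = dict(weights)
    for tokens in negative_inputs:
        for tok in set(tokens):            # count each token once per input
            if tok in refined:
                refined[tok] = max(floor, refined[tok] * decay)
    return refined

def generate(weights, length, rng):
    """Sample an input; tokens with higher weight are chosen more often."""
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=length)

# Hypothetical token alphabet and two inputs that yielded nothing new.
weights = {"GET": 1.0, "PUT": 1.0, "NOP": 1.0}
negatives = [["NOP", "NOP"], ["NOP", "GET"]]
refined = refine_weights(weights, negatives)
sample = generate(refined, 5, random.Random(0))
```

After refinement, `NOP` is sampled least often, so the generator spends more of its budget on tokens that have not yet proven unproductive.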
In Tango (published at RAID'24 and
recipient of the Distinguished Paper Award), we addressed a different challenge
in fuzzing: statefulness. Many software systems require navigating multiple
state transitions before reaching specific functionality. For instance:
- Protocol fuzzers must transition through several states to access deeper features.
- Games often demand intricate state triggers to achieve specific goals, such as "winning."
Using state inference, Tango enabled us to tackle this complexity. As a striking
demonstration, we successfully used it to play challenging "pseudo" 3D games
like Doom.
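To make the statefulness problem concrete, here is a minimal sketch under stated assumptions: `ToyServer` is a hypothetical three-step protocol, and the value returned by `send` stands in for whatever observable fingerprint (response, coverage) a real state-inference technique would use. It is not Tango's implementation; it only shows how a fuzzer can remember the shortest message prefix that reaches each observed state.

```python
from collections import deque

class ToyServer:
    """Hypothetical stateful target: HELLO, then AUTH, then DATA."""
    def __init__(self):
        self.state = "init"

    def send(self, msg):
        if self.state == "init" and msg == "HELLO":
            self.state = "greeted"
        elif self.state == "greeted" and msg == "AUTH":
            self.state = "authed"
        elif self.state == "authed" and msg == "DATA":
            self.state = "transfer"
        return self.state  # stand-in for an observable state fingerprint

def explore(messages, max_depth=4):
    """Breadth-first search; maps each fingerprint to the shortest prefix reaching it."""
    reached = {"init": ()}
    queue = deque([()])
    while queue:
        prefix = queue.popleft()
        if len(prefix) >= max_depth:
            continue
        for msg in messages:
            server = ToyServer()               # fresh instance for every attempt
            fingerprint = "init"
            for m in prefix + (msg,):
                fingerprint = server.send(m)
            if fingerprint not in reached:     # only extend prefixes that found a new state
                reached[fingerprint] = prefix + (msg,)
                queue.append(prefix + (msg,))
    return reached

paths = explore(["DATA", "AUTH", "HELLO"])
```

A stateless fuzzer sending `DATA` alone never leaves the initial state; the recorded prefixes let later mutations start from `authed` or `transfer` directly.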
Securing system software
System software remains a critical target for our research. Despite its inherent
challenges, its high privilege level makes it essential to secure. The primary
challenges lie in its complex interfaces and the fuzzing environment itself.
System software exposes a wide range of interfaces that attackers can exploit,
such as virtual devices, buses, and hypercalls.
The secure monitor is the most privileged software on an ARM device,
operating above the hypervisor to manage interactions between different software
domains. Hypervisors, in turn, interact with diverse devices and buses to
provide I/O and other services. These diverse interfaces are complex, often
stateful, and challenging to model. This complexity is compounded by the lack of
documentation and source code for privileged software, forming the first major
research challenge.
The second challenge is the fuzzing environment itself. Unlike user-space
fuzzing, where the fork() system call simplifies cloning the program under
test, system software is much harder to replicate. Creating a new instance is
resource-intensive, requiring tasks such as booting a kernel, instantiating a
virtual machine, or resetting a phone. This results in high latency and
necessitates better techniques to offset the cost, emphasizing the importance of
high-quality input to maximize efficiency.
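For contrast, the user-space case is short enough to sketch. The snippet below (Unix only) runs each test case in a forked child so that the parent's initialized state is cloned for free; `toy_target` and the convention of treating any exception as a crash are assumptions for this example. System software has no equally cheap equivalent.

```python
import os

def run_in_child(target, data):
    """Run target(data) in a forked child; return True if the child exited cleanly."""
    pid = os.fork()
    if pid == 0:                       # child: starts with a copy of the parent's state
        code = 0
        try:
            target(data)
        except BaseException:          # treat any exception as a "crash"
            code = 1
        os._exit(code)                 # never fall back into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

def toy_target(data):
    """Hypothetical program under test."""
    if data.startswith(b"BUG"):
        raise ValueError("simulated crash")

clean = run_in_child(toy_target, b"hello")
crashed = not run_in_child(toy_target, b"BUG!")
```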
A core aspect of fuzzing is optimization given finite resources. Given limited
compute cycles, the goal is to discover as many bugs as possible. Because creating
high-quality input is costly, it is often beneficial to mix in some cheap, low-quality
inputs to explore diverse feedback. The more complex environments of system
software demand higher-quality and more stateful inputs to balance the cost of
each fuzzing iteration.
EL3XIR: Fuzzing the Secure Monitor
In EL3XIR (SEC'24), we customized a
fuzzer to target the secure monitor in the ARMv8-A ecosystem. This component
orchestrates transitions between the normal and secure worlds via secure monitor
calls (smc). Key challenges included limited introspection, rehosting
difficulties, and a complex input space. Our contributions addressed these
challenges by:
- Partially rehosting the secure monitor firmware, enabling us to fuzz snapshots of partially booted systems.
- Developing a reflected peripheral model that infers peripheral behavior from observed interactions and replicates them during fuzzing.
- Synthesizing a starting harness by analyzing code in the rich operating system to generate effective initial inputs.
These innovations allowed us to deeply explore the secure monitor firmware,
uncovering 34 significant bugs in this highly privileged component.
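The record-and-replay intuition behind a reflected peripheral model can be sketched in a few lines. This is a deliberately simplified illustration, not the model from the paper: the class name, the per-address value lists, and the choice to return zero for unknown registers are all assumptions.

```python
from collections import defaultdict

class ReflectedPeripheral:
    """Replays register values that were observed on a reference run."""

    def __init__(self):
        self._recorded = defaultdict(list)   # address -> values seen, in order
        self._cursor = defaultdict(int)      # address -> next value to replay

    def observe(self, address, value):
        """Record one read that the firmware performed on the reference run."""
        self._recorded[address].append(value)

    def read(self, address):
        """Serve a read during fuzzing from the recorded values."""
        values = self._recorded.get(address)
        if not values:
            return 0                          # assumption: unknown registers read as zero
        i = min(self._cursor[address], len(values) - 1)   # hold the last value once exhausted
        self._cursor[address] += 1
        return values[i]

model = ReflectedPeripheral()
model.observe(0x1000, 0)    # e.g. a status register that is busy first...
model.observe(0x1000, 1)    # ...and ready afterwards
reads = [model.read(0x1000) for _ in range(3)]
```

Holding the last observed value keeps polling loops in the rehosted firmware from spinning forever once the recording runs out.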
HyperPill: Targeting the Hardware Virtualization Interface
Building on our work with ViDeZZo (Oakland'23), which explored
stateful interactions with peripherals, we developed HyperPill (SEC'24, recipient of a
Distinguished Paper Award). This project shifted focus
to the hardware virtualization interface by snapshotting the VMCS state and
exploring it via emulation. While emulators are typically slow, they allow cheap
instantiation of existing snapshots. Our evaluation covered major x86
hypervisors and uncovered critical bugs in QEMU, Hyper-V, and the macOS
virtualization framework. Looking ahead, we are extending this approach to more
stateful devices with Truman (NDSS'25).
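The snapshot-and-reset loop can be illustrated with a toy model. In the sketch below, a dictionary stands in for a captured hypervisor snapshot and `toy_exit_handler` is a hypothetical handler with an unchecked index; none of this reflects HyperPill's real interfaces. The point is only that every input starts from a fresh copy of the same snapshot.

```python
import copy

def fuzz_from_snapshot(snapshot, handler, inputs):
    """Run each input against a fresh copy of the snapshot; collect failing inputs."""
    crashes = []
    for data in inputs:
        state = copy.deepcopy(snapshot)        # cheap reset compared to rebooting
        try:
            handler(state, data)
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes

def toy_exit_handler(state, data):
    """Hypothetical handler that indexes a register file with a guest-controlled byte."""
    index = data[0]
    state["regs"][index] += 1                  # raises IndexError when index is out of range

snapshot = {"regs": [0] * 8}
crashes = fuzz_from_snapshot(snapshot, toy_exit_handler, [b"\x01", b"\x09", b"\x07"])
```

Because each run mutates only its own copy, one input cannot corrupt the starting state of the next.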
SyzRisk: Prioritizing Fast-Moving Codebases
The Linux kernel, with its ~20 million lines of code and ~400,000 annual
changes, exemplifies the challenge of fast-moving codebases. In SyzRisk (AsiaCCS'24), we analyzed commit
message patterns to identify changes likely to expose vulnerabilities. By
directing fuzzing efforts toward these areas, we demonstrated a more efficient
allocation of fuzzing resources.
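A minimal sketch of pattern-based prioritization follows. The patterns and weights in `RISK_PATTERNS` are invented for illustration (a real system would derive them from historical data), and the proportional budget split is one simple policy among many, not SyzRisk's actual scheme.

```python
import re

# Hypothetical patterns and weights; negative weights mark low-risk changes.
RISK_PATTERNS = {
    r"\block(ing)?\b": 2.0,
    r"\brefcount\b": 3.0,
    r"\bfree\b": 2.5,
    r"\btypo\b": -1.0,
}

def risk_score(message):
    """Sum the weights of all patterns that occur in a commit message."""
    return sum(w for pat, w in RISK_PATTERNS.items()
               if re.search(pat, message, re.IGNORECASE))

def allocate_budget(commits, total_hours):
    """Split fuzzing time across commits in proportion to their (non-negative) risk."""
    scores = {cid: max(risk_score(msg), 0.0) for cid, msg in commits.items()}
    total = sum(scores.values())
    if total == 0:                              # nothing stands out: split evenly
        return {cid: total_hours / len(commits) for cid in commits}
    return {cid: total_hours * s / total for cid, s in scores.items()}

commits = {
    "a": "fix refcount leak on error path",
    "b": "fix typo in comment",
    "c": "rework locking around free",
}
budget = allocate_budget(commits, total_hours=15)
```

Here the comment-only change receives no fuzzing time, while the commit touching both locking and freeing receives the largest share.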
SyzTrust: Exploring Trusted Applications
In SyzTrust (Oakland'24), we
focused on trusted operating systems of embedded systems. Using external debuggers,
we collected precise execution traces to analyze interactions between trusted
applications and the rest of the system. This approach allowed the fuzzer to
target promising inputs more effectively, leading to better bug discovery.
Fuzzing Summary
Our work on fuzzing privileged systems underscores the importance of extracting
meaningful signals. Due to the higher cost of each fuzzing iteration compared to
user-space applications, producing high-quality input and feedback is crucial.
Each of these projects represents a tailored approach to specific challenges,
advancing the state of fuzzing for complex and privileged software.
The browser, a complex target
This year, we advanced browser fuzzing with the development of a new
intermediate representation that enhances our ability to target complex browser
components. In GraphIR (CCS'24), we
introduced this intermediate representation to provide the fuzzer with more
effective mutational capabilities. By maintaining target JavaScript programs in
this intermediate representation, our fuzzer applies mutation operators directly
to it, enabling more sophisticated and efficient exploration.
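To show why mutating an intermediate representation is convenient, consider this toy sketch. The node format (an operator plus indices of earlier nodes) is an assumption made for the example and is far simpler than GraphIR itself. Mutations operate on the node list, and JavaScript source is only produced at the end.

```python
import random

# Each node is (op, operands); operands are literal values for "const",
# otherwise indices of earlier nodes.
def lower_to_js(nodes):
    """Emit one JavaScript statement per node."""
    lines = []
    for i, (op, args) in enumerate(nodes):
        if op == "const":
            expr = repr(args[0])
        else:
            expr = f" {op} ".join(f"v{a}" for a in args)
        lines.append(f"let v{i} = {expr};")
    return "\n".join(lines)

def mutate_operator(nodes, rng, ops=("+", "-", "*")):
    """Swap the operator of one binary node; data-flow edges stay valid by construction."""
    candidates = [i for i, (op, _) in enumerate(nodes) if op in ops]
    if not candidates:
        return list(nodes)
    i = rng.choice(candidates)
    old_op, args = nodes[i]
    new_op = rng.choice([o for o in ops if o != old_op])
    mutated = list(nodes)
    mutated[i] = (new_op, args)
    return mutated

program = [("const", (1,)), ("const", (2,)), ("+", (0, 1))]
mutated = mutate_operator(program, random.Random(0))
source = lower_to_js(program)
```

Since operands refer to earlier nodes by index, the mutated program always lowers to syntactically valid code, which is harder to guarantee when mutating source text directly.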
Looking ahead, we will present DUMPLING at NDSS'25, where we leverage
differential testing on the V8 JavaScript engine. This approach exposes subtle
desynchronization bugs through a novel oracle, further advancing our
capabilities in browser fuzzing.
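The value of comparing intermediate state, rather than only final results, can be seen in a small sketch. Both functions below are hypothetical stand-ins: `reference_sum` plays the role of a trusted execution and `optimized_sum` carries a deliberately planted bug. This is an illustration of differential testing in general, not of DUMPLING's oracle.

```python
def reference_sum(values, trace):
    """Trusted implementation; records the running total after each step."""
    total = 0
    for v in values:
        total += v
        trace.append(total)
    return total

def optimized_sum(values, trace):
    """Hypothetical 'optimized' variant with a planted bug: it wraps at 8 bits."""
    total = 0
    for v in values:
        total = (total + v) & 0xFF
        trace.append(total)
    return total

def first_divergence(values):
    """Return the first step at which the two traces differ, or None if they agree."""
    ref, opt = [], []
    reference_sum(values, ref)
    optimized_sum(values, opt)
    for step, (a, b) in enumerate(zip(ref, opt)):
        if a != b:
            return step
    return None

values = [200, 100, -300]
same_result = reference_sum(values, []) == optimized_sum(values, [])
divergence = first_divergence(values)
```

For this input both functions return 0, so an output-only comparison sees nothing wrong, yet the traces already disagree at step 1.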
In previous work, we explored specific browser components, such as fuzzing the
WebGL interface (SEC'23). WebGL
exposes the OpenGL interface to JavaScript, enabling 3D computations. We
hypothesized that the graphics stack, being highly optimized, was (likely)
under-tested. Targeting this interface posed unique challenges due to its span
across multiple layers and abstractions, including the browser, libraries, the
operating system kernel, and even the GPU. Achieving adequate coverage in such a
complex system was difficult.
Our key innovation was replacing traditional coverage metrics with debug signals
from the browser. The fuzzer generated JavaScript code that interacted with
WebGL and monitored debug messages in the browser console to identify malformed
code and its effects. Using these debug signals, the fuzzer iteratively mutated
the code to provoke more interesting and intricate interactions with the WebGL
stack.
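The feedback loop described above can be approximated as follows. The console messages are invented examples, and collapsing digits to group similar messages is an assumption of this sketch rather than the published design.

```python
import re

def normalize(message):
    """Collapse numbers so messages that differ only in values map to one signal."""
    return re.sub(r"\d+", "N", message)

class DebugFeedback:
    """Treats previously unseen debug-message shapes as 'new coverage'."""

    def __init__(self):
        self.seen = set()

    def is_interesting(self, console_lines):
        """Keep an input if it produced at least one message shape not seen before."""
        signals = {normalize(line) for line in console_lines}
        new = signals - self.seen
        self.seen |= signals
        return bool(new)

feedback = DebugFeedback()
# Hypothetical console output from three generated test cases.
first = feedback.is_interesting(["INVALID_VALUE: width 4096 too large"])
second = feedback.is_interesting(["INVALID_VALUE: width 8192 too large"])
third = feedback.is_interesting(["INVALID_OPERATION: no buffer bound"])
```

The second test case is discarded because it only repeats a known message with a different number, while the third is kept for provoking a new kind of response.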
These advancements highlight our desire to improve browser security through
innovative fuzzing techniques and tailored solutions for complex components.
Introducing new oracles
Memory safety violations are effectively detected using oracles such as
AddressSanitizer, which are well-suited to low-level software and have uncovered
numerous bugs in the past. However, there are many bugs that go beyond memory
safety errors. In Monarch (ATC'24),
we explored alternative oracles designed to detect logic bugs in distributed
filesystems, such as desynchronization and other failure modes.
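A desynchronization oracle can be as simple as checking that all replicas agree once operations have settled. The sketch below assumes each replica's view is available as a mapping from path to content, which is a simplification chosen for this example and not Monarch's actual checker.

```python
def desync_oracle(replicas):
    """Return paths on which replicas disagree; an empty list means the oracle passes."""
    all_paths = set().union(*(r.keys() for r in replicas))
    mismatches = []
    for path in sorted(all_paths):
        views = {r.get(path) for r in replicas}   # a missing file shows up as None
        if len(views) > 1:
            mismatches.append(path)
    return mismatches

# Hypothetical views of three storage nodes after the same sequence of operations.
node_a = {"/f": b"v2", "/g": b"x"}
node_b = {"/f": b"v1", "/g": b"x"}
node_c = {"/f": b"v2"}
mismatches = desync_oracle([node_a, node_b, node_c])
```

No crash or sanitizer report is involved: the bug is visible only because the oracle encodes what a correct distributed state looks like.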
In parallel, we have investigated methods to prove memory accesses safe. In
Uriah (CCS'24), we focused on
analyzing heap accesses and demonstrated that a significant portion can be
statically proven safe. This eliminates the need for instrumentation during
fuzzing campaigns or even at runtime, offering stronger safety guarantees with
reduced overhead. At NDSS'25, we will further explore this theme with QMSan, which uses binary rewriting to
detect uninitialized reads as part of fuzzing campaigns.
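The idea of skipping instrumentation for accesses that are provably safe can be illustrated with a toy bounds check. The `Access` record and its fields are assumptions for this sketch; it covers only spatial bounds with known offset ranges and is not Uriah's analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    object_size: int   # bytes allocated for the object
    min_offset: int    # smallest offset a static analysis could derive
    max_offset: int    # largest offset a static analysis could derive
    width: int         # bytes read or written

def provably_in_bounds(a):
    """True if every possible offset keeps the access inside the object."""
    return 0 <= a.min_offset and a.max_offset + a.width <= a.object_size

def plan_instrumentation(accesses):
    """Return the names of accesses that still need a runtime check."""
    return [name for name, a in accesses.items() if not provably_in_bounds(a)]

accesses = {
    "a": Access(object_size=64, min_offset=0, max_offset=60, width=4),   # safe
    "b": Access(object_size=64, min_offset=0, max_offset=64, width=4),   # may overflow
    "c": Access(object_size=16, min_offset=-4, max_offset=8, width=4),   # may underflow
}
needs_check = plan_instrumentation(accesses)
```

Only the accesses in `needs_check` would pay for a sanitizer check at runtime.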
In the domain of theoretical language foundations, we built upon our prior work
on Enclosure (ASPLOS'21) to
develop Gradient (OOPSLA'24), a
language-based compartmentalization mechanism. Gradient allows developers to
define fine-grained enclosures to control data accessibility. This enables
natural expression of security policies when loading potentially untrusted or
buggy library code, empowering developers to enforce robust compartmentalization
directly within the application.
These projects represent our continued commitment to advancing memory safety,
bug detection, and secure software development practices through innovative
tools and foundational research.
Android security
This year, we placed a strong emphasis on Android security, building on our
earlier work such as EL3XIR, where
we fuzzed the secure monitor. Beyond this, we delved deeper into the security of
trusted applications, uncovering critical vulnerabilities in the API used to
access them.
In Spill the TeA, we conducted an
empirical study examining how trusted applications are patched and whether they
are protected against rollback attacks. While these applications are signed,
they often lack robust rollback protection. Specifically, if rollback counters
are not incremented, older (and potentially vulnerable) versions of applications
can still be loaded and exploited. Alarmingly, we found that rollback counters
are rarely utilized, leaving Android users exposed to attacks from outdated
applications across different devices and models.
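The rollback problem reduces to a small piece of logic. In this sketch, `may_load` and `install` are hypothetical helpers and versions are plain integers; the example only shows why a counter that is never incremented offers no protection.

```python
def may_load(image_version, stored_counter):
    """Accept an image only if it is at least as new as the stored rollback counter."""
    return image_version >= stored_counter

def install(image_version, stored_counter, bump_counter):
    """Return the rollback counter after installing an image."""
    if not may_load(image_version, stored_counter):
        raise PermissionError("rollback rejected")
    return max(stored_counter, image_version) if bump_counter else stored_counter

# Version 1 is vulnerable, version 2 is the patched release.
counter_without_bump = install(2, stored_counter=1, bump_counter=False)
counter_with_bump = install(2, stored_counter=1, bump_counter=True)

old_image_loads_without_bump = may_load(1, counter_without_bump)
old_image_loads_with_bump = may_load(1, counter_with_bump)
```

Without the bump, the signed but vulnerable version 1 still passes the check after the patch has shipped.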
Our study also highlighted a fundamental flaw in the GlobalPlatform API, which
governs the interaction between Android applications and trusted applications.
This API requires developers to verify whether each argument in a call is a
scalar or a pointer. Unfortunately, this crucial step is frequently overlooked,
leading to arbitrary write vulnerabilities in trusted applications. This issue
was widespread, and our findings, detailed in GlobalConfusion, prompted vendors to update the
GlobalPlatform API standard to address this recurring vulnerability.
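The missing check can be modeled in a few lines. The sketch below mirrors the way GlobalPlatform packs one 4-bit type code per parameter into a single word, but it is a Python model with hypothetical function names, not the C API. `handle_command` expects a memory reference in slot 0 and shows what happens when the type word is not validated.

```python
VALUE_INPUT, MEMREF_INPUT = 0x1, 0x5   # 4-bit type codes, modeled on GlobalPlatform's

def pack_types(t0, t1=0, t2=0, t3=0):
    """Pack four parameter types into one word, one nibble per parameter."""
    return t0 | (t1 << 4) | (t2 << 8) | (t3 << 12)

def type_of(param_types, index):
    """Extract the type of the parameter at the given slot."""
    return (param_types >> (4 * index)) & 0xF

def handle_command(param_types, params, check_types):
    """Toy trusted-application handler expecting one memory reference in slot 0."""
    expected = pack_types(MEMREF_INPUT)
    if check_types and param_types != expected:
        return "rejected"
    # Without the check, a caller-supplied scalar is interpreted as a buffer address.
    is_buffer = type_of(param_types, 0) == MEMREF_INPUT
    kind = "buffer" if is_buffer else "attacker-chosen scalar"
    return f"dereferencing {kind} {params[0]:#x}"

malicious_types = pack_types(VALUE_INPUT)            # caller lies about the argument kind
checked = handle_command(malicious_types, [0xDEADBEEF], check_types=True)
unchecked = handle_command(malicious_types, [0xDEADBEEF], check_types=False)
```

The single comparison against the expected type word is the entire defense, which is why omitting it is both easy and severe.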
Additionally, we investigated the security of Android's hardened Scudo
memory allocator, which is designed to make heap allocations less predictable
using probabilistic keys. However, we discovered that Android's "Zygote" fork
model compromises several of Scudo's mitigations, making exploitation easier. In
our Scudo paper, presented at
WOOT'24 and awarded Best Paper, we explored various attack patterns that exploit
these weakened protections.
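One way to see why a fork-based process model weakens per-process randomness is the following Unix-only sketch. The `secret` is a stand-in for any value an allocator randomizes once at startup; the function and its pipe-based reporting are assumptions for this example and do not model Scudo's internals.

```python
import os
import secrets

def spawn_children(n):
    """Fork n children from one parent and collect the secret each child sees."""
    secret = secrets.token_hex(8)          # stand-in for per-process allocator randomness
    seen = []
    for _ in range(n):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:                       # child inherits the parent's memory, secret included
            os.close(read_fd)
            os.write(write_fd, secret.encode())
            os._exit(0)
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            seen.append(pipe.read())       # reads until the child closes its end
        os.waitpid(pid, 0)
    return secret, seen

secret, seen = spawn_children(3)
```

Every child reports the same value, so a secret learned in one forked process is valid in all of its siblings.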
In summary, Android's complexity and the interactions between its many
components create a unique security landscape, exposing vulnerabilities in
trusted application access, inter-application communication, and library usage.
Over the coming years, we plan to continue exploring the Android ecosystem, as
it presents numerous opportunities for impactful security research.
Conclusion and Outlook
This year, we made significant strides in security research across fuzzing,
system software, browser security, and Android security. Our work in fuzzing,
including Halo and Tango, tackled stateful systems and advanced input
generation. On privileged systems, tools like EL3XIR and HyperPill revealed
critical bugs in secure monitors and hypervisors.
In browser security, GraphIR enhanced fuzzing capabilities. Beyond memory safety,
alternative oracles such as Monarch uncovered logic bugs in distributed
filesystems. In Android security, we identified rollback vulnerabilities and
critical API flaws, while demonstrating exploitation risks in memory allocators.
Looking ahead, we aim to deepen our focus on stateful fuzzing, evolving security
frameworks, and new attack surfaces.
While fuzzing research is gradually slowing down, we continue to explore niche
areas and develop specialized oracles to address underexplored challenges. In
the medium term, fuzzing is expected to transition from a research focus to an
engineering discipline, with research efforts shifting to other domains.
Promising areas of exploration include alternative oracles that go beyond memory
safety, advancements in compartmentalization, and the complex and dynamic field
of Android security.
We are excited about the opportunities that lie ahead and look forward to
another productive year of research. None of this would be possible without the
dedication and creativity of our exceptional team of postdocs, PhD students,
researchers, and collaborators whose passion and vision drive our efforts to
push the boundaries of security.