From Fuzzing to Frameworks: 2024 Research Highlights

2024 was an active year for the HexHive research group, marked by tireless efforts to enhance the security of various complex systems. A key trend throughout the year was the continued evolution of fuzzing research. Notably, we observed a gradual shift away from general-purpose fuzzing as a primary research focus, suggesting that this year may represent the peak of activity in this area.

Over the past decade, fuzzing research has seen explosive growth, with many researchers focusing on general-purpose techniques. This surge has led to the discovery of countless bugs, turning fuzzing into a critical tool in software security. However, the frontier of general-purpose fuzzing has largely been explored, and the focus is transitioning from research to engineering. Developers are increasingly integrating fuzzing into their standard workflows, reflecting its maturation as a practice. Major software companies like Google, Microsoft, and Meta now require developers to write fuzz drivers as part of the software development lifecycle --- a testament to the enduring impact of research-driven innovations over the past ten years.

The landscape of fuzzing tools is also consolidating. Effective new mechanisms are being integrated into AFL++, which continues to thrive thanks to tireless community maintenance. Modern alternatives, such as LibAFL (a reimplementation of AFL++ in Rust), are also gaining traction. Despite these advancements, most fuzzers in 2024 remain focused on detecting memory-safety errors. Consequently, they are predominantly used to test code written in low-level languages like C and C++. This focus limits the adoption of fuzzing in companies that favor higher-level languages, presenting a potential avenue for future research.

This year, our research emphasized several key areas:

- Trusted system components: Strengthening foundational elements of secure systems.
- Browser security: Exploring innovative techniques to enhance the safety of web browsers.
- Oracle creation: Developing novel mechanisms to identify diverse types of software bugs.
- Android ecosystem analysis: Investigating the unique security challenges posed by Android's distinct development and deployment paradigms.

As we move forward, we anticipate new research directions and opportunities to further refine and expand the impact of our work in system security.

General-purpose fuzzing

As discussed, general-purpose fuzzing has been extensively explored, and we are approaching a Pareto-optimal balance between generating better inputs, enhancing feedback mechanisms, and optimizing execution speeds. With Halo, we investigated two nuanced aspects within this well-charted territory:

Counter-example generation: By analyzing fuzzing campaigns, we characterized the input space to bias the fuzzer toward generating more effective inputs.
Input space refinement: Negative examples --- inputs that neither trigger new coverage nor cause crashes --- were used to improve the description of the input space. This approach allows the fuzzer to "tighten" its input generation and focus on more promising paths.

In Tango (published at RAID'24 and recipient of the Distinguished Paper Award), we addressed a different challenge in fuzzing: statefulness. Many software systems require navigating multiple state transitions before reaching specific functionality. For instance:

Protocol fuzzers must transition through several states to access deeper features.
Games often demand intricate state triggers to achieve specific goals, such as "winning."

Using state inference, Tango enabled us to tackle this complexity. As a striking demonstration, we successfully used it to play challenging "pseudo" 3D games like Doom.
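To make the idea of state feedback concrete, here is a minimal sketch of state-aware exploration. It is not Tango's implementation: the toy protocol, the message names, and the breadth-first strategy are all assumptions chosen for illustration. The point is only that a newly observed state, rather than new code coverage, decides which input sequences are kept and extended.

```python
from collections import deque

# Hypothetical toy target: a protocol that must be driven through three states.
TRANSITIONS = {
    ("INIT", "HELLO"): "GREETED",
    ("GREETED", "AUTH"): "AUTHED",
    ("AUTHED", "DATA"): "TRANSFER",
}
MESSAGES = ["HELLO", "AUTH", "DATA", "QUIT"]


def run_target(sequence):
    """Replay a message sequence on a fresh target and return the final state."""
    state = "INIT"
    for msg in sequence:
        # Messages that are invalid in the current state are ignored.
        state = TRANSITIONS.get((state, msg), state)
    return state


def explore(max_execs=100):
    """Extend the known path to each discovered state by one message at a time."""
    reached = {"INIT": []}  # state -> shortest known message sequence
    queue = deque(["INIT"])
    execs = 0
    while queue and execs < max_execs:
        state = queue.popleft()
        for msg in MESSAGES:
            candidate = reached[state] + [msg]
            execs += 1
            new_state = run_target(candidate)
            if new_state not in reached:  # state feedback, not code coverage
                reached[new_state] = candidate
                queue.append(new_state)
    return reached


if __name__ == "__main__":
    for state, path in explore().items():
        print(state, path)
```

A real target does not expose its state directly, which is where state inference comes in; the sketch assumes the state is simply observable.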

Securing system software

System software remains a critical target for our research. Despite its inherent challenges, its high privilege level makes it essential to secure. The primary challenges lie in its complex interfaces and the fuzzing environment itself. System software exposes a wide range of interfaces that attackers can exploit, such as virtual devices, buses, and hypercalls.

The secure monitor is the most privileged software on an ARM device, operating above the hypervisor to manage interactions between different software domains. Hypervisors, in turn, interact with diverse devices and buses to provide I/O and other services. These diverse interfaces are complex, often stateful, and challenging to model. This complexity is compounded by the lack of documentation and source code for privileged software, forming the first major research challenge.

The second challenge is the fuzzing environment itself. Unlike user-space fuzzing, where the fork() system call simplifies cloning the program under test, system software is much harder to replicate. Creating a new instance is resource-intensive, requiring tasks such as booting a kernel, instantiating a virtual machine, or resetting a phone. This results in high latency and necessitates better techniques to offset the cost, emphasizing the importance of high-quality input to maximize efficiency.

A core aspect of fuzzing is optimization given finite resources. Given limited compute cycles, the goal is to discover as many bugs as possible. While creating high-quality input is costly, it is often beneficial to include some low-quality inputs to explore diverse feedback. The more complex environments of system software demand higher-quality and more stateful inputs to balance the cost of each fuzzing iteration.
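A back-of-the-envelope model illustrates why reset cost dominates for system software. The numbers below are hypothetical and serve only to show the arithmetic: with a fixed time budget, the number of iterations is the budget divided by the sum of reset cost and execution cost.

```python
def executions_within(budget_us, reset_us, exec_us):
    """Number of fuzzing iterations that fit into a time budget (microseconds)."""
    return budget_us // (reset_us + exec_us)


ONE_HOUR_US = 3_600_000_000

# Hypothetical costs: a fork()-style reset versus restoring a VM snapshot.
fork_style = executions_within(ONE_HOUR_US, reset_us=500, exec_us=1_000)
snapshot_style = executions_within(ONE_HOUR_US, reset_us=500_000, exec_us=1_000)

print(fork_style, snapshot_style)  # 2400000 7185
```

Under these assumed costs, the expensive reset leaves over 300 times fewer iterations per hour, so each input has to contribute correspondingly more.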

EL3XIR: Fuzzing the Secure Monitor

In EL3XIR (SEC'24), we customized a fuzzer to target the secure monitor in the ARMv8-A ecosystem. This component orchestrates transitions between the normal and secure worlds via secure monitor calls (SMCs). Key challenges included limited introspection, rehosting difficulties, and a complex input space. Our contributions addressed these challenges by:

Partially rehosting the secure monitor firmware, enabling us to fuzz snapshots of partially booted systems.
Developing a reflected peripheral model that infers peripheral behavior from observed interactions and replicates them during fuzzing.
Synthesizing a starting harness by analyzing code in the rich operating system to generate effective initial inputs.
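The second contribution can be pictured as a record-and-replay model. The sketch below is a simplification under our own assumptions, not EL3XIR's actual design: the class name, the per-address trace, and the "repeat the last value" policy are hypothetical.

```python
from collections import defaultdict


class ReflectedPeripheral:
    """Toy peripheral model: replay register values seen during a recorded run."""

    def __init__(self):
        self._trace = defaultdict(list)  # register address -> observed values
        self._cursor = defaultdict(int)  # register address -> next replay index

    def observe(self, addr, value):
        """Record a value that the firmware read from a register."""
        self._trace[addr].append(value)

    def read(self, addr, default=0):
        """Replay recorded values in order; repeat the last one when exhausted."""
        values = self._trace.get(addr)
        if not values:
            return default  # register never observed: fall back to a default
        index = min(self._cursor[addr], len(values) - 1)
        self._cursor[addr] += 1
        return values[index]

    def reset(self):
        """Rewind the replay position, e.g. when a snapshot is restored."""
        self._cursor.clear()


if __name__ == "__main__":
    model = ReflectedPeripheral()
    model.observe(0x1000, 0)  # status register: not ready
    model.observe(0x1000, 1)  # status register: ready
    print([model.read(0x1000) for _ in range(3)])
```

Repeating the last observed value is one simple way to keep polling loops from stalling once the recorded trace runs out.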

These innovations allowed us to deeply explore the secure monitor firmware, uncovering 34 significant bugs in this highly privileged component.

HyperPill: Targeting the Hardware Virtualization Interface

Building on our work with ViDeZZo (Oakland'23), which explored stateful interactions with peripherals, we developed HyperPill (SEC'24, recipient of a Distinguished Paper Award). This project shifted focus to the hardware virtualization interface by snapshotting the VMCS (virtual machine control structure) state and exploring it via emulation. While emulators are typically slow, they allow cheap instantiation of existing snapshots. Our evaluation covered major x86 hypervisors and uncovered critical bugs in QEMU, Hyper-V, and the macOS virtualization framework. Looking ahead, we are extending this approach to more stateful devices with Truman (NDSS'25).

SyzRisk: Prioritizing Fast-Moving Codebases

The Linux kernel, with its ~20 million lines of code and ~400,000 annual changes, exemplifies the challenge of fast-moving codebases. In SyzRisk (AsiaCCS'24), we analyzed commit message patterns to identify changes likely to expose vulnerabilities. By directing fuzzing efforts toward these areas, we demonstrated a more efficient allocation of fuzzing resources.
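As a rough illustration of pattern-based prioritization, the sketch below scores commits by matching their messages against weighted patterns and sorts them by descending risk. The patterns, the weights, and the function names are invented for this example and are not the ones used in SyzRisk.

```python
import re

# Hypothetical patterns and weights, invented for illustration only.
RISK_PATTERNS = [
    (re.compile(r"\b(lock|locking|race)\b", re.I), 3),
    (re.compile(r"\b(refcount|use-after-free)\b", re.I), 3),
    (re.compile(r"\brefactor", re.I), 2),
    (re.compile(r"\b(typo|comment|documentation)\b", re.I), -2),
]


def risk_score(message):
    """Sum the weights of all patterns that occur in the commit message."""
    return sum(weight for pattern, weight in RISK_PATTERNS if pattern.search(message))


def prioritize(commits):
    """Take (commit_id, message) pairs and return ids, riskiest first."""
    ranked = sorted(commits, key=lambda c: risk_score(c[1]), reverse=True)
    return [commit_id for commit_id, _ in ranked]


if __name__ == "__main__":
    commits = [
        ("a", "Fix typo in comment"),
        ("b", "Rework locking around refcount drop"),
        ("c", "Refactor probe path"),
    ]
    print(prioritize(commits))
```

A fuzzer scheduler could then spend more of its budget on the code touched by the highest-ranked commits.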

SyzTrust: Exploring Trusted Applications

In SyzTrust (Oakland'24), we focused on trusted operating systems of embedded systems. Using external debuggers, we collected precise execution traces to analyze interactions between trusted applications and the rest of the system. This approach allowed the fuzzer to target promising inputs more effectively, leading to better bug discovery.

Fuzzing Summary

Our work on fuzzing privileged systems underscores the importance of extracting meaningful signals. Due to the higher cost of each fuzzing iteration compared to user-space applications, producing high-quality input and feedback is crucial. Each of these projects represents a tailored approach to specific challenges, advancing the state of fuzzing for complex and privileged software.

The browser, a complex target

This year, we advanced browser fuzzing with the development of a new intermediate representation that enhances our ability to target complex browser components. In GraphIR (CCS'24), we introduced this intermediate representation to provide the fuzzer with more effective mutational capabilities. By maintaining target JavaScript programs in this intermediate representation, our fuzzer applies mutation operators directly to it, enabling more sophisticated and efficient exploration.

Looking ahead, we will present DUMPLING at NDSS'25, where we leverage differential testing on the V8 JavaScript engine. This approach exposes subtle desynchronization bugs through a novel oracle, further advancing our capabilities in browser fuzzing.
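The core of a differential oracle fits in a few lines: run the same input through two implementations that should agree and flag any divergence. The sketch below uses two toy remainder functions, one of which contains a deliberately planted bug; it is a generic illustration, not DUMPLING's oracle.

```python
import math


def reference_rem(a, b):
    """Reference semantics: truncated remainder (sign follows the dividend)."""
    return int(math.fmod(a, b))


def optimized_rem(a, b):
    """Hypothetical fast path that wrongly uses the floored remainder."""
    return a % b


def differential_test(inputs):
    """Return every input on which the two implementations disagree."""
    return [(a, b) for a, b in inputs if reference_rem(a, b) != optimized_rem(a, b)]


if __name__ == "__main__":
    inputs = [(7, 3), (-7, 3), (6, 3), (-6, 3), (7, -3)]
    print(differential_test(inputs))  # [(-7, 3), (7, -3)]
```

Neither implementation crashes on the diverging inputs, which is exactly why such bugs escape crash-based oracles.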

In previous work, we explored specific browser components, such as fuzzing the WebGL interface (SEC'23). WebGL exposes the OpenGL interface to JavaScript, enabling 3D computations. We hypothesized that the graphics stack, being highly optimized, was (likely) under-tested. Targeting this interface posed unique challenges due to its span across multiple layers and abstractions, including the browser, libraries, the operating system kernel, and even the GPU. Achieving adequate coverage in such a complex system was difficult.

Our key innovation was replacing traditional coverage metrics with debug signals from the browser. The fuzzer generated JavaScript code that interacted with WebGL and monitored debug messages in the browser console to identify malformed code and its effects. Using these debug signals, the fuzzer iteratively mutated the code to provoke more interesting and intricate interactions with the WebGL stack.
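A minimal sketch of message-based feedback follows. The normalization rule, the class name, and the example messages are assumptions made for illustration; the real fuzzer's message handling is more involved. An input counts as interesting when it triggers a kind of debug message that has not been seen before.

```python
import re


def normalize(message):
    """Collapse concrete numbers so similar messages map to one feedback key."""
    return re.sub(r"\d+", "N", message)


class MessageFeedback:
    """Keep inputs that trigger a previously unseen kind of debug message."""

    def __init__(self):
        self.seen = set()

    def is_interesting(self, messages):
        keys = {normalize(m) for m in messages}
        new_keys = keys - self.seen
        self.seen |= keys
        return bool(new_keys)


if __name__ == "__main__":
    feedback = MessageFeedback()
    # Made-up console messages, used only to exercise the sketch.
    print(feedback.is_interesting(["INVALID_ENUM: texture 3"]))
    print(feedback.is_interesting(["INVALID_ENUM: texture 7"]))
    print(feedback.is_interesting(["INVALID_OPERATION: no buffer bound"]))
```

Inputs flagged as interesting would be added to the corpus and mutated further, in the same way coverage-guided fuzzers treat inputs that reach new code.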

These advancements highlight our desire to improve browser security through innovative fuzzing techniques and tailored solutions for complex components.

Introducing new Oracles

Memory safety violations are effectively detected using oracles such as AddressSanitizer, which are well-suited to low-level software and have uncovered numerous bugs in the past. However, there are many bugs that go beyond memory safety errors. In Monarch (ATC'24), we explored alternative oracles designed to detect logic bugs in distributed filesystems, such as desynchronization and other failure modes.

In parallel, we have investigated methods to prove memory accesses safe. In Uriah (CCS'24), we focused on analyzing heap accesses and demonstrated that a significant portion can be statically proven safe. This eliminates the need for instrumentation during fuzzing campaigns or even at runtime, offering stronger safety guarantees with reduced overhead. At NDSS'25, we will further explore this theme with QMSan, which uses binary rewriting to detect uninitialized reads as part of fuzzing campaigns.

In the domain of theoretical language foundations, we built upon our prior work on Enclosure (ASPLOS'21) to develop Gradient (OOPSLA'24), a language-based compartmentalization mechanism. Gradient allows developers to define fine-grained enclosures to control data accessibility. This enables natural expression of security policies when loading potentially untrusted or buggy library code, empowering developers to enforce robust compartmentalization directly within the application.

These projects represent our continued commitment to advancing memory safety, bug detection, and secure software development practices through innovative tools and foundational research.

Android security

This year, we placed a strong emphasis on Android security, building on our earlier work such as EL3XIR, where we fuzzed the secure monitor. Beyond this, we delved deeper into the security of trusted applications, uncovering critical vulnerabilities in the API used to access them.

In Spill the TeA, we conducted an empirical study examining how trusted applications are patched and whether they are protected against rollback attacks. While these applications are signed, they often lack robust rollback protection. Specifically, if rollback counters are not incremented, older (and potentially vulnerable) versions of applications can still be loaded and exploited. Alarmingly, we found that rollback counters are rarely utilized, leaving Android users exposed to attacks from outdated applications across different devices and models.
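The difference between signature checking alone and signature checking with rollback protection can be sketched with a toy loader. All names are hypothetical and the model is deliberately simplified; real trusted execution environments keep such counters in secure storage.

```python
from dataclasses import dataclass


@dataclass
class TrustedApp:
    name: str
    version: int
    signature_valid: bool


class Loader:
    """Toy trusted-application loader with optional rollback protection."""

    def __init__(self, enforce_rollback):
        self.enforce_rollback = enforce_rollback
        self.min_version = {}  # app name -> lowest version still accepted

    def load(self, app):
        if not app.signature_valid:
            return False
        if self.enforce_rollback:
            if app.version < self.min_version.get(app.name, 0):
                return False  # validly signed, but older than the counter allows
            self.min_version[app.name] = app.version  # advance the counter
        return True


if __name__ == "__main__":
    patched = TrustedApp("keystore", version=2, signature_valid=True)
    vulnerable = TrustedApp("keystore", version=1, signature_valid=True)
    for enforce in (False, True):
        loader = Loader(enforce_rollback=enforce)
        print(enforce, loader.load(patched), loader.load(vulnerable))
```

Without the counter, the old version loads because its signature is still valid; with the counter, the same binary is rejected.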

Our study also highlighted a fundamental flaw in the GlobalPlatform API, which governs the interaction between Android applications and trusted applications. This API requires developers to verify whether each argument in a call is a scalar or a pointer. Unfortunately, this crucial step is frequently overlooked, leading to arbitrary write vulnerabilities in trusted applications. This issue was widespread, and our findings, detailed in GlobalConfusion, prompted vendors to update the GlobalPlatform API standard to address this recurring vulnerability.
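The missing check is small, which makes its frequent omission all the more striking. The Python model below mirrors the check a trusted application is expected to perform. The constant values and the packing of four 4-bit type codes follow our understanding of the GlobalPlatform TEE Internal Core API; the handler, its expected parameter layout, and the string return values are hypothetical.

```python
# Parameter type codes, modeled after the GlobalPlatform TEE Internal Core API.
TEE_PARAM_TYPE_NONE = 0
TEE_PARAM_TYPE_VALUE_INPUT = 1
TEE_PARAM_TYPE_MEMREF_INPUT = 5
TEE_PARAM_TYPE_MEMREF_OUTPUT = 6


def tee_param_types(t0, t1, t2, t3):
    """Pack four 4-bit parameter type codes into one integer."""
    return t0 | (t1 << 4) | (t2 << 8) | (t3 << 12)


# Hypothetical handler layout: an input buffer followed by an output buffer.
EXPECTED_TYPES = tee_param_types(
    TEE_PARAM_TYPE_MEMREF_INPUT,
    TEE_PARAM_TYPE_MEMREF_OUTPUT,
    TEE_PARAM_TYPE_NONE,
    TEE_PARAM_TYPE_NONE,
)


def invoke_command(param_types):
    """Reject calls whose parameter types differ from what the handler expects."""
    if param_types != EXPECTED_TYPES:
        return "TEE_ERROR_BAD_PARAMETERS"
    # Only past this point may slot 1 be treated as a validated buffer.
    return "TEE_SUCCESS"


if __name__ == "__main__":
    # The caller declares slot 1 as a plain value instead of a memory reference.
    confused = tee_param_types(
        TEE_PARAM_TYPE_MEMREF_INPUT,
        TEE_PARAM_TYPE_VALUE_INPUT,
        TEE_PARAM_TYPE_NONE,
        TEE_PARAM_TYPE_NONE,
    )
    print(invoke_command(EXPECTED_TYPES), invoke_command(confused))
```

If the comparison is left out, the handler interprets a caller-controlled scalar as a buffer pointer, which is the source of the arbitrary writes described above.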

Additionally, we investigated the security of Android's hardened Scudo memory allocator, which is designed to make heap allocations less predictable using probabilistic keys. However, we discovered that Android's "Zygote" fork model compromises several of Scudo's mitigations, making exploitation easier. In our Scudo paper, presented at WOOT'24 and awarded Best Paper, we explored various attack patterns that exploit these weakened protections.
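The underlying problem can be shown with a toy model in which a "fork" is simulated by copying the parent's state. The allocator class, its secret cookie, and its randomized slot choice are simplifying assumptions and do not reflect Scudo's actual data structures. What the sketch shows is that every child of the same parent inherits identical secrets and identical randomness.

```python
import copy
import random


class ToyAllocator:
    """Toy hardened allocator: a secret cookie plus randomized slot selection."""

    def __init__(self):
        self._rng = random.Random()  # seeded once, at process start
        self.cookie = self._rng.getrandbits(32)

    def next_slot(self, n_slots=64):
        """Pick the slot for the next allocation at random."""
        return self._rng.randrange(n_slots)


def fork(process_state):
    """Simulate fork(): the child starts with a copy of the parent's memory."""
    return copy.deepcopy(process_state)


if __name__ == "__main__":
    zygote = ToyAllocator()
    app_a = fork(zygote)
    app_b = fork(zygote)
    print(app_a.cookie == app_b.cookie)
    print([app_a.next_slot() for _ in range(8)] == [app_b.next_slot() for _ in range(8)])
```

In this model, a secret leaked from one application is valid in every other application forked from the same parent, and the allocation order observed in one predicts the order in the others.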

In summary, Android's complexity and the interactions between its many components create a unique security landscape, exposing vulnerabilities in trusted application access, inter-application communication, and library usage. Over the coming years, we plan to continue exploring the Android ecosystem, as it presents numerous opportunities for impactful security research.

Conclusion and Outlook

This year, we made significant strides in security research across fuzzing, system software, browser security, and Android security. Our work in fuzzing, including Halo and Tango, tackled stateful systems and advanced input generation. On privileged systems, tools like EL3XIR and HyperPill revealed critical bugs in secure monitors and hypervisors.

In browser security, GraphIR enhanced fuzzing capabilities. Beyond memory safety, alternative oracles such as Monarch uncovered logic bugs in distributed filesystems. In Android security, we identified rollback vulnerabilities and critical API flaws, while demonstrating exploitation risks in memory allocators.

Looking ahead, we aim to deepen our focus on stateful fuzzing, evolving security frameworks, and new attack surfaces. While fuzzing research is gradually slowing down, we continue to explore niche areas and develop specialized oracles to address underexplored challenges. In the medium term, fuzzing is expected to transition from a research focus to an engineering discipline, with research efforts shifting to other domains. Promising areas of exploration include alternative oracles that go beyond memory safety, advancements in compartmentalization, and the complex and dynamic field of Android security.

We are excited about the opportunities that lie ahead and look forward to another productive year of research. None of this would be possible without the dedication and creativity of our exceptional team of postdocs, PhD students, researchers, and collaborators whose passion and vision drive our efforts to push the boundaries of security.