QMSan: discovering uninitialized memory errors in binaries

Sanitizers serve as the primary bug detection oracle during automated testing. They "crash" the program gracefully and tell the fuzzer when and where a bug was triggered. The most well-known sanitizer is ASan (AddressSanitizer), which adds redzones around memory objects to detect out-of-bounds accesses. MSan (MemorySanitizer) detects accesses to uninitialized memory. Upon allocation, all memory of an object is marked as uninitialized. When memory is written, it is marked as initialized. For each read, the sanitizer validates that the underlying memory was properly initialized.
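The allocate/write/read rule described above can be sketched as a toy shadow-memory model. This is only an illustration of the idea, not how MSan or QMSan is implemented: the `ShadowMemory` class, its method names, and the one-flag-per-address layout are assumptions made for the example.

```python
class ShadowMemory:
    """Toy model: one 'initialized' flag per address (hypothetical layout)."""

    def __init__(self):
        self.data = {}         # address -> stored value
        self.initialized = {}  # address -> True once written

    def allocate(self, base, size):
        # Upon allocation, every address of the object is marked uninitialized.
        for addr in range(base, base + size):
            self.data[addr] = 0
            self.initialized[addr] = False

    def write(self, addr, value):
        # A write marks the address as initialized.
        self.data[addr] = value
        self.initialized[addr] = True

    def read(self, addr):
        # Each read validates that the address was initialized before.
        if not self.initialized[addr]:
            raise RuntimeError(f"read of uninitialized memory at {addr:#x}")
        return self.data[addr]


mem = ShadowMemory()
mem.allocate(0x1000, 4)
mem.write(0x1000, 7)
value = mem.read(0x1000)   # fine: this address was written first
# mem.read(0x1001) would raise RuntimeError: it was never written
```

A real sanitizer keeps this metadata in a compact shadow region and reports through its own runtime rather than raising an exception; the sketch only mirrors the three rules from the paragraph.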

Compared to ASan, MSan faces a key challenge: all code must be instrumented; otherwise, some memory initialization may be missed, resulting in false positive crashes. For ASan, missing instrumentation only results in false negatives (i.e., some bugs will be missed), but MSan is prone to both false positives and false negatives, which has so far hindered MSan's deployment. As a result, only 210 out of 528 targets of OSS-Fuzz were fuzzed with MSan.
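To make the false-positive problem concrete, here is a minimal sketch in which one write comes from code that was not instrumented. All names (`instrumented_write`, `uninstrumented_write`, `is_flagged`) are hypothetical and exist only for this illustration.

```python
memory = {}  # address -> value
shadow = {}  # address -> True if the sanitizer saw an initializing write


def allocate(addr):
    memory[addr] = 0
    shadow[addr] = False


def instrumented_write(addr, value):
    # Instrumented code updates both the data and the shadow state.
    memory[addr] = value
    shadow[addr] = True


def uninstrumented_write(addr, value):
    # Code without instrumentation (e.g., a prebuilt library) updates only
    # the data; the sanitizer never learns about the initialization.
    memory[addr] = value


def is_flagged(addr):
    # A read is reported when the shadow state says "uninitialized".
    return not shadow[addr]


allocate(0x10)
uninstrumented_write(0x10, 42)
false_positive = is_flagged(0x10)  # flagged although the value is valid
```

The memory holds a valid value, yet a later read would still be reported, which is exactly the kind of spurious crash that makes partially instrumented targets hard to fuzz with MSan.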

In our QMSan project, we develop a mechanism that reduces these false positives, creating an efficient MSan-style sanitizer for fuzzing environments. Our system builds on two key ideas: first, we use binary instrumentation to track all writes, and second, we build a two-stage approach to massively reduce the cost of tracking initialized data.

The first contribution is simple: instead of using a compiler pass to add instrumentation during compilation, we add the instrumentation through QEMU while the code is being executed. The QEMU binary instrumentation engine allows us to analyze and instrument all code as it runs, which gives us complete coverage of all code, including code for which no source is available. Binary instrumentation comes at a slight overhead compared to compiler-based instrumentation, but the reduction of false positives is worth the trade-off.


The second contribution is geared towards making MSan efficient for fuzzing. MSan requires that the state of memory (initialized or not) is copied whenever data is copied. This requires an expensive propagation of metadata whenever data is read and stored in other areas of memory. Our key idea here is that fuzzing allows us to replay executions, as we have the exact input. For the majority of executions, we only track reads and writes but ignore shadow propagation (which is otherwise required on nearly all other instructions). If our sanitizer detects a crash, we call it a potential violation and replay the same input with the full sanitizer that extensively propagates shadow state. If it turns out to be a true positive, we report it as a bug. If it turns out to be a false positive, we mark the location of the false positive and ignore future false positives at this location.
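The two-stage idea can be sketched on a toy trace of memory operations. This is a simplified model under stated assumptions, not QMSan's implementation: the operation names (`alloc`, `store`, `copy`, `branch`), the `run` and `triage` functions, and the choice to treat only a branch on uninitialized data as a real violation in the full stage are all hypothetical.

```python
def run(trace, full, suppressed=frozenset()):
    """Return the index of the first reported operation, or None.

    full=False: cheap stage, checks reads and marks writes, no propagation.
    full=True:  full stage, propagates shadow state through copies and
                reports only when uninitialized data decides a branch.
    """
    init = {}  # address -> True if initialized
    for pc, (op, *args) in enumerate(trace):
        if op == "alloc":
            init[args[0]] = False
        elif op == "store":
            init[args[0]] = True
        elif op == "copy":
            dst, src = args
            if full:
                init[dst] = init[src]  # shadow follows the data
            else:
                if not init[src] and pc not in suppressed:
                    return pc          # potential violation
                init[dst] = True       # any write counts as initializing
        elif op == "branch":
            if not init[args[0]] and (full or pc not in suppressed):
                return pc
    return None


def triage(trace, suppressed):
    """Cheap run first; replay with the full stage only on a report."""
    pc = run(trace, full=False, suppressed=suppressed)
    if pc is None:
        return "clean"
    if run(trace, full=True) is not None:
        return "bug"
    suppressed.add(pc)  # remember the location of the false positive
    return "false positive"


benign = [("alloc", "a"), ("alloc", "b"), ("copy", "b", "a")]
buggy = benign + [("branch", "b")]

suppressed = set()
first = triage(benign, suppressed)   # copy of unused data: not a bug
second = triage(benign, suppressed)  # same location is now ignored
verdict = triage(buggy, set())       # the copied data decides a branch
```

Under these assumptions, the benign trace is reported once by the cheap stage, cleared by the replay, and suppressed afterwards, while the buggy trace is confirmed by the replay. The sketch deliberately ignores how suppressed locations interact with later reports in the cheap stage.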

Our evaluation on 10 OSS-Fuzz targets and 5 proprietary ones discovered 44 new bugs that we responsibly disclosed. Our implementation is competitive with the compiler-based one while offering massively increased compatibility. The source code of our sanitizer is available as open source. Please reach out to us with any questions!

This work was a collaboration between Matteo Marini, Daniele Cono D'Elia, Mathias Payer, and Leonardo Querzoni. Matteo was the main PhD student working on the project. He and Daniele deserve the majority of the credit for this work.
