Sanitizers serve as the primary bug-detection oracle during automated testing.
They "crash" the program gracefully and tell the fuzzer when and where a bug was
triggered. The best-known sanitizer is ASan (AddressSanitizer), which adds
redzones around memory objects to detect out-of-bounds accesses.
MSan (MemorySanitizer) detects accesses to uninitialized memory. Upon allocation,
all memory of an object is marked as uninitialized. When written, the memory is
marked as initialized. For each read, the sanitizer validates that the
underlying memory was properly initialized.
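
The allocate/write/read rules described here can be sketched as a toy model. This is only an illustration of the idea, not how MSan or QMSan is implemented: the `ShadowMemory` class, the `UninitializedRead` exception, and the one-flag-per-byte layout are all assumptions made for the example.

```python
class UninitializedRead(Exception):
    """Raised when a read touches bytes that were never written."""


class ShadowMemory:
    """Toy model (hypothetical): one shadow flag per byte of memory."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.initialized = [False] * size  # the shadow state

    def allocate(self, addr, length):
        # A fresh allocation starts out fully uninitialized.
        for i in range(addr, addr + length):
            self.initialized[i] = False

    def write(self, addr, payload):
        # Writing marks the touched bytes as initialized.
        for offset, byte in enumerate(payload):
            self.data[addr + offset] = byte
            self.initialized[addr + offset] = True

    def read(self, addr, length):
        # Every read is validated against the shadow state.
        for i in range(addr, addr + length):
            if not self.initialized[i]:
                raise UninitializedRead(f"byte {i} read before initialization")
        return bytes(self.data[addr:addr + length])


def demo():
    mem = ShadowMemory(16)
    mem.allocate(0, 8)
    mem.write(0, b"abcd")      # only the first 4 of 8 bytes are written
    ok = mem.read(0, 4)        # fine: all 4 bytes are initialized
    try:
        mem.read(0, 8)         # touches bytes 4..7, never written
        flagged = False
    except UninitializedRead:
        flagged = True
    return ok, flagged
```

Calling `demo()` reads the four written bytes without complaint and flags the eight-byte read that reaches into the unwritten half of the allocation.
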
Compared to ASan, MSan faces a key challenge: all code must be instrumented.
Otherwise, some memory initializations may be missed, resulting in false-positive
crashes. For ASan, missing instrumentation only results in false negatives
(i.e., some bugs will be missed), but MSan is prone to both false positives and
false negatives, which has so far hindered its deployment. As a result, only 210
out of 528 OSS-Fuzz targets were fuzzed with MSan.
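
To see why missing instrumentation produces a false positive, consider the following minimal sketch. The `run` function and its `library_fill` helper are hypothetical stand-ins: `library_fill` plays the role of a library routine that initializes a buffer, and the flag decides whether that routine was compiled with instrumentation.

```python
def run(library_is_instrumented):
    data = bytearray(4)
    shadow = [False] * 4  # False means "uninitialized"

    def library_fill(buf):
        # Stand-in for a library routine that initializes a buffer.
        for i in range(len(buf)):
            buf[i] = 0x41
            if library_is_instrumented:
                # Only instrumented code updates the shadow state.
                shadow[i] = True

    library_fill(data)

    # The instrumented caller now reads the buffer and consults the shadow.
    reported = not all(shadow)
    return bytes(data), reported
```

In both cases the buffer really does contain `b"AAAA"`. With `library_is_instrumented=False` the shadow state was never updated, so the read is reported even though the memory is initialized.
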
In our QMSan project, we develop a
mechanism that reduces these false positives, creating an efficient MSan
sanitizer for fuzzing environments. Our system uses two key ideas to reduce
false positives: first, we use binary instrumentation to track all writes;
second, we build a two-stage approach that massively reduces the cost of tracking
initialized data.
The first contribution is simple: instead of using a compiler pass to add
instrumentation during compilation, we add the instrumentation through QEMU while
the code is being executed. The QEMU binary instrumentation engine allows us to
analyze and instrument all code as it runs, including code for which no source
is available. Binary instrumentation incurs a slight overhead compared to
compiler-based instrumentation, but the reduction of false positives is worth the
trade-off.

The second contribution is geared towards making MSan efficient for fuzzing.
MSan requires that the state of memory (initialized or not) is copied whenever
data is copied. This requires an expensive propagation of metadata information
whenever data is read and stored in other areas of memory. Our key idea here was
that fuzzing allows us to replay executions as we have the exact input. For the
majority of executions, we only track reads and writes but ignore shadow
propagation (that is otherwise required on nearly all other instructions). If
our sanitizer detects a crash, we call it a potential violation and replay the
same input with the full sanitizer that propagates shadow state throughout. If it
turns out to be a true positive, we report it as a bug. If it turns out to be a
false positive, we mark the location of the false positive and ignore future
false positives at this location.
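
The two-stage loop can be sketched roughly as follows. This is a simplified model under stated assumptions, not the QMSan implementation: an execution is represented as a list of hypothetical `(site, op, *args)` tuples, the fast stage treats a copy as a plain read plus write, and the full stage propagates the initialized state through copies and reports only when a value is actually used. The names `fast_check`, `full_check`, and `triage` are invented for this example.

```python
def fast_check(ops, ignored):
    """Stage 1: track reads and writes only, with no shadow propagation."""
    init = set()
    for site, op, *args in ops:
        if op == "write":
            init.add(args[0])
        elif op == "copy":
            src, dst = args
            # Without propagation, a copy is just a read followed by a write.
            if src not in init and site not in ignored:
                return site
            init.add(dst)
        elif op == "use":
            if args[0] not in init and site not in ignored:
                return site
    return None


def full_check(ops):
    """Stage 2: propagate shadow state through copies; report only on use."""
    init = set()
    for site, op, *args in ops:
        if op == "write":
            init.add(args[0])
        elif op == "copy":
            src, dst = args
            if src in init:
                init.add(dst)
            else:
                init.discard(dst)  # destination inherits "uninitialized"
        elif op == "use" and args[0] not in init:
            return site
    return None


def triage(ops, ignored):
    """Run the fast stage; replay with the full stage only on a candidate."""
    candidate = fast_check(ops, ignored)
    if candidate is None:
        return "clean"
    if full_check(ops) is not None:
        return "bug"
    ignored.add(candidate)  # remember the location of the false positive
    return "false positive"
```

For example, an execution that merely copies an unwritten byte, `[("s1", "write", 0), ("s2", "copy", 1, 2)]`, is flagged by the fast stage and cleared by the replay, so `"s2"` lands in the ignore set and the same execution is reported as clean afterwards. An execution that uses an unwritten byte, `[("s3", "use", 5)]`, is confirmed by the replay and reported as a bug.
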
Our evaluation on 10 OSS-Fuzz targets and 5 proprietary ones discovered 44 new
bugs that we responsibly disclosed. Our implementation is competitive with the
compiler-based one while offering massively increased compatibility. The
source code of our sanitizer is available
as open source. Please reach out to us with any questions!
This work was a collaboration between Matteo Marini, Daniele Cono D'Elia,
Mathias Payer, and Leonardo Querzoni. Matteo was the main PhD student working on
the project. He and Daniele deserve the majority of the credit for this work.