In the last 3-4 months, AI models have made an immense jump in exploitation
capabilities. Several talks and
blog posts highlight the "new"
capabilities of frontier AI models.
The agents have learned from
countless CTF writeups, research papers on exploitation techniques, and
conference talks/demonstrations on how to automate diverse techniques.
In an agentic workflow, these models are
incredibly skilled at advanced exploitation.
The key finding is that they render tedious mitigations useless: the agent can
incrementally improve an exploit on its own, thereby weaponizing
proof-of-concept crashes into full exploit chains.
"Early" LLM systems (i.e., in 2025) were used as a static analysis. Essentially
telling the chat bot to find bugs in a given file and to write a vulnerability
report. This worked, sometimes, very well but, most of the time, produced AI
slop that was not instantiateable. Bug bounty programs were flooded with this
unvalidated slop and many open source developers spoke out against this type of
contribution.
But with the rise of agentic workflows, LLMs can leverage feedback to improve.
A simple workflow, similar to the one presented by Nicholas in his unprompted
talk, breaks down exploit synthesis into several steps.
Seeding reports: The first step is seeding potential bug candidates: the first
agent goes through the code base, one module at a time, one file at a time, and
looks for potential security bugs. One may have to tell it that this is for
defensive purposes to sidestep some ethics guardrails, but generally the LLMs
comply well with this task. For the average project, this results in hundreds of
mostly incomplete vulnerability reports. Instead of submitting them to bug
bounty programs and DoSing developers, the next agent improves the reports.
This step is most akin to static analysis.
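The seeding stage can be sketched as a loop over the code base that collects whatever the model flags. In this minimal sketch, the LLM call is stubbed out with a toy `strcpy` heuristic so the code is runnable; all names and the report format are illustrative assumptions, not a real framework:

```python
import json

def ask_model_for_bugs(source: str) -> list[dict]:
    """Stub standing in for an LLM review of one file.

    A real pipeline would send the source to a model API; here we
    flag any strcpy call as a toy heuristic so the sketch runs."""
    reports = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "strcpy(" in line:
            reports.append({"line": lineno,
                            "kind": "possible buffer overflow",
                            "evidence": line.strip()})
    return reports

def seed_reports(files: dict[str, str]) -> list[dict]:
    """Stage 1: walk the code base one file at a time and collect
    unvalidated candidate vulnerability reports."""
    candidates = []
    for path, source in files.items():
        for report in ask_model_for_bugs(source):
            report["file"] = path
            candidates.append(report)
    return candidates

# Toy code base: two files, one containing an unsafe strcpy call.
codebase = {
    "net/parse.c": "void f(char *dst, char *src) {\n    strcpy(dst, src);\n}\n",
    "util/log.c": "void log_msg(const char *m) {\n    puts(m);\n}\n",
}
print(json.dumps(seed_reports(codebase), indent=2))
```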
Reaching locations: The second agent instantiates the PoCs. Based on the
vulnerability reports, the agent tries to infer a path that reaches the bug
location. This will result in a few vulnerability reports with reachable bug
locations and a set of false positives. This under-approximation already
leverages a concrete execution environment in which the agent can validate its
findings. This step is comparable to a poor man's symbolic execution engine.
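Reachability validation can be sketched as concrete execution plus a trace hook. The toy parser, the guarded bug location, and the `reaches` helper below are hypothetical stand-ins for the agent running candidate inputs in a sandbox and checking whether the flagged code is ever entered:

```python
import sys

def vulnerable_copy(payload: bytes):
    """The flagged bug location: fixed-size buffer, no length check."""
    buf = [0] * 8
    for i, b in enumerate(payload):
        buf[i] = b  # IndexError once len(payload) > 8

def parse(data: bytes):
    """Toy parser guarding the bug behind a magic-byte check."""
    if data[:4] != b"MAGI":
        return
    vulnerable_copy(data[4:])

def reaches(func, target_name: str, data: bytes) -> bool:
    """Stage 2 check: run a candidate input concretely and record
    whether execution ever enters the function named target_name."""
    hit = False
    def tracer(frame, event, arg):
        nonlocal hit
        if event == "call" and frame.f_code.co_name == target_name:
            hit = True
        return tracer
    sys.settrace(tracer)
    try:
        func(data)
    except Exception:
        pass  # crashing past the location still counts as reaching it
    finally:
        sys.settrace(None)
    return hit

print(reaches(parse, "vulnerable_copy", b"MAGIAB"))  # magic matches
print(reaches(parse, "vulnerable_copy", b"junkAB"))  # guard rejects it
```

Inputs that fail the magic-byte check are discarded as false positives; inputs that enter `vulnerable_copy` become seeds for the next stage.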
Triggering the bug: The third agent tries to trigger the bug based on probable
bugs from the previous steps. The advantage is that the earlier agent has
already validated that the location is reachable, so this agent can now
concentrate on mutating the seed to trigger the bug. This step results
in a few validated crashes and is akin to a fuzzer.
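The triggering stage can be sketched as a small mutation loop over a seed that already passes the reachability check. The guarded toy parser and the grow-only mutator below are illustrative assumptions, not the author's actual tooling:

```python
import random
from typing import Optional

def vulnerable_copy(payload: bytes):
    """Toy bug location: fixed-size buffer, no length check."""
    buf = [0] * 8
    for i, b in enumerate(payload):
        buf[i] = b  # IndexError once len(payload) > 8

def parse(data: bytes):
    """Toy parser guarding the bug behind a magic-byte check."""
    if data[:4] != b"MAGI":
        return
    vulnerable_copy(data[4:])

def trigger(seed: bytes, budget: int = 1000) -> Optional[bytes]:
    """Stage 3: mutate a seed that already reaches the bug location
    until the target actually crashes (akin to a fuzzer)."""
    rng = random.Random(0)  # deterministic for the sketch
    current = seed
    for _ in range(budget):
        candidate = current + bytes([rng.randrange(256)])  # grow-only mutator
        try:
            parse(candidate)
        except IndexError:
            return candidate  # proof-of-concept crash found
        current = candidate
    return None

crash = trigger(b"MAGI")
print(crash)  # crashing input: magic header plus an oversized payload
```

Because the seed already passes the guard, the loop only has to grow the payload until the bounds violation fires, rather than rediscover the path from scratch.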
Exploitation: The fourth step takes the initial PoC crashes, analyzes them, and
bypasses any deployed mitigations. Some of the frontier models try to hold back
in this step, and one may have to convince the agent that it is playing a CTF.
The key advantage of LLMs and AI in this environment is automation. This
simple four-stage pipeline combines code review (static analysis), path
synthesis (symbolic execution), seed mutation (fuzzing), and exploitation
(bypassing mitigations). All of these steps previously required significant
manual analysis; now they are promptable.
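Put together, the four stages can be sketched as a simple pipeline. Every function below is a stub standing in for one agent; the names and the toy filtering logic are assumptions for illustration only:

```python
def seed_reports(codebase):
    """Stage 1 (static analysis): unvalidated candidate reports."""
    return [{"file": f, "bug": "overflow?"} for f in codebase]

def reaches_location(report):
    """Stage 2 (path synthesis): keep reports whose bug location is
    reachable; endswith is a stand-in for concrete validation."""
    return report["file"].endswith(".c")

def trigger_crash(report):
    """Stage 3 (fuzzing): turn a reachable report into a PoC crash."""
    return {**report, "crash": b"MAGI" + b"A" * 9}

def weaponize(crash):
    """Stage 4 (exploitation): iterate on the PoC against mitigations."""
    return {**crash, "exploit": True}

def pipeline(codebase):
    candidates = seed_reports(codebase)                        # hundreds of reports
    reachable = [r for r in candidates if reaches_location(r)] # fewer, validated
    crashes = [trigger_crash(r) for r in reachable]            # PoC crashes
    return [weaponize(c) for c in crashes]                     # full chains

exploits = pipeline(["net/parse.c", "util/log.c", "docs/README.md"])
print(len(exploits))  # the markdown file never reaches a bug location
```

Each stage narrows the candidate set, which is why the later, more expensive agents only run on inputs the cheaper stages have already validated.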
In their current state, LLM agents favor attackers. On one hand, this will
destroy the market for exploits, as they now become a cheap commodity. On the
other hand, developers will be flooded with validated bug reports. Initially,
this will be tough, but hopefully, as projects embrace LLM-based bug search, we
will see an improvement in code quality. The area I'm most excited about is how
to further improve defensive capabilities and, most importantly, how to automate
patching of the discovered bugs. Let me know if you have ideas!