Oligo Security researchers discovered a remote code execution (RCE) vulnerability in Docling, a widely used document parsing framework that lets developers convert documents into structured data for AI processing at scale.

The vulnerability, tracked as CVE-2026-24009, occurs when Docling deserializes untrusted YAML input using an unsafe PyYAML loader, allowing attacker-controlled documents to execute arbitrary Python code during normal document parsing.
In effect, supplying a malicious document is enough to execute arbitrary code on the system running Docling. That means an attacker could:
- Run commands with the privileges of the Docling process
- Access or exfiltrate sensitive data processed by the application
- Move laterally within an environment and establish persistence
We found that all of this can occur during normal document parsing, without crashing the application or triggering obvious errors. We responsibly disclosed the issue to the Docling maintainers, who released a fix in Docling Core v2.48.4.

The most interesting part of this bug is that the root cause is a well-documented class of vulnerability: unsafe YAML deserialization via PyYAML. What makes this issue important is the pattern it represents. It is ultimately a shadow vulnerability introduced through a transitive dependency, triggered during normal application behavior, and invisible to most security tooling until exploitation.
The Discovery
Our investigation started while analyzing document parsing flows in modern application stacks.
Docling is designed to ingest and parse documents automatically – an operation most teams assume is safe. During runtime analysis, we observed that Docling relied on PyYAML in a way that allowed unsafe deserialization of untrusted YAML input.
When PyYAML is invoked with an unsafe loader, it can construct arbitrary Python objects during parsing. With attacker-controlled input, that behavior can lead to arbitrary code execution without memory corruption, sandbox escapes, or malformed inputs.
In Docling's case, an attacker-controlled document could trigger code execution as part of the normal parsing workflow.
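To make the PyYAML behavior concrete, here is a minimal sketch with a deliberately harmless payload. It is not Docling's code path, which this post does not reproduce. It only illustrates how an unsafe loader turns a YAML tag into a function call. The payload and variable names are our own, and the example assumes PyYAML 5.1 or later, where `yaml.unsafe_load` is available.

```python
import yaml  # PyYAML

# A benign stand-in for attacker-controlled YAML: the tag asks the loader
# to import `builtins` and call `len` on the supplied list.
payload = "!!python/object/apply:builtins.len [[1, 2, 3]]"

# An unsafe loader honors the tag, so the call happens as a side effect
# of parsing. No separate "execute" step is needed.
result = yaml.unsafe_load(payload)
print(result)  # the return value of len([1, 2, 3])
```

A real payload would name a function that starts a process instead of `len`. Nothing about the YAML is malformed, which is why parsing succeeds without errors.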
What went wrong
At a technical level, the issue is straightforward.
Docling parsed YAML content using PyYAML in a configuration that allowed the deserialization of arbitrary objects. When untrusted YAML data is treated as trusted input, deserialization becomes execution.
This class of vulnerability has been exploited repeatedly in the wild. One of the most notable examples is the 2025 Bybit crypto heist, in which over $1.5 billion in crypto was stolen. These flaws are also commonly included in intentionally vulnerable applications, such as PyGoat, for security training purposes.
In other words, this is a well-understood and repeatedly exploited class of vulnerabilities, and one that attackers actively search for in target environments.
Serialized Data as an Application Attack Vector
CVE-2026-24009 is not only a parsing bug, but also an application-layer attack that abuses serialized data. In the community-driven Application Attack Matrix, this behavior maps directly to the Serialized Data tactic:
- Attacker-controlled structured data (YAML)
- Parsed in legitimate application workflows
- Deserialization logic capable of object construction
- Code execution within parsing workflow
These issues are powerful because to most security tooling, they don't look like attacks at all. The application is behaving as designed. The input is valid, parsing succeeds, and execution happens implicitly.
This is why serialized data vulnerabilities continue to surface across frameworks, languages, and runtimes (even in well-maintained projects), and why attackers continue to pursue them. They enable high impact inside workflows that appear legitimate.
Why These Issues Persist: Shadow Vulnerabilities
PyYAML deserialization risks are well documented. Safe loaders exist, and the mitigations are clearly described in the PyYAML documentation.
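As a rough illustration of that mitigation, the sketch below routes untrusted input through `yaml.safe_load`, which only builds plain YAML types. The helper name `load_untrusted_yaml` is hypothetical, and this is not the patch shipped by the Docling maintainers, which is not shown here.

```python
import yaml  # PyYAML


def load_untrusted_yaml(text):
    """Parse YAML from an untrusted source using only plain YAML types."""
    # safe_load uses SafeLoader, which has no constructors for python/* tags.
    return yaml.safe_load(text)


# Ordinary documents parse as usual.
print(load_untrusted_yaml("title: report\npages: 3"))

# A tag that requests a Python call is rejected instead of executed.
try:
    load_untrusted_yaml("!!python/object/apply:builtins.len [[1, 2, 3]]")
except yaml.YAMLError as exc:
    print("rejected:", type(exc).__name__)
```

The same effect can be had by passing `Loader=yaml.SafeLoader` to `yaml.load`. The point is that the loader choice, not the input, decides whether parsing can execute code.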

However, the same issue keeps resurfacing across different projects. Why? Because PyYAML is often not a direct dependency. It sits under helper libraries, parsing layers, or internal abstractions. Many projects never import PyYAML themselves – and still end up exposed to its most dangerous behavior.
This type of flaw is what we call a shadow vulnerability: a real, exploitable risk that exists in an application but is not visible within its codebase. And because it only becomes visible at runtime, it often escapes detection until an attacker has exploited it.
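One way to see how a package like PyYAML arrives indirectly is to ask the running environment which installed distributions declare it as a requirement. The sketch below uses only the standard library's `importlib.metadata`. The function names are our own, and it reports declared requirements only, not whether the unsafe loader is ever reached.

```python
from __future__ import annotations

import re
from importlib import metadata

# Leading project name in a requirement string such as "PyYAML>=5.1; extra == 'x'".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name so 'PyYAML' and 'pyyaml' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = _NAME.match(requirement)
    return normalize(match.group(1)) if match else ""


def direct_dependents(target: str) -> list[str]:
    """List installed distributions that declare `target` as a requirement."""
    wanted = normalize(target)
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        for requirement in dist.requires or []:
            if name and requirement_name(requirement) == wanted:
                found.add(name)
    return sorted(found)


print(direct_dependents("pyyaml"))
```

Calling `direct_dependents` again on each result walks further up the chain toward the application. Even a complete list only says the package is present. It does not say how it is called, which is the gap described here.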
Vulnerabilities like CVE-2026-24009 are difficult to identify with traditional security approaches:
- There is no obviously dangerous code in the application repository
- Static analysis cannot determine exploitability
- Dependency lists do not indicate risk
- The vulnerable behavior only triggers during specific runtime paths (document parsing)
By the time the issue is discovered, the vulnerable behavior is already deployed, reachable, and often assumed to be safe. This is where shadow vulnerabilities live – in the gap between code intent and runtime reality.
Does Oligo’s CADR Detect and Stop Deserialization Attacks?
Yes. This is exactly the class of risk that Cloud Application Detection and Response (CADR) is designed to address. CVE-2026-24009 is difficult to reason about statically, but clear at runtime.
For example, Oligo’s CADR solution can:
- Observe unsafe deserialization behavior in real time, detecting when PyYAML is invoked with unsafe loaders during document parsing flows, regardless of whether PyYAML is a direct or transitive dependency.
- Correlate serialized input with execution outcomes, identifying when YAML parsing leads to object instantiation, code execution, process creation, or other suspicious runtime behavior.
- Detect and block malicious deserialization flows at runtime, preventing unsafe object instantiation or process execution by stopping the specific offending function call, without killing the container or disrupting the application.
- Detect exploitation regardless of how it is triggered. The vulnerable behavior could be invoked through Docling, a helper library, or another abstraction. CADR observes and controls the actual execution path.
This allows security teams to:
- Detect exploitation attempts in real time
- Prove which deserialization paths are truly exploitable in production
- Stop unsafe execution before it progresses deeper into the environment
- Maintain application availability while preventing attacker footholds

Takeaway
CVE-2026-24009 is notable because issues like it keep popping up.
When a parsing library exposes an RCE primitive, the risk spreads far beyond a single repository, reaching downstream systems that never knowingly accepted it. Oligo highlighted this pattern in our OWASP Global 2023 talk on YAML shadow vulnerabilities across runtimes, yet the same issues persist today.
In modern applications, risk is inherited through dependencies and only becomes visible at runtime, which is exactly where shadow vulnerabilities thrive.






