CVE-2026-24009: RCE in Docling via Unsafe PyYAML Deserialization

Oligo Security researchers discovered a remote code execution (RCE) vulnerability in Docling, a widely used document parsing framework that allows developers to convert documents into specialized data for AI processing at scale.

The vulnerability, tracked as CVE-2026-24009, occurs when Docling deserializes untrusted YAML input using an unsafe PyYAML loader, allowing attacker-controlled documents to execute arbitrary Python code during normal document parsing.

Effectively, it allows an attacker to execute arbitrary code on the system running Docling simply by supplying a malicious document. That means an attacker could:

Run commands with the privileges of the Docling process
Access or exfiltrate sensitive data processed by the application
Move laterally within the environment and establish persistence

We found that all of this can occur during normal document parsing, without crashing the application or triggering obvious errors. We responsibly disclosed the issue to the Docling maintainers, who released a fix in Docling Core v2.48.4.

*Docling’s popularity has steadily increased over the past year.*

The most interesting part of this bug is that the root cause is a well-documented class of vulnerability: unsafe YAML deserialization via PyYAML. What makes this issue important is the pattern it represents. It is ultimately a shadow vulnerability introduced through a transitive dependency, triggered during normal application behavior, and invisible to most security tooling until exploitation.

The Discovery

Our investigation started while analyzing document parsing flows in modern application stacks.

Docling is designed to ingest and parse documents automatically – an operation most teams assume is safe. During runtime analysis, we observed that Docling relied on PyYAML in a way that allowed unsafe deserialization of untrusted YAML input.

When PyYAML is invoked with an unsafe loader, it can construct arbitrary Python objects during parsing. With attacker-controlled input, that behavior can lead to arbitrary code execution without memory corruption, sandbox escapes, or malformed inputs.

In Docling's case, an attacker-controlled document could trigger code execution as part of the normal parsing workflow.
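To make the mechanism concrete, here is a minimal sketch. It is not Docling's code path. It assumes PyYAML 5.1 or later is installed (the release line that provides `yaml.UnsafeLoader`), and the helper names `load_unsafely` and `load_safely` are ours. The payload is deliberately harmless: it asks the loader to call `builtins.len`, where an attacker would name something like `os.system`.

```python
import yaml

# Benign stand-in for a malicious document: the Python-specific tag tells the
# loader to call builtins.len on a list while the YAML is being parsed.
PAYLOAD = "!!python/object/apply:builtins.len [[1, 2, 3]]"


def load_unsafely(text):
    """Parse with the unsafe loader, which honors Python-specific tags."""
    return yaml.load(text, Loader=yaml.UnsafeLoader)


def load_safely(text):
    """Parse with the safe loader, which only builds plain YAML types."""
    return yaml.safe_load(text)


if __name__ == "__main__":
    # The callable named in the document runs as a side effect of parsing.
    print("unsafe loader returned:", load_unsafely(PAYLOAD))
    try:
        load_safely(PAYLOAD)
    except yaml.YAMLError as exc:
        # The safe loader has no constructor for the python/object/apply tag.
        print("safe loader rejected the document:", exc)
```

With the unsafe loader, the value that comes back is whatever the named callable returned, so parsing and execution are the same step. The safe loader refuses the same document with a constructor error instead of running anything.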

What went wrong

At a technical level, the issue is straightforward.

Docling parsed YAML content using PyYAML in a configuration that allowed the deserialization of arbitrary objects. When untrusted YAML data is treated as trusted input, deserialization becomes execution.

Unsafe deserialization has been exploited repeatedly in real-world attacks. One of the most notable is the ByBit crypto heist of 2025, in which over $1.5 billion in crypto was stolen. These flaws are also commonly included in intentionally vulnerable applications like PyGoat for security training purposes.

In other words, this is a well-understood and repeatedly exploited class of vulnerabilities, and one that attackers actively search for in target environments.

Serialized Data as an Application Attack Vector

CVE-2026-24009 is not only a parsing bug, but also an example of an application-layer attack vector that abuses serialized data. In the community-driven Application Attack Matrix, this behavior maps directly to the Serialized Data tactic:

Attacker-controlled structured data (YAML)
Parsed in legitimate application workflows
Deserialization logic capable of object construction
Code execution within parsing workflow

These issues are powerful because to most security tooling, they don't look like attacks at all. The application is behaving as designed. The input is valid, parsing succeeds, and execution happens implicitly.

This is why serialized data vulnerabilities continue to surface across frameworks, languages, and runtimes (even in well-maintained projects), and why attackers continue to pursue them. They enable high impact inside workflows that appear legitimate.

Why These Issues Persist: Shadow Vulnerabilities

PyYAML deserialization risks are well documented. Safe loaders exist, and the mitigations are clearly described in the PyYAML documentation.

Source: https://pyyaml.org/wiki/PyYAMLDocumentation

However, the same issue keeps resurfacing across different projects. Why? Because PyYAML is often not a direct dependency. It sits under helper libraries, parsing layers, or internal abstractions. Many projects never import PyYAML themselves – and still end up exposed to its most dangerous behavior.
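One way to see where PyYAML enters an environment is to ask the installed package metadata which distributions declare it as a requirement. The sketch below uses only the standard library's `importlib.metadata`. The function names are hypothetical, and it reports direct dependents only (one hop), including requirements gated behind optional extras.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        return ""
    # Treat runs of '-', '_' and '.' as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def direct_dependents(target):
    """List installed distributions that declare `target` as a requirement."""
    wanted = requirement_name(target)
    found = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            requirements = dist.requires or []
        except Exception:
            # Broken or partial metadata: skip rather than abort the scan.
            continue
        if not name:
            continue
        if any(requirement_name(req) == wanted for req in requirements):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    for package in direct_dependents("PyYAML"):
        print(package, "declares a dependency on PyYAML")
```

Applying the same function to each result walks further up the chain. Note that this only shows who pulls PyYAML in, not whether any of them call it with an unsafe loader.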

This type of flaw is what we call a shadow vulnerability: a real, exploitable risk that exists in an application but is not visible within your codebase. And because it only becomes visible at runtime, it often escapes detection until an attacker has exploited it.

Vulnerabilities like CVE-2026-24009 are difficult to identify with traditional security approaches:

There is no obviously dangerous code in the application repository
Static analysis cannot determine exploitability
Dependency lists do not indicate risk
The vulnerable behavior only triggers during specific runtime paths (document parsing)

By the time the issue is discovered, the vulnerable behavior is already deployed, reachable, and often assumed to be safe. This is where shadow vulnerabilities live – in the gap between code intent and runtime reality.

Does Oligo’s CADR Detect and Stop Deserialization Attacks?

Yes. This is exactly the class of risk that Cloud Application Detection and Response (CADR) is designed to address. CVE-2026-24009 is difficult to reason about statically, but clear at runtime.

For example, Oligo’s CADR solution can:

Observe unsafe deserialization behavior in real time, detecting when PyYAML is invoked with unsafe loaders during document parsing flows, regardless of whether PyYAML is a direct or transitive dependency.
Correlate serialized input with execution outcomes, identifying when YAML parsing leads to object instantiation, code execution, process creation, or other suspicious runtime behavior.
Detect and block malicious deserialization flows at runtime, preventing unsafe object instantiation or process execution by stopping the specific offending function call, without killing the container or disrupting the application.
Detect exploitation regardless of how it is triggered. The vulnerable behavior could be invoked through Docling, a helper library, or another abstraction. CADR observes and controls the actual execution path.

This allows security teams to:

Detect exploitation attempts in real time
Prove which deserialization paths are truly exploitable in production
Stop unsafe execution before it progresses deeper into the environment
Maintain application availability while preventing attacker footholds

*Oligo blocks the PyYAML exploit at runtime, preventing process creation through yaml.load and visualizing the full function call stack at the moment of attack.*

Takeaway

CVE-2026-24009 is notable because issues like it keep popping up.

When a parsing library exposes an RCE primitive, the risk spreads far beyond a single repository, reaching downstream systems that never knowingly accepted it. We highlighted this pattern in our OWASP Global 2023 talk on YAML shadow vulnerabilities across runtimes, yet the same issues persist today.

In modern applications, risk is inherited through dependencies and only becomes visible at runtime, which is exactly where shadow vulnerabilities thrive.
