# REPRO-2026-00080: Docling-core YAML Deserialization RCE via FullLoader

## Summary
Status: published
Severity: high
Type: security
Confidence: Unknown

## Identifiers
REPRO ID: REPRO-2026-00080
GHSA: GHSA-VQXF-V2GG-X3HC
CVE: CVE-2026-24009

## Package
Name: docling-core
Ecosystem: pip
Affected: >= 2.21.0, < 2.48.4
Fixed: 2.48.4

## Technical Summary
`docling-core` versions 2.21.0 through 2.48.3 call `yaml.load(..., Loader=yaml.FullLoader)` in `DoclingDocument.load_from_yaml`. When a PyYAML release older than 5.4 is installed, `FullLoader` still constructs arbitrary Python objects from tagged YAML nodes (CVE-2020-14343). A crafted YAML payload can therefore execute commands during parsing, before document validation runs.

## Impact
- **Component:** `docling_core.types.doc.DoclingDocument.load_from_yaml`
- **Affected versions:** docling-core >= 2.21.0, < 2.48.4 when used with PyYAML < 5.4
- **Risk level:** High (arbitrary command execution when parsing untrusted YAML)
- **Consequence:** An attacker can execute OS commands during YAML deserialization even if the resulting object fails validation.
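
The version constraints listed above can be expressed as a small check. This is a minimal sketch, not part of docling-core or the reproduction artifacts: the function names `parse_version`, `is_affected`, and `installed_environment_is_affected` are hypothetical, and the version parsing is deliberately simplified (pre-release suffixes are ignored), so a real tool would more likely use a dedicated version-parsing library.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Return the leading numeric components of a version, padded to three.

    Simplification (assumption): suffixes such as "b1" or "rc2" are ignored.
    """
    numbers = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    while len(numbers) < 3:
        numbers.append(0)
    return (numbers[0], numbers[1], numbers[2])


def is_affected(docling_core_version: str, pyyaml_version: str) -> bool:
    """Apply the ranges from this report: docling-core >= 2.21.0, < 2.48.4
    combined with PyYAML < 5.4."""
    core = parse_version(docling_core_version)
    pyyaml = parse_version(pyyaml_version)
    return (2, 21, 0) <= core < (2, 48, 4) and pyyaml < (5, 4, 0)


def installed_environment_is_affected() -> Optional[bool]:
    """Check the current environment; None means a package is not installed."""
    try:
        return is_affected(version("docling-core"), version("PyYAML"))
    except PackageNotFoundError:
        return None
```

For example, `is_affected("2.48.3", "5.3.1")` matches the environment used in this reproduction, while either upgrading docling-core to 2.48.4 or PyYAML to 5.4 takes the pair out of the affected range.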

## Root Cause
`load_from_yaml` opens the provided YAML file and calls `yaml.load(f, Loader=yaml.FullLoader)`. In PyYAML 5.3.1, `FullLoader` still permits unsafe constructors such as `!!python/object/new` and `!!python/name`, which can be combined to invoke `eval` and execute OS commands (CVE-2020-14343). Deserialization runs before `DoclingDocument.model_validate`, so the payload has already executed by the time validation fails. The fix in docling-core 2.48.4 switches to `yaml.SafeLoader`, which rejects these unsafe tags.
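
The difference the fix relies on can be illustrated with PyYAML alone. The sketch below is not the docling-core source: `load_untrusted_yaml` is a hypothetical helper, and the tagged document is a non-executing stand-in for a real payload. It only shows that `SafeLoader` builds plain Python types and raises an error when it meets a `!!python/...` tag, instead of constructing an object.

```python
import yaml
from yaml.constructor import ConstructorError


def load_untrusted_yaml(text: str):
    """Parse YAML with SafeLoader, which only constructs plain Python types."""
    return yaml.load(text, Loader=yaml.SafeLoader)


# Ordinary documents load as dicts, lists, strings, and numbers.
benign = load_untrusted_yaml("name: example\npages: 3\n")

# A Python-specific tag has no constructor in SafeLoader, so parsing fails
# before any object is built. The target here is illustrative only.
tagged = "!!python/name:os.system"
try:
    load_untrusted_yaml(tagged)
    rejected = False
except ConstructorError:
    rejected = True
```

Because the error is raised inside the loader, nothing downstream (including model validation) ever sees a constructed object from the tagged node.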

## Reproduction Steps
1. Run `repro/reproduction_steps.sh`.
2. The script creates a virtual environment, installs `docling-core==2.48.3` with `PyYAML==5.3.1`, writes a malicious YAML payload using `!!python/object/new`, then invokes `DoclingDocument.load_from_yaml`.
3. Evidence of reproduction is the creation of `logs/pwned.txt` containing the output of `id`.

## Evidence
- **Log/artifact:** `logs/pwned.txt`
- **Key output (from script):**
  - `VULNERABILITY CONFIRMED: marker file created at .../logs/pwned.txt`
  - Script prints a validation error after deserialization, demonstrating the payload executes before validation.
- **Environment:** Python 3.12 venv with docling-core 2.48.3 and PyYAML 5.3.1

## Recommendations / Next Steps
- Upgrade to docling-core 2.48.4 or later, which uses `yaml.SafeLoader`.
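- If upgrading docling-core is not possible, upgrade PyYAML to 5.4 or later, or parse untrusted YAML yourself with `yaml.safe_load` (equivalently, `Loader=yaml.SafeLoader`) instead of calling `load_from_yaml`.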
- Add regression tests that feed malicious YAML payloads into `load_from_yaml` to ensure unsafe tags are rejected.
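
A regression test along the lines of the last recommendation might look like the following sketch. It makes several assumptions: `load_document` is a hypothetical stand-in for the loader under test (in a real suite it would wrap `DoclingDocument.load_from_yaml`), the expected exception is PyYAML's `ConstructorError`, and the payloads use the tags named in this report pointed at harmless built-ins rather than the actual exploit.

```python
import unittest

import yaml
from yaml.constructor import ConstructorError

# Tags named in this report, aimed at harmless targets (assumption: the tag,
# not the target, is what the loader must refuse).
UNSAFE_PAYLOADS = {
    "python/object/new": "!!python/object/new:builtins.object []",
    "python/name": "!!python/name:builtins.len",
}


def load_document(text: str):
    """Hypothetical stand-in for the loader under test."""
    return yaml.load(text, Loader=yaml.SafeLoader)


class UnsafeTagRegressionTest(unittest.TestCase):
    def test_python_tags_are_rejected(self):
        for name, payload in UNSAFE_PAYLOADS.items():
            with self.subTest(tag=name):
                with self.assertRaises(ConstructorError):
                    load_document(payload)

    def test_plain_documents_still_load(self):
        self.assertEqual(
            load_document("schema_name: example\n"),
            {"schema_name": "example"},
        )


def run_regression_suite() -> bool:
    """Run the test case programmatically and report overall success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UnsafeTagRegressionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Keeping one positive case next to the rejection cases guards against a fix that blocks unsafe tags by breaking ordinary document loading.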

## Additional Notes
- The reproduction script is idempotent and can be run multiple times; it overwrites the payload and marker file on each run.
- Even though the YAML fails `DoclingDocument` validation, the exploit triggers during deserialization, so validation alone is insufficient protection.


## Reproduction Details
Reproduced: 2026-02-13T13:17:05.775Z
Duration: 361 seconds
Tool calls: 58
Turns: Unknown
Handoffs: 2


## Quick Verification
Run one of these commands to verify locally:

    pruva-verify REPRO-2026-00080
    pruva-verify GHSA-VQXF-V2GG-X3HC
    pruva-verify CVE-2026-24009

Or open in GitHub Codespaces (zero-friction, auto-runs):

    https://github.com/codespaces/new?ref=repro/REPRO-2026-00080&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

    curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00080/artifacts/repro/reproduction_steps.sh
    chmod +x reproduction_steps.sh
    ./reproduction_steps.sh

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References
- GitHub Advisory: https://github.com/advisories/GHSA-VQXF-V2GG-X3HC
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-24009


## Artifacts
- repro/rca_report.md (analysis, 2642 bytes)
- repro/reproduction_steps.sh (reproduction_script, 974 bytes)
- vuln_variant/rca_report.md (analysis, 3528 bytes)
- vuln_variant/reproduction_steps.sh (reproduction_script, 2846 bytes)
- bundle/ticket.md (ticket, 2644 bytes)
- logs/pwned.txt (other, 54 bytes)
- logs/payload.yaml (other, 184 bytes)
- vuln_variant/patch_analysis.md (documentation, 1114 bytes)
- logs/vulnerable_attempt_1.log (log, 240 bytes)
- logs/vulnerable_attempt_2.yaml (other, 205 bytes)
- logs/vulnerable_attempt_3.yaml (other, 164 bytes)
- logs/vulnerable_summary.log (log, 362 bytes)
- logs/vulnerable_attempt_1.yaml (other, 131 bytes)
- logs/fixed_attempt_3.log (log, 249 bytes)
- logs/vulnerable_attempt_3.log (log, 254 bytes)
- logs/fixed_attempt_1.log (log, 235 bytes)
- logs/fixed_summary.log (log, 236 bytes)
- logs/fixed_attempt_1.yaml (other, 126 bytes)
- logs/bypass_result.log (log, 59 bytes)
- logs/vulnerable_attempt_2.log (log, 275 bytes)
- logs/fixed_attempt_2.yaml (other, 200 bytes)
- logs/fixed_attempt_2.log (log, 229 bytes)
- logs/fixed_attempt_3.yaml (other, 159 bytes)
- logs/vulnerable_attempt_2_pwned.txt (other, 54 bytes)

## API Access
- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00080
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00080/artifacts/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00080

## For AI Assistants
To help users with this reproduction:
1. The `pruva-verify` command is the simplest way to run the reproduction
2. The script downloads and runs automatically in an isolated temp directory
3. Always recommend running in a VM, container, or disposable environment
4. The root cause explains the technical vulnerability

---
Generated by Pruva | https://pruva.dev