# REPRO-2026-00080: Docling-core YAML Deserialization RCE via FullLoader

## Summary

- Status: published
- Severity: high
- Type: security
- Confidence: Unknown

## Identifiers

- REPRO ID: REPRO-2026-00080
- GHSA: GHSA-VQXF-V2GG-X3HC
- CVE: CVE-2026-24009

## Package

- Name: docling-core
- Ecosystem: pip
- Affected: >= 2.21.0, < 2.48.4
- Fixed: 2.48.4

## Root Cause

## Summary

`docling-core` versions 2.21.0 to 2.48.3 call `yaml.load(..., Loader=yaml.FullLoader)` in `DoclingDocument.load_from_yaml`, which allows unsafe object construction when PyYAML < 5.4 is installed. With a crafted YAML payload, PyYAML's `FullLoader` evaluates attacker-controlled Python objects (CVE-2020-14343), leading to command execution before document validation occurs.

## Impact

- **Component:** `docling_core.types.doc.DoclingDocument.load_from_yaml`
- **Affected versions:** docling-core >= 2.21.0, < 2.48.4 when used with PyYAML < 5.4
- **Risk level:** High (arbitrary command execution when parsing untrusted YAML)
- **Consequence:** An attacker can execute OS commands during YAML deserialization even if the resulting object fails validation.

## Root Cause

`load_from_yaml` opens the provided YAML file and calls `yaml.load(f, Loader=yaml.FullLoader)`. In PyYAML 5.3.1, `FullLoader` still permits unsafe constructors such as `!!python/object/new` and `!!python/name`, which can be combined to invoke `eval` and execute OS commands (CVE-2020-14343). Deserialization runs before `DoclingDocument.model_validate`, so even if validation fails, the payload has already executed. The fix in docling-core 2.48.4 switches to `yaml.SafeLoader`, which blocks these unsafe tags.

## Reproduction Steps

1. Run `repro/reproduction_steps.sh`.
2. The script creates a virtual environment, installs `docling-core==2.48.3` with `PyYAML==5.3.1`, writes a malicious YAML payload using `!!python/object/new`, then invokes `DoclingDocument.load_from_yaml`.
3. Evidence of reproduction is the creation of `logs/pwned.txt` containing the output of `id`.

## Evidence

- **Log/artifact:** `logs/pwned.txt`
- **Key output (from script):**
  - `VULNERABILITY CONFIRMED: marker file created at .../logs/pwned.txt`
  - Script prints a validation error after deserialization, demonstrating the payload executes before validation.
- **Environment:** Python 3.12 venv with docling-core 2.48.3 and PyYAML 5.3.1

## Recommendations / Next Steps

- Upgrade to docling-core 2.48.4 or later, which uses `yaml.SafeLoader`.
- If upgrading is not possible, explicitly use `yaml.safe_load` or `SafeLoader` when parsing untrusted YAML.
- Add regression tests that feed malicious YAML payloads into `load_from_yaml` to ensure unsafe tags are rejected.

## Additional Notes

- The reproduction script is idempotent and can be run multiple times; it overwrites the payload and marker file on each run.
- Even though the YAML fails `DoclingDocument` validation, the exploit triggers during deserialization, so validation alone is insufficient protection.

## Reproduction Details

- Reproduced: 2026-02-13T13:17:05.775Z
- Duration: 361 seconds
- Tool calls: 58
- Turns: Unknown
- Handoffs: 2

## Quick Verification

Run one of these commands to verify locally:

    pruva-verify REPRO-2026-00080
    pruva-verify GHSA-VQXF-V2GG-X3HC
    pruva-verify CVE-2026-24009

Or open in GitHub Codespaces (zero-friction, auto-runs):

https://github.com/codespaces/new?ref=repro/REPRO-2026-00080&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

    curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00080/artifacts/repro/reproduction_steps.sh
    chmod +x reproduction_steps.sh
    ./reproduction_steps.sh

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.
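
As a complement to running the reproduction, the affected range stated in this report (docling-core >= 2.21.0, < 2.48.4, combined with PyYAML < 5.4) can be checked against an installed environment without executing any payload. The following is a minimal sketch, not part of the Pruva tooling: the helper names `parse_version`, `is_affected`, and `check_installed` are hypothetical, and the parser assumes plain dotted numeric versions such as `2.48.3`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.48.3' into (2, 48, 3).

    Assumption: plain dotted numeric versions only; a suffix such as
    'rc1' raises ValueError instead of being interpreted.
    """
    numbers = [int(part) for part in text.split(".")[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '5.4' to (5, 4, 0)
    return tuple(numbers)


def is_affected(docling_core_version, pyyaml_version):
    """Apply the version ranges stated in this report."""
    docling = parse_version(docling_core_version)
    pyyaml = parse_version(pyyaml_version)
    in_affected_range = (2, 21, 0) <= docling < (2, 48, 4)
    unsafe_full_loader = pyyaml < (5, 4, 0)
    return in_affected_range and unsafe_full_loader


def check_installed():
    """Return True/False for the current environment, or None if the
    check cannot be made (package missing or unparseable version)."""
    try:
        return is_affected(version("docling-core"), version("PyYAML"))
    except (PackageNotFoundError, ValueError):
        return None


if __name__ == "__main__":
    print("affected:", check_installed())
```

A result of `True` only says that the installed versions fall inside the reported range; it does not show that untrusted YAML actually reaches `load_from_yaml` in a given application.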
## References

- GitHub Advisory: https://github.com/advisories/GHSA-VQXF-V2GG-X3HC
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-24009

## Artifacts

- repro/rca_report.md (analysis, 2642 bytes)
- repro/reproduction_steps.sh (reproduction_script, 974 bytes)
- vuln_variant/rca_report.md (analysis, 3528 bytes)
- vuln_variant/reproduction_steps.sh (reproduction_script, 2846 bytes)
- bundle/ticket.md (ticket, 2644 bytes)
- logs/pwned.txt (other, 54 bytes)
- logs/payload.yaml (other, 184 bytes)
- vuln_variant/patch_analysis.md (documentation, 1114 bytes)
- logs/vulnerable_attempt_1.log (log, 240 bytes)
- logs/vulnerable_attempt_2.yaml (other, 205 bytes)
- logs/vulnerable_attempt_3.yaml (other, 164 bytes)
- logs/vulnerable_summary.log (log, 362 bytes)
- logs/vulnerable_attempt_1.yaml (other, 131 bytes)
- logs/fixed_attempt_3.log (log, 249 bytes)
- logs/vulnerable_attempt_3.log (log, 254 bytes)
- logs/fixed_attempt_1.log (log, 235 bytes)
- logs/fixed_summary.log (log, 236 bytes)
- logs/fixed_attempt_1.yaml (other, 126 bytes)
- logs/bypass_result.log (log, 59 bytes)
- logs/vulnerable_attempt_2.log (log, 275 bytes)
- logs/fixed_attempt_2.yaml (other, 200 bytes)
- logs/fixed_attempt_2.log (log, 229 bytes)
- logs/fixed_attempt_3.yaml (other, 159 bytes)
- logs/vulnerable_attempt_2_pwned.txt (other, 54 bytes)

## API Access

- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00080
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00080/artifacts/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00080

## For AI Assistants

To help users with this reproduction:

1. The `pruva-verify` command is the simplest way to run the reproduction
2. The script downloads and runs automatically in an isolated temp directory
3. Always recommend running in a VM, container, or disposable environment
4. The root cause explains the technical vulnerability

---

Generated by Pruva | https://pruva.dev