# REPRO-2026-00086: RAGFlow MinerU parser Zip Slip allows arbitrary file overwrite and potential RCE via malicious ZIP archives

## Summary

- Status: published
- Severity: Unknown
- Type: security
- Confidence: Unknown

## Identifiers

- REPRO ID: REPRO-2026-00086
- CVE: CVE-2026-24770

## Package

- Name: ragflow (RAGFlow)
- Ecosystem: pip (per GitHub advisory)
- Affected: versions prior to 0.23.1 (advisory text says 0.23.1 and possibly earlier; dbugs says prior to 0.23.1)
- Fixed: 0.23.1 (ticket); GitHub advisory lists patched versions: none; patch commit 64c75d558e4a17a4a48953b4c201526431d8338f

## Root Cause

## Summary

RAGFlow versions prior to 0.23.1 contain a Zip Slip path traversal in the MinerU parser. The `_extract_zip_no_root` routine accepts ZIP member filenames verbatim and joins them with the extraction directory, allowing crafted ZIP entries to escape the intended destination and overwrite arbitrary files. By embedding an entry like `test//tmp/ragflow_zip_slip_pwned`, an attacker can cause the parser to write to `/tmp` when the ZIP is processed.

## Impact

- **Component**: `deepdoc/parser/mineru_parser.py` (`MinerUParser._extract_zip_no_root`).
- **Affected versions**: RAGFlow < 0.23.1 (verified on v0.23.0).
- **Risk**: High. Arbitrary file overwrite enables potential RCE by replacing executables, configuration, or scripts executed by the service.

## Root Cause

`MinerUParser._extract_zip_no_root` iterates over the ZIP entries, strips a presumed root folder, and joins the remaining path with the extraction directory using `os.path.join`. It does not normalize the member path or validate it against traversal segments (`../`), absolute paths, or alternative separators. As a result, a filename like `test//tmp/ragflow_zip_slip_pwned` escapes the extraction root: with the root folder `test/` stripped, the remainder `/tmp/ragflow_zip_slip_pwned` is an absolute path, and `os.path.join` discards all preceding components when it encounters an absolute one, so the file is written directly to `/tmp`.

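
The path arithmetic described above can be illustrated with a minimal sketch. This is not RAGFlow's code: `build_malicious_zip`, `strip_root`, and `naive_target` are hypothetical helpers that only mimic the described behavior (strip the first folder, then join), the extraction directory `/srv/extract` is an assumption, and nothing is written to disk. `posixpath` is used so the result is the same on every platform; on Linux, `os.path` is `posixpath`.

```python
import io
import posixpath
import zipfile


def build_malicious_zip() -> bytes:
    """Build an in-memory ZIP whose only entry uses the crafted name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The double slash leaves an absolute path once "test/" is stripped.
        zf.writestr("test//tmp/ragflow_zip_slip_pwned", "attacker-controlled")
    return buf.getvalue()


def strip_root(name: str, root: str) -> str:
    """Hypothetical helper: drop a leading '<root>/' prefix if present."""
    prefix = root + "/"
    return name[len(prefix):] if name.startswith(prefix) else name


def naive_target(extract_dir: str, name: str, root: str) -> str:
    """Hypothetical helper: join without any validation."""
    return posixpath.join(extract_dir, strip_root(name, root))


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_malicious_zip())) as zf:
        for name in zf.namelist():
            # Only computes the destination; no file is created.
            print(name, "->", naive_target("/srv/extract", name, "test"))
    # A benign entry stays under the extraction directory.
    print(naive_target("/srv/extract", "test/a/b.txt", "test"))
```

For the crafted entry the computed destination is `/tmp/ragflow_zip_slip_pwned`, while the benign entry resolves to `/srv/extract/a/b.txt`.
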
The fix in commit `64c75d558e4a17a4a48953b4c201526431d8338f` adds checks for absolute paths, traversal segments, and symlinks, and enforces that the resolved path remains under the extraction directory.

## Reproduction Steps

1. Run `repro/reproduction_steps.sh`.
2. The script checks out RAGFlow v0.23.0, crafts a ZIP with a malicious entry, and calls `MinerUParser._extract_zip_no_root` to extract it.
3. Expected evidence: `/tmp/ragflow_zip_slip_pwned` is created with attacker-controlled content.

## Evidence

- Script output (stdout): `"[+] Vulnerability reproduced: /tmp/ragflow_zip_slip_pwned created"`.
- Evidence file: `/tmp/ragflow_zip_slip_pwned` exists after running the script.
- Environment: Python packages installed via pip in the script (numpy, pdfplumber, Pillow, strenum, requests, beartype).

## Recommendations / Next Steps

- Apply the upstream patch from commit `64c75d558e4a17a4a48953b4c201526431d8338f` (upgrade to >= 0.23.1).
- Add path normalization and explicit checks to reject absolute paths, traversal segments, and symlink entries in ZIP archives.
- Add regression tests that ensure ZIP entries cannot escape the extraction directory.

## Additional Notes

- The reproduction script was executed twice successfully and is idempotent.
- The script stubs `deepdoc.parser.pdf_parser` to avoid unrelated dependencies while exercising the vulnerable extraction routine directly.

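
The checks recommended above (reject absolute paths, traversal segments, and symlink entries, then confirm the resolved path stays under the extraction directory) can be sketched as follows. This is an illustration under stated assumptions, not the upstream patch: `is_symlink` and `safe_target` are hypothetical names, and the symlink test assumes the archive stores Unix mode bits in the upper 16 bits of `external_attr`, which is the usual convention for archives created on Unix-like systems.

```python
import os
import stat
import zipfile


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Return True if the entry's Unix mode bits mark it as a symlink."""
    mode = info.external_attr >> 16
    return stat.S_ISLNK(mode)


def safe_target(extract_dir: str, rel_path: str) -> str:
    """Resolve rel_path under extract_dir, or raise ValueError."""
    # Treat backslashes as separators so "..\\x" is caught as well.
    normalized = rel_path.replace("\\", "/")
    if normalized.startswith("/") or os.path.isabs(normalized):
        raise ValueError(f"absolute path in archive: {rel_path!r}")
    if ".." in normalized.split("/"):
        raise ValueError(f"traversal segment in archive: {rel_path!r}")
    root = os.path.realpath(extract_dir)
    target = os.path.realpath(os.path.join(root, normalized))
    # Final containment check on the fully resolved path.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"entry escapes extraction dir: {rel_path!r}")
    return target
```

A regression test in the spirit of the recommendation could assert that `safe_target(extract_dir, "/tmp/ragflow_zip_slip_pwned")` raises `ValueError`, and that an ordinary relative entry such as `a/b.txt` resolves to a path under `extract_dir`.
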
## Reproduction Details

- Reproduced: 2026-02-13T15:21:08.787Z
- Duration: 499 seconds
- Tool calls: 112
- Turns: Unknown
- Handoffs: 2

## Quick Verification

Run one of these commands to verify locally:

- `pruva-verify REPRO-2026-00086`
- `pruva-verify CVE-2026-24770`

Or open in GitHub Codespaces (zero-friction, auto-runs): https://github.com/codespaces/new?ref=repro/REPRO-2026-00086&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

1. `curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00086/artifacts/repro/reproduction_steps.sh`
2. `chmod +x reproduction_steps.sh`
3. `./reproduction_steps.sh`

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References

- NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-24770
- Source: https://nvd.nist.gov/vuln/detail/CVE-2026-24770

## Artifacts

- repro/rca_report.md (analysis, 2609 bytes)
- repro/reproduction_steps.sh (reproduction_script, 2023 bytes)
- vuln_variant/rca_report.md (analysis, 4312 bytes)
- vuln_variant/reproduction_steps.sh (reproduction_script, 5083 bytes)
- bundle/ticket.md (ticket, 2248 bytes)
- vuln_variant/patch_analysis.md (documentation, 1934 bytes)
- logs/vuln_variant_attempt.log (log, 207 bytes)

## API Access

- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00086
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00086/artifacts/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00086

## For AI Assistants

To help users with this reproduction:

1. The `pruva-verify` command is the simplest way to run the reproduction.
2. The script downloads and runs automatically in an isolated temp directory.
3. Always recommend running in a VM, container, or disposable environment.
4. The root cause explains the technical vulnerability.

---

Generated by Pruva | https://pruva.dev