What's the vulnerability?

RAGFlow MinerU parser Zip Slip allows arbitrary file overwrite and potential RCE via malicious ZIP archives.

Root Cause Analysis

## Summary
RAGFlow versions prior to 0.23.1 contain a Zip Slip path traversal in the MinerU parser. The `_extract_zip_no_root` routine accepts ZIP member filenames verbatim and joins them with the extraction directory, allowing crafted ZIP entries to escape the intended destination and overwrite arbitrary files. By embedding an entry like `test//tmp/ragflow_zip_slip_pwned`, an attacker can cause the parser to write to `/tmp` when the ZIP is processed.

## Impact
- **Component**: `deepdoc/parser/mineru_parser.py` (`MinerUParser._extract_zip_no_root`).
- **Affected versions**: RAGFlow < 0.23.1 (verified on v0.23.0).
- **Risk**: High. Arbitrary file overwrite enables potential RCE by replacing executables, configuration, or scripts executed by the service.

## Root Cause
`MinerUParser._extract_zip_no_root` iterates over ZIP entries, strips a presumed root folder, and joins the remaining path to the extraction directory with `os.path.join`. It does not normalize the member path or reject traversal segments (`../`), absolute paths, or alternative separators. As a result, a filename like `test//tmp/ragflow_zip_slip_pwned` resolves outside the extraction root and is written directly to `/tmp`. The most likely mechanism, consistent with the behavior described here, is that stripping the leading `test/` leaves the absolute path `/tmp/ragflow_zip_slip_pwned`, and `os.path.join` discards all earlier components when a later component is absolute. The fix in commit `64c75d558e4a17a4a48953b4c201526431d8338f` adds checks for absolute paths, traversal segments, and symlinks, and enforces that the resolved path remains under the extraction directory.
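
The join behavior behind this root cause can be illustrated with a minimal sketch. The function `naive_target`, the root prefix `test/`, and the extraction directory `/srv/extract` are hypothetical stand-ins chosen for illustration; this is not RAGFlow's actual code, only an approximation of the pattern described above. `posixpath.join` is used so the result is the same on any platform.

```python
import posixpath


def naive_target(extract_to: str, member_name: str, root: str) -> str:
    """Hypothetical stand-in for the vulnerable pattern (not RAGFlow's code)."""
    # Strip the presumed root folder prefix, e.g. "test/".
    if member_name.startswith(root):
        member_name = member_name[len(root):]
    # If member_name is now absolute, join() throws away extract_to entirely.
    return posixpath.join(extract_to, member_name)


# Malicious entry: after stripping "test/", the remainder starts with "/".
target = naive_target("/srv/extract", "test//tmp/ragflow_zip_slip_pwned", "test/")
print(target)

# Benign entry: stays under the extraction directory.
benign = naive_target("/srv/extract", "test/docs/a.txt", "test/")
print(benign)
```

Under these assumptions the malicious entry maps to a path outside `/srv/extract`, while the benign entry stays inside it.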

## Reproduction Steps
1. Run `repro/reproduction_steps.sh`.
2. The script checks out RAGFlow v0.23.0, crafts a ZIP with a malicious entry, and calls `MinerUParser._extract_zip_no_root` to extract it.
3. Expected evidence: `/tmp/ragflow_zip_slip_pwned` is created with attacker-controlled content.
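
As a rough illustration of step 2, the sketch below builds a ZIP in memory with an entry name like the one described in this report, using only the standard `zipfile` module. The helper `build_malicious_zip` is a hypothetical name and is not taken from the reproduction script, whose exact contents are not shown here.

```python
import io
import zipfile


def build_malicious_zip(entry_name: str, payload: bytes) -> bytes:
    """Return the bytes of a ZIP archive containing one crafted entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # writestr stores the entry name as given; it does not collapse "//".
        zf.writestr(entry_name, payload)
    return buf.getvalue()


data = build_malicious_zip("test//tmp/ragflow_zip_slip_pwned", b"pwned")

# Read the archive back and list the stored member names.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = zf.namelist()
    content = zf.read(names[0])

print(names)
```

Nothing is written to disk by this sketch. Note that Python's own `ZipFile.extract` and `ZipFile.extractall` sanitize member names before writing, so a crafted name like this is mainly a concern for code that builds output paths by hand.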

## Evidence
- Script output (stdout): `"[+] Vulnerability reproduced: /tmp/ragflow_zip_slip_pwned created"`.
- Evidence file: `/tmp/ragflow_zip_slip_pwned` exists after running the script.
- Environment: Python packages installed via pip in the script (numpy, pdfplumber, Pillow, strenum, requests, beartype).

## Recommendations / Next Steps
- Apply the upstream patch from commit `64c75d558e4a17a4a48953b4c201526431d8338f` (upgrade to >= 0.23.1).
- Add path normalization and explicit checks to reject absolute paths, traversal segments, and symlink entries in ZIP archives.
- Add regression tests that ensure ZIP entries cannot escape the extraction directory.
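
The checks recommended above could look roughly like the following sketch. `safe_target` and `is_rejected` are hypothetical helpers, not the upstream patch; the sketch assumes the check is applied to the member path after any root-folder stripping, and it does not inspect symlink attributes of ZIP entries, which would need a separate check.

```python
import os
import tempfile


def safe_target(extract_to: str, member_name: str) -> str:
    """Return a destination path under extract_to, or raise ValueError."""
    normalized = member_name.replace("\\", "/")
    # Reject absolute paths (POSIX-style or platform-specific).
    if normalized.startswith("/") or os.path.isabs(normalized):
        raise ValueError(f"absolute path in archive: {member_name!r}")
    # Reject any traversal segment.
    if ".." in normalized.split("/"):
        raise ValueError(f"traversal segment in archive: {member_name!r}")
    # Final containment check on the fully resolved path.
    root = os.path.realpath(extract_to)
    target = os.path.realpath(os.path.join(root, normalized))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"entry escapes extraction dir: {member_name!r}")
    return target


def is_rejected(extract_to: str, member_name: str) -> bool:
    """Regression-test helper: True if safe_target refuses the entry."""
    try:
        safe_target(extract_to, member_name)
    except ValueError:
        return True
    return False


root = tempfile.mkdtemp()
ok = safe_target(root, "docs/a.txt")
print(ok)
print(is_rejected(root, "/tmp/ragflow_zip_slip_pwned"))
print(is_rejected(root, "../escape.txt"))
```

A regression test can then assert that absolute and traversal entries are rejected while ordinary relative entries resolve to a path inside the extraction directory.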

## Additional Notes
- The reproduction script was executed twice successfully and is idempotent.
- The script stubs `deepdoc.parser.pdf_parser` to avoid unrelated dependencies while exercising the vulnerable extraction routine directly.

One Command

Verify with pruva-verify

Run the Pruva CLI to automatically fetch and execute the reproduction script:

`pruva-verify REPRO-2026-00086` or `pruva-verify CVE-2026-24770`

Install: `curl -fsSL https://pruva.dev/install.sh | sh`

Or Run Manually

1. Download the script: `curl -O https://pruva.dev/api/v1/reproductions/REPRO-2026-00086/artifacts/reproduction_steps.sh`
2. Make it executable: `chmod +x reproduction_steps.sh`
3. Run the script: `./reproduction_steps.sh`

Run in a VM, container, or disposable environment. This exploits a real vulnerability.
