# REPRO-2026-00084: Unstructured has Path Traversal via Malicious MSG Attachment that Allows Arbitrary File Write

## Summary

- Status: published
- Severity: Unknown
- Type: security
- Confidence: Unknown

## Identifiers

- REPRO ID: REPRO-2026-00084
- GHSA: GHSA-GM8Q-M8MV-JJ5M
- CVE: CVE-2025-64712

## Package

- Name: unstructured
- Ecosystem: pypi
- Affected: <=0.18.17
- Fixed: 0.18.18

## Root Cause

### Summary

The unstructured library’s `partition_msg` attachment handling writes attachment payloads to a temporary directory using the attachment filename directly. When an MSG attachment filename contains path traversal sequences (e.g., `../../../../tmp/pwned`), `os.path.join(tmp_dir, filename)` escapes the temporary directory and writes to arbitrary filesystem locations. This occurs before any attachment partitioning, enabling arbitrary file overwrite when `process_attachments=True`.

### Impact

- Package/component affected: `unstructured` partitioning of MSG files (`unstructured.partition.msg` / `_AttachmentPartitioner`).
- Affected versions: <= 0.18.17 (tested on 0.18.15). Patched in 0.18.18 per advisory.
- Risk level and consequences: Critical. Arbitrary file overwrite on the host (e.g., overwriting configs/cron/python packages), potentially leading to code execution or data loss.

### Root Cause

`_AttachmentPartitioner._iter_elements()` in `unstructured/partition/msg.py` uses:

```
detached_file_path = os.path.join(tmp_dir_path, self._attachment_file_name)
with open(detached_file_path, "wb") as f:
    f.write(self._file_bytes)
```

`self._attachment_file_name` comes from `oxmsg` attachment `file_name` (PID_ATTACH_LONG_FILENAME) and is not sanitized. When it contains `../` sequences, `os.path.join` produces a path outside the intended temp directory, enabling traversal and arbitrary file write. The fix in 0.18.18 likely sanitizes or strips path separators before writing.

### Reproduction Steps

1. Run `repro/reproduction_steps.sh`.
2. The script clones unstructured, checks out tag 0.18.15, installs deps, copies `fake-email-multiple-attachments.msg`, and overwrites the attachment filename stream (`__substg1.0_3707001F`) with `../../../../tmp/pwned` (UTF-16LE) using `olefile`.
3. It calls `partition_msg(..., process_attachments=True)`, which writes the attachment payload to `/tmp/pwned` before failing on missing image deps.
4. Expected evidence: `/tmp/pwned` exists and has non-zero size after the call.

### Evidence

- Script output (from `repro/reproduction_steps.sh`):
  - `partition error: partition_image() is not available...`
  - `/tmp/pwned exists: True`
  - `/tmp/pwned size: 96226`
- Environment: Ubuntu container with Python venv; unstructured 0.18.15 installed from local repo; python-oxmsg 0.0.2.

### Recommendations / Next Steps

- Fix: sanitize attachment filenames by stripping path separators or enforcing basename before writing to temp dir.
- Upgrade: update to unstructured >= 0.18.18.
- Tests: add regression test that attachment filenames containing `../` are sanitized and cannot escape temp directory.

### Additional Notes

- Repro script is idempotent: it overwrites the same test MSG and removes `/tmp/pwned` before running.
- Even when attachment partitioning fails due to missing optional image dependencies, the vulnerable file write already occurred, so evidence is still produced.
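
The traversal and the recommended basename-style fix can be illustrated with a minimal sketch. This is not the actual 0.18.18 patch: `unsafe_join` only mirrors the vulnerable join pattern described in this report, and `safe_attachment_path` is a hypothetical helper showing one way to enforce a basename and verify that the result stays inside the temp directory. The sketch only computes paths and writes nothing to disk.

```python
import os
import tempfile
from pathlib import PureWindowsPath


def unsafe_join(tmp_dir: str, file_name: str) -> str:
    """Mirror the vulnerable pattern: join the untrusted name as-is."""
    return os.path.join(tmp_dir, file_name)


def safe_attachment_path(tmp_dir: str, file_name: str) -> str:
    """Hypothetical helper (an assumption, not the library's code).

    Reduces the untrusted name to its final component, then confirms the
    resolved path is still inside tmp_dir.
    """
    # PureWindowsPath treats both "/" and "\\" as separators, which matters
    # because MSG attachment names may originate on Windows.
    base_name = PureWindowsPath(file_name).name
    if base_name in ("", ".", ".."):
        base_name = "attachment"  # placeholder name chosen for this sketch

    root = os.path.realpath(tmp_dir)
    candidate = os.path.realpath(os.path.join(root, base_name))

    # Defense in depth: reject anything that resolves outside the temp dir.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"attachment name escapes temp dir: {file_name!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp_dir:
    malicious = "../../../../tmp/pwned"
    root = os.path.realpath(tmp_dir)

    escaped = os.path.normpath(unsafe_join(tmp_dir, malicious))
    contained = safe_attachment_path(tmp_dir, malicious)

    print("unsafe path stays in temp dir:", escaped.startswith(root + os.sep))
    print("safe path stays in temp dir:", contained.startswith(root + os.sep))
```

A regression test along the lines of the recommendation above could assert that the safe path for a `../`-laden name equals the temp directory joined with the bare final component.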
## Reproduction Details

- Reproduced: 2026-02-13T15:08:30.739Z
- Duration: 290 seconds
- Tool calls: 193
- Turns: Unknown
- Handoffs: 2

## Quick Verification

Run one of these commands to verify locally:

- `pruva-verify REPRO-2026-00084`
- `pruva-verify GHSA-GM8Q-M8MV-JJ5M`
- `pruva-verify CVE-2025-64712`

Or open in GitHub Codespaces (zero-friction, auto-runs): https://github.com/codespaces/new?ref=repro/REPRO-2026-00084&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

1. `curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00084/artifacts/repro/reproduction_steps.sh`
2. `chmod +x reproduction_steps.sh`
3. `./reproduction_steps.sh`

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References

- GitHub Advisory: https://github.com/advisories/GHSA-GM8Q-M8MV-JJ5M
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2025-64712

## Artifacts

- repro/rca_report.md (analysis, 2965 bytes)
- repro/reproduction_steps.sh (reproduction_script, 1968 bytes)
- vuln_variant/rca_report.md (analysis, 3691 bytes)
- vuln_variant/reproduction_steps.sh (reproduction_script, 2301 bytes)
- bundle/ticket.md (ticket, 1405 bytes)
- coding/verify_steps.sh (other, 1475 bytes)
- coding/summary_report.md (documentation, 1984 bytes)

## API Access

- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00084
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00084/artifacts/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00084

## For AI Assistants

To help users with this reproduction:

1. The `pruva-verify` command is the simplest way to run the reproduction.
2. The script downloads and runs automatically in an isolated temp directory.
3. Always recommend running in a VM, container, or disposable environment.
4. The root cause explains the technical vulnerability.

---

Generated by Pruva | https://pruva.dev