What's the vulnerability?

A path traversal vulnerability in the `partition_msg` function of the `unstructured` library lets an attacker write or overwrite arbitrary files on the filesystem when a malicious MSG file with attachments is processed with `process_attachments=True`.

Root Cause Analysis

## Summary
The unstructured library’s `partition_msg` attachment handling writes attachment payloads to a temporary directory using the attachment filename directly. When an MSG attachment filename contains path traversal sequences (e.g., `../../../../tmp/pwned`), `os.path.join(tmp_dir, filename)` escapes the temporary directory and writes to arbitrary filesystem locations. This occurs before any attachment partitioning, enabling arbitrary file overwrite when `process_attachments=True`.

## Impact
- Package/component affected: `unstructured` partitioning of MSG files (`unstructured.partition.msg` / `_AttachmentPartitioner`).
- Affected versions: <= 0.18.17 (tested on 0.18.15). Patched in 0.18.18 per advisory.
- Risk level and consequences: Critical. Arbitrary file overwrite on the host (e.g., overwriting configs/cron/python packages), potentially leading to code execution or data loss.

## Root Cause
`_AttachmentPartitioner._iter_elements()` in `unstructured/partition/msg.py` uses:
```
detached_file_path = os.path.join(tmp_dir_path, self._attachment_file_name)
with open(detached_file_path, "wb") as f:
    f.write(self._file_bytes)
```
`self._attachment_file_name` comes from `oxmsg` attachment `file_name` (PID_ATTACH_LONG_FILENAME) and is not sanitized. When it contains `../` sequences, `os.path.join` produces a path outside the intended temp directory, enabling traversal and arbitrary file write. The fix in 0.18.18 likely sanitizes or strips path separators before writing.
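
The escape can be shown without writing any file. The following is a minimal sketch, not code from `unstructured`: `vulnerable_join` and `escapes_tmp_dir` are hypothetical helper names introduced here for illustration. It mirrors the `os.path.join` call above and then checks whether the resolved path still sits under the temporary directory.

```python
import os
import tempfile


def vulnerable_join(tmp_dir_path: str, attachment_file_name: str) -> str:
    """Mirror the unsanitized join shown above (hypothetical helper)."""
    return os.path.join(tmp_dir_path, attachment_file_name)


def escapes_tmp_dir(tmp_dir_path: str, attachment_file_name: str) -> bool:
    """Return True if the joined path resolves outside tmp_dir_path."""
    base = os.path.realpath(tmp_dir_path)
    target = os.path.realpath(vulnerable_join(tmp_dir_path, attachment_file_name))
    # commonpath equals base only when target is base or lies beneath it.
    return os.path.commonpath([base, target]) != base


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name in ("report.pdf", "../../../../tmp/pwned", "/etc/hosts"):
            joined = os.path.normpath(vulnerable_join(tmp_dir, name))
            print(f"{name!r} -> {joined} (escapes: {escapes_tmp_dir(tmp_dir, name)})")
```

With a shallow temporary directory such as `/tmp/tmpabc123`, the traversal name normalizes to `/tmp/pwned`. On POSIX systems an absolute attachment name is a second route to the same problem, because `os.path.join` discards every earlier component once it meets an absolute one.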

## Reproduction Steps
1. Run `repro/reproduction_steps.sh`.
2. The script clones unstructured, checks out tag 0.18.15, installs deps, copies `fake-email-multiple-attachments.msg`, and overwrites the attachment filename stream (`__substg1.0_3707001F`) with `../../../../tmp/pwned` (UTF-16LE) using `olefile`.
3. It calls `partition_msg(..., process_attachments=True)`, which writes the attachment payload to `/tmp/pwned` before failing on missing image deps.
4. Expected evidence: `/tmp/pwned` exists and has non-zero size after the call.
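
Two details from the steps above can be sketched in isolation: the UTF-16LE encoding of the traversal filename from step 2, and the existence and size check from step 4. This is an illustrative sketch only. It does not touch the MSG file or call `olefile`, and the names `encode_filename`, `decode_filename`, and `evidence` are assumptions made here, not part of the repro script.

```python
import os

# Traversal payload used as the attachment filename in the repro.
PAYLOAD_NAME = "../../../../tmp/pwned"


def encode_filename(name: str) -> bytes:
    """Encode a filename as UTF-16LE, the encoding the repro writes to the stream."""
    return name.encode("utf-16-le")


def decode_filename(raw: bytes) -> str:
    """Decode UTF-16LE stream bytes back into a filename."""
    return raw.decode("utf-16-le")


def evidence(path: str) -> tuple[bool, int]:
    """Return (exists, size_in_bytes) for the expected evidence file."""
    exists = os.path.isfile(path)
    return exists, os.path.getsize(path) if exists else 0


if __name__ == "__main__":
    raw = encode_filename(PAYLOAD_NAME)
    print(len(raw), decode_filename(raw))
    print(evidence("/tmp/pwned"))
```

Each character of the payload is in the Basic Multilingual Plane, so the 21-character name encodes to 42 bytes.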

## Evidence
- Script output (from `repro/reproduction_steps.sh`):
  - `partition error: <class 'ImportError'> partition_image() is not available...`
  - `/tmp/pwned exists: True`
  - `/tmp/pwned size: 96226`
- Environment: Ubuntu container with Python venv; unstructured 0.18.15 installed from local repo; python-oxmsg 0.0.2.

## Recommendations / Next Steps
- Fix: sanitize attachment filenames by stripping path separators or enforcing basename before writing to temp dir.
- Upgrade: update to unstructured >= 0.18.18.
- Tests: add regression test that attachment filenames containing `../` are sanitized and cannot escape temp directory.
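
The fix and the regression test above could look roughly like the sketch below. It is not the patch shipped in 0.18.18, whose contents are not shown here. `safe_attachment_path` is a hypothetical helper that reduces the attachment name to its basename and then verifies that the resolved path stays inside the temporary directory.

```python
import os
import tempfile


def safe_attachment_path(tmp_dir_path: str, attachment_file_name: str) -> str:
    """Return a write path guaranteed to sit directly inside tmp_dir_path.

    Hypothetical helper for illustration; raises ValueError for names that
    carry no usable basename.
    """
    # Treat Windows-style separators like POSIX ones before taking the basename.
    name = os.path.basename(attachment_file_name.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"unsafe attachment file name: {attachment_file_name!r}")

    base = os.path.realpath(tmp_dir_path)
    target = os.path.realpath(os.path.join(base, name))
    # Defense in depth: reject anything that still resolves outside base.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"attachment path escapes temp dir: {attachment_file_name!r}")
    return target


def test_traversal_name_stays_in_tmp_dir() -> None:
    """Regression check: a '../' name must land inside the temp directory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = safe_attachment_path(tmp_dir, "../../../../tmp/pwned")
        assert os.path.dirname(path) == os.path.realpath(tmp_dir)
        assert os.path.basename(path) == "pwned"
```

Taking the basename handles the common case. The containment check is kept as a second layer so that a name resolving through a symlink inside the directory is rejected as well.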

## Additional Notes
- Repro script is idempotent: it overwrites the same test MSG and removes `/tmp/pwned` before running.
- Even when attachment partitioning fails due to missing optional image dependencies, the vulnerable file write already occurred, so evidence is still produced.

One Command

Verify with pruva-verify: run the Pruva CLI to automatically fetch and execute the reproduction script.

- `pruva-verify REPRO-2026-00084`
- or `pruva-verify GHSA-GM8Q-M8MV-JJ5M`
- or `pruva-verify CVE-2025-64712`

Install: `curl -fsSL https://pruva.dev/install.sh | sh`

Or Run Manually

1. Download the script: `curl -O https://pruva.dev/api/v1/reproductions/REPRO-2026-00084/artifacts/reproduction_steps.sh`
2. Make executable: `chmod +x reproduction_steps.sh`
3. Run the script: `./reproduction_steps.sh`

Run in a VM, container, or disposable environment. This exploits a real vulnerability.
