# REPRO-2026-00119: PyTorch: weights_only Unpickler RCE via SETITEM Type Confusion

## Summary
Status: published
Severity: high
Type: security
Confidence: Unknown

## Identifiers
REPRO ID: REPRO-2026-00119
GHSA: GHSA-63cw-57p8-fm3p
CVE: CVE-2026-24747

## Package
Name: torch
Ecosystem: pip
Affected: <=2.9.1
Fixed: 2.10.0

## Root Cause
# RCA Report: CVE-2026-24747 — PyTorch weights_only Unpickler Memory Corruption

## Summary

CVE-2026-24747 is a high-severity vulnerability (CVSS 8.8) in PyTorch's `weights_only` unpickler. An attacker can craft a malicious checkpoint file (`.pth`) that, when loaded with `torch.load(..., weights_only=True)`, corrupts heap memory and can potentially lead to arbitrary code execution. The vulnerability exists because the handlers for the `SETITEM` and `SETITEMS` pickle opcodes in `torch/_weights_only_unpickler.py` perform **no type check** on the target object before calling `__setitem__`. An attacker can therefore invoke `Tensor.__setitem__()` through the pickle stream and write arbitrary float values directly into tensor storage memory on the heap.

## Impact

- **Package/component affected:** PyTorch (`torch`), specifically `torch/_weights_only_unpickler.py`
- **Affected versions:** All PyTorch versions prior to 2.10.0 (confirmed on 2.9.1)
- **Patched version:** PyTorch 2.10.0
- **Risk level:** HIGH (CVSS 8.8; attack vector: network, attack complexity: low, privileges required: none, user interaction: required)
- **Consequences:**
  - Heap memory corruption via controlled writes to tensor storage
  - The `weights_only=True` safety feature, designed to prevent pickle-based code execution, is bypassed
  - An attacker who distributes a malicious `.pth` model file can corrupt arbitrary heap memory in the victim's process
  - This memory corruption primitive can potentially be chained with heap layout techniques to achieve arbitrary code execution
  - Particularly dangerous in ML pipelines where model checkpoints are routinely downloaded from public repositories (Hugging Face, GitHub, model zoos)

## Root Cause

The root cause is the **absence of type checking** in the `SETITEM` and `SETITEMS` opcode handlers within the `Unpickler` class in `torch/_weights_only_unpickler.py`.

### Vulnerable Code (PyTorch 2.9.1)

In the `Unpickler.load()` method:

```python
# SETITEM handler (line ~440)
elif key[0] == SETITEM[0]:
    (v, k) = (self.stack.pop(), self.stack.pop())
    self.stack[-1][k] = v        # <-- NO TYPE CHECK!

# SETITEMS handler (line ~443)
elif key[0] == SETITEMS[0]:
    items = self.pop_mark()
    for i in range(0, len(items), 2):
        self.stack[-1][items[i]] = items[i + 1]  # <-- NO TYPE CHECK!
```

The code performs `self.stack[-1][k] = v` without verifying that `self.stack[-1]` is a dictionary type. In normal pickle usage, `SETITEM`/`SETITEMS` populate dictionaries. However, the restricted unpickler allows construction of Tensor objects via `_rebuild_tensor_v2`. If a Tensor is the target at the top of the stack when `SETITEM` executes, the assignment invokes `Tensor.__setitem__(key, value)` instead of `dict.__setitem__`.

`Tensor.__setitem__` writes float values directly to the tensor's underlying storage buffer, which is heap-allocated. This gives the attacker a **controlled heap write primitive**: they can write arbitrary float values at specific indices within the tensor's storage region.
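
The dispatch behavior described above can be reproduced without PyTorch. The following is a minimal sketch, not the unpickler's actual code: `FakeTensor` and `unchecked_setitem` are hypothetical stand-ins, and an `array("f")` buffer plays the role of tensor storage. It only illustrates that `stack[-1][k] = v` calls whatever `__setitem__` the target object defines.

```python
from array import array


class FakeTensor:
    """Hypothetical stand-in for a Tensor: float32 storage plus __setitem__."""

    def __init__(self, size):
        self.storage = array("f", [0.0] * size)

    def __setitem__(self, index, value):
        # Writes straight into the backing buffer.
        self.storage[index] = value


def unchecked_setitem(stack):
    """Mimics the unguarded SETITEM logic: pop value and key, assign into the new top."""
    v, k = stack.pop(), stack.pop()
    stack[-1][k] = v  # dispatches to type(stack[-1]).__setitem__


# Intended use: the target is a dict.
dict_stack = [{}, "weight", 1.0]
unchecked_setitem(dict_stack)

# Type confusion: the target is a tensor-like object, and the write still happens.
tensor_stack = [FakeTensor(4), 2, 1337.0]
unchecked_setitem(tensor_stack)
```

After both calls, `dict_stack[0]` holds the expected key, and `tensor_stack[0].storage[2]` holds the attacker-chosen value, even though no dictionary was involved.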

### Attack Flow

1. Attacker crafts a `.pth` (zip) file containing a pickle payload
2. The pickle uses `GLOBAL` + `REDUCE` to construct a Tensor via `_rebuild_tensor_v2` (this is allowed)
3. Before the Tensor is consumed by a dict SETITEM, the pickle inserts additional `SETITEMS` opcodes that target the Tensor on the stack
4. Each `(index, value)` pair consumed by `SETITEMS` executes `tensor[index] = value`, writing to heap memory
5. The victim loads this file with `torch.load("file.pth", weights_only=True)`. The `weights_only=True` flag is supposed to prevent code execution, but the crafted stream bypasses that protection
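
The opcode sequence in steps 2 to 4 can be illustrated with Python's standard `pickle` module, whose `SETITEMS` handling is likewise not restricted to dictionaries. This is a sketch by analogy only: it uses the stdlib unpickler rather than PyTorch's restricted one, and a plain `list` stands in for the Tensor. The hand-assembled `payload` is an assumption made for illustration, not the exploit from this report.

```python
import pickle

# Hand-assembled pickle stream (protocol 2) using the opcode constants from `pickle`.
payload = b"".join([
    pickle.PROTO + b"\x02",
    pickle.EMPTY_LIST,           # push []            (stand-in for the Tensor)
    pickle.MARK,
    pickle.BININT1 + b"\x00",
    pickle.BININT1 + b"\x00",
    pickle.APPENDS,              # list becomes [0, 0]
    pickle.MARK,
    pickle.BININT1 + b"\x00",    # index 0
    pickle.BININT1 + b"\x41",    # value 65
    pickle.BININT1 + b"\x01",    # index 1
    pickle.BININT1 + b"\x42",    # value 66
    pickle.SETITEMS,             # target[0] = 65; target[1] = 66 on a non-dict
    pickle.STOP,
])

result = pickle.loads(payload)
```

Here `result` is the list with both elements overwritten by `SETITEMS`. In the real attack the target is a Tensor built through `_rebuild_tensor_v2`, so the same opcode writes into tensor storage instead of list slots.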

### Fix

The fix in PyTorch 2.10.0 adds type checking to `SETITEM`/`SETITEMS`, ensuring they can only operate on dictionary types (`dict`, `OrderedDict`), not on Tensor or other arbitrary objects.

- **CVE:** [CVE-2026-24747](https://nvd.nist.gov/vuln/detail/CVE-2026-24747)
- **Advisory:** [GHSA-63cw-57p8-fm3p](https://github.com/pytorch/pytorch/security/advisories/GHSA-63cw-57p8-fm3p)
- **Fix release:** [PyTorch v2.10.0](https://github.com/pytorch/pytorch/releases/tag/v2.10.0)
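
The effect of such a type check can be sketched in isolation. The snippet below is a simplified, standalone model rather than the patched PyTorch source: `checked_setitem` is a hypothetical helper that mirrors the exact-type test (`type(...) not in (dict, OrderedDict)`), and a `list` stands in for a Tensor.

```python
from collections import OrderedDict
from pickle import UnpicklingError


def checked_setitem(stack):
    """Sketch of a guarded SETITEM: only exact dict/OrderedDict targets are accepted."""
    v, k = stack.pop(), stack.pop()
    target = stack[-1]
    if type(target) not in (dict, OrderedDict):
        raise UnpicklingError(
            f"Can only SETITEM on dict/OrderedDict, but got {type(target)}"
        )
    target[k] = v


# A dictionary target is still populated as before.
ok_stack = [OrderedDict(), "weight", 1.0]
checked_setitem(ok_stack)

# A non-dict target is rejected before any write happens.
bad_stack = [[0.0, 0.0], 0, 1337.0]
try:
    checked_setitem(bad_stack)
    rejected = False
except UnpicklingError:
    rejected = True
```

Because the test uses `type(...)` rather than `isinstance`, subclasses of `dict` are also rejected in this sketch, which keeps objects with a custom `__setitem__` out of the handler.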

## Reproduction Steps

1. Run `repro/reproduction_steps.sh`, which:
   - Installs PyTorch 2.9.1 (CPU, vulnerable version)
   - Verifies the vulnerable SETITEM handler has no type check
   - Crafts a malicious `.pth` checkpoint file with pickle bytecode that:
     - Constructs a Tensor via the allowed `_rebuild_tensor_v2` path
     - Uses `SETITEMS` opcode to write 10 attacker-controlled float values to the Tensor
   - Loads the malicious checkpoint with `torch.load(..., weights_only=True)`
   - Verifies that all 10 controlled values were written successfully

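2. Expected output: `VULNERABILITY_CONFIRMED`, with all 10 magic values matching (1337.0, 31337.0, 42.0, 0xDEAD, 0xBEEF, 0xCAFE, 0xBABE, 0xFACE, 9999.99, 12345.0). The hexadecimal constants are stored as floats, so 0xDEAD appears in the tensor as 57005.0 and 0xBEEF as 48879.0.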
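
Because the values land in a `float32` tensor, any comparison against them has to account for single-precision rounding. The sketch below uses only the standard library to show which of the listed magic values survive a float32 round trip exactly. `as_float32` is a hypothetical helper, and the reproduction script's own comparison logic is not shown in this report.

```python
import struct


def as_float32(value):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


magic = [1337.0, 31337.0, 42.0, 0xDEAD, 0xBEEF, 0xCAFE, 0xBABE, 0xFACE, 9999.99, 12345.0]
stored = [as_float32(float(v)) for v in magic]

# Integers below 2**24 are exact in float32; 9999.99 is not.
inexact = [v for v, s in zip(magic, stored) if float(v) != s]
```

Only `9999.99` ends up in `inexact`, so a verification step would need either a small tolerance or a comparison performed in float32 for that entry.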

## Evidence

### Vulnerable Code Path

The `SETITEM`/`SETITEMS` handlers in the weights_only unpickler at `torch/_weights_only_unpickler.py` perform `self.stack[-1][k] = v` without any type check. This allows calling `Tensor.__setitem__()` through the pickle stream.

### Exploit Output

```
[+] torch.load succeeded!
[+] Result type: <class 'dict'>
[+] Keys: ['malicious_weights']
[+] Tensor shape: torch.Size([10]), dtype: torch.float32
[+] Tensor values: tensor([1.3370e+03, 3.1337e+04, 4.2000e+01, 5.7005e+04, 4.8879e+04, ...])

[*] Verifying attacker-controlled memory writes:
    tensor[0] =       1337.0  (expected       1337.0) [MATCH]
    tensor[1] =      31337.0  (expected      31337.0) [MATCH]
    tensor[2] =         42.0  (expected         42.0) [MATCH]
    ...all 10 values MATCH...

[+] VULNERABILITY CONFIRMED: CVE-2026-24747
[+] SETITEMS opcode called __setitem__ on a Tensor object
[+] without any type check in the weights_only unpickler.
[+] Attacker wrote 10 controlled values to tensor memory.
```

### Environment

- PyTorch 2.9.1+cpu
- Python 3.12
- CPU-only (no CUDA required)
- Linux x86_64

## Recommendations / Next Steps

1. **Immediate fix:** Upgrade PyTorch to version 2.10.0 or later
2. **Fix approach:** Add type checking to `SETITEM`/`SETITEMS` handlers:
   ```python
   elif key[0] == SETITEM[0]:
       (v, k) = (self.stack.pop(), self.stack.pop())
       if type(self.stack[-1]) not in (dict, OrderedDict):
           raise UnpicklingError(
               f"Can only SETITEM on dict/OrderedDict, but got {type(self.stack[-1])}"
           )
       self.stack[-1][k] = v
   ```
3. **Defense in depth:** Organizations should validate the integrity (hash verification) of all `.pth` checkpoint files before loading
4. **Consider migration:** Use `safetensors` format for model distribution instead of pickle-based `.pth` files
5. **Variant analysis:** The `BUILD` opcode handler may also warrant review. Calling `__dict__.update(state)` on an `OrderedDict` instance cannot override `__setitem__` (a C-slot method), but other `BUILD` targets or newly allowlisted types could introduce similar bypass opportunities
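
The hash verification in recommendation 3 could look roughly like the following. This is a minimal sketch under stated assumptions: `sha256_of_file` and `verify_checkpoint` are hypothetical helper names, and the expected digest is assumed to come from a trusted channel separate from the checkpoint itself.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large checkpoints are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_sha256):
    """Return the path if its digest matches the pinned value, else raise ValueError."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return path
```

A caller would run `verify_checkpoint(path, pinned_digest)` first and pass the returned path to `torch.load(..., weights_only=True)` only if no exception is raised. This detects tampering in transit, but it does not help if the publisher's original file is itself malicious.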

## Additional Notes

- **Idempotency:** The reproduction script runs consistently on repeated executions. Confirmed with two consecutive successful runs.
- **Limitations:** 
  - The exploit demonstrates the memory corruption primitive (controlled writes to tensor storage). Converting this to direct arbitrary code execution would require heap spraying techniques that are environment-dependent and non-deterministic.
  - The CVE itself classifies this as "can corrupt memory and **potentially** lead to arbitrary code execution" — the memory corruption primitive is the core vulnerability.
  - The `weights_only=True` parameter was specifically designed as a safety measure against pickle-based attacks, making this bypass particularly impactful from a trust boundary perspective.
- **Storage size mismatch:** The CVE also mentions "storage size mismatch between declared element count and actual data." This is validated at the storage loading level by PyTorch's zip reader, but the SETITEM bypass is independently exploitable.


## Reproduction Details
Reproduced: 2026-03-02T08:52:57.271Z
Duration: 2888 seconds
Tool calls: 178
Turns: Unknown
Handoffs: 3


## Quick Verification
Run one of these commands to verify locally:

    pruva-verify REPRO-2026-00119
    pruva-verify GHSA-63cw-57p8-fm3p
    pruva-verify CVE-2026-24747

Or open in GitHub Codespaces (zero-friction, auto-runs):

    https://github.com/codespaces/new?ref=repro/REPRO-2026-00119&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

    curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00119/artifacts/repro/reproduction_steps.sh
    chmod +x reproduction_steps.sh
    ./reproduction_steps.sh

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References
- GitHub Advisory: https://github.com/advisories/GHSA-63cw-57p8-fm3p
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-24747


## Artifacts
- repro/rca_report.md (analysis, 7962 bytes)
- repro/reproduction_steps.sh (reproduction_script, 9248 bytes)
- vuln_variant/rca_report.md (analysis, 10523 bytes)
- vuln_variant/reproduction_steps.sh (reproduction_script, 10335 bytes)
- coding/proposed_fix.diff (patch, 3724 bytes)
- bundle/AGENTS.repro.md (documentation, 1537 bytes)
- bundle/ticket.md (ticket, 3630 bytes)
- vuln_variant/patch_analysis.md (documentation, 5623 bytes)
- vuln_variant/create_exploit.py (script, 6639 bytes)
- logs/variant_run2.log (log, 5191 bytes)
- logs/variant_run1.log (log, 5191 bytes)
- logs/variant_vuln.log (log, 1808 bytes)
- logs/variant_fixed.log (log, 1821 bytes)
- coding/verify_fix.sh (other, 9364 bytes)
- coding/summary_report.md (documentation, 6486 bytes)

## API Access
- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00119
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00119/artifacts/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00119

## For AI Assistants
To help users with this reproduction:
1. The `pruva-verify` command is the simplest way to run the reproduction
2. The script downloads and runs automatically in an isolated temp directory
3. Always recommend running in a VM, container, or disposable environment
4. The root cause explains the technical vulnerability

---
Generated by Pruva | https://pruva.dev