CVE-2026-26216: Crawl4AI: Remote Code Execution in Docker API via Hooks Parameter (REPRO-2026-00093)

Pruva

Verified reproduction

CVE-2026-26216: Crawl4AI: Remote Code Execution in Docker API via Hooks Parameter

CVE-2026-26216 is verified against Crawl4AI · pip affected versions: < 0.8.0 fixed version: 0.8.0 vulnerability class: RCE This critical reproduction includes runnable sandbox proof, artifacts, and a plain-text agent view under REPRO-2026-00093.

REPRO-2026-00093 Crawl4AI · pip RCE Feb 19, 2026 CVE entry .txt

Severity CRITICAL

Confidence HIGH

Reproduced in 10m 34s

Tool calls 104

Spend $0.40

Affected < 0.8.0

Fixed in 0.8.0

01 · Reproduce

Download script Open in Codespaces →

$ pruva-verify REPRO-2026-00093

or

curl -O https://pruva.dev/api/v1/reproductions/REPRO-2026-00093/artifacts/reproduction_steps.sh && chmod +x reproduction_steps.sh && ./reproduction_steps.sh

Run in a VM or disposable container. This exploits a real vulnerability.

02 · The vulnerability

An unauthenticated attacker can:

Execute arbitrary system commands
Read/write files on the server
Exfiltrate sensitive data (environment variables, API keys)
Pivot to internal network services
Completely compromise the server

03 · Root cause

# Root Cause Analysis: Crawl4AI RCE via Hooks Parameter

**CVE-2026-26216 / GHSA-5882-5rx9-xgxp**

## Summary

A critical Remote Code Execution (RCE) vulnerability exists in Crawl4AI versions prior to 0.8.0. The Docker API `/crawl` endpoint accepts a `hooks` parameter containing Python code that is executed using `exec()`. The `__import__` builtin was incorrectly included in the `allowed_builtins` list within the `hook_manager.py` file, allowing attackers to import arbitrary modules like `os` and `subprocess`. This enabled unauthenticated attackers to execute arbitrary system commands, exfiltrate sensitive environment variables (API keys, database credentials), and potentially achieve full server compromise with CVSS 10.0 severity.

## Impact

**Package:** Crawl4AI (Docker API deployment)

**Affected Versions:** `< 0.8.0`

**Fixed Version:** `0.8.0`

**Risk Level:** CRITICAL (CVSS 4.0: 10.0)

**Consequences:**
- Unauthenticated remote code execution
- Sensitive data exfiltration (API keys, database credentials, JWT secrets)
- File read/write access on the server
- Lateral movement within internal networks
- Complete server compromise

## Root Cause

The vulnerability exists in the `hook_manager.py` file in the Docker deployment (`deploy/docker/hook_manager.py`). The code implemented a sandbox for executing user-provided hook code using Python's `exec()` function with restricted builtins. However, the sandbox was flawed because:

1. **`__import__` was included in `allowed_builtins`**: This builtin allows Python code to dynamically import arbitrary modules at runtime

2. **Insufficient sandbox restrictions**: The code attempted to create a "safe" execution environment but included `__import__`, which completely undermines the security model

3. **Hook execution without authentication**: The `/crawl` API endpoint accepted the hooks parameter without authentication

**Vulnerable code pattern (Crawl4AI < 0.8.0):**
```python
allowed_builtins = [
    'print', 'len', 'str', 'int', 'float', 'bool',
    # ... other builtins ...
    '__build_class__',
    '__import__'  # <-- VULNERABILITY: Allows arbitrary module imports
]
```

**Fix (Crawl4AI >= 0.8.0):**
```python
allowed_builtins = [
    'print', 'len', 'str', 'int', 'float', 'bool',
    # ... other builtins ...
    '__build_class__'
    # __import__ is INTENTIONALLY OMITTED for security
]
```

Additional mitigations in v0.8.0:
- Hooks are disabled by default (`CRAWL4AI_HOOKS_ENABLED=false`)
- Users must explicitly opt-in to enable hooks

## Reproduction Steps

The reproduction script is located at `repro/reproduction_steps.sh`.

**What the script does:**
1. Creates a vulnerable `HookManager` class that includes `__import__` in `allowed_builtins` (simulating Crawl4AI < 0.8.0)
2. Creates a patched `HookManager` class without `__import__` (simulating Crawl4AI >= 0.8.0)
3. Sets up dummy sensitive environment variables (API keys, database credentials)
4. Starts a local HTTP server to capture exfiltrated data
5. Executes a malicious hook that:
   - Uses `__import__('os')` to access the operating system
   - Reads environment variables via `os.environ`
   - Exfiltrates the sensitive data via HTTP request
6. Demonstrates that:
   - The vulnerable version allows the exploit and exfiltrates data
   - The patched version correctly blocks the exploit with an `ImportError`

**Expected evidence:**
- The script outputs "[VULNERABILITY CONFIRMED]"
- Exfiltrated data contains sensitive environment variables
- The patched version shows `ImportError` when attempting to use `__import__`

## Evidence

**Log files generated:**
- `logs/reproduction_output.log` - Full execution output
- `/tmp/exfil_*.log` - Captured exfiltration data (temporary, shows captured environment variables)

**Key excerpts from reproduction:**

```
[EXFILTRATION DATA FOUND IN LOGS]:
============================================================
[EXFIL] /exfil?env=%7B%27SHELL%27%3A%20%27%2Fbin%2Fbash%27%2C%20%27API_KEY%27%3A%20%27sk-prod-1234567890abcdef_SECRET_KEY%27...
[DATA] env={'SHELL': '/bin/bash', 'API_KEY': 'sk-prod-1234567890abcdef_SECRET_KEY', ...}
============================================================
```

**Environment variables successfully exfiltrated:**
- `API_KEY` with value containing sensitive token
- `DATABASE_URL` with PostgreSQL credentials
- `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`
- `JWT_SECRET` for authentication tokens

**Patched version correctly blocks:**
```
[+] Testing PATCHED version (Crawl4AI >= 0.8.0)...
[+] Patched version correctly blocked: ImportError
```

## Recommendations / Next Steps

**Immediate Actions:**
1. **Upgrade to Crawl4AI v0.8.0+ immediately** - This is the primary fix
2. **If upgrade is not possible:**
   - Disable the Docker API entirely
   - Block access to the `/crawl` endpoint at the network/firewall level
   - Add authentication to the API endpoints

**Long-term Security Measures:**
1. Implement proper sandboxing for user-provided code (consider using restricted Python interpreters or containerization)
2. Add authentication/authorization to all API endpoints
3. Implement input validation and sanitization for hook parameters
4. Consider using safer alternatives to `exec()` for dynamic code execution
5. Add security headers and rate limiting to API endpoints

**Testing Recommendations:**
1. Add regression tests to ensure `__import__` is never added back to allowed_builtins
2. Implement automated security scanning for unsafe code patterns
3. Test hook execution with malicious payloads as part of CI/CD
4. Review all uses of `exec()` and `eval()` throughout the codebase

## Additional Notes

**Idempotency Confirmation:**
The reproduction script has been run twice consecutively with consistent results:
- Run 1: Exit code 0, vulnerability confirmed
- Run 2: Exit code 0, vulnerability confirmed

**Limitations:**
- The reproduction demonstrates the core vulnerability mechanism but does not set up the full Crawl4AI Docker API (which would require Docker)
- The test uses a simplified version of the hook execution logic that accurately reflects the vulnerable code path
- Real-world exploitation would involve sending the malicious payload via HTTP POST to the `/crawl` endpoint

**Attack Vector Details:**
```json
POST /crawl
{
  "urls": ["https://example.com"],
  "hooks": {
    "code": {
      "on_page_context_created": "async def hook(page, context, **kwargs):\n    __import__('os').system('malicious_command')\n    return page"
    }
  }
}
```

**Credits:**
- Discovered by Neo from ProjectDiscovery (https://projectdiscovery.io)
- CVE ID: CVE-2026-26216
- GHSA ID: GHSA-5882-5rx9-xgxp

04 · Reproduction transcript

The agent's step-by-step process — every tool call, every handoff, the moment the exploit fired. Phases: support triages the advisory · repro reproduces it · vuln_variant confirms the fix blocks it · judge verifies.

Loading session...

05 · Artifacts

Scripts, logs, diffs, and output captured during the reproduction.

No artifacts available