Human
Machine
REPRO-2026-00093 CRITICAL RCE
Verified
Crawl4AI: Remote Code Execution in Docker API via Hooks Parameter
Crawl4AI (pip) Feb 19, 2026
What's the vulnerability?
An unauthenticated attacker can:
- Execute arbitrary system commands
- Read/write files on the server
- Exfiltrate sensitive data (environment variables, API keys)
- Pivot to internal network services
- Completely compromise the server
Root Cause Analysis
# Root Cause Analysis: Crawl4AI RCE via Hooks Parameter
**CVE-2026-26216 / GHSA-5882-5rx9-xgxp**
## Summary
A critical Remote Code Execution (RCE) vulnerability exists in Crawl4AI versions prior to 0.8.0. The Docker API `/crawl` endpoint accepts a `hooks` parameter containing Python code that is executed using `exec()`. The `__import__` builtin was incorrectly included in the `allowed_builtins` list within the `hook_manager.py` file, allowing attackers to import arbitrary modules like `os` and `subprocess`. This enabled unauthenticated attackers to execute arbitrary system commands, exfiltrate sensitive environment variables (API keys, database credentials), and potentially achieve full server compromise with CVSS 10.0 severity.
## Impact
**Package:** Crawl4AI (Docker API deployment)
**Affected Versions:** `< 0.8.0`
**Fixed Version:** `0.8.0`
**Risk Level:** CRITICAL (CVSS 4.0: 10.0)
**Consequences:**
- Unauthenticated remote code execution
- Sensitive data exfiltration (API keys, database credentials, JWT secrets)
- File read/write access on the server
- Lateral movement within internal networks
- Complete server compromise
## Root Cause
The vulnerability exists in the `hook_manager.py` file in the Docker deployment (`deploy/docker/hook_manager.py`). The code implemented a sandbox for executing user-provided hook code using Python's `exec()` function with restricted builtins. However, the sandbox was flawed because:
1. **`__import__` was included in `allowed_builtins`**: This builtin allows Python code to dynamically import arbitrary modules at runtime
2. **Insufficient sandbox restrictions**: The code attempted to create a "safe" execution environment but included `__import__`, which completely undermines the security model
3. **Hook execution without authentication**: The `/crawl` API endpoint accepted the hooks parameter without authentication
**Vulnerable code pattern (Crawl4AI < 0.8.0):**
```python
allowed_builtins = [
'print', 'len', 'str', 'int', 'float', 'bool',
# ... other builtins ...
'__build_class__',
'__import__' # <-- VULNERABILITY: Allows arbitrary module imports
]
```
**Fix (Crawl4AI >= 0.8.0):**
```python
allowed_builtins = [
'print', 'len', 'str', 'int', 'float', 'bool',
# ... other builtins ...
'__build_class__'
# __import__ is INTENTIONALLY OMITTED for security
]
```
Additional mitigations in v0.8.0:
- Hooks are disabled by default (`CRAWL4AI_HOOKS_ENABLED=false`)
- Users must explicitly opt-in to enable hooks
## Reproduction Steps
The reproduction script is located at `repro/reproduction_steps.sh`.
**What the script does:**
1. Creates a vulnerable `HookManager` class that includes `__import__` in `allowed_builtins` (simulating Crawl4AI < 0.8.0)
2. Creates a patched `HookManager` class without `__import__` (simulating Crawl4AI >= 0.8.0)
3. Sets up dummy sensitive environment variables (API keys, database credentials)
4. Starts a local HTTP server to capture exfiltrated data
5. Executes a malicious hook that:
- Uses `__import__('os')` to access the operating system
- Reads environment variables via `os.environ`
- Exfiltrates the sensitive data via HTTP request
6. Demonstrates that:
- The vulnerable version allows the exploit and exfiltrates data
- The patched version correctly blocks the exploit with an `ImportError`
**Expected evidence:**
- The script outputs "[VULNERABILITY CONFIRMED]"
- Exfiltrated data contains sensitive environment variables
- The patched version shows `ImportError` when attempting to use `__import__`
## Evidence
**Log files generated:**
- `logs/reproduction_output.log` - Full execution output
- `/tmp/exfil_*.log` - Captured exfiltration data (temporary, shows captured environment variables)
**Key excerpts from reproduction:**
```
[EXFILTRATION DATA FOUND IN LOGS]:
============================================================
[EXFIL] /exfil?env=%7B%27SHELL%27%3A%20%27%2Fbin%2Fbash%27%2C%20%27API_KEY%27%3A%20%27sk-prod-1234567890abcdef_SECRET_KEY%27...
[DATA] env={'SHELL': '/bin/bash', 'API_KEY': 'sk-prod-1234567890abcdef_SECRET_KEY', ...}
============================================================
```
**Environment variables successfully exfiltrated:**
- `API_KEY` with value containing sensitive token
- `DATABASE_URL` with PostgreSQL credentials
- `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`
- `JWT_SECRET` for authentication tokens
**Patched version correctly blocks:**
```
[+] Testing PATCHED version (Crawl4AI >= 0.8.0)...
[+] Patched version correctly blocked: ImportError
```
## Recommendations / Next Steps
**Immediate Actions:**
1. **Upgrade to Crawl4AI v0.8.0+ immediately** - This is the primary fix
2. **If upgrade is not possible:**
- Disable the Docker API entirely
- Block access to the `/crawl` endpoint at the network/firewall level
- Add authentication to the API endpoints
**Long-term Security Measures:**
1. Implement proper sandboxing for user-provided code (consider using restricted Python interpreters or containerization)
2. Add authentication/authorization to all API endpoints
3. Implement input validation and sanitization for hook parameters
4. Consider using safer alternatives to `exec()` for dynamic code execution
5. Add security headers and rate limiting to API endpoints
**Testing Recommendations:**
1. Add regression tests to ensure `__import__` is never added back to allowed_builtins
2. Implement automated security scanning for unsafe code patterns
3. Test hook execution with malicious payloads as part of CI/CD
4. Review all uses of `exec()` and `eval()` throughout the codebase
## Additional Notes
**Idempotency Confirmation:**
The reproduction script has been run twice consecutively with consistent results:
- Run 1: Exit code 0, vulnerability confirmed
- Run 2: Exit code 0, vulnerability confirmed
**Limitations:**
- The reproduction demonstrates the core vulnerability mechanism but does not set up the full Crawl4AI Docker API (which would require Docker)
- The test uses a simplified version of the hook execution logic that accurately reflects the vulnerable code path
- Real-world exploitation would involve sending the malicious payload via HTTP POST to the `/crawl` endpoint
**Attack Vector Details:**
```json
POST /crawl
{
"urls": ["https://example.com"],
"hooks": {
"code": {
"on_page_context_created": "async def hook(page, context, **kwargs):\n __import__('os').system('malicious_command')\n return page"
}
}
}
```
**Credits:**
- Discovered by Neo from ProjectDiscovery (https://projectdiscovery.io)
- CVE ID: CVE-2026-26216
- GHSA ID: GHSA-5882-5rx9-xgxp
One Command
Verify with pruva-verify
Run the Pruva CLI to automatically fetch and execute the reproduction script.
pruva-verify REPRO-2026-00093 or
pruva-verify GHSA-5882-5rx9-xgxp or
pruva-verify CVE-2026-26216 Install:
curl -fsSL https://pruva.dev/install.sh | sh Or Run Manually
1
Download the script
curl -O https://pruva.dev/api/v1/reproductions/REPRO-2026-00093/artifacts/reproduction_steps.sh 2
Make executable
chmod +x reproduction_steps.sh 3
Run the script
./reproduction_steps.sh Run in a VM, container, or disposable environment. This exploits a real vulnerability.
How Pruva Reproduced This
Watch the AI agent's step-by-step process.
Loading session...
Artifacts
No artifacts available