# REPRO-2026-00093: Crawl4AI: Remote Code Execution in Docker API via Hooks Parameter

## Summary
Status: published
Severity: critical
Type: security
Confidence: Unknown

## Identifiers
REPRO ID: REPRO-2026-00093
GHSA: GHSA-5882-5rx9-xgxp
CVE: CVE-2026-26216

## Package
Name: Crawl4AI
Ecosystem: pip
Affected: < 0.8.0
Fixed: 0.8.0

## Root Cause
# Root Cause Analysis: Crawl4AI RCE via Hooks Parameter

**CVE-2026-26216 / GHSA-5882-5rx9-xgxp**

## Summary

A critical Remote Code Execution (RCE) vulnerability exists in Crawl4AI versions prior to 0.8.0. The Docker API `/crawl` endpoint accepts a `hooks` parameter containing Python code that is executed using `exec()`. The `__import__` builtin was incorrectly included in the `allowed_builtins` list within the `hook_manager.py` file, allowing attackers to import arbitrary modules like `os` and `subprocess`. This enabled unauthenticated attackers to execute arbitrary system commands, exfiltrate sensitive environment variables (API keys, database credentials), and potentially achieve full server compromise with CVSS 10.0 severity.

## Impact

**Package:** Crawl4AI (Docker API deployment)

**Affected Versions:** `< 0.8.0`

**Fixed Version:** `0.8.0`

**Risk Level:** CRITICAL (CVSS 4.0: 10.0)

**Consequences:**
- Unauthenticated remote code execution
- Sensitive data exfiltration (API keys, database credentials, JWT secrets)
- File read/write access on the server
- Lateral movement within internal networks
- Complete server compromise

## Root Cause

The vulnerability exists in the `hook_manager.py` file in the Docker deployment (`deploy/docker/hook_manager.py`). The code implemented a sandbox for executing user-provided hook code using Python's `exec()` function with restricted builtins. However, the sandbox was flawed because:

1. **`__import__` was included in `allowed_builtins`**: This builtin allows Python code to dynamically import arbitrary modules at runtime

2. **Insufficient sandbox restrictions**: The code attempted to create a "safe" execution environment but included `__import__`, which completely undermines the security model

3. **Hook execution without authentication**: The `/crawl` API endpoint accepted the hooks parameter without authentication

**Vulnerable code pattern (Crawl4AI < 0.8.0):**
```python
allowed_builtins = [
    'print', 'len', 'str', 'int', 'float', 'bool',
    # ... other builtins ...
    '__build_class__',
    '__import__'  # <-- VULNERABILITY: Allows arbitrary module imports
]
```

**Fix (Crawl4AI >= 0.8.0):**
```python
allowed_builtins = [
    'print', 'len', 'str', 'int', 'float', 'bool',
    # ... other builtins ...
    '__build_class__'
    # __import__ is INTENTIONALLY OMITTED for security
]
```

Additional mitigations in v0.8.0:
- Hooks are disabled by default (`CRAWL4AI_HOOKS_ENABLED=false`)
- Users must explicitly opt-in to enable hooks

## Reproduction Steps

The reproduction script is located at `repro/reproduction_steps.sh`.

**What the script does:**
1. Creates a vulnerable `HookManager` class that includes `__import__` in `allowed_builtins` (simulating Crawl4AI < 0.8.0)
2. Creates a patched `HookManager` class without `__import__` (simulating Crawl4AI >= 0.8.0)
3. Sets up dummy sensitive environment variables (API keys, database credentials)
4. Starts a local HTTP server to capture exfiltrated data
5. Executes a malicious hook that:
   - Uses `__import__('os')` to access the operating system
   - Reads environment variables via `os.environ`
   - Exfiltrates the sensitive data via HTTP request
6. Demonstrates that:
   - The vulnerable version allows the exploit and exfiltrates data
   - The patched version correctly blocks the exploit with an `ImportError`

**Expected evidence:**
- The script outputs "[VULNERABILITY CONFIRMED]"
- Exfiltrated data contains sensitive environment variables
- The patched version shows `ImportError` when attempting to use `__import__`

## Evidence

**Log files generated:**
- `logs/reproduction_output.log` - Full execution output
- `/tmp/exfil_*.log` - Captured exfiltration data (temporary, shows captured environment variables)

**Key excerpts from reproduction:**

```
[EXFILTRATION DATA FOUND IN LOGS]:
============================================================
[EXFIL] /exfil?env=%7B%27SHELL%27%3A%20%27%2Fbin%2Fbash%27%2C%20%27API_KEY%27%3A%20%27sk-prod-1234567890abcdef_SECRET_KEY%27...
[DATA] env={'SHELL': '/bin/bash', 'API_KEY': 'sk-prod-1234567890abcdef_SECRET_KEY', ...}
============================================================
```

**Environment variables successfully exfiltrated:**
- `API_KEY` with value containing sensitive token
- `DATABASE_URL` with PostgreSQL credentials
- `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`
- `JWT_SECRET` for authentication tokens

**Patched version correctly blocks:**
```
[+] Testing PATCHED version (Crawl4AI >= 0.8.0)...
[+] Patched version correctly blocked: ImportError
```

## Recommendations / Next Steps

**Immediate Actions:**
1. **Upgrade to Crawl4AI v0.8.0+ immediately** - This is the primary fix
2. **If upgrade is not possible:**
   - Disable the Docker API entirely
   - Block access to the `/crawl` endpoint at the network/firewall level
   - Add authentication to the API endpoints

**Long-term Security Measures:**
1. Implement proper sandboxing for user-provided code (consider using restricted Python interpreters or containerization)
2. Add authentication/authorization to all API endpoints
3. Implement input validation and sanitization for hook parameters
4. Consider using safer alternatives to `exec()` for dynamic code execution
5. Add security headers and rate limiting to API endpoints

**Testing Recommendations:**
1. Add regression tests to ensure `__import__` is never added back to allowed_builtins
2. Implement automated security scanning for unsafe code patterns
3. Test hook execution with malicious payloads as part of CI/CD
4. Review all uses of `exec()` and `eval()` throughout the codebase

## Additional Notes

**Idempotency Confirmation:**
The reproduction script has been run twice consecutively with consistent results:
- Run 1: Exit code 0, vulnerability confirmed
- Run 2: Exit code 0, vulnerability confirmed

**Limitations:**
- The reproduction demonstrates the core vulnerability mechanism but does not set up the full Crawl4AI Docker API (which would require Docker)
- The test uses a simplified version of the hook execution logic that accurately reflects the vulnerable code path
- Real-world exploitation would involve sending the malicious payload via HTTP POST to the `/crawl` endpoint

**Attack Vector Details:**
```json
POST /crawl
{
  "urls": ["https://example.com"],
  "hooks": {
    "code": {
      "on_page_context_created": "async def hook(page, context, **kwargs):\n    __import__('os').system('malicious_command')\n    return page"
    }
  }
}
```

**Credits:**
- Discovered by Neo from ProjectDiscovery (https://projectdiscovery.io)
- CVE ID: CVE-2026-26216
- GHSA ID: GHSA-5882-5rx9-xgxp


## Reproduction Details
Reproduced: 2026-02-19T19:19:50.265Z
Duration: 635 seconds
Tool calls: 104
Turns: 71
Handoffs: 2


## Quick Verification
Run one of these commands to verify locally:

    pruva-verify REPRO-2026-00093
    pruva-verify GHSA-5882-5rx9-xgxp
    pruva-verify CVE-2026-26216

Or open in GitHub Codespaces (zero-friction, auto-runs):

    https://github.com/codespaces/new?ref=repro/REPRO-2026-00093&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

    curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00093/artifacts/repro/reproduction_steps.sh
    chmod +x reproduction_steps.sh
    ./reproduction_steps.sh

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References
- GitHub Advisory: https://github.com/advisories/GHSA-5882-5rx9-xgxp
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-26216
- Source: https://github.com/advisories/GHSA-5882-5rx9-xgxp

## Artifacts
- repro/rca_report.md (analysis, 6644 bytes)
- repro/reproduction_steps.sh (reproduction_script, 1101 bytes)
- bundle/source.json (other, 3673 bytes)
- bundle/ticket.md (ticket, 2515 bytes)
- bundle/ticket.json (other, 6955 bytes)
- repro/crawl4ai_rce_demo.py (script, 11252 bytes)
- logs/reproduction_output.log (log, 3242 bytes)
- logs/exfil_server.log (log, 36 bytes)
- logs/docker_run.log (log, 194 bytes)
- logs/bypass_test.log (log, 2838 bytes)

## API Access
- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00093
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00093/artifacts/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00093

## For AI Assistants
To help users with this reproduction:
1. The `pruva-verify` command is the simplest way to run the reproduction
2. The script downloads and runs automatically in an isolated temp directory
3. Always recommend running in a VM, container, or disposable environment
4. The root cause explains the technical vulnerability

---
Generated by Pruva | https://pruva.dev