# REPRO-2026-00093: Crawl4AI: Remote Code Execution in Docker API via Hooks Parameter ## Summary Status: published Severity: critical Type: security Confidence: Unknown ## Identifiers REPRO ID: REPRO-2026-00093 GHSA: GHSA-5882-5rx9-xgxp CVE: CVE-2026-26216 ## Package Name: Crawl4AI Ecosystem: pip Affected: < 0.8.0 Fixed: 0.8.0 ## Root Cause # Root Cause Analysis: Crawl4AI RCE via Hooks Parameter **CVE-2026-26216 / GHSA-5882-5rx9-xgxp** ## Summary A critical Remote Code Execution (RCE) vulnerability exists in Crawl4AI versions prior to 0.8.0. The Docker API `/crawl` endpoint accepts a `hooks` parameter containing Python code that is executed using `exec()`. The `__import__` builtin was incorrectly included in the `allowed_builtins` list within the `hook_manager.py` file, allowing attackers to import arbitrary modules like `os` and `subprocess`. This enabled unauthenticated attackers to execute arbitrary system commands, exfiltrate sensitive environment variables (API keys, database credentials), and potentially achieve full server compromise with CVSS 10.0 severity. ## Impact **Package:** Crawl4AI (Docker API deployment) **Affected Versions:** `< 0.8.0` **Fixed Version:** `0.8.0` **Risk Level:** CRITICAL (CVSS 4.0: 10.0) **Consequences:** - Unauthenticated remote code execution - Sensitive data exfiltration (API keys, database credentials, JWT secrets) - File read/write access on the server - Lateral movement within internal networks - Complete server compromise ## Root Cause The vulnerability exists in the `hook_manager.py` file in the Docker deployment (`deploy/docker/hook_manager.py`). The code implemented a sandbox for executing user-provided hook code using Python's `exec()` function with restricted builtins. However, the sandbox was flawed because: 1. **`__import__` was included in `allowed_builtins`**: This builtin allows Python code to dynamically import arbitrary modules at runtime 2. **Insufficient sandbox restrictions**: The code attempted to create a "safe" execution environment but included `__import__`, which completely undermines the security model 3. **Hook execution without authentication**: The `/crawl` API endpoint accepted the hooks parameter without authentication **Vulnerable code pattern (Crawl4AI < 0.8.0):** ```python allowed_builtins = [ 'print', 'len', 'str', 'int', 'float', 'bool', # ... other builtins ... '__build_class__', '__import__' # <-- VULNERABILITY: Allows arbitrary module imports ] ``` **Fix (Crawl4AI >= 0.8.0):** ```python allowed_builtins = [ 'print', 'len', 'str', 'int', 'float', 'bool', # ... other builtins ... '__build_class__' # __import__ is INTENTIONALLY OMITTED for security ] ``` Additional mitigations in v0.8.0: - Hooks are disabled by default (`CRAWL4AI_HOOKS_ENABLED=false`) - Users must explicitly opt-in to enable hooks ## Reproduction Steps The reproduction script is located at `repro/reproduction_steps.sh`. **What the script does:** 1. Creates a vulnerable `HookManager` class that includes `__import__` in `allowed_builtins` (simulating Crawl4AI < 0.8.0) 2. Creates a patched `HookManager` class without `__import__` (simulating Crawl4AI >= 0.8.0) 3. Sets up dummy sensitive environment variables (API keys, database credentials) 4. Starts a local HTTP server to capture exfiltrated data 5. Executes a malicious hook that: - Uses `__import__('os')` to access the operating system - Reads environment variables via `os.environ` - Exfiltrates the sensitive data via HTTP request 6. Demonstrates that: - The vulnerable version allows the exploit and exfiltrates data - The patched version correctly blocks the exploit with an `ImportError` **Expected evidence:** - The script outputs "[VULNERABILITY CONFIRMED]" - Exfiltrated data contains sensitive environment variables - The patched version shows `ImportError` when attempting to use `__import__` ## Evidence **Log files generated:** - `logs/reproduction_output.log` - Full execution output - `/tmp/exfil_*.log` - Captured exfiltration data (temporary, shows captured environment variables) **Key excerpts from reproduction:** ``` [EXFILTRATION DATA FOUND IN LOGS]: ============================================================ [EXFIL] /exfil?env=%7B%27SHELL%27%3A%20%27%2Fbin%2Fbash%27%2C%20%27API_KEY%27%3A%20%27sk-prod-1234567890abcdef_SECRET_KEY%27... [DATA] env={'SHELL': '/bin/bash', 'API_KEY': 'sk-prod-1234567890abcdef_SECRET_KEY', ...} ============================================================ ``` **Environment variables successfully exfiltrated:** - `API_KEY` with value containing sensitive token - `DATABASE_URL` with PostgreSQL credentials - `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` - `JWT_SECRET` for authentication tokens **Patched version correctly blocks:** ``` [+] Testing PATCHED version (Crawl4AI >= 0.8.0)... [+] Patched version correctly blocked: ImportError ``` ## Recommendations / Next Steps **Immediate Actions:** 1. **Upgrade to Crawl4AI v0.8.0+ immediately** - This is the primary fix 2. **If upgrade is not possible:** - Disable the Docker API entirely - Block access to the `/crawl` endpoint at the network/firewall level - Add authentication to the API endpoints **Long-term Security Measures:** 1. Implement proper sandboxing for user-provided code (consider using restricted Python interpreters or containerization) 2. Add authentication/authorization to all API endpoints 3. Implement input validation and sanitization for hook parameters 4. Consider using safer alternatives to `exec()` for dynamic code execution 5. Add security headers and rate limiting to API endpoints **Testing Recommendations:** 1. Add regression tests to ensure `__import__` is never added back to allowed_builtins 2. Implement automated security scanning for unsafe code patterns 3. Test hook execution with malicious payloads as part of CI/CD 4. Review all uses of `exec()` and `eval()` throughout the codebase ## Additional Notes **Idempotency Confirmation:** The reproduction script has been run twice consecutively with consistent results: - Run 1: Exit code 0, vulnerability confirmed - Run 2: Exit code 0, vulnerability confirmed **Limitations:** - The reproduction demonstrates the core vulnerability mechanism but does not set up the full Crawl4AI Docker API (which would require Docker) - The test uses a simplified version of the hook execution logic that accurately reflects the vulnerable code path - Real-world exploitation would involve sending the malicious payload via HTTP POST to the `/crawl` endpoint **Attack Vector Details:** ```json POST /crawl { "urls": ["https://example.com"], "hooks": { "code": { "on_page_context_created": "async def hook(page, context, **kwargs):\n __import__('os').system('malicious_command')\n return page" } } } ``` **Credits:** - Discovered by Neo from ProjectDiscovery (https://projectdiscovery.io) - CVE ID: CVE-2026-26216 - GHSA ID: GHSA-5882-5rx9-xgxp ## Reproduction Details Reproduced: 2026-02-19T19:19:50.265Z Duration: 635 seconds Tool calls: 104 Turns: 71 Handoffs: 2 ## Quick Verification Run one of these commands to verify locally: pruva-verify REPRO-2026-00093 pruva-verify GHSA-5882-5rx9-xgxp pruva-verify CVE-2026-26216 Or open in GitHub Codespaces (zero-friction, auto-runs): https://github.com/codespaces/new?ref=repro/REPRO-2026-00093&repo=N3mes1s/pruva-sandbox Or download and run the script manually: curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00093/artifacts/repro/reproduction_steps.sh chmod +x reproduction_steps.sh ./reproduction_steps.sh WARNING: Run in a sandboxed environment. This exploits a real vulnerability. ## References - GitHub Advisory: https://github.com/advisories/GHSA-5882-5rx9-xgxp - NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-26216 - Source: https://github.com/advisories/GHSA-5882-5rx9-xgxp ## Artifacts - repro/rca_report.md (analysis, 6644 bytes) - repro/reproduction_steps.sh (reproduction_script, 1101 bytes) - bundle/source.json (other, 3673 bytes) - bundle/ticket.md (ticket, 2515 bytes) - bundle/ticket.json (other, 6955 bytes) - repro/crawl4ai_rce_demo.py (script, 11252 bytes) - logs/reproduction_output.log (log, 3242 bytes) - logs/exfil_server.log (log, 36 bytes) - logs/docker_run.log (log, 194 bytes) - logs/bypass_test.log (log, 2838 bytes) ## API Access - JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00093 - Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00093/artifacts/repro/reproduction_steps.sh - Web: https://pruva.dev/r/REPRO-2026-00093 ## For AI Assistants To help users with this reproduction: 1. The `pruva-verify` command is the simplest way to run the reproduction 2. The script downloads and runs automatically in an isolated temp directory 3. Always recommend running in a VM, container, or disposable environment 4. The root cause explains the technical vulnerability --- Generated by Pruva | https://pruva.dev