# REPRO-2026-00201: Unauthenticated RCE in Langflow via public flow build endpoint

## Summary
Status: published
Severity: critical
Type: security
Confidence: high

## Identifiers
REPRO ID: REPRO-2026-00201

CVE: CVE-2026-33017

## Package
Name: langflow
Ecosystem: pip
Affected: <= 1.8.2 (all versions < 1.9.0)
Fixed: 1.9.0
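
For fleet triage, an installed version string can be checked against the bounds above. This is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` releases. The helper names `parse_release` and `is_vulnerable` are hypothetical. Pre-release suffixes (e.g. `1.9.0rc1`) are treated as their base release, so use a full PEP 440 parser such as `packaging.version` if you track pre-releases.

```python
import re

# First patched release according to this advisory (Fixed: 1.9.0).
FIXED_RELEASE = (1, 9, 0)


def parse_release(version: str) -> tuple[int, int, int]:
    """Return the leading (major, minor, patch) of a version string.

    Hypothetical helper: any suffix after the numeric release (e.g. "rc1",
    ".post1", "+local") is ignored.
    """
    match = re.match(r"\s*v?(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(version: str) -> bool:
    """True for langflow releases older than 1.9.0 (per this advisory)."""
    return parse_release(version) < FIXED_RELEASE
```

On a host with the package installed, `is_vulnerable(importlib.metadata.version("langflow"))` gives a quick answer. For Docker deployments, compare the image tag instead: `1.8.1` is affected and `1.9.0` is not.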

## Root Cause
# Root Cause Analysis: CVE-2026-33017

## Summary

CVE-2026-33017 is an unauthenticated remote code execution (RCE) vulnerability in
Langflow prior to version 1.9.0. The public flow-build endpoint
`POST /api/v1/build_public_tmp/{flow_id}/flow` accepts an attacker-controlled
`data` parameter (`FlowDataRequest`) containing arbitrary Python code inside a
custom component node. Because the endpoint is intentionally unauthenticated for
public flows, any remote attacker can reach it. The supplied flow definition is
passed through `start_flow_build()` → `build_graph_from_data()` →
`Graph.from_payload()` and ultimately to the custom-component loader, which
extracts the `code` field and executes it with `exec()` inside
`prepare_global_scope()` (in `lfx/custom/validate.py`) without any sandboxing.
A module-level assignment such as `_rce = os.system(...)` is an `ast.Assign`
node that `prepare_global_scope()` collects and `exec()`s at graph-build time,
yielding arbitrary command execution with the privileges of the Langflow server
process. A single HTTP request is sufficient.

## Impact

- **Product:** Langflow (PyPI package `langflow`; Docker image `langflowai/langflow`)
- **Affected versions:** `langflow < 1.9.0` (reproduction uses `1.8.1` as the
  vulnerable image).
- **Patched versions:** `>= 1.9.0` (the public build endpoint hardcodes
  `data=None` and loads the stored flow from the database only).
- **Risk level:** Critical (added to the CISA Known Exploited Vulnerabilities
  catalog on 2026-03-25).
- **Consequences:** An unauthenticated, remote attacker can run arbitrary system
  commands, read environment variables (including LLM API keys / cloud
  credentials), access/modify the database and flow data, and establish
  persistence. The server process ran as `uid=1000(user) gid=0(root)` in the
  container image.

## Impact Parity

- **Disclosed/claimed maximum impact:** Unauthenticated remote code execution
  via a single HTTP request to the public build endpoint.
- **Reproduced impact from this run:** Confirmed code execution. The vulnerable
  `langflowai/langflow:1.8.1` container wrote `/tmp/rce-proof` containing the
  output of the `id` command (`uid=1000(user) gid=0(root) groups=0(root)`) plus a
  unique per-attempt token, after receiving an **unauthenticated** HTTP POST to
  `/api/v1/build_public_tmp/{flow_id}/flow`. The fixed
  `langflowai/langflow:1.9.0` container did **not** write the proof file under
  the identical request (negative control).
- **Parity:** `full`.
- **Not demonstrated:** None relevant; the claimed unauthenticated-RCE impact
  was directly demonstrated end-to-end against the real product.

## Root Cause

The vulnerable endpoint `build_public_tmp` in
`src/backend/base/langflow/api/v1/chat.py` (v1.8.1) declares an inbound
`data: FlowDataRequest` parameter and forwards it directly to
`start_flow_build()`:

```python
# v1.8.1 (excerpt; other parameters elided)
@router.post("/build_public_tmp/{flow_id}/flow")
async def build_public_tmp(..., data: Annotated[FlowDataRequest | None, Body(embed=True)] = None, ...):
    owner_user, new_flow_id = await verify_public_flow_and_get_user(flow_id=flow_id, client_id=client_id)
    job_id = await start_flow_build(flow_id=new_flow_id, ..., data=data, ...)
```

`start_flow_build()` (`src/backend/base/langflow/api/build.py`) builds the graph
via `create_graph()`, which uses the attacker-supplied data whenever it is
present and falls back to the stored flow only when `data` is empty:

```python
async def create_graph(...):
    if not data:
        return await build_graph_from_db(...)
    return await build_graph_from_data(flow_id=..., payload=data.model_dump(), ...)
```

`build_graph_from_data()` → `Graph.from_payload()` constructs vertices from the
attacker-supplied nodes. For a custom component (`template._type == "Component"`), the
loader calls `create_class(code, class_name)` in `src/lfx/src/lfx/custom/validate.py`,
which calls `prepare_global_scope(module)`. That function iterates the module
body, collects top-level `ast.Assign` / `ast.AnnAssign` / `ast.ClassDef` /
`ast.FunctionDef` nodes into `definitions`, compiles them, and runs:

```python
if definitions:
    combined_module = ast.Module(body=definitions, type_ignores=[])
    compiled_code = compile(combined_module, "<string>", "exec")
    exec(compiled_code, exec_globals)   # <-- attacker module-level code runs here
```

Therefore a top-level `_rce = os.system("id > /tmp/rce-proof ...")` executes
during graph construction, before any output is produced.
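
The node filter is what decides whether attacker code runs at build time. The following is a simplified, self-contained model of the behaviour described above, not Langflow's actual `prepare_global_scope()`. It keeps only the four top-level node types the report lists and `exec()`s them. A harmless, hypothetical `sentinel()` callback stands in for `os.system`.

```python
import ast

# Top-level node types that, per this report, prepare_global_scope() collects.
COLLECTED = (ast.Assign, ast.AnnAssign, ast.ClassDef, ast.FunctionDef)


def run_collected_definitions(source: str, exec_globals: dict) -> None:
    """Simplified model (not Langflow's code): exec only the collected nodes."""
    module = ast.parse(source)
    definitions = [node for node in module.body if isinstance(node, COLLECTED)]
    if definitions:
        combined_module = ast.Module(body=definitions, type_ignores=[])
        compiled_code = compile(combined_module, "<string>", "exec")
        exec(compiled_code, exec_globals)


fired: list[str] = []


def sentinel(label: str) -> str:
    """Harmless stand-in for os.system: just records that it was called."""
    fired.append(label)
    return label


component_source = '''
sentinel("bare call")                 # ast.Expr: filtered out, never runs
_x = sentinel("module-level assign")  # ast.Assign: runs during construction

class MyComponent:
    def build(self):
        sentinel("method body")       # only runs if build() is called later
'''

run_collected_definitions(component_source, {"sentinel": sentinel})
print(fired)  # ['module-level assign']
```

Only the assignment fires. The bare call is filtered out, and the method body is merely defined, not executed, during construction.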

The only access control on the endpoint is `verify_public_flow_and_get_user()`,
which merely checks that the targeted `flow_id` is marked `PUBLIC` in the
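database and that a `client_id` cookie is present (any value). In this
reproduction the attacker creates the public flow themselves (using the
AUTO_LOGIN superuser session), so the check is satisfied trivially. Against a
real deployment, the ID of any existing public flow serves the same purpose.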

**Fix (v1.9.0):** the endpoint no longer accepts a `data` parameter and hardcodes
`data=None`, so the build always loads the stored flow definition from the
database. It also validates the stored flow with
`validate_flow_for_current_settings()` and rejects custom components on the
public path (`CustomComponentValidationError` → HTTP 400). The core of the diff
is the removal of `data: ... = None` from the signature and the change from
`data=data` to `data=None` in the `start_flow_build(...)` call.

```python
# v1.9.0
job_id = await start_flow_build(flow_id=new_flow_id, source_flow_id=flow_id,
    ..., data=None,  # Always None - public flows load from database only
    ...)
```

## Reproduction Steps

1. The reproduction is fully automated by
   `bundle/repro/reproduction_steps.sh` (with helper
   `bundle/repro/repro_attempt.py`).
2. The script pulls `langflowai/langflow:1.8.1` (vulnerable) and
   `langflowai/langflow:1.9.0` (fixed), then runs **2 vulnerable** and **2 fixed**
   isolated attempts. Each attempt:
   - starts a fresh Langflow container with `LANGFLOW_AUTO_LOGIN=true` and
     `--backend-only`,
   - waits for the `/health` endpoint,
   - performs `GET /api/v1/auto_login` to obtain a superuser access token,
   - creates a PUBLIC flow via `POST /api/v1/flows/`,
   - sends the unauthenticated exploit `POST
     /api/v1/build_public_tmp/{flow_id}/flow` with a `client_id` cookie and a
     body whose `data` contains one `CustomComponent` node whose `code` holds a
     top-level `_rce = os.system("id > /tmp/rce-proof && echo RCE_CONFIRMED
     <token> >> /tmp/rce-proof")`,
   - polls for `/tmp/rce-proof` inside the container and copies it out as
     evidence, then tears the container down.
3. Expected evidence: on the vulnerable image each attempt produces
   `logs/proof_vuln_N.txt` containing the `id` output and the unique token, with
   `exploit_status: 200` and `proof_exists: true` in `logs/result_vuln_N.json`.
   On the fixed image `proof_exists: false` for every attempt
   (`logs/result_fixed_N.json`).
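
Both the `/health` wait and the proof-file poll in step 2 follow the same bounded-retry pattern. A minimal sketch of that pattern follows. `wait_until` and `health_ok` are hypothetical helpers, not the contents of `repro_attempt.py`. The base URL assumes the `http://127.0.0.1:7860` address used in this run.

```python
import time
import urllib.request
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 120.0,
               interval: float = 1.0) -> bool:
    """Call `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def health_ok(base_url: str = "http://127.0.0.1:7860") -> bool:
    """True once GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeouts, HTTPError (a URLError subclass)
        return False
```

`wait_until(health_ok)` gates the login step. The proof-file poll can reuse the same loop with a predicate that checks for `/tmp/rce-proof` inside the container.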

## Evidence

- `bundle/logs/reproduction_steps.log` — full orchestrator log.
- `bundle/logs/result_vuln_{1,2}.json` — per-attempt JSON results for the
  vulnerable image (auto_login=200, create_flow=201, exploit=200,
  proof_exists=true, proof_content with `uid=1000(user)...` + token).
- `bundle/logs/proof_vuln_{1,2}.txt` — the proof file exfiltrated from the
  vulnerable container (`id` output + `RCE_CONFIRMED <token>`).
- `bundle/logs/result_fixed_{1,2}.json` — per-attempt JSON results for the fixed
  image (exploit=200 but proof_exists=false).
- `bundle/logs/container_{vuln,fixed}_{1,2}.log` — container startup/runtime
  logs.
- `bundle/repro/runtime_manifest.json` — structured runtime evidence
  (`entrypoint_kind=api_remote`, `service_started=true`,
  `healthcheck_passed=true`, `target_path_reached=true`).
- `bundle/repro/validation_verdict.json` — structured verdict.

Key excerpt from a manual run against `langflowai/langflow:1.8.1`:

```json
{"role":"vuln","token":"manualtest1","auto_login_status":200,"create_flow_status":201,
 "flow_id":"b43e6614-...","exploit_status":200,
 "exploit_body":"{\"job_id\":\"8449e0de-...\"}","proof_exists":true,
 "proof_content":"uid=1000(user) gid=0(root) groups=0(root)\nRCE_CONFIRMED manualtest1",
 "success":true}
```

Negative control against `langflowai/langflow:1.9.0` (identical request):

```json
{"role":"fixed","token":"fixedtest1","auto_login_status":200,"create_flow_status":201,
 "exploit_status":200,"proof_exists":false,"proof_content":null,"success":false}
```

Environment: Docker 29.6.1; official images `langflowai/langflow:1.8.1` and
`:1.9.0`; exploit executed via `docker exec` inside each container (the sandbox
cannot reach published host ports, so all HTTP traffic is generated inside the
container against `http://127.0.0.1:7860`). No sanitizers were used; this is a
non-sanitized production-path proof.
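
The per-attempt JSON files can be reduced to a single pass/fail verdict. This hedged sketch relies only on the field names visible in the excerpts above (`token`, `exploit_status`, `proof_exists`, `proof_content`). The acceptance rule itself is an assumption and may differ from how `validation_verdict.json` is actually computed: every vulnerable attempt must show the token-bearing proof, and no fixed attempt may show any proof.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping


def attempt_confirms_rce(result: Mapping[str, Any]) -> bool:
    """Vulnerable-image attempt: HTTP 200 plus a proof file bearing this attempt's token."""
    token = result.get("token") or ""
    proof = result.get("proof_content") or ""
    return (
        bool(token)
        and result.get("exploit_status") == 200
        and result.get("proof_exists") is True
        and f"RCE_CONFIRMED {token}" in proof
    )


def reproduction_verdict(vuln: Iterable[Mapping[str, Any]],
                         fixed: Iterable[Mapping[str, Any]]) -> bool:
    """True only if every vulnerable attempt fires and no fixed attempt does."""
    vuln, fixed = list(vuln), list(fixed)
    return (
        bool(vuln) and bool(fixed)
        and all(attempt_confirms_rce(r) for r in vuln)
        and not any(r.get("proof_exists") for r in fixed)
    )


def load_results(log_dir: Path, role: str) -> list[dict]:
    """Read result_<role>_N.json files (role is "vuln" or "fixed")."""
    return [json.loads(p.read_text()) for p in sorted(log_dir.glob(f"result_{role}_*.json"))]
```

Usage: `reproduction_verdict(load_results(Path("bundle/logs"), "vuln"), load_results(Path("bundle/logs"), "fixed"))`. The glob pattern skips the `*_stderr.log` files because it requires a `.json` suffix.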

## Recommendations / Next Steps

- **Upgrade** to Langflow `>= 1.9.0` immediately. The public build endpoint no
  longer accepts client-supplied flow definitions and validates stored flows.
- If AUTO_LOGIN must stay enabled in production, restrict network exposure of
  the Langflow HTTP port and place it behind an authenticated reverse proxy;
  AUTO_LOGIN issues a superuser session without credentials.
- Consider disabling custom components entirely on public flows
  (`allow_custom_components=false`) and making `access_type=PRIVATE` the
  default for new flows.
- Add an integration test that posts a custom-component payload with a
  module-level side-effect sentinel to `build_public_tmp` and asserts it never
  fires, to prevent regressions of this fix.
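
As a cheap unit-level complement to that integration test, a regression guard can assert that the public endpoint's signature no longer exposes a client-controlled `data` parameter. The two stub functions in this minimal sketch are hypothetical stand-ins for the v1.8.1 and v1.9.0 signatures described in the Root Cause section, trimmed to the relevant parameter. The guard only catches a parameter named `data` being re-added, so it does not replace the behavioural sentinel test.

```python
import inspect
from typing import Any, Callable, Optional


def exposes_param(endpoint: Callable[..., Any], name: str = "data") -> bool:
    """True if `endpoint` declares a parameter called `name`."""
    return name in inspect.signature(endpoint).parameters


# Hypothetical stand-ins, trimmed to the parameter that matters here.
async def build_public_tmp_v181(flow_id: str, data: Optional[dict] = None) -> None: ...
async def build_public_tmp_v190(flow_id: str) -> None: ...


assert exposes_param(build_public_tmp_v181)        # vulnerable shape
assert not exposes_param(build_public_tmp_v190)    # patched shape
```

In a real suite, the stub would be replaced by the actual endpoint, e.g. `from langflow.api.v1.chat import build_public_tmp`. That import path is inferred from the file location `src/backend/base/langflow/api/v1/chat.py`. The guard also assumes the router decorator returns the undecorated function.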

## Additional Notes

- **Idempotency:** The script removes any prior `/tmp/rce-proof` at the start of
  each attempt and tears down the container afterwards, so consecutive runs are
  clean and reproducible. Verified by running two vulnerable and two fixed
  attempts back-to-back.
- The malicious payload is delivered as a top-level **assignment**
  (`_rce = os.system(...)`) rather than a bare call, because
  `prepare_global_scope()` only `exec()`s nodes it classifies as `ast.Assign` /
  `ast.AnnAssign` / `ast.ClassDef` / `ast.FunctionDef`. A bare `os.system(...)`
  statement is an `ast.Expr` node and would be dropped, whereas an assignment is
  guaranteed to execute at graph-build time.
- The flow created for the exploit uses a benign empty `data`
  (`{"nodes":[],"edges":[]}`) simply to satisfy the `access_type=PUBLIC`
  requirement; on the vulnerable path the attacker-supplied `data` overrides the
  stored definition, so the stored content is irrelevant.
- The proof file is written inside the container filesystem and copied out via
  `docker cp` for durable evidence.
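
Module-level assignments run as soon as the component is constructed, so a reviewer triaging stored or submitted component code can flag them statically. The sketch below is a hypothetical heuristic, not a Langflow feature. It reports top-level `Assign`/`AnnAssign` statements whose value contains a call. It will flag benign lines such as `logger = logging.getLogger(__name__)`. It also misses calls in decorators, default arguments, or class bodies, which execute at definition time too, so it is a triage aid rather than a sandbox.

```python
import ast


def module_level_assignment_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, statement) for top-level assignments whose value calls something."""
    findings = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.Assign, ast.AnnAssign)) or node.value is None:
            continue
        if any(isinstance(sub, ast.Call) for sub in ast.walk(node.value)):
            findings.append((node.lineno, ast.unparse(node)))
    return findings


sample = "import os\n_probe = os.getcwd()\nLIMIT: int = 3\n"
print(module_level_assignment_calls(sample))  # [(2, '_probe = os.getcwd()')]
```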


## Reproduction Details
Reproduced: 2026-07-02T16:46:15.644Z
Duration: 1508 seconds
Tool calls: 259
Turns: Unknown
Handoffs: 2


## Quick Verification
Run one of these commands to verify locally:

    pruva-verify REPRO-2026-00201

    pruva-verify CVE-2026-33017

Or open in GitHub Codespaces (zero-friction, auto-runs):

    https://github.com/codespaces/new?ref=repro/REPRO-2026-00201&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

    curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00201/artifacts/bundle/repro/reproduction_steps.sh
    chmod +x reproduction_steps.sh
    ./reproduction_steps.sh

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References

- NVD: https://nvd.nist.gov/vuln/detail/CVE-2026-33017


## Artifacts
- bundle/repro/reproduction_steps.sh (reproduction_script, 10347 bytes)
- bundle/repro/rca_report.md (analysis, 10401 bytes)
- bundle/vuln_variant/reproduction_steps.sh (reproduction_script, 11967 bytes)
- bundle/vuln_variant/rca_report.md (analysis, 15556 bytes)
- bundle/ticket.md (ticket, 930 bytes)
- bundle/ticket.json (other, 1419 bytes)
- bundle/repro/repro_attempt.py (script, 8045 bytes)
- bundle/repro/validation_verdict.json (other, 828 bytes)
- bundle/repro/runtime_manifest.json (other, 923 bytes)
- bundle/logs/reproduction_steps.log (log, 6676 bytes)
- bundle/logs/container_vuln_1.log (log, 3243 bytes)
- bundle/logs/result_vuln_1.json (other, 466 bytes)
- bundle/logs/result_vuln_1_stderr.log (log, 0 bytes)
- bundle/logs/container_vuln_2.log (log, 3243 bytes)
- bundle/logs/result_vuln_2.json (other, 466 bytes)
- bundle/logs/result_vuln_2_stderr.log (log, 0 bytes)
- bundle/logs/container_fixed_1.log (log, 3501 bytes)
- bundle/logs/result_fixed_1.json (other, 398 bytes)
- bundle/logs/result_fixed_1_stderr.log (log, 0 bytes)
- bundle/logs/container_fixed_2.log (log, 3527 bytes)
- bundle/logs/result_fixed_2.json (other, 398 bytes)
- bundle/logs/result_fixed_2_stderr.log (log, 0 bytes)
- bundle/logs/proof_vuln_1.txt (other, 73 bytes)
- bundle/logs/proof_vuln_2.txt (other, 73 bytes)
- bundle/logs/vuln_variant/reproduction_steps.log (log, 12112 bytes)
- bundle/logs/vuln_variant/claimed_fixed_image_identity.txt (other, 406 bytes)
- bundle/logs/vuln_variant/followup_fixed_image_identity.txt (other, 409 bytes)
- bundle/logs/vuln_variant/container_claimed_fixed_1.log (log, 3042 bytes)
- bundle/logs/vuln_variant/result_claimed_fixed_1.json (other, 500 bytes)
- bundle/logs/vuln_variant/result_claimed_fixed_1_stderr.log (log, 0 bytes)
- bundle/logs/vuln_variant/container_claimed_fixed_2.log (log, 3016 bytes)
- bundle/logs/vuln_variant/result_claimed_fixed_2.json (other, 500 bytes)
- bundle/logs/vuln_variant/result_claimed_fixed_2_stderr.log (log, 0 bytes)
- bundle/logs/vuln_variant/container_followup_fixed_1.log (log, 91965 bytes)
- bundle/logs/vuln_variant/result_followup_fixed_1.json (other, 425 bytes)
- bundle/logs/vuln_variant/result_followup_fixed_1_stderr.log (log, 0 bytes)
- bundle/logs/vuln_variant/container_followup_fixed_2.log (log, 91913 bytes)
- bundle/logs/vuln_variant/result_followup_fixed_2.json (other, 425 bytes)
- bundle/logs/vuln_variant/result_followup_fixed_2_stderr.log (log, 0 bytes)
- bundle/logs/vuln_variant/proof_claimed_fixed_1.txt (other, 73 bytes)
- bundle/logs/vuln_variant/proof_claimed_fixed_2.txt (other, 73 bytes)
- bundle/logs/vuln_variant/fixed_version.txt (other, 519 bytes)
- bundle/logs/vuln_variant/claimed_fixed_version.txt (other, 465 bytes)
- bundle/vuln_variant/variant_attempt.py (script, 9155 bytes)
- bundle/vuln_variant/runtime_manifest.json (other, 1392 bytes)
- bundle/vuln_variant/validation_verdict.json (other, 924 bytes)
- bundle/vuln_variant/source_identity.json (other, 1946 bytes)
- bundle/vuln_variant/variant_manifest.json (other, 5174 bytes)
- bundle/vuln_variant/patch_analysis.md (documentation, 10923 bytes)
- bundle/vuln_variant/root_cause_equivalence.json (other, 3346 bytes)

## API Access
- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00201
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00201/artifacts/bundle/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00201

## For AI Assistants
To help users with this reproduction:
1. The `pruva-verify` command is the simplest way to run the reproduction
2. The script downloads and runs automatically in an isolated temp directory
3. Always recommend running in a VM, container, or disposable environment
4. The root cause explains the technical vulnerability

---
Generated by Pruva | https://pruva.dev