# REPRO-2026-00228: nginx charset module segfaults when charset_map uses utf-8 as source charset, causing a NULL pointer dereference and DoS.

## Summary

- Status: published
- Severity: high
- Type: security
- Confidence: high

## Identifiers

- REPRO ID: REPRO-2026-00228

## Package

- Name: nginx
- Ecosystem: generic
- Affected: < 29c23ad846787e8baa1390b2edca479eb63ea8d7 (exact releases not specified)
- Fixed: 29c23ad846787e8baa1390b2edca479eb63ea8d7

## Root Cause

# RCA Report: nginx charset_map utf-8 Source Charset NULL-Dereference Segfault

## Summary

A misconfigured `charset_map` directive with `utf-8` in the first column (source charset) causes nginx to create wrong-format single-byte conversion tables. When a subsequent HTTP request triggers the charset filter's `recode_from_utf8()` path, the 256-byte single-byte table is cast to `u_char **` and dereferenced as a pointer array (`table[n >> 8]`), reading garbage bytes as a memory address and crashing the worker process with SIGSEGV (signal 11). The upstream fix (commit `29c23ad846787e8baa1390b2edca479eb63ea8d7`) adds a configuration-time validation that rejects `charset_map` with `utf-8` in the first column, preventing the invalid configuration from ever being loaded.

## Impact

- **Package/Component affected:** nginx `src/http/modules/ngx_http_charset_filter_module.c` (the `ngx_http_charset_filter_module`)
- **Affected versions:** nginx versions prior to commit `29c23ad846787e8baa1390b2edca479eb63ea8d7` (tested on nginx/1.31.3 at parent commit `8f3465ac7f02b0ae86304e1be4ed319abb9d2edb`)
- **Risk level:** High. Any attacker who can send an HTTP request to a server configured with the vulnerable `charset_map` directive causes an immediate worker process crash (denial of service). The crash occurs on every request to the affected location.
- **Consequences:** Repeated requests cause continuous worker respawns and crashes, degrading server availability. The crash is deterministic and triggered by a single HTTP GET request.

## Impact Parity

- **Disclosed/claimed maximum impact:** Denial of Service (DoS) via NULL pointer dereference / segfault in nginx worker process when processing requests with the misconfigured `charset_map`.
- **Reproduced impact from this run:** DoS confirmed. The nginx worker process crashes with SIGSEGV (signal 11, core dumped) on every HTTP request to the affected location. The worker is killed immediately when processing response body data containing non-ASCII bytes through the `recode_from_utf8()` code path.
- **Parity:** `full`. The reproduced segfault/DoS matches the claimed impact exactly.
- **Not demonstrated:** No code execution or privilege escalation was claimed or observed; the impact is purely a DoS crash.

## Root Cause

The charset filter module supports two table formats:

1. **Single-byte tables** (256 bytes): used when neither charset in a `charset_map` is UTF-8. Each byte maps directly: `table[src_byte] = dst_byte`.
2. **UTF-8 multi-byte tables** (256 × `NGX_UTF_LEN` = 1024 bytes for `src2dst`, and an array of `u_char *` pointers for `dst2src`): used when the *destination* charset (second column) is UTF-8.

The bug occurs because `ngx_http_charset_map_block()` decides which table format to allocate based solely on whether `value[2]` (the **destination/second** column) is `"utf-8"`. When `utf-8` appears in `value[1]` (the **source/first** column) and the destination is a single-byte charset (e.g., `windows-1251`), the code takes the `else` branch and allocates 256-byte single-byte tables for both `src2dst` and `dst2src`.

During request processing, the charset filter's body filter calls `ngx_http_charset_recode_from_utf8()` when `ctx->from_utf8` is true (i.e., the source charset is UTF-8). This function casts `ctx->table` (the 256-byte buffer) to `u_char **table` and dereferences `table[n >> 8]` as a pointer:

```c
table = (u_char **) ctx->table;   // 256-byte buffer cast to pointer array
...
n = ngx_utf8_decode(&src, len);   // decode UTF-8 sequence to codepoint

if (n < 0x10000) {
    p = table[n >> 8];            // reads 8 bytes at offset (n >> 8) * 8 as a pointer

    if (p) {
        c = p[n & 0xff];          // dereferences the garbage pointer → SIGSEGV
```

For example, with Cyrillic `а` (U+0430, encoded as `0xD0 0xB0`), `ngx_utf8_decode` returns `n = 0x0430`, so `n >> 8 = 4`. `table[4]` reads bytes 32 to 39 of the 256-byte buffer (values `32,33,34,35,36,37,38,39`, i.e. `0x20` through `0x27`), which on little-endian 64-bit forms the garbage pointer `0x2726252423222120`. Since this is non-NULL, `p[n & 0xff]` adds `0x30` and dereferences `0x2726252423222150`, an unmapped address, causing SIGSEGV.

**Fix commit:** `29c23ad846787e8baa1390b2edca479eb63ea8d7`, titled "Charset: disabled charset_map with utf-8 in the first column". The fix adds a check in `ngx_http_charset_map_block()` that rejects the configuration at parse time:

```c
if (ngx_strcasecmp(value[1].data, (u_char *) "utf-8") == 0) {
    ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,
                       "\"charset_map\" with \"utf-8\" charset "
                       "should be given in the second column");
    return NGX_CONF_ERROR;
}
```

## Reproduction Steps

1. **Script:** `bundle/repro/reproduction_steps.sh`
2. **What the script does:**
   - Locates pre-built nginx binaries from the project cache (vulnerable build at commit `8f3465ac7` and fixed build at commit `29c23ad84`), with a fallback to clone-and-build from source.
   - Creates an HTML file containing real UTF-8 multi-byte characters (Cyrillic `а`, `б`, `в`; bytes `0xD0 0xB0`, etc.) to trigger the non-ASCII code path.
   - **Vulnerable test (×2):** Writes an nginx config with `charset_map utf-8 windows-1251 { }` + `charset windows-1251` + `source_charset utf-8`, starts nginx as a real TCP listener, sends an HTTP GET request via curl, and checks the error log for `exited on signal 11` (SIGSEGV).
   - **Fixed test (×2):** Writes the same config and runs `nginx -t` to verify the config is rejected with the patch's error message.
   - **Config acceptance contrast:** Verifies the vulnerable binary accepts the config (exit 0) while the fixed binary rejects it.
   - Writes `bundle/repro/runtime_manifest.json` with proof artifacts.
3. **Expected evidence:** Two vulnerable attempts showing `worker process N exited on signal 11 (core dumped)` in the error log, and two fixed attempts showing `"charset_map" with "utf-8" charset should be given in the second column`.

## Evidence

### Log file locations

- `bundle/logs/vuln_error_1.log`: Vulnerable attempt 1 error log (segfault)
- `bundle/logs/vuln_error_2.log`: Vulnerable attempt 2 error log (segfault)
- `bundle/logs/vuln_conf_1.conf` / `vuln_conf_2.conf`: Vulnerable nginx configs
- `bundle/logs/fixed_test_1.log` / `fixed_test_2.log`: Fixed version config rejection
- `bundle/logs/vuln_config_accept.log`: Vulnerable config acceptance
- `bundle/repro/runtime_manifest.json`: Runtime evidence manifest

### Key excerpts

**Vulnerable worker segfault (attempt 1):**

```
2026/07/04 18:20:50 [alert] 30827#0: worker process 30829 exited on signal 11 (core dumped)
```

**Vulnerable worker segfault (attempt 2):**

```
2026/07/04 18:20:57 [alert] 30847#0: worker process 30849 exited on signal 11 (core dumped)
```

**Fixed version config rejection:**

```
nginx: [emerg] "charset_map" with "utf-8" charset should be given in the second column
nginx: configuration file ... test failed (exit 1)
```

**Vulnerable version config acceptance:**

```
nginx: the configuration file ... syntax is ok
nginx: configuration file ... test is successful (exit 0)
```

### Environment

- nginx/1.31.3 built with `--without-http_rewrite_module --without-http_gzip_module --with-cc-opt='-g -O0'`
- Vulnerable commit: `8f3465ac7f02b0ae86304e1be4ed319abb9d2edb` (parent of fix)
- Fixed commit: `29c23ad846787e8baa1390b2edca479eb63ea8d7`
- gcc 15.2.0, Linux x86_64

## Recommendations / Next Steps

1. **Apply the upstream fix** (commit `29c23ad846787e8baa1390b2edca479eb63ea8d7`) to reject `charset_map` with `utf-8` in the first column at configuration parse time.
2. **Audit existing configurations** for any `charset_map` directives using `utf-8` as the source charset and remove or correct them.
3. **Add a regression test** that verifies `nginx -t` fails when `charset_map utf-8 <charset> { }` is present.
4. **Consider defensive coding** in `recode_from_utf8()` to validate table format before casting, as defense-in-depth against similar misconfigurations.

## Additional Notes

- **Idempotency:** The script uses randomized port bases to avoid TCP TIME_WAIT conflicts between consecutive runs. Verified to pass twice consecutively with exit code 0.
- **Ticket config note:** The ticket's exact map entry `D0B0 E0` (a 2-byte hex value) is rejected even in the vulnerable version because the single-byte parsing path (`else` branch in `ngx_http_charset_map()`) requires values ≤ 255. The vulnerability is triggered with any valid single-byte map entry (e.g., `C0 E0`) or even an empty `charset_map` block (`charset_map utf-8 windows-1251 { }`), since the table format mismatch occurs regardless of the entries.
- **Two crash paths:** The `charset_map utf-8 <charset>` misconfiguration affects two request-time code paths:
  - **`recode_to_utf8`** (when `charset utf-8; source_charset <charset>;`): performs an out-of-bounds read at `table[*src * NGX_UTF_LEN]` on the 256-byte buffer, causing response corruption ("zero size buf" alert) and connection failure.
  - **`recode_from_utf8`** (when `charset <charset>; source_charset utf-8;`): casts the 256-byte buffer to `u_char **` and dereferences `table[n >> 8]` as a pointer, causing a reliable SIGSEGV.
- The reproduction uses the `recode_from_utf8` path for its deterministic crash behavior. Both paths are eliminated by the same fix.

## Reproduction Details

- Reproduced: 2026-07-04T19:54:34.521Z
- Duration: 558 seconds
- Tool calls: 99
- Turns: Unknown
- Handoffs: 2

## Quick Verification

Run one of these commands to verify locally:

    pruva-verify REPRO-2026-00228

Or open in GitHub Codespaces (zero-friction, auto-runs):

https://github.com/codespaces/new?ref=repro/REPRO-2026-00228&repo=N3mes1s/pruva-sandbox

Or download and run the script manually:

    curl -O https://api.pruva.dev/v1/reproductions/REPRO-2026-00228/artifacts/bundle/repro/reproduction_steps.sh
    chmod +x reproduction_steps.sh
    ./reproduction_steps.sh

WARNING: Run in a sandboxed environment. This exploits a real vulnerability.

## References

- Source: https://github.com/spaceraccoon/vulnerability-spoiler-alert/issues/306

## Artifacts

- bundle/repro/reproduction_steps.sh (reproduction_script, 13528 bytes)
- bundle/repro/rca_report.md (analysis, 9397 bytes)
- bundle/artifact_promotion_manifest.json (other, 4471 bytes)
- bundle/repro/validation_verdict.json (other, 807 bytes)
- bundle/repro/runtime_manifest.json (other, 826 bytes)
- bundle/logs/vuln_error_1.log (log, 92 bytes)
- bundle/logs/vuln_error_2.log (log, 92 bytes)
- bundle/logs/fixed_test_1.log (log, 298 bytes)
- bundle/logs/fixed_test_2.log (log, 298 bytes)
- bundle/logs/vuln_config_accept.log (log, 264 bytes)
- bundle/logs/vuln_conf_1.conf (other, 636 bytes)
- bundle/logs/vuln_conf_2.conf (other, 636 bytes)

## API Access

- JSON: https://api.pruva.dev/v1/reproductions/REPRO-2026-00228
- Script: https://api.pruva.dev/v1/reproductions/REPRO-2026-00228/artifacts/bundle/repro/reproduction_steps.sh
- Web: https://pruva.dev/r/REPRO-2026-00228

## For AI Assistants

To help users with this reproduction:

1. The `pruva-verify` command is the simplest way to run the reproduction
2. The script downloads and runs automatically in an isolated temp directory
3. Always recommend running in a VM, container, or disposable environment
4. The root cause explains the technical vulnerability

---

Generated by Pruva | https://pruva.dev