Aggregate · proof, quantified

Stats

Aggregate stats across every published reproduction. Spend figures cover the 160 reproductions with measured cost; counts cover all 189.

Published: 189
With measured cost: 160 / 189
Median spend: $2.80
Total spent: $831.47
Min: $0.23
P90: $12.21
Max: $37.97
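As a quick reading aid, the headline figures above can be combined into two derived numbers the page does not state directly: the share of reproductions with measured cost, and the mean spend per measured reproduction. This is a minimal sketch that uses only the values shown; the variable names are illustrative and not part of any published API.

```python
from decimal import Decimal

# Figures copied from the headline stats.
published = 189
with_cost = 160
total_spent = Decimal("831.47")
median_spend = Decimal("2.80")

# Share of published reproductions that have a measured cost.
coverage = with_cost / published

# Mean spend over the measured reproductions only.
mean_spend = total_spent / with_cost

print(f"coverage: {coverage:.1%}")                           # coverage: 84.7%
print(f"mean spend: ${mean_spend:.2f}")                      # mean spend: $5.20
print(f"mean / median: {mean_spend / median_spend:.2f}x")    # mean / median: 1.86x
```

A mean well above the median is what one would expect from a right-skewed distribution, which fits the gap between the P90 ($12.21) and the max ($37.97).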

Cost distribution

160 reproductions with measured cost, bucketed by total spend.

Buckets: <$0.10, $0.10–$0.50, $0.50–$1.00, $1.00–$2.00, $2.00–$5.00, ≥$5.00

Spend by agent and model

Total dollars charged to each agent, split by LLM. The sum may exceed the measured-cost total because each reproduction's spend is fully attributed.

Agent	Total spend
repro	$559.28
vuln_variant	$229.46
judge	$16.83
coding	$14.08
support	$10.15
hypothesis_generator	$1.66

Models in the breakdown: accounts/fireworks/models/glm-5p2, accounts/fireworks/models/kimi-k2p5, accounts/fireworks/models/kimi-k2p6, accounts/fireworks/models/kimi-k2p7-code, accounts/fireworks/routers/glm-5p2-fast, claude-opus-4-7, gpt-5.1-codex, gpt-5.2-codex, gpt-5.4-mini, gpt-5.5, gpt-5.5-2026-04-23
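To check how the per-agent totals relate to the reported total spend, the six figures above can be summed and compared. The sketch below assumes only the displayed, rounded dollar amounts; the per-model split is not available in the text, so it is not modeled here.

```python
from decimal import Decimal

# Per-agent totals as displayed in this section.
agent_spend = {
    "repro": Decimal("559.28"),
    "vuln_variant": Decimal("229.46"),
    "judge": Decimal("16.83"),
    "coding": Decimal("14.08"),
    "support": Decimal("10.15"),
    "hypothesis_generator": Decimal("1.66"),
}
reported_total = Decimal("831.47")  # "Total spent" from the headline stats

agent_sum = sum(agent_spend.values(), Decimal("0"))
diff = agent_sum - reported_total

print(f"sum of agents: ${agent_sum}")    # sum of agents: $831.46
print(f"difference:    ${diff}")         # difference:    $-0.01

# Each agent's share of the summed spend, largest first.
for agent, spend in sorted(agent_spend.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agent:<22} {spend / agent_sum:6.1%}")
```

With the figures as shown, the agents add up to $831.46, one cent under the reported $831.47, which is consistent with rounding of the displayed values. The repro and vuln_variant agents together account for roughly 95% of spend.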

By ecosystem

Count, median cost, and median duration per package ecosystem.

Ecosystem	Count	Median cost	Median duration
github	35	$4.83	29m 54s
npm	30	$0.79	15m 47s
pip	26	$2.23	17m 47s
composer	11	$0.55	13m 7s
go	10	$1.77	23m 47s
c	7	$1.33	29m 29s
maven	5	$7.43	42m 34s
generic	3	$1.64	21m 31s
source	3	$1.68	16m 28s
Composer	2	$12.42	61m 3s
Go	2	$14.78	26m 51s
Maven	2	$15.34	45m 9s
PyPI	2	$10.68	35m 23s
firmware	2	$18.88	61m 51s
linux	2	$6.90	67m 57s
other	2	$2.46	20m 50s
Go module	1	—	38m 55s
Joomla extension	1	$10.73	77m 51s
Ruby	1	$1.67	18m 33s
WordPress plugin (hosted on WordPress.org SVN, not GitHub)	1	$7.05	22m 56s
cargo	1	$4.18	33m 23s
cpp	1	—	47m 53s
gnu	1	—	35m 50s
joomla	1	$18.76	107m 17s
nodejs	1	$3.26	47m 12s
other (commercial, Java-based server application)	1	$37.97	94m 25s
pip (per GitHub advisory)	1	—	8m 19s
pypi	1	—	4m 50s
rubygems	1	$0.67	10m 15s
rust	1	$0.27	10m 6s
standalone application (Python/FastAPI)	1	$2.02	14m 56s
wordpress	1	$7.43	20m 35s
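Several rows in the table appear to describe the same ecosystem under different labels (for example `composer` and `Composer`, or `pip`, `PyPI`, and `pypi`). A minimal sketch of merging counts under normalized labels follows. The alias map is an assumption made for illustration, not a mapping the source defines, and only a subset of rows is included.

```python
from collections import Counter

# (label, count) pairs copied from the table; subset with apparent duplicates.
rows = [
    ("pip", 26), ("composer", 11), ("go", 10), ("maven", 5),
    ("Composer", 2), ("Go", 2), ("Maven", 2), ("PyPI", 2),
    ("Go module", 1), ("pip (per GitHub advisory)", 1), ("pypi", 1),
]

# Assumed aliases (hypothetical): which labels refer to the same ecosystem.
ALIASES = {"pypi": "pip", "go module": "go"}


def canonical(label: str) -> str:
    """Lowercase the label, drop any parenthetical note, then apply aliases."""
    base = label.split("(")[0].strip().lower()
    return ALIASES.get(base, base)


merged = Counter()
for label, count in rows:
    merged[canonical(label)] += count

for ecosystem, count in merged.most_common():
    print(f"{ecosystem}\t{count}")
```

Only counts can be merged this way. A median for a merged group cannot be derived from the per-row medians and would need the underlying per-reproduction costs and durations.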

By severity

Count and median cost per CVSS severity bucket.

critical 72 · median $3.96

high 87 · median $2.22

medium 21 · median $2.69

low 5 · median $2.20

Top CWEs

Most common weakness types across published reproductions.

CWE-22 (Path Traversal) 5

CWE-22 4

CWE-347 (Improper Verification of Cryptographic Signature) 3

CWE-502 3

CWE-78 3

CWE-78 (OS Command Injection) 3

CWE-1321 2

CWE-1392 Use of Default Credentials 2

CWE-345 (Insufficient Verification of Data Authenticity) 2

CWE-502 Deserialization of Untrusted Data 2
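The list above counts the same CWE more than once when its label was recorded differently (for example `CWE-22 (Path Traversal)` and `CWE-22`). A minimal sketch of merging those rows by CWE identifier is shown below. It covers only the rows displayed here, so the merged totals are a lower bound for any CWE that also appears further down the full list.

```python
import re
from collections import Counter

# Rows copied from the list: label text followed by a count.
rows = [
    "CWE-22 (Path Traversal) 5",
    "CWE-22 4",
    "CWE-347 (Improper Verification of Cryptographic Signature) 3",
    "CWE-502 3",
    "CWE-78 3",
    "CWE-78 (OS Command Injection) 3",
    "CWE-1321 2",
    "CWE-1392 Use of Default Credentials 2",
    "CWE-345 (Insufficient Verification of Data Authenticity) 2",
    "CWE-502 Deserialization of Untrusted Data 2",
]

CWE_ID = re.compile(r"CWE-\d+")

merged = Counter()
for row in rows:
    # The count is the last whitespace-separated token on the row.
    label, count = row.rsplit(" ", 1)
    match = CWE_ID.match(label)
    if match is None:
        continue  # skip anything that does not start with a CWE identifier
    merged[match.group()] += int(count)

for cwe, count in merged.most_common():
    print(f"{cwe}\t{count}")
```

Merged this way, the displayed rows give CWE-22 a count of 9, CWE-78 a count of 6, and CWE-502 a count of 5.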