This is the live full-transparency dashboard. It renders the frontier eval matrix alongside Theron's self-grading in real time, so you can watch the numbers move rather than take them on faith. Every row is a Stoa-signed eval record in a Neon store.
The dashboard publishes full audit bundles for benchmarks, including the SecQA 99 percent audit for Theron-Cyber and a HumanEval correction trail that shows what changed and why. Audit bundles include raw model responses, configuration hashes, the judge specification, and a reproduction command.
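As a rough illustration of how the four parts of a bundle fit together, here is a minimal Python sketch. The published bundle format is not specified here, so everything in it is an assumption: the `AuditBundle` class, its field names, and the choice of SHA-256 over sorted-key JSON for the configuration hash are all hypothetical, and the values in the demo are placeholders, not real audit data.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditBundle:
    """Hypothetical shape of one audit bundle; field names are illustrative."""
    benchmark: str        # benchmark the bundle covers
    raw_responses: list   # raw model responses, one string per item
    config: dict          # run configuration used for the eval
    config_hash: str      # hash recorded for that configuration
    judge_spec: str       # text of the judge specification
    repro_command: str    # command that reproduces the run


def hash_config(config: dict) -> str:
    """Assumed scheme: SHA-256 over canonical (sorted-key, compact) JSON."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def config_matches(bundle: AuditBundle) -> bool:
    """True if the recorded hash matches the configuration in the bundle."""
    return hash_config(bundle.config) == bundle.config_hash


if __name__ == "__main__":
    # Placeholder values only; none of this is taken from a real bundle.
    config = {"temperature": 0.0, "max_tokens": 256}
    bundle = AuditBundle(
        benchmark="example-benchmark",
        raw_responses=["<raw model response>"],
        config=config,
        config_hash=hash_config(config),
        judge_spec="<judge specification>",
        repro_command="<reproduction command>",
    )
    print("config hash verified:", config_matches(bundle))
```

Under these assumptions, a reader would recompute the hash from the bundled configuration and compare it with the recorded one before running the reproduction command. Serializing with sorted keys means two configurations with the same contents hash identically regardless of key order.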
The contrast with frontier labs is deliberate. They sell a model you trust on faith; Theron is a mind you can audit. Provenance, reproducibility, and honest labelling are the defaults, down to the live grading you see here.
The dashboard shows the frontier eval matrix and Theron grading herself in real time, plus full audit bundles such as the SecQA 99 percent audit and the HumanEval correction trail, each with a reproduction command.
Yes, corrections are published on the dashboard. The HumanEval correction trail is one example, kept so that the record stays honest and reproducible.