Theron rests on a single parameter-free verifier that returns one of three verdicts: correct, not correct, or cannot-check. When it cannot re-derive the work against something that cannot lie, it abstains and says so rather than guess. That one engine can be pointed in four directions: sealing a verdict into a receipt, gating an action before it commits, monitoring an agent for drift over time, and judging the model's own weight edits.
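The three-verdict behaviour can be sketched for the executable case. This is a minimal illustration, not Theron's verifier: the names `Verdict`, `Case`, and `verify` are hypothetical, and "something that cannot lie" is assumed here to mean a set of test cases the candidate is actually run against.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Verdict(Enum):
    CORRECT = "correct"
    NOT_CORRECT = "not correct"
    CANNOT_CHECK = "cannot-check"


# Hypothetical shape for illustration: (positional arguments, expected result).
Case = Tuple[tuple, object]


def verify(candidate: Callable, cases: Optional[Sequence[Case]]) -> Verdict:
    """Re-derive the work by running it; abstain when there is nothing to run against."""
    if not cases:
        # No ground truth available: abstain rather than guess.
        return Verdict.CANNOT_CHECK
    for args, expected in cases:
        try:
            actual = candidate(*args)
        except Exception:
            # A crash on a known case counts as wrong work.
            return Verdict.NOT_CORRECT
        if actual != expected:
            return Verdict.NOT_CORRECT
    return Verdict.CORRECT
```

The point of the sketch is that the function has no tunable threshold: given the same candidate and the same cases it returns the same verdict, and an empty check set yields an abstention, never a pass.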
Only one of the four jobs is live in production today: receipts. The verifier's verdict is bound into a record that is ES256-signed and anchored to a daily Merkle root, and you verify it yourself, offline, with the open-source @vextlabs/stoa-verifier, with no Vext account. The thing that checks the receipt is not us and is not the vendor of the agent that produced it. A receipt does not promise the work was good; it promises an honest record of what was checked and how.
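To make "anchored to a daily Merkle root" concrete, here is a generic Merkle inclusion check. It is a sketch under stated assumptions, not the receipt format or the @vextlabs/stoa-verifier implementation: SHA-256 leaves, left-then-right concatenation, and carrying an unpaired node up unchanged are all choices made for this example, and the ES256 signature check is omitted entirely.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    """Hash adjacent pairs; an unpaired last node is carried up unchanged (assumption)."""
    out = []
    for i in range(0, len(level), 2):
        if i + 1 < len(level):
            out.append(_h(level[i] + level[i + 1]))
        else:
            out.append(level[i])
    return out


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def inclusion_proof(leaves: List[bytes], index: int) -> List[Tuple[str, bytes]]:
    """Sibling hashes from leaf to root, each tagged with the side it sits on."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            side = "L" if sibling < index else "R"
            proof.append((side, level[sibling]))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[str, bytes]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof; needs no network access."""
    node = _h(leaf)
    for side, sibling in proof:
        node = _h(sibling + node) if side == "L" else _h(node + sibling)
    return node == root
```

Verification needs only the receipt bytes, the proof, and the published root, which is why a check of this kind can run offline and without an account with anyone.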
The gate is the same verifier pointed at an action before it commits instead of after. Its status is exact: the code compiles and is unit-tested, it runs in observe-off, and it gates no action today. It is the next build, not a live feature, and we do not put a date on it. The no-regression monitor, which would catch the moment an agent silently drifts, is on the roadmap. Verifier-gated weight learning, where the verifier judges the model's own updates so it could learn without forgetting, is a research frontier proven on toy arithmetic only; it is not a present-tense capability.
The reason three of the four jobs are not live falls out of a measured boundary. On the executable lane, where code is run against tests, the verifier is sound and catches wrong work with a near-zero false-positive rate, a number that is measured and reproducible with one command on a laptop with no GPU. On the free-text lane it is unsafe as an autonomous gate: a confident, self-consistent wrong answer was certified as correct 58 times out of 60. Because that lane cannot yet be safely auto-gated, a self-improvement loop trusting the free-text checker would compound confident wrong updates, so verifier-gated weight learning is explicitly blocked on that boundary being solved. None of these ideas is first of its kind; prior art exists. The honest differentiator is the combination of a deterministic correctness gate and a neutral, offline-verifiable receipt whose verifier is not the vendor.
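The boundary can be written down as arithmetic plus a policy. The sketch below uses only the 58-of-60 figure from the text; the lane names and the `may_auto_gate` helper are hypothetical, chosen for illustration and not taken from Theron's code.

```python
from fractions import Fraction

# Figure from the text: 58 of 60 confident, self-consistent wrong answers
# on the free-text lane were certified as correct.
false_certification_rate = Fraction(58, 60)

# Hypothetical policy table: autonomous gating only where the verifier is sound.
AUTO_GATE_ALLOWED = {
    "executable": True,   # code run against tests
    "free_text": False,   # unsafe as an autonomous gate
}


def may_auto_gate(lane: str) -> bool:
    """Unknown lanes default to False, so a new lane is never gated by accident."""
    return AUTO_GATE_ALLOWED.get(lane, False)


print(f"{float(false_certification_rate):.1%}")  # 96.7%
```

Defaulting unknown lanes to "not allowed" mirrors the abstain-rather-than-guess stance of the verifier itself.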
Only receipts are live in production today. The verifier's verdict is bound into a record that is ES256-signed and anchored to a daily Merkle root, and you verify it yourself offline with the open-source @vextlabs/stoa-verifier, without a Vext account. The gate, the no-regression monitor, and verifier-gated weight learning are not live: the gate compiles and is unit-tested but runs in observe-off and gates no action today, the monitor is on the roadmap, and weight learning is a research frontier proven on toy arithmetic only.
No. The gate is the next build, not a live feature. The code compiles and is unit-tested, but it runs in observe-off and gates no action today. We will not describe it as live, enforcing, or imminent, and we do not put a date on it. The honest milestone is the gate wired from observe to enforce, kill-tested, and placed only on the lane where the verifier is sound.
The near-zero false-positive rate on the executable lane is measured and reproducible. The companion post, The Verifier Boundary, carries the full reproduce block, and the same one-command probe runs on a laptop with no GPU: build the corpus with scripts/eval/build_stse_corpus.py, then run scripts/eval/adversarial_eps_fp_probe.py against our own verifier. The corpus and verdicts are hash-chained, so a re-run re-verifies the recorded results rather than re-asserting them. If the near-zero false-positive number is wrong, that probe is where you catch us.
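What "hash-chained" buys you can be shown with a generic hash chain. This is a sketch of the general technique only: the entry layout, the `GENESIS` value, and the canonical JSON encoding are assumptions for this example and are not the format the eval scripts write.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical starting value for the first entry's "prev" field.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: Dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: List[Dict]) -> List[Dict]:
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edited, dropped, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the one before it, changing a single recorded verdict after the fact invalidates every later entry, so a reader recomputes the record instead of taking it on trust.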
Verifier-gated weight learning is blocked because of a measured boundary. On the executable lane the verifier is sound, but on free-text reasoning it certified confident, self-consistent wrong answers as correct 58 times out of 60. A self-improvement loop that trusted that free-text checker would consolidate confident wrong updates and compound the error, so verifier-gated weight learning is explicitly blocked on the free-text boundary being solved. It is a research frontier, proven on toy arithmetic only, not a present-tense capability.