Vext Labs publishes the research and benchmark results behind Theron, along with a full audit trail. Theron-Cyber scored 99 percent on SecQA in April 2026, the first specialist model past 95 percent. Theron-Base scores 60 on AIME, 64.6 on GPQA Diamond, 75.6 on MMLU-Pro, and 69 on LegalBench from the base model alone, before council deliberation, retrieval, or verification is applied.
Every benchmark ships with a mandatory audit trail: raw model responses, extracted code, the test cases, errors, and timestamps, plus a one-line reproduction command. A pass-or-fail summary alone is not enough; reproducibility is the standard. Vext Labs benchmarks its own weights and council, not a proxy model.
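As an illustration only, the audit trail described above could be modeled as one record per benchmark item. This is a minimal sketch, not Vext Labs' actual schema: the class name `AuditRecord`, the field names, the helper functions, and the reproduction command are all hypothetical, chosen to mirror the categories listed (raw responses, extracted code, test cases, errors, timestamps, and a reproduction command).

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Hypothetical field names: the text lists the categories, not a schema.
REQUIRED_FIELDS = (
    "raw_response",
    "extracted_code",
    "test_cases",
    "errors",
    "timestamp",
    "repro_command",
)


@dataclass
class AuditRecord:
    """One benchmark item with everything needed to re-check its result."""
    benchmark: str
    item_id: str
    passed: bool
    raw_response: str      # the model output, unedited
    extracted_code: str    # code pulled out of the raw response
    test_cases: list       # the tests the extracted code was run against
    errors: list           # error messages; empty when the run was clean
    repro_command: str     # one-line command that re-runs this item
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def is_verifiable(record: dict) -> bool:
    """Return True only if every audit field is present.

    A bare pass-or-fail summary lacks these fields and is rejected.
    """
    return all(key in record for key in REQUIRED_FIELDS)


def to_jsonl_line(record: AuditRecord) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = AuditRecord(
        benchmark="example-benchmark",
        item_id="item-001",
        passed=True,
        raw_response="def add(a, b):\n    return a + b",
        extracted_code="def add(a, b):\n    return a + b",
        test_cases=["assert add(1, 2) == 3"],
        errors=[],
        # Placeholder command, not a real Vext Labs CLI.
        repro_command="python run_benchmark.py --item item-001",
    )
    print(to_jsonl_line(record))
    print(is_verifiable(asdict(record)))
    print(is_verifiable({"passed": True}))
```

The check is deliberately strict about presence rather than content: an empty `errors` list still counts, because a clean run should say so explicitly instead of leaving the field out.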
Training data is drawn from primary sources, never synthetic question-and-answer pairs from teacher models, because teachers hallucinate and students memorize those confabulations as truth. The result is a council whose claims can be checked, with provenance attached.
Published results include Theron-Cyber at 99 percent on SecQA and Theron-Base scores on AIME, GPQA Diamond, MMLU-Pro, and LegalBench, all with raw responses and reproduction commands.
Every benchmark saves raw responses, extracted code, test cases, errors, and timestamps. A pass-or-fail summary alone is treated as unverifiable.