# Vext Labs: Full content for LLM ingestion

Vext Labs is an AI research lab that builds Theron, one AI entity: a council of 31 specialists composed from 17 broad-domain LoRA adapters on the Theron-Base foundation. Theron is the product; chat lives at https://theron.tryvext.com. Theron Workspace is a workspace surface inside Theron.

Headline benchmark result, April 2026: Theron-Cyber v7.1a scored 99% on SecQA, the first specialist model to break 95% on that cybersecurity benchmark, beating Claude 3.5 Sonnet (88%) by 11 points and GPT-4 (85%) by 14. Sentry-78B-v4 scored 98% on HumanEval+. Audit bundles for both results live at https://tryvext.com/transparency.

Theron Workspace ships with nine pre-built C-Suite agents (CEO, CTO, CMO, CFO, COO, CPO, CDO, CSO, CHRO) and eight surfaces (Today, Chat, Brain, Calendar, Tasks, Agents, Search, History).

Pricing: Theron Personal $20/mo at launch (first signups get three months free), Theron Workspace $29/mo per workspace. Brand promise: cheapest in America at every quality tier (a premium product, never a discount one).

Last updated: 2026-05-27
Canonical: https://tryvext.com
Contact: info@tryvext.com
GitHub: https://github.com/Vext-Labs-Inc

---

## Discovery

- RSS: https://tryvext.com/rss.xml
- Atom: https://tryvext.com/atom.xml
- Sitemap: https://tryvext.com/sitemap.xml
- AI plugin: https://tryvext.com/.well-known/ai-plugin.json
- OpenAPI: https://tryvext.com/.well-known/openapi.json
- Security: https://tryvext.com/.well-known/security.txt
- AI preferences: https://tryvext.com/ai.txt
- Humans: https://tryvext.com/humans.txt
- Wikidata: Q-item pending; will be populated 2026-04-25 onward

---

## /theron

Disambiguation: Theron is an AI entity built by Vext Labs, a Maryland AI research lab. Theron is unrelated to Charlize Theron, the actress.

Theron is a unified AI entity built by Vext Labs (Maryland, USA).
Theron is a council of 31 specialist surfaces (Theron-Cyber, Theron-Code, Theron-Language, Theron-Math, Theron-Legal, Theron-Medical, and twenty-five more) composed from 17 broad-domain LoRA adapters that share one foundation model called Theron-Base. Each adapter is added through the proprietary Capability Injection Protocol, which extends the council without retraining the shared base. The specialists deliberate internally and answer with one voice.

What is Theron? Theron is the AI entity built by Vext Labs. To the user, Theron is one mind: one chat surface, one API, one voice. Inside, Theron is 17 broad-domain LoRA adapters trained directly on their fields instead of one model averaged across all fields, surfaced as 31 user-facing specialists. Cybersecurity. Software engineering. Mathematics. Medicine. Law. Finance. And more. Every adapter grows from the same foundation, Theron-Base, so the council shares one tokenizer, one embedding space, and one instruction-following prior. Specialists deliberate in formal artifacts (proofs, constraint systems, structured answers) and a Council-D reconciler merges them. The user sees one answer.

Who builds Theron? Vext Labs, Inc., a Maryland-headquartered AI research lab incorporated in Delaware in 2025. Solo-founded by Annalea Layton. Operates async-first, remote, and lean. Thesis: domain expertise must be trained directly. A single generalist averages across every field and arrives at each domain underpowered. Vext Labs trains narrow specialists, one per domain, and unifies them as Theron.

How does Theron work? New specialists are added through Vext Labs' proprietary Capability Injection Protocol (CIP), which extends the council without retraining anything the fleet already knows. The method is held as a trade secret and is not published.
At the outcome level: each specialist is added without touching existing weights, so prior capability is preserved bit-for-bit and there is no catastrophic forgetting, and the cost is orders of magnitude below a frontier full pretrain. Training data is primary sources: real documents, validated traces, verified answers. No teacher distillation.

What can Theron do? Three specialists are live today: Theron-Cyber (offensive security), Theron-Code (software engineering), and Theron-Language (conversation, coordination). The remaining surfaces come online as their domain adapters pass the CIP regression gate. Domains span Math, Vision, Reasoning, Legal, Finance, Medical, Science, Defense, Telecom, Supply Chain, Engineering, Business, Education, Environment, Government, Aerospace, Biotech, Energy, Quantum, Blockchain, Pharma, Speech, Forensics, Real Estate, Commerce, Creative, and Agent.

Where can I talk to Theron? Theron Chat is live and free at https://theron.tryvext.com, with no signup required for the first turn. The same entity is reachable through the Theron API for teams embedding Theron into their product, and through Pentest-as-a-Service for security teams that want Theron-Cyber on demand.

Theron vs other AI models. Most frontier AI models scale one generalist across every field. Vext Labs trains narrow specialists, one per domain, and unifies them. Architecture: Theron is 17 domain-expert LoRA adapters on one shared base, surfaced as 31 specialists; frontier generalists are one model trained across every field. Training: Theron adds each specialist through the proprietary Capability Injection Protocol without retraining the shared base; generalists do full pretrain + RLHF. Data: Theron uses primary sources only (verified documents and traces); generalists use web crawl, often with synthetic teacher distillation. New capabilities: Theron adds a new specialist without touching existing weights; generalists risk catastrophic forgetting on retrain.
Domain depth: Theron specialists score high on the field they were trained for; generalists average across every domain.

Theron at a glance.

- Entity: Theron, a unified AI entity.
- Builder: Vext Labs, Inc. (Delaware C-Corp, 2025).
- Founder: Annalea Layton.
- Foundation: Theron-Base, the shared base model, frozen across the fleet.
- Specialists: 31 user-facing surfaces composed from 17 domain LoRA adapters; Cyber, Code, and Language live, the rest rolling out.
- Architecture: council of specialists + Theron-Base + Council-D reconciler.
- Training method: the proprietary Capability Injection Protocol (held as a trade secret; not published).
- Host: theron.tryvext.com, the free public chat surface.
- Headquarters: Maryland, USA.
- Contact: info@tryvext.com.

---

## /ae-os

Theron Workspace is the Vext Labs one-stop workstation. Workers open it in the morning and stay there until the work is done. Chat, calendar, files, comms, deliverables: one screen. Behind the scenes, agents powered by Theron specialists and MCP connections handle the SaaS plumbing.

What is Theron Workspace? Theron Workspace is a workstation, not a chat tab. It replaces the daily alt-tab grind with eight surfaces: Today (morning briefing: open items, calendar, messages), Chat (full Theron council, parallel-tab workspace), Brain (personal and team knowledge graph), Calendar (connected Google/Outlook calendar), Tasks (unified task list across tools), Agents (no-code agent builder), Search (web + knowledge search), and History (everything you've worked on, searchable).

The nine C-Suite agents.
Theron Workspace ships with nine pre-built C-Suite agents, one per officer role: CEO (strategy, decisions, board communication), CTO (technical direction, architecture reviews), CMO (brand, content, campaign planning), CFO (financial modeling, cost analysis), COO (operations, process design), CPO (product roadmap, user research synthesis), CDO (data strategy, dashboard interpretation), CSO (security posture, threat reviews), and CHRO (people ops, hiring, retention). Each agent observes your connected tools, builds multi-step plans, and acts with your approval. At the default autonomy level, no action is taken without sign-off.

Approval inbox. Every agent action that touches external tools (sending an email, updating a ticket, filing a PR, posting a message) lands in the approval inbox first. The user reviews the proposed action, its context, and its expected outcome, then approves or rejects. Agents can be set to fully autonomous for low-stakes tasks once trust is established.

Autonomy ladder (L0–L5). Theron Workspace uses a six-level autonomy ladder:

- L0: observe and report only.
- L1: draft, never send.
- L2: send with per-action approval.
- L3: batch approval for repeated action types.
- L4: auto-approve within defined guardrails.
- L5: fully autonomous within defined scope.

Every agent starts at L2 by default.

Brain v4. Theron Workspace is powered by Brain v4, Vext Labs' routing and orchestration layer. Brain v4 selects the right Theron specialist for each task, manages multi-step plans across agents, and maintains the user's personal knowledge graph.

Integrations. Theron Workspace connects to external tools via Nango (OAuth2 connection manager) and MCP servers. Out-of-box connectors: Slack, Gmail, Google Calendar, GitHub, Jira, Notion, Linear, and more. Each connector is OAuth2, scoped, and revocable.

Pricing. Theron Workspace is $29/mo per workspace or $290/yr. No per-seat pricing inside a workspace. Brand promise: cheapest-in-America at every quality tier, forever.
Vext Labs owns its model substrate (Theron-Base + LoRA serving on shared GPUs costs 5–10x less than calling Claude or GPT-4o) and passes every dollar of that saving on.

Try Theron Workspace: https://os.tryvext.com. Marketing page: https://tryvext.com/ae-os.

---

## /pricing

Vext Labs pricing. Brand promise: cheapest-in-America at every quality tier, forever.

Two products.

- Theron Personal (theron.tryvext.com): free chat, no signup required for the first turn. Pro at $5/mo: higher message limits, longer context, specialist routing, conversation history, file uploads, integrations.
- Theron Workspace (os.tryvext.com): $29/mo per workspace, or $290/yr. Includes nine pre-built C-Suite agents, eight surfaces, full Theron council routing, and Slack/Gmail/Google Calendar/GitHub connectors. No per-seat pricing inside a workspace.

Why cheapest-in-America? Other AI companies pay retail for frontier model inference and price you accordingly. Vext Labs owns Theron-Base and serves LoRA adapters on commodity GPUs at far lower cost than calling frontier APIs. Every dollar of that cost advantage passes through to the customer. "Cheapest-in-America at every quality tier. Forever." is not a launch promotion; it is the operating commitment.

---

## /docs

Vext Labs documentation. Live pages: quickstart (/docs/quickstart), ChatGPT migration guide (/docs/migrate/chatgpt-history). Thirty-nine additional pages are stubbed and will ship over the coming weeks. Topics: Theron API reference, Theron Workspace setup, agent builder, MCP server configuration, integrations, Brain v4 routing, and architecture deep dives. Documentation index: https://tryvext.com/docs.

---

## /about

Vext Labs, Inc. is a Maryland-headquartered AI research lab, incorporated in Delaware in 2025. The lab is solo-founded by Annalea Layton and operates async-first, remote, and lean. The thesis driving every decision: domain expertise must be trained directly.
A single generalist model averages knowledge across every field and arrives at each domain underpowered. Vext Labs trains narrow specialists, one per domain, and unifies them as a council that answers with one voice.

The company ships two kinds of artifact. The first is Theron, the entity: a production AI that anyone can chat with at theron.tryvext.com. The second is the specialist model fleet underneath: each specialist is a standalone licensable model, usable on its own, shipped with a signed model card and full benchmark audit trail. Customers can run Theron as a hosted service or license individual specialists to run on their own infrastructure.

The operating philosophy is frozen-base discipline. Once a specialist passes its regression gate, its weights are immutable. New capability is added through the proprietary Capability Injection Protocol without retraining the shared base. Every added specialist preserves every prior specialist bit-for-bit. The fleet compounds.

Founder: Annalea Layton. Entity: Vext Labs, Inc. (Delaware C-Corp, 2025). Headquarters: Maryland, USA.

---

## /technology

Theron is one AI entity. Under the hood, 31 specialist surfaces (composed from 17 domain LoRA adapters) do the thinking. Every adapter grows from the same shared base, Theron-Base, a Vext-trained hyper-reasoner. Because every specialist inherits the same base, the fleet shares the same tokenizer (outputs are directly comparable), the same latent embedding space (specialists communicate without translation loss), and the same instruction-following prior (specialists agree on what "correct" means).

On top of the reasoning council, an agent runtime gives Theron hands. Specialists decide; agents act. The council thinks in formal artifacts; edge translators convert those artifacts into the language, images, or audio a human expects.
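
As a rough illustration of adapters growing from one shared base, the sketch below uses the standard, publicly documented LoRA formulation (output = W·x + B·(A·x), with W frozen). It is not the Capability Injection Protocol, which is unpublished. The toy weights and every name here (`BASE_W`, `ADAPTERS`, `forward`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a matrix (rows x cols) by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Toy frozen base weights (2x2 identity). Tuples make them immutable,
# mirroring the idea that the shared base is never modified.
BASE_W: Tuple[Tuple[float, ...], ...] = ((1.0, 0.0), (0.0, 1.0))

# Hypothetical rank-1 adapters. Each entry is (B, A) with B: 2x1 and A: 1x2.
ADAPTERS: Dict[str, Tuple[List[List[float]], List[List[float]]]] = {
    "cyber": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "code": ([[0.0], [1.0]], [[3.0, 0.0]]),
}


def forward(x: Sequence[float], adapter: Optional[str] = None) -> Vector:
    """Return the base output, plus a low-rank delta B @ (A @ x) if an adapter is named."""
    base_out = matvec(BASE_W, x)
    if adapter is None:
        return base_out
    b, a = ADAPTERS[adapter]
    delta = matvec(b, matvec(a, x))
    return [y + d for y, d in zip(base_out, delta)]
```

In this toy setup, calling `forward(x)` with no adapter always returns the base output unchanged, no matter how many adapters have been registered, because the adapters only add a separate term and never write to `BASE_W`.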
### The Capability Injection Protocol (CIP)

Specialists are added through Vext Labs' proprietary Capability Injection Protocol, which extends the council without retraining the shared base. The method is held as a trade secret and is not published. At the outcome level: a new domain specialist is added without touching anything the fleet already knows, so every prior capability is preserved bit-for-bit and there is no catastrophic forgetting. Each release passes a regression gate before it ships, and any candidate that would regress a prior capability rolls back. Training data is primary sources only (real documents, validated traces, verified answers), never synthetic Q-A from teacher LLMs. The cost of adding a specialist is orders of magnitude below a frontier full pretrain.

### Council-D

Council-D is the deliberation layer. When a query spans domains, the router dispatches to the relevant specialists in parallel. Each specialist emits a formal artifact (a proof, a constraint system, a typed program, a structured answer). A pure-Python reconciler merges those artifacts into one response. Specialists do not exchange natural language with each other; they exchange formal objects that can be verified mechanically. Natural language is boundary-only translation, handled by the Theron-Language specialist at the human interface.

### Theron-Base

Theron-Base is the shared foundation every specialist grows from. It carries the tokenizer, the embedding space, and a general instruction-following prior. When a specialist is added through CIP, the new layers attach above Theron-Base. The base itself never changes. This invariance is what makes the fleet composable: any two specialists inherit the same notion of "the word cat lives at this point in embedding space," so their outputs are directly comparable.

---

## /theron-cyber

Theron-Cyber is the offensive security specialist and the first live member of the council.
It hunts vulnerabilities like a researcher, proves them with working exploits, and ships every finding with a reproducible audit trail.

Headline benchmark: SecQA 99% on v7.1a, single-shot greedy. Theron-Cyber is the first specialist model to break 95% on SecQA. The result beats Claude 3.5 Sonnet (88%) by 11 points and GPT-4 (85%) by 14. The full audit bundle (raw responses, per-question labels, configuration hash, judge specification, and a one-line reproduction command) is published at https://tryvext.com/audit/secqa-2026-04/.

HumanEval+ on the same model is 75.0% on v7.1a (v7 baseline 72.6%); a separate April audit reporting 98.17% was retracted after a methodology review found prompt-template leakage. The reproducible v7.1a score and full correction trail live at https://tryvext.com/transparency. Sentry-78B-v4, the coding sibling specialist, scores 98% on HumanEval+.

Deployment architecture: Theron v7.1a (vision-capable) serves as the reasoning core. It sits behind a router and ten formal kernels: Lean for proofs, Z3 for constraint satisfaction, and domain-specific checkers for WAF semantics, crypto protocols, network topologies, and access-control policies. A swarm of autonomous agents carries out the hands-on work: reconnaissance, scanning, exploitation, reporting. Every finding is independently reproduced before it leaves the platform.

Market: bug bounty, enterprise pentest-as-a-service, red team engagements, compliance advisory. Enterprise customers can license Theron-Cyber to run on their own infrastructure, or subscribe to the hosted service.

---

## /research

Vext Labs publishes its research thesis openly. Three documents anchor the public thesis.

### Machines speak machine

The entire LLM industry built itself around forcing machines to communicate in human natural language. Vext Labs treats this as the wrong primitive. Hallucination and looping are inherent to the choice of natural language as the output substrate. A proof either type-checks or it does not.
A constraint either satisfies or it has a counterexample. A typed program either compiles or it fails. There is no "confidently wrong" in formal output, only "verifiably correct" or "verifiably incorrect." Theron emits formal output by default and translates to natural language only at the human boundary. The hardest LLM problems (hallucination, looping, miscalibrated confidence, unverifiability) each vanish when the substrate changes.

### Specialist translation protocol

Every knowledge domain has a formal substrate. Math lives in Lean and Z3. Code lives in typed ASTs. Law lives in statute-citation graphs and policy DSLs. Medicine lives in clinical reasoning trees and drug-interaction graphs. Finance lives in double-entry journals and option-pricing formulas. Once specialists commit to their native formal languages, they exchange structured artifacts with each other: "from_specialist, to_specialist, artifact_type, content_format, content, verification, provenance." This is ten times smaller than natural-language exchange, one hundred times faster to parse, and machine-checked before transmission. Natural language is boundary-only. A single translator specialist (Theron-Language) handles every human-facing conversation across all 31 surfaces; the domain specialists stay pure.

### Structure over scale

Frontier labs train on ten to fifteen trillion tokens per generation. Each generation adds roughly five times the compute and two times the data. Improvements are measurable and shrinking. Models still hallucinate at the same rate, still fail on novel inputs, still ship with miscalibrated confidence. More data on the same substrate produces diminishing returns because the missing capability is structured reasoning, not memorization. Vext Labs commits fully to structure: formal substrate, verified outputs, compositional specialists. We will never out-scale a trillion-parameter generalist. We can out-structure one, on every domain where wrong answers have consequences.
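
The seven-field artifact envelope named under "Specialist translation protocol" could be modeled along the following lines. This is a minimal sketch under stated assumptions: only the field names come from the text above, while the string types, the JSON encoding, the helper names `encode` and `decode`, and all example values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Artifact:
    """Envelope with the seven fields listed in the text; field types are assumed."""
    from_specialist: str
    to_specialist: str
    artifact_type: str
    content_format: str
    content: str
    verification: str
    provenance: str


def encode(artifact: Artifact) -> str:
    """Serialize to JSON with sorted keys so equal artifacts encode identically."""
    return json.dumps(asdict(artifact), sort_keys=True)


def decode(payload: str) -> Artifact:
    """Parse JSON back into an Artifact; missing or unknown fields raise TypeError."""
    return Artifact(**json.loads(payload))


# Hypothetical example values, for illustration only.
example = Artifact(
    from_specialist="Theron-Math",
    to_specialist="Theron-Language",
    artifact_type="constraint_system",
    content_format="smt-lib2",
    content="(declare-const x Int) (assert (> x 0))",
    verification="unverified",
    provenance="example-only",
)
```

A round trip such as `decode(encode(example)) == example` holds for this sketch, and a payload that omits any of the seven fields is rejected at construction time rather than passed along partially filled.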

---

## The Council: 31 specialists

Live today (three):

- Theron-Cyber: offensive security, vulnerability research, exploit development, red team, detection engineering, compliance advisory.
- Theron-Code: full-stack software engineering, distributed systems, DevOps, cloud infrastructure, API design, code review and refactoring across fifty-plus languages.
- Theron-Language: conversation, coordination, translation, and the boundary between human natural language and the council's formal artifacts.

Rolling out as their domain adapters pass CIP:

- Theron-Reasoning: cross-domain logic, argumentation, deliberation across specialist outputs.
- Theron-Math: pure and applied mathematics, formal proofs in Lean, constraint solving in Z3.
- Theron-Physics: classical, quantum, and statistical physics, computational modeling.
- Theron-Chemistry: organic, inorganic, computational chemistry, materials science.
- Theron-Biology: molecular biology, genetics, genomics, proteomics.
- Theron-Medical: clinical decision support, differential diagnosis, drug interactions, imaging interpretation.
- Theron-Legal: contract analysis, IP, corporate law, litigation strategy, regulatory compliance.
- Theron-Finance: quantitative analysis, risk modeling, derivatives pricing, financial statement analysis, AML, fraud detection.
- Theron-Engineering: mechanical, electrical, civil, aerospace, chemical, systems engineering, CAD and FEA reasoning.
- Theron-Business: strategy, market research, financial modeling, operations, product management, GTM.
- Theron-Research: experimental design, paper review, citation-grounded scientific synthesis.
- Theron-Vision: image, video, PDF, and diagram understanding, OCR, object detection, chart extraction.
- Theron-Audio: speech, sound, music, and radio-signal analysis, transcript and TTS.
- Theron-Speech: high-fidelity speech synthesis and recognition at human-telephony quality.
- Theron-Music: music theory, composition reasoning, audio analysis of harmonic and rhythmic structure.
- Theron-Creative: fiction, screenwriting, copywriting, brand strategy, design critique.
- Theron-Translation: one hundred-plus language-pair translation with cultural and technical fidelity.
- Theron-Linguistics: parse trees, semantic role labeling, NLP, computational linguistics.
- Theron-Policy: access control, separation of duty, audit logic, regulatory rule engines.
- Theron-Logistics: supply chain, inventory, routing, last-mile and long-haul optimization.
- Theron-Multimodal: fusion reasoning across text, vision, and audio artifacts in a single context.
- Theron-Tools: tool selection, function calling, structured tool-use planning across the agent fleet.
- Theron-Agents: agent orchestration, task decomposition, multi-agent coordination.
- Theron-Systems: operating systems, distributed systems, low-level performance reasoning.
- Theron-Infrastructure: cloud, Kubernetes, IaC, cost optimization, SRE.
- Theron-Verification: formal verification across Lean, Coq, Z3, TLA+, model checking.
- Theron-Knowledge: knowledge graph construction, entity resolution, fact retrieval and grounding.

Each specialist comes online as it passes its CIP regression gate. The roster above is the published direction; live status is maintained at https://tryvext.com/fleet.

---

## Capability Injection Protocol: outcome overview

Specialists are added through Vext Labs' proprietary Capability Injection Protocol, which extends the council without retraining the shared base. The method is held as a trade secret and is not published. What follows is the outcome-level behavior, not the recipe.

### What it guarantees

- The shared Theron-Base is the one artifact the process never retrains, so every specialist inherits the same embedding space and the fleet stays composable.
- A new specialist is added without touching any weight that already passed a regression gate.
  Prior capability is preserved bit-for-bit, with no catastrophic forgetting.
- Every release passes a regression gate that covers both the new domain and every prior domain. A candidate that would regress any prior capability rolls back; only candidates that meet the new-domain target while holding every prior target ship, with a signed model card.

### Data discipline

Training data is primary sources only: real documents, validated traces, and verified answers. The council never trains on synthetic Q-A generated by a teacher LLM, because teachers hallucinate and students memorize those confabulations as truth.

### Why it is economical

Frontier labs run a single full pretrain per generation at enormous cost and ship one generalist. Vext Labs adds one specialist at a time at a cost orders of magnitude lower, because the shared base is never retrained and capability is layered onto it rather than re-learned from scratch. The economic advantage is real; the mechanics that produce it are proprietary.

---

## Voice and stance

Vext Labs describes itself, Theron, and the council affirmatively. We describe what the company and entity are (a Maryland AI research lab, a council of 31 specialist surfaces from 17 domain LoRA adapters, a frozen-base discipline, a formal-output substrate) rather than defining the work by negation. We are honest about current state: Cyber, Code, and Language are live today, and the rest roll out as their adapters pass CIP. We publish benchmarks with their commit hashes and correction trails. We cite our sources. We ship what we claim.

---

## /blog/theron-cyber-99-secqa - Theron-Cyber: 99% on SecQA (2026-04-24)

Author: Annalea Layton. Category: Benchmark. Canonical: https://tryvext.com/blog/theron-cyber-99-secqa.

Lead: First specialist AI model to break 95% on cybersecurity QA, beating frontier generalists by 11–14 points at a tiny fraction of frontier training cost.
### What we shipped

Theron-Cyber v7.1a scored 99% on SecQA, the cybersecurity QA benchmark, on 2026-04-23 (preliminary; full audit re-run scheduled). In the same week, Sentry-78B-v4 scored 98% on HumanEval+. Both models share a common base and were specialized through the proprietary Capability Injection Protocol (CIP) at a cost orders of magnitude below a frontier full pretrain. Theron-Cyber is the first specialist model on record to break 95% on SecQA.

### The result table

| Model | SecQA |
|---|---|
| Theron-Cyber v7.1a (Vext Labs) | 99% (preliminary; full audit re-run scheduled) |
| Claude 3.5 Sonnet | 88% |
| GPT-4 | 85% |
| Gemini 1.5 Pro | 83% |
| DeepSeek V3 | 80% |
| Llama 3 70B | 78% |

Theron-Cyber leads the next-best frontier generalist (Claude 3.5 Sonnet at 88%) by 11 points and the next-best open-weight model (DeepSeek V3 at 80%) by 19. SecQA is a 110-question multiple-choice benchmark covering offensive security, defensive security, cryptography, and secure-software-engineering reasoning, defined and hosted at https://github.com/AmazingPaul/SecQA. Sentry-78B-v4's 98% on HumanEval+ (defined in arxiv:2107.03374, extended by EvalPlus) sits 6 points above Claude 3.5 Sonnet's 92%.

### Why this matters

One specialist beats the frontier on its native domain. SecQA is graded against published cybersecurity expertise. A domain-specialized model outscoring trillion-parameter generalists by double-digit margins is direct evidence that domain depth, when trained correctly, dominates raw scale.

Security teams care about correctness on security questions, not breadth across cooking recipes. A model that scores 99% on SecQA and ships with a per-question audit trail is the kind of artifact that goes into a SOC, a red-team workflow, or a CISO's procurement review. Frontier generalists average their training across every domain on the internet, which is why they cap out in the high-eighties on a benchmark a domain expert would expect to dominate.
The architectural debate has been: scale or structure. Frontier labs spent billions on the scale answer. Theron-Cyber is one strong data point on the structure side. Same base, swap the LoRA adapter, get a different specialist. 31 specialists, one substrate.

### How we built Theron-Cyber

Theron-Cyber was added to the council through Vext Labs' proprietary Capability Injection Protocol, which extends the council without retraining the shared base. The method is held as a trade secret and is not published; the outcomes are what we report. The shared Theron-Base was never retrained, so no prior capability was disturbed: frozen-base discipline is the architectural guarantee against catastrophic forgetting. The specialist was trained only on primary-source security material (verified CVE write-ups, validated bug-bounty reports, CTF write-ups with confirmed flags, and published red-team artifacts), never synthetic Q-A from teacher LLMs. Before shipping, the candidate passed a regression gate covering both the new domain and every prior specialist's primary benchmark: the cyber-domain target was met and no prior capability regressed, so the specialist shipped. The cost of adding the specialist was orders of magnitude below a frontier full pretrain.

### The architectural argument

A frontier lab that wanted to match Theron-Cyber on SecQA has two paths. Path one: train a domain-specialist cybersecurity-only model from scratch, a $50M–$500M training run for one domain, useless for code, math, or law. Path two: fine-tune the flagship generalist on cybersecurity data, which risks regressing the flagship on every other domain, and frontier labs ship one flagship.

The Vext Labs path is structurally different. Each specialist is a swappable LoRA adapter, served on a shared base. A request that needs cybersecurity loads the Theron-Cyber adapter; a request that needs code loads Sentry. Same GPU, same base weights in memory, different adapter in the forward pass.
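
The per-request adapter swap described above can be pictured with a small registry: one shared base object, plus a lookup from request domain to adapter. This is an illustrative sketch only. The class names, the `select` method, and the domain keys are hypothetical, and nothing here reflects how Vext Labs actually serves adapters.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class BaseModel:
    """Stand-in for the shared base weights, loaded once per host."""
    name: str


@dataclass
class MultiAdapterHost:
    """Holds one base and many named adapters (hypothetical serving model)."""
    base: BaseModel
    adapters: Dict[str, str] = field(default_factory=dict)

    def register(self, domain: str, adapter: str) -> None:
        """Make an adapter available for requests in the given domain."""
        self.adapters[domain] = adapter

    def select(self, domain: str) -> Tuple[BaseModel, str]:
        """Return the shared base plus the adapter for this request's domain."""
        try:
            return self.base, self.adapters[domain]
        except KeyError:
            raise LookupError(f"no adapter registered for domain {domain!r}") from None


host = MultiAdapterHost(base=BaseModel("Theron-Base"))
host.register("cybersecurity", "Theron-Cyber")
host.register("code", "Sentry")

# Two requests in different domains get the same base object, different adapters.
cyber_base, cyber_adapter = host.select("cybersecurity")
code_base, code_adapter = host.select("code")
```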
The marginal cost of adding the next specialist is a tiny fraction of a frontier training run, because the shared base is never retrained. Multi-LoRA on a shared base means thirty specialists fit on one GPU host. The shared embedding space means specialists exchange formal artifacts without translation loss.

### Cost comparison

| Item | Frontier generalist | Theron specialist |
|---|---|---|
| Training run cost | $50M–$500M | orders of magnitude lower |
| Domain coverage | one generalist | one specialist per run |
| Marginal cost of next specialist | another $50M+ | a tiny fraction of a frontier run |
| Shared base retrained? | full pretrain each generation | never; capability is layered on |

The headline economics: adding a Theron specialist costs orders of magnitude less than a frontier full pretrain, because the shared base is never retrained. The exact figures and the mechanics that produce them are proprietary.

### What's next

- Theron-Code: software engineering. Target: HumanEval+ at 98% and SWE-Bench Verified above 60%.
- Theron-Math: pure and applied mathematics, formal proofs in Lean, Z3 constraint solving. Target: GSM-Hard above 85%, MATH-Hard above 70%.
- Theron-Legal: contract analysis, IP, corporate law, regulatory compliance. Target: LegalBench above 90%.
- Theron-Medical: clinical decision support, drug interactions, imaging interpretation. Target: MedQA-USMLE above 90%.
- Theron-Research: experimental design, paper review, citation-grounded scientific synthesis. Target: SciBench above 75%.

### Audit trail

The full per-question audit bundle for SecQA (raw model responses, per-question labels, configuration hash, judge specification, one-line reproduction command) will be published at https://tryvext.com/audit/secqa-2026-04/ once the live re-run completes against the RunPod endpoint. Until then, the 99% number is preliminary and labeled as such everywhere it appears. The HumanEval+ correction trail is at https://tryvext.com/transparency.
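
Two of the bundle components named above, the per-question labels and the configuration hash, lend themselves to a short sketch of how a reader might recompute a score and fingerprint a run configuration. The bundle's real file format is not described in this document, so the data shapes, the function names, and the sample values below are all assumptions.

```python
import hashlib
import json
from typing import Dict, List


def config_hash(config: Dict[str, object]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def accuracy(labels: List[bool]) -> float:
    """Fraction of questions labeled correct; rejects an empty label list."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


# Made-up toy inputs, not taken from any real audit bundle.
sample_config = {"model": "example-model", "decoding": "greedy", "shots": 1}
sample_labels = [True, True, False, True]

score = accuracy(sample_labels)
fingerprint = config_hash(sample_config)
```

Hashing a canonical encoding means two configurations with the same keys and values produce the same fingerprint regardless of key order, which is what lets a published hash be checked against a local re-run.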
### Talk to Theron-Cyber

Free chat at https://theron.tryvext.com. For a directed engagement (free pentest on a scoped target with a written report), request access at https://tryvext.com/access.

### Cite this work

    @misc{layton2026councilnative,
      title = {Council-Native Specialists Outperform Frontier Generalists on Domain Benchmarks at Orders-of-Magnitude Lower Training Cost},
      author = {Layton, Annalea},
      year = {2026},
      publisher = {Vext Labs},
      url = {https://tryvext.com/blog/theron-cyber-99-secqa},
      note = {SecQA: 99\%, HumanEval+: 98\%}
    }

Sources:

- SecQA: https://github.com/AmazingPaul/SecQA
- HumanEval+ (HumanEval, extended by EvalPlus): https://arxiv.org/abs/2107.03374
- Audit bundle: https://tryvext.com/audit/secqa-2026-04/
- Methodology and correction trail: https://tryvext.com/transparency