Fair point on setting expectations, but this isn’t just LLMs checking LLMs. The important parts are non-LLM constraints.
The model never gets to “decide what’s true.” In KB mode it can only answer from attached files. Don’t feed it shit and it won’t say shit.
In Mentats mode it can only answer from the Vault. If retrieval returns nothing, the system forces a refusal. That’s enforced by the router, not by another model.
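If it helps, here's roughly what that gate looks like: a minimal sketch assuming the router gets a query, a mode, and (for KB mode) the attached files. Every name in it (route_query, retrieve, run_triple_pass, REFUSAL) is a placeholder for illustration, not the project's actual API:

```python
# Sketch of the router-side refusal gate. retrieve() and run_triple_pass()
# are stand-ins; the real system's internals obviously differ.

REFUSAL = "I don't know; that's not in my sources."

def retrieve(query: str) -> list[str]:
    """Stand-in for Vault retrieval: returns matching chunks, or an empty list."""
    return []  # placeholder

def run_triple_pass(query: str, sources: list[str]) -> str:
    """Stand-in for the thinker -> critic -> thinker passes over grounded context."""
    return "answer built only from `sources`"  # placeholder

def route_query(query: str, mode: str, attached_files: list[str] | None = None) -> str:
    if mode == "kb":
        # KB mode: the only context is whatever the user attached.
        sources = attached_files or []
    else:
        # Mentats mode: the only context is what Vault retrieval returns.
        sources = retrieve(query)

    if not sources:
        # The refusal is decided here, in plain code, before any model call.
        # The LLM never gets a chance to improvise around missing sources.
        return REFUSAL

    return run_triple_pass(query, sources)
```

The point is that the empty-retrieval check is ordinary code that runs before the first model call, so there is no model in the loop for the refuse-or-answer decision.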
The triple-pass (thinker → critic → thinker) is just for internal consistency and formatting. The grounding, provenance, and refusal logic live outside the LLM.
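And here's how the triple-pass slots into that, fleshing out the run_triple_pass stub above. Again just a sketch: call_llm and the pass prompts are invented for illustration, not the real ones.

```python
# Hedged sketch of the thinker -> critic -> thinker passes.
# call_llm is a placeholder for whatever completion API sits underneath.

def call_llm(prompt: str) -> str:
    """Stand-in for a single model call."""
    return "model output"  # placeholder

def run_triple_pass(query: str, sources: list[str]) -> str:
    # Provenance: every pass sees only the chunks the router already approved.
    context = "\n\n".join(sources)

    draft = call_llm(f"Answer strictly from the sources below.\n{context}\n\nQ: {query}")
    critique = call_llm(f"List inconsistencies or unsupported claims in:\n{draft}\n\nSources:\n{context}")
    final = call_llm(
        "Rewrite the draft, fixing the critique. Do not add new claims.\n"
        f"Draft:\n{draft}\n\nCritique:\n{critique}"
    )

    # The passes only polish consistency and formatting; whether to answer at all
    # was already settled by the router's refusal gate before this function ran.
    return final
```

Ordering is the whole trick: by the time the thinker runs its first pass, the refuse-or-answer decision and the source set are already fixed outside the model.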
So yeah, no absolute guarantees (nothing in this space has those), but the failure mode is "I don't know / not in my sources, get fucked," not "confidently invented gibberish."