To stop an LLM from making unfalsifiable claims, require an explicit falsification condition for every claim before it can be acted on: a concrete, observable event or threshold that would prove the claim false. Any claim without one is logged as uncertain or escalated — never used as the basis for a decision.
Measured result from a running 8-persona council system: the empty_consensus_rate was 0.354 (35.4% of consensus claims had auto-verifiability below 0.20), and adding model diversity alone did not move it. Requiring falsification conditions and verification bridges is the intervention aimed at that rate.
LLMs generate confident-sounding probabilities. "There's a 70% chance this stock moves up." "This approach will probably work." These numbers feel like analysis. They're not — they're language modeling. The model has learned that confident numeric claims sound authoritative; RLHF has rewarded that pattern.
The fix isn't better prompting. It's a structural constraint: no claim is accepted without explicit falsification conditions.
Karl Popper's falsifiability criterion, applied to AI agent outputs: a claim that cannot be falsified is not a prediction — it's a narrative. For AI agents making consequential decisions, this means every accepted claim must answer:
Under what observable conditions would this claim be false?
Without that answer, the claim cannot be acted on. It can be noted, logged as uncertain, or escalated — but not used as the basis for a decision.
The agent is not allowed to produce numerical probabilities in any consequential domain (finance, deployment decisions, technology adoption). When a probability would otherwise be needed, it must describe scenarios and the conditions each one requires:
```
# ❌ Blocked output
"BTC has a 70% chance of moving up in the next 2 weeks."

# ✅ Required format instead
Scenario A (upward): requires X AND Y to materialize.
Leading indicators currently showing: [observable signals]
Scenario B (sideways): ...
Scenario C (downward): ...
Note: this system does not produce probabilities.
```
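One way to enforce the no-probability rule is an output filter that rejects numeric probability phrasing before it can reach a decision. A minimal sketch, assuming regex matching is an acceptable first line of defense; `contains_probability_claim`, `gate_output`, and the pattern list are hypothetical, and they will miss paraphrases a human reviewer would catch.

```python
import re

# Hypothetical patterns for numeric probability phrasing. A production filter
# would need a broader list (or a classifier) to catch paraphrases.
PROBABILITY_PATTERNS = [
    # "70% chance", "80% likely", "55.5 % probability"
    re.compile(r"\b\d{1,3}(\.\d+)?\s*%\s*(chance|probability|likely|likelihood|odds)\b", re.I),
    # "70 percent chance"
    re.compile(r"\b\d{1,3}(\.\d+)?\s*percent\s*(chance|probability|likely)\b", re.I),
    # "probability of 0.7", "probability is 70%"
    re.compile(r"\bprobability\s+(of|is|=)\s*(0?\.\d+|\d{1,3}\s*%)", re.I),
]


def contains_probability_claim(text: str) -> bool:
    """Return True if any pattern finds a numeric probability in the text."""
    return any(p.search(text) for p in PROBABILITY_PATTERNS)


def gate_output(text: str) -> str:
    """Raise on numeric probabilities; pass scenario-style output through unchanged."""
    if contains_probability_claim(text):
        raise ValueError(
            "Blocked: numeric probability in a consequential domain; "
            "rewrite as scenarios with required conditions."
        )
    return text
```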
Any quantitative strategy or model must pass walk-forward validation before deployment: training and validation windows are kept separate, and evaluation happens only on truly out-of-sample future data. In-sample backtests are not evidence.
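The split logic behind walk-forward validation is small enough to show directly. A minimal sketch, assuming time-ordered samples indexed 0..n-1 and a rolling (fixed-size) training window; `walk_forward_splits` is a hypothetical helper, and an expanding training window is an equally valid variant.

```python
from typing import Iterator, Optional


def walk_forward_splits(n: int, train_size: int, test_size: int,
                        step: Optional[int] = None) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges over time-ordered samples 0..n-1.

    Each test window starts immediately after its training window, so the
    model is always scored on data later than anything it was fit on.
    """
    step = test_size if step is None else step
    if min(train_size, test_size, step) <= 0:
        raise ValueError("train_size, test_size and step must be positive")
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step  # roll the whole window forward in time
```

With the default step equal to `test_size`, the test windows tile the timeline without overlapping, so every out-of-sample score comes from a distinct future period.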
Co-occurring events cannot be labeled as causal without explicit causal identification. The agent must label observations as:
| Label | Meaning | Action allowed |
|---|---|---|
| FACT | Directly observed, source cited | Use in decisions |
| INFERENCE | Derived from facts, assumption stated | Use with explicit caveat |
| UNIDENTIFIED | Correlation only, causal mechanism unknown | Log, do not act |
| SPURIOUS | Co-occurrence, no causal mechanism | Discard |
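The labels and their allowed actions map naturally onto an enum plus a gate. A minimal sketch, assuming each observation arrives already labeled (how the agent assigns labels is not covered here); `Label`, `ALLOWED_ACTION`, and `admit_to_decision` are hypothetical names, and treating a missing source or assumption as disqualifying is this sketch's reading of the "source cited" and "assumption stated" columns.

```python
from enum import Enum
from typing import Optional


class Label(Enum):
    FACT = "fact"
    INFERENCE = "inference"
    UNIDENTIFIED = "unidentified"
    SPURIOUS = "spurious"


# Action allowed for each label, mirroring the table's last column.
ALLOWED_ACTION = {
    Label.FACT: "use",
    Label.INFERENCE: "use_with_caveat",
    Label.UNIDENTIFIED: "log_only",
    Label.SPURIOUS: "discard",
}


def admit_to_decision(label: Label, source: Optional[str] = None,
                      assumption: Optional[str] = None) -> bool:
    """Return True only if an observation may feed a decision."""
    if label is Label.FACT:
        return source is not None  # a FACT must cite its source
    if label is Label.INFERENCE:
        return assumption is not None  # an INFERENCE must state its assumption
    return False  # UNIDENTIFIED and SPURIOUS never feed a decision
```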
The same structure works across all decision domains. A falsifier has three required slots:
```yaml
falsifier:
  condition: [observable event that invalidates the thesis]
  threshold: [measurable value, not vague]
  evaluation_date: [concrete date to check]
```
```yaml
# Example: technology adoption decision
falsifier:
  condition: >
    Library has no releases for 6 months, OR
    Critical CVE published with no patch in 30 days, OR
    Primary maintainer announces deprecation
  threshold: any of the above
  evaluation_date: 2026-09-13
```
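Because the three slots are fixed, a falsifier can be checked mechanically before it is accepted. A minimal sketch, assuming the YAML has already been parsed into a plain dict; `validate_falsifier` and the `VAGUE_THRESHOLDS` word list are hypothetical, and a real vagueness check would need more than a word list.

```python
from datetime import date

REQUIRED_SLOTS = ("condition", "threshold", "evaluation_date")
# Hypothetical examples of thresholds too vague to check; not exhaustive.
VAGUE_THRESHOLDS = {"significant", "substantial", "a lot", "soon", "eventually", "tbd"}


def validate_falsifier(falsifier: dict) -> list[str]:
    """Return a list of problems; an empty list means all three slots are usable."""
    problems = []
    for slot in REQUIRED_SLOTS:
        value = falsifier.get(slot)
        if value is None or not str(value).strip():
            problems.append(f"missing slot: {slot}")

    threshold = str(falsifier.get("threshold") or "").strip().lower()
    if threshold in VAGUE_THRESHOLDS:
        problems.append(f"vague threshold: {threshold!r}")

    raw_date = falsifier.get("evaluation_date")
    if raw_date is not None and str(raw_date).strip():
        try:
            # YAML loaders may already return a date object; str() covers both cases.
            date.fromisoformat(str(raw_date))
        except ValueError:
            problems.append(f"evaluation_date is not an ISO date: {raw_date!r}")
    return problems
```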
A thesis without a falsifier is not a thesis — it's an opinion. We enforce this at the pre-decision gate: any load-bearing decision (deployment, investment, architecture change) that lacks explicit falsifier conditions is blocked.
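The pre-decision gate itself can be a small check that runs before any load-bearing action. A minimal sketch, assuming decisions carry a `kind` field and an optional falsifier dict; `Decision`, `LOAD_BEARING_KINDS`, and `pre_decision_gate` are hypothetical, and the kind names simply echo the examples in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional

# Decision kinds treated as load-bearing (hypothetical names echoing the text).
LOAD_BEARING_KINDS = {"deployment", "investment", "architecture_change"}
REQUIRED_SLOTS = ("condition", "threshold", "evaluation_date")


@dataclass
class Decision:
    description: str
    kind: str
    falsifier: Optional[dict] = None


def pre_decision_gate(d: Decision) -> tuple[bool, str]:
    """Return (allowed, reason); load-bearing decisions need a complete falsifier."""
    if d.kind not in LOAD_BEARING_KINDS:
        return True, "not load-bearing; gate not applied"
    if not d.falsifier:
        return False, "blocked: no falsifier"
    missing = [s for s in REQUIRED_SLOTS if not d.falsifier.get(s)]
    if missing:
        return False, f"blocked: falsifier missing {', '.join(missing)}"
    return True, "allowed"
```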
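Multi-agent systems introduce a specific failure mode: empty consensus. Multiple agents agree on a claim, but the claim isn't actually verifiable (in the system measured below, a consensus claim counts as empty when its auto-verifiability is under 0.20).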
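Using the definition given earlier (a consensus claim is empty when its auto-verifiability is below 0.20), the rate is a simple fraction. A minimal sketch, assuming each consensus claim already carries an auto-verifiability score in [0, 1]; how that score is produced isn't described here, and `ConsensusClaim` and `empty_consensus_rate` are hypothetical names.

```python
from dataclasses import dataclass

EMPTY_CONSENSUS_CUTOFF = 0.20  # auto-verifiability below this counts as "empty"


@dataclass
class ConsensusClaim:
    text: str
    auto_verifiability: float  # assumed score in [0, 1]; scoring method not specified


def empty_consensus_rate(claims: list[ConsensusClaim]) -> float:
    """Fraction of consensus claims whose auto-verifiability is below the cutoff."""
    if not claims:
        return 0.0
    empty = sum(1 for c in claims if c.auto_verifiability < EMPTY_CONSENSUS_CUTOFF)
    return empty / len(claims)
```

The rate says nothing about whether the agents were right; it only measures how often agreement lands on claims that nothing can check.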
Our measured metrics from an 8-persona wisdom council system:
| Metric | Observed value | Threshold | Status |
|---|---|---|---|
| empty_consensus_rate | 0.354 | < 0.30 | ⚠️ Over threshold |
| council_repeat_rate | 0.44–0.48 | < 0.10 | 🔴 4.4× threshold |
| outcome_report_rate | 0.036 | > 0.70 | 🔴 Under threshold |
The takeaway from these numbers: model diversity doesn't fix empty consensus. Adding more diverse models to the council did not reduce the rate at which they agreed on unverifiable claims. The fix is verification: routing council conclusions through formal verification engines.
For any council conclusion that is load-bearing (consequential decision, money, deployment) and contains mathematical or logical claims, we route it through a verification bridge before accepting it.
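The routing rule can be sketched as a function that lets a load-bearing conclusion through only once every formal claim in it has been checked. The text doesn't name the formal engines, so this sketch plugs in a toy arithmetic checker (`toy_arithmetic_engine`) purely as a stand-in; in practice the engine would be something like an SMT solver or proof checker. `verification_bridge`, its arguments, and the accept/reject/escalate outcomes are all assumptions.

```python
import ast
import operator
from typing import Callable, Optional

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node: ast.AST) -> float:
    """Evaluate a restricted arithmetic AST: numbers, unary minus, + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) \
            and not isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def toy_arithmetic_engine(claim: str) -> Optional[bool]:
    """Stand-in engine for claims shaped like '<expr> = <expr>'.
    Returns None when the claim is outside what it can check."""
    if claim.count("=") != 1:
        return None
    lhs, rhs = claim.split("=")
    try:
        left = _eval(ast.parse(lhs.strip(), mode="eval").body)
        right = _eval(ast.parse(rhs.strip(), mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None
    return abs(left - right) < 1e-9


def verification_bridge(formal_claims: list[str], load_bearing: bool,
                        engine: Callable[[str], Optional[bool]] = toy_arithmetic_engine) -> str:
    """Accept, reject, or escalate a council conclusion based on its formal claims."""
    if not load_bearing or not formal_claims:
        return "accepted: bridge not required"
    for claim in formal_claims:
        verdict = engine(claim)
        if verdict is None:
            return f"escalated: could not verify {claim!r}"
        if not verdict:
            return f"rejected: {claim!r} is false"
    return "accepted: all formal claims verified"
```

Returning "escalated" for claims the engine cannot check keeps unverifiable conclusions out of decisions rather than silently accepting them.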
The verification result feeds back via report_outcome — closing the loop and improving future council calibration. Without this feedback, the outcome_report_rate stays near zero (measured: 0.036), meaning the council never learns from its errors.
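The text names a report_outcome step but not its interface. A minimal in-memory sketch, assuming outcome_report_rate means the share of council conclusions that eventually get an outcome reported; `OutcomeLog`, `CouncilConclusion`, and the `held_up` flag are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CouncilConclusion:
    conclusion_id: str
    text: str
    outcome: Optional[bool] = None  # None until an outcome is reported


@dataclass
class OutcomeLog:
    conclusions: dict[str, CouncilConclusion] = field(default_factory=dict)

    def add(self, c: CouncilConclusion) -> None:
        self.conclusions[c.conclusion_id] = c

    def report_outcome(self, conclusion_id: str, held_up: bool) -> None:
        """Record whether the conclusion survived verification or real-world checks."""
        self.conclusions[conclusion_id].outcome = held_up

    def outcome_report_rate(self) -> float:
        """Share of conclusions with any reported outcome."""
        if not self.conclusions:
            return 0.0
        reported = sum(1 for c in self.conclusions.values() if c.outcome is not None)
        return reported / len(self.conclusions)

    def accuracy(self) -> Optional[float]:
        """Share of reported conclusions that held up: a crude calibration signal."""
        reported = [c.outcome for c in self.conclusions.values() if c.outcome is not None]
        return sum(reported) / len(reported) if reported else None
```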
The minimal implementation:
The ledger is the most important part. Without it, falsifiers are just good intentions — stated conditions that never get checked.
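A minimal ledger sketch, assuming an in-memory store (a real one would need to persist across runs); `FalsifierLedger`, `LedgerEntry`, `due`, and `check` are hypothetical names. The piece that turns intentions into checks is `due`: on or after each evaluation date, unchecked falsifiers surface as work that must be done.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LedgerEntry:
    decision: str
    condition: str
    threshold: str
    evaluation_date: date
    falsified: Optional[bool] = None  # None means not yet checked


class FalsifierLedger:
    def __init__(self) -> None:
        self.entries: list[LedgerEntry] = []

    def record(self, entry: LedgerEntry) -> None:
        """Store a falsifier at decision time."""
        self.entries.append(entry)

    def due(self, today: date) -> list[LedgerEntry]:
        """Entries whose evaluation date has arrived but that were never checked."""
        return [e for e in self.entries
                if e.falsified is None and e.evaluation_date <= today]

    def check(self, entry: LedgerEntry, falsified: bool) -> None:
        """Record the result of evaluating the falsifier against reality."""
        entry.falsified = falsified
```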