Falsifier-Driven AI Decisions: No Claim Without Falsification Conditions

To stop an LLM from making unfalsifiable claims, require an explicit falsification condition for every claim before it can be acted on: a concrete, observable event or threshold that would prove the claim false. Any claim without one is logged as uncertain or escalated — never used as the basis for a decision.

How to stop an LLM from making unfalsifiable claims

Measured on a running 8-persona council system: empty_consensus_rate was 0.354, meaning about 35% of consensus claims had auto-verifiability below 0.20. Model diversity alone did not move that number; falsification conditions and verification bridges are the fix that targets it.

hexisteme notes · June 13, 2026 · 10 min read

LLMs generate confident-sounding probabilities. "There's a 70% chance this stock moves up." "This approach will probably work." These numbers feel like analysis. They're not — they're language modeling. The model has learned that confident numeric claims sound authoritative; RLHF has rewarded that pattern.

The fix isn't better prompting. It's a structural constraint: no claim is accepted without explicit falsification conditions.

The Core Principle

Karl Popper's falsifiability criterion, applied to AI agent outputs: a claim that cannot be falsified is not a prediction — it's a narrative. For AI agents making consequential decisions, this means every accepted claim must answer:

Under what observable conditions would this claim be false?

Without that answer, the claim cannot be acted on. It can be noted, logged as uncertain, or escalated — but not used as the basis for a decision.

Three Concrete Rules (RDU-021, 022, 023)

Rule 1: No LLM Probability Production (RDU-021)

The agent cannot produce numerical probabilities in any consequential domain (finance, deployment decisions, technology adoption). When a probability is needed:

# ❌ Blocked output
"BTC has a 70% chance of moving up in the next 2 weeks."

# ✅ Required format instead
Scenario A (upward): requires X AND Y to materialize.
  Leading indicators currently showing: [observable signals]
Scenario B (sideways): ...
Scenario C (downward): ...
Note: this system does not produce probabilities.
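A minimal sketch of how a blocked output like the one above might be detected, assuming a plain regex check is acceptable as a first pass. The names `PROBABILITY_PATTERN` and `check_output` are hypothetical, and this version flags every percentage; telling sourced from unsourced numbers is left out.

```python
import re

# Hypothetical pattern: matches "70%", "70 percent", and
# "probability of 0.7"-style phrasing. Tune it for your own domain.
PROBABILITY_PATTERN = re.compile(
    r"\b\d{1,3}(?:\.\d+)?\s*(?:%|percent\b)"
    r"|\b(?:probability|chance|likelihood)\s+of\s+0?\.\d+",
    re.IGNORECASE,
)


def check_output(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, hits). Output is allowed only if no probability-like claim is found."""
    hits = [m.group(0) for m in PROBABILITY_PATTERN.finditer(text)]
    return (len(hits) == 0, hits)


allowed, hits = check_output("BTC has a 70% chance of moving up in the next 2 weeks.")
print(allowed, hits)
```

A check like this is deliberately blunt: it is cheaper to review a false positive than to let a confident-sounding number through.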

Rule 2: Walkforward Validation Required (RDU-022)

Any quantitative strategy or model must pass walkforward validation before deployment — training and validation windows separated, evaluated on truly out-of-sample future data. In-sample backtests are not evidence.
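As a rough illustration of what "training and validation windows separated" can look like, here is a sketch of a rolling walkforward splitter. The function name, the fixed window sizes, and the rolling (rather than expanding) window are all assumptions for the example, not the rule's required form.

```python
from typing import Iterator


def walkforward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges where every test index comes after every train index."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block


for train, test in walkforward_splits(n_samples=10, train_size=4, test_size=2):
    print(list(train), "->", list(test))
```

The property that matters is that the model never sees a test index during fitting; an in-sample backtest violates exactly that.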

Rule 3: Correlation ≠ Causation Enforcement (RDU-023)

Co-occurring events cannot be labeled as causal without explicit causal identification. The agent must label observations as:

| Label | Meaning | Action allowed |
|---|---|---|
| FACT | Directly observed, source cited | Use in decisions |
| INFERENCE | Derived from facts, assumption stated | Use with explicit caveat |
| UNIDENTIFIED | Correlation only, causal mechanism unknown | Log, do not act |
| SPURIOUS | Co-occurrence, no causal mechanism | Discard |
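One way to make these labels enforceable rather than advisory is to encode them as a type. The sketch below assumes a simple mapping from label to allowed action; the names `EvidenceLabel`, `ALLOWED_ACTION`, and `may_act_on` are illustrative.

```python
from enum import Enum


class EvidenceLabel(Enum):
    FACT = "FACT"
    INFERENCE = "INFERENCE"
    UNIDENTIFIED = "UNIDENTIFIED"
    SPURIOUS = "SPURIOUS"


# Mirrors the "Action allowed" column; the action strings are hypothetical.
ALLOWED_ACTION = {
    EvidenceLabel.FACT: "use",
    EvidenceLabel.INFERENCE: "use_with_caveat",
    EvidenceLabel.UNIDENTIFIED: "log_only",
    EvidenceLabel.SPURIOUS: "discard",
}


def may_act_on(label: EvidenceLabel) -> bool:
    """Only FACT and INFERENCE observations may feed a decision."""
    return ALLOWED_ACTION[label] in {"use", "use_with_caveat"}


print(may_act_on(EvidenceLabel.INFERENCE), may_act_on(EvidenceLabel.UNIDENTIFIED))
```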

The Falsifier Protocol: Applied Beyond Finance

The same structure works across all decision domains. A falsifier has three required slots:

falsifier:
  condition: [observable event that invalidates the thesis]
  threshold: [measurable value, not vague]
  evaluation_date: [concrete date to check]

# Example: technology adoption decision
falsifier:
  condition: >
    Library has no releases for 6 months, OR
    Critical CVE published with no patch in 30 days, OR
    Primary maintainer announces deprecation
  threshold: any of the above
  evaluation_date: 2026-09-13

A thesis without a falsifier is not a thesis — it's an opinion. We enforce this at the pre-decision gate: any load-bearing decision (deployment, investment, architecture change) that lacks explicit falsifier conditions is blocked.
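The pre-decision gate can be sketched in a few lines, assuming the three falsifier slots map onto a small dataclass. The `Falsifier` class and `gate` function are hypothetical names, and a real gate would likely validate the threshold more strictly than a non-empty check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Falsifier:
    condition: str         # observable event that invalidates the thesis
    threshold: str         # measurable value, not vague
    evaluation_date: date  # concrete date to check


def gate(decision: str, falsifier: Optional[Falsifier]) -> str:
    """Block any load-bearing decision that lacks a filled-in falsifier."""
    if falsifier is None:
        return "blocked"
    if not falsifier.condition.strip() or not falsifier.threshold.strip():
        return "blocked"
    return "accepted"


adoption = Falsifier(
    condition="Library has no releases for 6 months",
    threshold="any of the listed conditions",
    evaluation_date=date(2026, 9, 13),
)
print(gate("adopt library", adoption), gate("adopt library", None))
```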

The Empty Consensus Problem

Multi-agent systems introduce a specific failure mode: empty consensus. Multiple agents agree on a claim — but the claim isn't actually verifiable.

Our measured metrics from an 8-persona wisdom council system:

| Metric | Observed value | Threshold | Status |
|---|---|---|---|
| empty_consensus_rate | 0.354 | < 0.30 | ⚠️ Over threshold |
| council_repeat_rate | 0.44–0.48 | < 0.10 | 🔴 4.4–4.8× threshold |
| outcome_report_rate | 0.036 | > 0.70 | 🔴 Under threshold |

The insight from these numbers: model diversity doesn't fix empty consensus. Adding more diverse models to a council doesn't reduce the rate at which they agree on unverifiable claims. The fix is verification — routing council conclusions through formal engines.
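For readers who want to track the same metric, here is a sketch of how empty_consensus_rate could be computed, assuming each consensus claim carries an auto-verifiability score between 0 and 1 and that "empty" means a score below 0.20. The function names and the sample scores are hypothetical, not our production data.

```python
def empty_consensus_rate(verifiability_scores: list[float], cutoff: float = 0.20) -> float:
    """Fraction of consensus claims whose auto-verifiability falls below the cutoff."""
    if not verifiability_scores:
        raise ValueError("need at least one consensus claim")
    empty = sum(1 for score in verifiability_scores if score < cutoff)
    return empty / len(verifiability_scores)


def over_threshold(rate: float, threshold: float = 0.30) -> bool:
    """The target is a rate below the threshold; anything else counts as over."""
    return rate >= threshold


# Hypothetical scores for four consensus claims.
scores = [0.05, 0.10, 0.50, 0.90]
rate = empty_consensus_rate(scores)
print(rate, over_threshold(rate))
```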

Verification Bridges

For any council conclusion that is load-bearing (consequential decision, money, deployment) and contains mathematical or logical claims, we route it through a verification bridge before accepting it.

The verification result feeds back via report_outcome — closing the loop and improving future council calibration. Without this feedback, the outcome_report_rate stays near zero (measured: 0.036), meaning the council never learns from its errors.

The real bottleneck isn't model quality — it's verifiability. A unanimous council conclusion that can't be verified is worth less than a single verified calculation. Most of our empty_consensus cases were claims that sounded like analysis but couldn't be checked against any ground truth.
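To make the shape of a verification bridge concrete, here is a deliberately small sketch. It assumes the claim can be reduced to a callable check and that `report_outcome` accepts a claim and a boolean; both the signature and the example claim are assumptions for illustration, not the actual interface.

```python
from typing import Callable


def verification_bridge(
    claim: str,
    check: Callable[[], bool],
    report_outcome: Callable[[str, bool], None],
) -> bool:
    """Run an independent check for the claim and always report the result back."""
    try:
        verified = bool(check())
    except Exception:
        verified = False  # a check that cannot run does not verify anything
    report_outcome(claim, verified)
    return verified


outcomes: list[tuple[str, bool]] = []


def record(claim: str, verified: bool) -> None:
    # Stand-in for the real feedback channel.
    outcomes.append((claim, verified))


ok = verification_bridge(
    "Three replicas at 40 req/s each cover a 100 req/s peak",
    lambda: 3 * 40 >= 100,
    record,
)
print(ok, outcomes)
```

The point of the design is that reporting is not optional: the outcome is recorded whether the check passes, fails, or crashes.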

Practical Implementation

The minimal implementation:

  1. Add a gate before any decision is accepted: does it have a falsifier condition?
  2. Block LLM probability production at the system level, not through prompt instructions: use a stop hook that intercepts outputs containing unsourced percentages
  3. Build a ledger: store every load-bearing decision with its falsifier, evaluation date, and outcome
  4. Close the loop: when evaluation dates arrive, check falsifiers and update the ledger

The ledger is the most important part. Without it, falsifiers are just good intentions — stated conditions that never get checked.
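The ledger and the loop-closing step above can be sketched as follows, assuming an in-memory list is enough to show the idea. `LedgerEntry`, `Ledger`, and the outcome strings "held" and "falsified" are hypothetical; a real ledger would persist to disk or a database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LedgerEntry:
    decision: str
    falsifier_condition: str
    evaluation_date: date
    outcome: Optional[str] = None  # None until the falsifier has been checked


class Ledger:
    def __init__(self) -> None:
        self.entries: list[LedgerEntry] = []

    def record(self, entry: LedgerEntry) -> None:
        self.entries.append(entry)

    def due(self, today: date) -> list[LedgerEntry]:
        """Entries whose evaluation date has arrived and that are still unresolved."""
        return [e for e in self.entries if e.outcome is None and e.evaluation_date <= today]

    def resolve(self, entry: LedgerEntry, falsified: bool) -> None:
        entry.outcome = "falsified" if falsified else "held"


ledger = Ledger()
entry = LedgerEntry("adopt library", "no releases for 6 months", date(2026, 9, 13))
ledger.record(entry)
for item in ledger.due(date(2026, 9, 13)):
    ledger.resolve(item, falsified=False)
print(entry.outcome)
```

Running `due` on a schedule is what turns a stated falsifier into one that actually gets checked.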


hexisteme notes — engineering patterns from building AI agent systems · home · CC-BY 4.0