How do I stop an LLM from making unfalsifiable claims by requiring falsification conditions?

Add a pre-decision gate that rejects any claim lacking an explicit falsification condition — a concrete, observable event or threshold that, if reached, would prove the claim false. Each falsifier needs three slots: the observable condition, a measurable threshold (not vague), and a concrete evaluation date. Claims without a falsifier are logged as uncertain or escalated, never used as the basis for a decision. Enforce it with a stop hook that blocks load-bearing outputs (deployment, investment, architecture changes) missing falsifier conditions.

How do you prevent LLM agents from generating fake probabilities?

Encode a hard rule: LLMs cannot produce numerical probabilities. Instead, they must produce conditional scenario branches (if X then Y, if A then B) with explicit falsification conditions. Any claim that includes a percentage without a calibrated statistical source is automatically blocked. This is implemented as RDU-021 in our system: 'LLMs do not generate financial probability numbers. If a probability is needed, a statistical model generates it and Brier score validation is required before exposure.'

What is a falsification condition in AI agent design?

A falsification condition is a concrete, observable event or threshold that, if reached, would invalidate the current decision. For example: 'This technology adoption decision is falsified if the library has no releases for 6 months OR if a critical security CVE is published with no patch within 30 days.' Without falsification conditions, AI decisions are unfalsifiable claims dressed as analysis.

What is the empty_consensus problem in multi-agent AI systems?

Empty consensus happens when multiple AI agents agree on a claim that isn't verifiable. Our measured empty_consensus_rate was 0.354 — 35% of multi-agent council responses had low auto-verifiability (<0.20) despite apparent consensus. The fix is not model diversity but verification bridges: routing council conclusions through formal verification engines (sympy, wolfram, lean) before accepting them as load-bearing decisions.

How do you track whether AI agent decisions are calibrated?

Brier score tracking: record every probabilistic claim the system makes (even when forced through scenario branches), then measure actual outcomes. Our finance-journal ledger stores EntryThesis records with falsifier conditions, tracks which falsifiers triggered, and computes rolling Brier scores. An agent that can't be measured by Brier score is producing narrative, not prediction.
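The Brier computation itself is small. What follows is a minimal sketch, assuming each resolved record carries a forecast probability (produced by a statistical model, not the LLM) and a 0/1 outcome. The names `brier_score` and `rolling_brier` and the window size are illustrative and are not taken from the finance-journal ledger.

```python
from typing import List, Sequence, Tuple

# (forecast probability in [0, 1], observed outcome as 0 or 1)
Resolved = Tuple[float, int]


def brier_score(records: Sequence[Resolved]) -> float:
    """Mean squared error between forecast and outcome; lower is better."""
    if not records:
        raise ValueError("need at least one resolved forecast")
    return sum((p - o) ** 2 for p, o in records) / len(records)


def rolling_brier(records: Sequence[Resolved], window: int) -> List[float]:
    """Brier score over each trailing window of resolved forecasts."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        brier_score(records[i - window:i])
        for i in range(window, len(records) + 1)
    ]


# Hypothetical resolved forecasts, oldest first.
history = [(0.8, 1), (0.6, 0), (0.3, 0), (0.9, 1)]
overall = brier_score(history)
trailing = rolling_brier(history, window=2)
```

A forecaster that always says 0.5 scores 0.25, which makes a useful baseline: a rolling score that sits at or above 0.25 is no better than refusing to forecast.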

Falsifier-Driven AI Decisions: No Claim Without Falsification Conditions

To stop an LLM from making unfalsifiable claims, require an explicit falsification condition for every claim before it can be acted on: a concrete, observable event or threshold that would prove the claim false. Any claim without one is logged as uncertain or escalated — never used as the basis for a decision.

How to stop an LLM from making unfalsifiable claims

Measured on a running 8-persona council system: empty_consensus_rate was 0.354, meaning 35% of consensus claims scored below 0.20 on auto-verifiability. Model diversity alone did not move that number; falsification conditions and verification bridges are the fix aimed at it.

hexisteme notes · June 13, 2026 · 10 min read

LLMs generate confident-sounding probabilities. "There's a 70% chance this stock moves up." "This approach will probably work." These numbers feel like analysis. They're not — they're language modeling. The model has learned that confident numeric claims sound authoritative; RLHF has rewarded that pattern.

The fix isn't better prompting. It's a structural constraint: no claim is accepted without explicit falsification conditions.

The Core Principle

Karl Popper's falsifiability criterion, applied to AI agent outputs: a claim that cannot be falsified is not a prediction — it's a narrative. For AI agents making consequential decisions, this means every accepted claim must answer:

Under what observable conditions would this claim be false?

Without that answer, the claim cannot be acted on. It can be noted, logged as uncertain, or escalated — but not used as the basis for a decision.

Three Concrete Rules (RDU-021, 022, 023)

Rule 1: No LLM Probability Production (RDU-021)

The agent cannot produce numerical probabilities in any consequential domain (finance, deployment decisions, technology adoption). When a probability is needed:

A statistical model produces it — not the language model
Brier score validation is required before exposure
The LLM's role is to interpret the number, not generate it

# ❌ Blocked output
"BTC has a 70% chance of moving up in the next 2 weeks."

# ✅ Required format instead
Scenario A (upward): requires X AND Y to materialize.
  Leading indicators currently showing: [observable signals]
Scenario B (sideways): ...
Scenario C (downward): ...
Note: this system does not produce probabilities.
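One way to enforce the blocked/required split mechanically is a check on the output text. The sketch below assumes a hypothetical convention in which a statistically sourced number carries a `[source: ...]` tag on the same line; the tag format, `check_output`, and `stop_hook` are illustrative names, not the actual RDU-021 implementation.

```python
import re
from typing import List

# Matches "70%", "70 %", "12.5%", and "70 percent".
PERCENT = re.compile(r"\d+(?:\.\d+)?\s*(?:%|percent\b)", re.IGNORECASE)

# Assumed tag marking a number that came from a validated statistical model.
SOURCE_TAG = re.compile(r"\[source:\s*[^\]]+\]", re.IGNORECASE)


def check_output(text: str) -> List[str]:
    """Return every line that contains a percentage but no source tag."""
    violations = []
    for line in text.splitlines():
        if PERCENT.search(line) and not SOURCE_TAG.search(line):
            violations.append(line.strip())
    return violations


def stop_hook(text: str) -> bool:
    """True lets the output through; False blocks it."""
    return not check_output(text)


blocked = stop_hook("BTC has a 70% chance of moving up in the next 2 weeks.")
allowed = stop_hook("Scenario A (upward): requires X AND Y to materialize.")
```

The pattern is deliberately blunt: it also catches percentages that are not probabilities, such as a utilization figure. For a gate on consequential outputs, a false block that forces a source tag is cheaper than a missed unsourced probability.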

Rule 2: Walkforward Validation Required (RDU-022)

Any quantitative strategy or model must pass walkforward validation before deployment — training and validation windows separated, evaluated on truly out-of-sample future data. In-sample backtests are not evidence.
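As an illustration of what "windows separated" means in practice, here is a minimal sketch of a walkforward splitter over time-ordered samples. It assumes a rolling training window of fixed size that advances by one test window per fold; an expanding training window is an equally valid choice. The function name and parameters are illustrative.

```python
from typing import Iterator, List, Tuple


def walkforward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with test always after train."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += test_size  # advance so test windows never overlap


folds = list(walkforward_splits(n_samples=10, train_size=4, test_size=2))
```

Every fold evaluates only on indices later than anything it trained on, which is the property an in-sample backtest lacks.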

Rule 3: Correlation ≠ Causation Enforcement (RDU-023)

Co-occurring events cannot be labeled as causal without explicit causal identification. The agent must label observations as:

Label	Meaning	Action allowed
`FACT`	Directly observed, source cited	Use in decisions
`INFERENCE`	Derived from facts, assumption stated	Use with explicit caveat
`UNIDENTIFIED`	Correlation only, causal mechanism unknown	Log, do not act
`SPURIOUS`	Co-occurrence, no causal mechanism	Discard
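The table maps directly onto a small type. This is a sketch under the assumption that each observation carries its label plus the supporting field the table requires (a source for `FACT`, a stated assumption for `INFERENCE`); the `Observation` class and the action strings are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    FACT = "FACT"
    INFERENCE = "INFERENCE"
    UNIDENTIFIED = "UNIDENTIFIED"
    SPURIOUS = "SPURIOUS"


@dataclass(frozen=True)
class Observation:
    text: str
    label: Label
    source: Optional[str] = None      # required for FACT
    assumption: Optional[str] = None  # required for INFERENCE


def allowed_action(obs: Observation) -> str:
    """Return the action the label permits, rejecting unsupported labels."""
    if obs.label is Label.FACT:
        if not obs.source:
            raise ValueError("FACT requires a cited source")
        return "use"
    if obs.label is Label.INFERENCE:
        if not obs.assumption:
            raise ValueError("INFERENCE requires a stated assumption")
        return "use_with_caveat"
    if obs.label is Label.UNIDENTIFIED:
        return "log_only"
    return "discard"  # SPURIOUS
```

Raising on a `FACT` without a source keeps an agent from upgrading a correlation to a fact simply by relabeling it.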

The Falsifier Protocol: Applied Beyond Finance

The same structure works across all decision domains. A falsifier has three required slots:

falsifier:
  condition: [observable event that invalidates the thesis]
  threshold: [measurable value, not vague]
  evaluation_date: [concrete date to check]

# Example: technology adoption decision
falsifier:
  condition: >
    Library has no releases for 6 months, OR
    Critical CVE published with no patch in 30 days, OR
    Primary maintainer announces deprecation
  threshold: any of the above
  evaluation_date: 2026-09-13

A thesis without a falsifier is not a thesis — it's an opinion. We enforce this at the pre-decision gate: any load-bearing decision (deployment, investment, architecture change) that lacks explicit falsifier conditions is blocked.
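A pre-decision gate of this kind could look roughly like the following. It is a minimal sketch, assuming the three falsifier slots are stored as plain fields; the `Decision` type and the three return values (`accept`, `block`, `log_uncertain`) are illustrative names chosen to mirror the behavior described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Falsifier:
    condition: str         # observable event that invalidates the thesis
    threshold: str         # measurable value, not vague
    evaluation_date: date  # concrete date to check


@dataclass(frozen=True)
class Decision:
    summary: str
    load_bearing: bool
    falsifier: Optional[Falsifier] = None


def gate(decision: Decision) -> str:
    """Accept only decisions that carry a complete falsifier."""
    f = decision.falsifier
    complete = (
        f is not None and bool(f.condition.strip()) and bool(f.threshold.strip())
    )
    if complete:
        return "accept"
    # Load-bearing decisions are blocked; the rest are logged as uncertain.
    return "block" if decision.load_bearing else "log_uncertain"


adoption = Decision(
    summary="Adopt the library",
    load_bearing=True,
    falsifier=Falsifier(
        condition="No releases for 6 months, or critical CVE unpatched in 30 days",
        threshold="any of the above",
        evaluation_date=date(2026, 9, 13),
    ),
)
```

Typing `evaluation_date` as a `date` rather than a string means a vague value such as "later this year" fails at construction time instead of at review time.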

The Empty Consensus Problem

Multi-agent systems introduce a specific failure mode: empty consensus. Multiple agents agree on a claim — but the claim isn't actually verifiable.

Our measured metrics from an 8-persona wisdom council system:

Metric	Observed value	Threshold	Status
`empty_consensus_rate`	0.354	< 0.30	⚠️ Over threshold
`council_repeat_rate`	0.44–0.48	< 0.10	🔴 4.4–4.8× threshold
`outcome_report_rate`	0.036	> 0.70	🔴 Under threshold

The insight from these numbers: model diversity doesn't fix empty consensus. Adding more diverse models to a council doesn't reduce the rate at which they agree on unverifiable claims. The fix is verification — routing council conclusions through formal engines.
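For readers who want to reproduce the first metric, here is one plausible way to compute it. The sketch assumes each council response is recorded as a pair of (consensus reached, auto-verifiability score) and that the rate is taken over consensus responses only; the exact denominator used in our system is not spelled out here, so treat that choice as an assumption.

```python
from typing import Iterable, Tuple

VERIFIABILITY_FLOOR = 0.20  # below this, a consensus counts as empty
RATE_THRESHOLD = 0.30       # target from the table: rate should stay below


def empty_consensus_rate(responses: Iterable[Tuple[bool, float]]) -> float:
    """Share of consensus responses whose verifiability is below the floor."""
    scores = [score for reached, score in responses if reached]
    if not scores:
        return 0.0
    empty = sum(1 for score in scores if score < VERIFIABILITY_FLOOR)
    return empty / len(scores)


# Hypothetical sample: four consensus responses, two of them unverifiable.
sample = [(True, 0.05), (True, 0.60), (True, 0.15), (False, 0.10), (True, 0.90)]
rate = empty_consensus_rate(sample)
over_threshold = rate >= RATE_THRESHOLD
```

Note that agreement is an input to the metric and never reduces it: adding agents that also agree leaves the verifiability scores unchanged.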

Verification Bridges

For any council conclusion that is load-bearing (consequential decision, money, deployment) and contains mathematical or logical claims, we route through a verification bridge before accepting the conclusion:

Arithmetic claims → sympy-mcp or wolfram-alpha
Formal logic claims → lean-lsp
Empirical claims → cross-reference with primary sources, not web search summaries

The verification result feeds back via report_outcome — closing the loop and improving future council calibration. Without this feedback, the outcome_report_rate stays near zero (measured: 0.036), meaning the council never learns from its errors.
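The routing-plus-feedback loop can be sketched as a dispatch table. Everything below is a stand-in: `check_integer_arithmetic` is a toy verifier in place of sympy-mcp or wolfram-alpha, and this `report_outcome` only appends to a list, since the real function's signature is not shown here.

```python
import operator
import re
from typing import Callable, Dict, List, Tuple

Verifier = Callable[[str], bool]

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
_EQUATION = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")

outcome_log: List[Tuple[str, str, bool]] = []


def report_outcome(kind: str, claim: str, verified: bool) -> None:
    """Stand-in feedback channel: record what was checked and the result."""
    outcome_log.append((kind, claim, verified))


def check_integer_arithmetic(claim: str) -> bool:
    """Toy bridge: verify claims of the form 'a op b = c' over integers."""
    match = _EQUATION.match(claim)
    if match is None:
        return False
    a, op, b, c = match.groups()
    return _OPS[op](int(a), int(b)) == int(c)


def verify_conclusion(kind: str, claim: str, bridges: Dict[str, Verifier]) -> bool:
    """Route a claim to its bridge; a missing bridge counts as unverified."""
    verifier = bridges.get(kind)
    verified = verifier(claim) if verifier is not None else False
    report_outcome(kind, claim, verified)
    return verified


bridges: Dict[str, Verifier] = {"arithmetic": check_integer_arithmetic}
ok = verify_conclusion("arithmetic", "17 * 3 = 51", bridges)
bad = verify_conclusion("arithmetic", "17 * 3 = 52", bridges)
unrouted = verify_conclusion("formal_logic", "p implies q", bridges)
```

Reporting inside `verify_conclusion` rather than leaving it to the caller is the design choice that matters: every routed claim produces an outcome record, including the ones that fail or have no bridge.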

The real bottleneck isn't model quality — it's verifiability. A unanimous council conclusion that can't be verified is worth less than a single verified calculation. Most of our empty_consensus cases were claims that sounded like analysis but couldn't be checked against any ground truth.

Practical Implementation

The minimal implementation:

Add a gate before any decision is accepted: does it have a falsifier condition?
Block LLM probability production structurally: not with system-prompt instructions, but with a stop hook that intercepts outputs containing unsourced percentages
Build a ledger: store every load-bearing decision with its falsifier, evaluation date, and outcome
Close the loop: when evaluation dates arrive, check falsifiers and update the ledger

The ledger is the most important part. Without it, falsifiers are just good intentions — stated conditions that never get checked.
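The ledger and the close-the-loop step can be sketched together. This is a minimal in-memory version under stated assumptions: the field names are illustrative, persistence is omitted, and `resolved_fraction` is a simple stand-in for a reporting metric rather than the exact definition of outcome_report_rate.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LedgerEntry:
    decision: str
    falsifier_condition: str
    evaluation_date: date
    falsified: Optional[bool] = None  # None until the falsifier is checked


@dataclass
class Ledger:
    entries: List[LedgerEntry] = field(default_factory=list)

    def record(self, entry: LedgerEntry) -> None:
        """Store a decision; refuse entries with no falsifier."""
        if not entry.falsifier_condition.strip():
            raise ValueError("load-bearing decision needs a falsifier")
        self.entries.append(entry)

    def due(self, today: date) -> List[LedgerEntry]:
        """Entries whose evaluation date has arrived and are still unchecked."""
        return [
            e for e in self.entries
            if e.falsified is None and e.evaluation_date <= today
        ]

    def resolve(self, entry: LedgerEntry, falsified: bool) -> None:
        """Record whether the falsifier triggered."""
        entry.falsified = falsified

    def resolved_fraction(self) -> float:
        """Share of recorded decisions that have a checked outcome."""
        if not self.entries:
            return 0.0
        return sum(e.falsified is not None for e in self.entries) / len(self.entries)


ledger = Ledger()
ledger.record(LedgerEntry("Adopt library", "No releases for 6 months", date(2026, 9, 13)))
ledger.record(LedgerEntry("Ship migration", "Error rate above agreed limit", date(2026, 12, 1)))
pending = ledger.due(today=date(2026, 9, 13))
for entry in pending:
    ledger.resolve(entry, falsified=False)
```

Running `due` on a schedule is what turns stated falsifiers into checked ones; an entry that stays at `None` past its evaluation date is the failure the ledger exists to expose.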

hexisteme notes — engineering patterns from building AI agent systems · home · CC-BY 4.0