Claude Code Stop Hooks as Behavioral Gates

To enforce behavioral constraints on Claude Code, register a stop hook — a shell script in settings.json that runs after every response is generated, inspects the response text and session transcript via stdin JSON, and either allows it (exit 0) or blocks it (exit 2) with a correction message Claude must address before the user sees output. Unlike prompt instructions, this is a hard constraint enforced outside the model.

← hexisteme · notes · June 13, 2026 · 9 min read

Prompt instructions are soft constraints. They shape behavior probabilistically. When RLHF-trained patterns (deference to the user, hedged language, option-listing instead of committing) conflict with your instructions, the trained pattern usually wins, especially under pressure.

Claude Code's stop hooks are hard constraints. They run after the response is generated, before the user sees it, and can force a revision. Used well, they enforce behaviors that prompts can't reliably maintain.

How Stop Hooks Work

A stop hook is a shell script registered in settings.json:

// ~/.claude/settings.json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [{
          "type": "command",
          "command": "~/.claude/hooks/stop_combined.sh"
        }]
      }
    ]
  }
}

The hook script receives the full session context as JSON on stdin. It can:

  1. Exit 0 to allow the response through unchanged
  2. Exit 2 to block the response, writing a correction message to stderr

The correction message appears as feedback, and Claude generates a new response. The user sees only the final, corrected response; the hook intervention is invisible.
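The allow/block contract can be sketched in Python rather than shell. This is a minimal illustration, not Claude Code's API: the names `last_assistant_text` and `decide`, the `banned_phrase` parameter, and the assumption that the payload carries an inline `messages` list (as the scripts in this post do) are all hypothetical.

```python
from typing import Tuple

ALLOW, BLOCK = 0, 2  # exit codes used throughout this post


def last_assistant_text(session: dict) -> str:
    """Return the text of the most recent assistant message, or ''."""
    for msg in reversed(session.get("messages", [])):
        if msg.get("role") != "assistant":
            continue
        content = msg.get("content", "")
        if isinstance(content, list):
            # Content may be a list of typed blocks; keep only the text ones.
            return " ".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        return str(content)
    return ""


def decide(session: dict, banned_phrase: str) -> Tuple[int, str]:
    """Return (exit_code, correction_message) for the latest response."""
    if banned_phrase.lower() in last_assistant_text(session).lower():
        return BLOCK, f"Remove the phrase {banned_phrase!r} and commit to an answer."
    return ALLOW, ""
```

Wired up as a hook, a script would call `decide(json.load(sys.stdin), ...)`, print any message to stderr, and pass the code to `sys.exit`.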

The AND-Gate Pattern

A hook that fires on a regex match alone has a fundamental problem: false positives. A rule like "block responses that ask the user to decide" will also block legitimate clarifying questions ("what's your timeline for this?").

The AND-gate pattern requires two independent conditions:

CONDITION_1: Response text matches behavioral anti-pattern regex
     AND
CONDITION_2: Session transcript shows prior data-collection tool calls

# Only fire if BOTH are true
if [[ $RESPONSE_MATCHES == "1" && $HAD_TOOL_CALLS == "1" ]]; then
    block_with_message
fi

The tool-call evidence gate is key. If Claude is asking a clarifying question without having done analysis, it's a legitimate question. If it's deferring to the user after running web searches and reading files, it's offloading work it should be completing.
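The same gate can be written as a small Python predicate. This is a sketch under assumptions: the regex is an illustrative English-only subset of the deference patterns, and `count_tool_blocks`, `and_gate`, and the `min_tool_blocks` threshold are hypothetical names chosen for this example.

```python
import re

# Illustrative subset of the deference patterns used in this post.
DEFER = re.compile(r"what.*prefer|which.*would you|up to you|your call", re.IGNORECASE)
TOOL_BLOCK_TYPES = ("tool_use", "tool_result")


def count_tool_blocks(messages):
    """Count tool_use/tool_result blocks in messages whose content is a list."""
    total = 0
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no tool blocks
        total += sum(
            1 for block in content
            if isinstance(block, dict) and block.get("type") in TOOL_BLOCK_TYPES
        )
    return total


def and_gate(response, messages, min_tool_blocks=2):
    """Block only when the text defers AND the transcript shows prior tool work."""
    return bool(DEFER.search(response)) and count_tool_blocks(messages) >= min_tool_blocks
```

With this shape, "Which approach would you prefer?" passes when no tools have run and is blocked once the transcript contains a tool call and its result.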

Production Example 1: Decision Ownership Enforcement

The problem: RLHF trains models to defer judgment back to the user. "Which approach would you prefer?" after analyzing two options. "It depends on your priorities" after having all the information needed to recommend one. This isn't politeness — it's transferring cognitive labor.

The hook stop_decision_ownership_check.sh:

#!/bin/bash
# Read session JSON from stdin
SESSION=$(cat)

# Extract last response text
RESPONSE=$(echo "$SESSION" | python3 -c "
import json,sys
d=json.load(sys.stdin)
msgs = d.get('messages', [])
# find last assistant message
for m in reversed(msgs):
    if m.get('role') == 'assistant':
        content = m.get('content', '')
        if isinstance(content, list):
            print(' '.join(b.get('text','') for b in content if b.get('type')=='text'))
        else:
            print(content)
        break
")

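# Check for deference patterns
# Korean alternatives gloss as: "which ... is better/choose", "you ... decide", "what would be good"
DEFER_REGEX='(어느.*(좋아|나을|선택)|당신이.*(정하|결정)|뭐가.*좋을|what.*prefer|which.*would you|up to you|your call|depends on your)'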
if ! echo "$RESPONSE" | grep -qiE "$DEFER_REGEX"; then
    exit 0  # No match, allow
fi

# Check for prior tool evidence (AND gate)
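TOOL_CALLS=$(echo "$SESSION" | python3 -c "
import json,sys
d=json.load(sys.stdin)
# Count tool_use / tool_result blocks across all messages with list content.
# isinstance needs the type list here; passing [] raises TypeError, which the
# fallback below would silently turn into a count of 0.
count = sum(1 for m in d.get('messages',[])
            for b in (m.get('content',[]) if isinstance(m.get('content'),list) else [])
            if isinstance(b,dict) and b.get('type') in ('tool_use','tool_result'))
print(count)
" 2>/dev/null || echo "0")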

if [[ "$TOOL_CALLS" -lt 2 ]]; then
    exit 0  # No tool evidence, likely a legitimate question
fi

# Nag-once: compute hash, check whether this response was already nagged
HASH=$(echo "$RESPONSE" | sha256sum | cut -c1-16)
NAG_FILE="/tmp/.hook_nag_ownership_$HASH"
if [[ -f "$NAG_FILE" ]]; then
    exit 0  # Already nagged on this exact response
fi
touch "$NAG_FILE"

echo "DECISION_OWNERSHIP: You collected data then deferred the decision back. Commit to a single recommendation with one-line rationale and one falsifiable condition under which you'd be wrong." >&2
exit 2
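The script above reads an inline `messages` array from stdin. If the payload in your setup instead carries a `transcript_path` pointing at a JSONL transcript (an assumption to verify against the Claude Code hooks reference), a loader along these lines could normalize both shapes. The `message` wrapper key and the function name `load_messages` are likewise assumptions for illustration.

```python
import json
from pathlib import Path


def load_messages(payload: dict) -> list:
    """Return a list of message dicts from either payload shape."""
    if isinstance(payload.get("messages"), list):
        return payload["messages"]  # inline shape, as assumed in this post

    path = payload.get("transcript_path")  # assumed field name
    if not path:
        return []
    transcript = Path(path).expanduser()
    if not transcript.is_file():
        return []

    messages = []
    for line in transcript.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(entry, dict):
            continue
        # Assumed: each entry wraps the message under a "message" key.
        msg = entry.get("message", entry)
        if isinstance(msg, dict) and "role" in msg:
            messages.append(msg)
    return messages
```

The result can be fed to the same extraction and tool-counting logic regardless of which shape arrived.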

Production Example 2: Sycophancy Prevention (Challenge-Reverify Gate)

The problem: when challenged, models cave. In Anthropic's sycophancy study (Sharma et al., arXiv:2310.13548), Claude 1.3 wrongly admitted mistakes on 98% of questions when the user pushed back. The reversal happens not because the challenge provided new evidence, but because RLHF rewarded agreement. The capitulation is often quiet: Claude does not always say "you're right, I was wrong"; sometimes it just shifts.

The hook stop_challenge_reverify.sh detects the pattern:

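# Gate: (challenge in user message) AND (capitulation in response) AND (no verification evidence)
# Assumes USER_MSG (last user message) and RESPONSE (last assistant message)
# were already extracted from the session JSON, as in Example 1.

# Korean alternatives: "that's not it", "that's not", "wrong", "mistake", "think again"
CHALLENGE_REGEX='(아닌데|그건 아니|틀렸|잘못|다시 생각|actually|that'\''s wrong|not quite|disagree|incorrect)'
# Korean alternatives: "you're right", "that's correct", "fix", "you're right" (polite), "in fact", "good point"
CAPITULATE_REGEX='(맞네요|맞습니다|수정|맞는 말씀|사실|좋은 지적|you'\''re right|good point|I was wrong|actually yes|I misread)'
# Korean alternatives: "verify", "calculate", "confirm", "source", "paper", "measure"
VERIFY_REGEX='(검증|계산|확인|출처|논문|측정|verified|calculated|source|measured|according to)'

if echo "$USER_MSG" | grep -qiE "$CHALLENGE_REGEX"; then
    if echo "$RESPONSE" | grep -qiE "$CAPITULATE_REGEX"; then
        if ! echo "$RESPONSE" | grep -qiE "$VERIFY_REGEX"; then
            # All three conditions met: challenge + capitulation + no verification
            echo "CHALLENGE_REVERIFY: Position changed without new evidence. Choose:
HOLD: restate original position with grounds
CHANGE: cite specific new evidence or calculation that overrides original" >&2
            exit 2
        fi
    fi
fi

exit 0  # Any condition missing: allow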

The intentional false-negative: this hook can't catch silent capitulation (changing tone without explicit reversal language). That's acceptable — the priority is minimizing false positives on legitimate corrections.
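The three-condition gate, including the false negative described above, can be checked with a short Python sketch. The patterns are the English alternatives from the shell regexes; the function name `should_block` is hypothetical.

```python
import re

CHALLENGE = re.compile(r"actually|that's wrong|not quite|disagree|incorrect", re.IGNORECASE)
CAPITULATE = re.compile(r"you're right|good point|I was wrong|actually yes|I misread", re.IGNORECASE)
VERIFY = re.compile(r"verified|calculated|source|measured|according to", re.IGNORECASE)


def should_block(user_msg: str, response: str) -> bool:
    """True only for challenge + explicit capitulation + no verification language."""
    return (
        bool(CHALLENGE.search(user_msg))
        and bool(CAPITULATE.search(response))
        and not VERIFY.search(response)
    )


# Explicit reversal with no evidence: blocked.
print(should_block("That's wrong.", "You're right, the answer is 5."))   # True
# Reversal that cites a check: allowed as a legitimate correction.
print(should_block("That's wrong.", "You're right, I recalculated: 5."))  # False
# Silent capitulation (no reversal language): slips through by design.
print(should_block("That's wrong.", "The answer is 5."))                  # False
```

The third case is the accepted gap: nothing in the response text signals a reversal, so a text-only gate has nothing to match.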

The Nag-Once Pattern

Blocking indefinitely creates a loop: Claude keeps generating blocked responses and the user gets no output. The nag-once pattern breaks the loop:

  1. First violation: compute sha256 of the response, write to /tmp/.hook_nag_{hash}, block with message
  2. Second attempt with same response: hash file exists, exit 0 (allow through)
  3. Different response: new hash, new nag file, block again if still violating

This surfaces the issue once and lets the conversation continue. The user can choose to push back on the hook's correction; that's fine.
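The three steps can be sketched in Python as follows. The function name `should_nag`, the `gate` label, and the configurable `state_dir` are assumptions for this example. The shell scripts hash the response plus the trailing newline added by `echo`, so markers written by this sketch would not line up with theirs.

```python
import hashlib
import tempfile
from pathlib import Path


def should_nag(response: str, gate: str, state_dir: Path) -> bool:
    """Call after a violation is detected. True = block now, False = already nagged."""
    digest = hashlib.sha256(response.encode("utf-8")).hexdigest()[:16]
    marker = state_dir / f".hook_nag_{gate}_{digest}"
    if marker.exists():
        return False  # same text seen before: let it through
    marker.touch()    # remember this exact text
    return True


# Usage: identical text is blocked once, then allowed; new text is blocked again.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp)
    first = should_nag("Which would you prefer?", "ownership", state)
    repeat = should_nag("Which would you prefer?", "ownership", state)
    other = should_nag("It is up to you.", "ownership", state)
    print(first, repeat, other)  # True False True
```

Keying the marker on the response hash means the loop only ends when the exact text repeats. A still-violating revision is treated as a new violation, as step 3 describes.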

What Hooks Can't Do

Hooks operate on the final response text. They can't see the model's reasoning before output, can't inject into the middle of a generation, and can't prevent the model from starting down a problematic path, only from completing it. For deeper behavioral constraints, the work happens at the system prompt and RDU (Reusable Decision Unit) level; hooks are the last line of enforcement.

Combining Hooks

Run multiple gates in a single stop_combined.sh that calls each gate in sequence. Return on first block — don't let multiple hooks pile correction messages. Ordering matters: run the highest-priority gates first.

#!/bin/bash
# stop_combined.sh — ordered gate sequence
SESSION=$(cat)

# Gate 1: Decision ownership (highest priority)
RESULT=$(echo "$SESSION" | ~/.claude/hooks/stop_decision_ownership_check.sh 2>&1)
if [[ $? -eq 2 ]]; then echo "$RESULT" >&2; exit 2; fi

# Gate 2: Sycophancy / challenge reverify
RESULT=$(echo "$SESSION" | ~/.claude/hooks/stop_challenge_reverify.sh 2>&1)
if [[ $? -eq 2 ]]; then echo "$RESULT" >&2; exit 2; fi

exit 0
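The ordering and first-block-wins behavior can also be expressed as a small Python dispatcher. This sketch assumes each gate is a function that returns a correction message or `None`; the `Gate` alias and the name `run_gates` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A gate inspects the session and returns a correction message, or None to allow.
Gate = Callable[[dict], Optional[str]]


def run_gates(session: dict, gates: List[Tuple[str, Gate]]) -> Tuple[int, str]:
    """Run gates in priority order; stop at the first one that blocks."""
    for name, gate in gates:
        message = gate(session)
        if message is not None:
            return 2, f"{name}: {message}"  # first block wins, later gates never run
    return 0, ""
```

Returning on the first block keeps the feedback to a single correction, which matches the shell version's early `exit 2`.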


hexisteme notes — engineering patterns from building AI agent systems · home · CC-BY 4.0