To enforce behavioral constraints on Claude Code, register a stop hook — a shell script in settings.json that runs after every response is generated, inspects the response text and session transcript via stdin JSON, and either allows it (exit 0) or blocks it (exit 2) with a correction message Claude must address before the user sees output. Unlike prompt instructions, this is a hard constraint enforced outside the model.
Prompt instructions are soft constraints. They shape behavior probabilistically. When RLHF-trained patterns conflict with your instructions — deference to the user, hedged language, option-listing instead of committing — the RLHF training usually wins, especially under pressure.
Claude Code's stop hooks are hard constraints. They run after the response is generated, before the user sees it, and can force a revision. Used well, they enforce behaviors that prompts can't reliably maintain.
A stop hook is a shell script registered in settings.json:
// ~/.claude/settings.json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [{
          "type": "command",
          "command": "~/.claude/hooks/stop_combined.sh"
        }]
      }
    ]
  }
}
The hook script receives the full session context as JSON on stdin. Its exit code decides what happens next:
- Exit 0: allow the response through
- Exit 2: block and inject a correction message Claude must address
The correction message appears as feedback, and Claude generates a new response. The user sees only the final, corrected response; the hook intervention is invisible.
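The exit-code contract above can also be sketched in Python instead of shell. This is a minimal sketch, not a definitive implementation: `load_messages`, `last_assistant_text`, `decide`, and the `BANNED` phrase are hypothetical names, and the payload shape is an assumption. The scripts in this article read a `messages` array from stdin; if your install instead passes a `transcript_path` pointing at a JSONL transcript (assumed here to hold one JSON object per line with a `message` field), the fallback branch reads that file. Check the actual hook input before relying on either shape.

```python
import json
import sys

BANNED = "up to you"  # hypothetical anti-pattern, for illustration only


def load_messages(payload: dict) -> list:
    """Return chat messages from the hook payload (both shapes are assumptions)."""
    if isinstance(payload.get("messages"), list):
        return payload["messages"]
    path = payload.get("transcript_path")
    if not path:
        return []
    messages = []
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed transcript lines
                msg = entry.get("message") if isinstance(entry, dict) else None
                if isinstance(msg, dict):
                    messages.append(msg)
    except OSError:
        return []  # unreadable transcript: treat as an empty session
    return messages


def last_assistant_text(messages: list) -> str:
    """Concatenate the text blocks of the most recent assistant message."""
    for m in reversed(messages):
        if m.get("role") != "assistant":
            continue
        content = m.get("content", "")
        if isinstance(content, list):
            return " ".join(
                b.get("text", "") for b in content
                if isinstance(b, dict) and b.get("type") == "text"
            )
        return str(content)
    return ""


def decide(payload: dict) -> tuple:
    """Return (exit_code, correction): 0 allows, 2 blocks with a message."""
    text = last_assistant_text(load_messages(payload))
    if BANNED in text.lower():
        return 2, "Commit to a recommendation instead of deferring."
    return 0, ""


def main() -> None:
    """Entry point for use as a hook: read stdin, write stderr, set exit code."""
    code, correction = decide(json.load(sys.stdin))
    if correction:
        print(correction, file=sys.stderr)  # correction goes to stderr, as in the shell hooks
    sys.exit(code)
```

To use it as a hook, call `main()` at the bottom of the script and register the script in settings.json the same way as the shell version. Keeping `decide` free of I/O makes the gate logic testable without a live session.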
A hook that fires on a regex match alone has a fundamental problem: false positives. A rule like "block responses that ask the user to decide" will also block legitimate clarifying questions ("what's your timeline for this?").
The AND-gate pattern requires two independent conditions:
CONDITION_1: Response text matches behavioral anti-pattern regex
AND
CONDITION_2: Session transcript shows prior data-collection tool calls
# Only fire if BOTH are true
if [[ "$RESPONSE_MATCHES" == "1" && "$HAD_TOOL_CALLS" == "1" ]]; then
  block_with_message
fi
The tool-call evidence gate is key. If Claude is asking a clarifying question without having done analysis, it's a legitimate question. If it's deferring to the user after running web searches and reading files, it's offloading work it should be completing.
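The two conditions can be written as one small predicate. The sketch below assumes messages shaped like the ones the article's hooks read (a `content` list with typed blocks); `DEFER`, `count_tool_blocks`, and `and_gate` are hypothetical names, and the regex is a shortened English-only stand-in for a real anti-pattern list. The threshold of two tool blocks mirrors the shell hook shown later.

```python
import re

# Shortened, English-only stand-in for a deference regex (illustrative).
DEFER = re.compile(r"which.*would you|up to you|your call|depends on your", re.IGNORECASE)


def count_tool_blocks(messages: list) -> int:
    """Count tool_use / tool_result blocks across all messages."""
    total = 0
    for m in messages:
        content = m.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no tool blocks
        total += sum(
            1 for b in content
            if isinstance(b, dict) and b.get("type") in ("tool_use", "tool_result")
        )
    return total


def and_gate(response: str, messages: list, min_tool_blocks: int = 2) -> bool:
    """Block only when the text matches AND the session shows tool evidence."""
    matches = DEFER.search(response) is not None
    had_tools = count_tool_blocks(messages) >= min_tool_blocks
    return matches and had_tools
```

With this shape, "Which approach would you prefer?" is blocked only in a session that already contains tool activity; the same sentence in a fresh session passes as a legitimate question.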
The problem: RLHF trains models to defer judgment back to the user. "Which approach would you prefer?" after analyzing two options. "It depends on your priorities" after having all the information needed to recommend one. This isn't politeness — it's transferring cognitive labor.
The hook stop_decision_ownership_check.sh:
#!/bin/bash
# Read session JSON from stdin
SESSION=$(cat)

# Extract the text of the last assistant message
RESPONSE=$(echo "$SESSION" | python3 -c "
import json, sys
d = json.load(sys.stdin)
msgs = d.get('messages', [])
# Walk backwards to find the last assistant message
for m in reversed(msgs):
    if m.get('role') == 'assistant':
        content = m.get('content', '')
        if isinstance(content, list):
            print(' '.join(b.get('text', '') for b in content if b.get('type') == 'text'))
        else:
            print(content)
        break
")

# Check for deference patterns (Korean phrasings first, then English)
DEFER_REGEX='(어느.*(좋아|나을|선택)|당신이.*(정하|결정)|뭐가.*좋을|what.*prefer|which.*would you|up to you|your call|depends on your)'
if ! echo "$RESPONSE" | grep -qiE "$DEFER_REGEX"; then
  exit 0 # No match, allow
fi

# Check for prior tool evidence (AND gate): count tool_use / tool_result blocks
TOOL_CALLS=$(echo "$SESSION" | python3 -c "
import json, sys
d = json.load(sys.stdin)
count = sum(1 for m in d.get('messages', [])
            for b in (m.get('content', []) if isinstance(m.get('content'), list) else [])
            if isinstance(b, dict) and b.get('type') in ('tool_use', 'tool_result'))
print(count)
" 2>/dev/null || echo "0")
if [[ "$TOOL_CALLS" -lt 2 ]]; then
  exit 0 # No tool evidence, likely a legitimate question
fi

# Nag-once: compute hash, check if already nag'd
HASH=$(echo "$RESPONSE" | sha256sum | cut -c1-16)
NAG_FILE="/tmp/.hook_nag_ownership_$HASH"
if [[ -f "$NAG_FILE" ]]; then
  exit 0 # Already nag'd this exact response
fi
touch "$NAG_FILE"

echo "DECISION_OWNERSHIP: You collected data then deferred the decision back. Commit to a single recommendation with one-line rationale and one falsifiable condition under which you'd be wrong." >&2
exit 2
The problem: when challenged, Claude tends to reverse its position. In Anthropic's sycophancy study (arXiv:2310.13548), Claude 1.3 wrongly admitted mistakes on 98% of questions when the user pushed back. The reversal happens not because the challenge provided new evidence, but because RLHF rewarded agreement. The capitulation is often invisible in the output (Claude doesn't say "you're right, I was wrong"); it just quietly shifts.
The hook stop_challenge_reverify.sh detects the pattern:
# Gate: (challenge in user message) AND (capitulation in response) AND (no verification evidence)
# Assumes USER_MSG (last user message) and RESPONSE (last assistant message) were
# already extracted from the session JSON, as in the decision-ownership hook.
# Each regex lists Korean phrasings first, then English ones.
CHALLENGE_REGEX='(아닌데|그건 아니|틀렸|잘못|다시 생각|actually|that'\''s wrong|not quite|disagree|incorrect)'
CAPITULATE_REGEX='(맞네요|맞습니다|수정|맞는 말씀|사실|좋은 지적|you'\''re right|good point|I was wrong|actually yes|I misread)'
VERIFY_REGEX='(검증|계산|확인|출처|논문|측정|verified|calculated|source|measured|according to)'
if echo "$USER_MSG" | grep -qiE "$CHALLENGE_REGEX"; then
  if echo "$RESPONSE" | grep -qiE "$CAPITULATE_REGEX"; then
    if ! echo "$RESPONSE" | grep -qiE "$VERIFY_REGEX"; then
      # All three conditions met: challenge + capitulation + no verification
      echo "CHALLENGE_REVERIFY: Position changed without new evidence. Choose:
  HOLD: restate original position with grounds
  CHANGE: cite specific new evidence or calculation that overrides original" >&2
      exit 2
    fi
  fi
fi
The intentional false-negative: this hook can't catch silent capitulation (changing tone without explicit reversal language). That's acceptable — the priority is minimizing false positives on legitimate corrections.
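The three-way gate is easier to test as a pure function. The sketch below is an illustration under assumptions: it uses only the English alternatives from the regexes above, assumes the same `messages` shape as the other hooks, and the names `message_text`, `last_text`, and `capitulated_without_evidence` are hypothetical.

```python
import re

# English subsets of the article's patterns; the Korean alternatives are omitted here.
CHALLENGE = re.compile(r"actually|that's wrong|not quite|disagree|incorrect", re.IGNORECASE)
CAPITULATE = re.compile(r"you're right|good point|I was wrong|actually yes|I misread", re.IGNORECASE)
VERIFY = re.compile(r"verified|calculated|source|measured|according to", re.IGNORECASE)


def message_text(message: dict) -> str:
    """Flatten a message's content (string or list of typed blocks) to text."""
    content = message.get("content", "")
    if isinstance(content, list):
        return " ".join(
            b.get("text", "") for b in content
            if isinstance(b, dict) and b.get("type") == "text"
        )
    return str(content)


def last_text(messages: list, role: str) -> str:
    """Text of the most recent message from `role`, skipping entries with no text."""
    for m in reversed(messages):
        if m.get("role") == role:
            text = message_text(m)
            if text.strip():  # e.g. tool results carry no text blocks
                return text
    return ""


def capitulated_without_evidence(messages: list) -> bool:
    """True when a challenge is followed by agreement that cites no verification."""
    user_msg = last_text(messages, "user")
    response = last_text(messages, "assistant")
    return bool(
        CHALLENGE.search(user_msg)
        and CAPITULATE.search(response)
        and not VERIFY.search(response)
    )
```

A reply such as "You're right, I verified the total and it is 40" passes, because the verification pattern matches; "You're right, good point" on its own does not.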
Blocking indefinitely creates a loop: Claude keeps generating blocked responses, the user gets no output. The nag-once pattern breaks the loop:
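1. Hash the response text.
2. If /tmp/.hook_nag_{hash} already exists, allow the response (exit 0).
3. Otherwise create /tmp/.hook_nag_{hash}, block with message.
This surfaces the issue once and lets the conversation continue. The user can choose to push back on the hook's correction; that's fine.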
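A minimal Python sketch of the same nag-once bookkeeping follows. The function name `nag_once` and the `state_dir` parameter are hypothetical; the marker file naming copies the `.hook_nag_` prefix used in the shell hook, and the system temp directory stands in for /tmp.

```python
import hashlib
import tempfile
from pathlib import Path


def nag_once(response: str, gate: str, state_dir: str = "") -> bool:
    """Return True the first time this exact response is seen for a gate, else False."""
    base = Path(state_dir or tempfile.gettempdir())
    # A short digest of the response text identifies "this exact response".
    digest = hashlib.sha256(response.encode("utf-8")).hexdigest()[:16]
    marker = base / f".hook_nag_{gate}_{digest}"
    if marker.exists():
        return False  # already nagged: let the response through
    marker.touch()
    return True  # first sighting: block with a correction message
```

Because the key is a hash of the text, a revised response gets a fresh marker and can be nagged once itself; only a byte-identical repeat is let through.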
Run multiple gates in a single stop_combined.sh that calls each gate in sequence. Return on first block — don't let multiple hooks pile correction messages. Ordering matters: run the highest-priority gates first.
#!/bin/bash
# stop_combined.sh — ordered gate sequence
SESSION=$(cat)
# Gate 1: Decision ownership (highest priority)
RESULT=$(echo "$SESSION" | ~/.claude/hooks/stop_decision_ownership_check.sh 2>&1)
if [[ $? -eq 2 ]]; then echo "$RESULT" >&2; exit 2; fi
# Gate 2: Sycophancy / challenge reverify
RESULT=$(echo "$SESSION" | ~/.claude/hooks/stop_challenge_reverify.sh 2>&1)
if [[ $? -eq 2 ]]; then echo "$RESULT" >&2; exit 2; fi
exit 0
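The first-block-wins ordering can be expressed as a short loop. This is a sketch under assumptions: each gate is modeled as a hypothetical Python callable that returns a correction string to block or `None` to allow, rather than as a separate shell script.

```python
from typing import Callable, List, Optional, Tuple

# A gate inspects the session and returns a correction message, or None to allow.
Gate = Callable[[dict], Optional[str]]


def run_gates(session: dict, gates: List[Tuple[str, Gate]]) -> Tuple[int, str]:
    """Run gates in priority order and stop at the first one that blocks."""
    for name, gate in gates:
        correction = gate(session)
        if correction:
            # Exit code 2 plus a single message: later gates never run.
            return 2, f"{name}: {correction}"
    return 0, ""
```

The list order is the priority order, so putting the decision-ownership gate first means a response that trips both gates only ever receives the ownership correction.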