Why I rejected an event bus for my solo agent fleet: state is truth, events are rumors

For a small, high-churn fleet of agents, cron jobs, and MCP servers you don't fully control, prefer pull (scanning on-disk state) over push (an event bus). A poller is self-healing and needs zero instrumentation; an event bus goes silently blind whenever a component forgets to emit, crashes before emitting, or is a third-party tool you can't modify. I designed the inbox as a computed view over existing state and rejected the event bus on purpose. Here is the reasoning, because it generalizes past my setup.

← hexisteme · notes · June 20, 2026 · 8 min read

I run a personal fleet on one machine — a handful of small agents, a pile of cron jobs and LaunchAgents, and several MCP servers, some of them third-party. I wanted a single inbox that answers "what needs my attention right now?": new outputs I haven't seen, decisions only a human can close, dependencies that broke.

The obvious architecture is an event bus. Every component emits events — job.finished, output.created, decision.pending — to an append-only log; the inbox reads the log; closing an item writes a close-event that advances the next step. It's clean on a whiteboard. I rejected it for four reasons, and chose a pull design instead.

The four reasons I killed the event bus

1. The instrumentation tax is a project killer

Push means every producer must emit. In a fleet that grows every week, that's a standing tax on each new agent, each new cron line, and — fatally — each third-party MCP server you cannot modify. You can't add an emit call to a server someone else wrote. So the moment you add a component and forget to instrument it (or can't), it becomes invisible in the inbox. An observability layer whose blind spots grow with your system is worse than useless: it's a "single source of truth" that quietly lies.

The trap: the components you least control — third-party tools, things that crash early — are exactly the ones an event bus can't see. Push optimizes for the easy case (code you own) and fails the hard case (code you don't).

2. State is truth, events are rumors

An event is a claim that depends on the claimant surviving and remembering to speak. If an agent crashes before it emits done, an event-based monitor shows nothing — the failure is invisible. State is the evidence left behind regardless: a stale output file, a log that stopped growing, a health check that fails, a process that isn't there. A pull monitor reads that evidence and reflects the crash without the agent's cooperation. This makes pull self-healing — it converges on reality every cycle — while push is only as honest as its least-reliable emitter.

# pull: derive status from evidence the component already leaves behind
status = {
    "alive":   process_is_running(job),              # ps / launchctl print
    "fresh":   newest_output_mtime(job) > expected,  # newest mtime under <project>/report/ vs. the expected cutoff
    "healthy": health_check(dependency),             # poll endpoint
}
# no emit() anywhere; a component that never reports is still seen
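
The pseudocode above leaves its helpers undefined. A minimal runnable sketch of two of them might look like the following. It assumes a POSIX system, a known PID for the job, and a freshness rule of "newest output is at most max_age_s seconds old". The names pid_is_alive, job_status, and max_age_s are illustrative, not taken from the actual inbox, and the health check is omitted because it depends on each dependency's endpoint.

```python
import os
import time
from pathlib import Path
from typing import Optional


def pid_is_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 tests for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def newest_output_mtime(report_dir: Path, pattern: str = "*") -> Optional[float]:
    """mtime of the most recently modified matching file, or None if there is none."""
    mtimes = [p.stat().st_mtime for p in report_dir.glob(pattern) if p.is_file()]
    return max(mtimes, default=None)


def job_status(pid: int, report_dir: Path, max_age_s: float) -> dict:
    """Derive status only from evidence the job leaves behind; the job never reports in."""
    newest = newest_output_mtime(report_dir)
    return {
        "alive": pid_is_alive(pid),
        "fresh": newest is not None and time.time() - newest <= max_age_s,
    }
```

A job that crashed before writing anything shows up as alive=False, fresh=False on the next scan, with no emit call involved.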

3. Orchestration schizophrenia

A closed-loop bus — where "close this item" emits an event that advances the next step — turns the inbox into a workflow engine. I already have an orchestration hub that decomposes goals into steps. Building a second one inside the monitor duplicates that responsibility and doubles the debugging surface: now a stuck task could be the hub's fault or the inbox's. A monitor should report state, not drive it. Keeping those two jobs in two systems is what keeps either one debuggable.

4. The bus itself becomes an unreviewed dump

An append-only event log is not free infrastructure. It accrues schema drift (the shape of output.created changes and old readers break), duplicate events, events nobody ever closes, and unbounded growth that demands compaction. You've added a database with none of a database's guarantees — and it needs its own monitoring. Pull has no such artifact: there's nothing to compact because there's nothing stored but the state that already exists on disk.

What pull looks like instead

The inbox is a computed view over state that already exists — no new store, no emit calls:

Attention type | Derived from (pull)
New / unread output | per-job output glob mtime vs a read-timestamp record
Pending decision | existing on-disk sources: an attention scan's output, an undecided ledger entry, an expired evaluation date
Broken dependency | health-check failure, propagated to anything that declares a dependency on it
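
The "broken dependency" row implies a propagation step. One way to sketch it, assuming each component declares its dependencies in a plain mapping and that propagation is transitive (the post only says the failure reaches anything that declares a dependency, so transitivity is my assumption). The component names in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def blocked_components(
    depends_on: Dict[str, List[str]], failed: Set[str]
) -> Set[str]:
    """Return every component that directly or transitively depends on a failed one."""
    blocked: Set[str] = set()
    changed = True
    # Repeat until a full pass adds nothing new; the set only grows, so this terminates
    # even when the dependency graph contains cycles.
    while changed:
        changed = False
        for component, deps in depends_on.items():
            if component in blocked:
                continue
            if any(d in failed or d in blocked for d in deps):
                blocked.add(component)
                changed = True
    return blocked
```

With depends_on = {"digest-agent": ["search-mcp"], "weekly-report": ["digest-agent"], "backup-cron": []} and failed = {"search-mcp"}, both digest-agent and weekly-report come back as blocked, while backup-cron does not.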

Priority is deterministic, not a model's guess: BLOCKED → STALE → NEW, where each level is a hard fact (BLOCKED: a failed health check; STALE: a schedule past due; NEW: a file newer than its read-timestamp). An LLM-generated priority number would be unfalsifiable and would drift; a deterministic trigger is reproducible and debuggable. The model stays out of the ranking entirely.
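
A sketch of that ranking, assuming the three facts have already been computed per component and that BLOCKED maps to a failed health check, STALE to a schedule past due, and NEW to an unread output. The Facts record and its field names are illustrative. Ties within a level are broken by name so that repeated scans produce the same order.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class Priority(IntEnum):
    BLOCKED = 0  # a dependency's health check failed
    STALE = 1    # the schedule is past due
    NEW = 2      # an output is newer than its read-timestamp


@dataclass
class Facts:
    name: str
    health_check_failed: bool
    schedule_past_due: bool
    has_unread_output: bool


def classify(f: Facts) -> Optional[Priority]:
    """Map hard facts to a level; the first matching rule wins, None means no attention needed."""
    if f.health_check_failed:
        return Priority.BLOCKED
    if f.schedule_past_due:
        return Priority.STALE
    if f.has_unread_output:
        return Priority.NEW
    return None


def rank(facts: List[Facts]) -> List[Tuple[Priority, str]]:
    """Sort by level, then by name, so the order is fully reproducible."""
    ranked = []
    for f in facts:
        level = classify(f)
        if level is not None:
            ranked.append((level, f.name))
    return sorted(ranked)
```

Because every level is a boolean read from state, a surprising position in the inbox can be traced back to exactly one fact.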

Freshness is "unread," not "old"

Pull also fixes a subtle metric. The intuitive freshness signal is elapsed time — "this ran 3 days ago." But age isn't the problem; unread output is. A report that ran an hour ago and that you haven't opened is more demanding of attention than one from last week you already read. So freshness is computed as a join: does an output exist whose mtime is newer than the last time you opened it? Clicking a card records a read-timestamp; unread items rise to the top; read ones sink. This is only cheap because the design already scans state — freshness falls out of the same glob, where in a push system it would be yet another event to emit and reconcile.
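
The unread join can be sketched in a few lines, assuming read-timestamps are kept in a small JSON file owned by the inbox and keyed by output path. The file layout and function names are assumptions for illustration, not the post's actual storage format.

```python
import json
import os
import time
from pathlib import Path
from typing import Dict, List


def load_read_marks(store: Path) -> Dict[str, float]:
    """Read the path -> last-read-time record; an absent file means nothing was read yet."""
    return json.loads(store.read_text()) if store.exists() else {}


def unread_outputs(report_dir: Path, read_marks: Dict[str, float]) -> List[Path]:
    """Outputs whose mtime is newer than their read-timestamp (never-read counts as 0)."""
    unread = [
        p for p in report_dir.glob("*")
        if p.is_file() and p.stat().st_mtime > read_marks.get(str(p), 0.0)
    ]
    # Most recently written first, so fresh unread items rise to the top.
    return sorted(unread, key=lambda p: p.stat().st_mtime, reverse=True)


def mark_read(output: Path, read_marks: Dict[str, float], store: Path) -> None:
    """Record 'opened now' in the inbox's own store; the output file is not touched."""
    read_marks[str(output)] = time.time()
    store.write_text(json.dumps(read_marks))
```

A job that rewrites its report after you read it bumps the mtime past the read-timestamp, so the item becomes unread again without any extra bookkeeping.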

The boundary that keeps it honest

Choosing pull also forces a discipline: the monitor must not mutate fleet state. "Closing" an inbox item means acknowledging it and deep-linking to the real place the work is closed; it does not reach in and change a job, a ledger, or an agent's state. The moment a monitor starts writing back, it's an orchestrator again, and reason 3 returns. Read the world; link to the controls; never become the controls.
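
As a sketch of that boundary, assuming the inbox keeps its acknowledgements in its own record and each item carries a link to where the work really gets closed. InboxItem, Inbox, and the example link are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class InboxItem:
    item_id: str
    source_link: str  # where the work is actually closed (a file path or URL)


@dataclass
class Inbox:
    # The only state the monitor writes: its own acknowledgement record.
    acknowledged: Dict[str, float] = field(default_factory=dict)

    def close(self, item: InboxItem, now: float) -> str:
        """Acknowledge the item and hand back the deep link.

        Nothing in the fleet (job, ledger, agent state) is read-modified here.
        """
        self.acknowledged[item.item_id] = now
        return item.source_link
```

The method has no handle on any job or ledger, so there is no code path by which closing a card could drive the fleet.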

When push is right

None of this says event buses are bad — it says they fit a different shape. If you own every producer, can instrument all of them, and need high-throughput, low-latency fan-out, push is the right tool. The pull argument wins specifically for a small, heterogeneous, high-churn fleet with components you don't control, where the cost that dominates is instrumentation and the failure that hurts most is the silent one. Match the architecture to which cost is fatal: throughput, or blind spots.

FAQ

Q. Should I use an event bus or polling for a small multi-agent system?
Polling, if the fleet is high-churn and includes components you can't instrument. An event bus is blind to anything that doesn't emit; a poller scanning on-disk state is self-healing and needs zero instrumentation. Use a bus when you own and can instrument every producer and need high throughput.

Q. Is push or pull better for monitoring agents I don't control?
Pull. You can't add emit calls to a third-party MCP server, so push can't see it. Pull derives status from log mtimes, output files, liveness, and health endpoints — no cooperation required.

Q. How do I monitor third-party MCP servers that don't emit events?
Observe their external state: poll a health endpoint, check the process is alive, watch files they touch, and propagate status to dependents. The component never has to cooperate.

Q. How should a monitoring inbox decide priority without an LLM?
Deterministic state triggers: BLOCKED, then STALE, then NEW, each a fact derived from disk (health check, schedule vs log, mtime vs read-timestamp). An LLM priority score is unfalsifiable and drifts.

← hexisteme · notes · CC-BY 4.0