Turn Queue
A session runs one turn at a time. Many things want to start a turn: a human typing in the compositor, a cron expression matching, a webhook landing, an API call, an agent self-schedule, an MCP push. The turn queue is the single point where those competing demands are serialized, ordered, and drained.
This page owns that contract: the ingestion model, the queued_at
data shape, the run-state machine that drains the queue, the
invariants every implementor MUST honor, and the boundary between
what the core owns and what a host or UI may build on top.
The turn sources live elsewhere — human input in
compositor, non-human input in
triggers. This page is about what happens after
a turn-triggering message exists and before its turn fires.
The keywords MUST, MUST NOT, SHOULD, SHOULD NOT, MAY are used as in RFC 2119.
Why a queue
Two turns running on one session at the same time is a footgun. They would race on the same conversation history, the same tools, the same run state, and the same token rollup. The result is non-deterministic and unrecoverable.
So at most one turn runs per session. That invariant forces a decision about what to do with the second demand that arrives while the first turn is running. Three answers:
- Reject it. The core returns "busy" and the caller retries later. This pushes the queue into every caller, loses determinism (retry timing decides order), and breaks every non-human source — a webhook or a cron has no keyboard to sit at and resubmit.
- Preempt the running turn. The new message interrupts and replaces the in-flight turn. This is non-deterministic, throws away in-flight work and tokens, and makes "what did the agent actually see" unanswerable.
- Queue it. Accept it, persist it, order it, and fire it when the session next goes idle. ← chosen.
The queue buys determinism over priority: the conversation replays the same way every time, and no source jumps another.
The model
Every turn begins as a user message
All sources converge on one shape: a user message lands in the
session. The human path is the compositor; the
non-human path carries a metadata_json.trigger envelope
(triggers). The queue does
not care who originated the message — the presence of trigger is a
discriminator for auditing, not for ordering. A typed message and a
webhook obey the same queue rule.
This makes the queue the foundation of the turn sources, not a
sibling of them. The compositor and the trigger machinery are
turn sources that sit above the queue and feed it; the queue sits
below and serializes whatever they submit. The dependency runs one
way — a source knows it must enqueue; the queue knows nothing about
any particular source, carries the trigger envelope opaquely, and
never branches on it. So triggers are built on the queue, not the
queue on triggers — and it could not be the other way around: the
compositor is not a trigger (the human is the keyboard — no envelope,
no schedule, no auth), yet it feeds the same queue. Modeling the
queue on top of triggers would force the human path into a trigger
shape it does not fit.
Queued vs fired
- A message is queued when it is persisted while a turn is already running, with metadata_json.queued_at set to the epoch ms at which it was queued.
- A message fires when the run-state machine clears its queued_at and starts its turn.
- A message that arrives while the session is idle fires immediately and never carries queued_at.

queued_at present means "this message waited."
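A minimal sketch of the queued/fired distinction on the persisted message. Only metadata_json.queued_at comes from this contract; the UserMessage type and the stamp_queued, mark_fired, and is_waiting helpers are hypothetical names used for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class UserMessage:
    # Hypothetical persisted user message. Only metadata_json.queued_at
    # comes from the turn-queue contract; the other fields are illustrative.
    id: str
    parts: list[str]
    metadata_json: dict[str, Any] = field(default_factory=dict)


def now_ms() -> int:
    # Epoch milliseconds, the unit queued_at is stamped in.
    return int(time.time() * 1000)


def stamp_queued(msg: UserMessage, at_ms: Optional[int] = None) -> None:
    # Applied only when the message is persisted while a turn is running.
    msg.metadata_json["queued_at"] = now_ms() if at_ms is None else at_ms


def mark_fired(msg: UserMessage) -> None:
    # Firing clears queued_at; a message that fired on an idle session
    # never had one, so the pop is a no-op for it.
    msg.metadata_json.pop("queued_at", None)


def is_waiting(msg: UserMessage) -> bool:
    # queued_at present means the message waited and has not fired yet.
    return "queued_at" in msg.metadata_json
```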
Single-flight
At most one turn per session is in the running state at any instant. The idle→busy transition MUST be atomic, so two near-simultaneous arrivals cannot both win the run; the loser is queued.
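One way to make the idle→busy edge atomic is a check-and-set under a single lock. This is a sketch assuming a single-process host where one mutex per session is enough; a multi-process core would need an equivalent compare-and-set in its shared store. RunGate and its methods are hypothetical names.

```python
import threading


class RunGate:
    """Hypothetical per-session gate that makes the idle->busy edge atomic."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._busy = False

    def try_begin(self) -> bool:
        # Check and set under one lock: exactly one concurrent caller can
        # observe idle and flip it to busy. Losers must queue their message.
        with self._lock:
            if self._busy:
                return False
            self._busy = True
            return True

    def end(self) -> None:
        # The running turn finished or was aborted; the session is idle again.
        with self._lock:
            self._busy = False
```

A caller that gets False persists its message with queued_at instead of starting a turn.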
Order
The queue preserves queued_at order. When the session goes
idle, the run-state machine consumes queued messages in that order;
whether it consumes one per turn or all pending at once is the
implementer's drain discipline. Either way order
is preserved — a later message never fires ahead of an earlier one.
Human- and trigger-originated messages queue against the same
clock — the user is not jumped by a webhook, and the webhook is not
jumped by the user. Determinism over priority.
No preemption
A new message never interrupts the running turn. Preemption is not in
this contract. A host that needs "stop and run this now" composes it
from two existing operations:
abort(session_id) followed by a
submit. Explicit, observable, never the default.
The run-state machine
The queue is drained by the run-state machine — the core
component that owns whether a session is running and what fires next.
Its states are idle / busy / retrying / error. These project
onto the client-facing SessionStatus back-channel; the wire shape
and its transport live in
session / session status. This page
owns the behavior; that page owns the shape clients read.
Drain rule. On entering idle — when the running turn finishes
or is aborted — the machine selects the next batch of queued messages
in queued_at order (one message, or all currently queued, per the
drain discipline), clears their queued_at,
transitions to busy, and fires the turn. If nothing is queued, it
stays idle.
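The drain rule can be sketched as a single on_idle handler over the persisted-message store. This assumes messages are plain dicts with id and metadata_json, and takes the drain discipline as a pluggable function; RunStateMachine, select_batch, and serial are hypothetical names, and the real turn execution is reduced to recording which message ids fired.

```python
from enum import Enum
from typing import Callable


class RunState(Enum):
    IDLE = "idle"
    BUSY = "busy"
    RETRYING = "retrying"
    ERROR = "error"


SelectBatch = Callable[[list[dict]], list[dict]]


def serial(pending: list[dict]) -> list[dict]:
    # Default discipline for this sketch: one message per turn.
    return pending[:1]


class RunStateMachine:
    """Hypothetical core-side drainer over a list of persisted message dicts."""

    def __init__(self, store: list[dict], select_batch: SelectBatch = serial) -> None:
        self.state = RunState.IDLE
        self.store = store                  # the persisted-message store is the queue
        self.select_batch = select_batch    # the implementer's drain discipline
        self.fired: list[list[str]] = []    # message ids of each fired turn

    def _queued(self) -> list[dict]:
        pending = [m for m in self.store if "queued_at" in m["metadata_json"]]
        return sorted(pending, key=lambda m: (m["metadata_json"]["queued_at"], m["id"]))

    def on_idle(self) -> None:
        # Entered when the running turn finishes or is aborted.
        self.state = RunState.IDLE
        pending = self._queued()
        if not pending:
            return                          # nothing queued: stay idle
        batch = self.select_batch(pending)
        for m in batch:
            del m["metadata_json"]["queued_at"]  # cleared as the batch fires
        self.state = RunState.BUSY
        self.fired.append([m["id"] for m in batch])
```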
Where it lives. The core. The machine is authoritative; it is not the UI, and it is not any single client. Every client of the session — a second window, a hosted trigger runner with no UI at all, an inspector — sees the same queue because the queue is core state, not client state.
Hard failure pauses the drain. A turn that hard-fails
(state → error) does not auto-drain. Auto-firing the next
queued turn into a session that just broke would cascade the failure
— acutely dangerous under a trigger storm. So on hard error the queue
is paused: queued messages keep their queued_at and wait. The
drain resumes when error clears, which happens on the next fired
turn (a user retry, an edit-and-resend, or an explicit resume). A
transient failure is different: the turn is still the running turn
(retrying), so the queue is not drained mid-retry; it drains only
once the turn reaches a clean idle.
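The pause can be expressed as "only a clean idle drains." A sketch assuming the serial discipline and a queue already in queued_at order; PausableDrain and its event methods are hypothetical names standing in for whatever the core's turn runner reports.

```python
from enum import Enum


class RunState(Enum):
    IDLE = "idle"
    BUSY = "busy"
    RETRYING = "retrying"
    ERROR = "error"


class PausableDrain:
    """Hypothetical sketch of how retrying and error gate a serial drain."""

    def __init__(self, queued: list[str]) -> None:
        self.state = RunState.BUSY      # a turn is running
        self.queued = list(queued)      # ids, already in queued_at order
        self.fired: list[str] = []

    def _fire(self, msg_id: str) -> None:
        self.state = RunState.BUSY      # any fired turn clears error
        self.fired.append(msg_id)

    def on_transient_failure(self) -> None:
        # Still the running turn, so nothing drains mid-retry.
        self.state = RunState.RETRYING

    def on_hard_failure(self) -> None:
        # Paused: queued messages keep queued_at and wait.
        self.state = RunState.ERROR

    def on_clean_idle(self) -> None:
        # The only transition that drains.
        self.state = RunState.IDLE
        if self.queued:
            self._fire(self.queued.pop(0))

    def fire_after_error(self, msg_id: str) -> None:
        # e.g. a user retry or an edit-and-resend fires a new turn; that
        # clears error, and the turn's clean finish resumes the drain.
        if self.state is not RunState.ERROR:
            raise RuntimeError("session is not in error")
        self._fire(msg_id)
```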
Stopping with a queue
A user abort (session / interruption)
is a clean way to reach idle, so it drains exactly like a natural
finish: aborting the running turn fires the next batch. Stop ends
the current turn, not the queue. Halting the whole cascade means
cancelling the queued messages (see
operating on queued messages) —
abort alone rolls into the next one. An implementation MAY offer a
combined "stop and clear" affordance, but the underlying abort and
cancel stay separate primitives; collapsing them would remove the
ability to end one turn while keeping the queue.
Drain discipline
How many queued messages the drain consumes per turn is a policy each implementer picks. Two disciplines are common:
- Serial — the drain fires one message per turn, earliest first. N queued messages produce N turns. Each gets its own assistant response and its own rewind point; the agent reacts to them one at a time.
- Coalescing — the drain folds all currently-queued messages into a single turn, in queued_at order. N queued messages produce one turn. The agent sees the whole pending batch at once; fewer turns, fewer model round-trips.
Both honor every invariant — single-flight, queued_at
order, no preemption. They differ only in turn granularity:
serial trades round-trips for finer-grained history and rewind;
coalescing trades granularity for cost and for letting the agent
react to everything at once. The discipline is fixed per
implementation; this guide does not mandate one.
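The two disciplines differ only in how much of the ordered backlog each turn takes. A sketch with hypothetical function names, assuming a fixed backlog with no new arrivals while it drains:

```python
from typing import Callable

Select = Callable[[list[str]], list[str]]


def serial(pending: list[str]) -> list[str]:
    # One message per turn, earliest first.
    return pending[:1]


def coalescing(pending: list[str]) -> list[str]:
    # Every currently queued message folded into one turn, in queued_at order.
    return list(pending)


def turns_for(backlog: list[str], select: Select) -> list[list[str]]:
    # Drain a backlog that is already in queued_at order. Both disciplines
    # take a prefix of what is pending, which is what preserves order.
    remaining = list(backlog)
    turns: list[list[str]] = []
    while remaining:
        batch = select(remaining)
        turns.append(batch)
        remaining = remaining[len(batch):]
    return turns
```

Three queued messages give three turns under serial and one turn under coalescing; the fired order is the same either way.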
Drain cadence is a host policy. The drain rule fires the next batch
"when the session goes idle" — but an implementation MAY insert a brief
settle delay between the idle edge and the next fire. The session is
genuinely idle for that window (no turn running) and the next batch stays
queued for its duration — its queued_at is cleared only when it fires.
This gives every client time to observe the idle transition, and lets a
surface keep showing the still-pending batch as queued so it appears to
"submit" in step with its response rather than flushing early. Useful where a
surface projects run-state to a control (a stop/send toggle) that would
otherwise never paint the idle state on a back-to-back drain. The delay
changes only cadence, never an invariant; its duration is the
host's, like the throttle and dedup numbers.
Messages that arrive after a drain has begun belong to the
next batch, never the one already firing — a batch is whatever was
queued at the instant the session reached idle. This is what keeps a
coalescing drain deterministic.
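The batch boundary is a snapshot taken at the idle instant. A sketch assuming a coalescing drain and an in-process lock; the list here is a hypothetical in-memory stand-in for "queued messages in the persisted store," not a separate queue structure the contract permits.

```python
import threading


class SnapshotQueue:
    """Hypothetical: a batch is whatever was queued when the session hit idle."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: list[str] = []   # stand-in for queued persisted messages

    def enqueue(self, msg_id: str) -> None:
        with self._lock:
            self._pending.append(msg_id)

    def take_batch(self) -> list[str]:
        # Snapshot and detach under the lock: anything enqueued after this
        # returns lands in the fresh list and belongs to the next batch.
        with self._lock:
            batch, self._pending = self._pending, []
            return batch
```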
Lifecycle
The full path, from any source to a fired turn:
1. A turn-triggering message arrives.
2. The core persists it as a user message immediately, before deciding whether to run it. The persisted-message store is the queue; there is no separate queue structure.
3. If the session is idle, the machine fires the turn now (no queued_at). If it is busy or retrying, the message is stamped with queued_at and the machine does not start a turn.
4. When the running turn reaches idle, the drain rule fires the next batch (the earliest queued message, or all pending messages, per the drain discipline).
5. Steps 3–4 repeat until the queue is empty.
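The lifecycle above fits in a small single-session model. This is a sketch assuming a serial drain and an in-process core; Core, submit, and finish_turn are hypothetical names, and a counter stands in for the epoch-ms clock so the stamps are predictable.

```python
import itertools


class Core:
    """Hypothetical single-session core following the lifecycle path above."""

    def __init__(self) -> None:
        self.store: list[dict] = []        # persisted messages; this IS the queue
        self.state = "idle"
        self.turns: list[str] = []         # ids of messages whose turn fired
        self._clock = itertools.count(1)   # stand-in for epoch ms

    def submit(self, msg_id: str, text: str) -> None:
        msg = {"id": msg_id, "parts": [text], "metadata_json": {}}
        self.store.append(msg)             # persist before deciding to run it
        if self.state == "idle":
            self._fire([msg])              # idle: fire now, never stamped
        else:
            msg["metadata_json"]["queued_at"] = next(self._clock)

    def finish_turn(self) -> None:
        # The running turn reached idle: drain the earliest queued message.
        self.state = "idle"
        pending = sorted(
            (m for m in self.store if "queued_at" in m["metadata_json"]),
            key=lambda m: (m["metadata_json"]["queued_at"], m["id"]),
        )
        if pending:
            self._fire(pending[:1])

    def _fire(self, batch: list[dict]) -> None:
        for m in batch:
            m["metadata_json"].pop("queued_at", None)
        self.state = "busy"
        self.turns.extend(m["id"] for m in batch)
```

Usage: submitting three messages back to back fires the first immediately and stamps the other two; each finish_turn then fires one of them in stamp order.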
Operating on queued messages
While a message waits in the queue it can be acted on; once it fires it is an ordinary user message and any change is a rewind, not a queue operation.
- Cancel (remove) — a conforming implementation SHOULD expose this. It removes a queued message before it fires, and it is the only way to halt the drain cascade: because an abort drains the next batch, stopping the queue means cancelling its messages.
- Edit — an implementation MAY let the user rewrite a queued message's parts in place (queued_at and order unchanged). One that does not build in-place edit can rely on cancel + resubmit for a similar effect, except that the resubmitted message gets a fresh queued_at and so moves to the back of the queue.
- Reorder — an implementation MAY let the user reorder queued messages. The default order is queued_at; reordering is a convenience that changes only the order the drain consumes, no other invariant.
These are the only operations on a queued message; how they surface
is a host concern (ux / queued sends).
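Cancel and edit both start by checking that the target is still queued. A sketch over plain message dicts; cancel, edit, and NotQueuedError are hypothetical core operations, and reorder is left out because the contract does not say how a custom order is stored.

```python
class NotQueuedError(Exception):
    """Raised when a queue operation targets a message that already fired."""


def _find(store: list[dict], msg_id: str) -> dict:
    for m in store:
        if m["id"] == msg_id:
            return m
    raise KeyError(msg_id)


def _require_queued(msg: dict) -> None:
    if "queued_at" not in msg["metadata_json"]:
        # Once fired, changing it is a rewind, not a queue operation.
        raise NotQueuedError(msg["id"])


def cancel(store: list[dict], msg_id: str) -> None:
    msg = _find(store, msg_id)
    _require_queued(msg)
    store.remove(msg)        # removed before it fires; it never becomes a turn


def edit(store: list[dict], msg_id: str, parts: list[str]) -> None:
    msg = _find(store, msg_id)
    _require_queued(msg)
    msg["parts"] = parts     # queued_at untouched, so its place in order is kept
```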
Drop rules — what does not queue
Not every arriving message reaches the queue. Admission is a
source-layer decision, made before submit — the queue itself
accepts whatever a source hands it and owns no drop policy. The
trigger machinery is the only source that drops, under two host
policies, both defined in
triggers / queue semantics:
- Duplicate delivery_id. The upstream redelivered an event the session already holds. Idempotency.
- Throttle exceeded. Host policy caps a trigger's fire-rate; excess fires are dropped, not queued, so an upstream firehose cannot build an unbounded backlog.
A dropped message is distinct from a cancelled one: dropped never enters the queue; cancelled was queued and removed.
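Both drop rules run in the trigger layer before submit. A sketch assuming a sliding-window throttle per trigger; TriggerAdmission and its parameters are hypothetical, and the numbers in any instance are placeholders for the host's own caps. Whether a throttled delivery_id should count as "seen" is not specified here; this sketch does not record it, so a later redelivery can still be admitted.

```python
from collections import deque


class TriggerAdmission:
    """Hypothetical trigger-layer filter applied before submit.

    max_fires and window_ms are host policy; their values are placeholders.
    """

    def __init__(self, max_fires: int, window_ms: int) -> None:
        self.max_fires = max_fires
        self.window_ms = window_ms
        self.seen_delivery_ids: set[str] = set()
        self.recent: dict[str, deque] = {}   # trigger id -> admitted fire times (ms)

    def admit(self, trigger_id: str, delivery_id: str, now_ms: int) -> bool:
        # Duplicate delivery_id: the session already holds this event.
        if delivery_id in self.seen_delivery_ids:
            return False
        times = self.recent.setdefault(trigger_id, deque())
        while times and now_ms - times[0] >= self.window_ms:
            times.popleft()              # forget fires outside the window
        if len(times) >= self.max_fires:
            return False                 # throttled: dropped, never queued
        times.append(now_ms)
        self.seen_delivery_ids.add(delivery_id)
        return True
```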
Persistence and restart
The queue is not a separate data structure — it is exactly the
set of persisted user messages that carry a queued_at and have
not yet fired (the queued_at metadata key is defined in
persistency / chat_messages).
This has a free consequence: the queue survives a host restart.
SessionStatus is volatile, so after a restart every session reads
as idle (session / session status);
the run-state machine then resumes by draining any still-queued
messages. A turn that was running at restart is not resumed —
cross-restart run-resume is out of scope, and its orphaned in-flight
tool calls are finalized as errors
(session / resume)
— but that turn's queued successors still drain normally.
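Because the queue is only persisted state, recovery is a pure function of the stored messages. A sketch assuming message dicts with id, role, and metadata_json; the tool_status field and its "running"/"error" values are hypothetical stand-ins for however the store marks in-flight tool calls.

```python
def recover(store: list[dict]) -> tuple[str, list[str]]:
    """Rebuild run state and the queue after a restart from persisted messages alone."""
    # Orphaned in-flight tool calls of the interrupted turn are finalized as
    # errors; that turn is not resumed. tool_status is a hypothetical field.
    for m in store:
        if m.get("tool_status") == "running":
            m["tool_status"] = "error"
    # SessionStatus is volatile, so the session reads as idle after restart.
    state = "idle"
    # The queue is exactly the unfired user messages that carry queued_at.
    queued = sorted(
        (m for m in store if m["role"] == "user" and "queued_at" in m["metadata_json"]),
        key=lambda m: (m["metadata_json"]["queued_at"], m["id"]),
    )
    return state, [m["id"] for m in queued]
```

The run-state machine would then drain the returned ids exactly as it would after any other clean idle.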
The core / host / UI boundary
This is the line the contract draws — and the one most easily blurred. The queue is core, not surface.
| Concern | Owner |
|---|---|
| Accepting a turn-triggering message | Core |
| Persisting it and stamping queued_at | Core |
| Single-flight enforcement | Core |
| Ordering and draining | Core (the run-state machine) |
| Projecting state onto SessionStatus | Core |
| Edit / cancel of a queued message | Core operations |
| Rendering queued messages | UI |
| Edit / cancel affordances | UI (calls the core operations) |
| Reading SessionStatus for busy / idle | UI / host |
| Status transport (event bus, polling) | Host |
A conforming UI MUST NOT implement a private hold-and-resubmit queue as the source of truth. Holding messages client-side and replaying them on idle makes the queue invisible to every other client of the session, to triggers, and to the inspector, and every held message is silently lost the moment the client closes. The UI MAY render optimistically (show a message as queued before the core confirms), but the authoritative queue is always the persisted-message set the core drains. A host with no human present (a scripted job, a hosted trigger runner) depends on this: there is no UI there to hold anything.
Invariants
A conforming implementation MUST hold all of these:
- At most one turn runs per session at any instant.
- The idle→busy transition is atomic.
- The drain fires the earliest unfired queued_at message; ties are broken deterministically (e.g. by message id).
- Human- and trigger-originated messages share one FIFO clock; no source is prioritized.
- A new message never preempts the running turn.
- A hard-failed turn pauses the drain; queued messages persist and resume on the next fired turn.
- Every fired message becomes a real, recorded turn — billable, inspectable, abortable. There is no shadow execution.
- The queue holds no state the persisted-message store does not; it is recoverable from the messages alone.
- The drain discipline (serial or coalescing) changes turn granularity only — never single-flight, order, or preemption.
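Two of these invariants, single-flight and FIFO order, can be checked mechanically against a log of turn starts and ends. A sketch of such a conformance check, assuming a hypothetical event format, the serial discipline, and no user reordering (a MAY operation that would legitimately change the fired order):

```python
class InvariantViolation(Exception):
    pass


def check_trace(events: list[tuple[str, str, int]]) -> None:
    """Check single-flight and FIFO drain order over a hypothetical event log.

    Each event is (kind, message_id, queued_at): kind is "start" or "end",
    and queued_at is the stamp the message carried when it fired, or 0 if it
    fired immediately on an idle session.
    """
    running = None
    last_key = None
    for kind, msg_id, queued_at in events:
        if kind == "start":
            if running is not None:
                raise InvariantViolation(f"{msg_id} started while {running} ran")
            running = msg_id
            if queued_at:
                key = (queued_at, msg_id)   # same key the drain sorts by
                if last_key is not None and key <= last_key:
                    raise InvariantViolation(f"{msg_id} fired out of queued_at order")
                last_key = key
        elif kind == "end":
            if running != msg_id:
                raise InvariantViolation(f"end of {msg_id}, but {running} was running")
            running = None
        else:
            raise ValueError(f"unknown event kind {kind!r}")
```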
What this guide does not specify
- A priority queue. Determinism over priority is deliberate. A host that needs urgency composes abort+submit; it does not get a priority lane.
- The drain discipline. Serial (one message per turn) or coalescing (all pending in one turn) — both conform; see Drain discipline.
- The status transport. Event bus, SSE, polling: all conformant. The SessionStatus shape is in session; delivery is the host's.
- Throttle and dedup numbers. The drop contract is here; the caps (hourly rate, per-account quota, TTL) are the host's, per triggers / lifecycle bounds.
- Whether the UI shows a queue list, a count, or nothing. Rendering is host territory; the persisted queue is what it renders.
- Cross-restart run-resume. Only the queue drains after a restart; a running turn is not resumed.
See also
- Compositor — the human turn source.
- Triggers — the non-human turn sources and the drop rules the trigger layer applies before enqueue.
- Session / session status — the SessionStatus wire shape this machine projects onto, and the abort path the no-preemption rule composes with.
- Session / rewinding — what editing a message becomes once it has fired.
- Persistency — the queued_at metadata key the queue is made of.
- UX / queued sends — the user-facing framing that rides on this contract.
- Subagents — background subagents inject a completion message that queues like any other.