Agent System (WG)
This is a guide for implementing an LLM-driven agent system — implementation-agnostic, normative, and meant to play the same role for agent runtimes that the Agent Client Protocol plays for editor ↔ agent integration, or that the Language Server Protocol plays for language tooling.
It answers one question: what is an agent system that hosts a code agent, a design agent, or any other task-agnostic agent without rewriting the core?
The shape is host-agnostic: it holds for a desktop daemon, a cloud sandbox runtime, a CLI, an IDE plugin, a hosted multi-tenant service. UX (window, panel, picker) is out of scope except where a UX requirement reaches back into the protocol.
Conventions
The keywords MUST, MUST NOT, SHOULD, SHOULD NOT, MAY are used as in RFC 2119.
| Identifier shape | Convention used in this guide |
|---|---|
| Field, column, and function names | snake_case |
| Path variables | kebab-case (e.g. {user-data}, {workspace}) |
| Type names | PascalCase (e.g. SessionStatus, ChatSessionRow) |
| ACP wire identifiers | Carried verbatim from upstream (camelCase); translated at the seam. See ACP / naming seam. |
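A minimal sketch of what "translated at the seam" could look like, assuming the seam is a pair of key-renaming helpers applied to inbound and outbound payloads. The helper names (`camel_to_snake`, `snake_to_camel`, `translate_keys`) and the example keys are hypothetical; they are not defined by this guide or by ACP.

```ts
// Hypothetical seam helpers: rename object keys between the camelCase used on
// the ACP wire and the snake_case used inside this guide.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function camel_to_snake(name: string): string {
  // Prefix each uppercase ASCII letter with "_" and lowercase it.
  return name.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

function snake_to_camel(name: string): string {
  // Drop each "_" and uppercase the character that follows it.
  return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

function translate_keys(value: Json, rename: (key: string) => string): Json {
  if (Array.isArray(value)) {
    return value.map((item) => translate_keys(item, rename));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[rename(key)] = translate_keys(inner, rename);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Renaming every key recursively is the simplest form. A real seam would probably rename only known protocol fields, so that opaque payloads such as tool arguments are not rewritten.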
Vocabulary
- Agent — a config object: a system prompt + a tool list + a resolved model. Agent-as-data, not agent-as-class.
- Session — one conversation. Carries messages, parts, token rollups, and a parent pointer for forks. Persistent; survives the client process.
- Turn — one round-trip from a user message through assistant output (text, reasoning, tool calls, tool outputs) to a finished state. The unit a user can rewind to.
- Tool — a self-describing capability the agent can invoke. The set of fundamental tools is locked across agents; MCP tools and skills extend it without changing the contract.
- Runtime — the per-run capability surface handed to the agent (fs, net, shell, stream). Backed by a sandbox the agent does not see.
- Host — the process that loads the agent system. Desktop app, CLI, server, cloud sandbox. The host decides UI; the system decides protocol.
- Environment — where the host (and therefore the agent) runs: web, cloud sandbox, or computer. See environments.
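The vocabulary above can be pictured as plain data types. The sketch below is illustrative only: the field names (`parent_id`, `keep_messages`, and so on) and the `fork_session` helper are assumptions, and parts and token rollups are omitted for brevity. The normative shapes live on the Persistency and Session Lifecycle pages.

```ts
// Hypothetical, simplified shapes for the vocabulary terms.
interface AgentConfig {
  system_prompt: string;
  tools: string[]; // tool names; the agent is data, not a class
  model: string;   // resolved model identifier
}

interface Message {
  role: "user" | "assistant";
  text: string;
}

interface Session {
  id: string;
  parent_id: string | null; // parent pointer, set on forks
  agent: AgentConfig;
  messages: Message[];
}

// Fork a session: the child keeps the first `keep_messages` messages and
// points back at its parent. The parent is left untouched.
function fork_session(parent: Session, new_id: string, keep_messages: number): Session {
  return {
    id: new_id,
    parent_id: parent.id,
    agent: parent.agent,
    messages: parent.messages.slice(0, keep_messages),
  };
}
```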
What this is
A normative guide:
- Names the invariants every implementor MUST honor for an agent to be portable.
- Names the policies each implementor picks for their product shape.
- Specifies the wire-level shapes (session schema, chunk vocabulary, result envelope) that two conforming implementations agree on.
What this is not
- A model-provider router. Provider selection (Anthropic, OpenAI, cloud gateways, BYOK) is a sibling concern. The guide only requires that whichever provider is picked feeds the same AI-SDK-v6 chunk shape.
- A UI framework. Window / tab / sidebar / picker decisions belong to the host. The guide touches UX only where UX requirements bend the protocol (compositor format, sidecar branching, queued sends).
- A billing engine. Usage rollups land on the session row so a billing layer can read them; pricing is not the agent system's job.
- A multi-agent orchestration graph. Agents call subagents through the locked task tool; there are no chains, no DAGs, no shared cross-run state.
Pages
The guide is organized as a set of pages. Read Foundations first; the rest can be read in any order.
| Page | Covers |
|---|---|
| Foundations | Bedrock: AI SDK v6 chunk shape, directory-rooted execution, the locked tool set summary, watchdog placement, web search, cross-cutting invariants. |
| AI SDK (reference substrate) | Implementor's annex to AI SDK's own docs. The token-usage cache normalization rule, where the SDK's tool-loop helper fits, what the RFC adds on top of the substrate. |
| Runtime Environments | Web / cloud sandbox / computer. Which capabilities each environment exposes; how the locked tool set degrades; sandbox primitives. |
| Sandbox Runtime (srt) | srt as the reference implementation of the computer environment's sandbox primitive. Capability surface, platform support, what the protocol does and does not lock to. |
| Session Lifecycle | Context tracking, rewinding, branching, compaction (auto + manual + failure), per-turn model switch, streaming, interruption, session status, permission scopes. |
| Persistency | Storage engine, the three-table schema, save policy, ID strategy, JSON discipline, event-log opt-in, schema evolution. |
| Tools | The locked fundamental set, the tool contract, capability requirements, result envelope, truncation, watchdog at the tool boundary, ACP kind mapping. |
| MCP and Connectors | User-plugged MCP servers, lazy materialization, tool_search for bulk discovery, OAuth, dynamic refresh, the untrusted-by-default trust policy. |
| Skills and Project Instructions | Two layers of knowledge: skills (lazy, advertise-then-load) and project instructions (eager, unconditional). Discovery sources, manifests, decision matrix. |
| Binary file handling | Glossary / reference. Three resolution paths (provider-native multimodal, skill-per-format, shell-based conversion), the format matrix (pdf / zip / pptx / psd / fig / …), the scratch-space pattern for archive extraction. |
| Subagents | The task tool, agent modes, blocking vs background, recursion, permission inheritance, inspectability, awareness, specialized subagents, opinionated patterns. |
| Triggers | Non-human-originated turns. Schedule / external webhook / programmatic API / agent self-schedule / MCP-pushed event sources. Trigger envelope on metadata_json.trigger, queue semantics, interactive-vs-hosted execution, lifecycle bounds, auth and trust. |
| Compositor | User intent representation. The multipart user-message shape, file refs vs attachments, inline commands, mentions, editor context (host-emitted selection / open / cursor / recent-action), attachment handling, and the user-view-vs-model-view lowering rules. |
| UX Patterns | What rides on top of the compositor: queued sends, sidecar chat as ephemeral branch, memory as a built-on-top layer. |
| Debugging | The canonical inspection format, export paths, what an inspection tool MUST expose, replay semantics, the DX checklist. |
| ACP Integration | The Agent Client Protocol as the default outward wire. Method mapping, capability matrix, where the protocol and the guide diverge. |
| FAQ | Question-and-answer index over the guide. Doubles as an entry point and as a conformance test — if a Q cannot be answered from the RFC, the RFC owes a clarification. |
Cross-cutting invariants
The following hold across every implementor:
| Layer | Invariant | Policy |
|---|---|---|
| Loop | One universal LLM loop drives any agent | Native vs AI SDK runtime path; cancel semantics |
| Agent | Agent-as-data: { manifest, tools, system_prompt, model? } | Where the manifest lives; how it is compiled |
| Tools | Locked fundamental set; self-describing parameters | Which tools beyond the lock; how MCP is surfaced |
| Session | Three-table shape: chat_sessions / chat_messages / chat_parts | DB engine (SQLite default; alternatives); event-log opt-in |
| Streaming | AI SDK v6 chunk shape internally | Transport (SSE / IPC / WS); resume semantics |
| Outward protocol | ACP-conformant when an external client speaks ACP | Whether to ship an ACP adapter; which capabilities to advertise |
| Compaction | Auto-fire on overflow; user-fire on demand; failure modes named | Threshold tuning; which model summarizes; tail-budget |
| Skills | Discovered once; names + descriptions injected; body loaded lazily | Where to look; remote-skill fetch policy |
| Subagents | Same loop, gated by intersected permissions; deny rules unconditional | Recursion limit; whether parent inspects child |
| Sandbox | Capability surface, not free spawn | OS-level enforcement (seatbelt / landlock / VM); per-call sub-policies |
| Persistence | Save on every chunk by default | Storage engine; write-buffer trade-off |
| Model switch per turn | Allowed; carries to the next turn | What to do if new model has smaller context (force compaction vs error) |
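The Subagents row ("intersected permissions; deny rules unconditional") can be sketched as a small set computation. This is one possible reading, not the guide's normative algorithm: the `PermissionSet` shape, the function names, and the capability names used in the usage example are all hypothetical.

```ts
// Hypothetical permission shape: explicit allow and deny lists of capability names.
interface PermissionSet {
  allow: string[];
  deny: string[];
}

// A child gets only what BOTH the parent holds and the child requests,
// and every deny rule from either side carries over unconditionally.
function child_permissions(parent: PermissionSet, requested: PermissionSet): PermissionSet {
  const deny = Array.from(new Set([...parent.deny, ...requested.deny]));
  const allow = requested.allow.filter(
    (name) => parent.allow.includes(name) && !deny.includes(name),
  );
  return { allow, deny };
}

// Deny always wins over allow.
function is_allowed(permissions: PermissionSet, name: string): boolean {
  return !permissions.deny.includes(name) && permissions.allow.includes(name);
}
```

For example, a parent that allows `fs.read`, `fs.write`, and `shell.run` but denies `shell.run` cannot hand `shell.run` to a child, and a child that asks for `net.fetch` does not receive it because the parent never held it.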
Abstract
What matters most
The single decision that compounds across every other one is whether
the system treats an agent as data or as code. An
agent-as-data system publishes a config ({ manifest, tools, system_prompt, model? }) and runs one universal loop over it. An
agent-as-code system publishes a function per agent.
This guide picks agent-as-data because:
- Specialization is cheap. A "title" agent, a "summary" agent, a "compaction" agent are all the same loop with different config.
- The system is inspectable. Diff two agent configs to see what changed.
- The runtime is auditable in one place — one stream loop to read, one abort path, one permission gate.
Everything else in the guide follows from that choice.
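A rough sketch of "one universal loop over a config" follows. Everything here is a simplification under stated assumptions: `run_agent`, `CallModel`, `ToolRegistry`, and the string-based transcript are hypothetical, the model is injected as a function so the sketch runs without a provider, and streaming, persistence, and the permission gate are left out.

```ts
// Hypothetical agent config: pure data, no behavior.
interface AgentConfig {
  system_prompt: string;
  tools: string[]; // names of tools this agent may call
  model: string;
}

type ModelStep =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; tool: string; input: string };

// Injected model call; a real system would stream from a provider here.
type CallModel = (model: string, transcript: string[]) => ModelStep;

// Tool implementations live outside the agent config.
type ToolRegistry = Record<string, (input: string) => string>;

// The same loop serves every agent; only `agent` differs between them.
function run_agent(
  agent: AgentConfig,
  registry: ToolRegistry,
  call_model: CallModel,
  user_message: string,
  max_steps: number = 8,
): string[] {
  const transcript = [`system: ${agent.system_prompt}`, `user: ${user_message}`];
  for (let step = 0; step < max_steps; step++) {
    const next = call_model(agent.model, transcript);
    if (next.kind === "text") {
      transcript.push(`assistant: ${next.text}`);
      return transcript; // turn finished
    }
    // Only tools listed in the config are reachable.
    const impl = agent.tools.includes(next.tool) ? registry[next.tool] : undefined;
    const output = impl
      ? impl(next.input)
      : `error: tool "${next.tool}" is not available to this agent`;
    transcript.push(`tool_call: ${next.tool}(${next.input})`, `tool_output: ${output}`);
  }
  transcript.push("assistant: [stopped: step limit reached]");
  return transcript;
}
```

Under this sketch, a "title" agent and a "summary" agent would be two `AgentConfig` values passed to the same `run_agent`, which is the property the list above relies on.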
Properties that follow
- Dynamic, task-agnostic workflow. A code agent and a design agent differ only by manifest. Adding a new agent type does not rewrite the loop, the session schema, the streaming layer, or the tool contract — it adds a config.
- Parallel workflow. A subagent is the same agent loop on a child session. The parent's loop continues while children run; results return as tool outputs. Parallelism is a function call, not a new framework. See subagents.
- Safety and harness. The agent never touches the OS directly. Every shell call, every file read, every network fetch goes through a capability the runtime declared. The runtime sits on top of a sandbox the host owns. See environments.
- Watchdog. A pre-execute hook on every tool call can refuse with a reason that goes back to the model. Policy is host configuration. See tools / watchdog.
- Web search. Locked tool by frequency, special case by implementation (cannot be done in-house). The tool abstracts over which provider the host wires up. See tools / web search.
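The watchdog property above can be sketched as a wrapper around tool execution. This is an assumption-laden illustration: `Verdict`, `Watchdog`, `ToolResult`, `guarded_call`, and the example policy are hypothetical names, and the real result envelope is specified on the Tools page.

```ts
// Hypothetical watchdog verdict: either allow, or refuse with a reason.
type Verdict = { allow: true } | { allow: false; reason: string };
type Watchdog = (tool: string, input: string) => Verdict;

// Simplified stand-in for the tool result envelope.
interface ToolResult {
  ok: boolean;
  output: string;
}

// Run the watchdog BEFORE the tool executes. A refusal never reaches the
// tool; its reason is returned as the tool output so the model can react.
function guarded_call(
  watchdog: Watchdog,
  tool: string,
  input: string,
  execute: (input: string) => string,
): ToolResult {
  const verdict = watchdog(tool, input);
  if (verdict.allow === false) {
    return { ok: false, output: `refused by watchdog: ${verdict.reason}` };
  }
  return { ok: true, output: execute(input) };
}

// Example host policy (hypothetical): block recursive deletes in the shell tool.
const no_recursive_delete: Watchdog = (tool, input) =>
  tool === "shell.run" && input.includes("rm -rf")
    ? { allow: false, reason: "recursive delete is disabled on this host" }
    : { allow: true };
```

Keeping the policy as a plain function passed in by the host matches the statement that policy is host configuration: the loop and the tools stay unchanged when the policy does.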
Stress tests
The guide is task-agnostic, but it pays to test it against the agents it targets:
- A code agent — long-running, file-heavy, shell-heavy, occasional web search. Exercises fs.*, shell.run with sub-policies, rewind-to-edit, hour-long session compaction.
- A design agent — file-light, model-call-heavy, tool-arg-heavy (vector diffs as tool calls). Exercises tool-output streaming, fast rewind, per-turn model swaps between cheap and premium tiers.
- A research / write agent — web-heavy, low write traffic. Exercises web search, subagent fan-out for parallel reading, queued sends.
- A scripted job agent — runs unattended on a queue. Exercises the watchdog, the canonical inspection format, permission policies with no human in the loop.
A change that breaks any of these four is a wrong move.
See also
- Agent Client Protocol — the upstream protocol the ACP integration page maps onto.
- AI SDK v6 — the chunk-shape substrate the guide pins.
- Grida bindings — the sibling layer that binds this RFC to Grida's actual surfaces (canvas, image tools, fundamentals as shipped).