Subagent: Let Your Model Delegate the Busywork

Kenny Rogers ·

Subagent: Let Your Model Delegate the Busywork
On this page

Add openrouter:subagent to your tools array and your model can delegate self-contained tasks to a smaller, cheaper, faster worker model mid-generation. Summarize a document, extract structured data, draft boilerplate, reformat text: the worker handles it and passes the result back. Your frontier model keeps orchestrating without burning expensive tokens on routine work.

Try it in the chatroom, read the docs, or follow the cookbook recipe to wire it into your app.

{
  "model": "anthropic/claude-opus-4.8",
  "messages": [{ "role": "user", "content": "Audit this release: summarize the changelog, list breaking changes, and draft the announcement." }],
  "tools": [
    {
      "type": "openrouter:subagent",
      "parameters": { "model": "z-ai/glm-5.2" }
    }
  ]
}

The model decides when to delegate. It only invokes the subagent for tasks that don’t need its full capability.

Find subagent opportunities in your codebase

Paste this prompt into your coding agent to have it scan your project for places where subagent delegation would cut costs:

Read through this codebase and identify places where an OpenRouter API call
could benefit from the openrouter:subagent server tool. Look for patterns where
a frontier model is doing mechanical sub-tasks inline: summarization, data
extraction, reformatting, boilerplate generation, or schema conversion.

For each candidate, explain:
1. Which file and function
2. What the sub-task is
3. Why it's a good fit for delegation (self-contained, predictable output, doesn't need the full conversation context)
4. A code snippet showing how to add the subagent tool to that call

Reference docs: https://openrouter.ai/docs/guides/features/server-tools/subagent
Cookbook recipe: https://openrouter.ai/docs/cookbook/building-agents/subagent-server-tool

Frontier brain, budget hands

Claude Opus 4.8 costs $5 per million input tokens. GPT-5.5 costs $5. GLM 5.2 costs $1.40. That’s a 3.6x spread on input between frontier and worker, 5.7x on output. (Claude Fable 5 was $10/$50 per M tokens before it got yanked, RIP.)

A frontier model doing a code review doesn’t need to spend its own tokens summarizing a 2,000-line changelog or reformatting a JSON blob. Those are mechanical tasks with clear instructions and predictable output. The subagent handles them at GLM prices while the orchestrator focuses on the parts that actually require reasoning.

In a complex agentic workflow with 20 tool calls, maybe 5-8 are subagent delegations: summarization, data extraction, template filling, format conversion. The frontier model orchestrates and judges. You’ve cut your per-request cost without touching the quality ceiling on the hard parts.

How it works under the hood

The worker model sees only what the delegating model explicitly passes in the task_description. No parent conversation, no prior context, no memory between tasks. Each delegation is a clean, isolated unit of work.

  1. Any model can be the worker. Pin it with parameters.model (anything in the model catalog works). Open-source models like z-ai/glm-5.2 work well for mechanical tasks. If you don’t specify a model, it falls back to the outer request model.

  2. Workers get their own tools. Give the worker openrouter:web_search and it can ground its output in fresh sources before responding. The worker runs its own tool loop internally; only the final text comes back to your model.

  3. Recursion is blocked. The subagent can’t call itself. A depth header and self-reference check prevent unbounded nesting, and delegations are capped at 10 per request.

{
  "tools": [
    {
      "type": "openrouter:subagent",
      "parameters": {
        "model": "z-ai/glm-5.2",
        "instructions": "You are a fast, focused worker. Complete the task exactly as described.",
        "tools": [{ "type": "openrouter:web_search" }]
      }
    }
  ]
}

Subagent vs. advisor

These two tools point in opposite directions. The advisor escalates hard decisions to a stronger model. The subagent delegates routine work to a cheaper one.

AdvisorSubagent
DirectionUp (consult a stronger model)Down (delegate to a cheaper model)
Worker choiceModel picks per callFixed by tool definition
Use case”Help me think through this""Do this mechanical task for me”
MemoryCross-request transcript replayNone (each task is isolated)

Use both in the same request. Your frontier model consults the advisor on architectural decisions and delegates summarization to the subagent. Different tools for different kinds of work.

Billing

Subagent tokens bill at the worker model’s rates, separate from the orchestrator. If your orchestrator is Claude Opus 4.8 ($5/$25 per M tokens) and the worker is GLM 5.2 ($1.40/$4.40 per M tokens), each model’s tokens bill at their own price. Both show up on your activity page.

Get started

One line in your tools array:

{ "type": "openrouter:subagent", "parameters": { "model": "z-ai/glm-5.2" } }

The model decides when to use it. Read the full docs for all parameters, worker tools, and recursion details, or follow the cookbook recipe for a working integration.