docs: lead/worker tutorial and blog post (#2930)

Co-authored-by: Rizel Scarlett <rizel@squareup.com>
This commit is contained in:
Angie Jones
2025-06-15 17:15:31 -05:00
committed by GitHub
parent b2a602dc0f
commit c896103018
5 changed files with 206 additions and 3 deletions

View File

@@ -0,0 +1,100 @@
---
title: "Treating LLMs Like Tools in a Toolbox: A Multi-Model Approach to Smarter AI Agents"
description: How Goose uses multiple LLMs within a single task, optimizing for speed, cost, and reliability in AI agent workflows
authors:
- angie
---
![blog cover](multi-model-ai-agent.png)
Not every task needs a genius. And not every step should cost a fortune.
That's something we've learned the hard way while scaling Goose, our open source AI agent. The same model that's great at unpacking a gnarly planning request might totally fumble a basic shell command—or worse, it might burn through your token budget doing it.
So we asked ourselves: what if we could mix and match models in a single session?
Not just switching based on user commands, but building Goose with an actual system for routing tasks between different models, each playing to their strengths.
This is the gap the lead/worker model is designed to fill.
<!-- truncate -->
## The Problem With One-Model Sessions
Originally, every Goose session used a single model from start to finish. That worked fine for short tasks, but longer sessions were harder to tune:
* Go too cheap, and the model might miss nuance or break tools.
* Go too premium, and your cost graph starts looking like a ski slope.
There was no built-in way to adapt on the fly.
We saw this tension in real usage: agents would start strong, then stall out when the model struggled to follow through. Sometimes users would manually switch models mid-session. But that's not scalable, and definitely not agent-like.
## Designing the Lead/Worker System
The core idea is simple:
* Start the session with a lead model that's strong at reasoning and planning.
* After a few back-and-forths between you and the model (what we call "turns"), hand off to a worker model that's faster and cheaper, but still capable.
* If the worker gets stuck, Goose can detect the failure and temporarily bring the lead back in.
It's turn-based, transparent, and automatic.
You can configure how many turns the lead handles upfront (`GOOSE_LEAD_TURNS`), how many consecutive failures trigger fallback (`GOOSE_LEAD_FAILURE_THRESHOLD`), and how long the fallback lasts before Goose retries the worker.
This gives you a flexible, resilient setup where each model gets used where it shines.
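As a rough sketch, the tuning knobs look like this (the values shown are the defaults described in our docs; adjust to taste):

```shell
# How many initial turns the lead model handles
export GOOSE_LEAD_TURNS=3
# Consecutive worker failures before falling back to the lead
export GOOSE_LEAD_FAILURE_THRESHOLD=2
# How many lead turns the fallback lasts before retrying the worker
export GOOSE_LEAD_FALLBACK_TURNS=2
```
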
One of the trickiest parts of this feature was defining what failure looks like.
We didn't want Goose to swap models just because an API timed out. Instead, we focused on real task failures:
* Tool execution errors
* Syntax mistakes in generated code
* File not found or permission errors
* User corrections like "that's wrong" or "try again"
Goose tracks these signals and knows when to escalate. And once the fallback model stabilizes things, it switches back without missing a beat.
## The Value of Multi-Model Design
Cost savings are a nice side effect, but the real value is in how this shifts the mental model: treating AI models like tools in a toolbox, each with its own role to play. Some are built for strategy. Some are built for speed. The more your agent can switch between them intelligently, the closer it gets to feeling like a true collaborator.
We've found that this multi-model design unlocks new workflows:
* **Long dev sessions** where planning and execution ebb and flow
* **Cross-provider setups** (Claude for planning, OpenAI for execution)
* **Lower-friction defaults** for teams worried about LLM spend
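For instance, a cross-provider split might look like this (the model names are illustrative; use whichever models you've already configured in Goose):

```shell
# Lead: Claude handles planning and reasoning (provider differs from the main one)
export GOOSE_LEAD_PROVIDER="anthropic"
export GOOSE_LEAD_MODEL="claude-4-sonnet"
# Worker: a fast, cheap OpenAI model handles execution
export GOOSE_PROVIDER="openai"
export GOOSE_MODEL="gpt-4o-mini"
```
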
It also opens the door for even smarter routing in the future—task-type switching, ensemble voting, maybe even letting Goose decide which model to call based on tool context.
## Try It Out
[Lead/worker mode](/docs/tutorials/lead-worker) is already available in Goose. To enable it, export these variables with two models that have already been configured in Goose:
```bash
export GOOSE_LEAD_MODEL="gpt-4o"
export GOOSE_MODEL="claude-4-sonnet"
```
From there, Goose takes care of the handoff, the fallback, and the recovery. You just... keep vibing.
If you're curious how it all works under the hood, we've got a [full tutorial](/docs/tutorials/lead-worker).
---
We'll keep sharing what we're learning as we build toward more dynamic, useful AI agents. If you're experimenting with multi-model setups, [share](https://discord.gg/block-opensource) what's working and what isn't.
<head>
<meta property="og:title" content="Treating LLMs Like Tools in a Toolbox: A Multi-Model Approach to Smarter AI Agents" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://block.github.io/goose/blog/2025/06/16/multi-model-in-goose" />
<meta property="og:description" content="How Goose uses multiple LLMs within a single task, optimizing for speed, cost, and reliability in AI agent workflows" />
<meta property="og:image" content="https://block.github.io/goose/goose/assets/images/multi-model-ai-agent-d408feaeba3e13cafdbfe9377980bc3d.png" />
<meta name="twitter:card" content="summary_large_image" />
<meta property="twitter:domain" content="block.github.io/goose" />
<meta name="twitter:title" content="Treating LLMs Like Tools in a Toolbox: A Multi-Model Approach to Smarter AI Agents" />
<meta name="twitter:description" content="How Goose uses multiple LLMs within a single task, optimizing for speed, cost, and reliability in AI agent workflows" />
<meta name="twitter:image" content="https://block.github.io/goose/goose/assets/images/multi-model-ai-agent-d408feaeba3e13cafdbfe9377980bc3d.png" />
</head>

Binary file not shown.


View File

@@ -50,7 +50,7 @@ export GOOSE_PROVIDER__API_KEY="your-api-key-here"
### Lead/Worker Model Configuration
These variables configure a [lead/worker model pattern](/docs/tutorials/lead-worker) where a powerful lead model handles initial planning and complex reasoning, then switches to a faster/cheaper worker model for execution. The switch happens automatically based on your settings.
| Variable | Purpose | Values | Default |
|----------|---------|---------|---------|
@@ -62,7 +62,7 @@ These variables configure a lead/worker model pattern where a powerful lead mode
A _turn_ is one complete prompt-response interaction. Here's how it works with the default settings:
- Use the lead model for the first 3 turns
- Use the worker model starting on the 4th turn
- Fall back to the lead model if the worker model struggles for 2 consecutive turns
- Use the lead model for 2 turns and then switch back to the worker model

View File

@@ -0,0 +1,103 @@
---
description: Enable multi-model functionality by pairing LLMs to complete your tasks
---
# Lead/Worker Multi-Model Setup
Goose supports a lead/worker model configuration that lets you pair two different AI models - one that's great at thinking and another that's fast at doing. This setup tackles a major pain point: premium models are powerful but expensive, while cheaper models are faster but can struggle with complex tasks. With lead/worker mode, you get the best of both worlds.
The lead/worker model is a smart hand-off system. The "lead" model (think: GPT-4 or Claude Opus) kicks things off, handling the early planning and big picture reasoning. Once the direction is set, Goose hands the task over to the "worker" model (like GPT-4o-mini or Claude Sonnet) to carry out the steps.
If things go sideways (e.g. the worker model gets confused or keeps making mistakes), Goose notices and automatically pulls the lead model back in to recover. Once things are back on track, the worker takes over again.
## Turn-Based System
A **turn** is one full interaction - your prompt and the model's response. Goose switches models based on turns:
- **Initial turns** (default: 3) go to the lead model
- **Subsequent turns** use the worker model
- **Fallback kicks in** if the worker model fails too many times in a row
- **Recovery** returns the session to the worker model once things stabilize
## Quick Example
You might configure Goose like this:
```bash
export GOOSE_LEAD_MODEL="gpt-4o" # strong reasoning
export GOOSE_MODEL="gpt-4o-mini" # fast execution
export GOOSE_PROVIDER="openai"
```
Goose will start with `gpt-4o` for the first three turns, then hand off to `gpt-4o-mini`. If the worker gets tripped up twice in a row, Goose temporarily switches back to the lead model for two fallback turns before trying the worker again.
## Configuration
:::tip
Ensure you have [added the LLMs to Goose](/docs/getting-started/providers)
:::
The only required setting is:
```bash
export GOOSE_LEAD_MODEL="gpt-4o"
```
That's it. Goose treats your regular `GOOSE_MODEL` as the worker model by default.
If you want more control, here are the optional knobs:
```bash
export GOOSE_LEAD_PROVIDER="anthropic" # If different from the main provider
export GOOSE_LEAD_TURNS=5 # Use lead model for first 5 turns
export GOOSE_LEAD_FAILURE_THRESHOLD=3 # Switch back to lead after 3 failures
export GOOSE_LEAD_FALLBACK_TURNS=2 # Use lead model for 2 turns before retrying worker
```
Once these variables are set, the lead/worker models will be used in new CLI and Desktop sessions.
## What Counts as a Failure?
Goose is smart about detecting actual task failures, not just API errors. The fallback kicks in when the worker:
- Generates broken code (syntax errors, tool failures, missing files)
- Hits permission issues
- Gets corrected by the user ("that's wrong", "try again", etc.)
Meanwhile, technical hiccups like timeouts, auth issues, or service downtime don't trigger fallback mode. Goose just retries those quietly.
## Reasons to Use Lead/Worker
- **Lower your costs** by using cheaper models for routine execution
- **Speed things up** while still getting solid plans from more capable models
- **Mix and match providers** (e.g., Claude for reasoning, OpenAI for execution)
- **Handle long dev sessions** without worrying about model fatigue or performance
## Best Practices
If you're just getting started, the default settings will work fine. But here's how to tune things:
- Bump up `GOOSE_LEAD_TURNS` to 5-7 for heavier planning upfront
- Lower `GOOSE_LEAD_FAILURE_THRESHOLD` to 1 if you want Goose to correct issues quickly
- Choose a fast, lightweight worker model (Claude Haiku, GPT-4o-mini) for day-to-day tasks
For debugging, you can see model switching behavior by turning on this log:
```bash
export RUST_LOG=goose::providers::lead_worker=info
```
## Planning Mode Compatibility
Lead/worker mode also works alongside Goose's `/plan` command. You can even assign separate models for each:
```bash
export GOOSE_LEAD_MODEL="o1-preview" # used automatically
export GOOSE_PLANNER_MODEL="gpt-4o" # used when you explicitly call /plan
export GOOSE_MODEL="gpt-4o-mini" # used for execution
```
---
The lead/worker model helps you work smarter with Goose. You get high-quality reasoning when it matters and save time and money on execution. And with the fallback system in place, you don't have to babysit it. It just works.

View File

@@ -1,5 +1,5 @@
---
title: mbot Extension
description: Control a MakeBlock mbot2 rover through MQTT and MCP as a Goose Extension
---