---
layout: default
title: "Tool & ToolCollection"
parent: "OpenManus"
nav_order: 4
---
# Chapter 4: Tool / ToolCollection - Giving Your Agent Skills
In [Chapter 3: BaseAgent - The Agent Blueprint](03_baseagent.md), we learned how `BaseAgent` provides the standard structure for our agents, including a brain ([LLM](01_llm.md)) and memory ([Message / Memory](02_message___memory.md)). But what if we want our agent to do more than just *think* and *remember*? What if we want it to *act* in the world, like searching the web, running code, or editing files?
This is where **Tools** come in!
## What Problem Do They Solve?
Imagine an agent trying to answer the question: "What's the weather like in Tokyo *right now*?"
The agent's LLM brain has a lot of general knowledge, but it doesn't have *real-time* access to the internet. It can't check the current weather. It needs a specific **capability** or **skill** to do that.
Similarly, if you ask an agent to "Write a python script that prints 'hello world' and save it to a file named `hello.py`," the agent needs the ability to:
1. Understand the request (using its LLM).
2. Write the code (using its LLM).
3. Actually *execute* code to create and write to a file.
Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer's file system, which is something the LLM can't do directly.
**Tools** give agents these specific, actionable skills. A `ToolCollection` organizes these skills so the agent knows what it can do.
**Use Case:** Let's build towards an agent that can:
1. Search the web for today's date.
2. Tell the user the date.
This agent needs a "Web Search" tool.
## Key Concepts: Tools and Toolboxes
Let's break down the two main ideas:
### 1. `BaseTool`: The Blueprint for a Skill
Think of `BaseTool` (`app/tool/base.py`) as the *template* or *design specification* for any tool. It doesn't *do* anything itself, but it defines what every tool needs to have:
* **`name` (str):** A short, descriptive name for the tool (e.g., `web_search`, `file_writer`, `code_runner`). This is how the agent (or LLM) identifies the tool.
* **`description` (str):** A clear explanation of what the tool does, what it's good for, and when to use it. This is crucial for the LLM to decide *which* tool to use for a given task.
* **`parameters` (dict):** A definition of the inputs the tool expects. For example, a `web_search` tool needs a `query` input, and a `file_writer` needs a `path` and `content`. This is defined using a standard format called JSON Schema.
* **`execute` method:** An **abstract** method. This means `BaseTool` says "every tool *must* have an execute method", but each specific tool needs to provide its *own* instructions for how to actually perform the action.
You almost never use `BaseTool` directly. You use it as a starting point to create *actual*, usable tools.
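To make the blueprint concrete, here is a minimal illustrative sketch of the idea (not the actual contents of `app/tool/base.py`, which also handles details like formatting the tool for the LLM via `to_param()`):
```python
# An illustrative sketch of the BaseTool idea, NOT the real app/tool/base.py.
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchBaseTool(ABC):
    name: str                    # short identifier, e.g. "web_search"
    description: str             # tells the LLM what the tool is good for
    parameters: Dict[str, Any]   # JSON Schema describing the expected inputs

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Each concrete tool must supply its own implementation."""
        ...
```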
### 2. Concrete Tools: The Actual Skills
These are specific classes that *inherit* from `BaseTool` and provide the real implementation for the `execute` method. OpenManus comes with several pre-built tools:
* **`WebSearch` (`app/tool/web_search.py`):** Searches the web using engines like Google, Bing, etc.
* **`Bash` (`app/tool/bash.py`):** Executes shell commands (like `ls`, `pwd`, `python script.py`).
* **`StrReplaceEditor` (`app/tool/str_replace_editor.py`):** Views, creates, and edits files by replacing text.
* **`BrowserUseTool` (`app/tool/browser_use_tool.py`):** Interacts with web pages like a user (clicking, filling forms, etc.).
* **`Terminate` (`app/tool/terminate.py`):** A special tool used by agents to signal they have finished their task.
Each of these defines its specific `name`, `description`, `parameters`, and implements the `execute` method to perform its unique action.
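You can poke at one of these yourself by instantiating it and reading its metadata. The snippet below assumes `Terminate` is importable from `app.tool`, the same package the later examples use for `ToolCollection` and `WebSearch`:
```python
# Inspect a built-in tool's metadata (assumes Terminate is exported from
# app.tool, like ToolCollection and WebSearch in the examples below).
from app.tool import Terminate

terminate_tool = Terminate()
print(f"Name: {terminate_tool.name}")
print(f"Description: {terminate_tool.description}")
```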
### 3. `ToolCollection`: The Agent's Toolbox
Think of a handyman. They don't just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.
A `ToolCollection` (`app/tool/tool_collection.py`) is like that toolbox for an agent.
* It holds a list of specific tool instances (like `WebSearch`, `Bash`).
* It allows the agent (and its LLM) to see all the available tools and their descriptions.
* It provides a way to execute a specific tool by its name.
When an agent needs to perform an action, its LLM can look at the `ToolCollection`, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool's `parameters`, and then ask the `ToolCollection` to execute that tool with those inputs.
## How Do We Use Them?
Let's see how we can equip an agent with a simple tool. We'll create a basic "EchoTool" first.
**1. Creating a Concrete Tool (Inheriting from `BaseTool`):**
```python
# Import the necessary base class
from app.tool.base import BaseTool, ToolResult


# Define our simple tool
class EchoTool(BaseTool):
    """A simple tool that echoes the input text."""

    name: str = "echo_message"
    description: str = "Repeats back the text provided in the 'message' parameter."
    parameters: dict = {
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            },
        },
        "required": ["message"],  # Tells the LLM 'message' must be provided
    }

    # Implement the actual action
    async def execute(self, message: str) -> ToolResult:
        """Takes a message and returns it."""
        print(f"EchoTool executing with message: '{message}'")
        # ToolResult is a standard way to return tool output
        return ToolResult(output=f"You said: {message}")


# Create an instance of our tool
echo_tool_instance = EchoTool()
print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")
```
**Explanation:**
* We import `BaseTool` and `ToolResult` (a standard object for wrapping tool outputs).
* `class EchoTool(BaseTool):` declares that our `EchoTool` *is a type of* `BaseTool`.
* We define the `name`, `description`, and `parameters` according to the `BaseTool` template. The `parameters` structure tells the LLM what input is expected (`message` as a string) and that it's required.
* We implement `async def execute(self, message: str) -> ToolResult:`. This is the *specific* logic for our tool. It takes the `message` input and returns it wrapped in a `ToolResult`.
**Example Output:**
```
Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.
```
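Because `execute` is an `async` method, trying the tool on its own requires an event loop. A quick standalone test might look like this (assuming `ToolResult` exposes the `output` field we set above):
```python
import asyncio

# Run the tool directly, outside of any agent. Assumes ToolResult exposes
# the 'output' field we populated in EchoTool.execute.
result = asyncio.run(echo_tool_instance.execute(message="Hello, tools!"))
print(result.output)  # -> "You said: Hello, tools!"
```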
**2. Creating a ToolCollection:**
Now, let's put our `EchoTool` and the built-in `WebSearch` tool into a toolbox.
```python
# Import ToolCollection and the tools we want
from app.tool import ToolCollection, WebSearch
# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined
# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch() # Uses default settings
# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)
# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")
# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print(f"\nParameters for LLM (showing first tool):")
import json
print(json.dumps(tool_params_for_llm[0], indent=2))
```
**Explanation:**
* We import `ToolCollection` and the specific tools (`WebSearch`, `EchoTool`).
* We create instances of the tools we need.
* `my_toolbox = ToolCollection(echo_tool, web_search_tool)` creates the collection, holding our tool instances.
* We can access the tools inside using `my_toolbox.tools` or iterate over `my_toolbox`.
* `my_toolbox.to_params()` is a crucial method. It formats the `name`, `description`, and `parameters` of *all* tools in the collection into a list of dictionaries. This specific format is exactly what the agent's [LLM](01_llm.md) needs (when using its `ask_tool` method) to understand which tools are available and how to use them.
**Example Output:**
```
Tools in the toolbox: ['echo_message', 'web_search']

Parameters for LLM (showing first tool):
{
  "type": "function",
  "function": {
    "name": "echo_message",
    "description": "Repeats back the text provided in the 'message' parameter.",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "The text to be echoed back."
        }
      },
      "required": [
        "message"
      ]
    }
  }
}
```
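You can also run a tool *through* the collection yourself, exactly the way an agent would, using the `execute(name=..., tool_input=...)` method covered later in this chapter. A small sketch:
```python
import asyncio


async def try_toolbox():
    # Ask the toolbox to run a tool by name, just like an agent would.
    result = await my_toolbox.execute(
        name="echo_message",
        tool_input={"message": "Hello from the toolbox!"},
    )
    print(result.output)  # -> "You said: Hello from the toolbox!"


asyncio.run(try_toolbox())
```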
**3. Agent Using the ToolCollection:**
Now, how does an agent like `ToolCallAgent` (a specific type of [BaseAgent](03_baseagent.md)) use this?
Conceptually (the real agent code is more complex):
1. The agent is configured with a `ToolCollection` (like `my_toolbox`).
2. When the agent needs to figure out the next step, it calls its LLM's `ask_tool` method.
3. It passes the conversation history ([Message / Memory](02_message___memory.md)) AND the output of `my_toolbox.to_params()` to the LLM.
4. The LLM looks at the conversation and the list of available tools (from `to_params()`). It reads the `description` of each tool to understand what it does.
5. If the LLM decides a tool is needed (e.g., the user asked "What's today's date?", the LLM sees the `web_search` tool is available and appropriate), it will generate a special response indicating:
* The `name` of the tool to use (e.g., `"web_search"`).
* The `arguments` (inputs) for the tool, based on its `parameters` (e.g., `{"query": "today's date"}`).
6. The agent receives this response from the LLM.
7. The agent then uses the `ToolCollection`'s `execute` method: `await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"})`.
8. The `ToolCollection` finds the `WebSearch` tool instance in its internal `tool_map` and calls *its* `execute` method with the provided input.
9. The `WebSearch` tool runs, performs the actual web search, and returns the results (as a `ToolResult` or similar).
10. The agent takes this result, formats it as a `tool` message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool's result as context).
The `ToolCollection` acts as the crucial bridge between the LLM's *decision* to use a tool and the *actual execution* of that tool's code.
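Put together, one think/act cycle might look roughly like the sketch below. It is deliberately simplified: the shape of the LLM response (`tool_calls`, `function.name`, `function.arguments`) follows the common OpenAI-style tool-call format and is an assumption here, and the real `ToolCallAgent` does much more (memory updates, handling `terminate`, retries, and so on).
```python
import json


# A deliberately simplified sketch of one think/act step. The response shape
# (tool_calls, function.name, function.arguments) is assumed to follow the
# OpenAI-style tool-call format; the real ToolCallAgent is more involved.
async def think_and_act(llm, memory, toolbox):
    response = await llm.ask_tool(
        messages=memory.messages,    # the conversation so far
        tools=toolbox.to_params(),   # the available skills, formatted for the LLM
    )
    for call in response.tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as JSON text
        result = await toolbox.execute(name=call.function.name, tool_input=args)
        # The real agent would now wrap `result` in a 'tool' message, add it
        # to memory, and keep thinking with that new context.
        print(f"{call.function.name} -> {result}")
```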
## Under the Hood: How `ToolCollection.execute` Works
Let's trace the flow when an agent asks its `ToolCollection` to run a tool:
```mermaid
sequenceDiagram
    participant Agent as ToolCallAgent
    participant LLM as LLM (Deciding Step)
    participant Toolbox as ToolCollection
    participant SpecificTool as e.g., WebSearch Tool
    Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
    LLM->>LLM: Analyzes messages & available tools
    LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
    Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
    Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
    Note right of Toolbox: Finds the WebSearch instance
    Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
    SpecificTool->>SpecificTool: Performs actual web search action
    SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
    Toolbox-->>-Agent: Returns the ToolResult
    Agent->>Agent: Processes the result (adds to memory, etc.)
```
**Code Glimpse:**
Let's look at the `ToolCollection` itself in `app/tool/tool_collection.py`:
```python
# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple

from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError


class ToolCollection:
    # ... (Config class) ...
    tools: Tuple[BaseTool, ...]    # Holds the tool instances
    tool_map: Dict[str, BaseTool]  # Maps name to tool instance for quick lookup

    def __init__(self, *tools: BaseTool):
        """Initializes with a sequence of tools."""
        self.tools = tools
        # Create the map for easy lookup by name
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        """Formats tools for the LLM API."""
        # Calls the 'to_param()' method on each tool
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        """Finds a tool by name and executes it."""
        # 1. Find the tool instance using the name
        tool = self.tool_map.get(name)
        if not tool:
            # Return a standard failure result if tool not found
            return ToolFailure(error=f"Tool {name} is invalid")
        # 2. Execute the tool's specific method
        try:
            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
            # which in BaseTool calls the tool's 'execute' method.
            # The ** unpacks the dictionary into keyword arguments.
            result = await tool(**(tool_input or {}))
            # Ensure the result is a ToolResult (or subclass)
            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
        except ToolError as e:
            # Handle errors specific to tools
            return ToolFailure(error=e.message)
        except Exception as e:
            # Handle unexpected errors during execution
            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")

    # ... other methods like add_tool, __iter__ ...
```
**Explanation:**
* The `__init__` method takes tool instances and stores them in `self.tools` (a tuple) and `self.tool_map` (a dictionary mapping name to instance).
* `to_params` iterates through `self.tools` and calls each tool's `to_param()` method (defined in `BaseTool`) to get the LLM-compatible format.
* `execute` is the core method used by agents:
* It uses `self.tool_map.get(name)` to quickly find the correct tool instance based on the requested name.
* If found, it calls `await tool(**(tool_input or {}))`. The `**` unpacks the `tool_input` dictionary into keyword arguments for the tool's `execute` method (e.g., `message="hello"` for our `EchoTool`, or `query="today's date"` for `WebSearch`).
* It wraps the execution in `try...except` blocks to catch errors and return a standardized `ToolFailure` result if anything goes wrong.
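One practical consequence of this design: asking for a tool that isn't in the toolbox doesn't raise an exception; it returns a `ToolFailure` you can inspect. Using the `my_toolbox` collection from earlier (and assuming `ToolFailure` exposes the `error` field set in the snippet above):
```python
import asyncio

# Requesting an unknown tool returns a ToolFailure instead of raising
# (see the tool_map lookup in ToolCollection.execute above).
missing = asyncio.run(my_toolbox.execute(name="time_travel", tool_input={}))
print(missing.error)  # -> "Tool time_travel is invalid"
```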
## Wrapping Up Chapter 4
We've learned how **Tools** give agents specific skills beyond basic language understanding.
* `BaseTool` is the abstract blueprint defining a tool's `name`, `description`, and expected `parameters`.
* Concrete tools (like `WebSearch`, `Bash`, or our custom `EchoTool`) inherit from `BaseTool` and implement the actual `execute` logic.
* `ToolCollection` acts as the agent's toolbox, holding various tools and providing methods (`to_params`, `execute`) for the agent (often guided by its [LLM](01_llm.md)) to discover and use these capabilities.
With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.
But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That's where Flows come in.
Let's move on to [Chapter 5: BaseFlow](05_baseflow.md) to see how we orchestrate complex workflows involving multiple agents and steps.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)