---
layout: default
title: "Tool & ToolCollection"
parent: "OpenManus"
nav_order: 4
---

# Chapter 4: Tool / ToolCollection - Giving Your Agent Skills

In [Chapter 3: BaseAgent - The Agent Blueprint](03_baseagent.md), we learned how `BaseAgent` provides the standard structure for our agents, including a brain ([LLM](01_llm.md)) and memory ([Message / Memory](02_message___memory.md)). But what if we want our agent to do more than just *think* and *remember*? What if we want it to *act* in the world – like searching the web, running code, or editing files?

This is where **Tools** come in!

## What Problem Do They Solve?

Imagine an agent trying to answer the question: "What's the weather like in Tokyo *right now*?"

The agent's LLM brain has a lot of general knowledge, but it doesn't have *real-time* access to the internet. It can't check the current weather. It needs a specific **capability** or **skill** to do that.

Similarly, if you ask an agent to "Write a Python script that prints 'hello world' and save it to a file named `hello.py`," the agent needs the ability to:

1. Understand the request (using its LLM).
2. Write the code (using its LLM).
3. Actually *execute* code to create and write to a file.

Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer's file system – something the LLM can't do directly.

**Tools** give agents these specific, actionable skills. A `ToolCollection` organizes these skills so the agent knows what it can do.

**Use Case:** Let's build towards an agent that can:

1. Search the web for today's date.
2. Tell the user the date.

This agent needs a "Web Search" tool.

## Key Concepts: Tools and Toolboxes

Let's break down the three main ideas:

### 1. `BaseTool`: The Blueprint for a Skill

Think of `BaseTool` (`app/tool/base.py`) as the *template* or *design specification* for any tool. It doesn't *do* anything itself, but it defines what every tool needs to have:

* **`name` (str):** A short, descriptive name for the tool (e.g., `web_search`, `file_writer`, `code_runner`). This is how the agent (or LLM) identifies the tool.
* **`description` (str):** A clear explanation of what the tool does, what it's good for, and when to use it. This is crucial for the LLM to decide *which* tool to use for a given task.
* **`parameters` (dict):** A definition of the inputs the tool expects. For example, a `web_search` tool needs a `query` input, and a `file_writer` needs a `path` and `content`. This is defined using a standard format called JSON Schema.
* **`execute` method:** An **abstract** method. This means `BaseTool` says "every tool *must* have an `execute` method", but each specific tool needs to provide its *own* instructions for how to actually perform the action.

You almost never use `BaseTool` directly. You use it as a starting point to create *actual*, usable tools.
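
To make the template concrete, here is a *simplified sketch* of what a `BaseTool`-style base class boils down to. This is an illustration only, not the actual contents of `app/tool/base.py` (the real class has more functionality), but it shows the four ingredients listed above:

```python
# Simplified sketch only -- not the real app/tool/base.py.
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchBaseTool(ABC):
    """A tool template: a name, a description, a parameters schema, and an execute method."""

    name: str = ""
    description: str = ""
    parameters: Dict[str, Any] = {}  # JSON Schema describing the expected inputs

    async def __call__(self, **kwargs) -> Any:
        # Calling the tool instance simply delegates to its execute() method.
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Each concrete tool provides its own action here."""
        ...

    def to_param(self) -> Dict[str, Any]:
        # Package the tool in the function-calling format the LLM expects
        # (the same shape you'll see from ToolCollection.to_params() later).
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }
```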

### 2. Concrete Tools: The Actual Skills

These are specific classes that *inherit* from `BaseTool` and provide the real implementation for the `execute` method. OpenManus comes with several pre-built tools:

* **`WebSearch` (`app/tool/web_search.py`):** Searches the web using engines like Google, Bing, etc.
* **`Bash` (`app/tool/bash.py`):** Executes shell commands (like `ls`, `pwd`, `python script.py`).
* **`StrReplaceEditor` (`app/tool/str_replace_editor.py`):** Views, creates, and edits files by replacing text.
* **`BrowserUseTool` (`app/tool/browser_use_tool.py`):** Interacts with web pages like a user (clicking, filling forms, etc.).
* **`Terminate` (`app/tool/terminate.py`):** A special tool used by agents to signal they have finished their task.

Each of these defines its own specific `name`, `description`, and `parameters`, and implements the `execute` method to perform its unique action.

### 3. `ToolCollection`: The Agent's Toolbox

Think of a handyman. They don't just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.

A `ToolCollection` (`app/tool/tool_collection.py`) is like that toolbox for an agent.

* It holds a list of specific tool instances (like `WebSearch`, `Bash`).
* It allows the agent (and its LLM) to see all the available tools and their descriptions.
* It provides a way to execute a specific tool by its name.

When an agent needs to perform an action, its LLM can look at the `ToolCollection`, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool's `parameters`, and then ask the `ToolCollection` to execute that tool with those inputs.
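
In practice, the LLM's "decision" boils down to a tool name plus a set of arguments that match that tool's `parameters` schema. Purely as an illustration (the exact wire format depends on the LLM provider's tool-calling API), it is the equivalent of:

```python
# Illustration only: the gist of a tool-call decision returned by the LLM.
chosen_call = {
    "name": "web_search",                    # which tool from the collection to run
    "arguments": {"query": "today's date"},  # inputs matching that tool's parameters schema
}
```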

## How Do We Use Them?

Let's see how we can equip an agent with a simple tool. We'll create a basic "EchoTool" first.

**1. Creating a Concrete Tool (Inheriting from `BaseTool`):**

```python
# Import the necessary base class
from app.tool.base import BaseTool, ToolResult

# Define our simple tool
class EchoTool(BaseTool):
    """A simple tool that echoes the input text."""

    name: str = "echo_message"
    description: str = "Repeats back the text provided in the 'message' parameter."
    parameters: dict = {
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            },
        },
        "required": ["message"],  # Tells the LLM 'message' must be provided
    }

    # Implement the actual action
    async def execute(self, message: str) -> ToolResult:
        """Takes a message and returns it."""
        print(f"EchoTool executing with message: '{message}'")
        # ToolResult is a standard way to return tool output
        return ToolResult(output=f"You said: {message}")

# Create an instance of our tool
echo_tool_instance = EchoTool()

print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")
```

**Explanation:**

* We import `BaseTool` and `ToolResult` (a standard object for wrapping tool outputs).
* `class EchoTool(BaseTool):` declares that our `EchoTool` *is a type of* `BaseTool`.
* We define the `name`, `description`, and `parameters` according to the `BaseTool` template. The `parameters` structure tells the LLM what input is expected (`message` as a string) and that it's required.
* We implement `async def execute(self, message: str) -> ToolResult:`. This is the *specific* logic for our tool. It takes the `message` input and returns it wrapped in a `ToolResult`.

**Example Output:**

```
Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.
```
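
Because `execute` is an `async` method, you can also try the tool on its own, outside of any agent. A quick sanity check might look like this (assuming the `EchoTool` class from above; the exact fields on `ToolResult` come from `app/tool/base.py`):

```python
# Hypothetical quick test of EchoTool, run outside of any agent.
import asyncio

async def try_echo():
    tool = EchoTool()
    result = await tool.execute(message="Hello, tools!")
    # EchoTool wraps its reply in a ToolResult, whose output we can inspect.
    print(result.output)  # expected: "You said: Hello, tools!"

asyncio.run(try_echo())
```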

**2. Creating a ToolCollection:**

Now, let's put our `EchoTool` and the built-in `WebSearch` tool into a toolbox.

```python
# Import ToolCollection and the tools we want
from app.tool import ToolCollection, WebSearch
# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined

# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch()  # Uses default settings

# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)

# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")

# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print(f"\nParameters for LLM (showing first tool):")
import json
print(json.dumps(tool_params_for_llm[0], indent=2))
```

**Explanation:**

* We import `ToolCollection` and the specific tools (`WebSearch`, `EchoTool`).
* We create instances of the tools we need.
* `my_toolbox = ToolCollection(echo_tool, web_search_tool)` creates the collection, holding our tool instances.
* We can access the tools inside using `my_toolbox.tools` or iterate over `my_toolbox`.
* `my_toolbox.to_params()` is a crucial method. It formats the `name`, `description`, and `parameters` of *all* tools in the collection into a list of dictionaries. This specific format is exactly what the agent's [LLM](01_llm.md) needs (when using its `ask_tool` method) to understand which tools are available and how to use them.

**Example Output:**

```
Tools in the toolbox: ['echo_message', 'web_search']

Parameters for LLM (showing first tool):
{
  "type": "function",
  "function": {
    "name": "echo_message",
    "description": "Repeats back the text provided in the 'message' parameter.",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "The text to be echoed back."
        }
      },
      "required": [
        "message"
      ]
    }
  }
}
```

**3. Agent Using the ToolCollection:**

Now, how does an agent like `ToolCallAgent` (a specific type of [BaseAgent](03_baseagent.md)) use this?

Conceptually (the real agent code is more complex):

1. The agent is configured with a `ToolCollection` (like `my_toolbox`).
2. When the agent needs to figure out the next step, it calls its LLM's `ask_tool` method.
3. It passes the conversation history ([Message / Memory](02_message___memory.md)) AND the output of `my_toolbox.to_params()` to the LLM.
4. The LLM looks at the conversation and the list of available tools (from `to_params()`). It reads the `description` of each tool to understand what it does.
5. If the LLM decides a tool is needed (e.g., the user asked "What's today's date?" and the LLM sees that the `web_search` tool is available and appropriate), it generates a special response indicating:
   * The `name` of the tool to use (e.g., `"web_search"`).
   * The `arguments` (inputs) for the tool, based on its `parameters` (e.g., `{"query": "today's date"}`).
6. The agent receives this response from the LLM.
7. The agent then uses the `ToolCollection`'s `execute` method: `await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"})`.
8. The `ToolCollection` finds the `WebSearch` tool instance in its internal `tool_map` and calls *its* `execute` method with the provided input.
9. The `WebSearch` tool runs, performs the actual web search, and returns the results (as a `ToolResult` or similar).
10. The agent takes this result, formats it as a `tool` message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool's result as context).

The `ToolCollection` acts as the crucial bridge between the LLM's *decision* to use a tool and the *actual execution* of that tool's code.
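
Here is a minimal sketch of that loop in code. The LLM call is stubbed out with a hard-coded decision, because the real `ask_tool` step belongs to the [LLM](01_llm.md) wrapper; only the `toolbox.execute(...)` part mirrors what the agent actually does with its `ToolCollection`:

```python
# Minimal sketch of the decide-then-execute step (LLM decision hard-coded for illustration).
import asyncio

async def run_one_step(toolbox):
    # 1) In a real agent, the LLM would look at the conversation plus
    #    toolbox.to_params() and return a tool call. We fake that here.
    decision = {"name": "echo_message", "arguments": {"message": "What's today's date?"}}

    # 2) The agent hands the decision to the ToolCollection for execution.
    result = await toolbox.execute(name=decision["name"], tool_input=decision["arguments"])

    # 3) The agent would normally wrap this result in a 'tool' message and add it to memory.
    print(result)

# Usage, assuming my_toolbox from the earlier snippet:
# asyncio.run(run_one_step(my_toolbox))
```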

## Under the Hood: How `ToolCollection.execute` Works

Let's trace the flow when an agent asks its `ToolCollection` to run a tool:

```mermaid
sequenceDiagram
    participant Agent as ToolCallAgent
    participant LLM as LLM (Deciding Step)
    participant Toolbox as ToolCollection
    participant SpecificTool as e.g., WebSearch Tool

    Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
    LLM->>LLM: Analyzes messages & available tools
    LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
    Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
    Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
    Note right of Toolbox: Finds the WebSearch instance
    Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
    SpecificTool->>SpecificTool: Performs actual web search action
    SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
    Toolbox-->>-Agent: Returns the ToolResult
    Agent->>Agent: Processes the result (adds to memory, etc.)
```

**Code Glimpse:**

Let's look at the `ToolCollection` itself in `app/tool/tool_collection.py`:

```python
# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple
from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError

class ToolCollection:
    # ... (Config class) ...

    tools: Tuple[BaseTool, ...]    # Holds the tool instances
    tool_map: Dict[str, BaseTool]  # Maps name to tool instance for quick lookup

    def __init__(self, *tools: BaseTool):
        """Initializes with a sequence of tools."""
        self.tools = tools
        # Create the map for easy lookup by name
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        """Formats tools for the LLM API."""
        # Calls the 'to_param()' method on each tool
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        """Finds a tool by name and executes it."""
        # 1. Find the tool instance using the name
        tool = self.tool_map.get(name)
        if not tool:
            # Return a standard failure result if tool not found
            return ToolFailure(error=f"Tool {name} is invalid")

        # 2. Execute the tool's specific method
        try:
            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
            # which in BaseTool, calls the tool's 'execute' method.
            # The ** unpacks the dictionary into keyword arguments.
            result = await tool(**(tool_input or {}))
            # Ensure the result is a ToolResult (or subclass)
            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
        except ToolError as e:
            # Handle errors specific to tools
            return ToolFailure(error=e.message)
        except Exception as e:
            # Handle unexpected errors during execution
            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")

    # ... other methods like add_tool, __iter__ ...
```

**Explanation:**

* The `__init__` method takes tool instances and stores them in `self.tools` (a tuple) and `self.tool_map` (a dictionary mapping name to instance).
* `to_params` iterates through `self.tools` and calls each tool's `to_param()` method (defined in `BaseTool`) to get the LLM-compatible format.
* `execute` is the core method used by agents (a short usage sketch follows below):
  * It uses `self.tool_map.get(name)` to quickly find the correct tool instance based on the requested name.
  * If found, it calls `await tool(**(tool_input or {}))`. The `**` unpacks the `tool_input` dictionary into keyword arguments for the tool's `execute` method (e.g., `message="hello"` for our `EchoTool`, or `query="today's date"` for `WebSearch`).
  * It wraps the execution in `try...except` blocks to catch errors and return a standardized `ToolFailure` result if anything goes wrong.
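
To see that behavior from the caller's side, here is a small check you could run (assuming `my_toolbox` from earlier in the chapter; illustration only):

```python
# Hypothetical demo of ToolCollection.execute: one valid call, one unknown tool name.
import asyncio

async def demo():
    ok = await my_toolbox.execute(name="echo_message", tool_input={"message": "hi"})
    print(ok)       # a ToolResult produced by EchoTool.execute

    missing = await my_toolbox.execute(name="does_not_exist")
    print(missing)  # a ToolFailure, because the name is not in tool_map

asyncio.run(demo())
```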

## Wrapping Up Chapter 4

We've learned how **Tools** give agents specific skills beyond basic language understanding.

* `BaseTool` is the abstract blueprint defining a tool's `name`, `description`, and expected `parameters`.
* Concrete tools (like `WebSearch`, `Bash`, or our custom `EchoTool`) inherit from `BaseTool` and implement the actual `execute` logic.
* `ToolCollection` acts as the agent's toolbox, holding various tools and providing methods (`to_params`, `execute`) for the agent (often guided by its [LLM](01_llm.md)) to discover and use these capabilities.

With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.

But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That's where Flows come in.

Let's move on to [Chapter 5: BaseFlow](05_baseflow.md) to see how we orchestrate complex workflows involving multiple agents and steps.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)