init push

This commit is contained in:
zachary62
2025-04-04 13:03:54 -04:00
parent e62ee2cb13
commit 2ebad5e5f2
160 changed files with 2 additions and 0 deletions

View File

@@ -0,0 +1,309 @@
# Chapter 4: Tool / ToolCollection - Giving Your Agent Skills
In [Chapter 3: BaseAgent - The Agent Blueprint](03_baseagent.md), we learned how `BaseAgent` provides the standard structure for our agents, including a brain ([LLM](01_llm.md)) and memory ([Message / Memory](02_message___memory.md)). But what if we want our agent to do more than just *think* and *remember*? What if we want it to *act* in the world like searching the web, running code, or editing files?
This is where **Tools** come in!
## What Problem Do They Solve?
Imagine an agent trying to answer the question: "What's the weather like in Tokyo *right now*?"
The agent's LLM brain has a lot of general knowledge, but it doesn't have *real-time* access to the internet. It can't check the current weather. It needs a specific **capability** or **skill** to do that.
Similarly, if you ask an agent to "Write a python script that prints 'hello world' and save it to a file named `hello.py`," the agent needs the ability to:
1. Understand the request (using its LLM).
2. Write the code (using its LLM).
3. Actually *execute* code to create and write to a file.
Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer's file system something the LLM can't do directly.
**Tools** give agents these specific, actionable skills. A `ToolCollection` organizes these skills so the agent knows what it can do.
**Use Case:** Let's build towards an agent that can:
1. Search the web for today's date.
2. Tell the user the date.
This agent needs a "Web Search" tool.
## Key Concepts: Tools and Toolboxes
Let's break down the two main ideas:
### 1. `BaseTool`: The Blueprint for a Skill
Think of `BaseTool` (`app/tool/base.py`) as the *template* or *design specification* for any tool. It doesn't *do* anything itself, but it defines what every tool needs to have:
* **`name` (str):** A short, descriptive name for the tool (e.g., `web_search`, `file_writer`, `code_runner`). This is how the agent (or LLM) identifies the tool.
* **`description` (str):** A clear explanation of what the tool does, what it's good for, and when to use it. This is crucial for the LLM to decide *which* tool to use for a given task.
* **`parameters` (dict):** A definition of the inputs the tool expects. For example, a `web_search` tool needs a `query` input, and a `file_writer` needs a `path` and `content`. This is defined using a standard format called JSON Schema.
* **`execute` method:** An **abstract** method. This means `BaseTool` says "every tool *must* have an execute method", but each specific tool needs to provide its *own* instructions for how to actually perform the action.
You almost never use `BaseTool` directly. You use it as a starting point to create *actual*, usable tools.
### 2. Concrete Tools: The Actual Skills
These are specific classes that *inherit* from `BaseTool` and provide the real implementation for the `execute` method. OpenManus comes with several pre-built tools:
* **`WebSearch` (`app/tool/web_search.py`):** Searches the web using engines like Google, Bing, etc.
* **`Bash` (`app/tool/bash.py`):** Executes shell commands (like `ls`, `pwd`, `python script.py`).
* **`StrReplaceEditor` (`app/tool/str_replace_editor.py`):** Views, creates, and edits files by replacing text.
* **`BrowserUseTool` (`app/tool/browser_use_tool.py`):** Interacts with web pages like a user (clicking, filling forms, etc.).
* **`Terminate` (`app/tool/terminate.py`):** A special tool used by agents to signal they have finished their task.
Each of these defines its specific `name`, `description`, `parameters`, and implements the `execute` method to perform its unique action.
### 3. `ToolCollection`: The Agent's Toolbox
Think of a handyman. They don't just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.
A `ToolCollection` (`app/tool/tool_collection.py`) is like that toolbox for an agent.
* It holds a list of specific tool instances (like `WebSearch`, `Bash`).
* It allows the agent (and its LLM) to see all the available tools and their descriptions.
* It provides a way to execute a specific tool by its name.
When an agent needs to perform an action, its LLM can look at the `ToolCollection`, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool's `parameters`, and then ask the `ToolCollection` to execute that tool with those inputs.
## How Do We Use Them?
Let's see how we can equip an agent with a simple tool. We'll create a basic "EchoTool" first.
**1. Creating a Concrete Tool (Inheriting from `BaseTool`):**
```python
# Import the necessary base class
from app.tool.base import BaseTool, ToolResult
# Define our simple tool
class EchoTool(BaseTool):
"""A simple tool that echoes the input text."""
name: str = "echo_message"
description: str = "Repeats back the text provided in the 'message' parameter."
parameters: dict = {
"type": "object",
"properties": {
"message": {
"type": "string",
"description": "The text to be echoed back.",
},
},
"required": ["message"], # Tells the LLM 'message' must be provided
}
# Implement the actual action
async def execute(self, message: str) -> ToolResult:
"""Takes a message and returns it."""
print(f"EchoTool executing with message: '{message}'")
# ToolResult is a standard way to return tool output
return ToolResult(output=f"You said: {message}")
# Create an instance of our tool
echo_tool_instance = EchoTool()
print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")
```
**Explanation:**
* We import `BaseTool` and `ToolResult` (a standard object for wrapping tool outputs).
* `class EchoTool(BaseTool):` declares that our `EchoTool` *is a type of* `BaseTool`.
* We define the `name`, `description`, and `parameters` according to the `BaseTool` template. The `parameters` structure tells the LLM what input is expected (`message` as a string) and that it's required.
* We implement `async def execute(self, message: str) -> ToolResult:`. This is the *specific* logic for our tool. It takes the `message` input and returns it wrapped in a `ToolResult`.
**Example Output:**
```
Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.
```
**2. Creating a ToolCollection:**
Now, let's put our `EchoTool` and the built-in `WebSearch` tool into a toolbox.
```python
# Import ToolCollection and the tools we want
from app.tool import ToolCollection, WebSearch
# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined
# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch() # Uses default settings
# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)
# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")
# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print(f"\nParameters for LLM (showing first tool):")
import json
print(json.dumps(tool_params_for_llm[0], indent=2))
```
**Explanation:**
* We import `ToolCollection` and the specific tools (`WebSearch`, `EchoTool`).
* We create instances of the tools we need.
* `my_toolbox = ToolCollection(echo_tool, web_search_tool)` creates the collection, holding our tool instances.
* We can access the tools inside using `my_toolbox.tools` or iterate over `my_toolbox`.
* `my_toolbox.to_params()` is a crucial method. It formats the `name`, `description`, and `parameters` of *all* tools in the collection into a list of dictionaries. This specific format is exactly what the agent's [LLM](01_llm.md) needs (when using its `ask_tool` method) to understand which tools are available and how to use them.
**Example Output:**
```
Tools in the toolbox: ['echo_message', 'web_search']
Parameters for LLM (showing first tool):
{
"type": "function",
"function": {
"name": "echo_message",
"description": "Repeats back the text provided in the 'message' parameter.",
"parameters": {
"type": "object",
"properties": {
"message": {
"type": "string",
"description": "The text to be echoed back."
}
},
"required": [
"message"
]
}
}
}
```
**3. Agent Using the ToolCollection:**
Now, how does an agent like `ToolCallAgent` (a specific type of [BaseAgent](03_baseagent.md)) use this?
Conceptually (the real agent code is more complex):
1. The agent is configured with a `ToolCollection` (like `my_toolbox`).
2. When the agent needs to figure out the next step, it calls its LLM's `ask_tool` method.
3. It passes the conversation history ([Message / Memory](02_message___memory.md)) AND the output of `my_toolbox.to_params()` to the LLM.
4. The LLM looks at the conversation and the list of available tools (from `to_params()`). It reads the `description` of each tool to understand what it does.
5. If the LLM decides a tool is needed (e.g., the user asked "What's today's date?", the LLM sees the `web_search` tool is available and appropriate), it will generate a special response indicating:
* The `name` of the tool to use (e.g., `"web_search"`).
* The `arguments` (inputs) for the tool, based on its `parameters` (e.g., `{"query": "today's date"}`).
6. The agent receives this response from the LLM.
7. The agent then uses the `ToolCollection`'s `execute` method: `await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"})`.
8. The `ToolCollection` finds the `WebSearch` tool instance in its internal `tool_map` and calls *its* `execute` method with the provided input.
9. The `WebSearch` tool runs, performs the actual web search, and returns the results (as a `ToolResult` or similar).
10. The agent takes this result, formats it as a `tool` message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool's result as context).
The `ToolCollection` acts as the crucial bridge between the LLM's *decision* to use a tool and the *actual execution* of that tool's code.
## Under the Hood: How `ToolCollection.execute` Works
Let's trace the flow when an agent asks its `ToolCollection` to run a tool:
```mermaid
sequenceDiagram
participant Agent as ToolCallAgent
participant LLM as LLM (Deciding Step)
participant Toolbox as ToolCollection
participant SpecificTool as e.g., WebSearch Tool
Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
LLM->>LLM: Analyzes messages & available tools
LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
Note right of Toolbox: Finds the WebSearch instance
Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
SpecificTool->>SpecificTool: Performs actual web search action
SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
Toolbox-->>-Agent: Returns the ToolResult
Agent->>Agent: Processes the result (adds to memory, etc.)
```
**Code Glimpse:**
Let's look at the `ToolCollection` itself in `app/tool/tool_collection.py`:
```python
# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple
from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError
class ToolCollection:
# ... (Config class) ...
tools: Tuple[BaseTool, ...] # Holds the tool instances
tool_map: Dict[str, BaseTool] # Maps name to tool instance for quick lookup
def __init__(self, *tools: BaseTool):
"""Initializes with a sequence of tools."""
self.tools = tools
# Create the map for easy lookup by name
self.tool_map = {tool.name: tool for tool in tools}
def to_params(self) -> List[Dict[str, Any]]:
"""Formats tools for the LLM API."""
# Calls the 'to_param()' method on each tool
return [tool.to_param() for tool in self.tools]
async def execute(
self, *, name: str, tool_input: Dict[str, Any] = None
) -> ToolResult:
"""Finds a tool by name and executes it."""
# 1. Find the tool instance using the name
tool = self.tool_map.get(name)
if not tool:
# Return a standard failure result if tool not found
return ToolFailure(error=f"Tool {name} is invalid")
# 2. Execute the tool's specific method
try:
# The 'tool(**tool_input)' calls the tool instance's __call__ method,
# which in BaseTool, calls the tool's 'execute' method.
# The ** unpacks the dictionary into keyword arguments.
result = await tool(**(tool_input or {}))
# Ensure the result is a ToolResult (or subclass)
return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
except ToolError as e:
# Handle errors specific to tools
return ToolFailure(error=e.message)
except Exception as e:
# Handle unexpected errors during execution
return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")
# ... other methods like add_tool, __iter__ ...
```
**Explanation:**
* The `__init__` method takes tool instances and stores them in `self.tools` (a tuple) and `self.tool_map` (a dictionary mapping name to instance).
* `to_params` iterates through `self.tools` and calls each tool's `to_param()` method (defined in `BaseTool`) to get the LLM-compatible format.
* `execute` is the core method used by agents:
* It uses `self.tool_map.get(name)` to quickly find the correct tool instance based on the requested name.
* If found, it calls `await tool(**(tool_input or {}))`. The `**` unpacks the `tool_input` dictionary into keyword arguments for the tool's `execute` method (e.g., `message="hello"` for our `EchoTool`, or `query="today's date"` for `WebSearch`).
* It wraps the execution in `try...except` blocks to catch errors and return a standardized `ToolFailure` result if anything goes wrong.
## Wrapping Up Chapter 4
We've learned how **Tools** give agents specific skills beyond basic language understanding.
* `BaseTool` is the abstract blueprint defining a tool's `name`, `description`, and expected `parameters`.
* Concrete tools (like `WebSearch`, `Bash`, or our custom `EchoTool`) inherit from `BaseTool` and implement the actual `execute` logic.
* `ToolCollection` acts as the agent's toolbox, holding various tools and providing methods (`to_params`, `execute`) for the agent (often guided by its [LLM](01_llm.md)) to discover and use these capabilities.
With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.
But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That's where Flows come in.
Let's move on to [Chapter 5: BaseFlow](05_baseflow.md) to see how we orchestrate complex workflows involving multiple agents and steps.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)