---
layout: default
title: "Tool & ToolCollection"
parent: "OpenManus"
nav_order: 4
---
# Chapter 4: Tool / ToolCollection - Giving Your Agent Skills
In [Chapter 3: BaseAgent - The Agent Blueprint](03_baseagent.md), we learned how `BaseAgent` provides the standard structure for our agents, including a brain ([LLM](01_llm.md)) and memory ([Message / Memory](02_message___memory.md)). But what if we want our agent to do more than just *think* and *remember*? What if we want it to *act* in the world, like searching the web, running code, or editing files?
This is where **Tools** come in!
## What Problem Do They Solve?
Imagine an agent trying to answer the question: "What's the weather like in Tokyo *right now*?"
The agent's LLM brain has a lot of general knowledge, but it doesn't have *real-time* access to the internet. It can't check the current weather. It needs a specific **capability** or **skill** to do that.
Similarly, if you ask an agent to "Write a python script that prints 'hello world' and save it to a file named `hello.py`," the agent needs the ability to:
1. Understand the request (using its LLM).
2. Write the code (using its LLM).
3. Actually *execute* code to create and write to a file.
Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer's file system, which is something the LLM can't do directly.
**Tools** give agents these specific, actionable skills. A `ToolCollection` organizes these skills so the agent knows what it can do.
**Use Case:** Let's build towards an agent that can:
1. Search the web for today's date.
2. Tell the user the date.
This agent needs a "Web Search" tool.
## Key Concepts: Tools and Toolboxes
Let's break down the two main ideas:
### 1. `BaseTool`: The Blueprint for a Skill
Think of `BaseTool` (`app/tool/base.py`) as the *template* or *design specification* for any tool. It doesn't *do* anything itself, but it defines what every tool needs to have:
* **`name` (str):** A short, descriptive name for the tool (e.g., `web_search`, `file_writer`, `code_runner`). This is how the agent (or LLM) identifies the tool.
* **`description` (str):** A clear explanation of what the tool does, what it's good for, and when to use it. This is crucial for the LLM to decide *which* tool to use for a given task.
* **`parameters` (dict):** A definition of the inputs the tool expects. For example, a `web_search` tool needs a `query` input, and a `file_writer` needs a `path` and `content`. This is defined using a standard format called JSON Schema.
* **`execute` method:** An **abstract** method. This means `BaseTool` says "every tool *must* have an execute method", but each specific tool needs to provide its *own* instructions for how to actually perform the action.
You almost never use `BaseTool` directly. You use it as a starting point to create *actual*, usable tools.
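To make the blueprint concrete, here is a minimal illustrative sketch of the idea (not the actual contents of `app/tool/base.py`, which also handles details like formatting the tool for the LLM via `to_param()`):
```python
# An illustrative sketch of the BaseTool idea, NOT the real app/tool/base.py.
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchBaseTool(ABC):
    name: str                    # short identifier, e.g. "web_search"
    description: str             # tells the LLM what the tool is good for
    parameters: Dict[str, Any]   # JSON Schema describing the expected inputs

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Each concrete tool must supply its own implementation."""
        ...
```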
### 2. Concrete Tools: The Actual Skills
These are specific classes that *inherit* from `BaseTool` and provide the real implementation for the `execute` method. OpenManus comes with several pre-built tools:
* **`WebSearch` (`app/tool/web_search.py`):** Searches the web using engines like Google, Bing, etc.
* **`Bash` (`app/tool/bash.py`):** Executes shell commands (like `ls`, `pwd`, `python script.py`).
* **`StrReplaceEditor` (`app/tool/str_replace_editor.py`):** Views, creates, and edits files by replacing text.
* **`BrowserUseTool` (`app/tool/browser_use_tool.py`):** Interacts with web pages like a user (clicking, filling forms, etc.).
* **`Terminate` (`app/tool/terminate.py`):** A special tool used by agents to signal they have finished their task.
Each of these defines its specific `name`, `description`, `parameters`, and implements the `execute` method to perform its unique action.
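You can poke at one of these yourself by instantiating it and reading its metadata. The snippet below assumes `Terminate` is importable from `app.tool`, the same package the later examples use for `ToolCollection` and `WebSearch`:
```python
# Inspect a built-in tool's metadata (assumes Terminate is exported from
# app.tool, like ToolCollection and WebSearch in the examples below).
from app.tool import Terminate

terminate_tool = Terminate()
print(f"Name: {terminate_tool.name}")
print(f"Description: {terminate_tool.description}")
```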
### 3. `ToolCollection`: The Agent's Toolbox
Think of a handyman. They don't just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.
A `ToolCollection` (`app/tool/tool_collection.py`) is like that toolbox for an agent.
* It holds a list of specific tool instances (like `WebSearch`, `Bash`).
* It allows the agent (and its LLM) to see all the available tools and their descriptions.
* It provides a way to execute a specific tool by its name.
When an agent needs to perform an action, its LLM can look at the `ToolCollection`, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool's `parameters`, and then ask the `ToolCollection` to execute that tool with those inputs.
## How Do We Use Them?
Let's see how we can equip an agent with a simple tool. We'll create a basic "EchoTool" first.
**1. Creating a Concrete Tool (Inheriting from `BaseTool`):**
```python
# Import the necessary base class
from app.tool.base import BaseTool, ToolResult


# Define our simple tool
class EchoTool(BaseTool):
    """A simple tool that echoes the input text."""

    name: str = "echo_message"
    description: str = "Repeats back the text provided in the 'message' parameter."
    parameters: dict = {
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            },
        },
        "required": ["message"],  # Tells the LLM 'message' must be provided
    }

    # Implement the actual action
    async def execute(self, message: str) -> ToolResult:
        """Takes a message and returns it."""
        print(f"EchoTool executing with message: '{message}'")
        # ToolResult is a standard way to return tool output
        return ToolResult(output=f"You said: {message}")


# Create an instance of our tool
echo_tool_instance = EchoTool()
print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")
```
**Explanation:**
* We import `BaseTool` and `ToolResult` (a standard object for wrapping tool outputs).
* `class EchoTool(BaseTool):` declares that our `EchoTool` *is a type of* `BaseTool`.
* We define the `name`, `description`, and `parameters` according to the `BaseTool` template. The `parameters` structure tells the LLM what input is expected (`message` as a string) and that it's required.
* We implement `async def execute(self, message: str) -> ToolResult:`. This is the *specific* logic for our tool. It takes the `message` input and returns it wrapped in a `ToolResult`.
**Example Output:**
```
Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.
```
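Because `execute` is an `async` method, trying the tool on its own requires an event loop. A quick standalone test might look like this (assuming `ToolResult` exposes the `output` field we set above):
```python
import asyncio

# Run the tool directly, outside of any agent. Assumes ToolResult exposes
# the 'output' field we populated in EchoTool.execute.
result = asyncio.run(echo_tool_instance.execute(message="Hello, tools!"))
print(result.output)  # -> "You said: Hello, tools!"
```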
**2. Creating a ToolCollection:**
Now, let's put our `EchoTool` and the built-in `WebSearch` tool into a toolbox.
```python
# Import ToolCollection and the tools we want
from app.tool import ToolCollection, WebSearch
# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined
# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch() # Uses default settings
# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)
# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")
# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print(f"\nParameters for LLM (showing first tool):")
import json
print(json.dumps(tool_params_for_llm[0], indent=2))
```
**Explanation:**
* We import `ToolCollection` and the specific tools (`WebSearch`, `EchoTool`).
* We create instances of the tools we need.
* `my_toolbox = ToolCollection(echo_tool, web_search_tool)` creates the collection, holding our tool instances.
* We can access the tools inside using `my_toolbox.tools` or iterate over `my_toolbox`.
* `my_toolbox.to_params()` is a crucial method. It formats the `name`, `description`, and `parameters` of *all* tools in the collection into a list of dictionaries. This specific format is exactly what the agent's [LLM](01_llm.md) needs (when using its `ask_tool` method) to understand which tools are available and how to use them.
**Example Output:**
```
Tools in the toolbox: ['echo_message', 'web_search']

Parameters for LLM (showing first tool):
{
  "type": "function",
  "function": {
    "name": "echo_message",
    "description": "Repeats back the text provided in the 'message' parameter.",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "The text to be echoed back."
        }
      },
      "required": [
        "message"
      ]
    }
  }
}
```
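You can also run a tool *through* the collection yourself, exactly the way an agent would, using the `execute(name=..., tool_input=...)` method covered later in this chapter. A small sketch:
```python
import asyncio


async def try_toolbox():
    # Ask the toolbox to run a tool by name, just like an agent would.
    result = await my_toolbox.execute(
        name="echo_message",
        tool_input={"message": "Hello from the toolbox!"},
    )
    print(result.output)  # -> "You said: Hello from the toolbox!"


asyncio.run(try_toolbox())
```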
**3. Agent Using the ToolCollection:**
Now, how does an agent like `ToolCallAgent` (a specific type of [BaseAgent](03_baseagent.md)) use this?
Conceptually (the real agent code is more complex):
1. The agent is configured with a `ToolCollection` (like `my_toolbox`).
2. When the agent needs to figure out the next step, it calls its LLM's `ask_tool` method.
3. It passes the conversation history ([Message / Memory](02_message___memory.md)) AND the output of `my_toolbox.to_params()` to the LLM.
4. The LLM looks at the conversation and the list of available tools (from `to_params()`). It reads the `description` of each tool to understand what it does.
5. If the LLM decides a tool is needed (e.g., the user asked "What's today's date?", the LLM sees the `web_search` tool is available and appropriate), it will generate a special response indicating:
* The `name` of the tool to use (e.g., `"web_search"`).
* The `arguments` (inputs) for the tool, based on its `parameters` (e.g., `{"query": "today's date"}`).
6. The agent receives this response from the LLM.
7. The agent then uses the `ToolCollection`'s `execute` method: `await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"})`.
8. The `ToolCollection` finds the `WebSearch` tool instance in its internal `tool_map` and calls *its* `execute` method with the provided input.
9. The `WebSearch` tool runs, performs the actual web search, and returns the results (as a `ToolResult` or similar).
10. The agent takes this result, formats it as a `tool` message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool's result as context).
The `ToolCollection` acts as the crucial bridge between the LLM's *decision* to use a tool and the *actual execution* of that tool's code.
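Put together, one think/act cycle might look roughly like the sketch below. It is deliberately simplified: the shape of the LLM response (`tool_calls`, `function.name`, `function.arguments`) follows the common OpenAI-style tool-call format and is an assumption here, and the real `ToolCallAgent` does much more (memory updates, handling `terminate`, retries, and so on).
```python
import json


# A deliberately simplified sketch of one think/act step. The response shape
# (tool_calls, function.name, function.arguments) is assumed to follow the
# OpenAI-style tool-call format; the real ToolCallAgent is more involved.
async def think_and_act(llm, memory, toolbox):
    response = await llm.ask_tool(
        messages=memory.messages,    # the conversation so far
        tools=toolbox.to_params(),   # the available skills, formatted for the LLM
    )
    for call in response.tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as JSON text
        result = await toolbox.execute(name=call.function.name, tool_input=args)
        # The real agent would now wrap `result` in a 'tool' message, add it
        # to memory, and keep thinking with that new context.
        print(f"{call.function.name} -> {result}")
```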
## Under the Hood: How `ToolCollection.execute` Works
Let's trace the flow when an agent asks its `ToolCollection` to run a tool:
```mermaid
sequenceDiagram
    participant Agent as ToolCallAgent
    participant LLM as LLM (Deciding Step)
    participant Toolbox as ToolCollection
    participant SpecificTool as e.g., WebSearch Tool
    Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
    LLM->>LLM: Analyzes messages & available tools
    LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
    Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
    Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
    Note right of Toolbox: Finds the WebSearch instance
    Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
    SpecificTool->>SpecificTool: Performs actual web search action
    SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
    Toolbox-->>-Agent: Returns the ToolResult
    Agent->>Agent: Processes the result (adds to memory, etc.)
```
**Code Glimpse:**
Let's look at the `ToolCollection` itself in `app/tool/tool_collection.py`:
```python
# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple

from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError


class ToolCollection:
    # ... (Config class) ...
    tools: Tuple[BaseTool, ...]    # Holds the tool instances
    tool_map: Dict[str, BaseTool]  # Maps name to tool instance for quick lookup

    def __init__(self, *tools: BaseTool):
        """Initializes with a sequence of tools."""
        self.tools = tools
        # Create the map for easy lookup by name
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        """Formats tools for the LLM API."""
        # Calls the 'to_param()' method on each tool
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        """Finds a tool by name and executes it."""
        # 1. Find the tool instance using the name
        tool = self.tool_map.get(name)
        if not tool:
            # Return a standard failure result if tool not found
            return ToolFailure(error=f"Tool {name} is invalid")
        # 2. Execute the tool's specific method
        try:
            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
            # which in BaseTool calls the tool's 'execute' method.
            # The ** unpacks the dictionary into keyword arguments.
            result = await tool(**(tool_input or {}))
            # Ensure the result is a ToolResult (or subclass)
            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
        except ToolError as e:
            # Handle errors specific to tools
            return ToolFailure(error=e.message)
        except Exception as e:
            # Handle unexpected errors during execution
            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")

    # ... other methods like add_tool, __iter__ ...
```
**Explanation:**
* The `__init__` method takes tool instances and stores them in `self.tools` (a tuple) and `self.tool_map` (a dictionary mapping name to instance).
* `to_params` iterates through `self.tools` and calls each tool's `to_param()` method (defined in `BaseTool`) to get the LLM-compatible format.
* `execute` is the core method used by agents:
* It uses `self.tool_map.get(name)` to quickly find the correct tool instance based on the requested name.
* If found, it calls `await tool(**(tool_input or {}))`. The `**` unpacks the `tool_input` dictionary into keyword arguments for the tool's `execute` method (e.g., `message="hello"` for our `EchoTool`, or `query="today's date"` for `WebSearch`).
* It wraps the execution in `try...except` blocks to catch errors and return a standardized `ToolFailure` result if anything goes wrong.
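One practical consequence of this design: asking for a tool that isn't in the toolbox doesn't raise an exception; it returns a `ToolFailure` you can inspect. Using the `my_toolbox` collection from earlier (and assuming `ToolFailure` exposes the `error` field set in the snippet above):
```python
import asyncio

# Requesting an unknown tool returns a ToolFailure instead of raising
# (see the tool_map lookup in ToolCollection.execute above).
missing = asyncio.run(my_toolbox.execute(name="time_travel", tool_input={}))
print(missing.error)  # -> "Tool time_travel is invalid"
```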
## Wrapping Up Chapter 4
We've learned how **Tools** give agents specific skills beyond basic language understanding.
* `BaseTool` is the abstract blueprint defining a tool's `name`, `description`, and expected `parameters`.
* Concrete tools (like `WebSearch`, `Bash`, or our custom `EchoTool`) inherit from `BaseTool` and implement the actual `execute` logic.
* `ToolCollection` acts as the agent's toolbox, holding various tools and providing methods (`to_params`, `execute`) for the agent (often guided by its [LLM](01_llm.md)) to discover and use these capabilities.
With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.
But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That's where Flows come in.
Let's move on to [Chapter 5: BaseFlow](05_baseflow.md) to see how we orchestrate complex workflows involving multiple agents and steps.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)