init push

2025-12-23 09:24:23 +01:00 · 2025-04-04 13:03:54 -04:00
parent e62ee2cb13
commit 2ebad5e5f2
160 changed files with 2 additions and 0 deletions
--- a/docs/OpenManus/04_tool___toolcollection.md
+++ b/docs/OpenManus/04_tool___toolcollection.md
@@ -0,0 +1,309 @@
+# Chapter 4: Tool / ToolCollection - Giving Your Agent Skills
+
+In [Chapter 3: BaseAgent - The Agent Blueprint](03_baseagent.md), we learned how `BaseAgent` provides the standard structure for our agents, including a brain ([LLM](01_llm.md)) and memory ([Message / Memory](02_message___memory.md)). But what if we want our agent to do more than just *think* and *remember*? What if we want it to *act* in the world – like searching the web, running code, or editing files?
+
+This is where **Tools** come in!
+
+## What Problem Do They Solve?
+
+Imagine an agent trying to answer the question: "What's the weather like in Tokyo *right now*?"
+
+The agent's LLM brain has a lot of general knowledge, but it doesn't have *real-time* access to the internet. It can't check the current weather. It needs a specific **capability** or **skill** to do that.
+
+Similarly, if you ask an agent to "Write a python script that prints 'hello world' and save it to a file named `hello.py`," the agent needs the ability to:
+1.  Understand the request (using its LLM).
+2.  Write the code (using its LLM).
+3.  Actually *execute* code to create and write to a file.
+
+Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer's file system – something the LLM can't do directly.
+
+**Tools** give agents these specific, actionable skills. A `ToolCollection` organizes these skills so the agent knows what it can do.
+
+**Use Case:** Let's build towards an agent that can:
+1.  Search the web for today's date.
+2.  Tell the user the date.
+
+This agent needs a "Web Search" tool.
+
+## Key Concepts: Tools and Toolboxes
+
+Let's break down the two main ideas:
+
+### 1. `BaseTool`: The Blueprint for a Skill
+
+Think of `BaseTool` (`app/tool/base.py`) as the *template* or *design specification* for any tool. It doesn't *do* anything itself, but it defines what every tool needs to have:
+
+*   **`name` (str):** A short, descriptive name for the tool (e.g., `web_search`, `file_writer`, `code_runner`). This is how the agent (or LLM) identifies the tool.
+*   **`description` (str):** A clear explanation of what the tool does, what it's good for, and when to use it. This is crucial for the LLM to decide *which* tool to use for a given task.
+*   **`parameters` (dict):** A definition of the inputs the tool expects. For example, a `web_search` tool needs a `query` input, and a `file_writer` needs a `path` and `content`. This is defined using a standard format called JSON Schema.
+*   **`execute` method:** An **abstract** method. This means `BaseTool` says "every tool *must* have an execute method", but each specific tool needs to provide its *own* instructions for how to actually perform the action.
+
+You almost never use `BaseTool` directly. You use it as a starting point to create *actual*, usable tools.
+
+### 2. Concrete Tools: The Actual Skills
+
+These are specific classes that *inherit* from `BaseTool` and provide the real implementation for the `execute` method. OpenManus comes with several pre-built tools:
+
+*   **`WebSearch` (`app/tool/web_search.py`):** Searches the web using engines like Google, Bing, etc.
+*   **`Bash` (`app/tool/bash.py`):** Executes shell commands (like `ls`, `pwd`, `python script.py`).
+*   **`StrReplaceEditor` (`app/tool/str_replace_editor.py`):** Views, creates, and edits files by replacing text.
+*   **`BrowserUseTool` (`app/tool/browser_use_tool.py`):** Interacts with web pages like a user (clicking, filling forms, etc.).
+*   **`Terminate` (`app/tool/terminate.py`):** A special tool used by agents to signal they have finished their task.
+
+Each of these defines its specific `name`, `description`, `parameters`, and implements the `execute` method to perform its unique action.
+
+### 3. `ToolCollection`: The Agent's Toolbox
+
+Think of a handyman. They don't just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.
+
+A `ToolCollection` (`app/tool/tool_collection.py`) is like that toolbox for an agent.
+
+*   It holds a list of specific tool instances (like `WebSearch`, `Bash`).
+*   It allows the agent (and its LLM) to see all the available tools and their descriptions.
+*   It provides a way to execute a specific tool by its name.
+
+When an agent needs to perform an action, its LLM can look at the `ToolCollection`, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool's `parameters`, and then ask the `ToolCollection` to execute that tool with those inputs.
+
+## How Do We Use Them?
+
+Let's see how we can equip an agent with a simple tool. We'll create a basic "EchoTool" first.
+
+**1. Creating a Concrete Tool (Inheriting from `BaseTool`):**
+
+```python
+# Import the necessary base class
+from app.tool.base import BaseTool, ToolResult
+
+# Define our simple tool
+class EchoTool(BaseTool):
+    """A simple tool that echoes the input text."""
+
+    name: str = "echo_message"
+    description: str = "Repeats back the text provided in the 'message' parameter."
+    parameters: dict = {
+        "type": "object",
+        "properties": {
+            "message": {
+                "type": "string",
+                "description": "The text to be echoed back.",
+            },
+        },
+        "required": ["message"], # Tells the LLM 'message' must be provided
+    }
+
+    # Implement the actual action
+    async def execute(self, message: str) -> ToolResult:
+        """Takes a message and returns it."""
+        print(f"EchoTool executing with message: '{message}'")
+        # ToolResult is a standard way to return tool output
+        return ToolResult(output=f"You said: {message}")
+
+# Create an instance of our tool
+echo_tool_instance = EchoTool()
+
+print(f"Tool Name: {echo_tool_instance.name}")
+print(f"Tool Description: {echo_tool_instance.description}")
+```
+
+**Explanation:**
+
+*   We import `BaseTool` and `ToolResult` (a standard object for wrapping tool outputs).
+*   `class EchoTool(BaseTool):` declares that our `EchoTool` *is a type of* `BaseTool`.
+*   We define the `name`, `description`, and `parameters` according to the `BaseTool` template. The `parameters` structure tells the LLM what input is expected (`message` as a string) and that it's required.
+*   We implement `async def execute(self, message: str) -> ToolResult:`. This is the *specific* logic for our tool. It takes the `message` input and returns it wrapped in a `ToolResult`.
+
+**Example Output:**
+
+```
+Tool Name: echo_message
+Tool Description: Repeats back the text provided in the 'message' parameter.
+```
+
+**2. Creating a ToolCollection:**
+
+Now, let's put our `EchoTool` and the built-in `WebSearch` tool into a toolbox.
+
+```python
+# Import ToolCollection and the tools we want
+from app.tool import ToolCollection, WebSearch
+# Assume EchoTool class is defined as above
+# from your_module import EchoTool # Or wherever EchoTool is defined
+
+# Create instances of the tools
+echo_tool = EchoTool()
+web_search_tool = WebSearch() # Uses default settings
+
+# Create a ToolCollection containing these tools
+my_toolbox = ToolCollection(echo_tool, web_search_tool)
+
+# See the names of the tools in the collection
+tool_names = [tool.name for tool in my_toolbox]
+print(f"Tools in the toolbox: {tool_names}")
+
+# Get the parameters needed for the LLM
+tool_params_for_llm = my_toolbox.to_params()
+print(f"\nParameters for LLM (showing first tool):")
+import json
+print(json.dumps(tool_params_for_llm[0], indent=2))
+```
+
+**Explanation:**
+
+*   We import `ToolCollection` and the specific tools (`WebSearch`, `EchoTool`).
+*   We create instances of the tools we need.
+*   `my_toolbox = ToolCollection(echo_tool, web_search_tool)` creates the collection, holding our tool instances.
+*   We can access the tools inside using `my_toolbox.tools` or iterate over `my_toolbox`.
+*   `my_toolbox.to_params()` is a crucial method. It formats the `name`, `description`, and `parameters` of *all* tools in the collection into a list of dictionaries. This specific format is exactly what the agent's [LLM](01_llm.md) needs (when using its `ask_tool` method) to understand which tools are available and how to use them.
+
+**Example Output:**
+
+```
+Tools in the toolbox: ['echo_message', 'web_search']
+
+Parameters for LLM (showing first tool):
+{
+  "type": "function",
+  "function": {
+    "name": "echo_message",
+    "description": "Repeats back the text provided in the 'message' parameter.",
+    "parameters": {
+      "type": "object",
+      "properties": {
+        "message": {
+          "type": "string",
+          "description": "The text to be echoed back."
+        }
+      },
+      "required": [
+        "message"
+      ]
+    }
+  }
+}
+```
+
+**3. Agent Using the ToolCollection:**
+
+Now, how does an agent like `ToolCallAgent` (a specific type of [BaseAgent](03_baseagent.md)) use this?
+
+Conceptually (the real agent code is more complex):
+
+1.  The agent is configured with a `ToolCollection` (like `my_toolbox`).
+2.  When the agent needs to figure out the next step, it calls its LLM's `ask_tool` method.
+3.  It passes the conversation history ([Message / Memory](02_message___memory.md)) AND the output of `my_toolbox.to_params()` to the LLM.
+4.  The LLM looks at the conversation and the list of available tools (from `to_params()`). It reads the `description` of each tool to understand what it does.
+5.  If the LLM decides a tool is needed (e.g., the user asked "What's today's date?", the LLM sees the `web_search` tool is available and appropriate), it will generate a special response indicating:
+    *   The `name` of the tool to use (e.g., `"web_search"`).
+    *   The `arguments` (inputs) for the tool, based on its `parameters` (e.g., `{"query": "today's date"}`).
+6.  The agent receives this response from the LLM.
+7.  The agent then uses the `ToolCollection`'s `execute` method: `await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"})`.
+8.  The `ToolCollection` finds the `WebSearch` tool instance in its internal `tool_map` and calls *its* `execute` method with the provided input.
+9.  The `WebSearch` tool runs, performs the actual web search, and returns the results (as a `ToolResult` or similar).
+10. The agent takes this result, formats it as a `tool` message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool's result as context).
+
+The `ToolCollection` acts as the crucial bridge between the LLM's *decision* to use a tool and the *actual execution* of that tool's code.
+
+## Under the Hood: How `ToolCollection.execute` Works
+
+Let's trace the flow when an agent asks its `ToolCollection` to run a tool:
+
+```mermaid
+sequenceDiagram
+    participant Agent as ToolCallAgent
+    participant LLM as LLM (Deciding Step)
+    participant Toolbox as ToolCollection
+    participant SpecificTool as e.g., WebSearch Tool
+
+    Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
+    LLM->>LLM: Analyzes messages & available tools
+    LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
+    Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
+    Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
+    Note right of Toolbox: Finds the WebSearch instance
+    Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
+    SpecificTool->>SpecificTool: Performs actual web search action
+    SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
+    Toolbox-->>-Agent: Returns the ToolResult
+    Agent->>Agent: Processes the result (adds to memory, etc.)
+```
+
+**Code Glimpse:**
+
+Let's look at the `ToolCollection` itself in `app/tool/tool_collection.py`:
+
+```python
+# Simplified snippet from app/tool/tool_collection.py
+from typing import Any, Dict, List, Tuple
+from app.tool.base import BaseTool, ToolResult, ToolFailure
+from app.exceptions import ToolError
+
+class ToolCollection:
+    # ... (Config class) ...
+
+    tools: Tuple[BaseTool, ...] # Holds the tool instances
+    tool_map: Dict[str, BaseTool] # Maps name to tool instance for quick lookup
+
+    def __init__(self, *tools: BaseTool):
+        """Initializes with a sequence of tools."""
+        self.tools = tools
+        # Create the map for easy lookup by name
+        self.tool_map = {tool.name: tool for tool in tools}
+
+    def to_params(self) -> List[Dict[str, Any]]:
+        """Formats tools for the LLM API."""
+        # Calls the 'to_param()' method on each tool
+        return [tool.to_param() for tool in self.tools]
+
+    async def execute(
+        self, *, name: str, tool_input: Dict[str, Any] = None
+    ) -> ToolResult:
+        """Finds a tool by name and executes it."""
+        # 1. Find the tool instance using the name
+        tool = self.tool_map.get(name)
+        if not tool:
+            # Return a standard failure result if tool not found
+            return ToolFailure(error=f"Tool {name} is invalid")
+
+        # 2. Execute the tool's specific method
+        try:
+            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
+            # which in BaseTool, calls the tool's 'execute' method.
+            # The ** unpacks the dictionary into keyword arguments.
+            result = await tool(**(tool_input or {}))
+            # Ensure the result is a ToolResult (or subclass)
+            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
+        except ToolError as e:
+             # Handle errors specific to tools
+            return ToolFailure(error=e.message)
+        except Exception as e:
+             # Handle unexpected errors during execution
+            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")
+
+    # ... other methods like add_tool, __iter__ ...
+```
+
+**Explanation:**
+
+*   The `__init__` method takes tool instances and stores them in `self.tools` (a tuple) and `self.tool_map` (a dictionary mapping name to instance).
+*   `to_params` iterates through `self.tools` and calls each tool's `to_param()` method (defined in `BaseTool`) to get the LLM-compatible format.
+*   `execute` is the core method used by agents:
+    *   It uses `self.tool_map.get(name)` to quickly find the correct tool instance based on the requested name.
+    *   If found, it calls `await tool(**(tool_input or {}))`. The `**` unpacks the `tool_input` dictionary into keyword arguments for the tool's `execute` method (e.g., `message="hello"` for our `EchoTool`, or `query="today's date"` for `WebSearch`).
+    *   It wraps the execution in `try...except` blocks to catch errors and return a standardized `ToolFailure` result if anything goes wrong.
+
+## Wrapping Up Chapter 4
+
+We've learned how **Tools** give agents specific skills beyond basic language understanding.
+*   `BaseTool` is the abstract blueprint defining a tool's `name`, `description`, and expected `parameters`.
+*   Concrete tools (like `WebSearch`, `Bash`, or our custom `EchoTool`) inherit from `BaseTool` and implement the actual `execute` logic.
+*   `ToolCollection` acts as the agent's toolbox, holding various tools and providing methods (`to_params`, `execute`) for the agent (often guided by its [LLM](01_llm.md)) to discover and use these capabilities.
+
+With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.
+
+But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That's where Flows come in.
+
+Let's move on to [Chapter 5: BaseFlow](05_baseflow.md) to see how we orchestrate complex workflows involving multiple agents and steps.
+
+---
+
+Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)