mirror of
https://github.com/aljazceru/Tutorial-Codebase-Knowledge.git
synced 2025-12-19 15:34:23 +01:00
docs/SmolaAgents/01_multistepagent.md

# Chapter 1: The MultiStepAgent - Your Task Orchestrator

Welcome to the SmolaAgents library! If you're looking to build smart AI agents that can tackle complex problems, you're in the right place.

Imagine you have a complex task, like "Research the pros and cons of electric cars and write a short summary." A single request to a simple AI might not be enough. It needs to search the web, read different articles, synthesize the information, and then write the summary. How does an AI manage such a multi-step process?

This is where the `MultiStepAgent` comes in! Think of it as the **project manager** for your AI task. It doesn't do all the work itself, but it directs the process, decides what needs to happen next, uses specialized helpers (called "Tools"), and keeps track of everything until the task is done.

## The Core Idea: Think, Act, Observe

The `MultiStepAgent` works by following a cycle, much like how humans solve problems. This cycle is often called **ReAct** (Reasoning and Acting):

1. **Think (Reason):** The agent looks at the main goal (the task) and where it currently is in the process. Based on this, it thinks about what the *very next step* should be to get closer to the goal. Should it search for information? Should it perform a calculation? Should it write something down?
2. **Act:** The agent performs the action it decided on. This usually involves using a specific **[Tool](03_tool.md)** (like a web search tool, a calculator, or a code execution tool) or generating text/code.
3. **Observe:** The agent looks at the result of its action. What did the web search return? What was the output of the code? This new information ("observation") helps it decide what to do in the next "Think" phase.

The agent repeats this **Think -> Act -> Observe** cycle over and over, step-by-step, until it believes it has fully completed the task and has a final answer.

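As a rough sketch, the whole cycle boils down to a loop like the one below. The names here (`react_loop`, the `llm` and `tools` call shapes) are hypothetical, chosen for illustration — the real library wires this together differently, as we'll see later in the chapter:

```python
def react_loop(task, llm, tools, max_steps=10):
    """A minimal Think-Act-Observe loop (illustrative sketch only)."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Think: ask the model what to do next, given the history so far
        action, argument = llm(history)
        if action == "final_answer":
            return argument  # The agent decided it is done
        # Act: run the chosen tool
        observation = tools[action](argument)
        # Observe: record the result so the next Think step can use it
        history.append(f"{action}({argument!r}) -> {observation}")
    raise RuntimeError("No final answer within max_steps")
```

You could plug in a toy "model" and a toy tool to watch the loop terminate after one search step — the point is simply that Think, Act, and Observe each map to one line of the loop body.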
## How It Works: Coordinating the Team

The `MultiStepAgent` doesn't work in isolation. It coordinates several key components:

1. **The Language Model (LLM):** This is the "brain" of the operation. The agent consults the LLM during the "Think" phase. It sends the current task, the history of actions and observations, and asks the LLM, "What should I do next?". We'll explore this more in [Chapter 2: Model Interface](02_model_interface.md).
2. **Tools:** These are specialized functions the agent can use to perform actions. Examples include searching the web, running Python code, fetching weather information, or even generating images. The agent chooses which tool to use (if any) during the "Act" phase based on the LLM's suggestion. Learn all about them in [Chapter 3: Tool](03_tool.md).
3. **Memory:** This is like the agent's notepad. It keeps track of the original task, the plan (if any), every action taken, and every observation received. This history is crucial for the agent (and the LLM) to understand the progress and decide the next steps. We'll dive into this in [Chapter 4: AgentMemory](04_agentmemory.md).

## A Simple Example: Getting the Capital and Weather

Let's take a simple task: **"What is the capital of France, and what is its current weather?"**

Here's how a `MultiStepAgent`, equipped with a `search` tool and a `weather` tool, might handle it:

1. **Step 1 (Think):** The agent sees the task. It realizes it needs two pieces of information: the capital and the weather *for* that capital. First, it needs the capital.
2. **Step 1 (Act):** It decides to use the `search` tool with the query "Capital of France".
3. **Step 1 (Observe):** The `search` tool returns "Paris". The agent stores "Capital is Paris" in its [Memory](04_agentmemory.md).
4. **Step 2 (Think):** The agent checks its memory. It has the capital (Paris) but still needs the weather.
5. **Step 2 (Act):** It decides to use the `weather` tool with the location "Paris".
6. **Step 2 (Observe):** The `weather` tool returns something like "Sunny, 25°C". The agent stores this observation in its [Memory](04_agentmemory.md).
7. **Step 3 (Think):** The agent reviews its memory. It now has both the capital ("Paris") and the weather ("Sunny, 25°C"). It has all the information needed to answer the original task.
8. **Step 3 (Act):** It decides it's finished and uses a special built-in tool called `final_answer` to provide the complete result.
9. **Step 3 (Observe):** The `final_answer` tool packages the result, like "The capital of France is Paris, and the current weather there is Sunny, 25°C." The cycle ends.

||||
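After these three steps, the agent's memory conceptually holds a record like the following. This is an illustrative data shape, not the library's actual classes (those are covered in [Chapter 4: AgentMemory](04_agentmemory.md)):

```python
# Hypothetical shape of the memory after the run described above
memory_steps = [
    {"type": "task",
     "content": "What is the capital of France, and what is its current weather?"},
    {"step": 1, "action": "search('Capital of France')", "observation": "Paris"},
    {"step": 2, "action": "weather('Paris')", "observation": "Sunny, 25°C"},
    {"step": 3, "action": "final_answer(...)",
     "observation": "The capital of France is Paris, and the current weather there is Sunny, 25°C."},
]

# Each Think phase gets to see everything recorded so far:
for step in memory_steps:
    print(step)
```

The key point: every "Think" consults the *whole* list, which is why the agent in Step 2 knows it already has the capital and only needs the weather.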
## Let's See Some Code (Basic Setup)

Okay, enough theory! How does this look in code? Setting up a basic `MultiStepAgent` involves giving it its "brain" (the model) and its "helpers" (the tools).

```python
# --- File: basic_agent.py ---
# Import necessary components (we'll explain these more in later chapters!)
from smolagents import MultiStepAgent
from smolagents.models import LiteLLMModel  # A simple way to use various LLMs
from smolagents.tools import SearchTool, WeatherTool  # Example Tools

# 1. Define the tools the agent can use
# These are like specialized workers the agent can call upon.
search_tool = SearchTool()    # A tool to search the web (details in Chapter 3)
weather_tool = WeatherTool()  # A tool to get weather info (details in Chapter 3)
# Note: Real tools might need API keys or setup!

# 2. Choose a language model (the "brain")
# We'll use LiteLLMModel here, connecting to a capable model.
# Make sure you have 'litellm' installed: pip install litellm
llm = LiteLLMModel(model_id="gpt-3.5-turbo")  # Needs an API key set up
# We'll cover models properly in Chapter 2

# 3. Create the MultiStepAgent instance
# We pass the brain (llm) and the helpers (tools)
agent = MultiStepAgent(
    model=llm,
    tools=[search_tool, weather_tool]
    # By default, a 'final_answer' tool is always added.
)

print("Agent created!")

# 4. Give the agent a task!
task = "What is the capital of France, and what is its current weather?"
print(f"Running agent with task: '{task}'")

# The agent will now start its Think-Act-Observe cycle...
final_answer = agent.run(task)

# ... and eventually return the final result.
print("-" * 20)
print(f"Final Answer received: {final_answer}")
```

**Explanation:**

1. **Import:** We bring in `MultiStepAgent` and placeholders for a model and tools.
2. **Tools:** We create instances of the tools our agent might need (`SearchTool`, `WeatherTool`). How tools work is covered in [Chapter 3: Tool](03_tool.md).
3. **Model:** We set up the language model (`LiteLLMModel`) that will power the agent's thinking. More on models in [Chapter 2: Model Interface](02_model_interface.md).
4. **Agent Creation:** We initialize `MultiStepAgent`, telling it which `model` to use for thinking and which `tools` are available for acting.
5. **Run Task:** We call the `agent.run()` method with our specific `task`. This kicks off the Think-Act-Observe cycle.
6. **Output:** The `run` method continues executing steps until the `final_answer` tool is called or a limit is reached. It then returns the content provided to `final_answer`.

*(Note: Running the code above requires setting up API keys for the chosen LLM and potentially the tools.)*

## Under the Hood: The `run` Process

When you call `agent.run(task)`, a sequence of internal steps takes place:

1. **Initialization:** The agent receives the `task` and stores it in its [AgentMemory](04_agentmemory.md). The step counter is reset.
2. **Loop:** The agent enters the main Think-Act-Observe loop. This loop continues until a final answer is produced or the maximum number of steps (`max_steps`) is reached.
3. **Prepare Input:** Inside the loop, the agent gathers its history (task, previous actions, observations) from [AgentMemory](04_agentmemory.md) using `write_memory_to_messages`.
4. **Think (Call Model):** It sends this history to the [Model](02_model_interface.md) (e.g., `self.model(messages)`), asking for the next action (which tool to call and with what arguments, or if it should use `final_answer`).
5. **Store Thought:** The model's response (the thought process and the intended action) is recorded in the current step's data within [AgentMemory](04_agentmemory.md).
6. **Act (Execute Tool/Code):**
    * The agent parses the model's response to identify the action (e.g., call `search` with "Capital of France").
    * If it's a [Tool](03_tool.md) call, it executes the tool (e.g., `search_tool("Capital of France")`).
    * If it's the `final_answer` tool, it prepares to exit the loop.
    * *(Note: Different agent types handle this 'Act' phase differently. We'll see this in [Chapter 7: AgentType](07_agenttype.md). For instance, a `CodeAgent` generates and runs code here.)*
7. **Observe (Get Result):** The result from the tool execution (or code execution) is captured as the "observation".
8. **Store Observation:** This observation (e.g., "Paris") is recorded in the current step's data in [AgentMemory](04_agentmemory.md).
9. **Repeat:** The loop goes back to step 3, using the new observation as part of the history for the next "Think" phase.
10. **Finish:** Once the `final_answer` tool is called, the loop breaks, and the value passed to `final_answer` is returned by the `run` method. If `max_steps` is reached without a final answer, an error or a fallback answer might occur.

Here's a simplified diagram showing the flow:

```mermaid
sequenceDiagram
    participant User
    participant MSA as MultiStepAgent
    participant Model as LLM Brain
    participant Tools
    participant Memory

    User->>MSA: run("Task: Capital & Weather?")
    MSA->>Memory: Store Task
    loop Think-Act-Observe Cycle
        MSA->>Memory: Get history (Task)
        MSA->>Model: What's next? (based on Task)
        Model-->>MSA: Think: Need capital. Act: search("Capital of France")
        MSA->>Memory: Store Thought & Action Plan
        MSA->>Tools: Execute search("Capital of France")
        Tools-->>MSA: Observation: "Paris"
        MSA->>Memory: Store Observation ("Paris")

        MSA->>Memory: Get history (Task, search result "Paris")
        MSA->>Model: What's next? (based on Task & "Paris")
        Model-->>MSA: Think: Need weather for Paris. Act: weather("Paris")
        MSA->>Memory: Store Thought & Action Plan
        MSA->>Tools: Execute weather("Paris")
        Tools-->>MSA: Observation: "Sunny, 25°C"
        MSA->>Memory: Store Observation ("Sunny, 25°C")

        MSA->>Memory: Get history (Task, "Paris", "Sunny, 25°C")
        MSA->>Model: What's next? (based on Task & results)
        Model-->>MSA: Think: Have all info. Act: final_answer("Capital: Paris, Weather: Sunny, 25°C")
        MSA->>Memory: Store Thought & Action Plan (Final Answer)
        Note right of MSA: Loop ends when the final answer is ready
    end
    MSA-->>User: Return "Capital: Paris, Weather: Sunny, 25°C"
```

## Diving Deeper (Code References)

Let's peek at some relevant code snippets from `agents.py` to see how this is implemented (simplified for clarity):

* **Initialization (`__init__`)**: Stores the essential components.

```python
# --- File: agents.py (Simplified __init__) ---
class MultiStepAgent:
    def __init__(
        self,
        tools: List[Tool],    # List of available tools
        model: Callable,      # The language model function
        max_steps: int = 20,  # Max cycles allowed
        # ... other parameters like memory, prompts, etc.
    ):
        self.model = model
        self.tools = {tool.name: tool for tool in tools}
        # Add the essential final_answer tool
        self.tools.setdefault("final_answer", FinalAnswerTool())
        self.max_steps = max_steps
        self.memory = AgentMemory(...)  # Initialize memory
        # ... setup logging, etc.
```

* **Starting the process (`run`)**: Sets up the task and calls the internal loop.

```python
# --- File: agents.py (Simplified run) ---
from collections import deque  # import shown here for clarity

class MultiStepAgent:
    def run(self, task: str, ...):
        self.task = task
        # ... maybe handle additional arguments ...

        # Reset memory if needed
        self.memory.reset()
        self.memory.steps.append(TaskStep(task=self.task))  # Record the task

        # Start the internal execution loop
        # The deque gets the *last* item yielded, which is the final answer
        return deque(self._run(task=self.task, max_steps=self.max_steps), maxlen=1)[0].final_answer
```
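The `deque(generator, maxlen=1)` idiom is worth a note: a `deque` with `maxlen=1` consumes the entire generator but only ever keeps the most recent item, so indexing `[0]` yields the *last* thing the generator produced. A standalone illustration:

```python
from collections import deque

def steps():
    """A stand-in for a step generator like _run."""
    yield "step 1"
    yield "step 2"
    yield "final answer"

# The deque drains the generator, discarding all but the last item
last = deque(steps(), maxlen=1)[0]
print(last)  # -> final answer
```

This lets `run` stream intermediate steps to callers who want them while still returning only the final result.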

* **The Core Loop (`_run`)**: Implements the Think-Act-Observe cycle.

```python
# --- File: agents.py (Simplified _run) ---
class MultiStepAgent:
    def _run(self, task: str, max_steps: int, ...) -> Generator:
        final_answer = None
        self.step_number = 1
        while final_answer is None and self.step_number <= max_steps:
            action_step = self._create_action_step(...)  # Prepare memory for this step

            try:
                # This is where the agent type decides how to act
                # (e.g., call LLM, parse, execute tool/code)
                final_answer = self._execute_step(task, action_step)
            except AgentError as e:
                action_step.error = e  # Record errors
            finally:
                self._finalize_step(action_step, ...)   # Record timing, etc.
                self.memory.steps.append(action_step)   # Save step to memory
                yield action_step                       # Yield step details (for streaming)
            self.step_number += 1

        if final_answer is None:
            # Handle reaching max steps
            ...
        yield FinalAnswerStep(handle_agent_output_types(final_answer))  # Yield final answer
```

* **Executing a Step (`_execute_step`)**: This calls the `step` method which specific agent types (like `CodeAgent` or `ToolCallingAgent`) implement differently.

```python
# --- File: agents.py (Simplified _execute_step) ---
class MultiStepAgent:
    def _execute_step(self, task: str, memory_step: ActionStep) -> Union[None, Any]:
        # Calls the specific logic for the agent type
        # This method will interact with the model, tools, memory
        final_answer = self.step(memory_step)
        # ... (optional checks on final answer) ...
        return final_answer

    # step() is implemented by subclasses like CodeAgent or ToolCallingAgent
    def step(self, memory_step: ActionStep) -> Union[None, Any]:
        raise NotImplementedError("Subclasses must implement the step method.")
```

These snippets show how `MultiStepAgent` orchestrates the process, relying on its `model`, `tools`, and `memory`, and delegating the specific "how-to-act" logic to subclasses via the `step` method (more on this in [Chapter 7: AgentType](07_agenttype.md)).

## Conclusion

The `MultiStepAgent` is the heart of the SmolaAgents library. It provides the framework for agents to tackle complex tasks by breaking them down into a **Think -> Act -> Observe** cycle. It acts as the central coordinator, managing interactions between the language model (the brain), the tools (the specialized helpers), and the memory (the notepad).

You've learned:

* Why `MultiStepAgent` is needed for tasks requiring multiple steps.
* The core ReAct cycle: Think, Act, Observe.
* How it coordinates the Model, Tools, and Memory.
* A basic code example of setting up and running an agent.
* A glimpse into the internal `run` process.

Now that we understand the orchestrator, let's move on to understand the "brain" it relies on.

**Next Chapter:** [Chapter 2: Model Interface](02_model_interface.md) - Connecting Your Agent to an LLM Brain.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/SmolaAgents/02_model_interface.md

# Chapter 2: Model Interface - Your Agent's Universal Translator

Welcome back! In [Chapter 1: The MultiStepAgent - Your Task Orchestrator](01_multistepagent.md), we met the `MultiStepAgent`, our AI project manager. We learned that it follows a "Think -> Act -> Observe" cycle to solve tasks. A crucial part of the "Think" phase is consulting its "brain" – a Large Language Model (LLM).

But wait... there are so many different LLMs out there! OpenAI's GPT-4, Anthropic's Claude, Google's Gemini, open-source models you can run locally like Llama or Mistral... How can our agent talk to all of them without needing completely different code for each one?

This is where the **Model Interface** comes in!

## The Problem: Too Many Remotes!

Imagine you have several TVs at home, each from a different brand (Sony, Samsung, LG). Each TV comes with its own specific remote control. To watch TV, you need to find the *right* remote and know *its specific buttons*. It's a hassle!

Different LLMs are like those different TVs. Each has its own way of being "controlled" – its own API (Application Programming Interface) or library with specific functions, required inputs, and ways of giving back answers. If our `MultiStepAgent` had to learn the specific "remote control commands" for every possible LLM, our code would become very complicated very quickly!

## The Solution: The Universal Remote (Model Interface)

Wouldn't it be great if you had *one* universal remote that could control *all* your TVs? You'd just press "Power", "Volume Up", or "Channel Down", and the universal remote would figure out how to send the correct signal to whichever TV you're using.

The **Model Interface** in `SmolaAgents` is exactly like that universal remote.

* It's an **abstraction layer**: a way to hide the complicated details.
* It provides a **consistent way** for the `MultiStepAgent` to talk to *any* supported LLM.
* It handles the "translation" behind the scenes:
    * Taking the agent's request (like "What should I do next?").
    * Formatting it correctly for the specific LLM being used.
    * Sending the request (making the API call or running the local model).
    * Receiving the LLM's raw response.
    * Parsing that response back into a standard format the agent understands (including things like requests to use [Tools](03_tool.md)).

So, the `MultiStepAgent` only needs to learn how to use the *one* universal remote (the Model Interface), not the specific commands for every LLM "TV".

## How It Works: The Standard `__call__`

The magic of the Model Interface lies in its simplicity from the agent's perspective. All Model Interfaces in `SmolaAgents` work the same way: you "call" them like a function, passing in the conversation history.

Think of it like pressing the main button on our universal remote.

1. **Input:** The agent gives the Model Interface a list of messages representing the conversation so far. This usually includes the system prompt (instructions for the LLM), the user's task, and any previous "Think -> Act -> Observe" steps stored in [AgentMemory](04_agentmemory.md). Each message typically has a `role` (like `user`, `assistant`, or `system`) and `content`.
2. **Processing (Behind the Scenes):** The *specific* Model Interface (e.g., one for OpenAI, one for local models) takes this standard list of messages and:
    * Connects to the correct LLM (using API keys, loading a local model, etc.).
    * Formats the messages exactly how that LLM expects them.
    * Sends the request.
    * Waits for the LLM to generate a response.
    * Gets the response back.
3. **Output:** It translates the LLM's raw response back into a standard `ChatMessage` object. This object contains the LLM's text response and, importantly, might include structured information if the LLM decided the agent should use a [Tool](03_tool.md). The agent knows exactly how to read this `ChatMessage`.

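Because the agent relies only on this calling convention, anything with the right `__call__` shape can stand in for a real model — which is handy for testing. Here's a sketch of a hypothetical stand-in (not part of the library; the dict-shaped reply is a simplification of the `ChatMessage` idea):

```python
class FakeModel:
    """A stand-in 'brain' that follows the same calling convention:
    take a list of role/content messages, return a response message."""
    def __call__(self, messages):
        # Ignore the history and always suggest the same canned action
        return {"role": "assistant",
                "content": "Thought: I should search. Action: search('Capital of France')"}

model = FakeModel()
reply = model([{"role": "user", "content": "Task: What is the capital of France?"}])
print(reply["content"])
```

Swapping `FakeModel` for a real interface requires no changes to the calling code — that interchangeability is the whole point of the abstraction.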
## Using a Model Interface

Let's see how you'd actually *use* one. `SmolaAgents` comes with several built-in Model Interfaces. A very useful one is `LiteLLMModel`, which uses the `litellm` library to connect to hundreds of different LLM providers (OpenAI, Anthropic, Cohere, Azure, local models via Ollama, etc.) with minimal code changes!

**Step 1: Choose and Initialize Your Model Interface**

First, you decide which LLM you want your agent to use. Then, you create an instance of the corresponding Model Interface.

```python
# --- File: choose_model.py ---
# Import the model interface you want to use
from smolagents.models import LiteLLMModel
# (You might need to install litellm first: pip install smolagents[litellm])

# Choose the specific LLM model ID that litellm supports
# Example: OpenAI's GPT-3.5 Turbo
# Requires setting the OPENAI_API_KEY environment variable!
model_id = "gpt-3.5-turbo"

# Create an instance of the Model Interface
# This object is our "universal remote" configured for GPT-3.5
llm = LiteLLMModel(model_id=model_id)

print(f"Model Interface created for: {model_id}")
# Example Output: Model Interface created for: gpt-3.5-turbo
```

**Explanation:**

* We import `LiteLLMModel`.
* We specify the `model_id` we want to use (here, `"gpt-3.5-turbo"`). `litellm` knows how to talk to this model if the necessary API key (`OPENAI_API_KEY`) is available in your environment.
* We create the `llm` object. This object now knows how to communicate with GPT-3.5 Turbo via the `litellm` library, but it presents a standard interface to the rest of our code.

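For reference, `litellm` reads provider credentials from environment variables, so for an OpenAI-backed model you would typically export the key before running the script (the key value below is a placeholder, and `choose_model.py` is the file from this example):

```shell
# Set the OpenAI key in your shell before running the script (placeholder value)
export OPENAI_API_KEY="sk-..."
python choose_model.py
```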
**Step 2: Give the Model to the Agent**

Remember from Chapter 1 how we created the `MultiStepAgent`? We simply pass our `llm` object (the configured universal remote) to it.

```python
# --- Continued from choose_model.py ---
# (Requires imports from Chapter 1: MultiStepAgent, SearchTool, etc.)
from smolagents import MultiStepAgent
from smolagents.tools import SearchTool  # Example Tool

# Define some tools (details in Chapter 3)
search_tool = SearchTool()
tools = [search_tool]

# Create the agent, giving it the model interface instance
agent = MultiStepAgent(
    model=llm,  # <= Here's where we plug in our "universal remote"!
    tools=tools
)

print("MultiStepAgent created and configured with the model!")
# Example Output: MultiStepAgent created and configured with the model!
```

**Explanation:**

* The `MultiStepAgent` doesn't need to know it's talking to GPT-3.5 Turbo specifically. It just knows it has a `model` object that it can call.

**Step 3: How the Agent Uses the Model (Simplified)**

Inside its "Think" phase, the agent prepares the conversation history and calls the model:

```python
# --- Simplified view of what happens inside the agent ---
from smolagents.models import ChatMessage, MessageRole

# Agent prepares messages (example)
messages_for_llm = [
    {"role": MessageRole.SYSTEM, "content": "You are a helpful agent. Decide the next step."},
    {"role": MessageRole.USER, "content": "Task: What is the capital of France?"},
    # ... potentially previous steps ...
]

# Agent calls the model using the standard interface
# This is like pressing the main button on the universal remote
print("Agent asking model: What should I do next?")
response: ChatMessage = agent.model(messages_for_llm)  # agent.model refers to our 'llm' instance

# Agent gets a standard response back
print(f"Model suggested action (simplified): {response.content}")
# Example Output (will vary):
# Agent asking model: What should I do next?
# Model suggested action (simplified): Thought: I need to find the capital of France. I can use the search tool.
# Action:
# ```json
# {
#   "action": "search",
#   "action_input": "Capital of France"
# }
# ```
```

**Explanation:**

* The agent prepares a list of `messages_for_llm`.
* It simply calls `agent.model(...)`, which executes `llm(messages_for_llm)`.
* The `LiteLLMModel` (`llm`) handles talking to the actual OpenAI API.
* The agent receives a `ChatMessage` object, which it knows how to parse to find the next action (like using the `search` tool, as suggested in the example output).

## Under the Hood: How the "Universal Remote" Works

Let's peek behind the curtain. What happens when the agent calls `model(messages)`?

**Conceptual Steps:**

1. **Receive Request:** The specific Model Interface (e.g., `LiteLLMModel`) gets the standard list of messages from the agent.
2. **Prepare Backend Request:** It looks at its own configuration (e.g., `model_id="gpt-3.5-turbo"`, API key) and translates the standard messages into the specific format the target LLM backend (e.g., the OpenAI API) requires. This might involve changing role names, structuring the data differently, etc.
3. **Send to Backend:** It makes the actual network call to the LLM's API endpoint or runs the command to invoke a local model.
4. **Receive Backend Response:** It gets the raw response back from the LLM (often as JSON or plain text).
5. **Parse Response:** It parses this raw response, extracting the generated text and any structured data (like tool calls).
6. **Return Standard Response:** It packages this information into a standard `ChatMessage` object and returns it to the agent.

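Step 2 — translating the standard messages into a backend's format — might look like this in spirit. The payload shape below is invented purely for illustration; real backends each define their own:

```python
def to_backend_payload(messages, model_id):
    """Translate the agent's standard message list into one (imaginary)
    backend's request format: rename fields, attach the model id."""
    return {
        "model": model_id,
        "inputs": [
            {"speaker": m["role"], "text": m["content"]}  # rename to the backend's field names
            for m in messages
        ],
    }

payload = to_backend_payload(
    [{"role": "user", "content": "Task: What is the capital of France?"}],
    model_id="gpt-3.5-turbo",
)
print(payload["inputs"][0]["speaker"])  # -> user
```

The interface's job is exactly this kind of bookkeeping in both directions: standard messages in, backend payload out, and then raw backend response in, standard `ChatMessage` out.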
**Diagram:**

Here's a simplified sequence diagram showing the flow:

```mermaid
sequenceDiagram
    participant Agent as MultiStepAgent
    participant ModelI as Model Interface (e.g., LiteLLMModel)
    participant Backend as Specific LLM API/Library (e.g., OpenAI)

    Agent->>ModelI: call(standard_messages)
    ModelI->>ModelI: Translate messages to backend format
    ModelI->>Backend: Send API Request (formatted messages, API key)
    Backend-->>ModelI: Receive API Response (raw JSON/text)
    ModelI->>ModelI: Parse raw response into ChatMessage
    ModelI-->>Agent: Return ChatMessage object
```

**Code Glimpse (Simplified):**

Let's look at `models.py`, where these interfaces are defined.

* **Base Class (`Model`):** Defines the common structure, including the `__call__` method that all specific interfaces must implement.

```python
# --- File: models.py (Simplified Model base class) ---
from dataclasses import dataclass
from typing import List, Dict, Optional
from .tools import Tool  # Reference to Tool concept

@dataclass
class ChatMessage:  # Simplified representation of the standard response
    role: str
    content: Optional[str] = None
    tool_calls: Optional[List[dict]] = None  # For tool usage (Chapter 3)
    # ... other fields ...

class Model:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # Stores model-specific settings
        # ...

    # The standard "button" our agent presses!
    def __call__(
        self,
        messages: List[Dict[str, str]],
        stop_sequences: Optional[List[str]] = None,
        tools_to_call_from: Optional[List[Tool]] = None,
        **kwargs,
    ) -> ChatMessage:
        # Each specific model interface implements this method
        raise NotImplementedError("Subclasses must implement the __call__ method.")

    def _prepare_completion_kwargs(self, messages, **kwargs) -> Dict:
        # Helper to format messages and parameters for the backend
        # ... translation logic ...
        pass
```

* **Specific Implementation (`LiteLLMModel`):** Inherits from `Model` and implements `__call__` using the `litellm` library.

```python
# --- File: models.py (Simplified LiteLLMModel __call__) ---
import litellm  # The library that talks to many LLMs

class LiteLLMModel(Model):
    def __init__(self, model_id: str, **kwargs):
        super().__init__(**kwargs)
        self.model_id = model_id
        # LiteLLM typically uses environment variables for API keys

    def __call__(
        self,
        messages: List[Dict[str, str]],
        stop_sequences: Optional[List[str]] = None,
        tools_to_call_from: Optional[List[Tool]] = None,
        **kwargs,
    ) -> ChatMessage:
        # 1. Prepare arguments using the helper
        completion_kwargs = self._prepare_completion_kwargs(
            messages=messages,
            stop_sequences=stop_sequences,
            tools_to_call_from=tools_to_call_from,
            model=self.model_id,  # Tell litellm which model
            # ... other parameters ...
            **kwargs,
        )

        # 2. Call the actual backend via litellm
        # This hides the complexity of different API calls!
        response = litellm.completion(**completion_kwargs)

        # 3. Parse the response into our standard ChatMessage
        # (Simplified - actual parsing involves more details)
        raw_message = response.choices[0].message
        chat_message = ChatMessage(
            role=raw_message.role,
            content=raw_message.content,
            tool_calls=raw_message.tool_calls  # If the LLM requested a tool
        )
        # ... store token counts, raw response etc. ...
        return chat_message
```

**Explanation:**

* The `Model` class defines the contract (the `__call__` method).
* `LiteLLMModel` fulfills this contract. Its `__call__` method uses `_prepare_completion_kwargs` to format the request suitably for `litellm`.
* The core work happens in `litellm.completion(...)`, which connects to the actual LLM service (like OpenAI).
* The result is then parsed back into the standard `ChatMessage` format.

The beauty is that the `MultiStepAgent` only ever interacts with the `__call__` method, regardless of whether it's using `LiteLLMModel`, `TransformersModel` (for local models), or another interface.

## Conclusion

The Model Interface is a vital piece of the `SmolaAgents` puzzle. It acts as a universal translator or remote control, allowing your `MultiStepAgent` to seamlessly communicate with a wide variety of Large Language Models without getting bogged down in the specific details of each one.

You've learned:

* Why a Model Interface is needed to handle diverse LLMs.
* The "universal remote" analogy.
* How the standard `__call__` method provides a consistent way for the agent to interact with the model.
* How to choose, initialize, and provide a Model Interface (`LiteLLMModel` example) to your `MultiStepAgent`.
* A glimpse into the internal process: translating requests, calling the backend LLM, and parsing responses.

Now that our agent has a brain (`MultiStepAgent`) and a way to talk to it (`Model Interface`), how does it actually *do* things based on the LLM's suggestions? How does it search the web, run code, or perform other actions? That's where our next component comes in!

**Next Chapter:** [Chapter 3: Tool](03_tool.md) - Giving Your Agent Capabilities.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

292
docs/SmolaAgents/03_tool.md
Normal file
@@ -0,0 +1,292 @@

# Chapter 3: Tool - Giving Your Agent Superpowers

Welcome back! In [Chapter 2: Model Interface](02_model_interface.md), we learned how our `MultiStepAgent` uses a "universal remote" (the Model Interface) to talk to its LLM "brain". The LLM thinks and suggests what the agent should do next.

But how does the agent actually *do* things? If the LLM suggests "Search the web for the capital of France," how does the agent perform the search? It can't just magically type into Google!

This is where **Tools** come in. They are the agent's hands and specialized equipment, allowing it to interact with the world beyond just generating text.

## The Problem: An Agent Trapped in its Mind

Imagine a brilliant chef who only knows recipes but is locked in an empty room. They can tell you exactly how to make a perfect soufflé, step by step, but they can't actually *do* any of it. They have no ingredients, no oven, no whisk, no bowls. They're stuck!

🤔 -> 📝 Recipe (Think)

An agent without tools is like that chef. The LLM brain can reason and plan ("I need to search the web"), but the agent itself has no way to execute that plan ("How do I *actually* search?").

## The Solution: The Agent's Toolbox

Tools are specific capabilities we give to our agent. Think of them like the utensils and appliances in a kitchen drawer:

* **Peeler:** Used for peeling vegetables.
* **Whisk:** Used for mixing ingredients.
* **Oven:** Used for baking.
* **Search Engine Tool:** Used for searching the web.
* **Calculator Tool:** Used for performing calculations.
* **Code Execution Tool:** Used for running computer code.

🔎 Search, 💻 Code Runner, ☁️ Weather API

Each tool is a reusable function that the agent can call upon to perform a specific action. The agent acts like the chef, looking at the next step in the recipe (the LLM's suggestion) and picking the right tool from its toolbox.
## What Makes a Tool?

Every tool in `SmolaAgents` needs a few key pieces of information so the agent (and the LLM helping it) can understand it:

1. **`name`**: A short, descriptive name for the tool (e.g., `web_search`, `calculator`). This is how the agent identifies which tool to use.
2. **`description`**: A clear explanation of what the tool does, what it's good for, and what information it needs. This helps the LLM decide *when* to suggest using this tool. Example: *"Performs a web search using DuckDuckGo and returns the top results."*
3. **`inputs`**: Defines what information the tool needs to do its job. This is like specifying that a peeler needs a vegetable, or a calculator needs numbers and an operation. It's a dictionary where keys are argument names and values describe each argument's type and purpose. Example: `{"query": {"type": "string", "description": "The search query"}}`.
4. **`output_type`**: Describes the type of result the tool will return (e.g., `string`, `number`, `image`).
5. **`forward` method**: The actual Python code that gets executed when the tool is used. It takes the defined `inputs` as arguments, performs the tool's action, and returns the result.
## Creating Your First Tool: The `GreetingTool`

Let's build a very simple tool. Imagine we want our agent to be able to greet someone by name.

We'll create a `GreetingTool` by inheriting from the base `Tool` class provided by `SmolaAgents`.

```python
# --- File: simple_tools.py ---
from smolagents import Tool  # Import the base class


class GreetingTool(Tool):
    """A simple tool that generates a greeting."""

    # 1. Give it a unique name
    name: str = "greet_person"

    # 2. Describe what it does clearly
    description: str = "Greets a person by their name."

    # 3. Define the inputs it needs
    # It needs one input: the 'name' of the person, which should be a string.
    inputs: dict = {
        "name": {
            "type": "string",
            "description": "The name of the person to greet.",
        }
    }

    # 4. Specify the type of the output
    # It will return the greeting as a string.
    output_type: str = "string"

    # 5. Implement the action in the 'forward' method
    def forward(self, name: str) -> str:
        """The actual code that runs when the tool is called."""
        print(f"--- GreetingTool activated with name: {name} ---")
        greeting = f"Hello, {name}! Nice to meet you."
        return greeting


# Let's test it quickly (outside the agent context)
greeter = GreetingTool()
result = greeter(name="Alice")  # Calling the tool instance runs forward()
print(f"Tool returned: '{result}'")

# Expected Output:
# --- GreetingTool activated with name: Alice ---
# Tool returned: 'Hello, Alice! Nice to meet you.'
```

**Explanation:**

1. **Import:** We import the base `Tool` class.
2. **Class Definition:** We define `GreetingTool`, inheriting from `Tool`.
3. **Attributes:** We set the required class attributes: `name`, `description`, `inputs`, and `output_type`. These tell the agent everything it needs to know *about* the tool without running it.
4. **`forward` Method:** This method contains the core logic. It takes the `name` (defined in `inputs`) as an argument and returns the greeting string. We added a `print` statement just to see when it runs.
5. **Testing:** We create an instance `greeter` and call it like a function, passing the required argument `name="Alice"`. It executes the `forward` method and returns the result.

This `GreetingTool` is now ready to be added to an agent's toolbox!
## Adding the Tool to Your Agent

Remember how we created our `MultiStepAgent` in [Chapter 1](01_multistepagent.md)? We gave it a model and a list of tools. Let's add our new `GreetingTool`:

```python
# --- File: agent_with_greeting.py ---
# (Assuming GreetingTool is defined as above or imported)
# from simple_tools import GreetingTool
from smolagents import MultiStepAgent
from smolagents.models import LiteLLMModel  # From Chapter 2
# Potentially other tools like SearchTool etc.

# 1. Create an instance of our new tool
greeting_tool = GreetingTool()

# 2. Create instances of any other tools the agent might need
# search_tool = SearchTool()  # Example from Chapter 1

# 3. Choose a language model (the "brain")
llm = LiteLLMModel(model_id="gpt-3.5-turbo")  # Needs API key setup

# 4. Create the MultiStepAgent, passing the tool(s) in a list
agent = MultiStepAgent(
    model=llm,
    tools=[greeting_tool]  # Add our tool here! Maybe add search_tool too?
    # tools=[greeting_tool, search_tool]
)

print("Agent created with GreetingTool!")

# 5. Give the agent a task that might use the tool
task = "Greet the user named Bob."
print(f"Running agent with task: '{task}'")

# The agent will now start its Think-Act-Observe cycle...
final_answer = agent.run(task)

print("-" * 20)
print(f"Final Answer received: {final_answer}")

# --- Expected Interaction (Simplified) ---
# Agent (thinks): The task is to greet Bob. I have a 'greet_person' tool.
# Agent (acts): Use tool 'greet_person' with input name="Bob".
# --- GreetingTool activated with name: Bob ---   (Our print statement)
# Agent (observes): Tool returned "Hello, Bob! Nice to meet you."
# Agent (thinks): I have the greeting. That completes the task.
# Agent (acts): Use 'final_answer' tool with "Hello, Bob! Nice to meet you."
# --------------------
# Final Answer received: Hello, Bob! Nice to meet you.
```

**Explanation:**

1. We create an instance of `GreetingTool`.
2. We put this instance into the `tools` list when initializing `MultiStepAgent`.
3. The agent now "knows" about the `greet_person` tool, its description, and how to use it (via its `name` and `inputs`).
4. When we run the `agent` with the task "Greet the user named Bob," the LLM (using the tool descriptions provided in the prompt) will likely recognize that the `greet_person` tool is perfect for this.
5. The agent will then execute the `greeting_tool.forward(name="Bob")` method during its "Act" phase.
## How the Agent Uses Tools: Under the Hood

Let's revisit the **Think -> Act -> Observe** cycle from [Chapter 1](01_multistepagent.md) and see exactly where tools fit in.

1. **Think:** The agent gathers its history ([AgentMemory](04_agentmemory.md)) and the available tool descriptions. It sends this to the LLM via the [Model Interface](02_model_interface.md), asking, "What should I do next to accomplish the task 'Greet Bob'?" The LLM, seeing the `greet_person` tool description, might respond with something like:

    ```json
    {
      "thought": "The user wants me to greet Bob. I should use the 'greet_person' tool.",
      "action": "greet_person",
      "action_input": {"name": "Bob"}
    }
    ```

    *(Note: The exact format depends on the agent type and model. Some models use explicit tool-calling formats like the one shown in Chapter 2's `ToolCallingAgent` example output.)*

2. **Act:** The `MultiStepAgent` receives this response.
    * It parses the response to identify the intended `action` (`greet_person`) and the `action_input` (`{"name": "Bob"}`).
    * It looks up the tool named `greet_person` in its `self.tools` dictionary.
    * It calls the `forward` method of that tool instance, passing the arguments from `action_input`. In our case: `greeting_tool.forward(name="Bob")`.
    * This executes our Python code inside the `forward` method.

3. **Observe:** The agent captures the return value from the `forward` method (e.g., `"Hello, Bob! Nice to meet you."`). This becomes the "observation" for this step.
    * This observation is stored in the [AgentMemory](04_agentmemory.md).
    * The cycle repeats: the agent thinks again, now considering the result of the greeting tool. It likely decides the task is complete and uses the built-in `final_answer` tool.
Here's a simplified diagram:

```mermaid
sequenceDiagram
    participant Agent as MultiStepAgent
    participant LLM as LLM Brain
    participant GreetTool as GreetingTool

    Agent->>LLM: Task: Greet Bob. Tools: [greet_person]. What next?
    LLM-->>Agent: Use tool 'greet_person' with name='Bob'
    Agent->>GreetTool: forward(name="Bob")
    GreetTool-->>Agent: "Hello, Bob! Nice to meet you." (Observation)
    Agent->>LLM: Observation: "Hello, Bob!..." Task done?
    LLM-->>Agent: Use tool 'final_answer' with "Hello, Bob!..."
    Agent-->>User: "Hello, Bob! Nice to meet you."
```
**Code Glimpse (Simplified `execute_tool_call`):**

Inside the `agents.py` file (specifically within agent types like `ToolCallingAgent`), there's logic similar to this (heavily simplified):

```python
# --- Simplified concept from agents.py ---
class SomeAgentType(MultiStepAgent):
    # ... other methods ...

    def execute_tool_call(self, tool_name: str, arguments: dict) -> Any:
        if tool_name in self.tools:
            # Find the tool in the agent's toolbox
            tool_instance = self.tools[tool_name]
            try:
                # Call the tool's forward method with the arguments!
                # This is where GreetingTool.forward(name="Bob") happens.
                result = tool_instance(**arguments)  # ** unpacks the dict
                return result
            except Exception as e:
                # Handle errors if the tool fails
                print(f"Error executing tool {tool_name}: {e}")
                return f"Error: Tool {tool_name} failed."
        elif tool_name == "final_answer":
            # Special handling for the final answer
            return arguments.get("answer", arguments)  # Return the final answer content
        else:
            # Handle the case where the tool name is not found
            return f"Error: Unknown tool {tool_name}."

    def step(self, memory_step: ActionStep):
        # ... (Agent thinks and gets LLM response) ...
        llm_response = ...  # result from self.model(...)

        if response_suggests_tool_call(llm_response):  # simplified check
            tool_name = ...   # parse the tool name from the response
            arguments = ...   # parse the arguments from the response

            # === ACT ===
            observation = self.execute_tool_call(tool_name, arguments)
            memory_step.observations = str(observation)  # Store observation

            if tool_name == "final_answer":
                return observation  # Signal that this is the final answer
        # ... (handle cases where LLM gives text instead of a tool call) ...
        return None  # Not the final answer yet
```

This shows the core idea: the agent gets the `tool_name` and `arguments` from the LLM, finds the corresponding `Tool` object, and calls its `forward` method using the arguments.
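The parsing that the glimpse above elides (`tool_name = ...`) can be sketched for the JSON-style reply shown earlier. This is only an illustration of the idea; `parse_json_action` is a made-up helper, not the library's actual parser, which is considerably more robust:

```python
import json

def parse_json_action(llm_text: str) -> tuple:
    """Pull the action name and its input out of a JSON-formatted LLM reply."""
    # Tolerate replies wrapped in a ```json ... ``` fence (requires Python 3.9+).
    cleaned = llm_text.strip().removeprefix("```json").removesuffix("```").strip()
    data = json.loads(cleaned)
    return data["action"], data.get("action_input", {})

reply = '{"thought": "Greet Bob.", "action": "greet_person", "action_input": {"name": "Bob"}}'
tool_name, arguments = parse_json_action(reply)
print(tool_name, arguments)  # greet_person {'name': 'Bob'}
```

With `tool_name` and `arguments` extracted, the agent can hand them straight to something like `execute_tool_call` above.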
## Common Built-in Tools

`SmolaAgents` comes with several useful tools ready to use (found in `default_tools.py`):

* **`DuckDuckGoSearchTool` (`web_search`)**: Searches the web using DuckDuckGo.
* **`PythonInterpreterTool` (`python_interpreter`)**: Executes Python code snippets safely. Very powerful for calculations, data manipulation, etc. (Used primarily by `CodeAgent`; see [Chapter 6: PythonExecutor](06_pythonexecutor.md).)
* **`VisitWebpageTool` (`visit_webpage`)**: Fetches the content of a webpage URL.
* **`FinalAnswerTool` (`final_answer`)**: A special, essential tool. The agent uses this *only* when it believes it has completed the task and has the final result. Calling this tool usually ends the agent's run. It's automatically added to every agent.

You can import and use these just like we used our `GreetingTool`:

```python
from smolagents import DuckDuckGoSearchTool  # FinalAnswerTool is usually added automatically

search_tool = DuckDuckGoSearchTool()
# calculator_tool = PythonInterpreterTool()  # Often used internally by CodeAgent

agent = MultiStepAgent(
    model=llm,
    tools=[search_tool]  # Agent can now search!
)
```

## Conclusion

Tools are the bridge between an agent's reasoning and the real world (or specific functionalities like code execution). They are reusable capabilities defined by their `name`, `description`, `inputs`, `output_type`, and the core logic in their `forward` method.

You've learned:

* Why agents need tools (like a chef needs utensils).
* The essential components of a `Tool` in `SmolaAgents`.
* How to create a simple custom tool (`GreetingTool`).
* How to give tools to your `MultiStepAgent`.
* How the agent uses the LLM's suggestions to select and execute the correct tool during the "Act" phase.
* About some common built-in tools.

By equipping your agent with the right set of tools, you dramatically expand the range of tasks it can accomplish! But as the agent takes multiple steps, using tools and getting results, how does it keep track of everything that has happened? That's where memory comes in.

**Next Chapter:** [Chapter 4: AgentMemory](04_agentmemory.md) - The Agent's Notepad.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
357
docs/SmolaAgents/04_agentmemory.md
Normal file
@@ -0,0 +1,357 @@

# Chapter 4: AgentMemory - The Agent's Notepad

Welcome back! In [Chapter 3: Tool](03_tool.md), we equipped our agent with "superpowers" – tools like web search or calculators that let it interact with the world and perform actions. We saw how the agent's "brain" (the LLM) decides which tool to use, and the agent executes it.

But wait... how does the agent remember what it has already done? If it searches for the capital of France in Step 1, how does it remember "Paris" when deciding what to do in Step 2 (like finding the weather in Paris)?

This is where **AgentMemory** comes in. Think of it as the agent's dedicated notepad or, even better, a **ship's logbook**.

## The Problem: An Agent with Amnesia

Imagine a captain sailing a ship on a long voyage. After each hour, they completely forget everything that happened before – the course they set, the islands they passed, the storms they weathered. How could they possibly reach their destination? They'd be lost!

❓ "Where am I? What was I doing?"

An agent without memory is like that forgetful captain. It might perform a single action correctly, but it wouldn't understand the context. It wouldn't know:

* What the original goal (task) was.
* What steps it has already taken.
* What results (observations) it got from those steps.
* What errors it might have encountered.

Without this history, the agent can't make informed decisions about what to do next. It can't build upon previous results or learn from mistakes within the same task.

## The Solution: The Ship's Logbook (`AgentMemory`)

The `AgentMemory` is the component that solves this problem. It automatically records every significant event during the agent's "voyage" (its execution run).

📜 "Log Entry: Searched 'Capital of France'. Result: 'Paris'."

Just like a ship's logbook helps the captain navigate, the `AgentMemory` helps the agent maintain context and proceed effectively.
## What Does the `AgentMemory` Store?

The `AgentMemory` keeps a chronological record of the agent's journey. For each run, it typically stores:

1. **System Prompt:** The initial instructions given to the agent's LLM brain (we'll see more in [Chapter 5: PromptTemplates](05_prompttemplates.md)).
2. **Initial Task:** The main goal the user gave the agent (e.g., "What is the capital of France, and what is its current weather?").
3. **Steps:** A list detailing each cycle of the agent's operation:
    * **Planning (Optional):** If the agent makes plans, the plan itself is recorded.
    * **Thinking:** The LLM's reasoning process and the action it decided to take (e.g., "Thought: I need the capital. Action: Use `search` tool").
    * **Action:** The specific [Tool](03_tool.md) called and the arguments used (e.g., `search("Capital of France")`). This could also be code execution for code-based agents.
    * **Observation:** The result received after performing the action (e.g., "Paris").
    * **Errors:** If something went wrong during the step (e.g., a tool failed), the error is noted.

This detailed history allows the agent (specifically, the LLM guiding it) to look back at any point and understand the full context before deciding the next move.
## How is `AgentMemory` Used? (Mostly Automatic!)

The good news is that you, as the user, usually don't need to interact directly with `AgentMemory`. The `MultiStepAgent` manages it automatically behind the scenes!

Here's the key interaction:

1. **Before "Thinking":** When the agent needs to decide the next step (the "Think" phase), the `MultiStepAgent` asks the `AgentMemory` to format the recorded history (task, previous actions, observations, errors) into a sequence of messages. This happens via a method called `write_memory_to_messages`.
2. **Consulting the Brain:** This formatted history is sent to the LLM via the [Model Interface](02_model_interface.md). This gives the LLM the full context it needs to propose a sensible next step. ("Okay, based on the task 'Capital and Weather', and the fact we just found 'Paris', what should we do now?")
3. **After "Acting" and "Observing":** Once the agent performs an action and gets an observation (or an error), the `MultiStepAgent` records this new information as a new step in the `AgentMemory`.

So, the memory is constantly being read from (to inform the LLM) and written to (to record new events).
## Example Revisited: Capital and Weather Logbook

Let's trace our "Capital of France and Weather" example from [Chapter 1: MultiStepAgent](01_multistepagent.md) and see what the `AgentMemory` logbook might look like (simplified):

**(Start of Run)**

1. **System Prompt:** Recorded (e.g., "You are a helpful assistant...")
2. **Task:** Recorded (`task: "What is the capital of France, and what is its current weather?"`)

**(Step 1)**

3. **Think/Action:** Recorded (`thought: "Need capital.", action: search("Capital of France")`)
4. **Observation:** Recorded (`observation: "Paris"`)

**(Step 2)**

5. **Think/Action:** Recorded (`thought: "Have capital (Paris), need weather.", action: weather("Paris")`)
6. **Observation:** Recorded (`observation: "Sunny, 25°C"`)

**(Step 3)**

7. **Think/Action:** Recorded (`thought: "Have capital and weather. Task complete.", action: final_answer("The capital of France is Paris, and the current weather there is Sunny, 25°C.")`)
8. **Observation:** Recorded (the result of `final_answer` becomes the final output).

**(End of Run)**

Now, before Step 2 started, the agent would read entries 1-4 from memory to give context to the LLM. Before Step 3, it would read entries 1-6. This prevents the agent from forgetting what it's doing!
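To make the logbook tangible, the run traced above could be represented as plain Python data. This is only a toy sketch using dicts (the real `AgentMemory` stores typed step objects), and the field names here are invented for illustration:

```python
memory_steps = [
    {"type": "task", "task": "What is the capital of France, and what is its current weather?"},
    {"type": "action", "step": 1, "tool": "search",
     "arguments": {"query": "Capital of France"}, "observation": "Paris"},
    {"type": "action", "step": 2, "tool": "weather",
     "arguments": {"city": "Paris"}, "observation": "Sunny, 25°C"},
    {"type": "action", "step": 3, "tool": "final_answer",
     "arguments": {"answer": "The capital of France is Paris, and the current weather "
                             "there is Sunny, 25°C."}, "observation": None},
]

# Before Step 2, the agent replays only what is recorded so far (task + step 1):
history_before_step_2 = memory_steps[:2]
for entry in history_before_step_2:
    print(entry["type"], entry.get("observation"))
# task None
# action Paris
```

Each "Think" phase works exactly like this slice-and-replay: everything recorded so far is handed to the LLM as context.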
## Under the Hood: Memory Structure

How does `SmolaAgents` actually implement this?

**Core Idea:** The `AgentMemory` object holds a list called `steps`. Each item in this list represents one distinct event or phase in the agent's run. These items are usually instances of specific "Step" classes.

**Key Step Types (Simplified from `memory.py`):**

* `SystemPromptStep`: Stores the initial system prompt text.
* `TaskStep`: Stores the user's task description (and potentially input images).
* `PlanningStep` (Optional): Stores any explicit plans the agent generates.
* `ActionStep`: The most common one, recording a single Think-Act-Observe cycle. It contains fields for:
    * `step_number`
    * `model_input_messages`: What was sent to the LLM for thinking.
    * `model_output_message`: The LLM's raw response (thought + action plan).
    * `tool_calls`: Which [Tool](03_tool.md) was called (name, arguments), stored as `ToolCall` objects.
    * `observations`: The result returned by the tool.
    * `error`: Any error that occurred.
    * `start_time`, `end_time`, `duration`: Timing information.
* `FinalAnswerStep`: A special step indicating the final result returned by the agent.

**Interaction Flow:**

Here's how the `MultiStepAgent` uses `AgentMemory`:
```mermaid
sequenceDiagram
    participant User
    participant MSA as MultiStepAgent
    participant Memory as AgentMemory
    participant Model as LLM Brain
    participant Tool

    User->>MSA: run("Task: Capital & Weather?")
    MSA->>Memory: Store TaskStep("Capital & Weather?")
    loop Think-Act-Observe Cycle (Step 1)
        MSA->>Memory: write_memory_to_messages() - Get History [Task]
        MSA->>Model: What's next? (with History)
        Model-->>MSA: Think: Need capital. Act: search(...)
        MSA->>Memory: Store LLM Response in new ActionStep
        MSA->>Tool: Execute search(...)
        Tool-->>MSA: Observation: "Paris"
        MSA->>Memory: Store Observation in current ActionStep
        MSA->>Memory: Append finished ActionStep to steps list
    end
    loop Think-Act-Observe Cycle (Step 2)
        MSA->>Memory: write_memory_to_messages() - Get History [Task, Step 1]
        MSA->>Model: What's next? (with History)
        Model-->>MSA: Think: Need weather. Act: weather(...)
        MSA->>Memory: Store LLM Response in new ActionStep
        MSA->>Tool: Execute weather(...)
        Tool-->>MSA: Observation: "Sunny, 25C"
        MSA->>Memory: Store Observation in current ActionStep
        MSA->>Memory: Append finished ActionStep to steps list
    end
    MSA-->>User: Final Answer
```
**Code Glimpse (Simplified):**

Let's look at some relevant pieces from `memory.py` and `agents.py`.

* **Memory Step Dataclasses (`memory.py`):** Define the structure of log entries.
```python
# --- File: memory.py (Simplified Step Structures) ---
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolCall:  # Represents a tool invocation request
    name: str
    arguments: Any
    id: str  # Unique ID for matching responses


@dataclass
class MemoryStep:  # Base class for all memory entries
    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Each step type knows how to format itself for the LLM
        raise NotImplementedError


@dataclass
class TaskStep(MemoryStep):
    task: str
    # ... (potentially images)

    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Full format: {"role": "user", "content": [{"type": "text", "text": "New task: ..."}]}
        # ... simplified here ...
        return [{"role": "user", "content": f"New task:\n{self.task}"}]


@dataclass
class ActionStep(MemoryStep):
    step_number: int
    # model_input_messages: List = None  # What was sent to the LLM
    model_output: str | None = None  # LLM's thought/action text
    tool_calls: List[ToolCall] | None = None  # Parsed tool calls
    observations: str | None = None  # Tool results or code output
    error: Any | None = None  # Any error encountered
    # ... other fields like timing ...

    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Formats the LLM output, tool calls, and observations/errors
        # into messages for the next LLM call.
        messages = []
        if self.model_output:
            messages.append({"role": "assistant", "content": self.model_output})
        if self.tool_calls:
            # Simplified representation
            messages.append({"role": "tool_call", "content": f"Calling: {self.tool_calls[0].name}(...)"})
        if self.observations:
            messages.append({"role": "tool_response", "content": f"Observation:\n{self.observations}"})
        if self.error:
            messages.append({"role": "tool_response", "content": f"Error:\n{self.error}"})
        return messages


# ... potentially other step types like SystemPromptStep, PlanningStep ...
```
* **AgentMemory Class (`memory.py`):** Holds the list of steps.

```python
# --- File: memory.py (Simplified AgentMemory) ---
from typing import List, Union


@dataclass
class SystemPromptStep(MemoryStep):  # Simplified
    system_prompt: str

    def to_messages(self, **kwargs):  # Simplified
        return [{"role": "system", "content": self.system_prompt}]


class AgentMemory:
    def __init__(self, system_prompt: str):
        # Initialize with the system prompt
        self.system_prompt = SystemPromptStep(system_prompt=system_prompt)
        # The main logbook - a list of steps taken
        # (PlanningStep is defined elsewhere in memory.py)
        self.steps: List[Union[TaskStep, ActionStep, PlanningStep]] = []

    def reset(self):
        """Clears the memory for a new run."""
        self.steps = []

    def replay(self, logger, detailed: bool = False):
        """Utility to print the memory steps nicely."""
        # ... implementation uses logger to print each step ...
        pass
```
* **Agent Using Memory (`agents.py`):** How `MultiStepAgent` reads and writes.

```python
# --- File: agents.py (Simplified MultiStepAgent interactions) ---
from .memory import AgentMemory, TaskStep, ActionStep, ToolCall  # Import memory components


class MultiStepAgent:
    def __init__(self, *args, memory: Optional[AgentMemory] = None, **kwargs):
        # ... setup model, tools ...
        self.system_prompt = self.initialize_system_prompt()  # Define system prompt
        # Create the memory instance
        self.memory = memory if memory is not None else AgentMemory(self.system_prompt)
        # ... setup logger, monitor ...

    def run(self, task: str, reset: bool = True, **kwargs):
        # ... setup ...
        if reset:  # Option to clear memory before a new run
            self.memory.reset()

        # Record the initial task in memory
        self.memory.steps.append(TaskStep(task=self.task))

        # Start the internal execution loop (_run)
        final_result = ...  # ... get result from _run ...
        return final_result

    def _run(self, task: str, max_steps: int, **kwargs) -> Generator:
        # ... loop initialization ...
        while final_answer is None and self.step_number <= max_steps:
            # ... (handle planning steps if enabled) ...

            # Create a placeholder for the current step's data
            action_step = self._create_action_step(...)

            try:
                # === Execute one step (Think -> Act -> Observe) ===
                # This method internally calls write_memory_to_messages,
                # calls the model, executes the tool, and populates
                # the 'action_step' object with results.
                final_answer = self._execute_step(task, action_step)
            except AgentError as e:
                # Record errors in the memory step
                action_step.error = e
            finally:
                # Finalize timing etc. for the step
                self._finalize_step(action_step, ...)
                # === Store the completed step in memory ===
                self.memory.steps.append(action_step)
                # ... yield step details ...
            self.step_number += 1
        # ... handle finish ...
        yield FinalAnswerStep(final_answer)

    def write_memory_to_messages(self, summary_mode: Optional[bool] = False) -> List[Dict[str, str]]:
        """
        Reads history from memory and formats it for the LLM.
        """
        messages = self.memory.system_prompt.to_messages(summary_mode=summary_mode)
        # Go through each step recorded in memory
        for memory_step in self.memory.steps:
            # Ask each step to format itself into messages
            messages.extend(memory_step.to_messages(summary_mode=summary_mode))
        return messages

    def _execute_step(self, task: str, memory_step: ActionStep) -> Union[None, Any]:
        self.logger.log_rule(f"Step {self.step_number}", level=LogLevel.INFO)
        # === THINK ===
        # 1. Get history from memory
        messages_for_llm = self.write_memory_to_messages()
        memory_step.model_input_messages = messages_for_llm  # Record input to LLM

        # 2. Call the LLM brain
        llm_response = self.model(messages_for_llm)  # Call the Model Interface
        memory_step.model_output_message = llm_response  # Record LLM response

        # 3. Parse the LLM response for an action
        # (Specific parsing logic depends on the agent type - ToolCallingAgent, CodeAgent)
        tool_name, arguments = self._parse_action(llm_response)  # Simplified
        memory_step.tool_calls = [ToolCall(name=tool_name, arguments=arguments, id=...)]

        # === ACT & OBSERVE ===
        # 4. Execute the action (tool call or code)
        observation = self._execute_action(tool_name, arguments)  # Simplified

        # 5. Record the observation
        memory_step.observations = str(observation)

        # 6. Check if it's the final answer
        if tool_name == "final_answer":
            return observation  # Return the final answer to stop the loop
        else:
            return None  # Continue to the next step

    # ... other methods like _create_action_step, _finalize_step ...
|
||||
```
|
||||
|
||||
**Key Takeaways from Code:**

* Memory holds a list of `Step` objects (`self.memory.steps`).
* The agent adds new `TaskStep` or `ActionStep` objects to this list as it progresses (`self.memory.steps.append(...)`).
* Before calling the LLM, `write_memory_to_messages` iterates through `self.memory.steps`, calling `to_messages()` on each step to build the history.
* Each step (like `ActionStep`) stores details like the LLM's output (`model_output`), tool calls (`tool_calls`), and results (`observations` or `error`).

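To make these takeaways concrete, here is a tiny, self-contained sketch of the memory mechanism (the classes and fields are simplified stand-ins for illustration, not the library's exact API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskStep:
    """Records the task the user gave the agent."""
    task: str

    def to_messages(self) -> list:
        return [{"role": "user", "content": f"New task:\n{self.task}"}]


@dataclass
class ActionStep:
    """Records one Think -> Act -> Observe cycle."""
    model_output: str = ""
    observations: str = ""
    error: Optional[str] = None

    def to_messages(self) -> list:
        result = self.error if self.error else self.observations
        return [
            {"role": "assistant", "content": self.model_output},
            {"role": "user", "content": f"Observation:\n{result}"},
        ]


class AgentMemory:
    """Holds the system prompt plus an ordered list of steps."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.steps: list = []

    def reset(self):
        self.steps.clear()


def write_memory_to_messages(memory: AgentMemory) -> list:
    """Flatten the whole history into a chat-message list for the LLM."""
    messages = [{"role": "system", "content": memory.system_prompt}]
    for step in memory.steps:
        messages.extend(step.to_messages())
    return messages


memory = AgentMemory("You are a helpful agent.")
memory.steps.append(TaskStep(task="What is 2 + 2?"))
memory.steps.append(ActionStep(model_output="Thought: use the calculator.", observations="4"))
print(len(write_memory_to_messages(memory)))  # → 4 (system + task + assistant + observation)
```

Each new step only *appends* to the list, so the LLM always sees the full history in order.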
## Conclusion

`AgentMemory` is the agent's essential logbook, providing the context needed to navigate complex, multi-step tasks. It diligently records the initial task, system instructions, and every action, observation, and error along the way.

You've learned:

* Why memory is crucial for agents (avoiding amnesia).
* The "ship's logbook" analogy.
* What kind of information `AgentMemory` stores (task, system prompt, steps with thoughts, actions, observations, errors).
* How the `MultiStepAgent` uses memory automatically: reading history before thinking, and writing results after acting/observing.
* The basic structure of `AgentMemory` and its `Step` objects (`TaskStep`, `ActionStep`).

While you often don't need to manipulate memory directly, understanding its role is key to understanding how agents maintain context and achieve complex goals. The content of this memory directly influences the prompts sent to the LLM. How can we customize those prompts? Let's find out!

**Next Chapter:** [Chapter 5: PromptTemplates](05_prompttemplates.md) - Customizing Your Agent's Instructions.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 5: PromptTemplates - Crafting Your Agent's Script

Welcome back! In [Chapter 4: AgentMemory](04_agentmemory.md), we learned how our agent uses its "logbook" (`AgentMemory`) to remember the task, its past actions, and observations. This memory is crucial for deciding the next step.

But how exactly does the agent *use* this memory to talk to its LLM brain ([Chapter 2: Model Interface](02_model_interface.md))? How does it tell the LLM:

* "Here's your overall job..."
* "Here are the tools ([Chapter 3: Tool](03_tool.md)) you can use..."
* "Here's the specific task..."
* "Here's what happened so far..."
* "Now, tell me what to do next!"

Simply dumping the raw memory might confuse the LLM. We need a structured way to present this information, like giving someone clear, consistent instructions. This is where **PromptTemplates** come in!

## The Problem: Giving Clear Instructions Every Time

Imagine you have a very capable assistant, but you need to explain their role and the current task *every single time* you talk to them. You'd want a standard way to do this, right? You'd probably have a template:

* "Good morning! Remember, your main goal is [Overall Goal]."
* "For this specific task, [Task Description], you have these resources available: [List of Resources]."
* "So far, we've done [Summary of Progress]."
* "What should we do next?"

If you just improvised every time, your instructions might be inconsistent, confusing, or miss important details.

Our AI agent faces the same challenge. It needs to send instructions (prompts) to the LLM at various points (the very beginning, before each step, and sometimes when planning). These instructions need to include:

* The agent's basic persona and rules.
* Descriptions of the available [Tools](03_tool.md).
* The current `task`.
* Relevant parts of the [AgentMemory](04_agentmemory.md).

How can we manage these instructions effectively and dynamically include the specific details for the current situation?

## The Solution: Mad Libs for Agents! (`PromptTemplates`)

Remember Mad Libs? The game where you have a story template with blanks like `[NOUN]`, `[VERB]`, `[ADJECTIVE]`, and you fill them in to create a funny story?

**PromptTemplates** in `SmolaAgents` work a lot like that!

* They are a collection of **pre-written instruction templates**.
* These templates have **placeholders** (like `{{ task }}` or `{{ tools }}`) for information that changes with each run or step.
* They use a powerful templating engine called **Jinja2** (common in web development) to fill in these blanks.
* The `MultiStepAgent` automatically picks the right template, fills in the blanks with current data (like the task description, tool list from [Chapter 3: Tool](03_tool.md), or memory summary from [Chapter 4: AgentMemory](04_agentmemory.md)), and sends the final, complete prompt to the LLM.

This ensures the LLM gets clear, consistent, and context-rich instructions every time.

## What's Inside the `PromptTemplates` Collection?

The `PromptTemplates` object is essentially a structured dictionary holding different template strings for different situations. The main ones are:

1. **`system_prompt`**: This is the **master instruction manual** given to the LLM at the very beginning of the conversation. It tells the LLM:
    * Its overall role or personality (e.g., "You are a helpful assistant that uses tools...").
    * The rules it must follow (e.g., "Always think step-by-step," "Use the `final_answer` tool when done.").
    * **Crucially, the descriptions of the available `{{ tools }}` and `{{ managed_agents }}` (if any).** This is how the LLM learns what capabilities the agent has!
    * The format it should use for its response (e.g., "Provide your reasoning in a 'Thought:' section and the action in a 'Code:' section").

2. **`planning`**: This group contains templates used only if the agent's planning feature is turned on (often for more complex tasks). It includes templates for:
    * Generating an initial plan based on the `{{ task }}` and `{{ tools }}`.
    * Updating the plan based on progress stored in memory.

    *(Planning is a bit more advanced, so we won't focus heavily on these templates here.)*

3. **`final_answer`**: These templates are used in specific scenarios, like when the agent hits its maximum step limit (`max_steps`) and needs the LLM to try and generate a final answer based on the conversation history (`{{ task }}`, memory).

4. **`managed_agent`**: If you build agents that can call *other* agents (like team members), these templates define how the calling agent instructs the "managed" agent (`{{ name }}`, `{{ task }}`) and how the result (`{{ final_answer }}`) is reported back.

The most important one for understanding basic agent behavior is the **`system_prompt`**. It sets the stage for the entire interaction.

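Conceptually, you can picture the whole collection as a nested dictionary of template strings. The sketch below is illustrative only; the field names inside `planning`, `final_answer`, and `managed_agent` are assumptions, so check the library's YAML files for the real schema:

```python
# A simplified, hypothetical sketch of a PromptTemplates collection.
# Only the four top-level keys mirror the chapter; inner names are made up.
prompt_templates = {
    "system_prompt": "You are a helpful assistant.\nTools:\n{{ tools }}",
    "planning": {
        "initial_plan": "Make a step-by-step plan for: {{ task }}",
        "update_plan": "Review progress on: {{ task }} and revise the plan.",
    },
    "final_answer": {
        "pre_messages": "You ran out of steps. Answer from the history so far.",
        "post_messages": "Now provide a final answer to: {{ task }}",
    },
    "managed_agent": {
        "task": "You're {{ name }}. Your manager asks you to: {{ task }}",
        "report": "Report from {{ name }}: {{ final_answer }}",
    },
}

print(sorted(prompt_templates.keys()))
# → ['final_answer', 'managed_agent', 'planning', 'system_prompt']
```

Each string still contains its `{{ ... }}` placeholders; they only get filled in at run time.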
## How It Works: Filling in the Blanks with Jinja2

Let's imagine a simplified `system_prompt` template:

```jinja
You are a helpful assistant.
Your task is to achieve the goal described by the user.
You have access to the following tools:
{{ tools }}

Think step-by-step and then choose a tool to use or use the final_answer tool.
```

Now, let's say our agent is created with a `SearchTool` and our `GreetingTool` from [Chapter 3: Tool](03_tool.md).

1. **Agent Starts:** The `MultiStepAgent` needs to prepare the initial message for the LLM.
2. **Get Template:** It retrieves the `system_prompt` template string.
3. **Get Data:** It gets the list of actual tool instances (`[SearchTool(...), GreetingTool(...)]`). It formats their names and descriptions into a string. Let's say this formatted string is:

    ```
    - web_search: Searches the web...
    - greet_person: Greets a person by name...
    - final_answer: Use this when you have the final answer...
    ```
4. **Fill Blanks (Render):** It uses the Jinja2 engine to replace `{{ tools }}` in the template with the formatted tool descriptions.
5. **Final Prompt:** The resulting prompt sent to the LLM would be:

    ```text
    You are a helpful assistant.
    Your task is to achieve the goal described by the user.
    You have access to the following tools:
    - web_search: Searches the web...
    - greet_person: Greets a person by name...
    - final_answer: Use this when you have the final answer...

    Think step-by-step and then choose a tool to use or use the final_answer tool.
    ```

This final, complete prompt gives the LLM all the context it needs to start working on the user's task.

Here's a diagram of the process:

```mermaid
graph LR
    A["Prompt Template String<br/>System Prompt with \{\{ tools \}\}"] --> C{Jinja2 Engine};
    B["Agent Data<br/>(Formatted Tool Descriptions)"] --> C;
    C --> D["Final Prompt String<br/>(System Prompt with actual tools listed)"];
    D --> E["LLM Brain"];
```

The agent uses similar logic for other templates, inserting `{{ task }}`, snippets from [AgentMemory](04_agentmemory.md), etc., as needed.

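The fill-in step above can be reproduced in a few lines with Jinja2 itself. This is a minimal sketch using the simplified template from this section, not the library's real system prompt:

```python
from jinja2 import StrictUndefined, Template

template_text = (
    "You are a helpful assistant.\n"
    "You have access to the following tools:\n"
    "{{ tools }}"
)

# The formatted tool descriptions the agent would have built from its Tool list.
tools_description = (
    "- web_search: Searches the web...\n"
    "- greet_person: Greets a person by name...\n"
    "- final_answer: Use this when you have the final answer..."
)

# StrictUndefined makes rendering fail loudly if a placeholder has no value,
# instead of silently inserting an empty string.
template = Template(template_text, undefined=StrictUndefined)
final_prompt = template.render(tools=tools_description)
print(final_prompt)
```

Try removing the `tools=` argument: with `StrictUndefined`, rendering raises an error instead of quietly producing a prompt with a missing tool list.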
## Using `PromptTemplates` in `SmolaAgents`

The good news is that `SmolaAgents` handles most of this automatically!

* **Defaults:** When you create an agent like `CodeAgent` or `ToolCallingAgent`, it comes pre-loaded with sophisticated default `PromptTemplates` tailored for that agent type. These defaults live in YAML files within the `SmolaAgents` library (e.g., `prompts/code_agent.yaml`, `prompts/toolcalling_agent.yaml`). These files define the `system_prompt`, `planning` prompts, etc., with all the necessary placeholders.

* **Automatic Loading:** The agent's `__init__` method loads these default templates unless you explicitly provide your own.

Let's look at a simplified snippet from `agents.py` showing how a `CodeAgent` might initialize its system prompt:

```python
# --- File: agents.py (Simplified CodeAgent __init__ and initialize_system_prompt) ---
import yaml
import importlib.resources

from .tools import Tool  # From Chapter 3
# MultiStepAgent, populate_template and PromptTemplates are defined in this same file


class CodeAgent(MultiStepAgent):
    def __init__(
        self,
        tools: list[Tool],
        model: callable,
        prompt_templates: PromptTemplates | None = None,  # Allow custom templates
        # ... other parameters ...
    ):
        # 1. Load default templates if none provided
        if prompt_templates is None:
            # Find the default 'code_agent.yaml' file
            default_yaml_path = importlib.resources.files("smolagents.prompts").joinpath("code_agent.yaml")
            # Load the templates from the YAML file
            prompt_templates = yaml.safe_load(default_yaml_path.read_text())

        # 2. Call the parent class init, passing the templates along
        super().__init__(
            tools=tools,
            model=model,
            prompt_templates=prompt_templates,  # Use loaded or provided templates
            # ... other parameters ...
        )
        # ... rest of CodeAgent setup ...
        # self.system_prompt is set later using initialize_system_prompt

    def initialize_system_prompt(self) -> str:
        """Creates the final system prompt string by filling the template."""
        # 3. Get necessary data (tools, managed agents, authorized imports)
        formatted_tools = ...  # format self.tools for the template
        formatted_managed_agents = ...  # format self.managed_agents
        authorized_imports = ...  # get list of allowed imports for CodeAgent

        # 4. Use the populate_template helper to fill in the blanks
        system_prompt_string = populate_template(
            template=self.prompt_templates["system_prompt"],  # Get the template string
            variables={  # Provide the data for the placeholders
                "tools": formatted_tools,
                "managed_agents": formatted_managed_agents,
                "authorized_imports": authorized_imports,
                # ... other potential variables ...
            },
        )
        return system_prompt_string

    # ... other CodeAgent methods ...


# --- Helper function used internally (Simplified from agents.py) ---
from jinja2 import Template, StrictUndefined


def populate_template(template: str, variables: dict) -> str:
    """Renders a Jinja2 template string with given variables."""
    compiled_template = Template(template, undefined=StrictUndefined)
    try:
        # This does the magic of replacing {{ placeholder }} with actual values
        return compiled_template.render(**variables)
    except Exception as e:
        raise Exception(f"Error rendering Jinja template: {e}") from e
```

**Explanation:**

1. **Load Defaults:** If the user doesn't provide custom `prompt_templates` when creating a `CodeAgent`, it loads the defaults from the `code_agent.yaml` file.
2. **Store Templates:** The loaded templates (either default or custom) are stored within the agent instance (via the `super().__init__` call).
3. **Get Data:** When the agent needs the final system prompt (e.g., during `run`), the `initialize_system_prompt` method gathers the current list of tools, managed agents, etc.
4. **Render Template:** It calls the `populate_template` helper function. This function uses Jinja2's `Template(...).render(...)` to take the `system_prompt` template string and the collected `variables` (tools, etc.) and produces the final, ready-to-use prompt string.

*For beginners, you usually don't need to write your own templates. The defaults are designed to work well.* However, understanding that these templates exist and how they work helps you understand *why* the agent behaves the way it does and how it knows about its tools.

If you *do* want to see what these templates look like, you can inspect the `.yaml` files inside the `smolagents/prompts/` directory in the library's source code. For example, here's a small part of a typical `system_prompt` for a `CodeAgent`:

```yaml
# --- Snippet from prompts/code_agent.yaml ---
system_prompt: |-
  You are an expert assistant who can solve any task using code blobs.
  # ... (lots of instructions and examples) ...

  You only have access to these tools:
  {%- for tool in tools.values() %}
  - {{ tool.name }}: {{ tool.description }}
      Takes inputs: {{tool.inputs}}
      Returns an output of type: {{tool.output_type}}
  {%- endfor %}

  {%- if managed_agents and managed_agents.values() | list %}
  You can also give tasks to team members.
  # ... (instructions for managed agents) ...
  {%- for agent in managed_agents.values() %}
  - {{ agent.name }}: {{ agent.description }}
  {%- endfor %}
  {%- endif %}

  Here are the rules you should always follow:
  # ... (more rules) ...
  You can use imports in your code, but only from the following list of modules: {{authorized_imports}}
  # ... (rest of the prompt) ...
```

You can see the `{{ tools }}`, `{{ managed_agents }}`, and `{{ authorized_imports }}` placeholders ready to be filled in. The `{%- for ... %}` syntax is Jinja2's way of looping through lists (like the list of tools).

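You can see how that loop expands by rendering a stripped-down version of it yourself. The `SimpleTool` class and the tool data below are made up for illustration; only the Jinja2 loop syntax matches the snippet above:

```python
from dataclasses import dataclass

from jinja2 import StrictUndefined, Template


@dataclass
class SimpleTool:
    """A stand-in for the real Tool class from Chapter 3."""
    name: str
    description: str


# The same {%- for ... %} loop shape as in code_agent.yaml, minus details.
template_text = (
    "You only have access to these tools:\n"
    "{%- for tool in tools.values() %}\n"
    "- {{ tool.name }}: {{ tool.description }}\n"
    "{%- endfor %}"
)

tools = {
    "web_search": SimpleTool("web_search", "Searches the web."),
    "final_answer": SimpleTool("final_answer", "Returns the final answer."),
}

rendered = Template(template_text, undefined=StrictUndefined).render(tools=tools)
print(rendered)
```

The `{%-` variants strip the surrounding whitespace, which is why the rendered list comes out as tidy one-line bullets instead of having blank lines between entries.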
## Conclusion

`PromptTemplates` are the unsung heroes that shape the conversation between the agent and its LLM brain. They act like customizable scripts or Mad Libs templates, ensuring the LLM receives clear, consistent instructions filled with the specific details it needs (like the task, available tools, and memory context).

You've learned:

* Why structured prompts are necessary for guiding LLMs effectively.
* The "Mad Libs" analogy for `PromptTemplates`.
* How Jinja2 is used to fill placeholders like `{{ task }}` and `{{ tools }}`.
* The main types of prompts stored (`system_prompt`, `planning`, `final_answer`, `managed_agent`).
* That `SmolaAgents` provides sensible default templates, especially the crucial `system_prompt`.
* How the agent automatically renders these templates with current data before sending them to the LLM.

Understanding `PromptTemplates` helps you grasp how the agent frames its requests to the LLM. While you might stick to the defaults initially, knowing this mechanism exists opens the door to customizing agent behavior later on.

One of the most powerful capabilities often described in these prompts, especially for `CodeAgent`, is the ability to execute Python code. How is that done safely? Let's find out!

**Next Chapter:** [Chapter 6: PythonExecutor](06_pythonexecutor.md) - Running Code Safely.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 6: PythonExecutor - Running Code Safely

Welcome back! In [Chapter 5: PromptTemplates](05_prompttemplates.md), we saw how agents use templates to create clear instructions for their LLM brain. These instructions often involve asking the LLM to generate code, especially for agents like `CodeAgent`, which are designed to solve problems by writing and running Python.

But wait... running code generated by an AI? Isn't that risky? What if the AI generates code that tries to delete your files, access sensitive information, or just crashes?

This is a very valid concern! You wouldn't want an AI assistant to accidentally (or intentionally!) cause harm to your computer. We need a secure way to run this generated code.

This is exactly the problem the **`PythonExecutor`** solves!

## The Problem: Running Untrusted Code

Imagine you have a brilliant but slightly unpredictable scientist (the `CodeAgent`) who comes up with new experiments (Python code snippets) to solve problems. You want the results of these experiments, but you can't let the scientist run them directly in your main lab (your computer) because they might spill dangerous chemicals or break expensive equipment.

Directly executing AI-generated code is like letting that unpredictable scientist run wild. We need a controlled environment.

## The Solution: The Secure Laboratory (`PythonExecutor`)

The `PythonExecutor` acts like a **secure, isolated laboratory** or a **sandbox** for the code generated by the `CodeAgent`.

Think of it this way:

1. **Isolation:** The `PythonExecutor` creates a safe space, separate from your main system, where the code can run. If the code tries to do something harmful, the damage is contained within this sandbox and doesn't affect your computer.
2. **Execution:** It takes the Python code snippet provided by the `CodeAgent` and runs it within this safe environment.
3. **State Management:** Just like a real lab keeps track of ongoing experiments, the `PythonExecutor` can remember variables and state *between* different code snippets run in sequence. If one snippet calculates `x = 5`, the next snippet run by the same executor will know the value of `x`.
4. **Capture Results:** It carefully observes what happens inside the sandbox, capturing any output produced by the code (like results from `print()` statements) and the final result of the code snippet.
5. **Handle Errors:** If the code crashes or produces an error, the `PythonExecutor` catches the error message instead of letting it crash the whole agent.

Essentially, the `PythonExecutor` allows the `CodeAgent` to "run experiments" safely and report back the findings (or failures) without endangering the outside world.

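Points 3 to 5 can be illustrated with a bare-bones executor that keeps shared state between snippets and captures printed output. This is a toy sketch built on plain `exec`, with none of the safety restrictions a real executor adds:

```python
import contextlib
import io


class ToyExecutor:
    """Runs snippets sequentially, sharing variables and capturing prints.

    Illustration only: plain exec() is NOT sandboxed or safe for untrusted code.
    """

    def __init__(self):
        self.state = {}  # variables persist here between snippets

    def run(self, code: str):
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.state)  # executes in the shared state dict
        except Exception as e:
            # Catch the error instead of letting it crash the agent
            return None, f"Error: {e}"
        return self.state, buffer.getvalue()


executor = ToyExecutor()
executor.run("x = 5")                       # first snippet defines x
state, logs = executor.run("print(x * 2)")  # second snippet still sees x
print(logs.strip())  # → 10
```

The second snippet can use `x` because both snippets execute against the same `state` dictionary; a crash in one snippet comes back as an error string rather than an exception.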
## How Does the `CodeAgent` Use It? (Mostly Automatic!)

For beginners, the great news is that the `CodeAgent` handles the `PythonExecutor` automatically! When you create a `CodeAgent`, it usually sets up a `PythonExecutor` behind the scenes.

```python
# --- File: create_code_agent.py ---
from smolagents import CodeAgent
from smolagents.models import LiteLLMModel  # From Chapter 2
# Assume we have some tools defined, maybe a search tool
from smolagents.tools import DuckDuckGoSearchTool

search_tool = DuckDuckGoSearchTool()

# Choose a language model
llm = LiteLLMModel(model_id="gpt-4-turbo")  # Needs API key setup

# Create the CodeAgent
# It automatically creates a PythonExecutor internally!
agent = CodeAgent(
    model=llm,
    tools=[search_tool],
    # By default, executor_type="local" is used
)

print("CodeAgent created with an internal PythonExecutor.")

# Now, when you run the agent:
# task = "Calculate the square root of 1764 and tell me the result."
# result = agent.run(task)
# print(f"Result: {result}")
# --> The agent will generate code like "import math; result = math.sqrt(1764); final_answer(result)"
# --> It will pass this code to its PythonExecutor to run safely.
# --> The executor runs it, captures the result (42.0), and returns it to the agent.
# --> The agent then uses the final_answer tool.
```

**Explanation:**

* When we create the `CodeAgent`, we don't explicitly create a `PythonExecutor`. The `CodeAgent`'s initialization logic does this for us.
* By default, it uses a `LocalPythonExecutor`, which runs the code in a restricted local environment.
* When `agent.run()` is called and the LLM generates Python code, the `CodeAgent` automatically passes that code to its internal `python_executor` instance for execution.

## Local vs. Remote Execution

`SmolaAgents` offers different types of executors for varying levels of security and environment needs:

1. **`LocalPythonExecutor` (Default):**
    * Runs the code within the same Python process as your agent, but uses clever techniques (like parsing the code's Abstract Syntax Tree, or AST) to restrict dangerous operations (like file system access or arbitrary imports).
    * It's the simplest to set up (usually requires no extra installation).
    * It's generally safe for many tasks, but a very complex or malicious piece of code *might* potentially find ways around the restrictions (though this is difficult).

2. **`DockerExecutor`:**
    * Runs the code inside a separate Docker container. Docker provides strong isolation from your main system.
    * Requires Docker to be installed and running on your machine.
    * Offers better security than the local executor.

3. **`E2BExecutor`:**
    * Uses a cloud service (E2B.dev) to provide secure, sandboxed cloud environments for code execution.
    * Requires an E2B account and API key.
    * Offers very strong security and avoids needing Docker locally, but relies on an external service.

**How to Choose?**

* **Beginners:** Stick with the default `LocalPythonExecutor`. It's usually sufficient and requires no extra setup.
* **Need Higher Security:** If you're running potentially riskier code or need stronger guarantees, consider the `DockerExecutor` (if you have Docker) or the `E2BExecutor`.

You can specify the executor type when creating the `CodeAgent`:

```python
# Example: Using a Docker executor (if Docker is installed and running)
docker_agent = CodeAgent(
    model=llm,
    tools=[search_tool],
    executor_type="docker",  # Tell the agent to use Docker
    # You might need to pass executor_kwargs for specific configurations
)

# Example: Using E2B (requires E2B setup and an API key in the environment)
# pip install 'smolagents[e2b]'
e2b_agent = CodeAgent(
    model=llm,
    tools=[search_tool],
    executor_type="e2b",  # Tell the agent to use E2B
)
```

For the rest of this chapter, we'll mostly focus on the concepts common to all executors, using the default `LocalPythonExecutor` as the main example.

## Under the Hood: How Execution Works

Let's trace what happens when `CodeAgent` decides to run a piece of code:

1. **Agent (Think):** The LLM generates a response containing Python code, like:

    ```python
    # Thought: I need to calculate 5 * 10.
    result = 5 * 10
    print(f"The intermediate result is: {result}")
    final_answer(result)
    ```

2. **Agent (Act - Parse):** The `CodeAgent` extracts the Python code block.
3. **Agent (Act - Execute):** The `CodeAgent` calls its `python_executor` instance, passing the code string: `output, logs, is_final = self.python_executor(code_string)`
4. **Executor (Prepare):** The `PythonExecutor` (e.g., `LocalPythonExecutor`) gets ready. It knows the current state (variables defined in previous steps).
5. **Executor (Run Safely):**
    * `LocalPythonExecutor`: Parses the code into an Abstract Syntax Tree (AST). It walks through the tree, evaluating allowed operations (math, variable assignments, safe function calls) and blocking dangerous ones (like `os.system`). It executes the code within the current `state`.
    * `DockerExecutor`/`E2BExecutor`: Sends the code to the remote environment (Docker container or E2B sandbox) for execution.
6. **Executor (Capture):** It intercepts any output sent to `print()` (captured in `logs`) and gets the final value returned by the code block (if any, captured in `output`). It also checks whether the special `final_answer()` function was called (indicated by `is_final`).
7. **Executor (Update State):** If the code assigned variables (like `result = 50`), the executor updates its internal `state` dictionary.
8. **Agent (Observe):** The `CodeAgent` receives the `output`, `logs`, and `is_final` flag from the executor. This becomes the "Observation" for the current step. If `is_final` is true, the agent knows the task is complete.

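Step 5's AST-based restriction can be sketched in miniature: parse the snippet, walk the tree, and reject disallowed imports before executing anything. This is a toy illustration, far less thorough than the real `LocalPythonExecutor` (which also restricts calls, attributes, and more):

```python
import ast
import contextlib
import io

ALLOWED_IMPORTS = {"math", "re", "json"}  # a tiny allow-list for illustration


def run_restricted(code: str, state: dict) -> str:
    """Parse the snippet, reject forbidden imports, then execute it."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in node.names] if isinstance(node, ast.Import) else [node.module]
            for name in names:
                root = (name or "").split(".")[0]
                if root not in ALLOWED_IMPORTS:
                    raise PermissionError(f"Import of '{root}' is not authorized")
    # Import check passed: run the code, capturing printed output
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<agent_code>", "exec"), state)
    return buffer.getvalue()


state = {}
print(run_restricted("import math\nprint(math.sqrt(1764))", state).strip())  # → 42.0
try:
    run_restricted("import os\nos.listdir('/')", state)
except PermissionError as e:
    print(e)  # the 'os' import is blocked before anything runs
```

Because the check happens on the parsed tree *before* `exec`, the forbidden snippet is rejected without ever running.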
**Diagram:**

```mermaid
sequenceDiagram
    participant Agent as CodeAgent
    participant Executor as PythonExecutor (e.g., Local)
    participant SafeEnv as Safe Execution Env (AST walk / Docker / E2B)
    participant State as Executor State

    Agent->>Executor: execute(code_string)
    Executor->>State: Get current variables
    Executor->>SafeEnv: Run code_string safely
    SafeEnv->>SafeEnv: Execute line by line (e.g., result = 5 * 10)
    SafeEnv-->>State: Update variable 'result' = 50
    SafeEnv->>Executor: Capture print() output ("The intermediate result is: 50")
    SafeEnv->>Executor: Capture final result (50)
    SafeEnv->>Executor: Indicate if final_answer() was called
    Executor-->>Agent: Return: output=50, logs="...", is_final=True
```

## Code Glimpse: Where is the Executor Used?

Let's look at simplified snippets showing the key interactions.

* **`CodeAgent` Initialization (`agents.py`):** Creates the executor instance.

  ```python
  # --- File: agents.py (Simplified CodeAgent __init__) ---
  from .local_python_executor import LocalPythonExecutor, PythonExecutor
  from .remote_executors import DockerExecutor, E2BExecutor


  class CodeAgent(MultiStepAgent):
      def __init__(
          self,
          # ... model, tools, etc. ...
          executor_type: str | None = "local",  # Default is local
          executor_kwargs: Optional[Dict[str, Any]] = None,
          additional_authorized_imports: Optional[List[str]] = None,
          max_print_outputs_length: Optional[int] = None,
          # ... other kwargs ...
      ):
          # ... setup basic agent parts ...
          self.executor_type = executor_type or "local"
          self.executor_kwargs = executor_kwargs or {}
          self.additional_authorized_imports = additional_authorized_imports or []
          self.max_print_outputs_length = max_print_outputs_length

          # Create the appropriate executor instance based on type
          self.python_executor: PythonExecutor = self.create_python_executor()

          # ... rest of setup ...
          # Send initial state/tools to the executor if needed
          if getattr(self, "python_executor", None):
              self.python_executor.send_variables(variables=self.state)
              self.python_executor.send_tools({**self.tools, **self.managed_agents})

      def create_python_executor(self) -> PythonExecutor:
          """Helper method to create the executor instance."""
          match self.executor_type:
              case "e2b":
                  return E2BExecutor(self.additional_authorized_imports, self.logger, **self.executor_kwargs)
              case "docker":
                  return DockerExecutor(self.additional_authorized_imports, self.logger, **self.executor_kwargs)
              case "local":
                  return LocalPythonExecutor(
                      self.additional_authorized_imports,
                      max_print_outputs_length=self.max_print_outputs_length,
                  )
              case _:
                  raise ValueError(f"Unsupported executor type: {self.executor_type}")
  ```

  * The `CodeAgent` takes `executor_type` and related arguments.
  * The `create_python_executor` method instantiates the correct class (`LocalPythonExecutor`, `DockerExecutor`, or `E2BExecutor`).
  * Initial tools and state might be sent to the executor using `send_tools` and `send_variables`.

* **`CodeAgent` Step Execution (`agents.py`):** Uses the executor instance.

```python
# --- File: agents.py (Simplified CodeAgent step) ---
from .utils import parse_code_blobs  # Helper to extract code
from .local_python_executor import fix_final_answer_code  # Helper

class CodeAgent(MultiStepAgent):
    def step(self, memory_step: ActionStep) -> Union[None, Any]:
        # ... (Agent thinks, gets LLM response with code) ...
        model_output = chat_message.content

        # Parse the code from the LLM response
        try:
            # parse_code_blobs finds ```python ... ``` blocks
            # fix_final_answer_code ensures `final_answer = x` becomes `final_answer(x)`
            code_action = fix_final_answer_code(parse_code_blobs(model_output))
        except Exception as e:
            # Handle parsing errors
            raise AgentParsingError(...)

        # === Execute the code using the PythonExecutor ===
        self.logger.log_code(title="Executing parsed code:", content=code_action, ...)
        try:
            # THE CORE CALL to the executor
            output, execution_logs, is_final_answer = self.python_executor(code_action)

            # Store results in the memory step
            memory_step.observations = f"Execution logs:\n{execution_logs}\nLast output:\n{output}"
            memory_step.action_output = output
        except Exception as e:
            # Handle execution errors reported by the executor
            raise AgentExecutionError(...)

        # Return the output if it's the final answer, otherwise None
        return output if is_final_answer else None
    # ...
```

* The `step` method gets code from the LLM.
* It calls `self.python_executor(code_action)`, which triggers the executor's `__call__` method.
* It receives the `output`, `logs`, and `is_final_answer` flag back from the executor.

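The `parse_code_blobs` helper essentially pulls the fenced ```` ```python ```` blocks out of the LLM's free-form text. A minimal, hypothetical version of that step (the real helper in `smolagents` is more robust; the function name here is illustrative) could look like:

```python
import re

def extract_python_blocks(text: str) -> str:
    """Concatenate the contents of all ```python ... ``` fences in `text`."""
    matches = re.findall(r"```(?:python|py)?\n(.*?)```", text, re.DOTALL)
    if not matches:
        raise ValueError("No code block found in LLM output")
    return "\n\n".join(m.strip() for m in matches)

llm_output = "Thought: I will compute it.\n```python\nresult = 6 * 7\nfinal_answer(result)\n```"
print(extract_python_blocks(llm_output))
```

If the model forgets to emit a fenced block, the helper raises, which is what surfaces as an `AgentParsingError` in the step above.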
* **`LocalPythonExecutor` Execution (`local_python_executor.py`):** The core logic for local execution.

```python
# --- File: local_python_executor.py (Simplified LocalPythonExecutor __call__) ---
# evaluate_python_code and BASE_PYTHON_TOOLS are defined later in this same file
from .tools import Tool  # For type hinting

class LocalPythonExecutor(PythonExecutor):
    def __init__(self, additional_authorized_imports, max_print_outputs_length=None):
        self.custom_tools = {}  # Stores functions defined in code
        self.state = {}  # Stores variables
        self.max_print_outputs_length = max_print_outputs_length or 50000
        self.additional_authorized_imports = additional_authorized_imports
        self.authorized_imports = []  # ... combines base and additional imports ...
        self.static_tools = None  # Will hold agent tools + safe builtins

    def __call__(self, code_action: str) -> Tuple[Any, str, bool]:
        """Runs the code using the safe evaluate_python_code function."""
        output, is_final_answer = evaluate_python_code(
            code=code_action,
            static_tools=self.static_tools,    # Tools provided by the agent
            custom_tools=self.custom_tools,    # Functions defined during execution
            state=self.state,                  # Current variables
            authorized_imports=self.authorized_imports,  # Allowed imports
            max_print_outputs_length=self.max_print_outputs_length,
        )
        # Get captured print logs from the state
        logs = str(self.state.get("_print_outputs", ""))
        return output, logs, is_final_answer

    def send_variables(self, variables: dict):
        """Adds external variables to the executor's state."""
        self.state.update(variables)

    def send_tools(self, tools: Dict[str, Tool]):
        """Makes agent tools available to the executed code."""
        # Combine agent tools with safe Python builtins (like len, str, math functions)
        self.static_tools = {**tools, **BASE_PYTHON_TOOLS.copy()}

# --- Also in local_python_executor.py ---
def evaluate_python_code(code, static_tools, custom_tools, state, authorized_imports, ...):
    """
    Safely evaluates code by parsing it to an AST and walking the tree.
    - Parses the `code` string into an Abstract Syntax Tree (AST).
    - Initializes `state['_print_outputs']` to capture prints.
    - Defines a `final_answer` wrapper to signal completion.
    - Iterates through the AST nodes using `evaluate_ast`, which recursively
      handles different node types (assignments, calls, loops, etc.):
      - It uses `state` to read/write variables.
      - It checks calls against `static_tools` and `custom_tools`.
      - It enforces `authorized_imports`.
      - It blocks dangerous operations (e.g., direct `eval`, certain imports).
    - Captures print outputs in `state['_print_outputs']`.
    - Handles errors gracefully.
    - Returns the final `result` and the `is_final_answer` flag.
    """
    # ... implementation details ...
    try:
        expression = ast.parse(code)  # Parse code to AST
        # ... set up state, wrap final_answer ...
        for node in expression.body:
            result = evaluate_ast(node, state, static_tools, custom_tools, authorized_imports)  # Evaluate node by node
        # ... capture logs, handle exceptions ...
        return result, is_final_answer
    except FinalAnswerException as e:
        # ... capture logs ...
        return e.value, True  # Special exception raised by final_answer
    except Exception as e:
        # ... capture logs, wrap error ...
        raise InterpreterError(...)

def evaluate_ast(expression: ast.AST, state, static_tools, custom_tools, authorized_imports):
    """Recursive function to evaluate a single AST node safely."""
    # ... checks node type (ast.Assign, ast.Call, ast.Import, etc.) ...
    # ... performs the corresponding safe operation using state and tools ...
    # ... raises InterpreterError for disallowed operations ...
    pass
```

* The `LocalPythonExecutor`'s `__call__` method relies heavily on `evaluate_python_code`.
* `evaluate_python_code` parses the code into an AST and evaluates it node by node using `evaluate_ast`, maintaining `state` and respecting the allowed `tools` and `authorized_imports`.
* The `send_variables` and `send_tools` methods prepare the `state` and the available functions for the executor.

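To make the AST-walking idea concrete, here is a tiny, self-contained interpreter in the same spirit. It is a toy sketch, not the `smolagents` implementation: it only handles assignments, names, constants, a few binary operators, and whitelisted function calls, and rejects everything else (including imports):

```python
import ast
import operator

ALLOWED_FUNCS = {"len": len, "abs": abs}  # stand-in for static_tools
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def eval_node(node, state):
    """Evaluate a single AST node against the variable `state` dict."""
    if isinstance(node, ast.Assign):          # x = <expr>
        value = eval_node(node.value, state)
        for target in node.targets:
            state[target.id] = value
        return value
    if isinstance(node, ast.Expr):            # bare expression statement
        return eval_node(node.value, state)
    if isinstance(node, ast.Constant):        # literals: 2, 'abcd', ...
        return node.value
    if isinstance(node, ast.Name):            # variable lookup
        return state[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](eval_node(node.left, state),
                                      eval_node(node.right, state))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        func = ALLOWED_FUNCS.get(node.func.id)
        if func is None:
            raise ValueError(f"Call to disallowed function: {node.func.id}")
        return func(*(eval_node(arg, state) for arg in node.args))
    raise ValueError(f"Disallowed syntax: {type(node).__name__}")

def safe_eval(code, state):
    """Parse `code` and evaluate it statement by statement, like evaluate_python_code."""
    result = None
    for stmt in ast.parse(code).body:
        result = eval_node(stmt, state)
    return result

state = {}
print(safe_eval("x = 2 + 3\ny = x * 10\nlen('abcd') + y", state))  # → 54
```

Anything not explicitly allowed — an `import os`, an attribute access, a call to `eval` — simply has no handler and raises, which is the same "default deny" posture `evaluate_ast` takes.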
## Conclusion

The `PythonExecutor` is a critical safety component in `SmolaAgents`, especially when using `CodeAgent`. It provides a secure sandbox (local or remote) for executing AI-generated Python code, preventing potential harm while still letting the agent leverage code for complex calculations, data manipulation, and interacting with tools.

You've learned:

* Why safe code execution is essential when dealing with AI-generated code.
* The "secure laboratory" analogy for `PythonExecutor`.
* Its key responsibilities: isolation, execution, state management, and capturing output/errors.
* How `CodeAgent` uses it automatically (usually the `LocalPythonExecutor` by default).
* The difference between `LocalPythonExecutor`, `DockerExecutor`, and `E2BExecutor`.
* The basic flow of execution: Agent -> Executor -> Safe Environment -> State -> Executor -> Agent.
* Where the executor is created and used within the `CodeAgent` code.

While you might not interact with the `PythonExecutor` directly very often as a beginner, understanding its role is crucial for trusting your agents and knowing how they perform code-based actions safely.

So far, we've seen `CodeAgent` and `ToolCallingAgent`. Are these the only types of agents? How can we define different agent behaviors?

**Next Chapter:** [Chapter 7: AgentType](07_agenttype.md) - Handling More Than Just Text.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 7: AgentType - Handling More Than Just Text

Welcome back! In the previous chapters, especially when discussing [Tools](03_tool.md) and the [PythonExecutor](06_pythonexecutor.md), we saw how agents can perform actions and generate results. So far, we've mostly focused on text-based tasks and results.

But what happens when an agent needs to work with images, audio, or other types of data? For example:

* An agent uses a tool to generate an image based on a description.
* An agent uses a tool to transcribe an audio file into text.
* An agent receives an image as input and needs to describe it.

How does the `SmolaAgents` framework handle these different kinds of data consistently? How does it make sure an image generated by a tool is displayed correctly in your notebook, or saved properly in the agent's [Memory](04_agentmemory.md)?

This is where the **`AgentType`** concept comes in!

## The Problem: Shipping Different Kinds of Cargo

Imagine you run a shipping company. Most of the time, you ship standard boxes (like text). But sometimes, customers need to ship different things:

* Fresh produce that needs a refrigerated container (like audio data).
* Large machinery that needs a flatbed truck (like image data).

You can't just stuff the fresh produce into a standard box – it would spoil! And the machinery won't even fit. You need specialized containers designed for specific types of cargo.

  

Similarly, our agents need a way to handle data beyond simple text strings. Using Python's built-in types directly (like a raw `PIL.Image` object for images) can cause problems:

* **How do you display it?** A raw image object doesn't automatically show up as a picture in a Jupyter notebook.
* **How do you save it?** How do you store an image or audio clip in the agent's text-based [Memory](04_agentmemory.md) log? You can't just put the raw image data there.
* **How do you pass it around?** How does the framework ensure different components (tools, agent core, memory) know how to handle these different data types consistently?

## The Solution: Specialized Data Containers (`AgentType`)

`SmolaAgents` introduces special "data containers" to solve this problem. These are custom data types that inherit from a base `AgentType` class:

* **`AgentText`**: For handling plain text. It behaves just like a standard Python string.
* **`AgentImage`**: For handling images (usually as `PIL.Image` objects).
* **`AgentAudio`**: For handling audio data (often as `torch.Tensor` objects or file paths).

Think of these as the specialized shipping containers:

* `AgentText` is like the standard shipping box.
* `AgentImage` is like a container designed to safely transport and display pictures.
* `AgentAudio` is like a container designed to safely transport and play audio clips.

These `AgentType` objects **wrap** the actual data (the string, the image object, the audio data) but add extra capabilities.

## Why Use `AgentType`? (The Benefits)

Using these specialized containers gives us several advantages:

1. **Consistent Handling:** The `SmolaAgents` framework knows how to recognize and work with `AgentType` objects, regardless of whether they contain text, images, or audio.
2. **Smart Display:** Objects like `AgentImage` and `AgentAudio` know how to display themselves correctly in environments like Jupyter notebooks or Gradio interfaces. For example, an `AgentImage` will automatically render as an image, not just print `<PIL.Image.Image ...>`.
3. **Proper Serialization:** They know how to convert themselves into a string representation suitable for logging or storing in [Memory](04_agentmemory.md).
    * `AgentText` simply returns its string content.
    * `AgentImage` automatically saves the image to a temporary file and returns the *path* to that file when converted to a string (via its `to_string()` method). This path can be safely logged.
    * `AgentAudio` does something similar for audio data, saving it to a temporary `.wav` file.
4. **Clear Communication:** Tools can clearly state what type of output they produce (e.g., `output_type="image"`), and the framework ensures the output is wrapped correctly.

## How is `AgentType` Used? (Mostly Automatic!)

The best part is that you often don't need to manually create or handle these `AgentType` objects. The framework does the heavy lifting.

**Scenario 1: A Tool Returning an Image**

Imagine you have a tool that generates images using a library like `diffusers`.

```python
# --- File: image_tool.py ---
from smolagents import Tool
from PIL import Image
# Assume 'diffusion_pipeline' is a pre-loaded image generation model
# from diffusers import DiffusionPipeline
# diffusion_pipeline = DiffusionPipeline.from_pretrained(...)

class ImageGeneratorTool(Tool):
    name: str = "image_generator"
    description: str = "Generates an image based on a text prompt."
    inputs: dict = {
        "prompt": {
            "type": "string",
            "description": "The text description for the image."
        }
    }
    # Tell the framework this tool outputs an image!
    output_type: str = "image"  # <--- Crucial hint!

    def forward(self, prompt: str) -> Image.Image:
        """Generates the image using a diffusion model."""
        print(f"--- ImageGeneratorTool generating image for: '{prompt}' ---")
        # image = diffusion_pipeline(prompt).images[0]  # Actual generation
        # For simplicity, let's create a dummy blank image
        image = Image.new('RGB', (60, 30), color='red')
        print("--- Tool returning a PIL Image object ---")
        return image

# --- How the framework uses it (conceptual) ---
image_tool = ImageGeneratorTool()
prompt = "A red rectangle"
raw_output = image_tool(prompt=prompt)  # Calls forward(), gets a PIL.Image object

# The framework automatically wraps the output because output_type="image",
# using handle_agent_output_types(raw_output, output_type="image")
from smolagents.agent_types import handle_agent_output_types
wrapped_output = handle_agent_output_types(raw_output, output_type="image")

print(f"Raw output type: {type(raw_output)}")
print(f"Wrapped output type: {type(wrapped_output)}")

# When storing in memory or logging, the framework calls to_string()
output_string = wrapped_output.to_string()
print(f"String representation for logs: {output_string}")

# Expected Output (path will vary):
# --- ImageGeneratorTool generating image for: 'A red rectangle' ---
# --- Tool returning a PIL Image object ---
# Raw output type: <class 'PIL.Image.Image'>
# Wrapped output type: <class 'smolagents.agent_types.AgentImage'>
# String representation for logs: /tmp/tmpxxxxxx/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.png
```

**Explanation:**

1. We define `ImageGeneratorTool` and, crucially, set `output_type="image"`.
2. The `forward` method does its work and returns a standard `PIL.Image.Image` object.
3. When the agent framework receives this output, it checks the tool's `output_type`. Since it's `"image"`, it automatically uses the `handle_agent_output_types` function (or similar internal logic) to wrap the `PIL.Image.Image` object inside an `AgentImage` container.
4. If this `AgentImage` needs to be logged or stored in [Memory](04_agentmemory.md), the framework calls its `to_string()` method, which saves the image to a temporary file and returns the file path.

**Scenario 2: Passing an `AgentType` to a Tool**

What if an `AgentImage` object (maybe retrieved from memory or state) needs to be passed *into* another tool, perhaps one that analyzes images?

```python
# --- File: image_analyzer_tool.py ---
from smolagents import Tool
from PIL import Image
from smolagents.agent_types import AgentImage, handle_agent_input_types

class ImageAnalyzerTool(Tool):
    name: str = "image_analyzer"
    description: str = "Analyzes an image and returns its dimensions."
    inputs: dict = {
        "input_image": {
            "type": "image",  # Expects an image type
            "description": "The image to analyze."
        }
    }
    output_type: str = "string"

    def forward(self, input_image: Image.Image) -> str:
        """Analyzes the image."""
        # IMPORTANT: input_image here is ALREADY the raw PIL.Image object!
        print(f"--- ImageAnalyzerTool received image of type: {type(input_image)} ---")
        width, height = input_image.size
        return f"Image dimensions are {width}x{height}."

# --- How the framework uses it (conceptual) ---
analyzer_tool = ImageAnalyzerTool()

# Let's pretend 'agent_image_object' is an AgentImage retrieved from memory
# (it wraps a red PIL.Image.Image object like the one from Scenario 1)
agent_image_object = AgentImage(Image.new('RGB', (60, 30), color='red'))
print(f"Input object type: {type(agent_image_object)}")

# The framework automatically unwraps the input before calling 'forward',
# using handle_agent_input_types(input_image=agent_image_object):
# args_tuple, kwargs_dict = handle_agent_input_types(input_image=agent_image_object)
# result = analyzer_tool.forward(**kwargs_dict)  # Simplified conceptual call

# Simulate the unwrapping and call:
raw_image = agent_image_object.to_raw()  # Get the underlying PIL Image
result = analyzer_tool.forward(input_image=raw_image)

print(f"Analysis result: {result}")

# Expected Output:
# Input object type: <class 'smolagents.agent_types.AgentImage'>
# --- ImageAnalyzerTool received image of type: <class 'PIL.Image.Image'> ---
# Analysis result: Image dimensions are 60x30.
```

**Explanation:**

1. `ImageAnalyzerTool` defines its input `input_image` as type `"image"`. Its `forward` method expects a standard `PIL.Image.Image`.
2. We have an `AgentImage` object (maybe from a previous step).
3. When the framework prepares to call `analyzer_tool.forward`, it sees that the input `agent_image_object` is an `AgentType`. It uses `handle_agent_input_types` (or similar logic) to automatically call the `.to_raw()` method on `agent_image_object`.
4. This `to_raw()` method extracts the underlying `PIL.Image.Image` object.
5. The framework passes this *raw* image object to the `forward` method. The tool developer doesn't need to worry about unwrapping the `AgentType` inside their tool logic.

## Under the Hood: A Peek at the Code

Let's look at simplified versions of the `AgentType` classes and helper functions from `agent_types.py`.

* **Base `AgentType` Class:**

```python
# --- File: agent_types.py (Simplified AgentType) ---
import logging

logger = logging.getLogger(__name__)

class AgentType:
    """Abstract base class for custom agent data types."""
    def __init__(self, value):
        # Stores the actual data (string, PIL Image, etc.)
        self._value = value

    def __str__(self):
        # Default string conversion uses the to_string method
        return self.to_string()

    def to_raw(self):
        """Returns the underlying raw Python object."""
        logger.error("to_raw() called on base AgentType!")
        return self._value

    def to_string(self) -> str:
        """Returns a string representation suitable for logging/memory."""
        logger.error("to_string() called on base AgentType!")
        return str(self._value)

    # Other potential common methods...
```

* It holds the original `_value`.
* It defines the basic methods `to_raw` and `to_string` that subclasses implement properly.

* **`AgentImage` Implementation:**

```python
# --- File: agent_types.py (Simplified AgentImage) ---
import PIL.Image
import os
import tempfile
import uuid
from io import BytesIO

class AgentImage(AgentType):  # Simplified; the real class also subclasses PIL.Image.Image
    """Handles image data, behaving like a PIL.Image."""

    def __init__(self, value):
        # value can be a PIL.Image, path string, bytes, etc.
        AgentType.__init__(self, value)  # Store the original value
        self._raw_image = None  # To store the loaded PIL Image
        self._path = None  # To store the path if saved to a temp file

        # Logic to load the image from different input types (simplified)
        if isinstance(value, PIL.Image.Image):
            self._raw_image = value
        elif isinstance(value, (str, os.PathLike)):
            # We might load it lazily later in to_raw()
            self._path = str(value)  # Assume it's already a path
            # In reality, it loads here if the path exists
        elif isinstance(value, bytes):
            self._raw_image = PIL.Image.open(BytesIO(value))
        # ... (handle tensors, etc.) ...
        else:
            raise TypeError(f"Unsupported type for AgentImage: {type(value)}")

    def to_raw(self) -> PIL.Image.Image:
        """Returns the raw PIL.Image.Image object."""
        if self._raw_image is None:
            # Lazy loading if initialized with a path
            if self._path and os.path.exists(self._path):
                self._raw_image = PIL.Image.open(self._path)
            else:
                # Handle the error or create a placeholder
                raise ValueError("Cannot get raw image data.")
        return self._raw_image

    def to_string(self) -> str:
        """Saves the image to a temp file (if needed) and returns the path."""
        if self._path and os.path.exists(self._path):
            # Already have a path (e.g., loaded from a file initially)
            return self._path

        # Need to save the raw image data to a temp file
        raw_img = self.to_raw()  # Ensure the image is loaded
        directory = tempfile.mkdtemp()
        # Generate a unique filename
        self._path = os.path.join(directory, str(uuid.uuid4()) + ".png")
        raw_img.save(self._path, format="png")
        print(f"--- AgentImage saved to temp file: {self._path} ---")
        return self._path

    def _ipython_display_(self):
        """Special method for display in Jupyter/IPython."""
        from IPython.display import display
        display(self.to_raw())  # Display the raw PIL image

    # We can also make AgentImage behave like a PIL.Image by delegating methods
    # (e.g., using __getattr__ or explicit wrappers)
    @property
    def size(self):
        return self.to_raw().size

    def save(self, *args, **kwargs):
        self.to_raw().save(*args, **kwargs)

    # ... other PIL.Image methods ...
```

* It can be initialized with various image sources (a PIL object, a path, bytes).
* `to_raw()` ensures a PIL Image object is returned, loading it from disk if necessary.
* `to_string()` saves the image to a temporary PNG file if it doesn't already have a path, and returns that path.
* `_ipython_display_` allows Jupyter notebooks to automatically display the image.
* It can delegate common image methods (like `.size` and `.save`) to the underlying raw image.

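The `to_string()` trick — persist the binary data to a temporary file and log only the path — is easy to demonstrate without PIL. Here is a stdlib-only sketch of the same idea (the class and its names are illustrative, not part of `smolagents`):

```python
import os
import tempfile
import uuid

class BinaryBlob:
    """Toy AgentType-like wrapper: holds raw bytes, serializes to a file path."""
    def __init__(self, data: bytes):
        self._data = data
        self._path = None

    def to_raw(self) -> bytes:
        return self._data

    def to_string(self) -> str:
        # Save once to a temp file; afterwards always return the same path.
        if self._path is None:
            directory = tempfile.mkdtemp()
            self._path = os.path.join(directory, f"{uuid.uuid4()}.bin")
            with open(self._path, "wb") as f:
                f.write(self._data)
        return self._path

blob = BinaryBlob(b"fake image bytes")
path = blob.to_string()
print(path.endswith(".bin"), os.path.exists(path))  # → True True
```

Because the path is plain text, it can sit safely inside a text-based memory log, and anything that later needs the real data can reopen the file.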
* **Helper Functions (Conceptual):**

```python
# --- File: agent_types.py / agents.py (Simplified Helpers) ---
from typing import Any, Optional

# Mapping from type name string to AgentType class
_AGENT_TYPE_MAPPING = {"string": AgentText, "image": AgentImage, "audio": AgentAudio}

def handle_agent_output_types(output: Any, output_type: Optional[str] = None) -> Any:
    """Wraps raw output into an AgentType if needed."""
    if output_type in _AGENT_TYPE_MAPPING:
        # The tool explicitly declares its output type (e.g., "image")
        wrapper_class = _AGENT_TYPE_MAPPING[output_type]
        return wrapper_class(output)
    else:
        # If no type is declared, try to guess based on the Python type (optional)
        if isinstance(output, str):
            return AgentText(output)
        if isinstance(output, PIL.Image.Image):
            return AgentImage(output)
        # ... add checks for audio tensors etc. ...

    # Otherwise, return the output as is
    return output

def handle_agent_input_types(*args, **kwargs) -> tuple[tuple, dict]:
    """Unwraps AgentType inputs into raw types before passing them to a tool."""
    processed_args = []
    for arg in args:
        # If it's an AgentType instance, call to_raw(); otherwise keep it as is
        processed_args.append(arg.to_raw() if isinstance(arg, AgentType) else arg)

    processed_kwargs = {}
    for key, value in kwargs.items():
        processed_kwargs[key] = value.to_raw() if isinstance(value, AgentType) else value

    return tuple(processed_args), processed_kwargs
```

* `handle_agent_output_types` checks the tool's `output_type` (or the actual Python type of the output) and wraps the value in the corresponding `AgentType` class (e.g., `AgentImage`).
* `handle_agent_input_types` iterates through the arguments, checks whether any are `AgentType` instances, and calls `.to_raw()` on them to get the underlying data before the tool's `forward` method is called.

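The wrap-on-output / unwrap-on-input dance is just a type registry plus an `isinstance` check. A self-contained toy version of the pattern (using plain strings instead of images; all names here are illustrative):

```python
class Wrapped:
    """Minimal stand-in for AgentType."""
    def __init__(self, value):
        self._value = value
    def to_raw(self):
        return self._value
    def to_string(self):
        return str(self._value)

class WrappedText(Wrapped):
    pass

TYPE_REGISTRY = {"string": WrappedText}

def wrap_output(output, output_type=None):
    """Wrap a tool's raw output based on its declared output_type."""
    cls = TYPE_REGISTRY.get(output_type)
    return cls(output) if cls else output

def unwrap_inputs(**kwargs):
    """Unwrap any Wrapped values before calling a tool's forward()."""
    return {k: (v.to_raw() if isinstance(v, Wrapped) else v)
            for k, v in kwargs.items()}

wrapped = wrap_output("hello", output_type="string")
print(type(wrapped).__name__)               # → WrappedText
print(unwrap_inputs(text=wrapped)["text"])  # → hello
```

Tool authors only ever see raw values inside `forward()`, while the rest of the framework only ever passes wrapped ones — the two helpers form the boundary between those worlds.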
## Conclusion

`AgentType` (`AgentText`, `AgentImage`, `AgentAudio`) provides a crucial layer for handling diverse data types within the `SmolaAgents` framework. These classes act as specialized containers that ensure non-text data can be processed consistently, displayed correctly (especially in notebooks), and serialized appropriately for logging and memory.

You've learned:

* Why standard Python types aren't always enough for agent inputs/outputs.
* The "specialized shipping container" analogy for `AgentType`.
* The benefits: consistent handling, smart display, and proper serialization (like saving images/audio to temp files).
* How the framework automatically wraps tool outputs (`handle_agent_output_types`) and unwraps tool inputs (`handle_agent_input_types`).
* Simplified code examples for `AgentImage` and the helper functions.

By using `AgentType`, `SmolaAgents` makes it much easier to build agents that work seamlessly with multi-modal data like images and audio, without you having to manually handle the complexities of display and serialization in most cases.

Now that we understand how agents handle different data types, how can we keep track of everything the agent is doing, monitor its performance, and debug issues?

**Next Chapter:** [Chapter 8: AgentLogger & Monitor](08_agentlogger___monitor.md) - Observing Your Agent in Action.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 8: AgentLogger & Monitor - Observing Your Agent in Action

Welcome to the final chapter of the SmolaAgents tutorial! In [Chapter 7: AgentType](07_agenttype.md), we saw how `SmolaAgents` handles different kinds of data like text, images, and audio using specialized containers. Now that our agent can perform complex tasks ([Chapter 1: MultiStepAgent](01_multistepagent.md)), use various [Tools](03_tool.md), remember its progress ([Chapter 4: AgentMemory](04_agentmemory.md)), and even handle diverse data types, a new question arises: **How do we actually see what the agent is doing?**

What if the agent gets stuck in a loop? What if it uses the wrong tool or gives an unexpected answer? How can we peek inside its "mind" to understand its reasoning, track its actions, and maybe figure out what went wrong or how well it's performing?

## The Problem: Flying Blind

Imagine driving a car with no dashboard. You wouldn't know your speed, fuel level, or whether the engine was overheating. You'd be driving blind! Or imagine an airplane without its "black box" flight recorder – after an incident, it would be much harder to understand what happened.

 ❓❓❓

Running an AI agent without visibility is similar. Without seeing its internal steps, thoughts, and actions, debugging problems or understanding its behavior becomes incredibly difficult. We need a way to observe the agent in real time and record its performance.

## The Solution: The Dashboard (`AgentLogger`) and Black Box (`Monitor`)

`SmolaAgents` provides two key components to give you this visibility:

1. **`AgentLogger` (The Dashboard):** This component provides **structured, real-time logging** of the agent's activities directly to your console (or wherever you run your Python script). It uses a library called `rich` to display colorful, formatted output, making it easy to follow:
    * Which step the agent is on.
    * The LLM's thoughts and the action it plans to take.
    * Which [Tool](03_tool.md) is being called and with what arguments.
    * The results (observations) from the tool.
    * Any errors encountered.

    It's like watching the car's speedometer, fuel gauge, and warning lights as you drive.

2. **`Monitor` (The Black Box):** This component works quietly in the background, **tracking key performance metrics** during the agent's run. It records data like:
    * How long each step took (duration).
    * How many tokens the LLM used for input and output (if the [Model Interface](02_model_interface.md) provides this).

    This data isn't usually displayed as prominently as the logger's output, but it is stored and can be used later for analysis, cost calculation, or identifying performance bottlenecks. It's like the airplane's flight data recorder.

Both `AgentLogger` and `Monitor` are automatically set up and used by the `MultiStepAgent`, making observation easy!

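Conceptually, the `Monitor` is little more than a callback that snapshots timing and token counts after each step. A stdlib-only sketch of such a recorder (class and field names are illustrative, not the `smolagents` API):

```python
import time

class StepMonitor:
    """Records per-step duration and token usage, like a flight recorder."""
    def __init__(self):
        self.records = []

    def track_step(self, step_fn, *args, **kwargs):
        """Run one agent step and record how long it took."""
        start = time.perf_counter()
        result = step_fn(*args, **kwargs)
        duration = time.perf_counter() - start
        # In the real framework, token counts would come from the model interface.
        tokens = getattr(result, "token_usage", None)
        self.records.append({"duration_s": duration, "tokens": tokens})
        return result

    def total_duration(self) -> float:
        return sum(r["duration_s"] for r in self.records)

monitor = StepMonitor()
monitor.track_step(lambda: sum(range(1000)))
monitor.track_step(lambda: "final answer")
print(len(monitor.records))  # → 2
```

Keeping the recorder separate from the logger means you can analyze the accumulated records after a run (total cost, slowest step) without parsing console output.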
## `AgentLogger`: Your Real-Time Dashboard

The `AgentLogger` is your primary window into the agent's live execution. It makes the **Think -> Act -> Observe** cycle visible.

**How It's Used (Automatic!)**

When you create a `MultiStepAgent`, it automatically creates an `AgentLogger` instance, usually stored in `self.logger`. Throughout the agent's `run` process, various methods within the agent call `self.logger` to print information:

* `agent.run()` calls `self.logger.log_task()` to show the initial task.
* `agent._execute_step()` calls `self.logger.log_rule()` to mark the beginning of a new step.
* If the agent uses code (like `CodeAgent`), it calls `self.logger.log_code()` to show the code being executed.
* It logs tool calls, observations, and the final answer using `self.logger.log()`.
* It logs errors using `self.logger.log_error()`.

**Example Output (Simulated)**

The `AgentLogger` uses `rich` to make the output colorful and easy to read. Here's a simplified idea of what you might see in your console for our "Capital and Weather" example:

```console
╭─[bold] New run ─ ToolCallingAgent [/bold]──────────────────────────────────╮
│                                                                            │
│ [bold]What is the capital of France, and what is its current weather?[/bold] │
│                                                                            │
╰─────────────────────────────────── LiteLLMModel - gpt-3.5-turbo ─╯

━━━[bold] Step 1 [/bold]━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INFO  ╭─ Thinking... ───────────────────────────────────────────────────╮
INFO  │ Thought: The user wants the capital of France and its weather.  │
INFO  │ First, I need to find the capital. I can use the search tool.   │
INFO  ╰─────────────────────────────────────────────────────────────────╯
INFO  Panel(Text("Calling tool: 'search' with arguments: {'query': 'Capital of France'}"))
INFO  Observations: Paris
DEBUG [Step 1: Duration 1.52 seconds| Input tokens: 150 | Output tokens: 50]

━━━[bold] Step 2 [/bold]━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INFO  ╭─ Thinking... ───────────────────────────────────────────────────╮
INFO  │ Thought: I have the capital, which is Paris. Now I need the     │
INFO  │ weather for Paris. I can use the weather tool.                  │
INFO  ╰─────────────────────────────────────────────────────────────────╯
INFO  Panel(Text("Calling tool: 'weather' with arguments: {'location': 'Paris'}"))
INFO  Observations: Sunny, 25°C
DEBUG [Step 2: Duration 1.81 seconds| Input tokens: 210 | Output tokens: 105]

━━━[bold] Step 3 [/bold]━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INFO  ╭─ Thinking... ───────────────────────────────────────────────────╮
INFO  │ Thought: I have both the capital (Paris) and the weather        │
INFO  │ (Sunny, 25°C). I have fulfilled the user's request. I should    │
INFO  │ use the final_answer tool.                                      │
INFO  ╰─────────────────────────────────────────────────────────────────╯
INFO  Panel(Text("Calling tool: 'final_answer' with arguments: {'answer': 'The capital of France is Paris, and the current weather there is Sunny, 25°C.'}"))
INFO  [bold #d4b702]Final answer:[/bold #d4b702] The capital of France is Paris, and the current weather there is Sunny, 25°C.
DEBUG [Step 3: Duration 1.25 seconds| Input tokens: 280 | Output tokens: 170]
```

*(Note: This is a conceptual representation. The exact formatting, colors, and details might vary. The "Thinking..." part is simulated; the logger typically shows the raw model output or parsed action.)*

**Log Levels**

You can control how much detail the logger shows using the `verbosity_level` parameter when creating the agent:

* `LogLevel.INFO` (Default): Shows the main steps, tool calls, observations, final answer, and errors. Good for general use.
* `LogLevel.DEBUG`: Shows everything `INFO` shows, plus the detailed LLM inputs/outputs and performance metrics from the `Monitor`. Useful for deep debugging.
* `LogLevel.ERROR`: Only shows critical error messages.
* `LogLevel.OFF`: Shows nothing.

```python
from smolagents import CodeAgent
from smolagents.models import LiteLLMModel
from smolagents.monitoring import LogLevel  # Import LogLevel

llm = LiteLLMModel(model_id="gpt-3.5-turbo")

# Create an agent with DEBUG level logging
agent_debug = CodeAgent(
    model=llm,
    tools=[],
    verbosity_level=LogLevel.DEBUG  # Set the level here
)

# This agent will print more detailed logs when run
# agent_debug.run("What is 2+2?")
```

**Code Glimpse (`monitoring.py` and `agents.py`)**

* **`AgentLogger` Class:** It uses the `rich.console.Console` to print formatted output based on the log level.

```python
# --- File: monitoring.py (Simplified AgentLogger) ---
from enum import IntEnum
from typing import Optional

from rich.console import Console
from rich.panel import Panel
from rich.rule import Rule
from rich.syntax import Syntax
# ... other rich imports ...

class LogLevel(IntEnum):
    OFF = -1
    ERROR = 0
    INFO = 1
    DEBUG = 2

YELLOW_HEX = "#d4b702"  # Used for styling

class AgentLogger:
    def __init__(self, level: LogLevel = LogLevel.INFO):
        self.level = level
        # The core object from the 'rich' library for printing
        self.console = Console()

    def log(self, *args, level: LogLevel = LogLevel.INFO, **kwargs):
        """Logs a message if the level is sufficient."""
        if level <= self.level:
            self.console.print(*args, **kwargs)

    def log_error(self, error_message: str):
        """Logs an error message."""
        self.log(error_message, style="bold red", level=LogLevel.ERROR)

    def log_code(self, title: str, content: str, level: LogLevel = LogLevel.INFO):
        """Logs a Python code block with syntax highlighting."""
        self.log(
            Panel(Syntax(content, lexer="python", ...), title=title, ...),
            level=level
        )

    def log_rule(self, title: str, level: LogLevel = LogLevel.INFO):
        """Logs a horizontal rule separator."""
        self.log(Rule("[bold]" + title, style=YELLOW_HEX), level=level)

    def log_task(self, content: str, subtitle: str, title: Optional[str] = None, level: LogLevel = LogLevel.INFO):
        """Logs the initial task."""
        self.log(Panel(f"\n[bold]{content}\n", title=title, subtitle=subtitle, ...), level=level)

    # ... other helper methods for specific formatting ...
```

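The filtering rule above is just an integer comparison between a message's level and the logger's configured level. Here is a runnable sketch of that rule in isolation, using a plain Python list in place of `rich` output (the `TinyLogger` name is illustrative, not part of the library):

```python
from enum import IntEnum

class LogLevel(IntEnum):
    OFF = -1
    ERROR = 0
    INFO = 1
    DEBUG = 2

class TinyLogger:
    """Mirrors the filtering rule: a message is kept only if its
    level is at or below the logger's configured level."""
    def __init__(self, level: LogLevel = LogLevel.INFO):
        self.level = level
        self.lines = []  # captured instead of printed, for illustration

    def log(self, message: str, level: LogLevel = LogLevel.INFO):
        if level <= self.level:
            self.lines.append(message)

logger = TinyLogger(level=LogLevel.INFO)
logger.log("step started")                            # INFO  -> kept
logger.log("raw LLM payload", level=LogLevel.DEBUG)   # DEBUG -> filtered out
logger.log("boom", level=LogLevel.ERROR)              # ERROR -> kept
print(logger.lines)  # → ['step started', 'boom']
```

Because `LogLevel` is an `IntEnum`, the comparison `level <= self.level` works directly on the enum members, which is the whole trick behind verbosity control.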
* **Agent Using the Logger:** The `MultiStepAgent` calls `self.logger` methods.

```python
# --- File: agents.py (Simplified Agent using Logger) ---
from .monitoring import AgentLogger, LogLevel

class MultiStepAgent:
    def __init__(self, ..., verbosity_level: LogLevel = LogLevel.INFO):
        # ... other setup ...
        self.logger = AgentLogger(level=verbosity_level)
        # ...

    def run(self, task: str, ...):
        # ...
        self.logger.log_task(content=self.task, ..., level=LogLevel.INFO)
        # ... call _run ...

    def _execute_step(self, task: str, memory_step: ActionStep):
        self.logger.log_rule(f"Step {self.step_number}", level=LogLevel.INFO)
        try:
            # ... (Think phase: LLM call) ...

            # ... (Act phase: Execute tool/code) ...
            # Example for CodeAgent:
            # self.logger.log_code("Executing code:", code_action, level=LogLevel.INFO)
            # observation = self.python_executor(code_action)

            # Example for ToolCallingAgent:
            # self.logger.log(Panel(f"Calling tool: '{tool_name}' ..."), level=LogLevel.INFO)
            # observation = self.execute_tool_call(tool_name, arguments)

            # ... (Observe phase) ...
            self.logger.log(f"Observations: {observation}", level=LogLevel.INFO)

            # ... (Handle final answer) ...
            # if final_answer:
            #     self.logger.log(f"Final answer: {final_answer}", style=f"bold {YELLOW_HEX}", level=LogLevel.INFO)

        except AgentError as e:
            # Log errors using the logger's error method
            memory_step.error = e  # Store error in memory
            self.logger.log_error(f"Error in step {self.step_number}: {e}")  # Display error

    # ...
```

## `Monitor`: Your Performance Black Box

While the `AgentLogger` shows you *what* the agent is doing, the `Monitor` tracks *how well* it's doing it in terms of performance.

**How It's Used (Automatic!)**

The `MultiStepAgent` also creates a `Monitor` instance (`self.monitor`). The monitor's main job is done via its `update_metrics` method. This method is automatically added to a list of `step_callbacks` in the agent. At the end of every single step, the agent calls all functions in `step_callbacks`, including `self.monitor.update_metrics`.

Inside `update_metrics`, the monitor:

1. Accesses the `ActionStep` object for the just-completed step from [AgentMemory](04_agentmemory.md).
2. Reads the `duration` recorded in the `ActionStep`.
3. Accesses the agent's [Model Interface](02_model_interface.md) (`self.tracked_model`) to get the token counts (`last_input_token_count`, `last_output_token_count`) for the LLM call made during that step (if available).
4. Updates its internal totals (e.g., `total_input_token_count`).
5. Uses the `AgentLogger` (passed during initialization) to print these metrics, but typically only at the `DEBUG` log level, so they don't clutter the default `INFO` output.

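
Steps 3 and 4 above amount to simple running sums, which is why the example output shows *cumulative* token counts that grow step by step. A standalone sketch of just that bookkeeping, with numbers matching the "Capital and Weather" run (the `TokenTally` name is illustrative, not the library's):

```python
class TokenTally:
    """Toy version of the Monitor's token bookkeeping (steps 3-4)."""
    def __init__(self):
        self.total_input = 0
        self.total_output = 0

    def update(self, last_input, last_output):
        # Skip models that don't report token counts (they return None)
        if last_input is not None and last_output is not None:
            self.total_input += last_input
            self.total_output += last_output
        return f"Input tokens: {self.total_input:,} | Output tokens: {self.total_output:,}"

tally = TokenTally()
tally.update(150, 50)        # after step 1 -> totals 150 / 50
print(tally.update(60, 55))  # after step 2 → Input tokens: 210 | Output tokens: 105
```

A third call with `(70, 65)` would reach the final totals of 280 input and 170 output tokens seen in step 3 of the example.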
**Example Output (at `DEBUG` level)**

If you run the agent with `verbosity_level=LogLevel.DEBUG`, you'll see the monitor's output added at the end of each step log:

```console
[...]
INFO  Observations: Paris
DEBUG [Step 1: Duration 1.52 seconds| Input tokens: 150 | Output tokens: 50]   # <-- Monitor Output

[...]
INFO  Observations: Sunny, 25°C
DEBUG [Step 2: Duration 1.81 seconds| Input tokens: 210 | Output tokens: 105]  # <-- Monitor Output

[...]
INFO  [bold #d4b702]Final answer:[/bold #d4b702] The capital of France is Paris, ...
DEBUG [Step 3: Duration 1.25 seconds| Input tokens: 280 | Output tokens: 170]  # <-- Monitor Output
```

**Code Glimpse (`monitoring.py` and `agents.py`)**

* **`Monitor` Class:** Tracks metrics and logs them.

```python
# --- File: monitoring.py (Simplified Monitor) ---
from .memory import ActionStep  # Needs access to step data
from .models import Model  # Needs access to model token counts
# AgentLogger and LogLevel are defined earlier in this same file

class Monitor:
    def __init__(self, tracked_model: Model, logger: AgentLogger):
        self.step_durations = []
        self.tracked_model = tracked_model  # Reference to the agent's model
        self.logger = logger  # Uses the logger to output metrics
        self.total_input_token_count = 0
        self.total_output_token_count = 0
        # ... potentially other metrics ...

    def reset(self):
        """Resets metrics for a new run."""
        self.step_durations = []
        self.total_input_token_count = 0
        self.total_output_token_count = 0

    def update_metrics(self, step_log: ActionStep):
        """Callback function called after each step."""
        # 1. Get duration from the step log
        step_duration = step_log.duration
        self.step_durations.append(step_duration)

        console_outputs = f"[Step {len(self.step_durations)}: Duration {step_duration:.2f} seconds"

        # 2. Get token counts from the model (if available)
        input_tokens = getattr(self.tracked_model, "last_input_token_count", None)
        output_tokens = getattr(self.tracked_model, "last_output_token_count", None)

        if input_tokens is not None and output_tokens is not None:
            # 3. Update the running totals
            self.total_input_token_count += input_tokens
            self.total_output_token_count += output_tokens
            # 4. Format the metrics string
            console_outputs += (
                f"| Input tokens: {self.total_input_token_count:,}"
                f" | Output tokens: {self.total_output_token_count:,}"
            )
        console_outputs += "]"

        # 5. Log metrics using the logger (at DEBUG level)
        self.logger.log(console_outputs, level=LogLevel.DEBUG)  # Note: logs at DEBUG

    # ... methods to get totals, averages etc. ...
```

* **Agent Setting Up the Monitor:**

```python
# --- File: agents.py (Simplified Agent setup for Monitor) ---
from .monitoring import Monitor
from .memory import ActionStep

class MultiStepAgent:
    def __init__(self, ..., model: Model, step_callbacks: Optional[List[Callable]] = None):
        # ... setup logger ...
        self.model = model  # Store the model
        self.monitor = Monitor(self.model, self.logger)  # Create Monitor

        # Add monitor's update method to callbacks
        self.step_callbacks = step_callbacks if step_callbacks is not None else []
        self.step_callbacks.append(self.monitor.update_metrics)
        # ...

    def _finalize_step(self, memory_step: ActionStep, step_start_time: float):
        """Called at the very end of each step."""
        memory_step.end_time = time.time()
        memory_step.duration = memory_step.end_time - step_start_time

        # Call all registered callbacks, including monitor.update_metrics
        for callback in self.step_callbacks:
            # Pass the completed step data to the callback
            callback(memory_step)
        # ...

    def run(self, ..., reset: bool = True):
        # ...
        if reset:
            self.memory.reset()
            self.monitor.reset()  # Reset monitor metrics on new run
        # ...
```

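The callback wiring in `_finalize_step` is a plain observer pattern: every registered function receives the completed step record. A minimal self-contained sketch of that pattern, with illustrative class and field names rather than the library's own:

```python
import time
from dataclasses import dataclass

@dataclass
class ToyStep:
    """Stand-in for an ActionStep: just holds a duration."""
    duration: float = 0.0

class ToyAgent:
    def __init__(self, step_callbacks=None):
        # Callbacks run after every step, mirroring the step_callbacks list
        self.step_callbacks = list(step_callbacks or [])

    def _finalize_step(self, step: ToyStep, step_start_time: float):
        step.duration = time.time() - step_start_time
        for callback in self.step_callbacks:
            callback(step)  # each observer gets the completed step record

durations = []
agent = ToyAgent(step_callbacks=[lambda s: durations.append(s.duration)])
agent._finalize_step(ToyStep(), time.time())
print(f"recorded {len(durations)} step duration(s)")
```

Because the monitor is just one entry in this list, you can append your own callback (e.g., to push metrics to a database) without touching the agent's internals.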
## Conclusion

The `AgentLogger` and `Monitor` are your essential tools for observing and understanding your `SmolaAgents`.

* **`AgentLogger`** acts as the real-time dashboard, giving you formatted, colorful console output of the agent's steps, thoughts, actions, and errors, crucial for debugging and following along.
* **`Monitor`** acts as the performance black box, tracking metrics like step duration and token usage, which are logged (usually at the `DEBUG` level) and useful for analysis and optimization.

You've learned:

* Why visibility into agent execution is critical.
* The roles of `AgentLogger` (dashboard) and `Monitor` (black box).
* How they are automatically used by `MultiStepAgent`.
* How `AgentLogger` provides readable, step-by-step output using `rich`.
* How `Monitor` tracks performance metrics via step callbacks.
* How to control log verbosity using `LogLevel`.

With these tools, you're no longer flying blind! You can confidently run your agents, watch them work, understand their performance, and diagnose issues when they arise.

This concludes our introductory tour of the core concepts in `SmolaAgents`. We hope these chapters have given you a solid foundation to start building your own intelligent agents. Happy coding!

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/SmolaAgents/index.md

# Tutorial: SmolaAgents

`SmolaAgents` is a project for building *autonomous agents* that can solve complex tasks.
The core component is the **MultiStepAgent**, which acts like a project manager. It uses a **Model Interface** to talk to language models (LLMs), employs **Tools** (like web search or code execution) to interact with the world or perform actions, and keeps track of its progress and conversation history using **AgentMemory**.
For agents that write and run Python code (`CodeAgent`), a **PythonExecutor** provides a safe environment. **PromptTemplates** help structure the instructions given to the LLM, while **AgentType** handles different data formats like images or audio. Finally, **AgentLogger & Monitor** provides logging and tracking for debugging and analysis.

**Source Repository:** [https://github.com/huggingface/smolagents/tree/076cca5e8a130d3fa2ff990ad630231b49767745/src/smolagents](https://github.com/huggingface/smolagents/tree/076cca5e8a130d3fa2ff990ad630231b49767745/src/smolagents)

```mermaid
flowchart TD
    A0["MultiStepAgent"]
    A1["Tool"]
    A2["Model Interface"]
    A3["AgentMemory"]
    A4["PythonExecutor"]
    A5["PromptTemplates"]
    A6["AgentType"]
    A7["AgentLogger & Monitor"]
    A0 -- "Uses tools" --> A1
    A0 -- "Uses model" --> A2
    A0 -- "Uses memory" --> A3
    A0 -- "Uses templates" --> A5
    A0 -- "Uses logger/monitor" --> A7
    A0 -- "Uses executor (CodeAgent)" --> A4
    A1 -- "Outputs agent types" --> A6
    A4 -- "Executes tool code" --> A1
    A2 -- "Generates/Parses tool calls" --> A1
    A3 -- "Logs tool calls" --> A1
    A5 -- "Includes tool info" --> A1
    A6 -- "Handled by agent" --> A0
    A7 -- "Replays memory" --> A3
```

## Chapters

1. [MultiStepAgent](01_multistepagent.md)
2. [Model Interface](02_model_interface.md)
3. [Tool](03_tool.md)
4. [AgentMemory](04_agentmemory.md)
5. [PromptTemplates](05_prompttemplates.md)
6. [PythonExecutor](06_pythonexecutor.md)
7. [AgentType](07_agenttype.md)
8. [AgentLogger & Monitor](08_agentlogger___monitor.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)