# Chapter 2: Message / Memory - Remembering the Conversation
In [Chapter 1: The LLM - Your Agent's Brainpower](01_llm.md), we learned how our agent uses the `LLM` class to access its "thinking" capabilities. But just like humans, an agent needs to remember what was said earlier in a conversation to make sense of new requests and respond appropriately.
Imagine asking a friend: "What was the first thing I asked you?" If they have no memory, they can't answer! Agents face the same problem. They need a way to store the conversation history.
This is where `Message` and `Memory` come in.
## What Problem Do They Solve?
Think about a simple chat:
1. **You:** "What's the weather like in London?"
2. **Agent:** "It's currently cloudy and 15°C in London."
3. **You:** "What about Paris?"
For the agent to answer your *second* question ("What about Paris?"), it needs to remember that the *topic* of the conversation is "weather". Without remembering the first question, the second question is meaningless.
`Message` and `Memory` provide the structure to:
1. Represent each individual turn (like your question or the agent's answer) clearly.
2. Store these turns in order, creating a log of the conversation.
## The Key Concepts: Message and Memory
Let's break these down:
### 1. Message: A Single Turn in the Chat
A `Message` object is like a single speech bubble in a chat interface. It represents one specific thing said by someone (or something) at a particular point in the conversation.
Every `Message` has two main ingredients:
* **`role`**: *Who* sent this message? This is crucial for the LLM to understand the flow. Common roles are:
    * `user`: A message from the end-user interacting with the agent. (e.g., "What's the weather?")
    * `assistant`: A message *from* the agent/LLM. (e.g., "The weather is sunny.")
    * `system`: An initial instruction to guide the agent's overall behavior. (e.g., "You are a helpful weather assistant.")
    * `tool`: The output or result from a [Tool / ToolCollection](04_tool___toolcollection.md) that the agent used. (e.g., the raw data returned by a weather API tool.)
* **`content`**: *What* was said? This is the actual text of the message. (e.g., "What's the weather like in London?")
There are also optional parts for more advanced uses, like `tool_calls` (when the assistant decides to use a tool) or `base64_image` (if an image is included in the message), but `role` and `content` are the basics.
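To make the roles concrete, here is a minimal sketch that builds one message per role, using the simplified `Message` model shown later in this chapter (the real class adds validation plus the optional fields mentioned above, and the convenience classmethods introduced below are the usual way to create these):
```python
from app.schema import Message

# One message per role, in the order they might appear in a chat
system_msg = Message(role="system", content="You are a helpful weather assistant.")
user_msg = Message(role="user", content="What's the weather like in London?")
assistant_msg = Message(role="assistant", content="It's currently cloudy and 15°C in London.")
tool_msg = Message(role="tool", content='{"city": "London", "temp_c": 15}')  # raw tool output

for msg in [system_msg, user_msg, assistant_msg, tool_msg]:
    print(msg.to_dict())
```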
### 2. Memory: The Conversation Log
The `Memory` object is simply a container, like a list or a notebook, that holds a sequence of `Message` objects.
* It keeps track of the entire conversation history (or at least the recent parts).
* It stores messages in the order they occurred.
* Agents look at the `Memory` before deciding what to do next, giving them context.
Think of `Memory` as the agent's short-term memory for the current interaction.
## How Do We Use Them?
Let's see how you'd typically work with `Message` and `Memory` in OpenManus (often, the agent framework handles some of this automatically, but it's good to understand the pieces).
**1. Creating Messages:**
The `Message` class in `app/schema.py` provides handy shortcuts to create messages with the correct role:
```python
# Import the Message class
from app.schema import Message

# Create a message from the user
user_q = Message.user_message("What's the capital of France?")

# Create a message from the assistant (agent's response)
assistant_a = Message.assistant_message("The capital of France is Paris.")

# Create a system instruction
system_instruction = Message.system_message("You are a helpful geography expert.")

print(f"User Message: Role='{user_q.role}', Content='{user_q.content}'")
print(f"Assistant Message: Role='{assistant_a.role}', Content='{assistant_a.content}'")
```
**Explanation:**
* We import `Message` from `app/schema.py`.
* `Message.user_message("...")` creates a `Message` object with `role` set to `user`.
* `Message.assistant_message("...")` creates one with `role` set to `assistant`.
* `Message.system_message("...")` creates one with `role` set to `system`.
* Each of these returns a `Message` object containing the role and the text content you provided.
**Example Output:**
```
User Message: Role='user', Content='What's the capital of France?'
Assistant Message: Role='assistant', Content='The capital of France is Paris.'
```
**2. Storing Messages in Memory:**
The `Memory` class (`app/schema.py`) holds these messages. Agents usually have a `memory` attribute.
```python
# Import Memory and Message
from app.schema import Message, Memory

# Create a Memory instance
conversation_memory = Memory()

# Add messages to the memory
conversation_memory.add_message(
    Message.system_message("You are a helpful geography expert.")
)
conversation_memory.add_message(
    Message.user_message("What's the capital of France?")
)
conversation_memory.add_message(
    Message.assistant_message("The capital of France is Paris.")
)
conversation_memory.add_message(
    Message.user_message("What about Spain?")
)

# See the messages stored
print(f"Number of messages in memory: {len(conversation_memory.messages)}")

# Print the last message
print(f"Last message: {conversation_memory.messages[-1].to_dict()}")
```
**Explanation:**
* We import `Memory` and `Message`.
* `conversation_memory = Memory()` creates an empty memory store.
* `conversation_memory.add_message(...)` adds a `Message` object to the end of the internal list.
* `conversation_memory.messages` gives you access to the list of `Message` objects currently stored.
* `message.to_dict()` converts a `Message` object into a simple dictionary format, which is often needed for APIs.
**Example Output:**
```
Number of messages in memory: 4
Last message: {'role': 'user', 'content': 'What about Spain?'}
```
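One detail worth noting before moving on: as the simplified implementation later in this chapter shows, `to_dict()` skips fields that are `None`, so a message with no text content produces a dictionary without a `content` key at all. A quick sketch:
```python
# A normal user message includes both keys
print(Message.user_message("What about Spain?").to_dict())
# {'role': 'user', 'content': 'What about Spain?'}

# An assistant message may legitimately have no text (e.g., when it only makes tool calls)
print(Message.assistant_message(None).to_dict())
# {'role': 'assistant'}  <- 'content' is omitted because it is None
```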
**3. Using Memory for Context:**
Now, how does the agent use this? Before calling the [LLM](01_llm.md) to figure out the answer to "What about Spain?", the agent would grab the messages from its `Memory`.
```python
# (Continuing from previous example)

# Agent prepares to ask the LLM
messages_for_llm = conversation_memory.to_dict_list()

print("Messages being sent to LLM for context:")
for msg in messages_for_llm:
    print(f"- {msg}")

# Simplified: Agent would now pass 'messages_for_llm' to llm.ask(...)
# response = await agent.llm.ask(messages=messages_for_llm)
# print(f"LLM would likely respond about the capital of Spain, e.g., 'The capital of Spain is Madrid.'")
```
**Explanation:**
* `conversation_memory.to_dict_list()` converts all stored `Message` objects into the list-of-dictionaries format that the `llm.ask` method expects (as we saw in Chapter 1).
* By sending this *entire history*, the LLM sees:
    1. Its instructions ("You are a helpful geography expert.")
    2. The first question ("What's the capital of France?")
    3. Its previous answer ("The capital of France is Paris.")
    4. The *new* question ("What about Spain?")
* With this context, the LLM can correctly infer that "What about Spain?" means "What is the capital of Spain?".
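**Example Output:**
```
Messages being sent to LLM for context:
- {'role': 'system', 'content': 'You are a helpful geography expert.'}
- {'role': 'user', 'content': "What's the capital of France?"}
- {'role': 'assistant', 'content': 'The capital of France is Paris.'}
- {'role': 'user', 'content': 'What about Spain?'}
```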
## Under the Hood: How It Works
`Memory` is conceptually simple. It's primarily a wrapper around a standard Python list, ensuring messages are stored correctly and providing convenient methods.
Here's a simplified flow of how an agent uses memory:
```mermaid
sequenceDiagram
    participant User
    participant Agent as BaseAgent (app/agent/base.py)
    participant Mem as Memory (app/schema.py)
    participant LLM as LLM Class (app/llm.py)
    participant LLM_API as Actual LLM API

    User->>+Agent: Sends message ("What about Spain?")
    Agent->>+Mem: update_memory(role="user", content="What about Spain?")
    Mem->>Mem: Adds Message(role='user', ...) to internal list
    Mem-->>-Agent: Memory updated
    Agent->>Agent: Needs to generate response
    Agent->>+Mem: Get all messages (memory.messages)
    Mem-->>-Agent: Returns list of Message objects
    Agent->>Agent: Formats messages to dict list (memory.to_dict_list())
    Agent->>+LLM: ask(messages=formatted_list)
    LLM->>LLM_API: Sends request with history
    LLM_API-->>LLM: Returns response ("The capital is Madrid.")
    LLM-->>-Agent: Returns text response
    Agent->>+Mem: update_memory(role="assistant", content="The capital is Madrid.")
    Mem->>Mem: Adds Message(role='assistant', ...) to internal list
    Mem-->>-Agent: Memory updated
    Agent->>-User: Sends response ("The capital is Madrid.")
```
**Code Glimpse:**
Let's look at the core parts in `app/schema.py`:
```python
# Simplified snippet from app/schema.py
from typing import List, Optional
from pydantic import BaseModel, Field
# (Role enum and other definitions are here)
class Message(BaseModel):
    role: str  # Simplified: In reality uses ROLE_TYPE Literal
    content: Optional[str] = None
    # ... other optional fields like tool_calls, name, etc.

    def to_dict(self) -> dict:
        # Creates a dictionary representation, skipping None values
        message_dict = {"role": self.role}
        if self.content is not None:
            message_dict["content"] = self.content
        # ... add other fields if they exist ...
        return message_dict

    @classmethod
    def user_message(cls, content: str) -> "Message":
        return cls(role="user", content=content)

    @classmethod
    def assistant_message(cls, content: Optional[str]) -> "Message":
        return cls(role="assistant", content=content)

    # ... other classmethods like system_message, tool_message ...

class Memory(BaseModel):
    messages: List[Message] = Field(default_factory=list)
    max_messages: int = 100  # Example limit

    def add_message(self, message: Message) -> None:
        """Add a single message to the list."""
        self.messages.append(message)
        # Optional: Trim old messages if limit exceeded
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def to_dict_list(self) -> List[dict]:
        """Convert all stored messages to dictionaries."""
        return [msg.to_dict() for msg in self.messages]

    # ... other methods like clear(), get_recent_messages() ...
```
**Explanation:**
* The `Message` class uses Pydantic `BaseModel` for structure and validation. It clearly defines `role` and `content`. The classmethods (`user_message`, etc.) are just convenient ways to create instances with the role pre-filled. `to_dict` prepares it for API calls.
* The `Memory` class also uses `BaseModel`. Its main part is `messages: List[Message]`, which holds the conversation history. `add_message` simply appends to this list (and optionally trims it). `to_dict_list` iterates through the stored messages and converts each one using its `to_dict` method.
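The `max_messages` trimming is easy to see in isolation. Here's a small sketch using the simplified `Memory` above (the default limit and exact trimming policy may differ in the real code):
```python
from app.schema import Memory, Message

# Keep only the 2 most recent messages
short_memory = Memory(max_messages=2)

short_memory.add_message(Message.user_message("First question"))
short_memory.add_message(Message.assistant_message("First answer"))
short_memory.add_message(Message.user_message("Second question"))

# The oldest message was dropped; only the last two remain
for msg in short_memory.messages:
    print(msg.to_dict())
# {'role': 'assistant', 'content': 'First answer'}
# {'role': 'user', 'content': 'Second question'}
```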
And here's how an agent might use its memory attribute (simplified from `app/agent/base.py`):
```python
# Simplified conceptual snippet inspired by app/agent/base.py
from app.schema import Memory, Message  # Simplified imports
from app.llm import LLM

class SimplifiedAgent:
    def __init__(self):
        self.memory = Memory()  # Agent holds a Memory instance
        self.llm = LLM()        # Agent has access to the LLM

    def add_user_input(self, text: str):
        """Adds user input to memory."""
        user_msg = Message.user_message(text)
        self.memory.add_message(user_msg)
        print(f"Agent Memory Updated with: {user_msg.to_dict()}")

    async def generate_response(self) -> str:
        """Generates a response based on memory."""
        print("Agent consulting memory...")
        messages_for_llm = self.memory.to_dict_list()
        print(f"Sending {len(messages_for_llm)} messages to LLM...")

        # The actual call to the LLM
        response_text = await self.llm.ask(messages=messages_for_llm)

        # Add assistant response to memory
        assistant_msg = Message.assistant_message(response_text)
        self.memory.add_message(assistant_msg)
        print(f"Agent Memory Updated with: {assistant_msg.to_dict()}")
        return response_text

# Example Usage (needs async context)
# agent = SimplifiedAgent()
# agent.add_user_input("What is the capital of France?")
# response = await agent.generate_response()   # Gets "Paris"
# agent.add_user_input("What about Spain?")
# response2 = await agent.generate_response()  # Gets "Madrid"
```
**Explanation:**
* The agent has `self.memory`.
* When input arrives (`add_user_input`), it creates a `Message` and adds it using `self.memory.add_message`.
* When generating a response (`generate_response`), it retrieves the history using `self.memory.to_dict_list()` and passes it to `self.llm.ask`.
* It then adds the LLM's response back into memory as an `assistant` message.
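To tie it together, here's the commented-out usage above as a runnable sketch. It assumes a properly configured `LLM` as in [Chapter 1](01_llm.md), and the exact responses will of course vary:
```python
import asyncio

async def main():
    agent = SimplifiedAgent()

    agent.add_user_input("What is the capital of France?")
    print(await agent.generate_response())  # e.g., "The capital of France is Paris."

    # The first exchange is now in memory, so the follow-up is understood in context
    agent.add_user_input("What about Spain?")
    print(await agent.generate_response())  # e.g., "The capital of Spain is Madrid."

asyncio.run(main())
```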
## Wrapping Up Chapter 2
You've now learned about `Message` (a single conversational turn with a role and content) and `Memory` (the ordered list storing these messages). Together, they provide the crucial context agents need to understand conversations and respond coherently. They act as the agent's short-term memory or chat log.
We have the brain ([LLM](01_llm.md)) and the memory (`Message`/`Memory`). Now we need something to orchestrate the whole process: receiving input, consulting memory, calling the LLM, potentially using tools, and managing state. That's the job of the Agent itself.
Let's move on to [Chapter 3: BaseAgent](03_baseagent.md) to see how agents are structured and how they use these core components.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)