---
layout: default
title: "Message & Memory"
parent: "OpenManus"
nav_order: 2
---

# Chapter 2: Message / Memory - Remembering the Conversation

In [Chapter 1: The LLM - Your Agent's Brainpower](01_llm.md), we learned how our agent uses the `LLM` class to access its "thinking" capabilities. But just like humans, an agent needs to remember what was said earlier in a conversation to make sense of new requests and respond appropriately.

Imagine asking a friend: "What was the first thing I asked you?" If they have no memory, they can't answer! Agents face the same problem. They need a way to store the conversation history.

This is where `Message` and `Memory` come in.

## What Problem Do They Solve?

Think about a simple chat:

1. **You:** "What's the weather like in London?"
2. **Agent:** "It's currently cloudy and 15°C in London."
3. **You:** "What about Paris?"

For the agent to answer your *second* question ("What about Paris?"), it needs to remember that the *topic* of the conversation is "weather". Without remembering the first question, the second question is meaningless.

`Message` and `Memory` provide the structure to:

1. Represent each individual turn (like your question or the agent's answer) clearly.
2. Store these turns in order, creating a log of the conversation, as sketched below.

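Concretely, that ordered log is what gets sent to the LLM on every turn. Most chat-style LLM APIs accept the history as a list of `role`/`content` dictionaries, which is exactly the shape `Message` and `Memory` help you build. Here is a plain-Python sketch of the weather chat above in that format (no OpenManus imports involved yet):

```python
# The weather chat above, in the list-of-dictionaries shape
# that chat-style LLM APIs commonly expect.
history = [
    {"role": "user", "content": "What's the weather like in London?"},
    {"role": "assistant", "content": "It's currently cloudy and 15°C in London."},
    {"role": "user", "content": "What about Paris?"},  # only makes sense with the turns above
]
```
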
## The Key Concepts: Message and Memory

Let's break these down:

### 1. Message: A Single Turn in the Chat

A `Message` object is like a single speech bubble in a chat interface. It represents one specific thing said by someone (or something) at a particular point in the conversation.

Every `Message` has two main ingredients:

*   **`role`**: *Who* sent this message? This is crucial for the LLM to understand the flow. Common roles are:
    *   `user`: A message from the end-user interacting with the agent. (e.g., "What's the weather?")
    *   `assistant`: A message *from* the agent/LLM. (e.g., "The weather is sunny.")
    *   `system`: An initial instruction to guide the agent's overall behavior. (e.g., "You are a helpful weather assistant.")
    *   `tool`: The output or result from a [Tool / ToolCollection](04_tool___toolcollection.md) that the agent used. (e.g., the raw data returned by a weather API tool.)
*   **`content`**: *What* was said? This is the actual text of the message. (e.g., "What's the weather like in London?")

There are also optional parts for more advanced uses, like `tool_calls` (when the assistant decides to use a tool) or `base64_image` (if an image is included in the message), but `role` and `content` are the basics.

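For instance, a tool result might be represented like this. Treat this as an illustrative sketch only: the `name` field appears in the simplified `Message` definition later in this chapter, and the real class also offers a `Message.tool_message(...)` helper whose exact signature isn't shown here.

```python
from app.schema import Message

# Illustrative sketch of a tool-result message.
# `name` is assumed here to identify which tool produced the output;
# check app/schema.py for the exact fields the real class uses.
tool_result = Message(
    role="tool",
    content='{"city": "London", "condition": "cloudy", "temp_c": 15}',
    name="weather_api",
)
```
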
### 2. Memory: The Conversation Log

The `Memory` object is simply a container, like a list or a notebook, that holds a sequence of `Message` objects.

*   It keeps track of the entire conversation history (or at least the recent parts).
*   It stores messages in the order they occurred.
*   Agents look at the `Memory` before deciding what to do next, giving them context.

Think of `Memory` as the agent's short-term memory for the current interaction.

## How Do We Use Them?

Let's see how you'd typically work with `Message` and `Memory` in OpenManus (often, the agent framework handles some of this automatically, but it's good to understand the pieces).

**1. Creating Messages:**

The `Message` class in `app/schema.py` provides handy shortcuts to create messages with the correct role:

```python
# Import the Message class
from app.schema import Message

# Create a message from the user
user_q = Message.user_message("What's the capital of France?")

# Create a message from the assistant (agent's response)
assistant_a = Message.assistant_message("The capital of France is Paris.")

# Create a system instruction
system_instruction = Message.system_message("You are a helpful geography expert.")

print(f"User Message: Role='{user_q.role}', Content='{user_q.content}'")
print(f"Assistant Message: Role='{assistant_a.role}', Content='{assistant_a.content}'")
```

**Explanation:**

*   We import `Message` from `app/schema.py`.
*   `Message.user_message("...")` creates a `Message` object with `role` set to `user`.
*   `Message.assistant_message("...")` creates one with `role` set to `assistant`.
*   `Message.system_message("...")` creates one with `role` set to `system`.
*   Each of these returns a `Message` object containing the role and the text content you provided.

**Example Output:**

```
User Message: Role='user', Content='What's the capital of France?'
Assistant Message: Role='assistant', Content='The capital of France is Paris.'
```

**2. Storing Messages in Memory:**

The `Memory` class (`app/schema.py`) holds these messages. Agents usually have a `memory` attribute.

```python
# Import Memory and Message
from app.schema import Message, Memory

# Create a Memory instance
conversation_memory = Memory()

# Add messages to the memory
conversation_memory.add_message(
    Message.system_message("You are a helpful geography expert.")
)
conversation_memory.add_message(
    Message.user_message("What's the capital of France?")
)
conversation_memory.add_message(
    Message.assistant_message("The capital of France is Paris.")
)
conversation_memory.add_message(
    Message.user_message("What about Spain?")
)

# See the messages stored
print(f"Number of messages in memory: {len(conversation_memory.messages)}")
# Print the last message
print(f"Last message: {conversation_memory.messages[-1].to_dict()}")
```

**Explanation:**

*   We import `Memory` and `Message`.
*   `conversation_memory = Memory()` creates an empty memory store.
*   `conversation_memory.add_message(...)` adds a `Message` object to the end of the internal list.
*   `conversation_memory.messages` gives you access to the list of `Message` objects currently stored.
*   `message.to_dict()` converts a `Message` object into a simple dictionary format, which is often needed for APIs.

**Example Output:**

```
Number of messages in memory: 4
Last message: {'role': 'user', 'content': 'What about Spain?'}
```

**3. Using Memory for Context:**

Now, how does the agent use this? Before calling the [LLM](01_llm.md) to figure out the answer to "What about Spain?", the agent would grab the messages from its `Memory`.

```python
# (Continuing from previous example)

# Agent prepares to ask the LLM
messages_for_llm = conversation_memory.to_dict_list()

print("Messages being sent to LLM for context:")
for msg in messages_for_llm:
    print(f"- {msg}")

# Simplified: Agent would now pass 'messages_for_llm' to llm.ask(...)
# response = await agent.llm.ask(messages=messages_for_llm)
# print("LLM would likely respond about the capital of Spain, e.g., 'The capital of Spain is Madrid.'")
```

**Explanation:**

*   `conversation_memory.to_dict_list()` converts all stored `Message` objects into the list-of-dictionaries format that the `llm.ask` method expects (as we saw in Chapter 1).
*   By sending this *entire history*, the LLM sees:
    1.  Its instructions ("You are a helpful geography expert.")
    2.  The first question ("What's the capital of France?")
    3.  Its previous answer ("The capital of France is Paris.")
    4.  The *new* question ("What about Spain?")
*   With this context, the LLM can correctly infer that "What about Spain?" means "What is the capital of Spain?".

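**Example Output:**

Given the four messages stored earlier, the loop above would print something like this (dictionary layout follows the simplified `to_dict` shown below, which emits `role` first and skips empty fields):

```
Messages being sent to LLM for context:
- {'role': 'system', 'content': 'You are a helpful geography expert.'}
- {'role': 'user', 'content': "What's the capital of France?"}
- {'role': 'assistant', 'content': 'The capital of France is Paris.'}
- {'role': 'user', 'content': 'What about Spain?'}
```
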
## Under the Hood: How It Works

`Memory` is conceptually simple. It's primarily a wrapper around a standard Python list, ensuring messages are stored correctly and providing convenient methods.

Here's a simplified flow of how an agent uses memory:

```mermaid
sequenceDiagram
    participant User
    participant Agent as BaseAgent (app/agent/base.py)
    participant Mem as Memory (app/schema.py)
    participant LLM as LLM Class (app/llm.py)
    participant LLM_API as Actual LLM API

    User->>+Agent: Sends message ("What about Spain?")
    Agent->>+Mem: update_memory(role="user", content="What about Spain?")
    Mem->>Mem: Adds Message(role='user', ...) to internal list
    Mem-->>-Agent: Memory updated
    Agent->>Agent: Needs to generate response
    Agent->>+Mem: Get all messages (memory.messages)
    Mem-->>-Agent: Returns list of Message objects
    Agent->>Agent: Formats messages to dict list (memory.to_dict_list())
    Agent->>+LLM: ask(messages=formatted_list)
    LLM->>LLM_API: Sends request with history
    LLM_API-->>LLM: Receives response ("The capital is Madrid.")
    LLM-->>-Agent: Returns text response
    Agent->>+Mem: update_memory(role="assistant", content="The capital is Madrid.")
    Mem->>Mem: Adds Message(role='assistant', ...) to internal list
    Mem-->>-Agent: Memory updated
    Agent->>-User: Sends response ("The capital is Madrid.")
```

**Code Glimpse:**

Let's look at the core parts in `app/schema.py`:

```python
# Simplified snippet from app/schema.py

from typing import List, Optional
from pydantic import BaseModel, Field

# (Role enum and other definitions are here)

class Message(BaseModel):
    role: str  # Simplified: in reality uses the ROLE_TYPE Literal
    content: Optional[str] = None
    # ... other optional fields like tool_calls, name, etc.

    def to_dict(self) -> dict:
        # Creates a dictionary representation, skipping None values
        message_dict = {"role": self.role}
        if self.content is not None:
            message_dict["content"] = self.content
        # ... add other fields if they exist ...
        return message_dict

    @classmethod
    def user_message(cls, content: str) -> "Message":
        return cls(role="user", content=content)

    @classmethod
    def assistant_message(cls, content: Optional[str]) -> "Message":
        return cls(role="assistant", content=content)

    # ... other classmethods like system_message, tool_message ...

class Memory(BaseModel):
    messages: List[Message] = Field(default_factory=list)
    max_messages: int = 100  # Example limit

    def add_message(self, message: Message) -> None:
        """Add a single message to the list."""
        self.messages.append(message)
        # Optional: trim old messages if limit exceeded
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages :]

    def to_dict_list(self) -> List[dict]:
        """Convert all stored messages to dictionaries."""
        return [msg.to_dict() for msg in self.messages]

    # ... other methods like clear(), get_recent_messages() ...
```

**Explanation:**

*   The `Message` class uses Pydantic `BaseModel` for structure and validation. It clearly defines `role` and `content`. The classmethods (`user_message`, etc.) are just convenient ways to create instances with the role pre-filled. `to_dict` prepares it for API calls.
*   The `Memory` class also uses `BaseModel`. Its main part is `messages: List[Message]`, which holds the conversation history. `add_message` simply appends to this list (and optionally trims it). `to_dict_list` iterates through the stored messages and converts each one using its `to_dict` method.

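One practical consequence of `max_messages` is worth seeing once: when the list grows past the limit, `add_message` silently drops the oldest entries. A quick sketch based on the simplified snippet above, using a tiny limit for demonstration (the real default shown is 100):

```python
from app.schema import Memory, Message

# Use a tiny limit so the trim is easy to observe.
mem = Memory(max_messages=2)
for i in range(4):
    mem.add_message(Message.user_message(f"message {i}"))

# Only the two most recent messages survive the trim.
print([m.content for m in mem.messages])  # ['message 2', 'message 3']
```
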
And here's how an agent might use its memory attribute (simplified from `app/agent/base.py`):

```python
# Simplified conceptual snippet inspired by app/agent/base.py

from app.schema import Memory, Message  # Simplified imports
from app.llm import LLM

class SimplifiedAgent:
    def __init__(self):
        self.memory = Memory()  # Agent holds a Memory instance
        self.llm = LLM()        # Agent has access to the LLM

    def add_user_input(self, text: str):
        """Adds user input to memory."""
        user_msg = Message.user_message(text)
        self.memory.add_message(user_msg)
        print(f"Agent Memory Updated with: {user_msg.to_dict()}")

    async def generate_response(self) -> str:
        """Generates a response based on memory."""
        print("Agent consulting memory...")
        messages_for_llm = self.memory.to_dict_list()

        print(f"Sending {len(messages_for_llm)} messages to LLM...")
        # The actual call to the LLM
        response_text = await self.llm.ask(messages=messages_for_llm)

        # Add assistant response to memory
        assistant_msg = Message.assistant_message(response_text)
        self.memory.add_message(assistant_msg)
        print(f"Agent Memory Updated with: {assistant_msg.to_dict()}")

        return response_text

# Example Usage (needs async context)
# agent = SimplifiedAgent()
# agent.add_user_input("What is the capital of France?")
# response = await agent.generate_response()   # Gets "Paris"
# agent.add_user_input("What about Spain?")
# response2 = await agent.generate_response()  # Gets "Madrid"
```

**Explanation:**

*   The agent has `self.memory`.
*   When input arrives (`add_user_input`), it creates a `Message` and adds it using `self.memory.add_message`.
*   When generating a response (`generate_response`), it retrieves the history using `self.memory.to_dict_list()` and passes it to `self.llm.ask`.
*   It then adds the LLM's response back into memory as an `assistant` message.

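To drive those `await` calls you need an event loop. A minimal runner sketch, assuming the `LLM` class is configured with working credentials as described in [Chapter 1](01_llm.md):

```python
import asyncio

async def main():
    agent = SimplifiedAgent()

    agent.add_user_input("What is the capital of France?")
    print(await agent.generate_response())  # e.g., "The capital of France is Paris."

    agent.add_user_input("What about Spain?")
    print(await agent.generate_response())  # e.g., "The capital of Spain is Madrid."

# asyncio.run(main())  # uncomment to run against a configured LLM
```
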
## Wrapping Up Chapter 2

You've now learned about `Message` (a single conversational turn with a role and content) and `Memory` (the ordered list storing these messages). Together, they provide the crucial context agents need to understand conversations and respond coherently. They act as the agent's short-term memory or chat log.

We have the brain ([LLM](01_llm.md)) and the memory (`Message`/`Memory`). Now we need something to orchestrate the process – to receive input, consult memory, use the LLM, potentially use tools, and manage its state. That's the job of the Agent itself.

Let's move on to [Chapter 3: BaseAgent](03_baseagent.md) to see how agents are structured and how they use these core components.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)