init push

This commit is contained in:
zachary62
2025-04-04 13:01:50 -04:00
parent 97c20e803a
commit e62ee2cb13
162 changed files with 42423 additions and 11 deletions

LICENSE (Normal file, 21 lines)

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Zachary Huang
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,20 +1,23 @@
<h1 align="center">Agentic Coding - Project Template</h1>
<h1 align="center">Turns Codebase into Easy Tutorial</h1>
![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
<p align="center">
<i>Ever stared at a new codebase written by others feeling completely lost? This tutorial shows you how to build an AI agent that analyzes GitHub repositories and creates beginner-friendly tutorials explaining exactly how the code works.</i>
</p>
<p align="center">
<a href="https://github.com/The-Pocket/PocketFlow" target="_blank">
<img
src="./assets/banner.png" width="600"
src="./assets/banner.png" width="800"
/>
</a>
</p>
This is a project template for Agentic Coding with [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework, and Cursor.
This project crawls GitHub repositories and builds a knowledge base from the code:
- We have included the [.cursorrules](.cursorrules) file to let Cursor AI help you build LLM projects.
- Want to learn how to build LLM projects with Agentic Coding?
- Check out the [Agentic Coding Guidance](https://the-pocket.github.io/PocketFlow/guide.html)
- Check out the [YouTube Tutorial](https://www.youtube.com/@ZacharyLLM?sub_confirmation=1)
- **Analyze entire codebases** to identify core abstractions and how they interact
- **Transform complex code** into beginner-friendly tutorials with clear visualizations
- **Build understanding systematically** from fundamentals to advanced concepts in logical steps
Built with [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework.

Binary file not shown.

Before: 3.9 MiB  |  After: 4.8 MiB


@@ -0,0 +1,281 @@
# Chapter 1: Agent - The Workers of AutoGen
Welcome to the AutoGen Core tutorial! We're excited to guide you through building powerful applications with autonomous agents.
## Motivation: Why Do We Need Agents?
Imagine you want to build an automated system to write blog posts. You might need one part of the system to research a topic and another part to write the actual post based on the research. How do you represent these different "workers" and make them talk to each other?
This is where the concept of an **Agent** comes in. In AutoGen Core, an `Agent` is the fundamental building block representing an actor or worker in your system. Think of it like an employee in an office.
## Key Concepts: Understanding Agents
Let's break down what makes an Agent:
1. **It's a Worker:** An Agent is designed to *do* things. This could be running calculations, calling a Large Language Model (LLM) like ChatGPT, using a tool (like a search engine), or managing a piece of data.
2. **It Has an Identity (`AgentId`):** Just like every employee has a name and a job title, every Agent needs a unique identity. This identity, called `AgentId`, has two parts:
* `type`: What kind of role does the agent have? (e.g., "researcher", "writer", "coder"). This helps organize agents.
* `key`: A unique name for this specific agent instance (e.g., "researcher-01", "amy-the-writer").
```python
# From: _agent_id.py
class AgentId:
    def __init__(self, type: str, key: str) -> None:
        # ... (validation checks omitted for brevity)
        self._type = type
        self._key = key

    @property
    def type(self) -> str:
        return self._type

    @property
    def key(self) -> str:
        return self._key

    def __str__(self) -> str:
        # Creates an id like "researcher/amy-the-writer"
        return f"{self._type}/{self._key}"
```
This `AgentId` acts like the agent's address, allowing other agents (or the system) to send messages specifically to it.
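For example, creating an identity and printing it (the `__str__` above produces the `type/key` form):

```python
from autogen_core import AgentId

# Identity for one specific researcher instance
researcher_id = AgentId(type="researcher", key="researcher-01")
print(researcher_id)
# Output: researcher/researcher-01
```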
3. **It Has Metadata (`AgentMetadata`):** Besides its core identity, an agent often has descriptive information.
* `type`: Same as in `AgentId`.
* `key`: Same as in `AgentId`.
* `description`: A human-readable explanation of what the agent does (e.g., "Researches topics using web search").
```python
# From: _agent_metadata.py
from typing import TypedDict
class AgentMetadata(TypedDict):
    type: str
    key: str
    description: str
```
This metadata helps understand the agent's purpose within the system.
4. **It Communicates via Messages:** Agents don't work in isolation. They collaborate by sending and receiving messages. The primary way an agent receives work is through its `on_message` method. Think of this like the agent's inbox.
```python
# From: _agent.py (Simplified Agent Protocol)
from typing import Any, Mapping, Protocol
# ... other imports
class Agent(Protocol):
    @property
    def id(self) -> AgentId: ...  # The agent's unique ID

    async def on_message(self, message: Any, ctx: MessageContext) -> Any:
        """Handles an incoming message."""
        # Agent's logic to process the message goes here
        ...
```
When an agent receives a message, `on_message` is called. The `message` contains the data or task, and `ctx` (MessageContext) provides extra information about the message (like who sent it). We'll cover `MessageContext` more later.
5. **It Can Remember Things (State):** Sometimes, an agent needs to remember information between tasks, like keeping notes on research progress. Agents can optionally implement `save_state` and `load_state` methods to store and retrieve their internal memory.
```python
# From: _agent.py (Simplified Agent Protocol)
class Agent(Protocol):
    # ... other methods

    async def save_state(self) -> Mapping[str, Any]:
        """Save the agent's internal memory."""
        # Return a dictionary representing the state
        ...

    async def load_state(self, state: Mapping[str, Any]) -> None:
        """Load the agent's internal memory."""
        # Restore state from the dictionary
        ...
```
We'll explore state and memory in more detail in [Chapter 7: Memory](07_memory.md).
6. **Different Agent Types:** AutoGen Core provides base classes to make creating agents easier:
* `BaseAgent`: The fundamental class most agents inherit from. It provides common setup.
* `ClosureAgent`: A very quick way to create simple agents using just a function (like hiring a temp worker for a specific task defined on the spot).
* `RoutedAgent`: An agent that can automatically direct different types of messages to different internal handler methods (like a smart receptionist).
## Use Case Example: Researcher and Writer
Let's revisit our blog post example. We want a `Researcher` agent and a `Writer` agent.
**Goal:**
1. Tell the `Researcher` a topic (e.g., "AutoGen Agents").
2. The `Researcher` finds some facts (we'll keep it simple and just make them up for now).
3. The `Researcher` sends these facts to the `Writer`.
4. The `Writer` receives the facts and drafts a short post.
**Simplified Implementation Idea (using `ClosureAgent` for brevity):**
First, let's define the messages they might exchange:
```python
from dataclasses import dataclass
@dataclass
class ResearchTopic:
    topic: str

@dataclass
class ResearchFacts:
    topic: str
    facts: list[str]

@dataclass
class DraftPost:
    topic: str
    draft: str
```
These are simple Python classes to hold the data being passed around.
Now, let's imagine defining the `Researcher` using a `ClosureAgent`. This agent will listen for `ResearchTopic` messages.
```python
# Simplified concept - requires AgentRuntime (Chapter 3) to actually run
from autogen_core import AgentId  # needed to address the Writer below

async def researcher_logic(agent_context, message: ResearchTopic, msg_context):
    print(f"Researcher received topic: {message.topic}")
    # In a real scenario, this would involve searching, calling an LLM, etc.
    # For now, we just make up facts.
    facts = [f"Fact 1 about {message.topic}", f"Fact 2 about {message.topic}"]
    print(f"Researcher found facts: {facts}")

    # Find the Writer agent's ID (we assume we know it)
    writer_id = AgentId(type="writer", key="blog_writer_1")

    # Send the facts to the Writer
    await agent_context.send_message(
        message=ResearchFacts(topic=message.topic, facts=facts),
        recipient=writer_id,
    )
    print("Researcher sent facts to Writer.")
    # This agent doesn't return a direct reply
    return None
```
This `researcher_logic` function defines *what* the researcher does when it gets a `ResearchTopic` message. It processes the topic, creates `ResearchFacts`, and uses `agent_context.send_message` to send them to the `writer` agent.
Similarly, the `Writer` agent would have its own logic:
```python
# Simplified concept - requires AgentRuntime (Chapter 3) to actually run
async def writer_logic(agent_context, message: ResearchFacts, msg_context):
    print(f"Writer received facts for topic: {message.topic}")
    # In a real scenario, this would involve LLM prompting
    draft = f"Blog Post about {message.topic}:\n"
    for fact in message.facts:
        draft += f"- {fact}\n"
    print(f"Writer drafted post:\n{draft}")
    # Perhaps save the draft or send it somewhere else
    # For now, we just print it. We don't send another message.
    return None  # Or maybe return a confirmation/result
```
This `writer_logic` function defines how the writer reacts to receiving `ResearchFacts`.
**Important:** To actually *run* these agents and make them communicate, we need the `AgentRuntime` (covered in [Chapter 3: AgentRuntime](03_agentruntime.md)) and the `Messaging System` (covered in [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md)). For now, focus on the *idea* that Agents are distinct workers defined by their logic (`on_message`) and identified by their `AgentId`.
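That said, if you want to see the *idea* in motion right now, here is a rough, purely illustrative sketch that wires the two logic functions together with a fake context object. It is **not** how the real `AgentRuntime` delivers messages; it just calls the functions directly, assuming the dataclasses and logic functions defined above.

```python
import asyncio

class FakeContext:
    """Hypothetical stand-in for the context the runtime would normally provide."""
    def __init__(self, logic_by_type):
        self._logic_by_type = logic_by_type

    async def send_message(self, message, recipient):
        # Route the message to the recipient's logic function by AgentId type
        await self._logic_by_type[recipient.type](self, message, None)

async def demo():
    ctx = FakeContext({"writer": writer_logic})
    # Kick off the researcher; it will "send" facts to the writer via our fake context
    await researcher_logic(ctx, ResearchTopic(topic="AutoGen Agents"), None)

asyncio.run(demo())
```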
## Under the Hood: How an Agent Gets a Message
While the full message delivery involves the `Messaging System` and `AgentRuntime`, let's look at the agent's role when it receives a message.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Sender as Sender Agent
participant Runtime as AgentRuntime
participant Recipient as Recipient Agent
Sender->>+Runtime: send_message(message, recipient_id)
Runtime->>+Recipient: Locate agent by recipient_id
Runtime->>+Recipient: on_message(message, context)
Recipient->>Recipient: Process message using internal logic
alt Response Needed
Recipient->>-Runtime: Return response value
Runtime->>-Sender: Deliver response value
else No Response
Recipient->>-Runtime: Return None (or no return)
end
```
1. Some other agent (Sender) or the system decides to send a message to our agent (Recipient).
2. It tells the `AgentRuntime` (the manager): "Deliver this `message` to the agent with `recipient_id`".
3. The `AgentRuntime` finds the correct `Recipient` agent instance.
4. The `AgentRuntime` calls the `Recipient.on_message(message, context)` method.
5. The agent's internal logic inside `on_message` (or methods called by it, like in `RoutedAgent`) runs to process the message.
6. If the message requires a direct response (like an RPC call), the agent returns a value from `on_message`. If not (like a general notification or event), it might return `None`.
**Code Glimpse:**
The core definition is the `Agent` Protocol (`_agent.py`). It's like an interface or a contract: any class wanting to be an Agent *must* provide these methods.
```python
# From: _agent.py - The Agent blueprint (Protocol)
@runtime_checkable
class Agent(Protocol):
    @property
    def metadata(self) -> AgentMetadata: ...

    @property
    def id(self) -> AgentId: ...

    async def on_message(self, message: Any, ctx: MessageContext) -> Any: ...

    async def save_state(self) -> Mapping[str, Any]: ...

    async def load_state(self, state: Mapping[str, Any]) -> None: ...

    async def close(self) -> None: ...
```
Most agents you create will inherit from `BaseAgent` (`_base_agent.py`). It provides some standard setup:
```python
# From: _base_agent.py (Simplified)
class BaseAgent(ABC, Agent):
    def __init__(self, description: str) -> None:
        # Gets runtime & id from a special context when created by the runtime
        # Raises error if you try to create it directly!
        self._runtime: AgentRuntime = AgentInstantiationContext.current_runtime()
        self._id: AgentId = AgentInstantiationContext.current_agent_id()
        self._description = description
        # ...

    # This is the final version called by the runtime
    @final
    async def on_message(self, message: Any, ctx: MessageContext) -> Any:
        # It calls the implementation method you need to write
        return await self.on_message_impl(message, ctx)

    # You MUST implement this in your subclass
    @abstractmethod
    async def on_message_impl(self, message: Any, ctx: MessageContext) -> Any: ...

    # Helper to send messages easily
    async def send_message(self, message: Any, recipient: AgentId, ...) -> Any:
        # It just asks the runtime to do the actual sending
        return await self._runtime.send_message(
            message, sender=self.id, recipient=recipient, ...
        )

    # ... other methods like publish_message, save_state, load_state
```
Notice how `BaseAgent` handles getting its `id` and `runtime` during creation and provides a convenient `send_message` method that uses the runtime. When inheriting from `BaseAgent`, you primarily focus on implementing the `on_message_impl` method to define your agent's unique behavior.
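As a rough sketch of what that looks like in practice (assuming the simplified `BaseAgent` shown above), a minimal custom agent only needs `on_message_impl`. Remember it must be created by the runtime, not instantiated directly:

```python
from typing import Any
from autogen_core import BaseAgent, MessageContext

class EchoAgent(BaseAgent):
    """A minimal agent that simply echoes whatever message it receives."""

    def __init__(self) -> None:
        super().__init__(description="Echoes incoming messages.")

    async def on_message_impl(self, message: Any, ctx: MessageContext) -> Any:
        # The one method a subclass must provide: the agent's behavior
        print(f"{self.id} received: {message}")
        return message  # Returned as the direct response to the sender
```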
## Next Steps
You now understand the core concept of an `Agent` in AutoGen Core! It's the fundamental worker unit with an identity, the ability to process messages, and optionally maintain state.
In the next chapters, we'll explore:
* [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md): How messages actually travel between agents.
* [Chapter 3: AgentRuntime](03_agentruntime.md): The manager responsible for creating, running, and connecting agents.
Let's continue building your understanding!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,267 @@
# Chapter 2: Messaging System (Topic & Subscription)
In [Chapter 1: Agent](01_agent.md), we learned about Agents as individual workers. But how do they coordinate when one agent doesn't know exactly *who* needs the information it produces? Imagine our Researcher finds some facts. Maybe the Writer needs them, but maybe a Fact-Checker agent or a Summary agent also needs them later. How can the Researcher just announce "Here are the facts!" without needing a specific mailing list?
This is where the **Messaging System**, specifically **Topics** and **Subscriptions**, comes in. It allows agents to broadcast messages to anyone interested, like posting on a company announcement board.
## Motivation: Broadcasting Information
Let's refine our blog post example:
1. The `Researcher` agent finds facts about "AutoGen Agents".
2. Instead of sending *directly* to the `Writer`, the `Researcher` **publishes** these facts to a general "research-results" **Topic**.
3. The `Writer` agent has previously told the system it's **subscribed** to the "research-results" Topic.
4. The system sees the new message on the Topic and delivers it to the `Writer` (and any other subscribers).
This way, the `Researcher` doesn't need to know who the `Writer` is, or even if a `Writer` exists! It just broadcasts the results. If we later add a `FactChecker` agent that also needs the results, it simply subscribes to the same Topic.
## Key Concepts: Topics and Subscriptions
Let's break down the components of this broadcasting system:
1. **Topic (`TopicId`): The Announcement Board**
* A `TopicId` represents a specific channel or category for messages. Think of it like the name of an announcement board (e.g., "Project Updates", "General Announcements").
* It has two main parts:
* `type`: What *kind* of event or information is this? (e.g., "research.completed", "user.request"). This helps categorize messages.
* `source`: *Where* or *why* did this event originate? Often, this relates to the specific task or context (e.g., the specific blog post being researched like "autogen-agents-blog-post", or the team generating the event like "research-team").
```python
# From: _topic.py (Simplified)
from dataclasses import dataclass
@dataclass(frozen=True)  # Immutable: can't change after creation
class TopicId:
    type: str
    source: str

    def __str__(self) -> str:
        # Creates an id like "research.completed/autogen-agents-blog-post"
        return f"{self.type}/{self.source}"
```
This structure allows for flexible filtering. Agents might subscribe to all topics of a certain `type`, regardless of the `source`, or only to topics with a specific `source`.
2. **Publishing: Posting the Announcement**
* When an agent has information to share broadly, it *publishes* a message to a specific `TopicId`.
* This is like pinning a note to the designated announcement board. The agent doesn't need to know who will read it.
3. **Subscription (`Subscription`): Signing Up for Updates**
* A `Subscription` is how an agent declares its interest in certain `TopicId`s.
* It acts like a rule: "If a message is published to a Topic that matches *this pattern*, please deliver it to *this kind of agent*".
* The `Subscription` links a `TopicId` pattern (e.g., "all topics with type `research.completed`") to an `AgentId` (or a way to determine the `AgentId`).
4. **Routing: Delivering the Mail**
* The `AgentRuntime` (the system manager we'll meet in [Chapter 3: AgentRuntime](03_agentruntime.md)) keeps track of all active `Subscription`s.
* When a message is published to a `TopicId`, the `AgentRuntime` checks which `Subscription`s match that `TopicId`.
* For each match, it uses the `Subscription`'s rule to figure out which specific `AgentId` should receive the message and delivers it.
## Use Case Example: Researcher Publishes, Writer Subscribes
Let's see how our Researcher and Writer can use this system.
**Goal:** Researcher publishes facts to a topic, Writer receives them via subscription.
**1. Define the Topic:**
We need a `TopicId` for research results. Let's say the `type` is "research.facts.available" and the `source` identifies the specific research task (e.g., "blog-post-autogen").
```python
# From: _topic.py
from autogen_core import TopicId
# Define the topic for this specific research task
research_topic_id = TopicId(type="research.facts.available", source="blog-post-autogen")
print(f"Topic ID: {research_topic_id}")
# Output: Topic ID: research.facts.available/blog-post-autogen
```
This defines the "announcement board" we'll use.
**2. Researcher Publishes:**
The `Researcher` agent, after finding facts, will use its `agent_context` (provided by the runtime) to publish the `ResearchFacts` message to this topic.
```python
# Simplified concept - Researcher agent logic
# Assume 'agent_context' and 'message' (ResearchTopic) are provided
# Define the facts message (from Chapter 1)
@dataclass
class ResearchFacts:
    topic: str
    facts: list[str]

async def researcher_publish_logic(agent_context, message: ResearchTopic, msg_context):
    print(f"Researcher working on: {message.topic}")
    facts_data = ResearchFacts(
        topic=message.topic,
        facts=[f"Fact A about {message.topic}", f"Fact B about {message.topic}"]
    )

    # Define the specific topic for this task's results
    results_topic = TopicId(type="research.facts.available", source=message.topic)  # Use message topic as source

    # Publish the facts to the topic
    await agent_context.publish_message(message=facts_data, topic_id=results_topic)
    print(f"Researcher published facts to topic: {results_topic}")
    # No direct reply needed
    return None
```
Notice the `agent_context.publish_message` call. The Researcher doesn't specify a recipient, only the topic.
**3. Writer Subscribes:**
The `Writer` agent needs to tell the system it's interested in messages on topics like "research.facts.available". We can use a predefined `Subscription` type called `TypeSubscription`. This subscription typically means: "I am interested in all topics with this *exact type*. When a message arrives, create/use an agent of *my type* whose `key` matches the topic's `source`."
```python
# From: _type_subscription.py (Simplified Concept)
from autogen_core import TypeSubscription, BaseAgent
class WriterAgent(BaseAgent):
    # ... agent implementation ...

    async def on_message_impl(self, message: ResearchFacts, ctx):
        # This method gets called when a subscribed message arrives
        print(f"Writer ({self.id}) received facts via subscription: {message.facts}")
        # ... process facts and write draft ...

# How the Writer subscribes (usually done during runtime setup - Chapter 3)
# This tells the runtime: "Messages on topics with type 'research.facts.available'
# should go to a 'writer' agent whose key matches the topic source."
writer_subscription = TypeSubscription(
    topic_type="research.facts.available",
    agent_type="writer"  # The type of agent that should handle this
)

print(f"Writer subscription created for topic type: {writer_subscription.topic_type}")
# Output: Writer subscription created for topic type: research.facts.available
```
When the `Researcher` publishes to `TopicId(type="research.facts.available", source="blog-post-autogen")`, the `AgentRuntime` will see that `writer_subscription` matches the `topic_type`. It will then use the rule: "Find (or create) an agent with `AgentId(type='writer', key='blog-post-autogen')` and deliver the message."
**Benefit:** Decoupling! The Researcher just broadcasts. The Writer just listens for relevant broadcasts. We can add more listeners (like a `FactChecker` subscribing to the same `topic_type`) without changing the `Researcher` at all.
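For example, adding a hypothetical `FactChecker` listener is just one more subscription; the Researcher's code stays untouched:

```python
from autogen_core import TypeSubscription

# A second rule on the same topic type. Published facts will now ALSO be routed
# to an agent of type "fact_checker" (a new agent type we might add later).
fact_checker_subscription = TypeSubscription(
    topic_type="research.facts.available",
    agent_type="fact_checker",
)
```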
## Under the Hood: How Publishing Works
Let's trace the journey of a published message.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Publisher as Publisher Agent
participant Runtime as AgentRuntime
participant SubRegistry as Subscription Registry
participant Subscriber as Subscriber Agent
Publisher->>+Runtime: publish_message(message, topic_id)
Runtime->>+SubRegistry: Find subscriptions matching topic_id
SubRegistry-->>-Runtime: Return list of matching Subscriptions
loop For each matching Subscription
Runtime->>Subscription: map_to_agent(topic_id)
Subscription-->>Runtime: Return target AgentId
Runtime->>+Subscriber: Locate/Create Agent instance by AgentId
Runtime->>Subscriber: on_message(message, context)
Subscriber-->>-Runtime: Process message (optional return)
end
Runtime-->>-Publisher: Return (usually None for publish)
```
1. **Publish:** An agent calls `agent_context.publish_message(message, topic_id)`. This internally calls the `AgentRuntime`'s publish method.
2. **Lookup:** The `AgentRuntime` takes the `topic_id` and consults its internal `Subscription Registry`.
3. **Match:** The Registry checks all registered `Subscription` objects. Each `Subscription` has an `is_match(topic_id)` method. The registry finds all subscriptions where `is_match` returns `True`.
4. **Map:** For each matching `Subscription`, the Runtime calls its `map_to_agent(topic_id)` method. This method returns the specific `AgentId` that should handle this message based on the subscription rule and the topic details.
5. **Deliver:** The `AgentRuntime` finds the agent instance corresponding to the returned `AgentId` (potentially creating it if it doesn't exist yet, especially with `TypeSubscription`). It then calls that agent's `on_message` method, delivering the original published `message`.
**Code Glimpse:**
* **`TopicId` (`_topic.py`):** As shown before, a simple dataclass holding `type` and `source`. It includes validation to ensure the `type` follows certain naming conventions.
```python
# From: _topic.py
@dataclass(eq=True, frozen=True)
class TopicId:
    type: str
    source: str
    # ... validation and __str__ ...

    @classmethod
    def from_str(cls, topic_id: str) -> Self:
        # Helper to parse "type/source" string
        # ... implementation ...
```
* **`Subscription` Protocol (`_subscription.py`):** This defines the *contract* for any subscription rule.
```python
# From: _subscription.py (Simplified Protocol)
from typing import Protocol
# ... other imports
class Subscription(Protocol):
    @property
    def id(self) -> str: ...  # Unique ID for this subscription instance

    def is_match(self, topic_id: TopicId) -> bool:
        """Check if a topic matches this subscription's rule."""
        ...

    def map_to_agent(self, topic_id: TopicId) -> AgentId:
        """Determine the target AgentId if is_match was True."""
        ...
```
Any class implementing these methods can act as a subscription rule.
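As a sketch of how you *could* implement this contract yourself (purely illustrative, not a class provided by AutoGen Core), here is a rule that matches any topic whose `type` starts with a given prefix and always routes to one fixed agent:

```python
import uuid
from autogen_core import AgentId, TopicId

class PrefixSubscription:
    """Illustrative Subscription: match topic types by prefix, route to one fixed agent."""

    def __init__(self, topic_type_prefix: str, agent_id: AgentId) -> None:
        self._prefix = topic_type_prefix
        self._agent_id = agent_id
        self._id = str(uuid.uuid4())  # unique ID for this subscription instance

    @property
    def id(self) -> str:
        return self._id

    def is_match(self, topic_id: TopicId) -> bool:
        return topic_id.type.startswith(self._prefix)

    def map_to_agent(self, topic_id: TopicId) -> AgentId:
        return self._agent_id
```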
* **`TypeSubscription` (`_type_subscription.py`):** A common implementation of the `Subscription` protocol.
```python
# From: _type_subscription.py (Simplified)
class TypeSubscription(Subscription):
    def __init__(self, topic_type: str, agent_type: str, ...):
        self._topic_type = topic_type
        self._agent_type = agent_type
        # ... generates a unique self._id ...

    def is_match(self, topic_id: TopicId) -> bool:
        # Matches if the topic's type is exactly the one we want
        return topic_id.type == self._topic_type

    def map_to_agent(self, topic_id: TopicId) -> AgentId:
        # Maps to an agent of the specified type, using the
        # topic's source as the agent's unique key.
        if not self.is_match(topic_id):
            raise CantHandleException(...)  # Should not happen if used correctly
        return AgentId(type=self._agent_type, key=topic_id.source)

    # ... id property ...
```
This implementation provides the "one agent instance per source" behavior for a specific topic type.
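A quick illustration of that mapping, assuming the simplified `TypeSubscription` above:

```python
from autogen_core import TopicId, TypeSubscription

sub = TypeSubscription(topic_type="research.facts.available", agent_type="writer")
topic = TopicId(type="research.facts.available", source="blog-post-autogen")

print(sub.is_match(topic))      # True
print(sub.map_to_agent(topic))  # an AgentId with type='writer', key='blog-post-autogen'
```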
* **`DefaultSubscription` (`_default_subscription.py`):** This is often used via a decorator (`@default_subscription`) and provides a convenient way to create a `TypeSubscription` where the `agent_type` is automatically inferred from the agent class being defined, and the `topic_type` defaults to "default" (but can be overridden). It simplifies common use cases.
```python
# From: _default_subscription.py (Conceptual Usage)
from autogen_core import BaseAgent, default_subscription
# ResearchFacts is the message dataclass defined back in Chapter 1

@default_subscription  # Uses 'default' topic type, infers agent type 'writer'
class WriterAgent(BaseAgent):
    # Agent logic here...
    async def on_message_impl(self, message: ResearchFacts, ctx): ...

# Or specify the topic type
@default_subscription(topic_type="research.facts.available")
class SpecificWriterAgent(BaseAgent):
    # Agent logic here...
    async def on_message_impl(self, message: ResearchFacts, ctx): ...
```
The actual sending (`publish_message`) and routing logic reside within the `AgentRuntime`, which we'll explore next.
## Next Steps
You've learned how AutoGen Core uses a publish/subscribe system (`TopicId`, `Subscription`) to allow agents to communicate without direct coupling. This is crucial for building flexible and scalable multi-agent applications.
* **Topic (`TopicId`):** Named channels (`type`/`source`) for broadcasting messages.
* **Publish:** Sending a message to a Topic.
* **Subscription:** An agent's declared interest in messages on certain Topics, defining a routing rule.
Now, let's dive into the orchestrator that manages agents and makes this messaging system work:
* [Chapter 3: AgentRuntime](03_agentruntime.md): The manager responsible for creating, running, and connecting agents, including handling message publishing and subscription routing.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,349 @@
# Chapter 3: AgentRuntime - The Office Manager
In [Chapter 1: Agent](01_agent.md), we met the workers (`Agent`) of our system. In [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md), we saw how they can communicate broadly using topics and subscriptions. But who hires these agents? Who actually delivers the messages, whether direct or published? And who keeps the whole system running smoothly?
This is where the **`AgentRuntime`** comes in. It's the central nervous system, the operating system, or perhaps the most fitting analogy: **the office manager** for all your agents.
## Motivation: Why Do We Need an Office Manager?
Imagine an office full of employees (Agents). You have researchers, writers, maybe coders.
* How does a new employee get hired and set up?
* When one employee wants to send a memo directly to another, who makes sure it gets to the right desk?
* When someone posts an announcement on the company bulletin board (publishes to a topic), who ensures everyone who signed up for that type of announcement sees it?
* Who starts the workday and ensures everything keeps running?
Without an office manager, it would be chaos! The `AgentRuntime` serves this crucial role in AutoGen Core. It handles:
1. **Agent Creation:** "Onboarding" new agents when they are needed.
2. **Message Routing:** Delivering direct messages (`send_message`) and published messages (`publish_message`).
3. **Lifecycle Management:** Starting, running, and stopping the whole system.
4. **State Management:** Keeping track of the overall system state (optional).
## Key Concepts: Understanding the Manager's Job
Let's break down the main responsibilities of the `AgentRuntime`:
1. **Agent Instantiation (Hiring):**
* You don't usually create agent objects directly (like `my_agent = ResearcherAgent()`). Why? Because the agent needs to know *about* the runtime (the office it works in) to send messages, publish announcements, etc.
* Instead, you tell the `AgentRuntime`: "I need an agent of type 'researcher'. Here's a recipe (a **factory function**) for how to create one." This is done using `runtime.register_factory(...)`.
* When a message needs to go to a 'researcher' agent with a specific key (e.g., 'researcher-01'), the runtime checks if it already exists. If not, it uses the registered factory function to create (instantiate) the agent.
* **Crucially**, while creating the agent, the runtime provides special context (`AgentInstantiationContext`) so the new agent automatically gets its unique `AgentId` and a reference to the `AgentRuntime` itself. This is like giving a new employee their ID badge and telling them who the office manager is.
```python
# Simplified Concept - How a BaseAgent gets its ID and runtime access
# From: _agent_instantiation.py and _base_agent.py
# Inside the agent's __init__ method (when inheriting from BaseAgent):
class MyAgent(BaseAgent):
    def __init__(self, description: str):
        # This magic happens *because* the AgentRuntime is creating the agent
        # inside a special context.
        self._runtime = AgentInstantiationContext.current_runtime()  # Gets the manager
        self._id = AgentInstantiationContext.current_agent_id()      # Gets its own ID
        self._description = description
        # ... rest of initialization ...
```
This ensures agents are properly integrated into the system from the moment they are created.
2. **Message Delivery (Mail Room):**
* **Direct Send (`send_message`):** When an agent calls `await agent_context.send_message(message, recipient_id)`, it's actually telling the `AgentRuntime`, "Please deliver this `message` directly to the agent identified by `recipient_id`." The runtime finds the recipient agent (creating it if necessary) and calls its `on_message` method. It's like putting a specific name on an envelope and handing it to the mail room.
* **Publish (`publish_message`):** When an agent calls `await agent_context.publish_message(message, topic_id)`, it tells the runtime, "Post this `message` to the announcement board named `topic_id`." The runtime then checks its list of **subscriptions** (who signed up for which boards). For every matching subscription, it figures out the correct recipient agent(s) (based on the subscription rule) and delivers the message to their `on_message` method.
3. **Lifecycle Management (Opening/Closing the Office):**
* The runtime needs to be started to begin processing messages. Typically, you call `runtime.start()`. This usually kicks off a background process or loop that watches for incoming messages.
* When work is done, you need to stop the runtime gracefully. `runtime.stop_when_idle()` is commonly used: it waits until all messages currently in the queue have been processed, then stops. `runtime.stop()` stops more abruptly.
4. **State Management (Office Records):**
* The runtime can save the state of *all* the agents it manages (`runtime.save_state()`) and load it back later (`runtime.load_state()`). This is useful for pausing and resuming complex multi-agent interactions. It can also save/load state for individual agents (`runtime.agent_save_state()` / `runtime.agent_load_state()`). We'll touch more on state in [Chapter 7: Memory](07_memory.md).
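A minimal sketch of that pause/resume idea (inside an async function, assuming a `runtime` object like the one we build below):

```python
# Snapshot the state of every agent the runtime manages...
state = await runtime.save_state()
# ...and later (even after a restart) restore it.
await runtime.load_state(state)
```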
## Use Case Example: Running Our Researcher and Writer
Let's finally run the Researcher/Writer scenario from Chapters 1 and 2. We need the `AgentRuntime` to make it happen.
**Goal:**
1. Create a runtime.
2. Register factories for a 'researcher' and a 'writer' agent.
3. Tell the runtime that 'writer' agents are interested in "research.facts.available" topics (add subscription).
4. Start the runtime.
5. Send an initial `ResearchTopic` message to a 'researcher' agent.
6. Let the system run (Researcher publishes facts, Runtime delivers to Writer via subscription, Writer processes).
7. Stop the runtime when idle.
**Code Snippets (Simplified):**
```python
# 0. Imports and Message Definitions (from previous chapters)
import asyncio
from dataclasses import dataclass
from autogen_core import (
    AgentId, BaseAgent, SingleThreadedAgentRuntime, TopicId,
    MessageContext, TypeSubscription, AgentInstantiationContext
)

@dataclass
class ResearchTopic:
    topic: str

@dataclass
class ResearchFacts:
    topic: str
    facts: list[str]
```
These are the messages our agents will exchange.
```python
# 1. Define Agent Logic (using BaseAgent)
class ResearcherAgent(BaseAgent):
    async def on_message_impl(self, message: ResearchTopic, ctx: MessageContext):
        print(f"Researcher ({self.id}) got topic: {message.topic}")
        facts = [f"Fact 1 about {message.topic}", f"Fact 2"]
        results_topic = TopicId("research.facts.available", message.topic)
        # Use the runtime (via self.publish_message helper) to publish
        await self.publish_message(
            ResearchFacts(topic=message.topic, facts=facts), results_topic
        )
        print(f"Researcher ({self.id}) published facts to {results_topic}")

class WriterAgent(BaseAgent):
    async def on_message_impl(self, message: ResearchFacts, ctx: MessageContext):
        print(f"Writer ({self.id}) received facts via topic '{ctx.topic_id}': {message.facts}")
        draft = f"Draft for {message.topic}: {'; '.join(message.facts)}"
        print(f"Writer ({self.id}) created draft: '{draft}'")
        # This agent doesn't send further messages in this example
```
Here we define the behavior of our two agent types, inheriting from `BaseAgent` which gives us `self.id`, `self.publish_message`, etc.
```python
# 2. Define Agent Factories
def researcher_factory():
    # Gets runtime/id via AgentInstantiationContext inside BaseAgent.__init__
    print("Runtime is creating a ResearcherAgent...")
    return ResearcherAgent(description="I research topics.")

def writer_factory():
    print("Runtime is creating a WriterAgent...")
    return WriterAgent(description="I write drafts from facts.")
```
These simple functions tell the runtime *how* to create instances of our agents when needed.
```python
# 3. Setup and Run the Runtime
async def main():
    # Create the runtime (the office manager)
    runtime = SingleThreadedAgentRuntime()

    # Register the factories (tell the manager how to hire)
    await runtime.register_factory("researcher", researcher_factory)
    await runtime.register_factory("writer", writer_factory)
    print("Registered agent factories.")

    # Add the subscription (tell manager who listens to which announcements)
    # Rule: Messages to topics of type "research.facts.available"
    #       should go to a "writer" agent whose key matches the topic source.
    writer_sub = TypeSubscription(topic_type="research.facts.available", agent_type="writer")
    await runtime.add_subscription(writer_sub)
    print(f"Added subscription: {writer_sub.id}")

    # Start the runtime (open the office)
    runtime.start()
    print("Runtime started.")

    # Send the initial message to kick things off
    research_task_topic = "AutoGen Agents"
    researcher_instance_id = AgentId(type="researcher", key=research_task_topic)
    print(f"Sending initial topic '{research_task_topic}' to {researcher_instance_id}")
    await runtime.send_message(
        message=ResearchTopic(topic=research_task_topic),
        recipient=researcher_instance_id,
    )

    # Wait until all messages are processed (wait for work day to end)
    print("Waiting for runtime to become idle...")
    await runtime.stop_when_idle()
    print("Runtime stopped.")

# Run the main function
asyncio.run(main())
```
This script sets up the `SingleThreadedAgentRuntime`, registers the blueprints (factories) and communication rules (subscription), starts the process, and then shuts down cleanly.
**Expected Output (Conceptual Order):**
```
Registered agent factories.
Added subscription: type=research.facts.available=>agent=writer
Runtime started.
Sending initial topic 'AutoGen Agents' to researcher/AutoGen Agents
Waiting for runtime to become idle...
Runtime is creating a ResearcherAgent... # First time researcher/AutoGen Agents is needed
Researcher (researcher/AutoGen Agents) got topic: AutoGen Agents
Researcher (researcher/AutoGen Agents) published facts to research.facts.available/AutoGen Agents
Runtime is creating a WriterAgent... # First time writer/AutoGen Agents is needed (due to subscription)
Writer (writer/AutoGen Agents) received facts via topic 'research.facts.available/AutoGen Agents': ['Fact 1 about AutoGen Agents', 'Fact 2']
Writer (writer/AutoGen Agents) created draft: 'Draft for AutoGen Agents: Fact 1 about AutoGen Agents; Fact 2'
Runtime stopped.
```
You can see the runtime orchestrating the creation of agents and the flow of messages based on the initial request and the subscription rule.
## Under the Hood: How the Manager Works
Let's peek inside the `SingleThreadedAgentRuntime` (a common implementation provided by AutoGen Core) to understand the flow.
**Core Idea:** It uses an internal queue (`_message_queue`) to hold incoming requests (`send_message`, `publish_message`). A background task continuously takes items from the queue and processes them one by one (though the *handling* of a message might involve `await` and allow other tasks to run).
**1. Agent Creation (`_get_agent`, `_invoke_agent_factory`)**
When the runtime needs an agent instance (e.g., to deliver a message) that hasn't been created yet:
```mermaid
sequenceDiagram
participant Runtime as AgentRuntime
participant Factory as Agent Factory Func
participant AgentCtx as AgentInstantiationContext
participant Agent as New Agent Instance
Runtime->>Runtime: Check if agent instance exists (e.g., in `_instantiated_agents` dict)
alt Agent Not Found
Runtime->>Runtime: Find registered factory for agent type
Runtime->>AgentCtx: Set current runtime & agent_id
activate AgentCtx
Runtime->>Factory: Call factory function()
activate Factory
Factory->>AgentCtx: (Inside Agent.__init__) Get current runtime
AgentCtx-->>Factory: Return runtime
Factory->>AgentCtx: (Inside Agent.__init__) Get current agent_id
AgentCtx-->>Factory: Return agent_id
Factory-->>Runtime: Return new Agent instance
deactivate Factory
Runtime->>AgentCtx: Clear context
deactivate AgentCtx
Runtime->>Runtime: Store new agent instance
end
Runtime->>Runtime: Return agent instance
```
* The runtime looks up the factory function registered for the required `AgentId.type`.
* It uses `AgentInstantiationContext.populate_context` to temporarily store its own reference and the target `AgentId`.
* It calls the factory function.
* Inside the agent's `__init__` (usually via `BaseAgent`), `AgentInstantiationContext.current_runtime()` and `AgentInstantiationContext.current_agent_id()` are called to retrieve the context set by the runtime.
* The factory returns the fully initialized agent instance.
* The runtime stores this instance for future use.
```python
# From: _agent_instantiation.py (Simplified)
from contextlib import contextmanager
from contextvars import ContextVar

class AgentInstantiationContext:
    _CONTEXT_VAR = ContextVar("agent_context")  # Stores (runtime, agent_id)

    @classmethod
    @contextmanager
    def populate_context(cls, ctx: tuple[AgentRuntime, AgentId]):
        token = cls._CONTEXT_VAR.set(ctx)  # Store context for this block
        try:
            yield  # Code inside the 'with' block runs here
        finally:
            cls._CONTEXT_VAR.reset(token)  # Clean up context

    @classmethod
    def current_runtime(cls) -> AgentRuntime:
        return cls._CONTEXT_VAR.get()[0]  # Retrieve runtime from context

    @classmethod
    def current_agent_id(cls) -> AgentId:
        return cls._CONTEXT_VAR.get()[1]  # Retrieve agent_id from context
```
This context manager pattern ensures the correct runtime and ID are available *only* during the agent's creation by the runtime.
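A hypothetical sketch of how the runtime uses it when invoking a factory (not the actual runtime code; `runtime` and `researcher_factory` are the objects from the example above):

```python
# Conceptually, inside the runtime:
agent_id = AgentId(type="researcher", key="researcher-01")
with AgentInstantiationContext.populate_context((runtime, agent_id)):
    # Any BaseAgent constructed inside this block can read the runtime and its id
    agent = researcher_factory()
```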
**2. Direct Messaging (`send_message` -> `_process_send`)**
```mermaid
sequenceDiagram
participant Sender as Sending Agent/Code
participant Runtime as AgentRuntime
participant Queue as Internal Queue
participant Recipient as Recipient Agent
Sender->>+Runtime: send_message(msg, recipient_id, ...)
Runtime->>Runtime: Create Future (for response)
Runtime->>+Queue: Put SendMessageEnvelope(msg, recipient_id, future)
Runtime-->>-Sender: Return awaitable Future
Note over Queue, Runtime: Background task picks up envelope
Runtime->>Runtime: _process_send(envelope)
Runtime->>+Recipient: _get_agent(recipient_id) (creates if needed)
Recipient-->>-Runtime: Return Agent instance
Runtime->>+Recipient: on_message(msg, context)
Recipient->>Recipient: Process message...
Recipient-->>-Runtime: Return response value
Runtime->>Runtime: Set Future result with response value
```
* `send_message` creates a `Future` object (a placeholder for the eventual result) and wraps the message details in a `SendMessageEnvelope`.
* This envelope is put onto the internal `_message_queue`.
* The background task picks up the envelope.
* `_process_send` gets the recipient agent instance (using `_get_agent`).
* It calls the recipient's `on_message` method.
* When `on_message` returns a result, `_process_send` sets the result on the `Future` object, which makes the original `await runtime.send_message(...)` call return the value.
**3. Publish/Subscribe (`publish_message` -> `_process_publish`)**
```mermaid
sequenceDiagram
participant Publisher as Publishing Agent/Code
participant Runtime as AgentRuntime
participant Queue as Internal Queue
participant SubManager as SubscriptionManager
participant Subscriber as Subscribed Agent
Publisher->>+Runtime: publish_message(msg, topic_id, ...)
Runtime->>+Queue: Put PublishMessageEnvelope(msg, topic_id)
Runtime-->>-Publisher: Return (None for publish)
Note over Queue, Runtime: Background task picks up envelope
Runtime->>Runtime: _process_publish(envelope)
Runtime->>+SubManager: get_subscribed_recipients(topic_id)
SubManager->>SubManager: Find matching subscriptions
SubManager->>SubManager: Map subscriptions to AgentIds
SubManager-->>-Runtime: Return list of recipient AgentIds
loop For each recipient AgentId
Runtime->>+Subscriber: _get_agent(recipient_id) (creates if needed)
Subscriber-->>-Runtime: Return Agent instance
Runtime->>+Subscriber: on_message(msg, context with topic_id)
Subscriber->>Subscriber: Process message...
Subscriber-->>-Runtime: Return (usually None for publish)
end
```
* `publish_message` wraps the message in a `PublishMessageEnvelope` and puts it on the queue.
* The background task picks it up.
* `_process_publish` asks the `SubscriptionManager` (`_subscription_manager`) for all `AgentId`s that are subscribed to the given `topic_id`.
* The `SubscriptionManager` checks its registered `Subscription` objects (`_subscriptions` list, added via `add_subscription`). For each `Subscription` where `is_match(topic_id)` is true, it calls `map_to_agent(topic_id)` to get the target `AgentId`.
* For each resulting `AgentId`, the runtime gets the agent instance and calls its `on_message` method, providing the `topic_id` in the `MessageContext`.
```python
# From: _runtime_impl_helpers.py (SubscriptionManager simplified)
class SubscriptionManager:
    def __init__(self):
        self._subscriptions: List[Subscription] = []
        # Optimization cache can be added here

    async def add_subscription(self, subscription: Subscription):
        self._subscriptions.append(subscription)
        # Clear cache if any

    async def get_subscribed_recipients(self, topic: TopicId) -> List[AgentId]:
        recipients = []
        for sub in self._subscriptions:
            if sub.is_match(topic):
                recipients.append(sub.map_to_agent(topic))
        return recipients
```
The `SubscriptionManager` simply iterates through registered subscriptions to find matches when a message is published.
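A tiny illustration of that lookup, using the simplified `SubscriptionManager` sketched above together with `TypeSubscription` from Chapter 2:

```python
import asyncio
from autogen_core import TopicId, TypeSubscription

async def demo():
    manager = SubscriptionManager()  # the simplified class above
    await manager.add_subscription(
        TypeSubscription(topic_type="research.facts.available", agent_type="writer")
    )
    topic = TopicId(type="research.facts.available", source="blog-post-autogen")
    recipients = await manager.get_subscribed_recipients(topic)
    print(recipients)  # one AgentId: writer/blog-post-autogen

asyncio.run(demo())
```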
## Next Steps
You now understand the `AgentRuntime` - the essential coordinator that brings Agents to life, manages their communication, and runs the entire show. It handles agent creation via factories, routes direct and published messages, and manages the system's lifecycle.
With the core concepts of `Agent`, `Messaging`, and `AgentRuntime` covered, we can start looking at more specialized building blocks. Next, we'll explore how agents can use external capabilities:
* [Chapter 4: Tool](04_tool.md): How to give agents tools (like functions or APIs) to perform specific actions beyond just processing messages.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,272 @@
# Chapter 4: Tool - Giving Agents Specific Capabilities
In the previous chapters, we learned about Agents as workers ([Chapter 1](01_agent.md)), how they can communicate directly or using announcements ([Chapter 2](02_messaging_system__topic___subscription_.md)), and the `AgentRuntime` that manages them ([Chapter 3](03_agentruntime.md)).
Agents can process messages and coordinate, but what if an agent needs to perform a very specific action, like looking up information online, running a piece of code, accessing a database, or even just finding out the current date? They need specialized *capabilities*.
This is where the concept of a **Tool** comes in.
## Motivation: Agents Need Skills!
Imagine our `Writer` agent from before. It receives facts and writes a draft. Now, let's say we want the `Writer` (or perhaps a smarter `Assistant` agent helping it) to always include the current date in the blog post title.
How does the agent get the current date? It doesn't inherently know it. It needs a specific *skill* or *tool* for that.
A `Tool` in AutoGen Core represents exactly this: a specific, well-defined capability that an Agent can use. Think of it like giving an employee (Agent) a specialized piece of equipment (Tool), like a calculator, a web browser, or a calendar lookup program.
## Key Concepts: Understanding Tools
Let's break down what defines a Tool:
1. **It's a Specific Capability:** A Tool performs one well-defined task. Examples:
* `search_web(query: str)`
* `run_python_code(code: str)`
* `get_stock_price(ticker: str)`
* `get_current_date()`
2. **It Has a Schema (The Manual):** This is crucial! For an Agent (especially one powered by a Large Language Model - LLM) to know *when* and *how* to use a tool, the tool needs a clear description or "manual". This is called the `ToolSchema`. It typically includes:
* **`name`**: A unique identifier for the tool (e.g., `get_current_date`).
* **`description`**: A clear explanation of what the tool does, which helps the LLM decide if this tool is appropriate for the current task (e.g., "Fetches the current date in YYYY-MM-DD format").
* **`parameters`**: Defines what inputs the tool needs. This is itself a schema (`ParametersSchema`) describing the input fields, their types, and which ones are required. For our `get_current_date` example, it might need no parameters. For `get_stock_price`, it would need a `ticker` parameter of type string.
```python
# From: tools/_base.py (Simplified Concept)
from typing import TypedDict, Dict, Any, Sequence, NotRequired
class ParametersSchema(TypedDict):
    type: str  # Usually "object"
    properties: Dict[str, Any]  # Defines input fields and their types
    required: NotRequired[Sequence[str]]  # List of required field names

class ToolSchema(TypedDict):
    name: str
    description: NotRequired[str]
    parameters: NotRequired[ParametersSchema]
    # 'strict' flag also possible (Chapter 5 related)
```
This schema allows an LLM to understand: "Ah, there's a tool called `get_current_date` that takes no inputs and gives me the current date. I should use that now!"
3. **It Can Be Executed:** Once an agent decides to use a tool (often based on the schema), there needs to be a mechanism to actually *run* the tool's underlying function and get the result.
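For example, the schema for the `get_current_date` tool we build in the next section might look roughly like this (using the simplified `ToolSchema` shape above):

```python
date_tool_schema: ToolSchema = {
    "name": "get_current_date",
    "description": "Fetches the current date in YYYY-MM-DD format.",
    "parameters": {
        "type": "object",
        "properties": {},  # no inputs needed
        "required": [],
    },
}
```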
## Use Case Example: Adding a `get_current_date` Tool
Let's equip an agent with the ability to find the current date.
**Goal:** Define a tool that gets the current date and show how it could be executed by a specialized agent.
**Step 1: Define the Python Function**
First, we need the actual Python code that performs the action.
```python
# File: get_date_function.py
import datetime
def get_current_date() -> str:
    """Fetches the current date as a string."""
    today = datetime.date.today()
    return today.isoformat()  # Returns date like "2023-10-27"

# Test the function
print(f"Function output: {get_current_date()}")
```
This is a standard Python function. It takes no arguments and returns the date as a string.
**Step 2: Wrap it as a `FunctionTool`**
AutoGen Core provides a convenient way to turn a Python function like this into a `Tool` object using `FunctionTool`. It automatically inspects the function's signature (arguments and return type) and docstring to help build the `ToolSchema`.
```python
# File: create_date_tool.py
from autogen_core.tools import FunctionTool
from get_date_function import get_current_date # Import our function
# Create the Tool instance
# We provide the function and a clear description for the LLM
date_tool = FunctionTool(
    func=get_current_date,
    description="Use this tool to get the current date in YYYY-MM-DD format."
    # Name defaults to function name 'get_current_date'
)
# Let's see what FunctionTool generated
print(f"Tool Name: {date_tool.name}")
print(f"Tool Description: {date_tool.description}")
# The schema defines inputs (none in this case)
# print(f"Tool Schema Parameters: {date_tool.schema['parameters']}")
# Output (simplified): {'type': 'object', 'properties': {}, 'required': []}
```
`FunctionTool` wraps our `get_current_date` function. It uses the function name as the tool name and the description we provided. It also correctly determines from the function signature that there are no input parameters (`properties: {}`).
**Step 3: How an Agent Might Request Tool Use**
Now we have a `date_tool`. How is it used? Typically, an LLM-powered agent (which we'll see more of in [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md)) analyzes a request and decides a tool is needed. It then generates a request to *call* that tool, often using a specific message type like `FunctionCall`.
```python
# File: tool_call_request.py
from autogen_core import FunctionCall # Represents a request to call a tool
# Imagine an LLM agent decided to use the date tool.
# It constructs this message, providing the tool name and arguments (as JSON string).
date_call_request = FunctionCall(
    id="call_date_001",        # A unique ID for this specific call attempt
    name="get_current_date",   # Matches the Tool's name
    arguments="{}"             # An empty JSON object because no arguments are needed
)
print("FunctionCall message:", date_call_request)
# Output: FunctionCall(id='call_date_001', name='get_current_date', arguments='{}')
```
This `FunctionCall` message is like a work order: "Please execute the tool named `get_current_date` with these arguments."
**Step 4: The `ToolAgent` Executes the Tool**
Who receives this `FunctionCall` message? Usually, a specialized agent called `ToolAgent`. You create a `ToolAgent` and give it the list of tools it knows how to execute. When it receives a `FunctionCall`, it finds the matching tool and runs it.
```python
# File: tool_agent_example.py
import asyncio
from autogen_core.tool_agent import ToolAgent
from autogen_core.models import FunctionExecutionResult
from create_date_tool import date_tool # Import the tool we created
from tool_call_request import date_call_request # Import the request message
# Create an agent specifically designed to execute tools
tool_executor = ToolAgent(
    description="I can execute tools like getting the date.",
    tools=[date_tool]  # Give it the list of tools it manages
)

# --- Simulation of Runtime delivering the message ---
# In a real app, the AgentRuntime (Chapter 3) would route the
# date_call_request message to this tool_executor agent.
# We simulate the call to its message handler here:
async def simulate_execution():
    # Fake context (normally provided by runtime)
    class MockContext:
        cancellation_token = None
    ctx = MockContext()

    print(f"ToolAgent received request: {date_call_request.name}")
    result: FunctionExecutionResult = await tool_executor.handle_function_call(
        message=date_call_request,
        ctx=ctx
    )
    print(f"ToolAgent produced result: {result}")

asyncio.run(simulate_execution())
```
**Expected Output:**
```
ToolAgent received request: get_current_date
ToolAgent produced result: FunctionExecutionResult(content='2023-10-27', call_id='call_date_001', is_error=False, name='get_current_date') # Date will be current date
```
The `ToolAgent` received the `FunctionCall`, found the `date_tool` in its list, executed the underlying `get_current_date` function, and packaged the result (the date string) into a `FunctionExecutionResult` message. This result message can then be sent back to the agent that originally requested the tool use.
## Under the Hood: How Tool Execution Works
Let's visualize the typical flow when an LLM agent decides to use a tool managed by a `ToolAgent`.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant LLMA as LLM Agent (Decides)
participant Caller as Caller Agent (Orchestrates)
participant ToolA as ToolAgent (Executes)
participant ToolFunc as Tool Function (e.g., get_current_date)
Note over LLMA: Analyzes conversation, decides tool needed.
LLMA->>Caller: Sends AssistantMessage containing FunctionCall(name='get_current_date', args='{}')
Note over Caller: Receives LLM response, sees FunctionCall.
Caller->>+ToolA: Uses runtime.send_message(message=FunctionCall, recipient=ToolAgent_ID)
Note over ToolA: Receives FunctionCall via on_message.
ToolA->>ToolA: Looks up 'get_current_date' in its internal list of Tools.
ToolA->>+ToolFunc: Calls tool.run_json(args={}) -> triggers get_current_date()
ToolFunc-->>-ToolA: Returns the result (e.g., "2023-10-27")
ToolA->>ToolA: Creates FunctionExecutionResult message with the content.
ToolA-->>-Caller: Returns FunctionExecutionResult via runtime messaging.
Note over Caller: Receives the tool result.
Caller->>LLMA: Sends FunctionExecutionResultMessage to LLM for next step.
Note over LLMA: Now knows the current date.
```
1. **Decision:** An LLM-powered agent decides a tool is needed based on the conversation and the available tools' descriptions. It generates a `FunctionCall`.
2. **Request:** A "Caller" agent (often the same LLM agent or a managing agent) sends this `FunctionCall` message to the dedicated `ToolAgent` using the `AgentRuntime`.
3. **Lookup:** The `ToolAgent` receives the message, extracts the tool `name` (`get_current_date`), and finds the corresponding `Tool` object (our `date_tool`) in the list it was configured with.
4. **Execution:** The `ToolAgent` calls the `run_json` method on the `Tool` object, passing the arguments from the `FunctionCall`. For a `FunctionTool`, `run_json` validates the arguments against the generated schema and then executes the original Python function (`get_current_date`).
5. **Result:** The Python function returns its result (the date string).
6. **Response:** The `ToolAgent` wraps this result string in a `FunctionExecutionResult` message, including the original `call_id`, and sends it back to the Caller agent.
7. **Continuation:** The Caller agent typically sends this result back to the LLM agent, allowing the conversation or task to continue with the new information.
**Code Glimpse:**
* **`Tool` Protocol (`tools/_base.py`):** Defines the basic contract any tool must fulfill. Key methods are `schema` (property returning the `ToolSchema`) and `run_json` (method to execute the tool with JSON-like arguments).
* **`BaseTool` (`tools/_base.py`):** An abstract class that helps implement the `Tool` protocol, especially using Pydantic models for defining arguments (`args_type`) and return values (`return_type`). It automatically generates the `parameters` part of the schema from the `args_type` model.
* **`FunctionTool` (`tools/_function_tool.py`):** Inherits from `BaseTool`. Its magic lies in automatically creating the `args_type` Pydantic model by inspecting the wrapped Python function's signature (`args_base_model_from_signature`). Its `run` method handles calling the original sync or async Python function.
```python
# Inside FunctionTool (Simplified Concept)
class FunctionTool(BaseTool[BaseModel, BaseModel]):
def __init__(self, func, description, ...):
self._func = func
self._signature = get_typed_signature(func)
# Automatically create Pydantic model for arguments
args_model = args_base_model_from_signature(...)
# Get return type from signature
return_type = self._signature.return_annotation
super().__init__(args_model, return_type, ...)
async def run(self, args: BaseModel, ...):
# Extract arguments from the 'args' model
kwargs = args.model_dump()
# Call the original Python function (sync or async)
result = await self._call_underlying_func(**kwargs)
return result # Must match the expected return_type
```
* **`ToolAgent` (`tool_agent/_tool_agent.py`):** A specialized `RoutedAgent`. It registers a handler specifically for `FunctionCall` messages.
```python
# Inside ToolAgent (Simplified Concept)
class ToolAgent(RoutedAgent):
def __init__(self, ..., tools: List[Tool]):
super().__init__(...)
self._tools = {tool.name: tool for tool in tools} # Store tools by name
@message_handler # Registers this for FunctionCall messages
async def handle_function_call(self, message: FunctionCall, ctx: MessageContext):
# Find the tool by name
tool = self._tools.get(message.name)
if tool is None:
# Handle error: Tool not found
raise ToolNotFoundException(...)
try:
# Parse arguments string into a dictionary
arguments = json.loads(message.arguments)
# Execute the tool's run_json method
result_obj = await tool.run_json(args=arguments, ...)
# Convert result object back to string if needed
result_str = tool.return_value_as_string(result_obj)
# Create the success result message
return FunctionExecutionResult(content=result_str, ...)
except Exception as e:
# Handle execution errors
return FunctionExecutionResult(content=f"Error: {e}", is_error=True, ...)
```
Its core logic is: find tool -> parse args -> run tool -> return result/error.
## Next Steps
You've learned how **Tools** provide specific capabilities to Agents, defined by a **Schema** that LLMs can understand. We saw how `FunctionTool` makes it easy to wrap existing Python functions and how `ToolAgent` acts as the executor for these tools.
This ability for agents to use tools is fundamental to building powerful and versatile AI systems that can interact with the real world or perform complex calculations.
Now that agents can use tools, we need to understand more about the agents that *decide* which tools to use, which often involves interacting with Large Language Models:
* [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md): How agents interact with LLMs like GPT to generate responses or decide on actions (like calling a tool).
* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): How the history of the conversation, including tool calls and results, is managed when talking to an LLM.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,296 @@
# Chapter 5: ChatCompletionClient - Talking to the Brains
So far, we've learned about:
* [Agents](01_agent.md): The workers in our system.
* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly.
* [AgentRuntime](03_agentruntime.md): The manager that runs the show.
* [Tools](04_tool.md): How agents get specific skills.
But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs): think of models like GPT-4, Claude, or Gemini as their "brains". How does an agent in AutoGen Core communicate with these external LLM services?
This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs.
## Motivation: Bridging the Gap to LLMs
Imagine you want to build an agent that can summarize long articles.
1. You give the agent an article (as a message).
2. The agent needs to send this article to an LLM (like GPT-4).
3. It also needs to tell the LLM: "Please summarize this."
4. The LLM processes the request and generates a summary.
5. The agent needs to receive this summary back from the LLM.
How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response?
The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.
## Key Concepts: Understanding the LLM Communicator
Let's break down the `ChatCompletionClient`:
1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.
2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
* `messages`: A list of messages representing the conversation history so far.
* Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use.
* Other parameters (like `json_output` hints, `cancellation_token`).
3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
* `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
* `UserMessage`: Input from the user or another agent (e.g., the article text).
* `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
* `FunctionExecutionResultMessage`: The results of executing a tool/function call.
4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (`FunctionCall` inside an `AssistantMessage`).
5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing:
* `content`: The LLM's generated text or a list of `FunctionCall` requests.
* `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
* `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
* `cached`: Whether the response came from a cache.
6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total usage via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge based on tokens.
## Use Case Example: Summarizing Text with an LLM
Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.
**Goal:** Send text to an LLM via a client and get a summary back.
**Step 1: Prepare the Input Messages**
We need to structure our request as a list of `LLMMessage` objects.
```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage
# Instructions for the LLM
system_prompt = SystemMessage(
content="You are a helpful assistant designed to summarize text concisely."
)
# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""
user_request = UserMessage(
content=f"Please summarize the following text in one sentence:\n\n{article_text}",
source="User" # Indicate who provided this input
)
# Combine into a list for the client
messages_to_send = [system_prompt, user_request]
print("Messages prepared:")
for msg in messages_to_send:
print(f"- {msg.type}: {msg.content[:50]}...") # Print first 50 chars
```
This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.
**Step 2: Use the ChatCompletionClient (Conceptual)**
Now, we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.
```python
# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage
# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(config=...))
async def get_summary(client, messages):
print("\nSending messages to LLM via ChatCompletionClient...")
try:
# The core call: send messages, get structured result
response: CreateResult = await client.create(
messages=messages,
# We aren't providing tools in this simple example
tools=[]
)
print("Received response:")
print(f"- Finish Reason: {response.finish_reason}")
print(f"- Content: {response.content}") # This should be the summary
print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
print(f"- Cached: {response.cached}")
# Also, check total usage tracked by the client
total_usage = client.total_usage()
print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")
except Exception as e:
print(f"An error occurred: {e}")
# --- Placeholder for actual client ---
class MockChatCompletionClient: # Simulate a real client
_total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)
async def create(self, messages, tools=[], **kwargs) -> CreateResult:
# Simulate API call and response
prompt_len = sum(len(str(m.content)) for m in messages) // 4 # Rough token estimate
summary = "AutoGen is a multi-agent framework for developing LLM applications."
completion_len = len(summary) // 4 # Rough token estimate
usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
self._total_usage.prompt_tokens += usage.prompt_tokens
self._total_usage.completion_tokens += usage.completion_tokens
return CreateResult(
finish_reason="stop", content=summary, usage=usage, cached=False
)
def total_usage(self) -> RequestUsage: return self._total_usage
# Other required methods (count_tokens, model_info etc.) omitted for brevity
async def main():
from prepare_messages import messages_to_send # Get messages from previous step
mock_client = MockChatCompletionClient()
await get_summary(mock_client, messages_to_send)
# asyncio.run(main()) # If you run this, it uses the mock client
```
This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`) and the token usage reported for that specific call (`response.usage`) and the total tracked by the client (`client.total_usage()`).
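In a real application, the mock client above would be replaced by a concrete implementation. As a hedged sketch (assuming the separately installed `autogen-ext` package, which provides clients such as `OpenAIChatCompletionClient`), configuration might look roughly like this:
```python
# Hedged sketch: configuring a concrete client (assumes the 'autogen-ext'
# package is installed and an OPENAI_API_KEY environment variable is set).
from autogen_ext.models.openai import OpenAIChatCompletionClient

llm_client = OpenAIChatCompletionClient(model="gpt-4o")

# It can then be used just like the mock client above:
# response = await llm_client.create(messages=messages_to_send, tools=[])
```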
**How an Agent Uses It:**
Typically, an agent's logic (e.g., inside its `on_message` handler) would do the following (a minimal sketch follows the list):
1. Receive an incoming message (like the article to summarize).
2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
3. Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
4. Call `await client.create(...)`.
5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
6. Potentially send the result as a new message to another agent or return it.
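Putting those steps together, here is a rough sketch of an agent that uses the client. The message type, handler name, and the way the client is passed into the constructor are illustrative assumptions, not a prescribed pattern:
```python
# Hedged sketch: an agent whose handler calls a ChatCompletionClient.
from dataclasses import dataclass
from autogen_core import MessageContext, RoutedAgent, message_handler
from autogen_core.models import (
    ChatCompletionClient, CreateResult, SystemMessage, UserMessage,
)

@dataclass
class SummarizeRequest:  # A simple custom message type (assumption)
    text: str

class SummarizerAgent(RoutedAgent):
    def __init__(self, client: ChatCompletionClient) -> None:
        super().__init__("An agent that summarizes text")
        self._client = client
        self._system_prompt = SystemMessage(content="Summarize text concisely.")

    @message_handler
    async def handle_request(self, message: SummarizeRequest, ctx: MessageContext) -> str:
        # Steps 1-2: build the LLM message list from the incoming request
        llm_messages = [
            self._system_prompt,
            UserMessage(content=message.text, source="User"),
        ]
        # Steps 3-4: call the client
        result: CreateResult = await self._client.create(
            messages=llm_messages, cancellation_token=ctx.cancellation_token
        )
        # Steps 5-6: return the summary text to whoever sent the request
        assert isinstance(result.content, str)
        return result.content
```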
## Under the Hood: How the Client Talks to the LLM
What happens when you call `await client.create(...)`?
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Agent as Agent Logic
participant Client as ChatCompletionClient
participant Formatter as API Formatter
participant HTTP as HTTP Client
participant LLM_API as External LLM API
Agent->>+Client: create(messages, tools)
Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
Formatter-->>-Client: Return formatted request body
Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API Key
HTTP->>+LLM_API: Transmit request over network
LLM_API->>LLM_API: Process request, generate completion/function call
LLM_API-->>-HTTP: Return API response (e.g., JSON)
HTTP-->>-Client: Receive HTTP response
Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
Formatter-->>-Client: Return parsed data
Client->>Client: Create standard CreateResult object
Client-->>-Agent: Return CreateResult
```
1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc.
3. **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
5. **Parse:** It receives the raw HTTP response (usually JSON) from the API.
6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
**Code Glimpse:**
* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow.
```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Sequence, Optional, Mapping, Any, AsyncGenerator, Union
from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken
class ChatCompletionClient(ABC):
@abstractmethod
async def create(
self, messages: Sequence[LLMMessage], *,
tools: Sequence[Tool | ToolSchema] = [],
json_output: Optional[bool] = None, # Hint for JSON mode
extra_create_args: Mapping[str, Any] = {}, # API-specific args
cancellation_token: Optional[CancellationToken] = None,
) -> CreateResult: ... # The core method
@abstractmethod
def create_stream(
self, # Similar to create, but yields results incrementally
# ... parameters ...
) -> AsyncGenerator[Union[str, CreateResult], None]: ...
@abstractmethod
def total_usage(self) -> RequestUsage: ... # Get total tracked usage
@abstractmethod
def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ... # Estimate token count
# Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```
Concrete classes like `OpenAIChatCompletionClient`, `AnthropicChatCompletionClient` etc., implement these methods using the specific libraries and API calls for each service.
* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client.
```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from typing import List, Union, Literal
from .. import FunctionCall, Image # FunctionCall from Chapter 4; Image supports multimodal content
class SystemMessage(BaseModel):
content: str
type: Literal["SystemMessage"] = "SystemMessage"
class UserMessage(BaseModel):
content: Union[str, List[Union[str, Image]]] # Can include images!
source: str
type: Literal["UserMessage"] = "UserMessage"
class AssistantMessage(BaseModel):
content: Union[str, List[FunctionCall]] # Can be text or function calls
source: str
type: Literal["AssistantMessage"] = "AssistantMessage"
# FunctionExecutionResultMessage also exists here...
```
* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client.
```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from dataclasses import dataclass
from typing import Union, List, Literal, Optional
from .. import FunctionCall
@dataclass
class RequestUsage:
prompt_tokens: int
completion_tokens: int
FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]
class CreateResult(BaseModel):
finish_reason: FinishReasons
content: Union[str, List[FunctionCall]] # LLM output
usage: RequestUsage # Token usage for this call
cached: bool
# Optional fields like logprobs, thought...
```
Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.
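Because `CreateResult.content` can be either a string or a list of `FunctionCall` objects, calling code typically branches on the type. A small hedged sketch:
```python
# Hedged sketch: handling the two possible shapes of CreateResult.content.
from autogen_core import FunctionCall
from autogen_core.models import CreateResult

def describe_result(result: CreateResult) -> str:
    if isinstance(result.content, str):
        # Plain text completion
        return f"LLM replied with text: {result.content[:60]}..."
    # Otherwise it is a list of FunctionCall requests to execute (see Chapter 4)
    names = ", ".join(call.name for call in result.content if isinstance(call, FunctionCall))
    return f"LLM requested tool calls: {names}"
```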
## Next Steps
You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.
Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?
* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,330 @@
# Chapter 6: ChatCompletionContext - Remembering the Conversation
In [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md), we learned how agents talk to Large Language Models (LLMs) using a `ChatCompletionClient`. We saw that we need to send a list of `messages` (the conversation history) to the LLM so it knows the context.
But conversations can get very long! Imagine talking on the phone for an hour. Can you remember *every single word* that was said? Probably not. You remember the main points, the beginning, and what was said most recently. LLMs have a similar limitation: they can only pay attention to a certain amount of text at once (called the "context window").
If we send the *entire* history of a very long chat, it might be too much for the LLM, lead to errors, be slow, or cost more money (since many LLMs charge based on the amount of text).
So, how do we smartly choose *which* parts of the conversation history to send? This is the problem that **`ChatCompletionContext`** solves.
## Motivation: Keeping LLM Conversations Focused
Let's say we have a helpful assistant agent chatting with a user:
1. **User:** "Hi! Can you tell me about AutoGen?"
2. **Assistant:** "Sure! AutoGen is a framework..." (provides details)
3. **User:** "Thanks! Now, can you draft an email to my team about our upcoming meeting?"
4. **Assistant:** "Okay, what's the meeting about?"
5. **User:** "It's about the project planning for Q3."
6. **Assistant:** (Needs to draft the email)
When the Assistant needs to draft the email (step 6), does it need the *exact* text from step 2 about what AutoGen is? Probably not. It definitely needs the instructions from step 3 and the topic from step 5. Maybe the initial greeting isn't super important either.
`ChatCompletionContext` acts like a **smart transcript editor**. Before sending the history to the LLM via the `ChatCompletionClient`, it reviews the full conversation log and prepares a shorter, focused version containing only the messages it thinks are most relevant for the LLM's next response.
## Key Concepts: Managing the Chat History
1. **The Full Transcript Holder:** A `ChatCompletionContext` object holds the *complete* list of messages (`LLMMessage` objects like `SystemMessage`, `UserMessage`, `AssistantMessage` from Chapter 5) that have occurred in a specific conversation thread. You add new messages using its `add_message` method.
2. **The Smart View Generator (`get_messages`):** The core job of `ChatCompletionContext` is done by its `get_messages` method. When called, it looks at the *full* transcript it holds, but returns only a *subset* of those messages based on its specific strategy. This subset is what you'll actually send to the `ChatCompletionClient`.
3. **Different Strategies for Remembering:** Because different situations require different focus, AutoGen Core provides several `ChatCompletionContext` implementations (strategies):
* **`UnboundedChatCompletionContext`:** The simplest (and sometimes riskiest!). It doesn't edit anything; `get_messages` just returns the *entire* history. Good for short chats, but can break with long ones.
* **`BufferedChatCompletionContext`:** Like remembering only the last few things someone said. It keeps the most recent `N` messages (where `N` is the `buffer_size` you set). Good for focusing on recent interactions.
* **`HeadAndTailChatCompletionContext`:** Tries to get the best of both worlds. It keeps the first few messages (the "head", maybe containing initial instructions) and the last few messages (the "tail", the recent context). It skips the messages in the middle.
## Use Case Example: Chatting with Different Memory Strategies
Let's simulate adding messages to different context managers and see what `get_messages` returns.
**Step 1: Define some messages**
```python
# File: define_chat_messages.py
from autogen_core.models import (
SystemMessage, UserMessage, AssistantMessage, LLMMessage
)
from typing import List
# The initial instruction for the assistant
system_msg = SystemMessage(content="You are a helpful assistant.")
# A sequence of user/assistant turns
chat_sequence: List[LLMMessage] = [
UserMessage(content="What is AutoGen?", source="User"),
AssistantMessage(content="AutoGen is a multi-agent framework...", source="Agent"),
UserMessage(content="What can it do?", source="User"),
AssistantMessage(content="It can build complex LLM apps.", source="Agent"),
UserMessage(content="Thanks!", source="User")
]
# Combine system message and the chat sequence
full_history: List[LLMMessage] = [system_msg] + chat_sequence
print(f"Total messages in full history: {len(full_history)}")
# Output: Total messages in full history: 6
```
We have a full history of 6 messages (1 system + 5 chat turns).
**Step 2: Use `UnboundedChatCompletionContext`**
This context keeps everything.
```python
# File: use_unbounded_context.py
import asyncio
from define_chat_messages import full_history
from autogen_core.model_context import UnboundedChatCompletionContext
async def main():
# Create context and add all messages
context = UnboundedChatCompletionContext()
for msg in full_history:
await context.add_message(msg)
# Get the messages to send to the LLM
messages_for_llm = await context.get_messages()
print(f"--- Unbounded Context ({len(messages_for_llm)} messages) ---")
for i, msg in enumerate(messages_for_llm):
print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...")
# asyncio.run(main()) # If run
```
**Expected Output (Unbounded):**
```
--- Unbounded Context (6 messages) ---
1. [SystemMessage]: You are a helpful assistant....
2. [UserMessage]: What is AutoGen?...
3. [AssistantMessage]: AutoGen is a multi-agent fram...
4. [UserMessage]: What can it do?...
5. [AssistantMessage]: It can build complex LLM apps...
6. [UserMessage]: Thanks!...
```
It returns all 6 messages, exactly as added.
**Step 3: Use `BufferedChatCompletionContext`**
Let's keep only the last 3 messages.
```python
# File: use_buffered_context.py
import asyncio
from define_chat_messages import full_history
from autogen_core.model_context import BufferedChatCompletionContext
async def main():
# Keep only the last 3 messages
context = BufferedChatCompletionContext(buffer_size=3)
for msg in full_history:
await context.add_message(msg)
messages_for_llm = await context.get_messages()
print(f"--- Buffered Context (buffer=3, {len(messages_for_llm)} messages) ---")
for i, msg in enumerate(messages_for_llm):
print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...")
# asyncio.run(main()) # If run
```
**Expected Output (Buffered):**
```
--- Buffered Context (buffer=3, 3 messages) ---
1. [UserMessage]: What can it do?...
2. [AssistantMessage]: It can build complex LLM apps...
3. [UserMessage]: Thanks!...
```
It only returns the last 3 messages from the full history. The system message and the first chat turn are omitted.
**Step 4: Use `HeadAndTailChatCompletionContext`**
Let's keep the first message (head=1) and the last two messages (tail=2).
```python
# File: use_head_tail_context.py
import asyncio
from define_chat_messages import full_history
from autogen_core.model_context import HeadAndTailChatCompletionContext
async def main():
# Keep first 1 and last 2 messages
context = HeadAndTailChatCompletionContext(head_size=1, tail_size=2)
for msg in full_history:
await context.add_message(msg)
messages_for_llm = await context.get_messages()
print(f"--- Head & Tail Context (h=1, t=2, {len(messages_for_llm)} messages) ---")
for i, msg in enumerate(messages_for_llm):
print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...")
# asyncio.run(main()) # If run
```
**Expected Output (Head & Tail):**
```
--- Head & Tail Context (h=1, t=2, 4 messages) ---
1. [SystemMessage]: You are a helpful assistant....
2. [UserMessage]: Skipped 3 messages....
3. [AssistantMessage]: It can build complex LLM apps...
4. [UserMessage]: Thanks!...
```
It keeps the very first message (`SystemMessage`), then inserts a placeholder telling the LLM that some messages were skipped, and finally includes the last two messages. This preserves the initial instruction and the most recent context.
**Which one to choose?** It depends on your agent's task!
* Simple Q&A? `Buffered` might be fine.
* Following complex initial instructions? `HeadAndTail` or even `Unbounded` (if short) might be better.
## Under the Hood: How Context is Managed
The core idea is defined by the `ChatCompletionContext` abstract base class.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Agent as Agent Logic
participant Context as ChatCompletionContext
participant FullHistory as Internal Message List
Agent->>+Context: add_message(newMessage)
Context->>+FullHistory: Append newMessage to list
FullHistory-->>-Context: List updated
Context-->>-Agent: Done
Agent->>+Context: get_messages()
Context->>+FullHistory: Read the full list
FullHistory-->>-Context: Return full list
Context->>Context: Apply Strategy (e.g., slice list for Buffered/HeadTail)
Context-->>-Agent: Return selected list of messages
```
1. **Adding:** When `add_message(message)` is called, the context simply appends the `message` to its internal list (`self._messages`).
2. **Getting:** When `get_messages()` is called:
* The context accesses its internal `self._messages` list.
* The specific implementation (`Unbounded`, `Buffered`, `HeadAndTail`) applies its logic to select which messages to return.
* It returns the selected list.
**Code Glimpse:**
* **Base Class (`_chat_completion_context.py`):** Defines the structure and common methods.
```python
# From: model_context/_chat_completion_context.py (Simplified)
from abc import ABC, abstractmethod
from typing import List
from ..models import LLMMessage
class ChatCompletionContext(ABC):
component_type = "chat_completion_context" # Identifies this as a component type
def __init__(self, initial_messages: List[LLMMessage] | None = None) -> None:
# Holds the COMPLETE history
self._messages: List[LLMMessage] = initial_messages or []
async def add_message(self, message: LLMMessage) -> None:
"""Add a message to the full context."""
self._messages.append(message)
@abstractmethod
async def get_messages(self) -> List[LLMMessage]:
"""Get the subset of messages based on the strategy."""
# Each subclass MUST implement this logic
...
# Other methods like clear(), save_state(), load_state() exist too
```
The base class handles storing messages; subclasses define *how* to retrieve them.
* **Unbounded (`_unbounded_chat_completion_context.py`):** The simplest implementation.
```python
# From: model_context/_unbounded_chat_completion_context.py (Simplified)
from typing import List
from ._chat_completion_context import ChatCompletionContext
from ..models import LLMMessage
class UnboundedChatCompletionContext(ChatCompletionContext):
async def get_messages(self) -> List[LLMMessage]:
"""Returns all messages."""
return self._messages # Just return the whole internal list
```
* **Buffered (`_buffered_chat_completion_context.py`):** Uses slicing to get the end of the list.
```python
# From: model_context/_buffered_chat_completion_context.py (Simplified)
from typing import List
from ._chat_completion_context import ChatCompletionContext
from ..models import LLMMessage, FunctionExecutionResultMessage
class BufferedChatCompletionContext(ChatCompletionContext):
def __init__(self, buffer_size: int, ...):
super().__init__(...)
self._buffer_size = buffer_size
async def get_messages(self) -> List[LLMMessage]:
"""Get at most `buffer_size` recent messages."""
# Slice the list to get the last 'buffer_size' items
messages = self._messages[-self._buffer_size :]
# Special case: Avoid starting with a function result message
if messages and isinstance(messages[0], FunctionExecutionResultMessage):
messages = messages[1:]
return messages
```
* **Head and Tail (`_head_and_tail_chat_completion_context.py`):** Combines slices from the beginning and end.
```python
# From: model_context/_head_and_tail_chat_completion_context.py (Simplified)
from typing import List
from ._chat_completion_context import ChatCompletionContext
from ..models import LLMMessage, UserMessage
class HeadAndTailChatCompletionContext(ChatCompletionContext):
def __init__(self, head_size: int, tail_size: int, ...):
super().__init__(...)
self._head_size = head_size
self._tail_size = tail_size
async def get_messages(self) -> List[LLMMessage]:
head = self._messages[: self._head_size] # First 'head_size' items
tail = self._messages[-self._tail_size :] # Last 'tail_size' items
num_skipped = len(self._messages) - len(head) - len(tail)
if num_skipped <= 0: # If no overlap or gap
return self._messages
else: # If messages were skipped
placeholder = [UserMessage(content=f"Skipped {num_skipped} messages.", source="System")]
# Combine head + placeholder + tail
return head + placeholder + tail
```
These implementations provide different ways to manage the context window effectively.
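Because the base class only requires `get_messages`, you can also write your own strategy. Here is a hedged sketch (not part of AutoGen Core, and it omits the Component configuration plumbing the built-in contexts also provide) that keeps a leading `SystemMessage` plus the most recent messages:
```python
# Hedged sketch of a custom strategy (illustrative only):
# keep any leading SystemMessage plus the last 'recent_size' messages.
from typing import List
from autogen_core.model_context import ChatCompletionContext
from autogen_core.models import LLMMessage, SystemMessage

class SystemPlusRecentContext(ChatCompletionContext):
    def __init__(self, recent_size: int, initial_messages: List[LLMMessage] | None = None) -> None:
        super().__init__(initial_messages)
        self._recent_size = recent_size

    async def get_messages(self) -> List[LLMMessage]:
        head = [m for m in self._messages[:1] if isinstance(m, SystemMessage)]
        tail = self._messages[-self._recent_size:]
        # Avoid duplicating the system message when the history is still short
        return head + [m for m in tail if m not in head]
```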
## Putting it Together with ChatCompletionClient
How does an agent use `ChatCompletionContext` with the `ChatCompletionClient` from Chapter 5?
1. An agent has an instance of a `ChatCompletionContext` (e.g., `BufferedChatCompletionContext`) to store its conversation history.
2. When the agent receives a new message (e.g., a `UserMessage`), it calls `await context.add_message(new_user_message)`.
3. To prepare for calling the LLM, the agent calls `messages_to_send = await context.get_messages()`. This gets the strategically selected subset of the history.
4. The agent then passes this list to the `ChatCompletionClient`: `response = await llm_client.create(messages=messages_to_send, ...)`.
5. When the LLM replies (e.g., with an `AssistantMessage`), the agent adds it back to the context: `await context.add_message(llm_response_message)`.
This loop ensures that the history is continuously updated and intelligently trimmed before each call to the LLM; a minimal sketch of the loop follows.
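Here is that loop as a hedged sketch. The `llm_client` and `context` objects are assumed to be configured elsewhere (e.g., an `OpenAIChatCompletionClient` and a `BufferedChatCompletionContext`):
```python
# Hedged sketch of the context + client loop described above.
from autogen_core.models import AssistantMessage, CreateResult, UserMessage

async def chat_turn(llm_client, context, user_text: str) -> str:
    # 2. Record the new user message in the full history
    await context.add_message(UserMessage(content=user_text, source="User"))
    # 3. Get the strategically trimmed view of the history
    messages_to_send = await context.get_messages()
    # 4. Call the LLM with that view
    response: CreateResult = await llm_client.create(messages=messages_to_send)
    assert isinstance(response.content, str)
    # 5. Record the LLM's reply back into the full history
    await context.add_message(AssistantMessage(content=response.content, source="Assistant"))
    return response.content
```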
## Next Steps
You've learned how `ChatCompletionContext` helps manage the conversation history sent to LLMs, preventing context window overflows and keeping the interaction focused using different strategies (`Unbounded`, `Buffered`, `HeadAndTail`).
This context management is a specific form of **memory**. Agents might need to remember things beyond just the chat history. How do they store general information, state, or knowledge over time?
* [Chapter 7: Memory](07_memory.md): Explore the broader concept of Memory in AutoGen Core, which provides more general ways for agents to store and retrieve information.
* [Chapter 8: Component](08_component.md): Understand how `ChatCompletionContext` fits into the general `Component` model, allowing configuration and integration within the AutoGen system.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,323 @@
# Chapter 7: Memory - The Agent's Notebook
In [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md), we saw how agents manage the *short-term* history of a single conversation before talking to an LLM. It's like remembering what was just said in the last few minutes.
But what if an agent needs to remember things for much longer, across *multiple* conversations or tasks? For example, imagine an assistant agent that learns your preferences:
* You tell it: "Please always write emails in a formal style for me."
* Weeks later, you ask it to draft a new email.
How does it remember that preference? The short-term `ChatCompletionContext` might have forgotten the earlier instruction, especially if using a strategy like `BufferedChatCompletionContext`. The agent needs a **long-term memory**.
This is where the **`Memory`** abstraction comes in. Think of it as the agent's **long-term notebook or database**. While `ChatCompletionContext` is the scratchpad for the current chat, `Memory` holds persistent information the agent can add to or look up later.
## Motivation: Remembering Across Conversations
Our goal is to give an agent the ability to store a piece of information (like a user preference) and retrieve it later to influence its behavior, even in a completely new conversation. `Memory` provides the mechanism for this long-term storage and retrieval.
## Key Concepts: How the Notebook Works
1. **What it Stores (`MemoryContent`):** Agents can store various types of information in their memory. This could be:
* Plain text notes (`text/plain`)
* Structured data like JSON (`application/json`)
* Even images (`image/*`)
Each piece of information is wrapped in a `MemoryContent` object, which includes the data itself, its type (`mime_type`), and optional descriptive `metadata`.
```python
# From: memory/_base_memory.py (Simplified Concept)
from pydantic import BaseModel
from typing import Any, Dict, Union
# Represents one entry in the memory notebook
class MemoryContent(BaseModel):
content: Union[str, bytes, Dict[str, Any]] # The actual data
mime_type: str # What kind of data (e.g., "text/plain")
metadata: Dict[str, Any] | None = None # Extra info (optional)
```
This standard format helps manage different kinds of memories.
2. **Adding to Memory (`add`):** When an agent learns something important it wants to remember long-term (like the user's preferred style), it uses the `memory.add(content)` method. This is like writing a new entry in the notebook.
3. **Querying Memory (`query`):** When an agent needs to recall information, it can use `memory.query(query_text)`. This is like searching the notebook for relevant entries. How the search works depends on the specific memory implementation (it could be a simple text match, or a sophisticated vector search in more advanced memories).
4. **Updating Chat Context (`update_context`):** This is a crucial link! Before an agent talks to the LLM (using the `ChatCompletionClient` from [Chapter 5](05_chatcompletionclient.md)), it can use `memory.update_context(chat_context)` method. This method:
* Looks at the current conversation (`chat_context`).
* Queries the long-term memory (`Memory`) for relevant information.
* Injects the retrieved memories *into* the `chat_context`, often as a `SystemMessage`.
This way, the LLM gets the benefit of the long-term memory *in addition* to the short-term conversation history, right before generating its response.
5. **Different Memory Implementations:** Just like there are different `ChatCompletionContext` strategies, there can be different `Memory` implementations:
* `ListMemory`: A very simple memory that stores everything in a Python list (like a simple chronological notebook).
* *Future Possibilities*: More advanced implementations could use databases or vector stores for more efficient storage and retrieval of vast amounts of information.
## Use Case Example: Remembering User Preferences with `ListMemory`
Let's implement our user preference use case using the simple `ListMemory`.
**Goal:**
1. Create a `ListMemory`.
2. Add a user preference ("formal style") to it.
3. Start a *new* chat context.
4. Use `update_context` to inject the preference into the new chat context.
5. Show how the chat context looks *before* being sent to the LLM.
**Step 1: Create the Memory**
We'll use `ListMemory`, the simplest implementation provided by AutoGen Core.
```python
# File: create_list_memory.py
from autogen_core.memory import ListMemory
# Create a simple list-based memory instance
user_prefs_memory = ListMemory(name="user_preferences")
print(f"Created memory: {user_prefs_memory.name}")
print(f"Initial content: {user_prefs_memory.content}")
# Output:
# Created memory: user_preferences
# Initial content: []
```
We have an empty memory notebook named "user_preferences".
**Step 2: Add the Preference**
Let's add the user's preference as a piece of text memory.
```python
# File: add_preference.py
import asyncio
from autogen_core.memory import MemoryContent
# Assume user_prefs_memory exists from the previous step
# Define the preference as MemoryContent
preference = MemoryContent(
content="User prefers all communication to be written in a formal style.",
mime_type="text/plain", # It's just text
metadata={"source": "user_instruction_conversation_1"} # Optional info
)
async def add_to_memory():
# Add the content to our memory instance
await user_prefs_memory.add(preference)
print(f"Memory content after adding: {user_prefs_memory.content}")
asyncio.run(add_to_memory())
# Output (will show the MemoryContent object):
# Memory content after adding: [MemoryContent(content='User prefers...', mime_type='text/plain', metadata={'source': '...'})]
```
We've successfully written the preference into our `ListMemory` notebook.
**Step 3: Start a New Chat Context**
Imagine time passes, and the user starts a new conversation asking for an email draft. We create a fresh `ChatCompletionContext`.
```python
# File: start_new_chat.py
from autogen_core.model_context import UnboundedChatCompletionContext
from autogen_core.models import UserMessage
# Start a new, empty chat context for a new task
new_chat_context = UnboundedChatCompletionContext()
# Add the user's new request
new_request = UserMessage(content="Draft an email to the team about the Q3 results.", source="User")
# await new_chat_context.add_message(new_request) # In a real app, add the request
print("Created a new, empty chat context.")
# Output: Created a new, empty chat context.
```
This context currently *doesn't* know about the "formal style" preference stored in our long-term memory.
**Step 4: Inject Memory into Chat Context**
Before sending the `new_chat_context` to the LLM, we use `update_context` to bring in relevant long-term memories.
```python
# File: update_chat_with_memory.py
import asyncio
# Assume user_prefs_memory exists (with the preference added)
# Assume new_chat_context exists (empty or with just the new request)
# Assume new_request exists
async def main():
# --- This is where Memory connects to Chat Context ---
print("Updating chat context with memory...")
update_result = await user_prefs_memory.update_context(new_chat_context)
print(f"Memories injected: {len(update_result.memories.results)}")
# Now let's add the actual user request for this task
await new_chat_context.add_message(new_request)
# See what messages are now in the context
messages_for_llm = await new_chat_context.get_messages()
print("\nMessages to be sent to LLM:")
for msg in messages_for_llm:
print(f"- [{msg.type}]: {msg.content}")
asyncio.run(main())
```
**Expected Output:**
```
Updating chat context with memory...
Memories injected: 1
Messages to be sent to LLM:
- [SystemMessage]:
Relevant memory content (in chronological order):
1. User prefers all communication to be written in a formal style.
- [UserMessage]: Draft an email to the team about the Q3 results.
```
Look! The `ListMemory.update_context` method automatically queried the memory (in this simple case, it just takes *all* entries) and added a `SystemMessage` to the `new_chat_context`. This message explicitly tells the LLM about the stored preference *before* it sees the user's request to draft the email.
**Step 5: (Conceptual) Sending to LLM**
Now, if we were to send `messages_for_llm` to the `ChatCompletionClient` (Chapter 5):
```python
# Conceptual code - Requires a configured client
# response = await llm_client.create(messages=messages_for_llm)
```
The LLM would receive both the instruction about the formal style preference (from Memory) and the request to draft the email. It's much more likely to follow the preference now!
**Step 6: Direct Query (Optional)**
We can also directly query the memory if needed, without involving a chat context.
```python
# File: query_memory.py
import asyncio
# Assume user_prefs_memory exists
async def main():
# Query the memory (ListMemory returns all items regardless of query text)
query_result = await user_prefs_memory.query("style preference")
print("\nDirect query result:")
for item in query_result.results:
print(f"- Content: {item.content}, Type: {item.mime_type}")
asyncio.run(main())
# Output:
# Direct query result:
# - Content: User prefers all communication to be written in a formal style., Type: text/plain
```
This shows how an agent could specifically look things up in its notebook.
## Under the Hood: How `ListMemory` Injects Context
Let's trace the `update_context` call for `ListMemory`.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant AgentLogic as Agent Logic
participant ListMem as ListMemory
participant InternalList as Memory's Internal List
participant ChatCtx as ChatCompletionContext
AgentLogic->>+ListMem: update_context(chat_context)
ListMem->>+InternalList: Get all stored MemoryContent items
InternalList-->>-ListMem: Return list of [pref_content]
alt Memory list is NOT empty
ListMem->>ListMem: Format memories into a single string (e.g., "1. pref_content")
ListMem->>ListMem: Create SystemMessage with formatted string
ListMem->>+ChatCtx: add_message(SystemMessage)
ChatCtx-->>-ListMem: Context updated
end
ListMem->>ListMem: Create UpdateContextResult(memories=[pref_content])
ListMem-->>-AgentLogic: Return UpdateContextResult
```
1. The agent calls `user_prefs_memory.update_context(new_chat_context)`.
2. The `ListMemory` instance accesses its internal `_contents` list.
3. It checks if the list is empty. If not:
4. It iterates through the `MemoryContent` items in the list.
5. It formats them into a numbered string (like "Relevant memory content...\n1. Item 1\n2. Item 2...").
6. It creates a single `SystemMessage` containing this formatted string.
7. It calls `new_chat_context.add_message()` to add this `SystemMessage` to the chat history that will be sent to the LLM.
8. It returns an `UpdateContextResult` containing the list of memories it just processed.
**Code Glimpse:**
* **`Memory` Protocol (`memory/_base_memory.py`):** Defines the required methods for any memory implementation.
```python
# From: memory/_base_memory.py (Simplified ABC)
from abc import ABC, abstractmethod
# ... other imports: MemoryContent, MemoryQueryResult, UpdateContextResult, ChatCompletionContext
class Memory(ABC):
component_type = "memory"
@abstractmethod
async def update_context(self, model_context: ChatCompletionContext) -> UpdateContextResult: ...
@abstractmethod
async def query(self, query: str | MemoryContent, ...) -> MemoryQueryResult: ...
@abstractmethod
async def add(self, content: MemoryContent, ...) -> None: ...
@abstractmethod
async def clear(self) -> None: ...
@abstractmethod
async def close(self) -> None: ...
```
Any class wanting to act as Memory must provide these methods.
* **`ListMemory` Implementation (`memory/_list_memory.py`):**
```python
# From: memory/_list_memory.py (Simplified)
from typing import List
# ... other imports: Memory, MemoryContent, ..., SystemMessage, ChatCompletionContext
class ListMemory(Memory):
def __init__(self, ..., memory_contents: List[MemoryContent] | None = None):
# Stores memory items in a simple list
self._contents: List[MemoryContent] = memory_contents or []
async def add(self, content: MemoryContent, ...) -> None:
"""Add new content to the internal list."""
self._contents.append(content)
async def query(self, query: str | MemoryContent = "", ...) -> MemoryQueryResult:
"""Return all memories, ignoring the query."""
# Simple implementation: just return everything
return MemoryQueryResult(results=self._contents)
async def update_context(self, model_context: ChatCompletionContext) -> UpdateContextResult:
"""Add all memories as a SystemMessage to the chat context."""
if not self._contents: # Do nothing if memory is empty
return UpdateContextResult(memories=MemoryQueryResult(results=[]))
# Format all memories into a numbered list string
memory_strings = [f"{i}. {str(mem.content)}" for i, mem in enumerate(self._contents, 1)]
memory_context_str = "Relevant memory content...\n" + "\n".join(memory_strings) + "\n"
# Add this string as a SystemMessage to the provided chat context
await model_context.add_message(SystemMessage(content=memory_context_str))
# Return info about which memories were added
return UpdateContextResult(memories=MemoryQueryResult(results=self._contents))
# ... clear(), close(), config methods ...
```
This shows the straightforward logic of `ListMemory`: store in a list, retrieve the whole list, and inject the whole list as a single system message into the chat context. More complex memories might use smarter retrieval (e.g., based on the `query` in `query()` or the last message in `update_context`) and inject memories differently.
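As a hedged illustration of "smarter retrieval" (this class is not part of AutoGen Core), a memory could override `query` to return only entries whose text matches the query:
```python
# Hedged sketch (illustrative only): a ListMemory variant whose query()
# does a simple case-insensitive substring match instead of returning everything.
from autogen_core.memory import ListMemory, MemoryContent, MemoryQueryResult

class KeywordListMemory(ListMemory):
    async def query(self, query: str | MemoryContent = "", **kwargs) -> MemoryQueryResult:
        query_text = query.content if isinstance(query, MemoryContent) else query
        matches = [
            item for item in self._contents
            if isinstance(item.content, str) and str(query_text).lower() in item.content.lower()
        ]
        return MemoryQueryResult(results=matches)
```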
## Next Steps
You've learned about `Memory`, AutoGen Core's mechanism for giving agents long-term recall beyond the immediate conversation (`ChatCompletionContext`). We saw how `MemoryContent` holds information, `add` stores it, `query` retrieves it, and `update_context` injects relevant memories into the LLM's working context. We explored the simple `ListMemory` as a basic example.
Memory systems are crucial for agents that learn, adapt, or need to maintain state across interactions.
This concludes our deep dive into the core abstractions of AutoGen Core! We've covered Agents, Messaging, Runtime, Tools, LLM Clients, Chat Context, and now Memory. There's one final concept that ties many of these together from a configuration perspective:
* [Chapter 8: Component](08_component.md): Understand the general `Component` model in AutoGen Core, how it allows pieces like `Memory`, `ChatCompletionContext`, and `ChatCompletionClient` to be configured and managed consistently.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,359 @@
# Chapter 8: Component - The Standardized Building Blocks
Welcome to Chapter 8! In our journey so far, we've met several key players in AutoGen Core:
* [Agents](01_agent.md): The workers.
* [Messaging System](02_messaging_system__topic___subscription_.md): How they communicate.
* [AgentRuntime](03_agentruntime.md): The manager.
* [Tools](04_tool.md): Their special skills.
* [ChatCompletionClient](05_chatcompletionclient.md): How they talk to LLMs.
* [ChatCompletionContext](06_chatcompletioncontext.md): How they remember recent chat history.
* [Memory](07_memory.md): How they remember things long-term.
Now, imagine you've built a fantastic agent system using these parts. You've configured a specific `ChatCompletionClient` to use OpenAI's `gpt-4o` model, and you've set up a `ListMemory` (from Chapter 7) to store user preferences. How do you save this exact setup so you can easily recreate it later, or share it with a friend? And what if you later want to swap out the `gpt-4o` client for a different one, like Anthropic's Claude, without rewriting your agent's core logic?
This is where the **`Component`** concept comes in. It provides a standard way to define, configure, save, and load these reusable building blocks.
## Motivation: Making Setups Portable and Swappable
Think of the parts we've used so far, such as `ChatCompletionClient`, `Memory`, and `Tool`, as specialized **Lego bricks**. Each brick has a specific function (connecting to an LLM, remembering things, performing an action).
Wouldn't it be great if:
1. Each Lego brick had a standard way to describe its properties (like "Red 2x4 Brick")?
2. You could easily save the description of all the bricks used in your creation (your agent system)?
3. Someone else could take that description and automatically rebuild your exact creation?
4. You could easily swap a "Red 2x4 Brick" for a "Blue 2x4 Brick" without having to rebuild everything around it?
The `Component` abstraction in AutoGen Core provides exactly this! It makes your building blocks **configurable**, **savable**, **loadable**, and **swappable**.
## Key Concepts: Understanding Components
Let's break down what makes the Component system work:
1. **Component:** A class (like `ListMemory` or `OpenAIChatCompletionClient`) that is designed to be a standard, reusable building block. It performs a specific role within the AutoGen ecosystem. Many core classes inherit from `Component` or related base classes.
2. **Configuration (`Config`):** Every Component has specific settings. For example, an `OpenAIChatCompletionClient` needs an API key and a model name. A `ListMemory` might have a name. These settings are defined in a standard way, usually using a Pydantic `BaseModel` specific to that component type. This `Config` acts like the "specification sheet" for the component instance.
3. **Saving Settings (`_to_config` method):** A Component instance knows how to generate its *current* configuration. It has an internal method, `_to_config()`, that returns a `Config` object representing its settings. This is like asking a configured Lego brick, "What color and size are you?"
4. **Loading Settings (`_from_config` class method):** A Component *class* knows how to create a *new* instance of itself from a given configuration. It has a class method, `_from_config(config)`, that takes a `Config` object and builds a new, configured component instance. This is like having instructions: "Build a brick with this color and size."
5. **`ComponentModel` (The Box):** This is the standard package format used to save and load components. It's like the label and instructions on the Lego box. A `ComponentModel` contains:
* `provider`: A string telling AutoGen *which* Python class to use (e.g., `"autogen_core.memory.ListMemory"`).
* `config`: A dictionary holding the specific settings for this instance (the output of `_to_config()`).
* `component_type`: The general role of the component (e.g., `"memory"`, `"model"`, `"tool"`).
* Other metadata like `version`, `description`, `label`.
```python
# From: _component_config.py (Conceptual Structure)
from pydantic import BaseModel
from typing import Dict, Any
class ComponentModel(BaseModel):
provider: str # Path to the class (e.g., "autogen_core.memory.ListMemory")
config: Dict[str, Any] # The specific settings for this instance
component_type: str | None = None # Role (e.g., "memory")
# ... other fields like version, description, label ...
```
This `ComponentModel` is what you typically save to a file (often as JSON or YAML).
## Use Case Example: Saving and Loading `ListMemory`
Let's see how this works with the `ListMemory` we used in [Chapter 7: Memory](07_memory.md).
**Goal:**
1. Create a `ListMemory` instance.
2. Save its configuration using the Component system (`dump_component`).
3. Load that configuration to create a *new*, identical `ListMemory` instance (`load_component`).
**Step 1: Create and Configure a `ListMemory`**
First, let's make a memory component. `ListMemory` is already designed as a Component.
```python
# File: create_memory_component.py
import asyncio
from autogen_core.memory import ListMemory, MemoryContent
# Create an instance of ListMemory
my_memory = ListMemory(name="user_prefs_v1")
# Add some content (from Chapter 7 example)
async def add_content():
pref = MemoryContent(content="Use formal style", mime_type="text/plain")
await my_memory.add(pref)
print(f"Created memory '{my_memory.name}' with content: {my_memory.content}")
asyncio.run(add_content())
# Output: Created memory 'user_prefs_v1' with content: [MemoryContent(content='Use formal style', mime_type='text/plain', metadata=None)]
```
We have our configured `my_memory` instance.
**Step 2: Save the Configuration (`dump_component`)**
Now, let's ask this component instance to describe itself by creating a `ComponentModel`.
```python
# File: save_memory_config.py
# Assume 'my_memory' exists from the previous step
# Dump the component's configuration into a ComponentModel
memory_model = my_memory.dump_component()
# Let's print it (converting to dict for readability)
print("Saved ComponentModel:")
print(memory_model.model_dump_json(indent=2))
```
**Expected Output:**
```json
Saved ComponentModel:
{
"provider": "autogen_core.memory.ListMemory",
"component_type": "memory",
"version": 1,
"component_version": 1,
"description": "ListMemory stores memory content in a simple list.",
"label": "ListMemory",
"config": {
"name": "user_prefs_v1",
"memory_contents": [
{
"content": "Use formal style",
"mime_type": "text/plain",
"metadata": null
}
]
}
}
```
Look at the output! `dump_component` created a `ComponentModel` that contains:
* `provider`: Exactly which class to use (`autogen_core.memory.ListMemory`).
* `config`: The specific settings, including the `name` and even the `memory_contents` we added!
* `component_type`: Its role is `"memory"`.
* Other useful info like description and version.
You could save this JSON structure to a file (`my_memory_config.json`).
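For example, since `ComponentModel` is a Pydantic model, saving and re-reading it as JSON could look like this (a hedged sketch; the file name is just an illustration, and `memory_model` comes from the previous step):
```python
# Hedged sketch: persisting the ComponentModel to disk and reading it back.
from pathlib import Path
from autogen_core import ComponentModel

# Save the configuration as JSON
Path("my_memory_config.json").write_text(memory_model.model_dump_json(indent=2))

# Later (or in another script): read it back into a ComponentModel
restored_model = ComponentModel.model_validate_json(
    Path("my_memory_config.json").read_text()
)
print(restored_model.provider)  # autogen_core.memory.ListMemory
```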
**Step 3: Load the Configuration (`load_component`)**
Now, imagine you're starting a new script or sharing the config file. You can load this `ComponentModel` to recreate the memory instance.
```python
# File: load_memory_config.py
from autogen_core import ComponentModel
from autogen_core.memory import ListMemory # Need the class for type hint/loading
# Assume 'memory_model' is the ComponentModel we just created
# (or loaded from a file)
print(f"Loading component from ComponentModel (Provider: {memory_model.provider})...")
# Use the ComponentLoader mechanism (available on Component classes)
# to load the model. We specify the expected type (ListMemory).
loaded_memory: ListMemory = ListMemory.load_component(memory_model)
print(f"Successfully loaded memory!")
print(f"- Name: {loaded_memory.name}")
print(f"- Content: {loaded_memory.content}")
```
**Expected Output:**
```
Loading component from ComponentModel (Provider: autogen_core.memory.ListMemory)...
Successfully loaded memory!
- Name: user_prefs_v1
- Content: [MemoryContent(content='Use formal style', mime_type='text/plain', metadata=None)]
```
Success! `load_component` read the `ComponentModel`, found the right class (`ListMemory`), used its `_from_config` method with the saved `config` data, and created a brand new `loaded_memory` instance that is identical to our original `my_memory`.
**Benefits Shown:**
* **Reproducibility:** We saved the exact state (including content!) and loaded it perfectly.
* **Configuration:** We could easily save this to a JSON/YAML file and manage it outside our Python code.
* **Modularity (Conceptual):** If `ListMemory` and `VectorDBMemory` were both Components of type "memory", we could potentially load either one from a configuration file just by changing the `provider` and `config` in the file, without altering the agent code that *uses* the memory component (assuming the agent interacts via the standard `Memory` interface from Chapter 7).
## Under the Hood: How Saving and Loading Work
Let's peek behind the curtain.
**Saving (`dump_component`) Flow:**
```mermaid
sequenceDiagram
participant User
participant MyMemory as my_memory (ListMemory instance)
participant ListMemConfig as ListMemoryConfig (Pydantic Model)
participant CompModel as ComponentModel
User->>+MyMemory: dump_component()
MyMemory->>MyMemory: Calls internal self._to_config()
MyMemory->>+ListMemConfig: Creates Config object (name="...", contents=[...])
ListMemConfig-->>-MyMemory: Returns Config object
MyMemory->>MyMemory: Gets provider string ("autogen_core.memory.ListMemory")
MyMemory->>MyMemory: Gets component_type ("memory"), version, etc.
MyMemory->>+CompModel: Creates ComponentModel(provider=..., config=config_dict, ...)
CompModel-->>-MyMemory: Returns ComponentModel instance
MyMemory-->>-User: Returns ComponentModel instance
```
1. You call `my_memory.dump_component()`.
2. It calls its own `_to_config()` method. For `ListMemory`, this gathers the `name` and current `_contents`.
3. `_to_config()` returns a `ListMemoryConfig` object (a Pydantic model) holding these values.
4. `dump_component()` takes this `ListMemoryConfig` object, converts its data into a dictionary (`config` field).
5. It figures out its own class path (`provider`) and other metadata (`component_type`, `version`, etc.).
6. It packages all this into a `ComponentModel` object and returns it.
**Loading (`load_component`) Flow:**
```mermaid
sequenceDiagram
participant User
participant Loader as ComponentLoader (e.g., ListMemory.load_component)
participant Importer as Python Import System
participant ListMemClass as ListMemory (Class definition)
participant ListMemConfig as ListMemoryConfig (Pydantic Model)
participant NewMemory as New ListMemory Instance
User->>+Loader: load_component(component_model)
Loader->>Loader: Reads provider ("autogen_core.memory.ListMemory") from model
Loader->>+Importer: Imports the class `autogen_core.memory.ListMemory`
Importer-->>-Loader: Returns ListMemory class object
Loader->>+ListMemClass: Checks if it's a valid Component class
Loader->>ListMemClass: Gets expected config schema (ListMemoryConfig)
Loader->>+ListMemConfig: Validates `config` dict from model against schema
ListMemConfig-->>-Loader: Returns validated ListMemoryConfig object
Loader->>+ListMemClass: Calls _from_config(validated_config)
ListMemClass->>+NewMemory: Creates new ListMemory instance using config
NewMemory-->>-ListMemClass: Returns new instance
ListMemClass-->>-Loader: Returns new instance
Loader-->>-User: Returns the new ListMemory instance
```
1. You call `ListMemory.load_component(memory_model)`.
2. The loader reads the `provider` string from `memory_model`.
3. It dynamically imports the class specified by `provider`.
4. It verifies this class is a proper `Component` subclass.
5. It finds the configuration schema defined by the class (e.g., `ListMemoryConfig`).
6. It validates the `config` dictionary from `memory_model` using this schema.
7. It calls the class's `_from_config()` method, passing the validated configuration object.
8. `_from_config()` uses the configuration data to initialize and return a new instance of the class (e.g., a new `ListMemory` with the loaded name and content).
9. The loader returns this newly created instance.
**Code Glimpse:**
The core logic lives in `_component_config.py`.
* **`Component` Base Class:** Classes like `ListMemory` inherit from `Component`. This requires them to define `component_type`, `component_config_schema`, and implement `_to_config()` and `_from_config()`.
```python
# From: _component_config.py (Simplified Concept)
from pydantic import BaseModel
from typing import Type, TypeVar, Generic, ClassVar
# ... other imports
ConfigT = TypeVar("ConfigT", bound=BaseModel)
class Component(Generic[ConfigT]): # Generic over its config type
# Required Class Variables for Concrete Components
component_type: ClassVar[str]
component_config_schema: Type[ConfigT]
# Required Instance Method for Saving
def _to_config(self) -> ConfigT:
raise NotImplementedError
# Required Class Method for Loading
@classmethod
def _from_config(cls, config: ConfigT) -> Self:
raise NotImplementedError
# dump_component and load_component are also part of the system
# (often inherited from base classes like ComponentBase)
def dump_component(self) -> ComponentModel: ...
@classmethod
def load_component(cls, model: ComponentModel | Dict[str, Any]) -> Self: ...
```
* **`ComponentModel`:** As shown before, a Pydantic model to hold the `provider`, `config`, `type`, etc.
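Roughly, it looks like the following sketch. The field names match the JSON we saved earlier; the exact types and defaults in `_component_config.py` may differ.
```python
# Simplified sketch of ComponentModel (not the exact library definition)
from typing import Any, Dict, Optional
from pydantic import BaseModel

class ComponentModel(BaseModel):
    provider: str                             # import path, e.g. "autogen_core.memory.ListMemory"
    component_type: Optional[str] = None      # the role, e.g. "memory"
    version: Optional[int] = None             # version of the ComponentModel spec
    component_version: Optional[int] = None   # version of the component's own config
    description: Optional[str] = None
    label: Optional[str] = None
    config: Dict[str, Any] = {}               # whatever _to_config() returned, as a dict
```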
* **`dump_component` Implementation (Conceptual):**
```python
# Inside ComponentBase or similar
def dump_component(self) -> ComponentModel:
# 1. Get the specific config from the instance
obj_config: BaseModel = self._to_config()
config_dict = obj_config.model_dump() # Convert to dictionary
# 2. Determine the provider string (class path)
provider_str = _type_to_provider_str(self.__class__)
# (Handle overrides like self.component_provider_override)
# 3. Get other metadata
comp_type = self.component_type
comp_version = self.component_version
# ... description, label ...
# 4. Create and return the ComponentModel
model = ComponentModel(
provider=provider_str,
config=config_dict,
component_type=comp_type,
version=comp_version,
# ... other metadata ...
)
return model
```
* **`load_component` Implementation (Conceptual):**
```python
# Inside ComponentLoader or similar
@classmethod
def load_component(cls, model: ComponentModel | Dict[str, Any]) -> Self:
# 1. Ensure we have a ComponentModel object
if isinstance(model, dict):
loaded_model = ComponentModel(**model)
else:
loaded_model = model
# 2. Import the class based on the provider string
provider_str = loaded_model.provider
# ... (handle WELL_KNOWN_PROVIDERS mapping) ...
module_path, class_name = provider_str.rsplit(".", 1)
module = importlib.import_module(module_path)
component_class = getattr(module, class_name)
# 3. Validate the class and config
if not is_component_class(component_class): # Check it's a valid Component
raise TypeError(...)
schema = component_class.component_config_schema
validated_config = schema.model_validate(loaded_model.config)
# 4. Call the class's factory method to create instance
instance = component_class._from_config(validated_config)
# 5. Return the instance (after type checks)
return instance
```
This system provides a powerful and consistent way to manage the building blocks of your AutoGen applications.
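As a closing illustration, here is a hypothetical sketch of making your own class participate in this system, following the simplified contract shown above. The class names are invented, and the real `autogen_core` base classes may require slightly different wiring (e.g. also inheriting from `ComponentBase`).
```python
from pydantic import BaseModel
from typing_extensions import Self
from autogen_core import Component

class PrefixLoggerConfig(BaseModel):
    prefix: str = "[log]"

class PrefixLogger(Component[PrefixLoggerConfig]):
    component_type = "logger"                    # invented role name for this sketch
    component_config_schema = PrefixLoggerConfig

    def __init__(self, prefix: str = "[log]") -> None:
        self.prefix = prefix

    def log(self, message: str) -> None:
        print(f"{self.prefix} {message}")

    # Called by dump_component(): describe this instance's settings
    def _to_config(self) -> PrefixLoggerConfig:
        return PrefixLoggerConfig(prefix=self.prefix)

    # Called by load_component(): rebuild an instance from saved settings
    @classmethod
    def _from_config(cls, config: PrefixLoggerConfig) -> Self:
        return cls(prefix=config.prefix)
```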
## Wrapping Up
Congratulations! You've reached the end of our core concepts tour. You now understand the `Component` model: AutoGen Core's standard way to define configurable, savable, and loadable building blocks like `Memory`, `ChatCompletionClient`, `Tool`, and even aspects of `Agents` themselves.
* **Components** are like standardized Lego bricks.
* They use **`_to_config`** to describe their settings.
* They use **`_from_config`** to be built from settings.
* **`ComponentModel`** is the standard "box" storing the provider and config, enabling saving/loading (often via JSON/YAML).
This promotes:
* **Modularity:** Easily swap implementations (e.g., different LLM clients).
* **Reproducibility:** Save and load exact agent system configurations.
* **Configuration:** Manage settings in external files.
With these eight core concepts (`Agent`, `Messaging`, `AgentRuntime`, `Tool`, `ChatCompletionClient`, `ChatCompletionContext`, `Memory`, and `Component`), you have a solid foundation for understanding and building powerful multi-agent applications with AutoGen Core!
Happy building!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,47 @@
# Tutorial: AutoGen Core
AutoGen Core helps you build applications with multiple **_Agents_** that can work together.
Think of it like creating a team of specialized workers (*Agents*) who can communicate and use tools to solve problems.
The **_AgentRuntime_** acts as the manager, handling messages and agent lifecycles.
Agents communicate using a **_Messaging System_** (Topics and Subscriptions), can use **_Tools_** for specific tasks, interact with language models via a **_ChatCompletionClient_** while managing conversation history with **_ChatCompletionContext_**, and remember information using **_Memory_**.
**_Components_** provide a standard way to define and configure these building blocks.
**Source Repository:** [https://github.com/microsoft/autogen/tree/e45a15766746d95f8cfaaa705b0371267bec812e/python/packages/autogen-core/src/autogen_core](https://github.com/microsoft/autogen/tree/e45a15766746d95f8cfaaa705b0371267bec812e/python/packages/autogen-core/src/autogen_core)
```mermaid
flowchart TD
A0["0: Agent"]
A1["1: AgentRuntime"]
A2["2: Messaging System (Topic & Subscription)"]
A3["3: Component"]
A4["4: Tool"]
A5["5: ChatCompletionClient"]
A6["6: ChatCompletionContext"]
A7["7: Memory"]
A1 -- "Manages lifecycle" --> A0
A1 -- "Uses for message routing" --> A2
A0 -- "Uses LLM client" --> A5
A0 -- "Executes tools" --> A4
A0 -- "Accesses memory" --> A7
A5 -- "Gets history from" --> A6
A5 -- "Uses tool schema" --> A4
A7 -- "Updates LLM context" --> A6
A4 -- "Implemented as" --> A3
```
## Chapters
1. [Agent](01_agent.md)
2. [Messaging System (Topic & Subscription)](02_messaging_system__topic___subscription_.md)
3. [AgentRuntime](03_agentruntime.md)
4. [Tool](04_tool.md)
5. [ChatCompletionClient](05_chatcompletionclient.md)
6. [ChatCompletionContext](06_chatcompletioncontext.md)
7. [Memory](07_memory.md)
8. [Component](08_component.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,259 @@
# Chapter 1: The Agent - Your Browser Assistant's Brain
Welcome to the `Browser Use` tutorial! We're excited to help you learn how to automate web tasks using the power of Large Language Models (LLMs).
Imagine you want to perform a simple task, like searching Google for "cute cat pictures" and clicking on the very first image result. For a human, this is easy! You open your browser, type in the search, look at the results, and click.
But how do you tell a computer program to do this? It needs to understand the goal, look at the webpage like a human does, decide what to click or type next, and then actually perform those actions. This is where the **Agent** comes in.
## What Problem Does the Agent Solve?
The Agent is the core orchestrator, the "brain" or "project manager" of your browser automation task. It connects all the different pieces needed to achieve your goal. Without the Agent, you'd have a bunch of tools (like a browser controller and an LLM) but no central coordinator telling them what to do and when.
The Agent solves the problem of turning a high-level goal (like "find cat pictures") into concrete actions on a webpage, using intelligence to adapt to what it "sees" in the browser.
## Meet the Agent: Your Project Manager
Think of the `Agent` like a project manager overseeing a complex task. It doesn't do *all* the work itself, but it coordinates specialists:
1. **Receives the Task:** You give the Agent the overall goal (e.g., "Search Google for 'cute cat pictures' and click the first image result.").
2. **Consults the Planner (LLM):** The Agent shows the current state of the webpage (using the [BrowserContext](03_browsercontext.md)) to a Large Language Model (LLM). It asks, "Here's the goal, and here's what the webpage looks like right now. What should be the very next step?" The LLM acts as a smart planner, suggesting actions like "type 'cute cat pictures' into the search bar" or "click the element with index 5". We'll learn more about how we instruct the LLM in the [System Prompt](02_system_prompt.md) chapter.
3. **Manages History:** The Agent keeps track of everything that has happened so far: the actions taken, the results, and the state of the browser at each step. This "memory" is managed by the [Message Manager](06_message_manager.md) and helps the LLM make better decisions.
4. **Instructs the Doer (Controller):** Once the LLM suggests an action (like "click element 5"), the Agent tells the [Action Controller & Registry](05_action_controller___registry.md) to actually perform that specific action within the browser.
5. **Observes the Results (BrowserContext):** After the Controller acts, the Agent uses the [BrowserContext](03_browsercontext.md) again to see the new state of the webpage (e.g., the Google search results page).
6. **Repeats:** The Agent repeats steps 2-5, continuously consulting the LLM, instructing the Controller, and observing the results, until the original task is complete or it reaches a stopping point.
## Using the Agent: A Simple Example
Let's see how you might use the Agent in Python code. Don't worry about understanding every detail yet; focus on the main idea. We're setting up the Agent with our task and the necessary components.
```python
# --- Simplified Example ---
# We need to import the necessary parts from the browser_use library
from browser_use import Agent, Browser, Controller, BrowserConfig, BrowserContext, BrowserContextConfig
# Assume 'my_llm' is your configured Large Language Model (e.g., from OpenAI, Anthropic)
from my_llm_setup import my_llm # Placeholder for your specific LLM setup
# 1. Define the task for the Agent
my_task = "Go to google.com, search for 'cute cat pictures', and click the first image result."
# 2. Basic browser configuration (we'll learn more later)
browser_config = BrowserConfig() # Default settings
context_config = BrowserContextConfig() # Default settings
# 3. Initialize the components the Agent needs
# The Browser manages the underlying browser application
browser = Browser(config=browser_config)
# The Controller knows *how* to perform actions like 'click' or 'type'
controller = Controller()
async def main():
# The BrowserContext represents a single browser tab/window environment
# It uses the Browser and its configuration
async with BrowserContext(browser=browser, config=context_config) as browser_context:
# 4. Create the Agent instance!
agent = Agent(
task=my_task,
llm=my_llm, # The "brain" - the Language Model
browser_context=browser_context, # The "eyes" - interacts with the browser tab
controller=controller # The "hands" - executes actions
# Many other settings can be configured here!
)
print(f"Agent created. Starting task: {my_task}")
# 5. Run the Agent! This starts the loop.
# It will keep taking steps until the task is done or it hits the limit.
history = await agent.run(max_steps=15) # Limit steps for safety
# 6. Check the result
if history.is_done() and history.is_successful():
print("✅ Agent finished the task successfully!")
print(f"Final message from agent: {history.final_result()}")
else:
print("⚠️ Agent stopped. Maybe max_steps reached or task wasn't completed successfully.")
# The 'async with' block automatically cleans up the browser_context
await browser.close() # Close the browser application
# Run the asynchronous function
import asyncio
asyncio.run(main())
```
**What happens when you run this?**
1. An `Agent` object is created with your task, the LLM, the browser context, and the controller.
2. Calling `agent.run(max_steps=15)` starts the main loop.
3. The Agent gets the initial state of the browser (likely a blank page).
4. It asks the LLM what to do. The LLM might say "Go to google.com".
5. The Agent tells the Controller to execute the "go to URL" action.
6. The browser navigates to Google.
7. The Agent gets the new state (Google's homepage).
8. It asks the LLM again. The LLM says "Type 'cute cat pictures' into the search bar".
9. The Agent tells the Controller to type the text.
10. This continues step-by-step: pressing Enter, seeing results, asking the LLM, clicking the image.
11. Eventually, the LLM will hopefully tell the Agent the task is "done".
12. `agent.run()` finishes and returns the `history` object containing details of what happened.
## How it Works Under the Hood: The Agent Loop
Let's visualize the process with a simple diagram:
```mermaid
sequenceDiagram
participant User
participant Agent
participant LLM
participant Controller
participant BC as BrowserContext
User->>Agent: Start task("Search Google for cats...")
Note over Agent: Agent Loop Starts
Agent->>BC: Get current state (e.g., blank page)
BC-->>Agent: Current Page State
Agent->>LLM: What's next? (Task + State + History)
LLM-->>Agent: Plan: [Action: Type 'cute cat pictures', Action: Press Enter]
Agent->>Controller: Execute: type_text(...)
Controller->>BC: Perform type action
Agent->>Controller: Execute: press_keys('Enter')
Controller->>BC: Perform press action
Agent->>BC: Get new state (search results page)
BC-->>Agent: New Page State
Agent->>LLM: What's next? (Task + New State + History)
LLM-->>Agent: Plan: [Action: click_element(index=5)]
Agent->>Controller: Execute: click_element(index=5)
Controller->>BC: Perform click action
Note over Agent: Loop continues until done...
LLM-->>Agent: Plan: [Action: done(success=True, text='Found cat picture!')]
Agent->>Controller: Execute: done(...)
Controller-->>Agent: ActionResult (is_done=True)
Note over Agent: Agent Loop Ends
Agent->>User: Return History (Task Complete)
```
The core of the `Agent` lives in the `agent/service.py` file. The `Agent` class manages the overall process.
1. **Initialization (`__init__`)**: When you create an `Agent`, it sets up its internal state, stores the task, the LLM, the controller, and prepares the [Message Manager](06_message_manager.md) to keep track of the conversation history. It also figures out the best way to talk to the specific LLM you provided.
```python
# --- File: agent/service.py (Simplified __init__) ---
class Agent:
def __init__(
self,
task: str,
llm: BaseChatModel,
browser_context: BrowserContext,
controller: Controller,
# ... other settings like use_vision, max_failures, etc.
**kwargs
):
self.task = task
self.llm = llm
self.browser_context = browser_context
self.controller = controller
self.settings = AgentSettings(**kwargs) # Store various settings
self.state = AgentState() # Internal state (step count, failures, etc.)
# Setup message manager for history, using the task and system prompt
self._message_manager = MessageManager(
task=self.task,
system_message=self.settings.system_prompt_class(...).get_system_message(),
settings=MessageManagerSettings(...)
# ... more setup ...
)
# ... other initializations ...
logger.info("Agent initialized.")
```
2. **Running the Task (`run`)**: The `run` method orchestrates the main loop. It calls the `step` method repeatedly until the task is marked as done, an error occurs, or `max_steps` is reached.
```python
# --- File: agent/service.py (Simplified run method) ---
class Agent:
# ... (init) ...
async def run(self, max_steps: int = 100) -> AgentHistoryList:
self._log_agent_run() # Log start event
try:
for step_num in range(max_steps):
if self.state.stopped or self.state.consecutive_failures >= self.settings.max_failures:
break # Stop conditions
# Wait if paused
while self.state.paused: await asyncio.sleep(0.2)
step_info = AgentStepInfo(step_number=step_num, max_steps=max_steps)
await self.step(step_info) # <<< Execute one step of the loop
if self.state.history.is_done():
await self.log_completion() # Log success/failure
break # Exit loop if agent signaled 'done'
else:
logger.info("Max steps reached.") # Ran out of steps
finally:
# ... (cleanup, telemetry, potentially save history/gif) ...
pass
return self.state.history # Return the recorded history
```
3. **Taking a Step (`step`)**: This is the heart of the loop. In each step, the Agent:
* Gets the current browser state (`browser_context.get_state()`).
* Adds this state to the history via the `_message_manager`.
* Asks the LLM for the next action (`get_next_action()`).
* Tells the `Controller` to execute the action(s) (`multi_act()`).
* Records the outcome in the history.
* Handles any errors that might occur.
```python
# --- File: agent/service.py (Simplified step method) ---
class Agent:
# ... (init, run) ...
async def step(self, step_info: Optional[AgentStepInfo] = None) -> None:
logger.info(f"📍 Step {self.state.n_steps}")
state = None
model_output = None
result: list[ActionResult] = []
try:
# 1. Get current state from the browser
state = await self.browser_context.get_state() # Uses BrowserContext
# 2. Add state (+ previous result) to message history for LLM context
self._message_manager.add_state_message(state, self.state.last_result, ...)
# 3. Get LLM's decision on the next action(s)
input_messages = self._message_manager.get_messages()
model_output = await self.get_next_action(input_messages) # Calls the LLM
self.state.n_steps += 1 # Increment step counter
# 4. Execute the action(s) using the Controller
result = await self.multi_act(model_output.action) # Uses Controller
self.state.last_result = result # Store result for next step's context
# 5. Record step details (actions, results, state snapshot)
self._make_history_item(model_output, state, result, ...)
self.state.consecutive_failures = 0 # Reset failure count on success
except Exception as e:
# Handle errors, increment failure count, maybe retry later
result = await self._handle_step_error(e)
self.state.last_result = result
# ... (finally block for logging/telemetry) ...
```
## Conclusion
You've now met the `Agent`, the central coordinator in `Browser Use`. You learned that it acts like a project manager, taking your high-level task, consulting an LLM for step-by-step planning, managing the history, and instructing a `Controller` to perform actions within a `BrowserContext`.
The Agent's effectiveness heavily relies on how well we instruct the LLM planner. In the next chapter, we'll dive into exactly that: crafting the **System Prompt** to guide the LLM's behavior.
[Next Chapter: System Prompt](02_system_prompt.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,235 @@
# Chapter 2: The System Prompt - Setting the Rules for Your AI Assistant
In [Chapter 1: The Agent](01_agent.md), we met the `Agent`, our project manager for automating browser tasks. We saw it consults a Large Language Model (LLM), the "planner", to decide the next steps based on the current state of the webpage. But how does the Agent tell the LLM *how* it should think, behave, and respond? Just giving it the task isn't enough!
Imagine hiring a new assistant. You wouldn't just say, "Organize my files!" You'd give them specific instructions: "Please sort the files alphabetically by client name, put them in the blue folders, and give me a summary list when you're done." Without these rules, the assistant might do something completely different!
The **System Prompt** solves this exact problem for our LLM. It's the set of core instructions and rules we give the LLM at the very beginning, telling it exactly how to act as a browser automation assistant and, crucially, how to format its responses so the `Agent` can understand them.
## What is the System Prompt? The AI's Rulebook
Think of the System Prompt like the AI assistant's fundamental operating manual, its "Prime Directive," or the rules of a board game. It defines:
1. **Persona:** "You are an AI agent designed to automate browser tasks."
2. **Goal:** "Your goal is to accomplish the ultimate task..."
3. **Input:** How to understand the information it receives about the webpage ([DOM Representation](04_dom_representation.md)).
4. **Capabilities:** What actions it can take ([Action Controller & Registry](05_action_controller___registry.md)).
5. **Limitations:** What it *shouldn't* do (e.g., hallucinate actions).
6. **Response Format:** The *exact* structure (JSON format) its thoughts and planned actions must follow.
Without this rulebook, the LLM might just chat casually, give vague suggestions, or produce output in a format the `Agent` code can't parse. The System Prompt ensures the LLM behaves like the specialized tool we need.
## Why is the Response Format So Important?
This is a critical point. The `Agent` code isn't a human reading the LLM's response. It's a program expecting data in a very specific structure. The System Prompt tells the LLM to *always* respond in a JSON format that looks something like this (simplified):
```json
{
"current_state": {
"evaluation_previous_goal": "Success - Found the search bar.",
"memory": "On google.com main page. Need to search for cats.",
"next_goal": "Type 'cute cat pictures' into the search bar."
},
"action": [
{
"input_text": {
"index": 5, // The index of the search bar element
"text": "cute cat pictures"
}
},
{
"press_keys": {
"keys": "Enter" // Press the Enter key
}
}
]
}
```
The `Agent` can easily read this JSON:
* It understands the LLM's thoughts (`current_state`).
* It sees the exact `action` list the LLM wants to perform.
* It passes these actions (like `input_text` or `press_keys`) to the [Action Controller & Registry](05_action_controller___registry.md) to execute them in the browser.
If the LLM responded with just "Okay, I'll type 'cute cat pictures' into the search bar and press Enter," the `Agent` wouldn't know *which* element index corresponds to the search bar or exactly which actions to call. The strict JSON format is essential for automation.
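As a minimal sketch (not the library's actual parsing code), here is how a program can consume that strict JSON reply in a way a free-form sentence would never allow:
```python
# Minimal sketch: turning the LLM's JSON reply into data the Agent can act on.
import json

llm_reply = '''{
  "current_state": {"evaluation_previous_goal": "Success - Found the search bar.",
                    "memory": "On google.com main page. Need to search for cats.",
                    "next_goal": "Type 'cute cat pictures' into the search bar."},
  "action": [{"input_text": {"index": 5, "text": "cute cat pictures"}},
             {"press_keys": {"keys": "Enter"}}]
}'''

parsed = json.loads(llm_reply)
print(parsed["current_state"]["next_goal"])      # the LLM's plan, readable by code
for step in parsed["action"]:                    # each entry names exactly one action
    action_name, params = next(iter(step.items()))
    print(action_name, params)                   # e.g. input_text {'index': 5, 'text': ...}
```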
## A Peek Inside the Rulebook (`system_prompt.md`)
The actual instructions live in a text file within the `Browser Use` library: `browser_use/agent/system_prompt.md`. It's quite detailed, but here's a tiny snippet focusing on the response format rule:
```markdown
# Response Rules
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{{"current_state": {{"evaluation_previous_goal": "...",
"memory": "...",
"next_goal": "..."}},
"action":[{{"one_action_name": {{...}}}}, ...]}}
2. ACTIONS: You can specify multiple actions in the list... Use maximum {{max_actions}} actions...
```
*(This is heavily simplified! The real file has many more rules about element interaction, error handling, task completion, etc.)*
This file clearly defines the JSON structure (`current_state` and `action`) and other crucial behaviors required from the LLM.
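Incidentally, the doubled braces `{{ }}` are there because the file is later run through Python's `str.format()` (shown below): doubled braces survive as literal braces, while single-brace placeholders like `{max_actions}` get filled in. A tiny standalone illustration:
```python
# Doubled braces become literal braces after str.format();
# {max_actions} is a placeholder that gets substituted.
template = 'Respond with JSON like {{"action": [...]}}. Use maximum {max_actions} actions.'
print(template.format(max_actions=10))
# -> Respond with JSON like {"action": [...]}. Use maximum 10 actions.
```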
## How the Agent Uses the System Prompt
The `Agent` uses a helper class called `SystemPrompt` (found in `agent/prompts.py`) to manage these rules. Here's the flow:
1. **Loading:** When you create an `Agent`, it internally creates a `SystemPrompt` object. This object reads the rules from the `system_prompt.md` file.
2. **Formatting:** The `SystemPrompt` object formats these rules into a special `SystemMessage` object that LLMs understand as foundational instructions.
3. **Conversation Start:** This `SystemMessage` is given to the [Message Manager](06_message_manager.md), which keeps track of the conversation history with the LLM. The `SystemMessage` becomes the *very first message*, setting the context for all future interactions in that session.
Think of it like starting a meeting: the first thing you do is state the agenda and rules (System Prompt), and then the discussion (LLM interaction) follows based on that foundation.
Let's look at a simplified view of the `SystemPrompt` class loading the rules:
```python
# --- File: agent/prompts.py (Simplified) ---
import importlib.resources # Helps find files within the installed library
from langchain_core.messages import SystemMessage # Special message type for LLMs
class SystemPrompt:
def __init__(self, action_description: str, max_actions_per_step: int = 10):
# We ignore these details for now
self.default_action_description = action_description
self.max_actions_per_step = max_actions_per_step
self._load_prompt_template() # <--- Loads the rules file
def _load_prompt_template(self) -> None:
"""Load the prompt rules from the system_prompt.md file."""
try:
# Finds the 'system_prompt.md' file inside the browser_use package
filepath = importlib.resources.files('browser_use.agent').joinpath('system_prompt.md')
with filepath.open('r') as f:
self.prompt_template = f.read() # Read the text content
print("System Prompt template loaded successfully!")
except Exception as e:
print(f"Error loading system prompt: {e}")
self.prompt_template = "Error: Could not load prompt." # Fallback
def get_system_message(self) -> SystemMessage:
"""Format the loaded rules into a message for the LLM."""
# Replace placeholders like {{max_actions}} with actual values
prompt = self.prompt_template.format(max_actions=self.max_actions_per_step)
# Wrap the final rules text in a SystemMessage object
return SystemMessage(content=prompt)
# --- How it plugs into Agent creation (Conceptual) ---
# from browser_use import Agent, SystemPrompt
# from my_llm_setup import my_llm # Your LLM
# ... other setup ...
# When you create an Agent:
# agent = Agent(
# task="Find cat pictures",
# llm=my_llm,
# browser_context=...,
# controller=...,
# # The Agent's __init__ method does something like this internally:
# # system_prompt_obj = SystemPrompt(action_description="...", max_actions_per_step=10)
# # system_message_for_llm = system_prompt_obj.get_system_message()
# # This system_message_for_llm is then passed to the Message Manager.
# )
```
This code shows how the `SystemPrompt` class finds and reads the `system_prompt.md` file and prepares the instructions as a `SystemMessage` ready for the LLM conversation.
## Under the Hood: Initialization and Conversation Flow
Let's visualize how the System Prompt fits into the Agent's setup and interaction loop:
```mermaid
sequenceDiagram
participant User
participant Agent_Init as Agent Initialization
participant SP as SystemPrompt Class
participant MM as Message Manager
participant Agent_Run as Agent Run Loop
participant LLM
User->>Agent_Init: Create Agent(task, llm, ...)
Note over Agent_Init: Agent needs the rules!
Agent_Init->>SP: Create SystemPrompt(...)
SP->>SP: _load_prompt_template() reads system_prompt.md
SP-->>Agent_Init: SystemPrompt instance
Agent_Init->>SP: get_system_message()
SP-->>Agent_Init: system_message (The Formatted Rules)
Note over Agent_Init: Pass rules to conversation manager
Agent_Init->>MM: Initialize MessageManager(task, system_message)
MM->>MM: Store system_message as message #1
MM-->>Agent_Init: MessageManager instance ready
Agent_Init-->>User: Agent created and ready
User->>Agent_Run: agent.run() starts the task
Note over Agent_Run: Agent needs context for LLM
Agent_Run->>MM: get_messages()
MM-->>Agent_Run: [system_message, user_message(state), ...]
Note over Agent_Run: Send rules + current state to LLM
Agent_Run->>LLM: Ask for next action (Input includes rules)
LLM-->>Agent_Run: JSON response (LLM followed rules!)
Agent_Run->>MM: add_model_output(...)
Note over Agent_Run: Loop continues...
```
Internally, the `Agent`'s initialization code (`__init__` in `agent/service.py`) explicitly creates the `SystemPrompt` and passes its output to the `MessageManager`:
```python
# --- File: agent/service.py (Simplified Agent __init__) ---
# ... other imports ...
from browser_use.agent.prompts import SystemPrompt # Import the class
from browser_use.agent.message_manager.service import MessageManager, MessageManagerSettings
class Agent:
def __init__(
self,
task: str,
llm: BaseChatModel,
browser_context: BrowserContext,
controller: Controller,
system_prompt_class: Type[SystemPrompt] = SystemPrompt, # Allows customizing the prompt class
max_actions_per_step: int = 10,
# ... other parameters ...
**kwargs
):
self.task = task
self.llm = llm
# ... store other components ...
# Get the list of available actions from the controller
self.available_actions = controller.registry.get_prompt_description()
# 1. Create the SystemPrompt instance using the provided class
system_prompt_instance = system_prompt_class(
action_description=self.available_actions,
max_actions_per_step=max_actions_per_step,
)
# 2. Get the formatted SystemMessage (the rules)
system_message = system_prompt_instance.get_system_message()
# 3. Initialize the Message Manager with the task and the rules
self._message_manager = MessageManager(
task=self.task,
system_message=system_message, # <--- Pass the rules here!
settings=MessageManagerSettings(...)
# ... other message manager setup ...
)
# ... rest of initialization ...
logger.info("Agent initialized with System Prompt.")
```
When the `Agent` runs its loop (`agent.run()` calls `agent.step()`), it asks the `MessageManager` for the current conversation history (`self._message_manager.get_messages()`). The `MessageManager` always ensures that the `SystemMessage` (containing the rules) is the very first item in that history list sent to the LLM.
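Conceptually (this is a hypothetical sketch, not the real `MessageManager` internals), the list handed to the LLM on any given step looks something like:
```python
from langchain_core.messages import SystemMessage, HumanMessage

# Hypothetical shape of message_manager.get_messages() mid-task:
history = [
    SystemMessage(content="You are an AI agent designed to automate browser tasks. ..."),  # the rules, always first
    HumanMessage(content="Your task: Search Google for 'cute cat pictures' ..."),          # the user's task
    HumanMessage(content="Current page: [5]<input aria-label='Search'> [6]<button>..."),   # latest browser state
]
```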
## Conclusion
The System Prompt is the essential rulebook that governs the LLM's behavior within the `Browser Use` framework. It tells the LLM how to interpret the browser state, what actions it can take, and most importantly, dictates the exact JSON format for its responses. This structured communication is key to enabling the `Agent` to reliably understand the LLM's plan and execute browser automation tasks.
Without a clear System Prompt, the LLM would be like an untrained assistant: potentially intelligent, but unable to follow the specific procedures needed for the job.
Now that we understand how the `Agent` gets its fundamental instructions, how does it actually perceive the webpage it's supposed to interact with? In the next chapter, we'll explore the component responsible for representing the browser's state: the [BrowserContext](03_browsercontext.md).
[Next Chapter: BrowserContext](03_browsercontext.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,295 @@
# Chapter 3: BrowserContext - The Agent's Isolated Workspace
In the [previous chapter](02_system_prompt.md), we learned how the `System Prompt` acts as the rulebook for the AI assistant (LLM) that guides our `Agent`. We know the Agent uses the LLM to decide *what* to do next based on the current situation in the browser.
But *where* does the Agent actually "see" the webpage and perform its actions? How does it keep track of the current website address (URL), the page content, and things like cookies, all while staying focused on its specific task without getting mixed up with your other browsing?
This is where the **BrowserContext** comes in.
## What Problem Does BrowserContext Solve?
Imagine you ask your `Agent` to log into a specific online shopping website and check your order status. You might already be logged into that same website in your regular browser window with your personal account.
If the Agent just used your main browser window, it might:
1. Get confused by your existing login.
2. Accidentally use your personal cookies or saved passwords.
3. Interfere with other tabs you have open.
We need a way to give the Agent its *own*, clean, separate browsing environment for each task. It needs an isolated "workspace" where it can open websites, log in, click buttons, and manage its own cookies without affecting anything else.
The `BrowserContext` solves this by representing a single, isolated browser session.
## Meet the BrowserContext: Your Agent's Private Browser Window
Think of a `BrowserContext` like opening a brand new **Incognito Window** or creating a **separate User Profile** in your web browser (like Chrome or Firefox).
* **It's Isolated:** What happens in one `BrowserContext` doesn't affect others or your main browser session. It has its own cookies, its own history (for that session), and its own set of tabs.
* **It Manages State:** It keeps track of everything important about the current web session the Agent is working on:
* The current URL.
* Which tabs are open within its "window".
* Cookies specific to that session.
* The structure and content of the current webpage (the DOM - Document Object Model, which we'll explore in the [next chapter](04_dom_representation.md)).
* **It's the Agent's Viewport:** The `Agent` looks through the `BrowserContext` to "see" the current state of the webpage. When the Agent decides to perform an action (like clicking a button), it tells the [Action Controller](05_action_controller___registry.md) to perform it *within* that specific `BrowserContext`.
Essentially, the `BrowserContext` is like a dedicated, clean desk or workspace given to the Agent for its specific job.
## Using the BrowserContext
Before we can have an isolated session (`BrowserContext`), we first need the main browser application itself. This is handled by the `Browser` class. Think of `Browser` as the entire Chrome or Firefox application installed on your computer, while `BrowserContext` is just one window or profile within that application.
Here's a simplified example of how you might set up a `Browser` and then create a `BrowserContext` to navigate to a page:
```python
import asyncio
# Import necessary classes
from browser_use import Browser, BrowserConfig, BrowserContext, BrowserContextConfig
async def main():
# 1. Configure the main browser application (optional, defaults are usually fine)
browser_config = BrowserConfig(headless=False) # Show the browser window
# 2. Create the main Browser instance
# This might launch a browser application in the background (or connect to one)
browser = Browser(config=browser_config)
print("Browser application instance created.")
# 3. Configure the specific session/window (optional)
context_config = BrowserContextConfig(
user_agent="MyCoolAgent/1.0", # Example: Set a custom user agent
cookies_file="my_session_cookies.json" # Example: Save/load cookies
)
# 4. Create the isolated BrowserContext (like opening an incognito window)
# We use 'async with' to ensure it cleans up automatically afterwards
async with browser.new_context(config=context_config) as browser_context:
print(f"BrowserContext created (ID: {browser_context.context_id}).")
# 5. Use the context to interact with the browser session
start_url = "https://example.com"
print(f"Navigating to: {start_url}")
await browser_context.navigate_to(start_url)
# 6. Get information *from* the context
current_state = await browser_context.get_state() # Get current page info
print(f"Current page title: {current_state.title}")
print(f"Current page URL: {current_state.url}")
# The Agent would use this 'browser_context' object to see the page
# and tell the Controller to perform actions within it.
print("BrowserContext closed automatically.")
# 7. Close the main browser application when done
await browser.close()
print("Browser application closed.")
# Run the asynchronous code
asyncio.run(main())
```
**What happens here?**
1. We set up a `BrowserConfig` (telling it *not* to run headless so we can see the window).
2. We create a `Browser` instance, which represents the overall browser program.
3. We create a `BrowserContextConfig` to specify settings for our isolated session (like a custom name or where to save cookies).
4. Crucially, `browser.new_context(...)` creates our isolated session. The `async with` block ensures this session is properly closed later.
5. We use methods *on the `browser_context` object* like `navigate_to()` to control *this specific session*.
6. We use `browser_context.get_state()` to get information about the current page within *this session*. The `Agent` heavily relies on this method.
7. After the `async with` block finishes, the `browser_context` is closed (like closing the incognito window), and finally, we close the main `browser` application.
## How it Works Under the Hood
When the `Agent` needs to understand the current situation to decide the next step, it asks the `BrowserContext` for the latest state using the `get_state()` method. What happens then?
1. **Wait for Stability:** The `BrowserContext` first waits for the webpage to finish loading and for network activity to settle down (`_wait_for_page_and_frames_load`). This prevents the Agent from acting on an incomplete page.
2. **Analyze the Page:** It then uses the [DOM Representation](04_dom_representation.md) service (`DomService`) to analyze the current HTML structure of the page. This service figures out which elements are visible, interactive (buttons, links, input fields), and where they are.
3. **Capture Visuals:** It often takes a screenshot of the current view (`take_screenshot`). This can be helpful for advanced agents or debugging.
4. **Gather Metadata:** It gets the current URL, page title, and information about any other tabs open *within this context*.
5. **Package the State:** All this information (DOM structure, URL, title, screenshot, etc.) is bundled into a `BrowserState` object.
6. **Return to Agent:** The `BrowserContext` returns this `BrowserState` object to the `Agent`. The Agent then uses this information (often sending it to the LLM) to plan its next action.
Here's a simplified diagram of the `get_state()` process:
```mermaid
sequenceDiagram
participant Agent
participant BC as BrowserContext
participant PlaywrightPage as Underlying Browser Page
participant DomService as DOM Service
Agent->>BC: get_state()
Note over BC: Wait for page to be ready...
BC->>PlaywrightPage: Ensure page/network is stable
PlaywrightPage-->>BC: Page is ready
Note over BC: Analyze the page content...
BC->>DomService: Get simplified DOM structure + interactive elements
DomService-->>BC: DOMState (element tree, etc.)
Note over BC: Get visuals and metadata...
BC->>PlaywrightPage: Take screenshot()
PlaywrightPage-->>BC: Screenshot data
BC->>PlaywrightPage: Get URL, Title
PlaywrightPage-->>BC: URL, Title data
Note over BC: Combine everything...
BC->>BC: Create BrowserState object
BC-->>Agent: Return BrowserState
```
Let's look at some simplified code snippets from the library.
The `BrowserContext` is initialized (`__init__` in `browser/context.py`) with its configuration and a reference to the main `Browser` instance that created it.
```python
# --- File: browser/context.py (Simplified __init__) ---
import uuid
# ... other imports ...
if TYPE_CHECKING:
from browser_use.browser.browser import Browser # Link to the Browser class
@dataclass
class BrowserContextConfig: # Configuration settings
# ... various settings like user_agent, cookies_file, window_size ...
pass
@dataclass
class BrowserSession: # Holds the actual Playwright context
context: PlaywrightBrowserContext # The underlying Playwright object
cached_state: Optional[BrowserState] = None # Stores the last known state
class BrowserContext:
def __init__(
self,
browser: 'Browser', # Reference to the main Browser instance
config: BrowserContextConfig = BrowserContextConfig(),
# ... other optional state ...
):
self.context_id = str(uuid.uuid4()) # Unique ID for this session
self.config = config # Store the configuration
self.browser = browser # Store the reference to the parent Browser
# The actual Playwright session is created later, when needed
self.session: BrowserSession | None = None
logger.debug(f"BrowserContext object created (ID: {self.context_id}). Session not yet initialized.")
# The 'async with' statement calls __aenter__ which initializes the session
async def __aenter__(self):
await self._initialize_session() # Creates the actual browser window/tab
return self
async def _initialize_session(self):
# ... (complex setup code happens here) ...
# Gets the main Playwright browser from self.browser
playwright_browser = await self.browser.get_playwright_browser()
# Creates the isolated Playwright context (like the incognito window)
context = await self._create_context(playwright_browser)
# Creates the BrowserSession to hold the context and state
self.session = BrowserSession(context=context, cached_state=None)
logger.debug(f"BrowserContext session initialized (ID: {self.context_id}).")
# ... (sets up the initial page) ...
return self.session
# ... other methods like navigate_to, close, etc. ...
```
The `get_state` method orchestrates fetching the current information from the browser session.
```python
# --- File: browser/context.py (Simplified get_state and helpers) ---
# ... other imports ...
from browser_use.dom.service import DomService # Imports the DOM analyzer
from browser_use.browser.views import BrowserState # Imports the state structure
class BrowserContext:
# ... (init, aenter, etc.) ...
async def get_state(self) -> BrowserState:
"""Get the current state of the browser session."""
logger.debug(f"Getting state for context {self.context_id}...")
# 1. Make sure the page is loaded and stable
await self._wait_for_page_and_frames_load()
# 2. Get the actual Playwright session object
session = await self.get_session()
# 3. Update the state (this does the heavy lifting)
session.cached_state = await self._update_state()
logger.debug(f"State update complete for {self.context_id}.")
# 4. Optionally save cookies if configured
if self.config.cookies_file:
asyncio.create_task(self.save_cookies())
return session.cached_state
async def _wait_for_page_and_frames_load(self, timeout_overwrite: float | None = None):
"""Ensures page is fully loaded before continuing."""
# ... (complex logic to wait for network idle, minimum times) ...
page = await self.get_current_page()
await page.wait_for_load_state('load', timeout=5000) # Simplified wait
logger.debug("Page load/network stability checks passed.")
await asyncio.sleep(self.config.minimum_wait_page_load_time) # Ensure minimum wait
async def _update_state(self) -> BrowserState:
"""Fetches all info and builds the BrowserState."""
session = await self.get_session()
page = await self.get_current_page() # Get the active Playwright page object
try:
# Use DomService to analyze the page content
dom_service = DomService(page)
# Get the simplified DOM tree and interactive elements map
content_info = await dom_service.get_clickable_elements(
highlight_elements=self.config.highlight_elements,
# ... other DOM options ...
)
# Take a screenshot
screenshot_b64 = await self.take_screenshot()
# Get URL, Title, Tabs, Scroll info etc.
url = page.url
title = await page.title()
tabs = await self.get_tabs_info()
pixels_above, pixels_below = await self.get_scroll_info(page)
# Create the BrowserState object
browser_state = BrowserState(
element_tree=content_info.element_tree,
selector_map=content_info.selector_map,
url=url,
title=title,
tabs=tabs,
screenshot=screenshot_b64,
pixels_above=pixels_above,
pixels_below=pixels_below,
)
return browser_state
except Exception as e:
logger.error(f'Failed to update state: {str(e)}')
# Maybe return old state or raise error
raise BrowserError("Failed to get browser state") from e
async def take_screenshot(self, full_page: bool = False) -> str:
"""Takes a screenshot and returns base64 encoded string."""
page = await self.get_current_page()
screenshot_bytes = await page.screenshot(full_page=full_page, animations='disabled')
return base64.b64encode(screenshot_bytes).decode('utf-8')
# ... many other helper methods (_get_current_page, get_tabs_info, etc.) ...
```
This shows how `BrowserContext` acts as a manager for a specific browser session, using underlying tools (like Playwright and `DomService`) to gather the necessary information (`BrowserState`) that the `Agent` needs to operate.
## Conclusion
The `BrowserContext` is a fundamental concept in `Browser Use`. It provides the necessary **isolated environment** for the `Agent` to perform its tasks, much like an incognito window or a separate browser profile. It manages the session's state (URL, cookies, tabs, page content) and provides the `Agent` with a snapshot of the current situation via the `get_state()` method.
Understanding the `BrowserContext` helps clarify *where* the Agent works. Now, how does the Agent actually understand the *content* of the webpage within that context? How is the complex structure of a webpage represented in a way the Agent (and the LLM) can understand?
In the next chapter, we'll dive into exactly that: the [DOM Representation](04_dom_representation.md).
[Next Chapter: DOM Representation](04_dom_representation.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,316 @@
# Chapter 4: DOM Representation - Mapping the Webpage
In the [previous chapter](03_browsercontext.md), we learned about the `BrowserContext`, the Agent's private workspace for browsing. We saw that the Agent uses `browser_context.get_state()` to get a snapshot of the current webpage. But how does the Agent actually *understand* the content of that snapshot?
Imagine you're looking at the Google homepage. You instantly recognize the logo, the search bar, and the buttons. But a computer program just sees a wall of code (HTML). How can our `Agent` figure out: "This rectangular box is the search bar I need to type into," or "This specific image link is the first result I should click"?
This is the problem solved by **DOM Representation**.
## What Problem Does DOM Representation Solve?
Webpages are built using HTML (HyperText Markup Language), which describes the structure and content. Your browser reads this HTML and creates an internal, structured representation called the **Document Object Model (DOM)**. It's like the browser builds a detailed blueprint or an outline from the HTML instructions.
However, this raw DOM blueprint is incredibly complex and contains lots of information irrelevant to our Agent's task. The Agent doesn't need to know about every single tiny visual detail; it needs a *simplified map* focused on what's important for interaction:
1. **What elements are on the page?** (buttons, links, input fields, text)
2. **Are they visible to a user?** (Hidden elements shouldn't be interacted with)
3. **Are they interactive?** (Can you click it? Can you type in it?)
4. **How can the Agent refer to them?** (We need a simple way to say "click *this* button")
DOM Representation solves the problem of translating the complex, raw DOM blueprint into a simplified, structured map that highlights the interactive "landmarks" and pathways the Agent can use.
## Meet `DomService`: The Map Maker
The component responsible for creating this map is the `DomService`. Think of it as a cartographer specializing in webpages.
When the `Agent` (via the `BrowserContext`) asks for the current state of the page, the `BrowserContext` employs the `DomService` to analyze the page's live DOM.
Here's what the `DomService` does:
1. **Examines the Live Page:** It looks at the current structure rendered in the browser tab, not just the initial HTML source code (because JavaScript can change the page after it loads).
2. **Identifies Elements:** It finds all the meaningful elements like buttons, links, input fields, and text blocks.
3. **Checks Properties:** For each element, it determines crucial properties:
* **Visibility:** Is it actually displayed on the screen?
* **Interactivity:** Is it something a user can click, type into, or otherwise interact with?
* **Position:** Where is it located (roughly)?
4. **Assigns Interaction Indices:** This is key! For elements deemed interactive and visible, `DomService` assigns a unique number, called a `highlight_index` (like `[5]`, `[12]`, etc.). This gives the Agent and the LLM a simple, unambiguous way to refer to specific elements.
5. **Builds a Structured Tree:** It organizes this information into a simplified tree structure (`element_tree`) that reflects the page layout but is much easier to process than the full DOM.
6. **Creates an Index Map:** It generates a `selector_map`, which is like an index in a book, mapping each `highlight_index` directly to its corresponding element node in the tree.
The final output is a `DOMState` object containing the simplified `element_tree` and the handy `selector_map`. This `DOMState` is then included in the `BrowserState` that `BrowserContext.get_state()` returns to the Agent.
## The Output: `DOMState` - The Agent's Map
The `DOMState` object produced by `DomService` has two main parts:
1. **`element_tree`:** This is the root of our simplified map, represented as a `DOMElementNode` object (defined in `dom/views.py`). Each node in the tree can be either an element (`DOMElementNode`) or a piece of text (`DOMTextNode`). `DOMElementNode`s contain information like the tag name (`<button>`, `<input>`), attributes (`aria-label="Search"`), visibility, interactivity, and importantly, the `highlight_index` if applicable. The tree structure helps understand the page layout (e.g., this button is inside that section).
*Conceptual Example Tree:*
```
<body> [no index]
|-- <div> [no index]
| |-- <input aria-label="Search"> [highlight_index: 5]
| +-- <button> [highlight_index: 6]
| +-- "Google Search" (TextNode)
+-- <a href="/images"> [highlight_index: 7]
+-- "Images" (TextNode)
```
2. **`selector_map`:** This is a Python dictionary that acts as a quick lookup. It maps the integer `highlight_index` directly to the corresponding `DOMElementNode` object in the `element_tree`.
*Conceptual Example Map:*
```python
{
5: <DOMElementNode tag_name='input', attributes={'aria-label':'Search'}, ...>,
6: <DOMElementNode tag_name='button', ...>,
7: <DOMElementNode tag_name='a', attributes={'href':'/images'}, ...>
}
```
This `selector_map` is incredibly useful because when the LLM decides "click element 5", the Agent can instantly find the correct `DOMElementNode` using `selector_map[5]` and tell the [Action Controller & Registry](05_action_controller___registry.md) exactly which element to interact with.
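Put together, a stripped-down sketch of these two structures might look like this (the real classes in `dom/views.py` carry more fields; this is only to fix ideas):
```python
# Simplified sketch of the structures in dom/views.py (not the real definitions)
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class DOMElementNode:
    tag_name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    is_visible: bool = False
    is_interactive: bool = False
    highlight_index: Optional[int] = None          # set only for visible, interactive elements
    children: List["DOMElementNode"] = field(default_factory=list)

@dataclass
class DOMState:
    element_tree: DOMElementNode                   # root of the simplified page map
    selector_map: Dict[int, DOMElementNode]        # highlight_index -> node, for instant lookup
```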
## How the Agent Uses the Map
The `Agent` takes the `DOMState` (usually simplifying the `element_tree` further into a text representation) and includes it in the information sent to the LLM. Remember the JSON response format from [Chapter 2](02_system_prompt.md)? The LLM uses the `highlight_index` from this map to specify actions:
```json
// LLM might receive a simplified text view like:
// "[5]<input aria-label='Search'>\n[6]<button>Google Search</button>\n[7]<a>Images</a>"
// And respond with:
{
"current_state": {
"evaluation_previous_goal": "...",
"memory": "On Google homepage, need to search for cats.",
"next_goal": "Type 'cute cats' into the search bar [5]."
},
"action": [
{
"input_text": {
"index": 5, // <-- Uses the highlight_index from the DOM map!
"text": "cute cats"
}
}
// ... maybe press Enter action ...
]
}
```
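How does the tree become the compact `[5]<input ...>` text shown in the comment above? The library does this with helpers like `element_tree.clickable_elements_to_string()` (mentioned later in this chapter); a hypothetical, much-simplified version, using the sketch of `DOMElementNode` from the previous section, could look like:
```python
# Hypothetical sketch of flattening indexed elements into text for the LLM.
def to_llm_text(node) -> str:
    lines = []

    def walk(n):
        if n.highlight_index is not None:
            attrs = " ".join(f"{k}='{v}'" for k, v in n.attributes.items())
            lines.append(f"[{n.highlight_index}]<{n.tag_name}{' ' + attrs if attrs else ''}>")
        for child in n.children:
            walk(child)

    walk(node)
    return "\n".join(lines)
```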
## Code Example: Seeing the Map
We don't usually interact with `DomService` directly. Instead, we get its output via the `BrowserContext`. Let's revisit the example from Chapter 3 and see where the DOM representation fits:
```python
import asyncio
from browser_use import Browser, BrowserConfig, BrowserContext, BrowserContextConfig
async def main():
browser_config = BrowserConfig(headless=False)
browser = Browser(config=browser_config)
context_config = BrowserContextConfig()
async with browser.new_context(config=context_config) as browser_context:
# Navigate to a page (e.g., Google)
await browser_context.navigate_to("https://www.google.com")
print("Getting current page state...")
# This call uses DomService internally to generate the DOM representation
current_state = await browser_context.get_state()
print(f"\nCurrent Page URL: {current_state.url}")
print(f"Current Page Title: {current_state.title}")
# Accessing the DOM Representation parts within the BrowserState
print("\n--- DOM Representation Details ---")
# The element_tree is the root node of our simplified DOM map
if current_state.element_tree:
print(f"Root element tag of simplified tree: <{current_state.element_tree.tag_name}>")
else:
print("Element tree is empty.")
# The selector_map provides direct access to interactive elements by index
if current_state.selector_map:
print(f"Number of interactive elements found: {len(current_state.selector_map)}")
# Let's try to find the element the LLM might call [5] (often the search bar)
example_index = 5 # Note: Indices can change depending on the page!
if example_index in current_state.selector_map:
element_node = current_state.selector_map[example_index]
print(f"Element [{example_index}]: Tag=<{element_node.tag_name}>, Attributes={element_node.attributes}")
# The Agent uses this node reference to perform actions
else:
print(f"Element [{example_index}] not found in the selector map for this page state.")
else:
print("No interactive elements found (selector map is empty).")
# The Agent would typically convert element_tree into a compact text format
# (using methods like element_tree.clickable_elements_to_string())
# to send to the LLM along with the task instructions.
print("\nBrowserContext closed.")
await browser.close()
print("Browser closed.")
# Run the asynchronous code
asyncio.run(main())
```
**What happens here?**
1. We set up the `Browser` and `BrowserContext`.
2. We navigate to Google.
3. `browser_context.get_state()` is called. **Internally**, this triggers the `DomService`.
4. `DomService` analyzes the Google page, finds interactive elements (like the search bar, buttons), assigns them `highlight_index` numbers, and builds the `element_tree` and `selector_map`.
5. This `DOMState` (containing the tree and map) is packaged into the `BrowserState` object returned by `get_state()`.
6. Our code then accesses `current_state.element_tree` and `current_state.selector_map` to peek at the map created by `DomService`.
7. We demonstrate looking up an element using its potential index (`selector_map[5]`).
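The conversion mentioned in the code comments above can be pictured with a small sketch. This is not the library's implementation (the real helper is `element_tree.clickable_elements_to_string()`, which also includes visible text); it just shows how a `selector_map` could be flattened into the `[index]<tag ...>` lines the LLM sees:
```python
# A minimal sketch (not the library's implementation) of flattening the
# selector_map into compact, indexed text lines for the LLM.
def selector_map_to_text(selector_map) -> str:
    lines = []
    for index, node in sorted(selector_map.items()):
        # Keep only a couple of descriptive attributes to stay compact
        attrs = " ".join(
            f"{key}='{value}'" for key, value in node.attributes.items()
            if key in ("aria-label", "placeholder", "name")
        )
        label = f"{node.tag_name} {attrs}".strip()
        lines.append(f"[{index}]<{label}>")
    return "\n".join(lines)

# Illustrative output for a page like the Google homepage:
# [5]<input aria-label='Search'>
# [6]<button>
# [7]<a>
```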
## How It Works Under the Hood: `DomService` in Action
Let's trace the flow when `BrowserContext.get_state()` is called:
```mermaid
sequenceDiagram
participant Agent
participant BC as BrowserContext
participant DomService
participant PlaywrightPage as Browser Page (JS Env)
participant buildDomTree_js as buildDomTree.js
Agent->>BC: get_state()
Note over BC: Needs to analyze the page content
BC->>DomService: get_clickable_elements(...)
Note over DomService: Needs to run analysis script in browser
DomService->>PlaywrightPage: evaluate(js_code='buildDomTree.js', args={...})
Note over PlaywrightPage: Execute JavaScript code
PlaywrightPage->>buildDomTree_js: Run analysis function
Note over buildDomTree_js: Analyzes live DOM, finds visible & interactive elements, assigns highlight_index
buildDomTree_js-->>PlaywrightPage: Return structured data (nodes, indices, map)
PlaywrightPage-->>DomService: Return JS execution result (JSON-like data)
Note over DomService: Process the raw data from JS
DomService->>DomService: _construct_dom_tree(result)
Note over DomService: Builds Python DOMElementNode tree and selector_map
DomService-->>BC: Return DOMState (element_tree, selector_map)
Note over BC: Combine DOMState with URL, title, screenshot etc.
BC->>BC: Create BrowserState object
BC-->>Agent: Return BrowserState (containing DOM map)
```
**Key Code Points:**
1. **`BrowserContext` calls `DomService`:** Inside `browser/context.py`, the `_update_state` method (called by `get_state`) initializes and uses the `DomService`:
```python
# --- File: browser/context.py (Simplified _update_state) ---
from browser_use.dom.service import DomService # Import the service
from browser_use.browser.views import BrowserState
class BrowserContext:
# ... other methods ...
async def _update_state(self) -> BrowserState:
page = await self.get_current_page() # Get the active Playwright page object
# ... error handling ...
try:
# 1. Create DomService instance for the current page
dom_service = DomService(page)
# 2. Call DomService to get the DOM map (DOMState)
content_info = await dom_service.get_clickable_elements(
highlight_elements=self.config.highlight_elements,
viewport_expansion=self.config.viewport_expansion,
# ... other options ...
)
# 3. Get other info (screenshot, URL, title etc.)
screenshot_b64 = await self.take_screenshot()
url = page.url
title = await page.title()
# ... gather more state ...
# 4. Package everything into BrowserState
browser_state = BrowserState(
element_tree=content_info.element_tree, # <--- From DomService
selector_map=content_info.selector_map, # <--- From DomService
url=url,
title=title,
screenshot=screenshot_b64,
# ... other state info ...
)
return browser_state
except Exception as e:
logger.error(f'Failed to update state: {str(e)}')
raise # Or handle error
```
2. **`DomService` runs JavaScript:** Inside `dom/service.py`, the `_build_dom_tree` method executes the JavaScript code stored in `buildDomTree.js` within the browser page's context.
```python
# --- File: dom/service.py (Simplified _build_dom_tree) ---
import logging
from importlib import resources
# ... other imports ...
logger = logging.getLogger(__name__)
class DomService:
def __init__(self, page: 'Page'):
self.page = page
# Load the JavaScript code from the file when DomService is created
self.js_code = resources.read_text('browser_use.dom', 'buildDomTree.js')
# ...
async def _build_dom_tree(
self, highlight_elements: bool, focus_element: int, viewport_expansion: int
) -> tuple[DOMElementNode, SelectorMap]:
# Prepare arguments for the JavaScript function
args = {
'doHighlightElements': highlight_elements,
'focusHighlightIndex': focus_element,
'viewportExpansion': viewport_expansion,
'debugMode': logger.getEffectiveLevel() == logging.DEBUG,
}
try:
# Execute the JavaScript code in the browser page!
# The JS code analyzes the live DOM and returns a structured result.
eval_page = await self.page.evaluate(self.js_code, args)
except Exception as e:
logger.error('Error evaluating JavaScript: %s', e)
raise
# ... (optional debug logging) ...
# Parse the result from JavaScript into Python objects
return await self._construct_dom_tree(eval_page)
async def _construct_dom_tree(self, eval_page: dict) -> tuple[DOMElementNode, SelectorMap]:
# ... (logic to parse js_node_map from eval_page) ...
# ... (loops through nodes, creates DOMElementNode/DOMTextNode objects) ...
# ... (builds the tree structure by linking parents/children) ...
# ... (populates the selector_map dictionary) ...
# This uses the structures defined in dom/views.py
# ...
root_node = ... # Parsed root DOMElementNode
selector_map = ... # Populated dictionary {index: DOMElementNode}
return root_node, selector_map
# ... other methods like get_clickable_elements ...
```
3. **`buildDomTree.js` (Conceptual):** This JavaScript file (located at `dom/buildDomTree.js` in the library) is the core map-making logic that runs *inside the browser*. It traverses the live DOM, checks element visibility and interactivity using browser APIs (like `element.getBoundingClientRect()`, `window.getComputedStyle()`, `document.elementFromPoint()`), assigns the `highlight_index`, and packages the results into a structured format that the Python `DomService` can understand. *We don't need to understand the JS code itself, just its purpose.*
4. **Python Data Structures (`DOMElementNode`, `DOMTextNode`):** The results from the JavaScript are parsed into Python objects defined in `dom/views.py`. These dataclasses (`DOMElementNode`, `DOMTextNode`) hold the information about each mapped element or text segment.
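To make point 4 concrete, here is a hedged sketch of what those dataclasses could look like. The field names shown are the ones used elsewhere in this tutorial (`tag_name`, `attributes`, `highlight_index`); the real classes in `dom/views.py` carry more information (xpath, visibility flags, and so on):
```python
# Simplified sketch of the node classes produced by _construct_dom_tree.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DOMTextNode:
    text: str                                   # Visible text content
    parent: Optional["DOMElementNode"] = None   # Link back to the containing element

@dataclass
class DOMElementNode:
    tag_name: str                                    # e.g. "input", "button"
    attributes: dict = field(default_factory=dict)   # e.g. {"aria-label": "Search"}
    children: list = field(default_factory=list)     # DOMElementNode / DOMTextNode children
    highlight_index: Optional[int] = None            # Set only for interactive elements
    parent: Optional["DOMElementNode"] = None
```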
## Conclusion
DOM Representation, primarily handled by the `DomService`, is crucial for bridging the gap between the complex reality of a webpage (the DOM) and the Agent/LLM's need for a simplified, actionable understanding. By creating a structured `element_tree` and an indexed `selector_map`, it provides a clear map of interactive landmarks on the page, identified by simple `highlight_index` numbers.
This map allows the LLM to make specific plans like "type into element [5]" or "click element [12]", which the Agent can then reliably translate into concrete actions.
Now that we understand how the Agent sees the page, how does it actually *perform* those actions like clicking or typing? In the next chapter, we'll explore the component responsible for executing the LLM's plan: the [Action Controller & Registry](05_action_controller___registry.md).
[Next Chapter: Action Controller & Registry](05_action_controller___registry.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 5: Action Controller & Registry - The Agent's Hands and Toolbox
In the [previous chapter](04_dom_representation.md), we saw how the `DomService` creates a simplified map (`DOMState`) of the webpage, allowing the Agent and its LLM planner to identify interactive elements like buttons and input fields using unique numbers (`highlight_index`). The LLM uses this map to decide *what* specific action to take next, like "click element [5]" or "type 'hello world' into element [12]".
But how does the program actually *do* that? How does the abstract idea "click element [5]" turn into a real click inside the browser window managed by the [BrowserContext](03_browsercontext.md)?
This is where the **Action Controller** and **Action Registry** come into play. They are the "hands" and "toolbox" that execute the Agent's decisions.
## What Problem Do They Solve?
Imagine you have a detailed instruction manual (the LLM's plan) for building a model car. The manual tells you exactly which piece to pick up (`index=5`) and what to do with it ("click" or "attach"). However, you still need:
1. **A Toolbox:** A collection of all the tools you might need (screwdriver, glue, pliers). You need to know what tools are available.
2. **A Mechanic:** Someone (or you!) who can read the instruction ("Use the screwdriver on screw #5"), select the correct tool from the toolbox, and skillfully use it on the specified part.
Without the toolbox and the mechanic, the instruction manual is useless.
Similarly, the `Browser Use` Agent needs:
1. **Action Registry (The Toolbox):** A defined list of all possible actions the Agent can perform (e.g., `click_element`, `input_text`, `scroll_down`, `go_to_url`, `done`). This registry also holds details about each action, like what parameters it needs (e.g., `click_element` needs an `index`).
2. **Action Controller (The Mechanic):** A component that takes the specific action requested by the LLM (e.g., "execute `click_element` with `index=5`"), finds the corresponding function (the "tool") in the Registry, ensures the request is valid, and then executes that function using the [BrowserContext](03_browsercontext.md) (the "car").
The Controller and Registry solve the problem of translating the LLM's high-level plan into concrete, executable browser operations in a structured and reliable way.
## Meet the Toolbox and the Mechanic
Let's break down these two closely related concepts:
### 1. Action Registry: The Toolbox (`controller/registry/service.py`)
Think of the `Registry` as a carefully organized toolbox. Each drawer is labeled with the name of a tool (an action like `click_element`), and inside, you find the tool itself (the actual code function) along with its instructions (description and required parameters).
* **Catalog of Actions:** It holds a dictionary where keys are action names (strings like `"click_element"`) and values are `RegisteredAction` objects containing:
* The action's `name`.
* A `description` (for humans and the LLM).
* The actual Python `function` to call.
* A `param_model` (a Pydantic model defining required parameters like `index` or `text`).
* **Informs the LLM:** The `Registry` can generate a description of all available actions and their parameters. This description is given to the LLM (as part of the [System Prompt](02_system_prompt.md)) so it knows exactly what "tools" it's allowed to ask the Agent to use.
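To picture what the LLM actually receives, here is an illustrative (not exact) snippet. The `get_prompt_description()` method appears in the simplified Registry code later in this chapter; the real wording and schema formatting differ, but each line pairs an action name with its description and parameters:
```python
# Illustrative only - the exact formatting of the catalogue differs in practice.
from browser_use import Controller  # assumes the top-level export used elsewhere in this tutorial

controller = Controller()
print(controller.registry.get_prompt_description())
# click_element: Click element {'index': {'type': 'integer'}, ...}
# input_text: Input text into an element {'index': ..., 'text': {'type': 'string'}}
# done: Complete task {'text': ..., 'success': {'type': 'boolean'}}
```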
### 2. Action Controller: The Mechanic (`controller/service.py`)
The `Controller` is the skilled mechanic who uses the tools from the Registry.
* **Receives Instructions:** It gets the action request from the Agent. This request typically comes in the form of an `ActionModel` object, which represents the LLM's JSON output (e.g., `{"click_element": {"index": 5}}`).
* **Selects the Tool:** It looks at the `ActionModel`, identifies the action name (`"click_element"`), and retrieves the corresponding `RegisteredAction` from the `Registry`.
* **Validates Parameters:** It uses the action's `param_model` (e.g., `ClickElementAction`) to check if the provided parameters (`{"index": 5}`) are correct.
* **Executes the Action:** It calls the actual Python function associated with the action (e.g., the `click_element` function), passing it the validated parameters and the necessary `BrowserContext` (so the function knows *which* browser tab to act upon).
* **Reports the Result:** The action function performs the task (e.g., clicking the element) and returns an `ActionResult` object, indicating whether it succeeded, failed, or produced some output. The Controller passes this result back to the Agent.
## Using the Controller: Executing an Action
In the Agent's main loop ([Chapter 1: Agent](01_agent.md)), after the LLM provides its plan as an `ActionModel`, the Agent simply hands this model over to the `Controller` to execute it.
```python
# --- Simplified Agent step calling the Controller ---
# Assume 'llm_response_model' is the ActionModel object parsed from LLM's JSON
# Assume 'self.controller' is the Controller instance
# Assume 'self.browser_context' is the current BrowserContext
# ... inside the Agent's step method ...
try:
# Agent tells the Controller: "Execute this action!"
action_result: ActionResult = await self.controller.act(
action=llm_response_model, # The LLM's chosen action and parameters
browser_context=self.browser_context # The browser tab to act within
# Other context like LLMs for extraction might be passed too
)
# Agent receives the result from the Controller
print(f"Action executed. Result: {action_result.extracted_content}")
if action_result.is_done:
print("Task marked as done by the action!")
if action_result.error:
print(f"Action encountered an error: {action_result.error}")
# Agent records this result in the history ([Message Manager](06_message_manager.md))
# ...
except Exception as e:
print(f"Failed to execute action: {e}")
# Handle the error
```
**What happens here?**
1. The Agent has received `llm_response_model` (e.g., representing `{"click_element": {"index": 5}}`).
2. It calls `self.controller.act()`, passing the action model and the active `browser_context`.
3. The `controller.act()` method handles looking up the `"click_element"` function in the `Registry`, validating the `index` parameter, and calling the function to perform the click within the `browser_context`.
4. The `click_element` function executes (interacting with the browser via `BrowserContext` methods).
5. It returns an `ActionResult` (e.g., `ActionResult(extracted_content="Clicked button with index 5")`).
6. The Agent receives this `action_result` and proceeds.
## How it Works Under the Hood: The Execution Flow
Let's trace the journey of an action request from the Agent to the browser click:
```mermaid
sequenceDiagram
participant Agent
participant Controller
participant Registry
participant ClickFunc as click_element Function
participant BC as BrowserContext
Note over Agent: LLM decided: click_element(index=5)
Agent->>Controller: act(action={"click_element": {"index": 5}}, browser_context=BC)
Note over Controller: Identify action and params
Controller->>Controller: action_name = "click_element", params = {"index": 5}
Note over Controller: Ask Registry for the tool
Controller->>Registry: Get action definition for "click_element"
Registry-->>Controller: Return RegisteredAction(name="click_element", function=ClickFunc, param_model=ClickElementAction, ...)
Note over Controller: Validate params using param_model
Controller->>Controller: ClickElementAction(index=5) # Validation OK
Note over Controller: Execute the function
Controller->>ClickFunc: ClickFunc(params=ClickElementAction(index=5), browser=BC)
Note over ClickFunc: Perform the click via BrowserContext
ClickFunc->>BC: Find element with index 5
BC-->>ClickFunc: Element reference
ClickFunc->>BC: Execute click on element
BC-->>ClickFunc: Click successful
ClickFunc-->>Controller: Return ActionResult(extracted_content="Clicked button...")
Controller-->>Agent: Return ActionResult
```
This diagram shows the Controller orchestrating the process: receiving the request, consulting the Registry, validating, calling the specific action function, and returning the result.
## Diving Deeper into the Code
Let's peek at simplified versions of the key files.
### 1. Registering Actions (`controller/registry/service.py`)
Actions are typically registered using a decorator `@registry.action`.
```python
# --- File: controller/registry/service.py (Simplified Registry) ---
from typing import Any, Callable, Type
from pydantic import BaseModel
# Assume ActionModel, RegisteredAction are defined in views.py
class Registry:
def __init__(self, exclude_actions: list[str] = []):
self.registry: dict[str, RegisteredAction] = {}
self.exclude_actions = exclude_actions
# ... other initializations ...
def _create_param_model(self, function: Callable) -> Type[BaseModel]:
"""Creates a Pydantic model from function signature (simplified)"""
# ... (Inspects function signature to build a model) ...
# Example: for func(index: int, text: str), creates a model
# class func_parameters(ActionModel):
# index: int
# text: str
# return func_parameters
pass # Placeholder for complex logic
def action(
self,
description: str,
param_model: Type[BaseModel] | None = None,
):
"""Decorator for registering actions"""
def decorator(func: Callable):
if func.__name__ in self.exclude_actions: return func # Skip excluded
# If no specific param_model provided, try to generate one
actual_param_model = param_model # Or self._create_param_model(func) if needed
# Ensure function is awaitable (async)
wrapped_func = func # Assume func is already async for simplicity
action = RegisteredAction(
name=func.__name__,
description=description,
function=wrapped_func,
param_model=actual_param_model,
)
self.registry[func.__name__] = action # Add to the toolbox!
print(f"Action '{func.__name__}' registered.")
return func
return decorator
def get_prompt_description(self) -> str:
"""Get a description of all actions for the prompt (simplified)"""
descriptions = []
for action in self.registry.values():
# Format description for LLM (e.g., "click_element: Click element {index: {'type': 'integer'}}")
descriptions.append(f"{action.name}: {action.description} {action.param_model.schema()}")
return "\n".join(descriptions)
async def execute_action(self, action_name: str, params: dict, browser, **kwargs) -> Any:
"""Execute a registered action (simplified)"""
if action_name not in self.registry:
raise ValueError(f"Action {action_name} not found")
action = self.registry[action_name]
try:
# Validate params using the registered Pydantic model
validated_params = action.param_model(**params)
# Call the actual action function with validated params and browser context
# Assumes function takes validated_params model and browser
result = await action.function(validated_params, browser=browser, **kwargs)
return result
except Exception as e:
raise RuntimeError(f"Error executing {action_name}: {e}") from e
```
This shows how the `@registry.action` decorator takes a function, its description, and parameter model, and stores them in the `registry` dictionary. `execute_action` is the core method used by the `Controller` to run a specific action.
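Because registration is just a decorator, adding your own tool to the toolbox follows the same pattern. The sketch below is hypothetical (`save_note` and `NoteAction` are made-up names), but it mirrors the decorator shown above:
```python
from pydantic import BaseModel

class NoteAction(BaseModel):
    text: str  # The note content to remember

registry = Registry()  # The Registry class from above

@registry.action("Save a short note for later", param_model=NoteAction)
async def save_note(params: NoteAction):
    # Returning a plain string also works: Controller.act wraps strings
    # into an ActionResult for us.
    return f"Saved note: {params.text}"
```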
### 2. Defining Action Parameters (`controller/views.py`)
Each action often has its own Pydantic model to define its expected parameters.
```python
# --- File: controller/views.py (Simplified Action Parameter Models) ---
from pydantic import BaseModel
from typing import Optional
# Example parameter model for the 'click_element' action
class ClickElementAction(BaseModel):
index: int # The highlight_index of the element to click
xpath: Optional[str] = None # Optional hint (usually index is enough)
# Example parameter model for the 'input_text' action
class InputTextAction(BaseModel):
index: int # The highlight_index of the input field
text: str # The text to type
xpath: Optional[str] = None # Optional hint
# Example parameter model for the 'done' action (task completion)
class DoneAction(BaseModel):
text: str # A final message or result
success: bool # Was the overall task successful?
# ... other action models like GoToUrlAction, ScrollAction etc. ...
```
These models ensure that when the Controller receives parameters like `{"index": 5}`, it can validate that `index` is indeed an integer as required by `ClickElementAction`.
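Here is a quick illustration of that validation in action, reusing the `ClickElementAction` model defined above:
```python
from pydantic import ValidationError

ok = ClickElementAction(index=5)            # Valid: 'index' is an integer
print(ok.index)                             # -> 5

try:
    ClickElementAction(xpath="//button")    # Invalid: required 'index' is missing
except ValidationError as e:
    print("Rejected before it ever reaches the browser:", e.errors()[0]["loc"])
```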
### 3. The Controller Service (`controller/service.py`)
The `Controller` class ties everything together. It initializes the `Registry` and registers the default browser actions. Its main job is the `act` method.
```python
# --- File: controller/service.py (Simplified Controller) ---
import logging
from browser_use.agent.views import ActionModel, ActionResult # Input/Output types
from browser_use.browser.context import BrowserContext # Needed by actions
from browser_use.controller.registry.service import Registry # The toolbox
from browser_use.controller.views import ClickElementAction, InputTextAction, DoneAction # Param models
logger = logging.getLogger(__name__)
class Controller:
def __init__(self, exclude_actions: list[str] = []):
self.registry = Registry(exclude_actions=exclude_actions) # Initialize the toolbox
# --- Register Default Actions ---
# (Registration happens when Controller is created)
@self.registry.action("Click element", param_model=ClickElementAction)
async def click_element(params: ClickElementAction, browser: BrowserContext):
logger.info(f"Attempting to click element index {params.index}")
# --- Actual click logic using browser object ---
element_node = await browser.get_dom_element_by_index(params.index)
await browser._click_element_node(element_node) # Internal browser method
# ---
msg = f"🖱️ Clicked element with index {params.index}"
return ActionResult(extracted_content=msg, include_in_memory=True)
@self.registry.action("Input text into an element", param_model=InputTextAction)
async def input_text(params: InputTextAction, browser: BrowserContext):
logger.info(f"Attempting to type into element index {params.index}")
# --- Actual typing logic using browser object ---
element_node = await browser.get_dom_element_by_index(params.index)
await browser._input_text_element_node(element_node, params.text) # Internal method
# ---
msg = f"⌨️ Input text into index {params.index}"
return ActionResult(extracted_content=msg, include_in_memory=True)
@self.registry.action("Complete task", param_model=DoneAction)
async def done(params: DoneAction):
logger.info(f"Task completion requested. Success: {params.success}")
return ActionResult(is_done=True, success=params.success, extracted_content=params.text)
# ... registration for scroll_down, go_to_url, etc. ...
async def act(
self,
action: ActionModel, # The ActionModel from the LLM
browser_context: BrowserContext, # The context to act within
**kwargs # Other potential context (LLMs, etc.)
) -> ActionResult:
"""Execute an action defined in the ActionModel"""
try:
# ActionModel might look like: ActionModel(click_element=ClickElementAction(index=5))
# model_dump gets {'click_element': {'index': 5}}
action_data = action.model_dump(exclude_unset=True)
for action_name, params in action_data.items():
if params is not None:
logger.debug(f"Executing action: {action_name} with params: {params}")
# Call the registry's execute method
result = await self.registry.execute_action(
action_name=action_name,
params=params,
browser=browser_context, # Pass the essential context
**kwargs # Pass any other context needed by actions
)
# Ensure result is ActionResult or convert it
if isinstance(result, ActionResult): return result
if isinstance(result, str): return ActionResult(extracted_content=result)
return ActionResult() # Default empty result if action returned None
logger.warning("ActionModel had no action to execute.")
return ActionResult(error="No action specified in the model")
except Exception as e:
logger.error(f"Error during controller.act: {e}", exc_info=True)
return ActionResult(error=str(e)) # Return error in ActionResult
```
The `Controller` registers all the standard browser actions during initialization. The `act` method then dynamically finds and executes the requested action using the `Registry`.
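As a rough sketch of what `act` receives and unpacks (using the simplified conceptual models from [Chapter 7](07_data_structures__views_.md)), an `ActionModel` chosen by the LLM dumps down to exactly the `action_name`/`params` pair the Registry needs:
```python
# Hedged sketch - the real ActionModel is generated dynamically from the Registry.
action = ActionModel(click_element=ClickElementAction(index=5))

print(action.model_dump(exclude_unset=True))
# -> roughly {'click_element': {'index': 5}}
# Controller.act loops over this dict: action_name='click_element', params={'index': 5}
```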
## Conclusion
The **Action Registry** acts as the definitive catalog or "toolbox" of all operations the `Browser Use` Agent can perform. The **Action Controller** is the "mechanic" that interprets the LLM's plan, selects the appropriate tool from the Registry, and executes it within the specified [BrowserContext](03_browsercontext.md).
Together, they provide a robust and extensible way to translate high-level instructions into low-level browser interactions, forming the crucial link between the Agent's "brain" (LLM planner) and its "hands" (browser manipulation).
Now that we know how actions are chosen and executed, how does the Agent keep track of the conversation with the LLM, including the history of states observed and actions taken? We'll explore this in the next chapter on the [Message Manager](06_message_manager.md).
[Next Chapter: Message Manager](06_message_manager.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 6: Message Manager - Keeping the Conversation Straight
In the [previous chapter](05_action_controller___registry.md), we learned how the `Action Controller` and `Registry` act as the Agent's "hands" and "toolbox", executing the specific actions decided by the LLM planner. But how does the LLM get all the information it needs to make those decisions in the first place? How does the Agent keep track of the ongoing conversation, including what it "saw" on the page and what happened after each action?
Imagine you're having a long, multi-step discussion with an assistant about a complex task. If the assistant has a poor memory, they might forget earlier instructions, the current status, or previous results, making it impossible to proceed correctly. LLMs face a similar challenge: they need the conversation history for context, but they have a limited memory (called the "context window").
This is the problem the **Message Manager** solves.
## What Problem Does the Message Manager Solve?
The `Agent` needs to have a conversation with the LLM. This conversation isn't just chat; it includes:
1. **Initial Instructions:** The core rules from the [System Prompt](02_system_prompt.md).
2. **The Task:** The overall goal the Agent needs to achieve.
3. **Observations:** What the Agent currently "sees" in the browser ([BrowserContext](03_browsercontext.md) state, including the [DOM Representation](04_dom_representation.md)).
4. **Action Results:** What happened after the last action was performed ([Action Controller & Registry](05_action_controller___registry.md)).
5. **LLM's Plan:** The sequence of actions the LLM decided on.
The Message Manager solves several key problems:
* **Organizes History:** It structures the conversation chronologically, keeping track of who said what (System, User/Agent State, AI/LLM Plan).
* **Formats Messages:** It ensures the browser state, action results, and even images are formatted correctly so the LLM can understand them.
* **Tracks Size:** It keeps count of the "tokens" (roughly, words or parts of words) used in the conversation history.
* **Manages Limits:** It helps prevent the conversation history from exceeding the LLM's context window limit, potentially by removing older parts of the conversation if it gets too long.
Think of the `MessageManager` as a meticulous secretary for the Agent-LLM conversation. It takes clear, concise notes, presents the current situation accurately, and ensures the conversation doesn't ramble on for too long, keeping everything within the LLM's "attention span".
## Meet the Message Manager: The Conversation Secretary
The `MessageManager` (found in `agent/message_manager/service.py`) is responsible for managing the list of messages that are sent to the LLM in each step.
Here are its main jobs:
1. **Initialization:** When the `Agent` starts, the `MessageManager` is created. It immediately adds the foundational messages:
* The `SystemMessage` containing the rules from the [System Prompt](02_system_prompt.md).
* A `HumanMessage` stating the overall `task`.
* Other initial setup messages (like examples or sensitive data placeholders).
2. **Adding Browser State:** Before asking the LLM what to do next, the `Agent` gets the current `BrowserState`. It then tells the `MessageManager` to add this information as a `HumanMessage`. This message includes the simplified DOM map, the current URL, and potentially a screenshot (if `use_vision` is enabled). It also includes the results (`ActionResult`) from the *previous* step, so the LLM knows what happened last.
3. **Adding LLM Output:** After the LLM responds with its plan (`AgentOutput`), the `Agent` tells the `MessageManager` to add this plan as an `AIMessage`. This typically includes the LLM's reasoning and the list of actions to perform.
4. **Adding Action Results (Indirectly):** The results from the `Controller.act` call (`ActionResult`) aren't added as separate messages *after* the action. Instead, they are included in the *next* `HumanMessage` that contains the browser state (see step 2). This keeps the context tight: "Here's the current page, and here's what happened right before we got here."
5. **Providing Messages to LLM:** When the `Agent` is ready to call the LLM, it asks the `MessageManager` for the current conversation history (`get_messages()`).
6. **Token Management:** Every time a message is added, the `MessageManager` calculates how many tokens it adds (`_count_tokens`) and updates the total. If the total exceeds the limit (`max_input_tokens`), it might trigger a truncation strategy (`cut_messages`) to shorten the history, usually by removing parts of the oldest user state message or removing the image first.
## How the Agent Uses the Message Manager
Let's revisit the simplified `Agent.step` method from [Chapter 1](01_agent.md) and highlight the `MessageManager` interactions (using `self._message_manager`):
```python
# --- File: agent/service.py (Simplified step method - Highlighting MessageManager) ---
class Agent:
# ... (init, run) ...
async def step(self, step_info: Optional[AgentStepInfo] = None) -> None:
logger.info(f"📍 Step {self.state.n_steps}")
state = None
model_output = None
result: list[ActionResult] = []
try:
# 1. Get current state from the browser
state = await self.browser_context.get_state() # Uses BrowserContext
# 2. Add state + PREVIOUS result to message history via MessageManager
# 'self.state.last_result' holds the outcome of the *previous* step's action
self._message_manager.add_state_message(
state,
self.state.last_result, # Result from previous action
step_info,
self.settings.use_vision # Tell it whether to include image
)
# 3. Get the complete, formatted message history for the LLM
input_messages = self._message_manager.get_messages()
# 4. Get LLM's decision on the next action(s)
model_output = await self.get_next_action(input_messages) # Calls the LLM
# --- Agent increments step counter ---
self.state.n_steps += 1
# 5. Remove the potentially large state message before adding the compact AI response
# (This is an optimization mentioned in the provided code)
self._message_manager._remove_last_state_message()
# 6. Add the LLM's response (the plan) to the history
self._message_manager.add_model_output(model_output)
# 7. Execute the action(s) using the Controller
result = await self.multi_act(model_output.action) # Uses Controller
# 8. Store the result of THIS action. It will be used in the *next* step's
# call to self._message_manager.add_state_message()
self.state.last_result = result
# ... (Record step details, handle success/failure) ...
except Exception as e:
# Handle errors...
result = await self._handle_step_error(e)
self.state.last_result = result
# ... (finally block) ...
```
This flow shows the cycle: add state/previous result -> get messages -> call LLM -> add LLM response -> execute action -> store result for *next* state message.
## How it Works Under the Hood: Managing the Flow
Let's visualize the key interactions during one step of the Agent loop involving the `MessageManager`:
```mermaid
sequenceDiagram
participant Agent
participant BC as BrowserContext
participant MM as MessageManager
participant LLM
participant Controller
Note over Agent: Start of step
Agent->>BC: get_state()
BC-->>Agent: Current BrowserState (DOM map, URL, screenshot?)
Note over Agent: Have BrowserState and `last_result` from previous step
Agent->>MM: add_state_message(BrowserState, last_result)
MM->>MM: Format state/result into HumanMessage (with text/image)
MM->>MM: Calculate tokens for new message
MM->>MM: Add HumanMessage to internal history list
MM->>MM: Update total token count
MM->>MM: Check token limit, potentially call cut_messages()
Note over Agent: Ready to ask LLM
Agent->>MM: get_messages()
MM-->>Agent: Return List[BaseMessage] (System, Task, State1, Plan1, State2...)
Agent->>LLM: Invoke LLM with message list
LLM-->>Agent: LLM Response (AgentOutput containing plan)
Note over Agent: Got LLM's plan
Agent->>MM: _remove_last_state_message() # Optimization
MM->>MM: Remove last (large) HumanMessage from list
Agent->>MM: add_model_output(AgentOutput)
MM->>MM: Format plan into AIMessage (with tool calls)
MM->>MM: Calculate tokens for AIMessage
MM->>MM: Add AIMessage to internal history list
MM->>MM: Update total token count
Note over Agent: Ready to execute plan
Agent->>Controller: multi_act(AgentOutput.action)
Controller-->>Agent: List[ActionResult] (Result of this step's actions)
Agent->>Agent: Store ActionResult in `self.state.last_result` (for next step)
Note over Agent: End of step
```
This shows how `MessageManager` sits between the Agent, the Browser State, and the LLM, managing the history list and token counts.
## Diving Deeper into the Code (`agent/message_manager/service.py`)
Let's look at simplified versions of key methods in `MessageManager`.
**1. Initialization (`__init__` and `_init_messages`)**
When the `Agent` creates the `MessageManager`, it passes the task and the already-formatted `SystemMessage`.
```python
# --- File: agent/message_manager/service.py (Simplified __init__) ---
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
# ... other imports ...
from browser_use.agent.views import MessageManagerState # Internal state storage
from browser_use.agent.message_manager.views import MessageMetadata, ManagedMessage # Message wrapper
class MessageManager:
def __init__(
self,
task: str,
system_message: SystemMessage, # Received from Agent
settings: MessageManagerSettings = MessageManagerSettings(),
state: MessageManagerState = MessageManagerState(), # Stores history
):
self.task = task
self.settings = settings # Max tokens, image settings, etc.
self.state = state # Holds the 'history' object
self.system_prompt = system_message
# Only initialize if history is empty (e.g., not resuming from saved state)
if len(self.state.history.messages) == 0:
self._init_messages()
def _init_messages(self) -> None:
"""Add the initial fixed messages to the history."""
# Add the main system prompt (rules)
self._add_message_with_tokens(self.system_prompt)
# Add the user's task
task_message = HumanMessage(
content=f'Your ultimate task is: """{self.task}"""...'
)
self._add_message_with_tokens(task_message)
# Add other setup messages (context, sensitive data info, examples)
# ... (simplified - see full code for details) ...
# Example: Add a placeholder for where the main history begins
placeholder_message = HumanMessage(content='[Your task history memory starts here]')
self._add_message_with_tokens(placeholder_message)
```
This sets up the foundational context for the LLM.
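For orientation, here is a hedged sketch of what the history looks like right after initialization (wording abbreviated; the real content comes from the System Prompt and your task):
```python
# Roughly what get_messages() returns right after _init_messages():
#
# [
#   SystemMessage(content="<the rules from Chapter 2's System Prompt>"),
#   HumanMessage(content='Your ultimate task is: """<your task>"""...'),
#   HumanMessage(content='[Your task history memory starts here]'),
# ]
```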
**2. Adding Browser State (`add_state_message`)**
This method takes the current `BrowserState` and the previous `ActionResult`, formats them into a `HumanMessage` (potentially multi-modal with image and text parts), and adds it to the history.
```python
# --- File: agent/message_manager/service.py (Simplified add_state_message) ---
# ... imports ...
from browser_use.browser.views import BrowserState
from browser_use.agent.views import ActionResult, AgentStepInfo
from browser_use.agent.prompts import AgentMessagePrompt # Helper to format state
class MessageManager:
# ... (init) ...
def add_state_message(
self,
state: BrowserState, # The current view of the browser
result: Optional[List[ActionResult]] = None, # Result from *previous* action
step_info: Optional[AgentStepInfo] = None,
use_vision=True, # Flag to include screenshot
) -> None:
"""Add browser state and previous result as a human message."""
# Add any 'memory' messages from the previous result first (if any)
if result:
for r in result:
if r.include_in_memory and (r.extracted_content or r.error):
content = f"Action result: {r.extracted_content}" if r.extracted_content else f"Action error: {r.error}"
msg = HumanMessage(content=content)
self._add_message_with_tokens(msg)
result = None # Don't include again in the main state message
# Use a helper class to format the BrowserState (+ optional remaining result)
# into the correct message structure (text + optional image)
state_prompt = AgentMessagePrompt(
state,
result, # Pass any remaining result info
include_attributes=self.settings.include_attributes,
step_info=step_info,
)
# Get the formatted message (could be complex list for vision)
state_message = state_prompt.get_user_message(use_vision)
# Add the formatted message (with token calculation) to history
self._add_message_with_tokens(state_message)
```
**3. Adding Model Output (`add_model_output`)**
This takes the LLM's plan (`AgentOutput`) and formats it as an `AIMessage` with specific "tool calls" structure that many models expect.
```python
# --- File: agent/message_manager/service.py (Simplified add_model_output) ---
# ... imports ...
from browser_use.agent.views import AgentOutput
class MessageManager:
# ... (init, add_state_message) ...
def add_model_output(self, model_output: AgentOutput) -> None:
"""Add model output (the plan) as an AI message with tool calls."""
# Format the output according to OpenAI's tool calling standard
tool_calls = [
{
'name': 'AgentOutput', # The 'tool' name
'args': model_output.model_dump(mode='json', exclude_unset=True), # The LLM's JSON output
'id': str(self.state.tool_id), # Unique ID for the call
'type': 'tool_call',
}
]
# Create the AIMessage containing the tool calls
msg = AIMessage(
content='', # Content is often empty when using tool calls
tool_calls=tool_calls,
)
# Add it to history
self._add_message_with_tokens(msg)
# Add a corresponding empty ToolMessage (required by some models)
self.add_tool_message(content='') # Content depends on tool execution result
def add_tool_message(self, content: str) -> None:
"""Add tool message to history (often confirms tool call receipt/result)"""
# ToolMessage links back to the AIMessage's tool_call_id
msg = ToolMessage(content=content, tool_call_id=str(self.state.tool_id))
self.state.tool_id += 1 # Increment for next potential tool call
self._add_message_with_tokens(msg)
```
**4. Adding Messages and Counting Tokens (`_add_message_with_tokens`, `_count_tokens`)**
This is the core function called by others to add any message to the history, ensuring token counts are tracked.
```python
# --- File: agent/message_manager/service.py (Simplified _add_message_with_tokens) ---
# ... imports ...
from langchain_core.messages import BaseMessage
from browser_use.agent.message_manager.views import MessageMetadata, ManagedMessage
class MessageManager:
# ... (other methods) ...
def _add_message_with_tokens(self, message: BaseMessage, position: int | None = None) -> None:
"""Internal helper to add any message with its token count metadata."""
# 1. Optionally filter sensitive data (replace actual data with placeholders)
# if self.settings.sensitive_data:
# message = self._filter_sensitive_data(message) # Simplified
# 2. Count the tokens in the message
token_count = self._count_tokens(message)
# 3. Create metadata object
metadata = MessageMetadata(tokens=token_count)
# 4. Add the message and its metadata to the history list
# (self.state.history is a MessageHistory object)
self.state.history.add_message(message, metadata, position)
# Note: self.state.history.add_message also updates the total token count
# 5. Check if history exceeds token limit and truncate if needed
self.cut_messages() # Check and potentially trim history
def _count_tokens(self, message: BaseMessage) -> int:
"""Estimate tokens in a message."""
tokens = 0
if isinstance(message.content, list): # Multi-modal (text + image)
for item in message.content:
if isinstance(item, dict) and 'image_url' in item:
# Add fixed cost for images
tokens += self.settings.image_tokens
elif isinstance(item, dict) and 'text' in item:
# Estimate tokens based on text length
tokens += len(item['text']) // self.settings.estimated_characters_per_token
elif isinstance(message.content, str): # Text message
text = message.content
if hasattr(message, 'tool_calls'): # Add tokens for tool call structure
text += str(getattr(message, 'tool_calls', ''))
tokens += len(text) // self.settings.estimated_characters_per_token
return tokens
def cut_messages(self):
"""Trim messages if total tokens exceed the limit."""
# Calculate how many tokens we are over the limit
diff = self.state.history.current_tokens - self.settings.max_input_tokens
if diff <= 0:
return # We are within limits
logger.debug(f"Token limit exceeded by {diff}. Trimming history.")
# Strategy:
# 1. Try removing the image from the *last* (most recent) state message if present.
# (Code logic finds the last message, checks content list, removes image item, updates counts)
# ... (Simplified - see full code for image removal logic) ...
# 2. If still over limit after image removal (or no image was present),
# trim text content from the *end* of the last state message.
# Calculate proportion to remove, shorten string, create new message.
# ... (Simplified - see full code for text trimming logic) ...
# Ensure we don't get stuck if trimming isn't enough (raise error)
if self.state.history.current_tokens > self.settings.max_input_tokens:
raise ValueError("Max token limit reached even after trimming.")
```
This shows the basic mechanics of adding messages, calculating their approximate size, and applying strategies to keep the history within the LLM's context window limit.
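To get a feel for the numbers, here is a rough worked example of that estimation. The exact defaults (characters per token, fixed image cost) live in the manager's settings; the values below are illustrative assumptions:
```python
# Illustrative token estimate for one state message (values are assumptions).
estimated_characters_per_token = 3   # rough heuristic: ~3 characters per token
image_tokens = 800                   # fixed cost charged per screenshot

state_text_length = 4200             # characters of DOM text + URL + title + previous result
text_tokens = state_text_length // estimated_characters_per_token   # 4200 // 3 = 1400
total_tokens = text_tokens + image_tokens                           # 1400 + 800 = 2200

print(text_tokens, total_tokens)     # 1400 2200
```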
## Conclusion
The `MessageManager` is the Agent's conversation secretary. It meticulously records the dialogue between the Agent (reporting browser state and action results) and the LLM (providing analysis and action plans), starting from the initial `System Prompt` and task definition.
Crucially, it formats these messages correctly, tracks the conversation's size using token counts, and implements strategies to keep the history concise enough for the LLM's limited context window. Without the `MessageManager`, the Agent would quickly lose track of the conversation, and the LLM wouldn't have the necessary context to guide the browser effectively.
Many of the objects managed and passed around by the `MessageManager`, like `BrowserState`, `ActionResult`, and `AgentOutput`, are defined as specific data structures. In the next chapter, we'll take a closer look at these important **Data Structures (Views)**.
[Next Chapter: Data Structures (Views)](07_data_structures__views_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 7: Data Structures (Views) - The Project's Blueprints
In the [previous chapter](06_message_manager.md), we saw how the `MessageManager` acts like a secretary, carefully organizing the conversation between the [Agent](01_agent.md) and the LLM. It manages different pieces of information: the browser's current state, the LLM's plan, the results of actions, and more.
But how do all these different components (the Agent, the LLM parser, the [BrowserContext](03_browsercontext.md), the [Action Controller & Registry](05_action_controller___registry.md), and the [Message Manager](06_message_manager.md)) ensure they understand each other perfectly? If the LLM gives a plan in one format, and the Controller expects it in another, things will break!
Imagine trying to build furniture using instructions written in a language you don't fully understand, or trying to fill out a form where every section uses a different layout. It would be confusing and error-prone. We need a shared, consistent language and format.
This is where **Data Structures (Views)** come in. They act as the official blueprints or standardized forms for all the important information passed around within the `Browser Use` project.
## What Problem Do Data Structures Solve?
In a complex system like `Browser Use`, many components need to exchange data:
* The [BrowserContext](03_browsercontext.md) needs to package up the current state of the webpage.
* The [Agent](01_agent.md) needs to understand the LLM's multi-step plan.
* The [Action Controller & Registry](05_action_controller___registry.md) needs to know exactly which action to perform and with what specific parameters (like which element index to click).
* The Controller needs to report back the result of an action in a predictable way.
Without a standard format for each piece of data, you might encounter problems like:
* Misinterpreting data (e.g., is `5` an element index or a quantity?).
* Missing required information.
* Inconsistent naming (`element_id` vs `index` vs `element_number`).
* Difficulty debugging when data looks different every time.
Data Structures (Views) solve this by defining **strict, consistent blueprints** for the data. Everyone agrees to use these blueprints, ensuring smooth communication and preventing errors.
## Meet Pydantic: The Blueprint Maker and Checker
In `Browser Use`, these blueprints are primarily defined using a popular Python library called **Pydantic**.
Think of Pydantic like a combination of:
1. **A Blueprint Designer:** It provides an easy way to define the structure of your data using standard Python type hints (like `str` for text, `int` for whole numbers, `bool` for True/False, `list` for lists).
2. **A Quality Inspector:** When data comes in (e.g., from the LLM or from an action's result), Pydantic automatically checks if it matches the blueprint. Does it have all the required fields? Are the data types correct? If not, Pydantic raises an error, stopping bad data before it causes problems later.
These Pydantic models (our blueprints) are often stored in files named `views.py` within different component directories (like `agent/views.py`, `browser/views.py`), which is why we sometimes call them "Views".
## Key Blueprints in `Browser Use`
Let's look at some of the most important data structures used in the project. Don't worry about memorizing every detail; focus on *what kind* of information each blueprint holds and *who* uses it.
*(Note: These are simplified representations. The actual models might have more fields or features.)*
### 1. `BrowserState` (from `browser/views.py`)
* **Purpose:** Represents a complete snapshot of the browser's state at a specific moment.
* **Blueprint Contents (Simplified):**
* `url`: The current web address (string).
* `title`: The title of the webpage (string).
* `element_tree`: The simplified map of the webpage content (from [DOM Representation](04_dom_representation.md)).
* `selector_map`: The lookup map for interactive elements (from [DOM Representation](04_dom_representation.md)).
* `screenshot`: An optional image of the page (string, base64 encoded).
* `tabs`: Information about other open tabs in this context (list).
* **Who Uses It:**
* Created by: [BrowserContext](03_browsercontext.md) (`get_state()` method).
* Used by: [Agent](01_agent.md) (to see the current situation), [Message Manager](06_message_manager.md) (to store in history).
```python
# --- Conceptual Pydantic Model ---
# File: browser/views.py (Simplified Example)
from pydantic import BaseModel
from typing import Optional, List, Dict # For type hints
# Assume DOMElementNode and TabInfo are defined elsewhere
class BrowserState(BaseModel):
url: str
title: str
element_tree: Optional[object] # Simplified: Actual type is DOMElementNode
selector_map: Optional[Dict[int, object]] # Simplified: Actual type is SelectorMap
screenshot: Optional[str] = None # Optional field
tabs: List[object] = [] # Simplified: Actual type is TabInfo
# Pydantic ensures that when a BrowserState is created,
# 'url' and 'title' MUST be provided as strings.
```
### 2. `ActionModel` (from `controller/registry/views.py`)
* **Purpose:** Represents a *single* specific action the LLM wants to perform, including its parameters. This model is often created *dynamically* based on the actions available in the [Action Controller & Registry](05_action_controller___registry.md).
* **Blueprint Contents (Example for `click_element`):**
* `index`: The `highlight_index` of the element to click (integer).
* `xpath`: An optional hint about the element's location (string).
* **Blueprint Contents (Example for `input_text`):**
* `index`: The `highlight_index` of the input field (integer).
* `text`: The text to type (string).
* **Who Uses It:**
* Defined by/Registered in: [Action Controller & Registry](05_action_controller___registry.md).
* Created based on: LLM output (often part of `AgentOutput`).
* Used by: [Action Controller & Registry](05_action_controller___registry.md) (to validate parameters and know what function to call).
```python
# --- Conceptual Pydantic Models ---
# File: controller/views.py (Simplified Examples)
from pydantic import BaseModel
from typing import Optional
class ClickElementAction(BaseModel):
index: int
xpath: Optional[str] = None # Optional hint
class InputTextAction(BaseModel):
index: int
text: str
xpath: Optional[str] = None # Optional hint
# Base model that dynamically holds ONE of the above actions
class ActionModel(BaseModel):
# Pydantic allows models like this where only one field is expected
# e.g., ActionModel(click_element=ClickElementAction(index=5))
# or ActionModel(input_text=InputTextAction(index=12, text="hello"))
click_element: Optional[ClickElementAction] = None
input_text: Optional[InputTextAction] = None
# ... fields for other possible actions (scroll, done, etc.) ...
pass # More complex logic handles ensuring only one action is present
```
### 3. `AgentOutput` (from `agent/views.py`)
* **Purpose:** Represents the complete plan received from the LLM after it analyzes the current state. This is the structure the [System Prompt](02_system_prompt.md) tells the LLM to follow.
* **Blueprint Contents (Simplified):**
* `current_state`: The LLM's thoughts/reasoning (a nested structure, often called `AgentBrain`).
* `action`: A *list* of one or more `ActionModel` objects representing the steps the LLM wants to take.
* **Who Uses It:**
* Created by: The [Agent](01_agent.md) parses the LLM's raw JSON output into this structure.
* Used by: [Agent](01_agent.md) (to understand the plan), [Message Manager](06_message_manager.md) (to store the plan in history), [Action Controller & Registry](05_action_controller___registry.md) (reads the `action` list).
```python
# --- Conceptual Pydantic Model ---
# File: agent/views.py (Simplified Example)
from pydantic import BaseModel
from typing import List
# Assume ActionModel and AgentBrain are defined elsewhere
class AgentOutput(BaseModel):
current_state: object # Simplified: Actual type is AgentBrain
action: List[ActionModel] # A list of actions to execute
# Pydantic ensures the LLM output MUST have 'current_state' and 'action',
# and that 'action' MUST be a list containing valid ActionModel objects.
```
### 4. `ActionResult` (from `agent/views.py`)
* **Purpose:** Represents the outcome after the [Action Controller & Registry](05_action_controller___registry.md) attempts to execute a single action.
* **Blueprint Contents (Simplified):**
* `is_done`: Did this action signal the end of the overall task? (boolean, optional).
* `success`: If done, was the task successful overall? (boolean, optional).
* `extracted_content`: Any text result from the action (e.g., "Clicked button X") (string, optional).
* `error`: Any error message if the action failed (string, optional).
* `include_in_memory`: Should this result be explicitly shown to the LLM next time? (boolean).
* **Who Uses It:**
* Created by: Functions within the [Action Controller & Registry](05_action_controller___registry.md) (like `click_element`).
* Used by: [Agent](01_agent.md) (to check status, record results), [Message Manager](06_message_manager.md) (includes info in the next state message sent to LLM).
```python
# --- Conceptual Pydantic Model ---
# File: agent/views.py (Simplified Example)
from pydantic import BaseModel
from typing import Optional
class ActionResult(BaseModel):
is_done: Optional[bool] = False
success: Optional[bool] = None
extracted_content: Optional[str] = None
error: Optional[str] = None
include_in_memory: bool = False # Default to False
# Pydantic helps ensure results are consistently structured.
# For example, 'is_done' must be True or False if provided.
```
## The Power of Blueprints: Ensuring Consistency
Using Pydantic models for these data structures provides a huge benefit: **automatic validation**.
Imagine the LLM sends back a plan, but it forgets to include the `index` for a `click_element` action.
```json
// Bad LLM Response (Missing 'index')
{
"current_state": { ... },
"action": [
{
"click_element": {
"xpath": "//button[@id='submit']" // 'index' is missing!
}
}
]
}
```
When the [Agent](01_agent.md) tries to parse this JSON into the `AgentOutput` Pydantic model, Pydantic will immediately notice that the `index` field (which is required by the `ClickElementAction` blueprint) is missing. It will raise a `ValidationError`.
```python
# --- Conceptual Agent Code ---
import pydantic
# Assume AgentOutput is the Pydantic model defined earlier
# Assume 'llm_json_response' contains the bad JSON from above
try:
# Try to create the AgentOutput object from the LLM's response
llm_plan = AgentOutput.model_validate_json(llm_json_response)
# If validation succeeds, proceed...
print("LLM Plan Validated:", llm_plan)
except pydantic.ValidationError as e:
# Pydantic catches the error!
print(f"Validation Error: The LLM response didn't match the blueprint!")
print(e)
# The Agent can now handle this error gracefully,
# maybe asking the LLM to try again, instead of crashing later.
```
This automatic checking catches errors early, preventing the [Action Controller & Registry](05_action_controller___registry.md) from receiving incomplete instructions and making the whole system much more robust and easier to debug. It enforces the "contract" between different components.
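For comparison, a well-formed response parses cleanly and gives typed access to every field (again using the simplified conceptual models above; the field contents are illustrative):
```python
good_json = (
    '{"current_state": {"memory": "On the homepage"},'
    ' "action": [{"click_element": {"index": 5}}]}'
)
plan = AgentOutput.model_validate_json(good_json)
print(plan.action[0].click_element.index)  # -> 5
```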
## Under the Hood: Simple Classes
These data structures are simply Python classes, mostly inheriting from `pydantic.BaseModel` or defined using Python's built-in `dataclass`. They don't contain complex logic themselves; their main job is to define the *shape* and *type* of the data. You'll find their definitions scattered across the various `views.py` files within the project's component directories (like `agent/`, `browser/`, `controller/`, `dom/`).
Think of them as the official vocabulary and grammar rules that all the components agree to use when communicating.
## Conclusion
Data Structures (Views), primarily defined using Pydantic models, are the essential blueprints that ensure consistent and reliable communication within the `Browser Use` project. They act like standardized forms for `BrowserState`, `AgentOutput`, `ActionModel`, and `ActionResult`, making sure every component knows exactly what kind of data to expect and how to interpret it.
By defining these clear structures and leveraging Pydantic's automatic validation, `Browser Use` prevents misunderstandings between components, catches errors early, and makes the overall system more robust and maintainable. These standardized structures also make it easier to log and understand what's happening in the system.
Speaking of logging and understanding the system's behavior, how can we monitor the Agent's performance and gather data for improvement? In the next and final chapter, we'll explore the [Telemetry Service](08_telemetry_service.md).
[Next Chapter: Telemetry Service](08_telemetry_service.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 8: Telemetry Service - Helping Improve the Project (Optional)
In the [previous chapter](07_data_structures__views_.md), we explored the essential blueprints (`Data Structures (Views)`) that keep communication clear and consistent between all the parts of `Browser Use`. We saw how components like the [Agent](01_agent.md) and the [Action Controller & Registry](05_action_controller___registry.md) use these blueprints to exchange information reliably.
Now, let's think about the project itself. How do the developers who build `Browser Use` know if it's working well for users? How do they find out about common errors or which features are most popular, so they can make the tool better?
## What Problem Does the Telemetry Service Solve?
Imagine you released a new tool, like `Browser Use`. You want it to be helpful, but you don't know how people are actually using it. Are they running into unexpected errors? Are certain actions (like clicking vs. scrolling) causing problems? Is the performance okay? Without some feedback, it's hard to know where to focus improvements.
One way to get feedback is through bug reports or feature requests, but that only captures a small fraction of user experiences. We need a way to get a broader, anonymous picture of how the tool is performing "in the wild."
The **Telemetry Service** solves this by providing an *optional* and *anonymous* way to send basic usage statistics back to the project developers. Think of it like an anonymous suggestion box or an automatic crash report that doesn't include any personal information.
**Crucially:** This service is designed to protect user privacy. It doesn't collect website content, personal data, or anything sensitive. It only sends anonymous statistics about the tool's operation, and **it can be completely disabled**.
## Meet `ProductTelemetry`: The Anonymous Reporter
The component responsible for this is the `ProductTelemetry` service, found in `telemetry/service.py`.
* **Collects Usage Data:** It gathers anonymized information about events like:
* When an [Agent](01_agent.md) starts or finishes a run.
* Details about each step the Agent takes (like which actions were used).
* Errors encountered during agent runs.
* Which actions are defined in the [Action Controller & Registry](05_action_controller___registry.md).
* **Anonymizes Data:** It uses a randomly generated user ID (stored locally, not linked to you) to group events from the same installation without knowing *who* the user is.
* **Sends Data:** It sends this anonymous data to a secure third-party service (PostHog) used by the developers to analyze trends and identify potential issues.
* **Optional:** You can easily turn it off.
## How is Telemetry Used? (Mostly Automatic)
You usually don't interact with the `ProductTelemetry` service directly. Instead, other components like the `Agent` and `Controller` automatically call it at key moments.
**Example: Agent Run Start/End**
When you create an `Agent` and call `agent.run()`, the Agent automatically notifies the Telemetry Service.
```python
# --- File: agent/service.py (Simplified Agent run method) ---
class Agent:
# ... (other methods) ...
# Agent has a telemetry object initialized in __init__
# self.telemetry = ProductTelemetry()
async def run(self, max_steps: int = 100) -> AgentHistoryList:
# ---> Tell Telemetry: Agent run is starting <---
self._log_agent_run() # This includes a telemetry.capture() call
try:
# ... (main agent loop runs here) ...
for step_num in range(max_steps):
# ... (agent takes steps) ...
if self.state.history.is_done():
break
# ...
finally:
# ---> Tell Telemetry: Agent run is ending <---
self.telemetry.capture(
AgentEndTelemetryEvent( # Uses a specific data structure
agent_id=self.state.agent_id,
is_done=self.state.history.is_done(),
success=self.state.history.is_successful(),
# ... other anonymous stats ...
)
)
# ... (cleanup browser etc.) ...
return self.state.history
```
**Explanation:**
1. When the `Agent` is created, it gets an instance of `ProductTelemetry`.
2. Inside the `run` method, before the main loop starts, `_log_agent_run()` is called, which internally uses `self.telemetry.capture()` to send an `AgentRunTelemetryEvent`.
3. After the loop finishes (or an error occurs), the `finally` block ensures that another `self.telemetry.capture()` call is made, this time sending an `AgentEndTelemetryEvent` with summary statistics about the run.
Similarly, the `Agent.step` method captures an `AgentStepTelemetryEvent`, and the `Controller`'s `Registry` captures a `ControllerRegisteredFunctionsTelemetryEvent` when it's initialized. This happens automatically in the background if telemetry is enabled.
## How to Disable Telemetry
If you prefer not to send any anonymous usage data, you can easily disable the Telemetry Service.
Set the environment variable `ANONYMIZED_TELEMETRY` to `False`.
How you set environment variables depends on your operating system:
* **Linux/macOS (in terminal):**
```bash
export ANONYMIZED_TELEMETRY=False
# Now run your Python script in the same terminal
python your_agent_script.py
```
* **Windows (Command Prompt):**
```cmd
set ANONYMIZED_TELEMETRY=False
python your_agent_script.py
```
* **Windows (PowerShell):**
```powershell
$env:ANONYMIZED_TELEMETRY="False"
python your_agent_script.py
```
* **In Python Code (using `os` module, *before* importing `browser_use`):**
```python
import os
os.environ['ANONYMIZED_TELEMETRY'] = 'False'
# Now import and use browser_use
from browser_use import Agent # ... other imports
# ... rest of your script ...
```
If this environment variable is set to `False`, the `ProductTelemetry` service will be initialized in a disabled state, and no data will be collected or sent.
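As a quick sanity check, the sketch below sets the variable in-process and confirms that no PostHog client was created. Treat it as illustrative: it assumes the `browser_use.telemetry.service` import path and peeks at the internal `_posthog_client` attribute from the simplified code shown in the next section.
```python
# Illustrative check - relies on the internal _posthog_client attribute
# from the simplified service code shown below.
import os
os.environ['ANONYMIZED_TELEMETRY'] = 'False'  # must be set before the service is created

from browser_use.telemetry.service import ProductTelemetry

telemetry = ProductTelemetry()
print(telemetry._posthog_client is None)  # True -> telemetry is disabled
```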
## How It Works Under the Hood: Sending Anonymous Data
When telemetry is enabled and an event occurs (like `agent.run()` starting):
1. **Component Calls Capture:** The `Agent` (or `Controller`) calls `telemetry.capture(event_data)`.
2. **Telemetry Service Checks:** The `ProductTelemetry` service checks if it's enabled. If not, it does nothing.
3. **Get User ID:** It retrieves or generates a unique, anonymous user ID. This is typically a random UUID (like `a1b2c3d4-e5f6-7890-abcd-ef1234567890`) stored in a hidden file on your computer (`~/.cache/browser_use/telemetry_user_id`). This ID helps group events from the same installation without identifying the actual user.
4. **Send to PostHog:** It sends the event data (structured using Pydantic models like `AgentRunTelemetryEvent`) along with the anonymous user ID to PostHog, a third-party service specialized in product analytics.
5. **Analysis:** Developers can then look at aggregated, anonymous trends in PostHog (e.g., "What percentage of agent runs finish successfully?", "What are the most common errors?") to understand usage patterns and prioritize improvements.
Here's a simplified diagram:
```mermaid
sequenceDiagram
participant Agent
participant TelemetrySvc as ProductTelemetry
participant LocalFile as ~/.cache/.../user_id
participant PostHog
Agent->>TelemetrySvc: capture(AgentRunEvent)
Note over TelemetrySvc: Telemetry Enabled? Yes.
TelemetrySvc->>LocalFile: Read existing User ID (or create new)
LocalFile-->>TelemetrySvc: Anonymous User ID (UUID)
Note over TelemetrySvc: Package Event + User ID
TelemetrySvc->>PostHog: Send(EventData, UserID)
PostHog-->>TelemetrySvc: Acknowledgment (Optional)
```
Let's look at the simplified code involved.
**1. Initializing Telemetry (`telemetry/service.py`)**
The service checks the environment variable during initialization.
```python
# --- File: telemetry/service.py (Simplified __init__) ---
import os
import uuid
import logging
from pathlib import Path
from posthog import Posthog # The library for the external service
from browser_use.utils import singleton
logger = logging.getLogger(__name__)
@singleton # Ensures only one instance exists
class ProductTelemetry:
USER_ID_PATH = str(Path.home() / '.cache' / 'browser_use' / 'telemetry_user_id')
# ... (API key constants) ...
_curr_user_id = None
def __init__(self) -> None:
# Check the environment variable
telemetry_disabled = os.getenv('ANONYMIZED_TELEMETRY', 'true').lower() == 'false'
if telemetry_disabled:
self._posthog_client = None # Telemetry is off
logger.debug('Telemetry disabled by environment variable.')
else:
# Initialize the PostHog client if enabled
self._posthog_client = Posthog(...)
logger.info(
'Anonymized telemetry enabled.' # Inform the user
)
# Optionally silence PostHog's own logs
# ...
# ... (other methods) ...
```
**2. Capturing an Event (`telemetry/service.py`)**
The `capture` method sends the data if the client is active.
```python
# --- File: telemetry/service.py (Simplified capture) ---
# Assume BaseTelemetryEvent is the base Pydantic model for events
from browser_use.telemetry.views import BaseTelemetryEvent
class ProductTelemetry:
# ... (init) ...
def capture(self, event: BaseTelemetryEvent) -> None:
# Do nothing if telemetry is disabled
if self._posthog_client is None:
return
try:
# Get the anonymous user ID (lazy loaded)
anon_user_id = self.user_id
# Send the event name and its properties (as a dictionary)
self._posthog_client.capture(
distinct_id=anon_user_id,
event=event.name, # e.g., "agent_run"
properties=event.properties # Data from the event model
)
logger.debug(f'Telemetry event captured: {event.name}')
except Exception as e:
# Don't crash the main application if telemetry fails
logger.error(f'Failed to send telemetry event {event.name}: {e}')
@property
def user_id(self) -> str:
"""Gets or creates the anonymous user ID."""
if self._curr_user_id:
return self._curr_user_id
try:
# Check if the ID file exists
id_file = Path(self.USER_ID_PATH)
if not id_file.exists():
# Create directory and generate a new UUID if it doesn't exist
id_file.parent.mkdir(parents=True, exist_ok=True)
new_user_id = str(uuid.uuid4())
id_file.write_text(new_user_id)
self._curr_user_id = new_user_id
else:
# Read the existing UUID from the file
self._curr_user_id = id_file.read_text().strip()
except Exception:
# Fallback if file access fails
self._curr_user_id = 'UNKNOWN_USER_ID'
return self._curr_user_id
```
**3. Event Data Structures (`telemetry/views.py`)**
Like other components, Telemetry uses Pydantic models to define the structure of the data being sent.
```python
# --- File: telemetry/views.py (Simplified Event Example) ---
from dataclasses import dataclass, asdict
from typing import Any, Dict, Sequence
# Base class for all telemetry events (conceptual)
@dataclass
class BaseTelemetryEvent:
@property
def name(self) -> str:
raise NotImplementedError
@property
def properties(self) -> Dict[str, Any]:
# Helper to convert the dataclass fields to a dictionary
return {k: v for k, v in asdict(self).items() if k != 'name'}
# Specific event for when an agent run starts
@dataclass
class AgentRunTelemetryEvent(BaseTelemetryEvent):
agent_id: str # Anonymous ID for the specific agent instance
use_vision: bool # Was vision enabled?
task: str # The task description (anonymized/hashed in practice)
model_name: str # Name of the LLM used
chat_model_library: str # Library used for the LLM (e.g., ChatOpenAI)
version: str # browser-use version
source: str # How browser-use was installed (e.g., pip, git)
name: str = 'agent_run' # The event name sent to PostHog
# ... other event models like AgentEndTelemetryEvent, AgentStepTelemetryEvent ...
```
These structures ensure the data sent to PostHog is consistent and well-defined.
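Although the `Agent` and `Controller` normally do this for you, the sketch below shows how one of these event models could be captured manually. The field values are made-up placeholders; only the shape comes from the dataclass definition above.
```python
# Hypothetical manual capture - values are placeholders, not real run data.
from browser_use.telemetry.service import ProductTelemetry
from browser_use.telemetry.views import AgentRunTelemetryEvent

telemetry = ProductTelemetry()  # a no-op if ANONYMIZED_TELEMETRY=False

telemetry.capture(
    AgentRunTelemetryEvent(
        agent_id='example-agent-id',
        use_vision=True,
        task='find the latest release notes',
        model_name='gpt-4o',
        chat_model_library='ChatOpenAI',
        version='0.0.0',
        source='pip',
    )
)
```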
## Conclusion
The **Telemetry Service** (`ProductTelemetry`) provides an optional and privacy-conscious way for the `Browser Use` project to gather anonymous feedback about how the tool is being used. It automatically captures events like agent runs, steps, and errors, sending anonymized statistics to developers via PostHog.
This feedback loop is vital for identifying common issues, understanding feature usage, and ultimately improving the `Browser Use` library for everyone. Remember, you have full control and can easily disable this service by setting the `ANONYMIZED_TELEMETRY=False` environment variable.
This chapter concludes our tour of the core components within the `Browser Use` project. You've learned about the [Agent](01_agent.md), the guiding [System Prompt](02_system_prompt.md), the isolated [BrowserContext](03_browsercontext.md), the webpage map ([DOM Representation](04_dom_representation.md)), the action execution engine ([Action Controller & Registry](05_action_controller___registry.md)), the conversation tracker ([Message Manager](06_message_manager.md)), the data blueprints ([Data Structures (Views)](07_data_structures__views_.md)), and now the optional feedback mechanism ([Telemetry Service](08_telemetry_service.md)). We hope this gives you a solid foundation for understanding and using `Browser Use`!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,53 @@
# Tutorial: Browser Use
**Browser Use** is a project that allows an *AI agent* to control a web browser and perform tasks automatically.
Think of it like an AI assistant that can browse websites, fill forms, click buttons, and extract information based on your instructions. It uses a Large Language Model (LLM) as its "brain" to decide what actions to take on a webpage to complete a given *task*. The project manages the browser session, understands the page structure (DOM), and communicates back and forth with the LLM.
**Source Repository:** [https://github.com/browser-use/browser-use/tree/3076ba0e83f30b45971af58fe2aeff64472da812/browser_use](https://github.com/browser-use/browser-use/tree/3076ba0e83f30b45971af58fe2aeff64472da812/browser_use)
```mermaid
flowchart TD
A0["Agent"]
A1["BrowserContext"]
A2["Action Controller & Registry"]
A3["DOM Representation"]
A4["Message Manager"]
A5["System Prompt"]
A6["Data Structures (Views)"]
A7["Telemetry Service"]
A0 -- "Gets state from" --> A1
A0 -- "Uses to execute actions" --> A2
A0 -- "Uses for LLM communication" --> A4
A0 -- "Gets instructions from" --> A5
A0 -- "Uses/Produces data formats" --> A6
A0 -- "Logs events to" --> A7
A1 -- "Gets DOM structure via" --> A3
A1 -- "Provides BrowserState" --> A6
A2 -- "Executes actions on" --> A1
A2 -- "Defines/Uses ActionModel/Ac..." --> A6
A2 -- "Logs registered functions to" --> A7
A3 -- "Provides structure to" --> A1
A3 -- "Uses DOM structures" --> A6
A4 -- "Provides messages to" --> A0
A4 -- "Initializes with" --> A5
A4 -- "Formats data using" --> A6
A5 -- "Defines structure for Agent..." --> A6
A7 -- "Receives events from" --> A0
```
## Chapters
1. [Agent](01_agent.md)
2. [System Prompt](02_system_prompt.md)
3. [BrowserContext](03_browsercontext.md)
4. [DOM Representation](04_dom_representation.md)
5. [Action Controller & Registry](05_action_controller___registry.md)
6. [Message Manager](06_message_manager.md)
7. [Data Structures (Views)](07_data_structures__views_.md)
8. [Telemetry Service](08_telemetry_service.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,293 @@
# Chapter 1: The Celery App - Your Task Headquarters
Welcome to the world of Celery! If you've ever thought, "I wish this slow part of my web request could run somewhere else later," or "How can I process this huge amount of data without freezing my main application?", then Celery is here to help.
Celery allows you to run code (we call these "tasks") separately from your main application, either in the background on the same machine or distributed across many different machines.
But how do you tell Celery *what* tasks to run and *how* to run them? That's where the **Celery App** comes in.
## What Problem Does the Celery App Solve?
Imagine you're building a website. When a user uploads a profile picture, you need to resize it into different formats (thumbnail, medium, large). Doing this immediately when the user clicks "upload" can make the request slow and keep the user waiting.
Ideally, you want to:
1. Quickly save the original image.
2. Tell the user "Okay, got it!"
3. *Later*, in the background, resize the image.
Celery helps with step 3. But you need a central place to define the "resize image" task and configure *how* it should be run (e.g., where to send the request to resize, where to store the result). The **Celery App** is that central place.
Think of it like the main application object in web frameworks like Flask or Django. It's the starting point, the brain, the headquarters for everything Celery-related in your project.
## Creating Your First Celery App
Getting started is simple. You just need to create an instance of the `Celery` class.
Let's create a file named `celery_app.py`:
```python
# celery_app.py
from celery import Celery
# Create a Celery app instance
# 'tasks' is just a name for this app instance, often the module name.
# 'broker' tells Celery where to send task messages.
# We'll use Redis here for simplicity (you need Redis running).
app = Celery('tasks',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/0') # Added backend for results
print(f"Celery app created: {app}")
```
**Explanation:**
* `from celery import Celery`: We import the main `Celery` class.
* `app = Celery(...)`: We create an instance.
* `'tasks'`: This is the *name* of our Celery application. It's often good practice to use the name of the module where your app is defined. Celery uses this name to automatically name tasks if you don't provide one explicitly.
* `broker='redis://localhost:6379/0'`: This is crucial! It tells Celery where to send the task messages. A "broker" is like a post office for tasks. We're using Redis here, but Celery supports others like RabbitMQ. We'll learn more about the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) in Chapter 4. (Note: AMQP is the protocol often used with brokers like RabbitMQ, but the concept applies even when using Redis).
* `backend='redis://localhost:6379/0'`: This tells Celery where to store the results of your tasks. If your task returns a value (like `2+2` returns `4`), Celery can store this `4` in the backend. We'll cover the [Result Backend](06_result_backend.md) in Chapter 6.
That's it! You now have a `Celery` application instance named `app`. This `app` object is your main tool for working with Celery.
## Defining a Task with the App
Now that we have our `app`, how do we define a task? We use the `@app.task` decorator.
Let's modify `celery_app.py`:
```python
# celery_app.py
from celery import Celery
import time
# Create a Celery app instance
app = Celery('tasks',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/0')
# Define a simple task using the app's decorator
@app.task
def add(x, y):
print(f"Task 'add' started with args: ({x}, {y})")
time.sleep(2) # Simulate some work
result = x + y
print(f"Task 'add' finished with result: {result}")
return result
print(f"Task 'add' is registered: {app.tasks.get('celery_app.add')}")
```
**Explanation:**
* `@app.task`: This is the magic decorator. It takes our regular Python function `add(x, y)` and registers it as a Celery task within our `app`.
* Now, `app` knows about a task called `celery_app.add` (Celery automatically generates the name based on the module `celery_app` and function `add`).
* We'll learn all about [Task](03_task.md)s in Chapter 3.
## Sending a Task (Conceptual)
How do we actually *run* this `add` task in the background? We use methods like `.delay()` or `.apply_async()` on the task object itself.
```python
# In a separate Python script or interpreter, after importing 'add' from celery_app.py
from celery_app import add
# Send the task to the broker configured in our 'app'
result_promise = add.delay(4, 5)
print(f"Task sent! It will run in the background.")
print(f"We got back a promise object: {result_promise}")
# We can later check the result using result_promise.get()
# (Requires a result backend and a worker running the task)
```
**Explanation:**
* `add.delay(4, 5)`: This doesn't run the `add` function *right now*. Instead, it:
1. Packages the task name (`celery_app.add`) and its arguments (`4`, `5`) into a message.
2. Sends this message to the **broker** (Redis, in our case) that was configured in our `Celery` app instance (`app`).
* It returns an `AsyncResult` object (our `result_promise`), which is like an IOU or a placeholder for the actual result. We can use this later to check if the task finished and what its result was (if we configured a [Result Backend](06_result_backend.md)).
A separate program, called a Celery [Worker](05_worker.md), needs to be running. This worker watches the broker for new task messages, executes the corresponding task function, and (optionally) stores the result in the backend. We'll learn how to run a worker in Chapter 5.
The key takeaway here is that the **Celery App** holds the configuration needed (`broker` and `backend` URLs) for `add.delay()` to know *where* to send the task message and potentially where the result will be stored.
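If a worker is running and a result backend is configured (as ours is), the `AsyncResult` promise can be used later to check on the task. A small sketch, continuing the script above:
```python
# Assumes a worker is running and the Redis result backend is reachable.
print(result_promise.id)               # the task's unique ID
print(result_promise.ready())          # False until the worker finishes
print(result_promise.get(timeout=10))  # blocks, then prints 9 for add(4, 5)
```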
## How It Works Internally (High-Level)
Let's visualize the process of creating the app and sending a task:
1. **Initialization (`Celery(...)`)**: When you create `app = Celery(...)`, the app instance stores the `broker` and `backend` URLs and sets up internal components like the task registry.
2. **Task Definition (`@app.task`)**: The decorator tells the `app` instance: "Hey, remember this function `add`? It's a task." The app stores this information in its internal task registry (`app.tasks`).
3. **Sending a Task (`add.delay(4, 5)`)**:
* `add.delay()` looks up the `app` it belongs to.
* It asks the `app` for the `broker` URL.
* It creates a message containing the task name (`celery_app.add`), arguments (`4, 5`), and other details.
* It uses the `broker` URL to connect to the broker (Redis) and sends the message.
```mermaid
sequenceDiagram
participant Client as Your Python Code
participant CeleryApp as app = Celery(...)
participant AddTask as @app.task add()
participant Broker as Redis/RabbitMQ
Client->>CeleryApp: Create instance (broker='redis://...')
Client->>AddTask: Define add() function with @app.task
Note over AddTask,CeleryApp: Decorator registers 'add' with 'app'
Client->>AddTask: Call add.delay(4, 5)
AddTask->>CeleryApp: Get broker configuration
CeleryApp-->>AddTask: 'redis://...'
AddTask->>Broker: Send task message ('add', 4, 5)
Broker-->>AddTask: Acknowledgment (message sent)
AddTask-->>Client: Return AsyncResult (promise)
```
This diagram shows how the `Celery App` acts as the central coordinator, holding configuration and enabling the task (`add`) to send its execution request to the Broker.
## Code Dive: Inside the `Celery` Class
Let's peek at some relevant code snippets (simplified for clarity).
**Initialization (`app/base.py`)**
When you call `Celery(...)`, the `__init__` method runs:
```python
# Simplified from celery/app/base.py
from .registry import TaskRegistry
from .utils import Settings
class Celery:
def __init__(self, main=None, broker=None, backend=None,
include=None, config_source=None, task_cls=None,
autofinalize=True, **kwargs):
self.main = main # Store the app name ('tasks' in our example)
self._tasks = TaskRegistry({}) # Create an empty dictionary for tasks
# Store broker/backend/include settings temporarily
self._preconf = {}
self.__autoset('broker_url', broker)
self.__autoset('result_backend', backend)
self.__autoset('include', include)
# ... other kwargs ...
# Configuration object - initially pending, loaded later
self._conf = Settings(...)
# ... other setup ...
_register_app(self) # Register this app instance globally (sometimes useful)
# Helper to store initial settings before full configuration load
def __autoset(self, key, value):
if value is not None:
self._preconf[key] = value
```
This shows how the `Celery` object is initialized, storing the name, setting up a task registry, and holding onto initial configuration like the `broker` URL. The full configuration is often loaded later (see [Configuration](02_configuration.md)).
**Task Decorator (`app/base.py`)**
The `@app.task` decorator ultimately calls `_task_from_fun`:
```python
# Simplified from celery/app/base.py
def task(self, *args, **opts):
# ... logic to handle decorator arguments ...
def _create_task_cls(fun):
# If app isn't finalized, might return a proxy object first
# Eventually calls _task_from_fun to create/register the task
ret = self._task_from_fun(fun, **opts)
return ret
return _create_task_cls
def _task_from_fun(self, fun, name=None, base=None, bind=False, **options):
# Generate task name if not provided (e.g., 'celery_app.add')
name = name or self.gen_task_name(fun.__name__, fun.__module__)
base = base or self.Task # Default base Task class
# Check if task already registered
if name not in self._tasks:
# Create a Task class dynamically based on the function
task = type(fun.__name__, (base,), {
'app': self, # Link task back to this app instance!
'name': name,
'run': staticmethod(fun), # The actual function to run
# ... other attributes and options ...
})() # Instantiate the new task class
self._tasks[task.name] = task # Add to app's task registry
task.bind(self) # Perform any binding steps
else:
task = self._tasks[name] # Task already exists
return task
```
This shows how the decorator uses the `app` instance (`self`) to generate a name, create a `Task` object wrapping your function, associate the task with the app (`'app': self`), and store it in the `app._tasks` registry.
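A quick way to see the result of that registration from your own code (a small check, assuming the `celery_app.py` from earlier in this chapter):
```python
from celery_app import add

print(add.name)                           # 'celery_app.add' - the auto-generated name
print(add.app is not None)                # the task is bound to our Celery app
print('celery_app.add' in add.app.tasks)  # True - it's in the app's registry
```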
**Sending Tasks (`app/base.py`)**
Calling `.delay()` or `.apply_async()` eventually uses `app.send_task`:
```python
# Simplified from celery/app/base.py
def send_task(self, name, args=None, kwargs=None, task_id=None,
producer=None, connection=None, router=None, **options):
# ... lots of logic to prepare options, task_id, routing ...
# Get the routing info (exchange, routing_key, queue)
# Uses app.conf for defaults if not specified
options = self.amqp.router.route(options, name, args, kwargs)
# Create the message body
message = self.amqp.create_task_message(
task_id or uuid(), # Generate task ID if needed
name, args, kwargs, # Task details
# ... other arguments like countdown, eta, expires ...
)
# Get a producer (handles connection/channel to broker)
# Uses the app's producer pool (app.producer_pool)
with self.producer_or_acquire(producer) as P:
# Tell the backend we're about to send (if tracking results)
if not options.get('ignore_result', False):
self.backend.on_task_call(P, task_id)
# Actually send the message via the producer
self.amqp.send_task_message(P, name, message, **options)
# Create the AsyncResult object to return to the caller
result = self.AsyncResult(task_id)
# ... set result properties ...
return result
```
This highlights how `send_task` relies on the `app` (via `self`) to:
* Access configuration (`self.conf`).
* Use the AMQP utilities (`self.amqp`) for routing and message creation.
* Access the result backend (`self.backend`).
* Get a connection/producer from the pool (`self.producer_or_acquire`).
* Create the `AsyncResult` using the app's result class (`self.AsyncResult`).
## Conclusion
You've learned that the `Celery App` is the essential starting point for any Celery project.
* It acts as the central **headquarters** or **brain**.
* You create it using `app = Celery(...)`, providing at least a name and a `broker` URL.
* It holds **configuration** (like broker/backend URLs).
* It **registers tasks** defined using the `@app.task` decorator.
* It enables tasks to be **sent** to the broker using methods like `.delay()`.
The app ties everything together. But how do you manage all the different settings Celery offers, beyond just the `broker` and `backend`?
In the next chapter, we'll dive deeper into how to configure your Celery app effectively.
**Next:** [Chapter 2: Configuration](02_configuration.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,252 @@
# Chapter 2: Configuration - Telling Celery How to Work
In [Chapter 1: The Celery App](01_celery_app.md), we created our first `Celery` app instance. We gave it a name and told it where our message broker and result backend were located using the `broker` and `backend` arguments:
```python
# From Chapter 1
from celery import Celery
app = Celery('tasks',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/0')
```
This worked, but what if we want to change settings later, or manage many different settings? Passing everything directly when creating the `app` can become messy.
## What Problem Does Configuration Solve?
Think of Celery as a busy workshop with different stations (workers, schedulers) and tools (message brokers, result storage). **Configuration** is the central instruction manual or settings panel for this entire workshop.
It tells Celery things like:
* **Where is the message broker?** (The post office for tasks)
* **Where should results be stored?** (The filing cabinet for completed work)
* **How should tasks be handled?** (e.g., What format should the messages use? Are there any speed limits for certain tasks?)
* **How should the workers behave?** (e.g., How many tasks can they work on at once?)
* **How should scheduled tasks run?** (e.g., What timezone should be used?)
Without configuration, Celery wouldn't know how to connect to your broker, where to put results, or how to manage the workflow. Configuration allows you to customize Celery to fit your specific needs.
## Key Configuration Concepts
While Celery has many settings, here are some fundamental ones you'll encounter often:
1. **`broker_url`**: The address of your message broker (like Redis or RabbitMQ). This is essential for sending and receiving task messages. We'll learn more about brokers in [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md).
2. **`result_backend`**: The address of your result store. This is needed if you want to keep track of task status or retrieve return values. We cover this in [Chapter 6: Result Backend](06_result_backend.md).
3. **`include`**: A list of module names that the Celery worker should import when it starts. This is often where your task definitions live (like the `add` task from Chapter 1).
4. **`task_serializer`**: Defines the format used to package task messages before sending them to the broker (e.g., 'json', 'pickle'). 'json' is a safe and common default.
5. **`timezone`**: Sets the timezone Celery uses, which is important for scheduled tasks managed by [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md).
## How to Configure Your Celery App
Celery is flexible and offers several ways to set its configuration.
**Method 1: Directly on the App Object (After Creation)**
You can update the configuration *after* creating the `Celery` app instance using the `app.conf.update()` method. This is handy for simple adjustments or quick tests.
```python
# celery_app.py
from celery import Celery
# Create the app (maybe with initial settings)
app = Celery('tasks', broker='redis://localhost:6379/0')
# Update configuration afterwards
app.conf.update(
result_backend='redis://localhost:6379/1', # Use database 1 for results
task_serializer='json',
result_serializer='json',
accept_content=['json'], # Only accept json formatted tasks
timezone='Europe/Oslo',
enable_utc=True, # Use UTC timezone internally
# Add task modules to import when worker starts
include=['my_tasks'] # Assumes you have a file my_tasks.py with tasks
)
print(f"Broker URL set to: {app.conf.broker_url}")
print(f"Result backend set to: {app.conf.result_backend}")
print(f"Timezone set to: {app.conf.timezone}")
```
**Explanation:**
* We create the `app` like before, potentially setting some initial config like the `broker`.
* `app.conf.update(...)`: We pass a Python dictionary to this method. The keys are Celery setting names (like `result_backend`, `timezone`), and the values are what we want to set them to.
* `app.conf` is the central configuration object attached to your `app` instance.
**Method 2: Dedicated Configuration Module (Recommended)**
For most projects, especially larger ones, it's cleaner to keep your Celery settings in a separate Python file (e.g., `celeryconfig.py`).
1. **Create `celeryconfig.py`:**
```python
# celeryconfig.py
# Broker settings
broker_url = 'redis://localhost:6379/0'
# Result backend settings
result_backend = 'redis://localhost:6379/1'
# Task settings
task_serializer = 'json'
result_serializer = 'json'
accept_content = ['json']
# Timezone settings
timezone = 'America/New_York'
enable_utc = True # Recommended
# List of modules to import when the Celery worker starts.
imports = ('proj.tasks',) # Example: Assuming tasks are in proj/tasks.py
```
**Explanation:**
* This is just a standard Python file.
* We define variables whose names match the Celery configuration settings (e.g., `broker_url`, `timezone`). Celery expects these specific names.
2. **Load the configuration in your app file (`celery_app.py`):**
```python
# celery_app.py
from celery import Celery
# Create the app instance (no need to pass broker/backend here now)
app = Celery('tasks')
# Load configuration from the 'celeryconfig' module
# Assumes celeryconfig.py is in the same directory or Python path
app.config_from_object('celeryconfig')
print(f"Loaded Broker URL from config file: {app.conf.broker_url}")
print(f"Loaded Timezone from config file: {app.conf.timezone}")
# You might still define tasks in this file or in the modules listed
# in celeryconfig.imports
@app.task
def multiply(x, y):
return x * y
```
**Explanation:**
* `app = Celery('tasks')`: We create the app instance, but we don't need to specify the broker or backend here because they will be loaded from the file.
* `app.config_from_object('celeryconfig')`: This is the key line. It tells Celery to:
* Find a module named `celeryconfig`.
* Look at the configuration variables defined in that module (Celery accepts the lowercase names shown above, such as `broker_url`, as well as the older uppercase style).
* Use those variables to configure the `app`.
This approach keeps your settings organized and separate from your application logic.
**Method 3: Environment Variables**
Celery settings can also be controlled via environment variables. This is very useful for deployments (e.g., using Docker) where you might want to change the broker address without changing code.
Environment variable names typically follow the pattern `CELERY_<SETTING_NAME_IN_UPPERCASE>`.
For example, you could set the broker URL in your terminal before running your app or worker:
```bash
# In your terminal (Linux/macOS)
export CELERY_BROKER_URL='amqp://guest:guest@localhost:5672//'
export CELERY_RESULT_BACKEND='redis://localhost:6379/2'
# Now run your Python script or Celery worker
python your_script.py
# or
# celery -A your_app_module worker --loglevel=info
```
Celery automatically picks up these environment variables. They often take precedence over settings defined in a configuration file or directly on the app, making them ideal for overriding settings in different environments (development, staging, production).
*Note: The exact precedence order can sometimes depend on how and when configuration is loaded, but environment variables are generally a high-priority source.*
## How It Works Internally (Simplified View)
1. **Loading:** When you create a `Celery` app or call `app.config_from_object()`, Celery reads the settings from the specified source (arguments, object/module, environment variables).
2. **Storing:** These settings are stored in a dictionary-like object accessible via `app.conf`. Celery uses a default set of values initially, which are then updated or overridden by your configuration.
3. **Accessing:** When a Celery component needs a setting (e.g., the worker needs the `broker_url` to connect, or a task needs the `task_serializer`), it simply looks up the required key in the `app.conf` object.
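For example (assuming the `app` configured earlier in this chapter):
```python
# app.conf behaves like a dictionary with attribute access and built-in defaults.
print(app.conf.broker_url)         # attribute-style lookup
print(app.conf['result_backend'])  # dict-style lookup works too
print(app.conf.task_serializer)    # falls back to the built-in default ('json') if never set
```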
```mermaid
sequenceDiagram
participant ClientCode as Your App Setup (e.g., celery_app.py)
participant CeleryApp as app = Celery(...)
participant ConfigSource as celeryconfig.py / Env Vars
participant Worker as Celery Worker Process
participant Broker as Message Broker (e.g., Redis)
ClientCode->>CeleryApp: Create instance
ClientCode->>CeleryApp: app.config_from_object('celeryconfig')
CeleryApp->>ConfigSource: Read settings (broker_url, etc.)
ConfigSource-->>CeleryApp: Return settings values
Note over CeleryApp: Stores settings in app.conf
Worker->>CeleryApp: Start worker for 'app'
Worker->>CeleryApp: Access app.conf.broker_url
CeleryApp-->>Worker: Return 'redis://localhost:6379/0'
Worker->>Broker: Connect using 'redis://localhost:6379/0'
```
This diagram shows the app loading configuration first, and then the worker using that stored configuration (`app.conf`) to perform its duties, like connecting to the broker.
## Code Dive: Where Configuration Lives
* **`app.conf`:** This is the primary interface you interact with. It's an instance of a special dictionary-like class (`celery.app.utils.Settings`) that handles loading defaults, converting keys (Celery has changed setting names over time), and providing convenient access. You saw this in the direct update example: `app.conf.update(...)`.
* **Loading Logic (`config_from_object`)**: Methods like `app.config_from_object` typically delegate to the app's "loader" (`app.loader`). The loader (e.g., `celery.loaders.base.BaseLoader` or `celery.loaders.app.AppLoader`) handles the actual importing of the configuration module and extracting the settings. See `loaders/base.py` for the `config_from_object` method definition.
* **Default Settings**: Celery has a built-in set of default values for all its settings. These are defined in `celery.app.defaults`. Your configuration overrides these defaults. See `app/defaults.py`.
* **Accessing Settings**: Throughout the Celery codebase, different components access the configuration via `app.conf`. For instance, when sending a task (`app/base.py:send_task`), the code looks up `app.conf.broker_url` (or related settings) to know where and how to send the message.
```python
# Simplified concept from loaders/base.py
class BaseLoader:
# ...
def config_from_object(self, obj, silent=False):
if isinstance(obj, str):
# Import the module (e.g., 'celeryconfig')
obj = self._smart_import(obj, imp=self.import_from_cwd)
# ... error handling ...
# Store the configuration (simplified - actual process merges)
self._conf = force_mapping(obj) # Treat obj like a dictionary
# ...
return True
# Simplified concept from app/base.py (where settings are used)
class Celery:
# ...
def send_task(self, name, args=None, kwargs=None, **options):
# ... other setup ...
# Access configuration to know where the broker is
broker_connection_url = self.conf.broker_url # Reads from app.conf
# Use the broker URL to get a connection/producer
with self.producer_or_acquire(producer) as P:
# ... create message ...
# Send message using the connection derived from broker_url
self.amqp.send_task_message(P, name, message, **options)
# ... return result object ...
```
This illustrates the core idea: load configuration into `app.conf`, then components read from `app.conf` when they need instructions.
## Conclusion
Configuration is the backbone of Celery's flexibility. You've learned:
* **Why it's needed:** To tell Celery *how* to operate (broker, backend, tasks settings).
* **What can be configured:** Broker/backend URLs, serializers, timezones, task imports, and much more.
* **How to configure:**
* Directly via `app.conf.update()`.
* Using a dedicated module (`celeryconfig.py`) with `app.config_from_object()`. (Recommended)
* Using environment variables (great for deployment).
* **How it works:** Settings are loaded into `app.conf` and accessed by Celery components as needed.
With your Celery app configured, you're ready to define the actual work you want Celery to do. That's where Tasks come in!
**Next:** [Chapter 3: Task](03_task.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

245
output/Celery/03_task.md Normal file
View File

@@ -0,0 +1,245 @@
# Chapter 3: Task - The Job Description
In [Chapter 1: The Celery App](01_celery_app.md), we set up our Celery headquarters, and in [Chapter 2: Configuration](02_configuration.md), we learned how to give it instructions. Now, we need to define the *actual work* we want Celery to do. This is where **Tasks** come in.
## What Problem Does a Task Solve?
Imagine you have a specific job that needs doing, like "Resize this image to thumbnail size" or "Send a welcome email to this new user." In Celery, each of these specific jobs is represented by a **Task**.
A Task is like a **job description** or a **recipe**. It contains the exact steps (the code) needed to complete a specific piece of work. You write this recipe once as a Python function, and then you can tell Celery to follow that recipe whenever you need that job done, potentially many times with different inputs (like resizing different images or sending emails to different users).
The key benefit is that you don't run the recipe immediately yourself. You hand the recipe (the Task) and the ingredients (the arguments, like the image file or the user's email) over to Celery. Celery then finds an available helper (a [Worker](05_worker.md)) who knows how to follow that specific recipe and lets them do the work in the background. This keeps your main application free to do other things.
## Defining Your First Task
Defining a task in Celery is surprisingly simple. You just take a regular Python function and "decorate" it using `@app.task`. Remember our `app` object from [Chapter 1](01_celery_app.md)? We use its `task` decorator.
Let's create a file, perhaps named `tasks.py`, to hold our task definitions:
```python
# tasks.py
import time
from celery_app import app # Import the app instance we created
@app.task
def add(x, y):
"""A simple task that adds two numbers."""
print(f"Task 'add' starting with ({x}, {y})")
# Simulate some work taking time
time.sleep(5)
result = x + y
print(f"Task 'add' finished with result: {result}")
return result
@app.task
def send_welcome_email(user_id):
"""A task simulating sending a welcome email."""
print(f"Task 'send_welcome_email' starting for user {user_id}")
# Simulate email sending process
time.sleep(3)
print(f"Welcome email supposedly sent to user {user_id}")
return f"Email sent to {user_id}"
# You can have many tasks in one file!
```
**Explanation:**
1. **`from celery_app import app`**: We import the `Celery` app instance we configured earlier. This instance holds the knowledge about our broker and backend.
2. **`@app.task`**: This is the magic decorator! When Celery sees this above a function (`add` or `send_welcome_email`), it says, "Ah! This isn't just a regular function; it's a job description that my workers need to know about."
3. **The Function (`add`, `send_welcome_email`)**: This is the actual Python code that performs the work. It's the core of the task: the steps in the recipe. It can take arguments (like `x`, `y`, or `user_id`) and can return a value.
4. **Registration**: The `@app.task` decorator automatically *registers* this function with our Celery `app`. Now, `app` knows about a task named `tasks.add` and another named `tasks.send_welcome_email` (Celery creates the name from `module_name.function_name`). Workers connected to this `app` will be able to find and execute this code when requested.
*Self-Host Note:* If you are running this code, make sure you have a `celery_app.py` file containing your Celery app instance as shown in previous chapters, and that the `tasks.py` file can import `app` from it.
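As a quick sanity check (assuming the `celery_app.py` and `tasks.py` files above are importable), you can list which tasks the app now knows about:
```python
from celery_app import app
import tasks  # importing the module runs the @app.task decorators

# Built-in tasks are prefixed with 'celery.'; ours appear under the module name.
print(sorted(name for name in app.tasks if not name.startswith('celery.')))
# Expected to include: 'tasks.add' and 'tasks.send_welcome_email'
```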
## Sending a Task for Execution
Okay, we've written our recipes (`add` and `send_welcome_email`). How do we tell Celery, "Please run the `add` recipe with the numbers 5 and 7"?
We **don't call the function directly** like `add(5, 7)`. If we did that, it would just run immediately in our current program, which defeats the purpose of using Celery!
Instead, we use special methods on the task object itself, most commonly `.delay()` or `.apply_async()`.
Let's try this in a separate Python script or an interactive Python session:
```python
# run_tasks.py
from tasks import add, send_welcome_email
print("Let's send some tasks!")
# --- Using .delay() ---
# Tell Celery to run add(5, 7) in the background
result_promise_add = add.delay(5, 7)
print(f"Sent task add(5, 7). Task ID: {result_promise_add.id}")
# Tell Celery to run send_welcome_email(123) in the background
result_promise_email = send_welcome_email.delay(123)
print(f"Sent task send_welcome_email(123). Task ID: {result_promise_email.id}")
# --- Using .apply_async() ---
# Does the same thing as .delay() but allows more options
result_promise_add_later = add.apply_async(args=(10, 20), countdown=10) # Run after 10s
print(f"Sent task add(10, 20) to run in 10s. Task ID: {result_promise_add_later.id}")
print("Tasks have been sent to the broker!")
print("A Celery worker needs to be running to pick them up.")
```
**Explanation:**
1. **`from tasks import add, send_welcome_email`**: We import our *task functions*. Because they were decorated with `@app.task`, they are now special Celery Task objects.
2. **`add.delay(5, 7)`**: This is the simplest way to send a task.
* It *doesn't* run `add(5, 7)` right now.
* It takes the arguments `(5, 7)`.
* It packages them up into a **message** along with the task's name (`tasks.add`).
* It sends this message to the **message broker** (like Redis or RabbitMQ) that we configured in our `celery_app.py`. Think of it like dropping a request slip into a mailbox.
3. **`send_welcome_email.delay(123)`**: Same idea, but for our email task. A message with `tasks.send_welcome_email` and the argument `123` is sent to the broker.
4. **`add.apply_async(args=(10, 20), countdown=10)`**: This is a more powerful way to send tasks (a short sketch after this list shows a couple of its options in use).
* It does the same fundamental thing: sends a message to the broker.
* It allows for more options, like `args` (positional arguments as a tuple), `kwargs` (keyword arguments as a dict), `countdown` (delay execution by seconds), `eta` (run at a specific future time), and many others.
* `.delay(*args, **kwargs)` is just a convenient shortcut for `.apply_async(args=args, kwargs=kwargs)`.
5. **`result_promise_... = ...`**: Both `.delay()` and `apply_async()` return an `AsyncResult` object immediately. This is *not* the actual result of the task (like `12` for `add(5, 7)`). It's more like a receipt or a tracking number (notice the `.id` attribute). You can use this object later to check if the task finished and what its result was, but only if you've set up a [Result Backend](06_result_backend.md) (Chapter 6).
6. **The Worker**: Sending the task only puts the message on the queue. A separate process, the Celery [Worker](05_worker.md) (Chapter 5), needs to be running. The worker constantly watches the queue, picks up messages, finds the corresponding task function (using the name like `tasks.add`), and executes it with the provided arguments.
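Here is a short sketch showing a couple of those `apply_async` options in use (the values are arbitrary examples, not part of the project):
```python
# run_more_tasks.py - arbitrary example values
from datetime import datetime, timedelta, timezone
from tasks import add, send_welcome_email

# Keyword arguments instead of positional ones
send_welcome_email.apply_async(kwargs={'user_id': 456})

# Run at an absolute time (eta) rather than a relative countdown
run_at = datetime.now(timezone.utc) + timedelta(minutes=5)
add.apply_async(args=(1, 2), eta=run_at)
```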
## How It Works Internally (Simplified)
Let's trace the journey of defining and sending our `add` task:
1. **Definition (`@app.task` in `tasks.py`)**:
* Python defines the `add` function.
* The `@app.task` decorator sees this function.
* It tells the `Celery` instance (`app`) about this function, registering it under the name `tasks.add` in an internal dictionary (`app.tasks`). The `app` instance knows the broker/backend settings.
2. **Sending (`add.delay(5, 7)` in `run_tasks.py`)**:
* You call `.delay()` on the `add` task object.
* `.delay()` (or `.apply_async()`) internally uses the `app` the task is bound to.
* It asks the `app` for the configured broker URL.
* It creates a message containing:
* Task Name: `tasks.add`
* Arguments: `(5, 7)`
* Other options (like a unique Task ID).
* It connects to the **Broker** (e.g., Redis) using the broker URL.
* It sends the message to a specific queue (usually named 'celery' by default) on the broker.
* It returns an `AsyncResult` object referencing the Task ID.
3. **Waiting**: The message sits in the queue on the broker, waiting.
4. **Execution (by a [Worker](05_worker.md))**:
* A separate Celery Worker process is running, connected to the same broker and `app`.
* The Worker fetches the message from the queue.
* It reads the task name: `tasks.add`.
* It looks up `tasks.add` in its copy of the `app.tasks` registry to find the actual `add` function code.
* It calls the `add` function with the arguments from the message: `add(5, 7)`.
* The function runs (prints logs, sleeps, calculates `12`).
* If a [Result Backend](06_result_backend.md) is configured, the Worker takes the return value (`12`) and stores it in the backend, associated with the Task ID.
* The Worker acknowledges the message to the broker, removing it from the queue.
```mermaid
sequenceDiagram
participant Client as Your Code (run_tasks.py)
participant TaskDef as @app.task def add()
participant App as Celery App Instance
participant Broker as Message Broker (e.g., Redis)
participant Worker as Celery Worker (separate process)
Note over TaskDef, App: 1. @app.task registers 'add' function with App's task registry
Client->>TaskDef: 2. Call add.delay(5, 7)
TaskDef->>App: 3. Get broker config
App-->>TaskDef: Broker URL
TaskDef->>Broker: 4. Send message ('tasks.add', (5, 7), task_id, ...)
Broker-->>TaskDef: Ack (Message Queued)
TaskDef-->>Client: 5. Return AsyncResult(task_id)
Worker->>Broker: 6. Fetch next message
Broker-->>Worker: Message ('tasks.add', (5, 7), task_id)
Worker->>App: 7. Lookup 'tasks.add' in registry
App-->>Worker: add function code
Worker->>Worker: 8. Execute add(5, 7) -> returns 12
Note over Worker: (Optionally store result in Backend)
Worker->>Broker: 9. Acknowledge message completion
```
## Code Dive: Task Creation and Sending
* **Task Definition (`@app.task`)**: This decorator is defined in `celery/app/base.py` within the `Celery` class method `task`. It ultimately calls `_task_from_fun`.
```python
# Simplified from celery/app/base.py
class Celery:
# ...
def task(self, *args, **opts):
# ... handles decorator arguments ...
def _create_task_cls(fun):
# Returns a Task instance or a Proxy that creates one later
ret = self._task_from_fun(fun, **opts)
return ret
return _create_task_cls
def _task_from_fun(self, fun, name=None, base=None, bind=False, **options):
# Generate name like 'tasks.add' if not given
name = name or self.gen_task_name(fun.__name__, fun.__module__)
base = base or self.Task # The base Task class (from celery.app.task)
if name not in self._tasks: # If not already registered...
# Dynamically create a Task class wrapping the function
task = type(fun.__name__, (base,), {
'app': self, # Link task back to this app instance!
'name': name,
'run': staticmethod(fun), # The actual function to run
'__doc__': fun.__doc__,
'__module__': fun.__module__,
# ... other options ...
})() # Instantiate the new Task class
self._tasks[task.name] = task # Add to app's registry!
task.bind(self) # Perform binding steps
else:
task = self._tasks[name] # Task already exists
return task
```
This shows how the decorator essentially creates a specialized object (an instance of a class derived from `celery.app.task.Task`) that wraps your original function and registers it with the `app` under a specific name.
* **Task Sending (`.delay`)**: The `.delay()` method is defined on the `Task` class itself in `celery/app/task.py`. It's a simple shortcut.
```python
# Simplified from celery/app/task.py
class Task:
# ...
def delay(self, *args, **kwargs):
"""Shortcut for apply_async(args, kwargs)"""
return self.apply_async(args, kwargs)
def apply_async(self, args=None, kwargs=None, ..., **options):
# ... argument checking, option processing ...
# Get the app associated with this task instance
app = self._get_app()
# If always_eager is set, run locally instead of sending
if app.conf.task_always_eager:
return self.apply(args, kwargs, ...) # Runs inline
# The main path: tell the app to send the task message
return app.send_task(
self.name, args, kwargs, task_type=self,
**options # Includes things like countdown, eta, queue etc.
)
```
You can see how `.delay` just calls `.apply_async`, which then (usually) delegates the actual message sending to the `app.send_task` method we saw briefly in [Chapter 1](01_celery_app.md). The `app` uses its configuration to know *how* and *where* to send the message.
## Conclusion
You've learned the core concept of a Celery **Task**:
* It represents a single, well-defined **unit of work** or **job description**.
* You define a task by decorating a normal Python function with `@app.task`. This **registers** the task with your Celery application.
* You **send** a task request (not run it directly) using `.delay()` or `.apply_async()`.
* Sending a task puts a **message** onto a queue managed by a **message broker**.
* A separate **Worker** process picks up the message and executes the corresponding task function.
Tasks are the fundamental building blocks of work in Celery. Now that you know how to define a task and request its execution, let's look more closely at the crucial component that handles passing these requests around: the message broker.
**Next:** [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,167 @@
# Chapter 4: Broker Connection (AMQP) - Celery's Postal Service
In [Chapter 3: Task](03_task.md), we learned how to define "job descriptions" (Tasks) like `add(x, y)` and how to request them using `.delay()`. But when you call `add.delay(2, 2)`, how does that request actually *get* to a worker process that can perform the addition? It doesn't just magically appear!
This is where the **Broker Connection** comes in. Think of it as Celery's built-in postal service.
## What Problem Does the Broker Connection Solve?
Imagine you want to send a letter (a task request) to a friend (a worker) who lives in another city. You can't just shout the message out your window and hope they hear it. You need:
1. A **Post Office** (the Message Broker, like RabbitMQ or Redis) that handles mail.
2. A way to **talk to the Post Office** (the Broker Connection) to drop off your letter or pick up mail addressed to you.
The Broker Connection is that crucial link between your application (where you call `.delay()`) or your Celery worker and the message broker system. It manages sending messages *to* the broker and receiving messages *from* the broker reliably.
Without this connection, your task requests would never leave your application, and your workers would never know there's work waiting for them.
## Key Concepts: Post Office & Rules
Let's break down the pieces:
1. **The Message Broker (The Post Office):** This is a separate piece of software that acts as a central hub for messages. Common choices are RabbitMQ and Redis. You tell Celery its address using the `broker_url` setting in your [Configuration](02_configuration.md).
```python
# From Chapter 2 - celeryconfig.py
broker_url = 'amqp://guest:guest@localhost:5672//' # Example for RabbitMQ
# Or maybe: broker_url = 'redis://localhost:6379/0' # Example for Redis
```
2. **The Connection (Talking to the Staff):** This is the active communication channel established between your Python code (either your main app or a worker) and the broker. It's like having an open phone line to the post office. Celery, using a library called `kombu`, handles creating and managing these connections based on the `broker_url`.
3. **AMQP (The Postal Rules):** AMQP stands for **Advanced Message Queuing Protocol**. Think of it as a specific set of rules and procedures for how post offices should operate: how letters should be addressed, sorted, delivered, and confirmed.
* RabbitMQ is a broker that speaks AMQP natively.
* Other brokers, like Redis, use different protocols (their own set of rules).
* **Why mention AMQP?** It's a very common and powerful protocol for message queuing, and the principles behind it (exchanges, queues, routing) are fundamental to how Celery routes tasks, even when using other brokers. Celery's internal component for handling this communication is often referred to as `app.amqp` (found in `app/amqp.py`), even though the underlying library (`kombu`) supports multiple protocols. So, we focus on the *concept* of managing the broker connection, often using AMQP terminology as a reference point.
4. **Producer (Sending Mail):** When your application calls `add.delay(2, 2)`, it acts as a *producer*. It uses its broker connection to send a message ("Please run 'add' with arguments (2, 2)") to the broker.
5. **Consumer (Receiving Mail):** A Celery [Worker](05_worker.md) acts as a *consumer*. It uses its *own* broker connection to constantly check a specific mailbox (queue) at the broker for new messages. When it finds one, it takes it, performs the task, and tells the broker it's done.
## How Sending a Task Uses the Connection
Let's revisit sending a task from [Chapter 3: Task](03_task.md):
```python
# run_tasks.py (simplified)
from tasks import add
from celery_app import app # Assume app is configured with a broker_url
# 1. You call .delay()
print("Sending task...")
result_promise = add.delay(2, 2)
# Behind the scenes:
# a. Celery looks at the 'add' task, finds its associated 'app'.
# b. It asks 'app' for the broker_url from its configuration.
# c. It uses the app.amqp component (powered by Kombu) to get a connection
# to the broker specified by the URL (e.g., 'amqp://localhost...').
# d. It packages the task name 'tasks.add' and args (2, 2) into a message.
# e. It uses the connection to 'publish' (send) the message to the broker.
print(f"Task sent! ID: {result_promise.id}")
```
The `add.delay(2, 2)` call triggers this whole process. It needs the configured `broker_url` to know *which* post office to connect to, and the broker connection handles the actual sending of the "letter" (task message).
Similarly, a running Celery [Worker](05_worker.md) establishes its own connection to the *same* broker. It uses this connection to *listen* for incoming messages on the queues it's assigned to.
## How It Works Internally (Simplified)
Celery uses a powerful library called **Kombu** to handle the low-level details of connecting and talking to different types of brokers (RabbitMQ, Redis, etc.). The `app.amqp` object in Celery acts as a high-level interface to Kombu's features.
1. **Configuration:** The `broker_url` tells Kombu where and how to connect.
2. **Connection Pool:** To be efficient, Celery (via Kombu) often maintains a *pool* of connections. When you send a task, it might grab an existing, idle connection from the pool instead of creating a new one every time. This is faster. You can see this managed by `app.producer_pool` in `app/base.py`.
3. **Producer:** When `task.delay()` is called, it ultimately uses a `kombu.Producer` object. This object represents the ability to *send* messages. It's tied to a specific connection and channel.
4. **Publishing:** The producer's `publish()` method is called. This takes the task message (already serialized into a format like JSON), specifies the destination (exchange and routing key - think of these like the address and sorting code on an envelope), and sends it over the connection to the broker.
5. **Consumer:** A Worker uses a `kombu.Consumer` object. This object is set up to listen on specific queues via its connection. When a message arrives in one of those queues, the broker pushes it to the consumer over the connection, and the consumer triggers the appropriate Celery task execution logic.
```mermaid
sequenceDiagram
participant Client as Your App Code
participant Task as add.delay()
participant App as Celery App
participant AppAMQP as app.amqp (Kombu Interface)
participant Broker as RabbitMQ / Redis
Client->>Task: Call add.delay(2, 2)
Task->>App: Get broker config (broker_url)
App-->>Task: broker_url
Task->>App: Ask to send task 'tasks.add'
App->>AppAMQP: Send task message('tasks.add', (2, 2), ...)
Note over AppAMQP: Gets connection/producer (maybe from pool)
AppAMQP->>Broker: publish(message, routing_info) via Connection
Broker-->>AppAMQP: Acknowledge message received
AppAMQP-->>App: Message sent successfully
App-->>Task: Return AsyncResult
Task-->>Client: Return AsyncResult
```
This shows the flow: your code calls `.delay()`, Celery uses its configured connection details (`app.amqp` layer) to get a connection and producer, and then publishes the message to the broker.
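That "publish" step is plain Kombu. To make it concrete, here is a minimal, standalone Kombu sketch of publishing a message to a broker. This is illustrative only, not Celery's actual internal code; the queue name and payload are simplified assumptions:

```python
# A hand-rolled Kombu "producer" example (illustrative; Celery does all of
# this for you when you call task.delay()).
from kombu import Connection, Exchange, Queue

task_exchange = Exchange('celery', type='direct')  # 'celery' is the default name
task_queue = Queue('celery', task_exchange, routing_key='celery')

with Connection('redis://localhost:6379/0') as conn:   # same URL as broker_url
    producer = conn.Producer(serializer='json')
    producer.publish(
        {'task': 'tasks.add', 'args': [2, 2]},   # a simplified task payload
        exchange=task_exchange,
        routing_key='celery',
        declare=[task_queue],                     # make sure the queue exists
    )
    print("Message published to the 'celery' queue")
```

Celery's real messages carry headers, a task ID, retry information, and more, but the basic produce-and-publish mechanics look like this.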
## Code Dive: Sending a Message
Let's peek inside `app/amqp.py` where the `AMQP` class orchestrates sending. The `send_task_message` method (simplified below) is key.
```python
# Simplified from app/amqp.py within the AMQP class
# This function is configured internally and gets called by app.send_task
def _create_task_sender(self):
# ... (lots of setup: getting defaults from config, signals) ...
default_serializer = self.app.conf.task_serializer
default_compressor = self.app.conf.task_compression
def send_task_message(producer, name, message,
exchange=None, routing_key=None, queue=None,
serializer=None, compression=None, declare=None,
retry=None, retry_policy=None,
**properties):
# ... (Determine exchange, routing_key, queue based on config/options) ...
# ... (Prepare headers, properties, handle retries) ...
headers, properties, body, sent_event = message # Unpack the prepared message tuple
# The core action: Use the producer to publish the message!
ret = producer.publish(
body, # The actual task payload (args, kwargs, etc.)
exchange=exchange,
routing_key=routing_key,
serializer=serializer or default_serializer, # e.g., 'json'
compression=compression or default_compressor,
retry=retry,
retry_policy=retry_policy,
declare=declare, # Maybe declare queues/exchanges if needed
headers=headers,
**properties # Other message properties (correlation_id, etc.)
)
# ... (Send signals like task_sent, publish events if configured) ...
return ret
return send_task_message
```
**Explanation:**
* This function takes a `producer` object (which is linked to a broker connection via Kombu).
* It figures out the final destination details (exchange, routing key).
* It calls `producer.publish()`, passing the task body and all the necessary options (like serializer). This is the function that actually sends the data over the network connection to the broker.
The `Connection` objects themselves are managed by Kombu (see `kombu/connection.py`). Celery uses these objects via its `app.connection_for_write()` or `app.connection_for_read()` methods, which often pull from the connection pool (`kombu.pools`).
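You can borrow these same objects yourself. Here's a small sketch (assuming the `app` instance from `celery_app.py`) that grabs a connection and a pooled producer, just to show the pieces Celery works with; you wouldn't normally need to do this:

```python
# Borrowing a broker connection and a pooled producer from the app
# (for inspection only; Celery manages these for you).
from celery_app import app

with app.connection_for_write() as conn:
    print("Write connection:", conn.as_uri())   # e.g. redis://localhost:6379/0

with app.producer_pool.acquire(block=True) as producer:
    # This is the same kind of 'producer' object send_task_message() receives.
    print("Pooled producer on:", producer.connection.as_uri())
```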
## Conclusion
The Broker Connection is Celery's vital communication link, its "postal service."
* It connects your application and workers to the **Message Broker** (like RabbitMQ or Redis).
* It uses the `broker_url` from your [Configuration](02_configuration.md) to know where to connect.
* Protocols like **AMQP** define the "rules" for communication, although Celery's underlying library (Kombu) handles various protocols.
* Your app **produces** task messages and sends them over the connection.
* Workers **consume** task messages received over their connection.
* Celery manages connections efficiently, often using **pools**.
Understanding the broker connection helps clarify how tasks move from where they're requested to where they run. Now that we know how tasks are defined and sent across the wire, let's look at the entity that actually picks them up and does the work.
**Next:** [Chapter 5: Worker](05_worker.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

223
output/Celery/05_worker.md Normal file
View File

@@ -0,0 +1,223 @@
# Chapter 5: Worker - The Task Doer
In [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md), we learned how Celery uses a message broker, like a postal service, to send task messages. When you call `add.delay(2, 2)`, a message asking to run the `add` task with arguments `(2, 2)` gets dropped into a mailbox (the broker queue).
But who actually checks that mailbox, picks up the message, and performs the addition? That's the job of the **Celery Worker**.
## What Problem Does the Worker Solve?
Imagine our workshop analogy again. You've defined the blueprint for a job ([Task](03_task.md)) and you've dropped the work order into the central inbox ([Broker Connection (AMQP)](04_broker_connection__amqp_.md)). Now you need an actual employee or a machine to:
1. Look in the inbox for new work orders.
2. Pick up an order.
3. Follow the instructions (run the task code).
4. Maybe put the finished product (the result) somewhere specific.
5. Mark the order as complete.
The **Celery Worker** is that employee or machine. It's a separate program (process) that you run, whose sole purpose is to execute the tasks you send to the broker. Without a worker running, your task messages would just sit in the queue forever, waiting for someone to process them.
## Starting Your First Worker
Running a worker is typically done from your command line or terminal. You need to tell the worker where to find your [Celery App](01_celery_app.md) instance (which holds the configuration, including the broker address and the list of known tasks).
Assuming you have:
* A file `celery_app.py` containing your `app = Celery(...)` instance.
* A file `tasks.py` containing your task definitions (like `add` and `send_welcome_email`) decorated with `@app.task`.
* Your message broker (e.g., Redis or RabbitMQ) running.
You can start a worker like this:
```bash
# In your terminal, in the same directory as celery_app.py and tasks.py
# Make sure your Python environment has celery and the broker driver installed
# (e.g., pip install celery redis)
celery -A celery_app worker --loglevel=info
```
**Explanation:**
* `celery`: This is the main Celery command-line program.
* `-A celery_app`: The `-A` flag (or `--app`) tells Celery where to find your `Celery` app instance. `celery_app` refers to the `celery_app.py` file (or module) and implies Celery should look for an instance named `app` inside it.
* `worker`: This specifies that you want to run the worker component.
* `--loglevel=info`: This sets the logging level. `info` is a good starting point, showing you when the worker connects, finds tasks, and executes them. Other levels include `debug` (more verbose), `warning`, `error`, and `critical`.
**What You'll See:**
When the worker starts successfully, you'll see a banner like this (details may vary):
```text
-------------- celery@yourhostname v5.x.x (stars)
--- ***** -----
-- ******* ---- Linux-5.15.0...-generic-x86_64-with-... 2023-10-27 10:00:00
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: tasks:0x7f...
- ** ---------- .> transport: redis://localhost:6379/0
- ** ---------- .> results: redis://localhost:6379/0
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. tasks.add
. tasks.send_welcome_email
[2023-10-27 10:00:01,000: INFO/MainProcess] Connected to redis://localhost:6379/0
[2023-10-27 10:00:01,050: INFO/MainProcess] mingle: searching for neighbors
[2023-10-27 10:00:02,100: INFO/MainProcess] mingle: all alone
[2023-10-27 10:00:02,150: INFO/MainProcess] celery@yourhostname ready.
```
**Key Parts of the Banner:**
* `celery@yourhostname`: The unique name of this worker instance.
* `transport`: The broker URL it connected to (from your app config).
* `results`: The result backend URL (if configured).
* `concurrency`: How many tasks this worker can potentially run at once (defaults to the number of CPU cores) and the execution pool type (`prefork` is common). We'll touch on this later.
* `queues`: The specific "mailboxes" (queues) the worker is listening to. `celery` is the default queue name.
* `[tasks]`: A list of all the tasks the worker discovered (like our `tasks.add` and `tasks.send_welcome_email`). If your tasks don't show up here, the worker won't be able to run them!
The final `celery@yourhostname ready.` message means the worker is connected and waiting for jobs!
## What the Worker Does
Now that the worker is running, let's trace what happens when you send a task (e.g., from `run_tasks.py` in [Chapter 3: Task](03_task.md)):
1. **Waiting:** The worker is connected to the broker, listening on the `celery` queue.
2. **Message Arrival:** Your `add.delay(5, 7)` call sends a message to the `celery` queue on the broker. The broker notifies the worker.
3. **Receive & Decode:** The worker receives the raw message. It decodes it to find the task name (`tasks.add`), the arguments (`(5, 7)`), and other info (like a unique task ID).
4. **Find Task Code:** The worker looks up the name `tasks.add` in its internal registry (populated when it started) to find the actual Python function `add` defined in `tasks.py`.
5. **Execute:** The worker executes the function: `add(5, 7)`.
* You will see the `print` statements from your task function appear in the *worker's* terminal output:
```text
[2023-10-27 10:05:00,100: INFO/ForkPoolWorker-1] Task tasks.add[some-task-id] received
Task 'add' starting with (5, 7)
Task 'add' finished with result: 12
[2023-10-27 10:05:05,150: INFO/ForkPoolWorker-1] Task tasks.add[some-task-id] succeeded in 5.05s: 12
```
6. **Store Result (Optional):** If a [Result Backend](06_result_backend.md) is configured, the worker takes the return value (`12`) and sends it to the backend, associating it with the task's unique ID.
7. **Acknowledge:** The worker sends an "acknowledgement" (ack) back to the broker. This tells the broker, "I'm done with this message, you can delete it from the queue." With the `task_acks_late` setting enabled, the ack is only sent after the task finishes, so a message whose worker crashes mid-execution stays on the queue for another worker to pick up (by default, Celery acknowledges just before executing the task).
8. **Wait Again:** The worker goes back to waiting for the next message.
## Running Multiple Workers and Concurrency
* **Multiple Workers:** You can start multiple worker processes by running the `celery worker` command again, perhaps on different machines or in different terminals on the same machine. They will all connect to the same broker and pull tasks from the queue, allowing you to process tasks in parallel and scale your application.
* **Concurrency within a Worker:** A single worker process can often handle more than one task concurrently. Celery achieves this using *execution pools*.
* **Prefork (Default):** The worker starts several child *processes*. Each child process handles one task at a time. The `-c` (or `--concurrency`) flag controls the number of child processes (default is the number of CPU cores). This is good for CPU-bound tasks.
* **Eventlet/Gevent:** Uses *green threads* (lightweight concurrency managed by libraries like eventlet or gevent). A single worker process can handle potentially hundreds or thousands of tasks concurrently, especially if the tasks are I/O-bound (e.g., waiting for network requests). You select these using the `-P` flag: `celery -A celery_app worker -P eventlet -c 1000`. Requires installing the respective library (`pip install eventlet` or `pip install gevent`).
* **Solo:** Executes tasks one after another in the main worker process. Useful for debugging. `-P solo`.
* **Threads:** Uses regular OS threads. `-P threads`. Less common for Celery tasks due to Python's Global Interpreter Lock (GIL) limitations for CPU-bound tasks, but can be useful for I/O-bound tasks.
For beginners, sticking with the default **prefork** pool is usually fine. Just know that the worker can likely handle multiple tasks simultaneously.
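For example, you might start workers like this (a sketch; the node names and concurrency numbers are arbitrary choices):

```bash
# Two prefork workers on one machine, each with 4 child processes
celery -A celery_app worker -n worker1@%h --concurrency=4 --loglevel=info
celery -A celery_app worker -n worker2@%h --concurrency=4 --loglevel=info

# An I/O-heavy worker using the gevent pool (requires: pip install gevent)
celery -A celery_app worker -P gevent -c 500 --loglevel=info
```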
## How It Works Internally (Simplified)
Let's visualize the worker's main job: processing a single task.
1. **Startup:** The `celery worker` command starts the main worker process. It loads the `Celery App`, reads the configuration (`broker_url`, tasks to import, etc.).
2. **Connect & Listen:** The worker establishes a connection to the message broker and tells it, "I'm ready to consume messages from the 'celery' queue."
3. **Message Delivery:** The broker sees a message for the 'celery' queue (sent by `add.delay(5, 7)`) and delivers it to the connected worker.
4. **Consumer Receives:** The worker's internal "Consumer" component receives the message.
5. **Task Dispatch:** The Consumer decodes the message, identifies the task (`tasks.add`), and finds the arguments (`(5, 7)`). It then hands this off to the configured execution pool (e.g., prefork).
6. **Pool Execution:** The pool (e.g., a child process in the prefork pool) gets the task function and arguments and executes `add(5, 7)`.
7. **Result Return:** The pool process finishes execution and returns the result (`12`) back to the main worker process.
8. **Result Handling (Optional):** The main worker process, if a [Result Backend](06_result_backend.md) is configured, sends the result (`12`) and task ID to the backend store.
9. **Acknowledgement:** The main worker process sends an "ack" message back to the broker, confirming the task message was successfully processed. The broker then deletes the message.
```mermaid
sequenceDiagram
participant CLI as Terminal (celery worker)
participant WorkerMain as Worker Main Process
participant App as Celery App Instance
participant Broker as Message Broker
participant Pool as Execution Pool (e.g., Prefork Child)
participant TaskCode as Your Task Function (add)
CLI->>WorkerMain: Start celery -A celery_app worker
WorkerMain->>App: Load App & Config (broker_url, tasks)
WorkerMain->>Broker: Connect & Listen on 'celery' queue
Broker-->>WorkerMain: Deliver Message ('tasks.add', (5, 7), task_id)
WorkerMain->>WorkerMain: Decode Message
WorkerMain->>Pool: Request Execute add(5, 7) with task_id
Pool->>TaskCode: Run add(5, 7)
TaskCode-->>Pool: Return 12
Pool-->>WorkerMain: Result=12 for task_id
Note over WorkerMain: (Optionally) Store 12 in Result Backend
WorkerMain->>Broker: Acknowledge task_id is complete
```
## Code Dive: Where Worker Logic Lives
* **Command Line Entry Point (`celery/bin/worker.py`):** This script handles parsing the command-line arguments (`-A`, `-l`, `-c`, `-P`, etc.) when you run `celery worker ...`. It ultimately creates and starts a `WorkController` instance. (See `worker()` function in the file).
* **Main Worker Class (`celery/worker/worker.py`):** The `WorkController` class is the heart of the worker. It manages all the different components (like the pool, consumer, timer, etc.) using a system called "bootsteps". It handles the overall startup, shutdown, and coordination. (See `WorkController` class).
* **Message Handling (`celery/worker/consumer/consumer.py`):** The `Consumer` class (specifically its `Blueprint` and steps like `Tasks` and `Evloop`) is responsible for the core loop of fetching messages from the broker via the connection, decoding them, and dispatching them to the execution pool using task strategies. (See `Consumer.create_task_handler`).
* **Execution Pools (`celery/concurrency/`):** Modules like `prefork.py`, `solo.py`, `eventlet.py`, `gevent.py` implement the different concurrency models (`-P` flag). The `WorkController` selects and manages one of these pools.
A highly simplified conceptual view of the core message processing logic within the `Consumer`:
```python
# Conceptual loop inside the Consumer (highly simplified)
def message_handler(message):
try:
# 1. Decode message (task name, args, kwargs, id, etc.)
task_name, args, kwargs, task_id = decode_message(message.body, message.headers)
# 2. Find the registered task function
task_func = app.tasks[task_name]
# 3. Prepare execution request for the pool
request = TaskRequest(task_id, task_name, task_func, args, kwargs)
# 4. Send request to the pool for execution
# (Pool runs request.execute() which calls task_func(*args, **kwargs))
pool.apply_async(request.execute, accept_callback=task_succeeded, ...)
except Exception as e:
# Handle errors (e.g., unknown task, decoding error)
log_error(e)
message.reject() # Tell broker it failed
def task_succeeded(task_id, retval):
# Called by the pool when task finishes successfully
# 5. Store result (optional)
if app.backend:
app.backend.store_result(task_id, retval, status='SUCCESS')
# 6. Acknowledge message to broker
message.ack()
# --- Setup ---
# Worker connects to broker and registers message_handler
# for incoming messages on the subscribed queue(s)
connection.consume(queue_name, callback=message_handler)
# Start the event loop to wait for messages
connection.drain_events()
```
This illustrates the fundamental cycle: receive -> decode -> find task -> execute via pool -> handle result -> acknowledge. The actual code involves much more detail regarding error handling, state management, different protocols, rate limiting, etc., managed through the bootstep system.
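To see what a running worker knows about, Celery's `status` and `inspect` commands are handy (a quick sketch; the output depends on your running workers):

```bash
# Are any workers alive and responding?
celery -A celery_app status

# Which tasks has each worker registered?
celery -A celery_app inspect registered

# What is each worker executing right now?
celery -A celery_app inspect active
```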
## Conclusion
You've now met the **Celery Worker**, the essential component that actually *runs* your tasks.
* It's a **separate process** you start from the command line (`celery worker`).
* It connects to the **broker** using the configuration from your **Celery App**.
* It **listens** for task messages on queues.
* It **executes** the corresponding task code when a message arrives.
* It handles **concurrency** using execution pools (like prefork, eventlet, gevent).
* It **acknowledges** messages to the broker upon successful completion.
Without workers, Celery tasks would never get done. But what happens when a task finishes? What if it returns a value, like our `add` task returning `12`? How can your original application find out the result? That's where the Result Backend comes in.
**Next:** [Chapter 6: Result Backend](06_result_backend.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,318 @@
# Chapter 6: Result Backend - Checking Your Task's Homework
In [Chapter 5: Worker](05_worker.md), we met the Celery Worker, the diligent entity that picks up task messages from the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) and executes the code defined in our [Task](03_task.md).
But what happens after the worker finishes a task? What if the task was supposed to calculate something, like `add(2, 2)`? How do we, back in our main application, find out the answer (`4`)? Or even just know if the task finished successfully or failed?
This is where the **Result Backend** comes in. It's like a dedicated place to check the status and results of the homework assigned to the workers.
## What Problem Does the Result Backend Solve?
Imagine you give your Celery worker a math problem: "What is 123 + 456?". The worker goes away, calculates the answer (579), and... then what?
If you don't tell the worker *where* to put the answer, it just disappears! You, back in your main program, have no idea if the worker finished, if it got the right answer, or if it encountered an error.
The **Result Backend** solves this by providing a storage location (like a database, a cache like Redis, or even via the message broker itself) where the worker can:
1. Record the final **state** of the task (e.g., `SUCCESS`, `FAILURE`).
2. Store the task's **return value** (e.g., `579`) if it succeeded.
3. Store the **error** information (e.g., `TypeError: unsupported operand type(s)...`) if it failed.
Later, your main application can query this Result Backend using the task's unique ID to retrieve this information.
Think of it as a shared filing cabinet:
* The **Worker** puts the completed homework (result and status) into a specific folder (identified by the task ID).
* Your **Application** can later look inside that folder (using the task ID) to see the results.
## Key Concepts
1. **Storage:** It's a place to store task results and states. This could be Redis, a relational database (like PostgreSQL or MySQL), MongoDB, RabbitMQ (using RPC), and others.
2. **Task ID:** Each task execution gets a unique ID (remember the `result_promise_add.id` from Chapter 3?). This ID is the key used to store and retrieve the result from the backend.
3. **State:** Besides the return value, the backend stores the task's current state (e.g., `PENDING`, `STARTED`, `SUCCESS`, `FAILURE`, `RETRY`, `REVOKED`).
4. **Return Value / Exception:** If the task finishes successfully (`SUCCESS`), the backend stores the value the task function returned. If it fails (`FAILURE`), it stores details about the exception that occurred.
5. **`AsyncResult` Object:** When you call `task.delay()` or `task.apply_async()`, Celery gives you back an `AsyncResult` object. This object holds the task's ID and provides methods to interact with the result backend (check status, get the result, etc.).
## How to Use a Result Backend
**1. Configure It!**
First, you need to tell your Celery app *where* the result backend is located. You do this using the `result_backend` configuration setting, just like you set the `broker_url` in [Chapter 2: Configuration](02_configuration.md).
Let's configure our app to use Redis (make sure you have Redis running!) as the result backend. We'll use database number `1` for results to keep it separate from the broker, which in our examples uses database `0`.
```python
# celery_app.py
from celery import Celery
# Configure BOTH broker and result backend
app = Celery('tasks',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/1') # <-- Result Backend URL
# You could also use app.config_from_object('celeryconfig')
# if result_backend = 'redis://localhost:6379/1' is in celeryconfig.py
# ... your task definitions (@app.task) would go here or be imported ...
@app.task
def add(x, y):
import time
time.sleep(3) # Simulate work
return x + y
@app.task
def fail_sometimes(x):
import random
if random.random() < 0.5:
raise ValueError("Something went wrong!")
return f"Processed {x}"
```
**Explanation:**
* `backend='redis://localhost:6379/1'`: We provide a URL telling Celery to use the Redis server running on `localhost`, port `6379`, and specifically database `1` for storing results. (The `backend` argument is an alias for `result_backend`).
**2. Send a Task and Get the `AsyncResult`**
When you send a task, the returned object is your key to the result.
```python
# run_tasks.py
from celery_app import add, fail_sometimes
# Send the add task
result_add = add.delay(10, 20)
print(f"Sent task add(10, 20). Task ID: {result_add.id}")
# Send the task that might fail
result_fail = fail_sometimes.delay("my data")
print(f"Sent task fail_sometimes('my data'). Task ID: {result_fail.id}")
```
**Explanation:**
* `result_add` and `result_fail` are `AsyncResult` objects. They contain the `.id` attribute, which is the unique identifier for *this specific execution* of the task.
**3. Check the Status and Get the Result**
Now, you can use the `AsyncResult` object to interact with the result backend.
**(Run a worker in another terminal first: `celery -A celery_app worker --loglevel=info`)**
```python
# continue in run_tasks.py or a new Python session
from celery_app import app # Need app for AsyncResult if creating from ID
# Use the AsyncResult objects we got earlier
# Or, if you only have the ID, you can recreate the AsyncResult:
# result_add = app.AsyncResult('the-task-id-you-saved-earlier')
print(f"\nChecking results for add task ({result_add.id})...")
# Check if the task is finished (returns True/False immediately)
print(f"Is add ready? {result_add.ready()}")
# Check the state (returns 'PENDING', 'STARTED', 'SUCCESS', 'FAILURE', etc.)
print(f"State of add: {result_add.state}")
# Get the result. IMPORTANT: This call will BLOCK until the task is finished!
# If the task failed, this will raise the exception that occurred in the worker.
try:
# Set a timeout (in seconds) to avoid waiting forever
final_result = result_add.get(timeout=10)
print(f"Result of add: {final_result}")
print(f"Did add succeed? {result_add.successful()}")
print(f"Final state of add: {result_add.state}")
except Exception as e:
print(f"Could not get result for add: {type(e).__name__} - {e}")
print(f"Final state of add: {result_add.state}")
print(f"Did add fail? {result_add.failed()}")
# Get the traceback if it failed
print(f"Traceback: {result_add.traceback}")
print(f"\nChecking results for fail_sometimes task ({result_fail.id})...")
try:
# Wait up to 10 seconds for this task
fail_result = result_fail.get(timeout=10)
print(f"Result of fail_sometimes: {fail_result}")
print(f"Did fail_sometimes succeed? {result_fail.successful()}")
print(f"Final state of fail_sometimes: {result_fail.state}")
except Exception as e:
print(f"Could not get result for fail_sometimes: {type(e).__name__} - {e}")
print(f"Final state of fail_sometimes: {result_fail.state}")
print(f"Did fail_sometimes fail? {result_fail.failed()}")
print(f"Traceback:\n{result_fail.traceback}")
```
**Explanation & Potential Output:**
* `result.ready()`: Checks if the task has finished (reached a `SUCCESS`, `FAILURE`, or other final state). Non-blocking.
* `result.state`: Gets the current state string. Non-blocking.
* `result.successful()`: Returns `True` if the state is `SUCCESS`. Non-blocking.
* `result.failed()`: Returns `True` if the task raised an exception (state `FAILURE`). Non-blocking.
* `result.get(timeout=...)`: This is the most common way to get the actual return value.
* **It blocks** (waits) until the task completes *or* the timeout expires.
* If the task state becomes `SUCCESS`, it returns the value the task function returned (e.g., `30`).
* If the task state becomes `FAILURE`, it **raises** the exception that occurred in the worker (e.g., `ValueError: Something went wrong!`).
* If the timeout is reached before the task finishes, it raises a `celery.exceptions.TimeoutError`.
* `result.traceback`: If the task failed, this contains the error traceback string from the worker.
**(Example Output - might vary for `fail_sometimes` due to randomness)**
```text
Sent task add(10, 20). Task ID: f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a
Sent task fail_sometimes('my data'). Task ID: 9b1d8c7e-a6f5-4b3a-9c8d-7e6f5a4b3c2d
Checking results for add task (f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a)...
Is add ready? False
State of add: PENDING # Or STARTED if checked quickly after worker picks it up
Result of add: 30
Did add succeed? True
Final state of add: SUCCESS
Checking results for fail_sometimes task (9b1d8c7e-a6f5-4b3a-9c8d-7e6f5a4b3c2d)...
Could not get result for fail_sometimes: ValueError - Something went wrong!
Final state of fail_sometimes: FAILURE
Did fail_sometimes fail? True
Traceback:
Traceback (most recent call last):
File "/path/to/celery/app/trace.py", line ..., in trace_task
R = retval = fun(*args, **kwargs)
File "/path/to/celery/app/trace.py", line ..., in __protected_call__
return self.run(*args, **kwargs)
File "/path/to/your/project/celery_app.py", line ..., in fail_sometimes
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
```
## How It Works Internally
1. **Task Sent:** Your application calls `add.delay(10, 20)`. It sends a message to the **Broker** and gets back an `AsyncResult` object containing the unique `task_id`.
2. **Worker Executes:** A **Worker** picks up the task message from the Broker. It finds the `add` function and executes `add(10, 20)`. The function returns `30`.
3. **Worker Stores Result:** Because a `result_backend` is configured (`redis://.../1`), the Worker:
* Connects to the Result Backend (Redis DB 1).
* Prepares the result data (e.g., `{'status': 'SUCCESS', 'result': 30, 'task_id': 'f5e8...', ...}`).
* Stores this data in the backend, using the `task_id` as the key (e.g., in Redis, it might set a key like `celery-task-meta-f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a` to the JSON representation of the result data).
* It might also set an expiry time on the result if configured (`result_expires`).
4. **Client Checks Result:** Your application calls `result_add.get(timeout=10)` on the `AsyncResult` object.
5. **Client Queries Backend:** The `AsyncResult` object uses the `task_id` (`f5e8...`) and the configured `result_backend` URL:
* It connects to the Result Backend (Redis DB 1).
* It repeatedly fetches the data associated with the `task_id` key (e.g., `GET celery-task-meta-f5e8...` in Redis).
* It checks the `status` field in the retrieved data.
* If the status is `PENDING` or `STARTED`, it waits a short interval and tries again, until the timeout is reached.
* If the status is `SUCCESS`, it extracts the `result` field (`30`) and returns it.
* If the status is `FAILURE`, it extracts the `result` field (which contains exception info), reconstructs the exception, and raises it.
```mermaid
sequenceDiagram
participant Client as Your Application
participant Task as add.delay(10, 20)
participant Broker as Message Broker (Redis DB 0)
participant Worker as Celery Worker
participant ResultBackend as Result Backend (Redis DB 1)
participant AsyncResult as result_add = AsyncResult(...)
Client->>Task: Call add.delay(10, 20)
Task->>Broker: Send task message (task_id: 't1')
Task-->>Client: Return AsyncResult (id='t1')
Worker->>Broker: Fetch message (task_id: 't1')
Worker->>Worker: Execute add(10, 20) -> returns 30
Worker->>ResultBackend: Store result (key='t1', value={'status': 'SUCCESS', 'result': 30, ...})
ResultBackend-->>Worker: Ack (Result stored)
Worker->>Broker: Ack message complete
Client->>AsyncResult: Call result_add.get(timeout=10)
loop Check Backend Until Ready or Timeout
AsyncResult->>ResultBackend: Get result for key='t1'
ResultBackend-->>AsyncResult: Return {'status': 'SUCCESS', 'result': 30, ...}
end
AsyncResult-->>Client: Return 30
```
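You can even look at the stored "homework" yourself. This sketch uses the `redis` Python client to read the raw key the worker wrote, assuming the Redis result backend on database 1 and a task ID copied from the output above:

```python
# Peek at the raw result record stored by the worker (illustration only).
import json
import redis

r = redis.Redis(host='localhost', port=6379, db=1)       # result backend DB
task_id = 'f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a'          # example ID from above
raw = r.get(f'celery-task-meta-{task_id}')

if raw:
    print(json.loads(raw))   # e.g. {'status': 'SUCCESS', 'result': 30, ...}
else:
    print("No result stored (task not finished, or the result expired).")
```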
## Code Dive: Storing and Retrieving Results
* **Backend Loading (`celery/app/backends.py`):** When Celery starts, it uses the `result_backend` URL to look up the correct backend class (e.g., `RedisBackend`, `DatabaseBackend`, `RPCBackend`) using functions like `by_url` and `by_name`. These map URL schemes (`redis://`, `db+postgresql://`, `rpc://`) or aliases ('redis', 'db', 'rpc') to the actual Python classes. The mapping is defined in `BACKEND_ALIASES`.
* **Base Classes (`celery/backends/base.py`):** All result backends inherit from `BaseBackend`. Many common backends (like Redis, Memcached) inherit from `BaseKeyValueStoreBackend`, which provides common logic for storing results using keys.
* **Storing Result (`BaseKeyValueStoreBackend._store_result` in `celery/backends/base.py`):** This method (called by the worker) is responsible for actually saving the result.
```python
# Simplified from backends/base.py (inside BaseKeyValueStoreBackend)
def _store_result(self, task_id, result, state,
traceback=None, request=None, **kwargs):
# 1. Prepare the metadata dictionary
meta = self._get_result_meta(result=result, state=state,
traceback=traceback, request=request)
meta['task_id'] = bytes_to_str(task_id) # Ensure task_id is str
# (Check if already successfully stored to prevent overwrites - omitted for brevity)
# 2. Encode the metadata (e.g., to JSON or pickle)
encoded_meta = self.encode(meta)
# 3. Get the specific key for this task
key = self.get_key_for_task(task_id) # e.g., b'celery-task-meta-<task_id>'
# 4. Call the specific backend's 'set' method (implemented by RedisBackend etc.)
# It might also set an expiry time (self.expires)
try:
self._set_with_state(key, encoded_meta, state) # Calls self.set(key, encoded_meta)
except Exception as exc:
# Handle potential storage errors, maybe retry
raise BackendStoreError(...) from exc
return result # Returns the original (unencoded) result
```
The `self.set()` method is implemented by the concrete backend (e.g., `RedisBackend.set` uses `redis-py` client's `setex` or `set` command).
* **Retrieving Result (`BaseBackend.wait_for` or `BaseKeyValueStoreBackend.get_many` in `celery/backends/base.py`):** When you call `AsyncResult.get()`, it often ends up calling `wait_for` or similar methods that poll the backend.
```python
# Simplified from backends/base.py (inside SyncBackendMixin)
def wait_for(self, task_id,
timeout=None, interval=0.5, no_ack=True, on_interval=None):
"""Wait for task and return its result meta."""
self._ensure_not_eager() # Check if running in eager mode
time_elapsed = 0.0
while True:
# 1. Get metadata from backend (calls self._get_task_meta_for)
meta = self.get_task_meta(task_id)
# 2. Check if the task is in a final state
if meta['status'] in states.READY_STATES:
return meta # Return the full metadata dict
# 3. Call interval callback if provided
if on_interval:
on_interval()
# 4. Sleep to avoid busy-waiting
time.sleep(interval)
time_elapsed += interval
# 5. Check for timeout
if timeout and time_elapsed >= timeout:
raise TimeoutError('The operation timed out.')
```
The `self.get_task_meta(task_id)` eventually calls `self._get_task_meta_for(task_id)`, which in `BaseKeyValueStoreBackend` uses `self.get(key)` (e.g., `RedisBackend.get` uses `redis-py` client's `GET` command) and then decodes the result using `self.decode_result`.
## Conclusion
You've learned about the crucial **Result Backend**:
* It acts as a **storage place** (like a filing cabinet or database) for task results and states.
* It's configured using the `result_backend` setting in your [Celery App](01_celery_app.md).
* The [Worker](05_worker.md) stores the outcome (success value or failure exception) in the backend after executing a [Task](03_task.md).
* You use the `AsyncResult` object (returned by `.delay()` or `.apply_async()`) and its methods (`.get()`, `.state`, `.ready()`) to query the backend using the task's unique ID.
* Various backend types exist (Redis, Database, RPC, etc.), each with different characteristics.
Result backends allow your application to track the progress and outcome of background work. But what if you want tasks to run automatically at specific times or on a regular schedule, like sending a report every morning? That's where Celery's scheduler comes in.
**Next:** [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,329 @@
# Chapter 7: Beat (Scheduler) - Celery's Alarm Clock
In the last chapter, [Chapter 6: Result Backend](06_result_backend.md), we learned how to track the status and retrieve the results of our background tasks. This is great when we manually trigger tasks from our application. But what if we want tasks to run automatically, without us needing to press a button every time?
Maybe you need to:
* Send out a newsletter email every Friday morning.
* Clean up temporary files in your system every night.
* Check the health of your external services every 5 minutes.
How can you make Celery do these things on a regular schedule? Meet **Celery Beat**.
## What Problem Does Beat Solve?
Imagine you have a task, say `send_daily_report()`, that needs to run every morning at 8:00 AM. How would you achieve this? You could try setting up a system `cron` job to call a Python script that sends the Celery task, but that adds another layer of complexity.
Celery provides its own built-in solution: **Beat**.
**Beat is Celery's periodic task scheduler.** Think of it like a dedicated alarm clock or a `cron` job system built specifically for triggering Celery tasks. It's a separate program that you run alongside your workers. Its job is simple:
1. Read a list of scheduled tasks (e.g., "run `send_daily_report` every day at 8:00 AM").
2. Keep track of the time.
3. When the time comes for a scheduled task, Beat sends the task message to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), just as if you had called `.delay()` yourself.
4. A regular Celery [Worker](05_worker.md) then picks up the task from the broker and executes it.
Beat doesn't run the tasks itself; it just *schedules* them by sending the messages at the right time.
## Key Concepts
1. **Beat Process:** A separate Celery program you run (like `celery -A your_app beat`). It needs access to your Celery app's configuration.
2. **Schedule:** A configuration setting (usually `beat_schedule` in your Celery config) that defines which tasks should run and when. This schedule can use simple intervals (like every 30 seconds) or cron-like patterns (like "every Monday at 9 AM").
3. **Schedule Storage:** Beat needs to remember when each task was last run so it knows when it's due again. By default, it saves this information to a local file named `celerybeat-schedule` (using Python's `shelve` module).
4. **Ticker:** The heart of Beat. It's an internal loop that wakes up periodically, checks the schedule against the current time, and sends messages for any due tasks.
## How to Use Beat
Let's schedule two tasks:
* Our `add` task from [Chapter 3: Task](03_task.md) to run every 15 seconds.
* A new (dummy) task `send_report` to run every minute.
**1. Define the Schedule in Configuration**
The best place to define your schedule is in your configuration, either directly on the `app` object or in a separate `celeryconfig.py` file (see [Chapter 2: Configuration](02_configuration.md)). We'll use a separate file.
First, create the new task in your `tasks.py`:
```python
# tasks.py (add this new task)
from celery_app import app
import time
@app.task
def add(x, y):
"""A simple task that adds two numbers."""
print(f"Task 'add' starting with ({x}, {y})")
time.sleep(2) # Simulate short work
result = x + y
print(f"Task 'add' finished with result: {result}")
return result
@app.task
def send_report(name):
"""A task simulating sending a report."""
print(f"Task 'send_report' starting for report: {name}")
time.sleep(5) # Simulate longer work
print(f"Report '{name}' supposedly sent.")
return f"Report {name} sent."
```
Now, update or create `celeryconfig.py`:
```python
# celeryconfig.py
from datetime import timedelta
from celery.schedules import crontab
# Basic Broker/Backend settings (replace with your actual URLs)
broker_url = 'redis://localhost:6379/0'
result_backend = 'redis://localhost:6379/1'
timezone = 'UTC' # Or your preferred timezone, e.g., 'America/New_York'
enable_utc = True
# List of modules to import when the Celery worker starts.
# Make sure tasks.py is discoverable in your Python path
imports = ('tasks',)
# Define the Beat schedule
beat_schedule = {
# Executes tasks.add every 15 seconds with arguments (16, 16)
'add-every-15-seconds': {
'task': 'tasks.add', # The task name
'schedule': 15.0, # Run every 15 seconds (float or timedelta)
'args': (16, 16), # Positional arguments for the task
},
# Executes tasks.send_report every minute
'send-report-every-minute': {
'task': 'tasks.send_report',
'schedule': crontab(), # Use crontab() for "every minute"
'args': ('daily-summary',), # Argument for the report name
# Example using crontab for more specific timing:
# 'schedule': crontab(hour=8, minute=0, day_of_week='fri'), # Every Friday at 8:00 AM
},
}
```
**Explanation:**
* `from datetime import timedelta`: Used for simple interval schedules.
* `from celery.schedules import crontab`: Used for cron-like scheduling.
* `imports = ('tasks',)`: Ensures the worker and beat know about the tasks defined in `tasks.py`.
* `beat_schedule = {...}`: This dictionary holds all your scheduled tasks.
* Each key (`'add-every-15-seconds'`, `'send-report-every-minute'`) is a unique name for the schedule entry.
* Each value is another dictionary describing the schedule:
* `'task'`: The full name of the task to run (e.g., `'module_name.task_name'`).
* `'schedule'`: Defines *when* to run.
* A `float` or `int`: number of seconds between runs.
* A `timedelta` object: the time interval between runs.
* A `crontab` object: for complex schedules (minute, hour, day_of_week, etc.). `crontab()` with no arguments means "every minute".
* `'args'`: A tuple of positional arguments to pass to the task.
* `'kwargs'`: (Optional) A dictionary of keyword arguments to pass to the task.
* `'options'`: (Optional) A dictionary of execution options like `queue`, `priority`.
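For reference, here are a few more `crontab` variants (illustrative; see the Celery docs for the full set of options):

```python
from celery.schedules import crontab

crontab(minute='*/15')                       # every 15 minutes
crontab(hour=7, minute=30, day_of_week=1)    # every Monday at 07:30
crontab(minute=0, hour='*/3')                # every 3 hours, on the hour
crontab(minute=0, hour=0, day_of_month='1')  # midnight on the 1st of each month
```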
**2. Load the Configuration in Your App**
Make sure your `celery_app.py` loads this configuration:
```python
# celery_app.py
from celery import Celery
# Create the app instance
app = Celery('tasks')
# Load configuration from the 'celeryconfig' module
app.config_from_object('celeryconfig')
# Tasks might be defined here, but we put them in tasks.py
# which is loaded via the 'imports' setting in celeryconfig.py
```
**3. Run Celery Beat**
Now, open a terminal and run the Beat process. You need to tell it where your app is (`-A celery_app`):
```bash
# In your terminal
celery -A celery_app beat --loglevel=info
```
**Explanation:**
* `celery`: The Celery command-line tool.
* `-A celery_app`: Points to your app instance (in `celery_app.py`).
* `beat`: Tells Celery to start the scheduler process.
* `--loglevel=info`: Shows informational messages about what Beat is doing.
You'll see output similar to this:
```text
celery beat v5.x.x is starting.
__ - ... __ - _
LocalTime -> 2023-10-27 11:00:00
Configuration ->
. broker -> redis://localhost:6379/0
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@INFO
. maxinterval -> 300.0s (5m0s)
celery beat v5.x.x has started.
```
Beat is now running! It will check the schedule and:
* Every 15 seconds, it will send a message to run `tasks.add(16, 16)`.
* Every minute, it will send a message to run `tasks.send_report('daily-summary')`.
**4. Run a Worker (Crucial!)**
Beat only *sends* the task messages. You still need a [Worker](05_worker.md) running to actually *execute* the tasks. Open **another terminal** and start a worker:
```bash
# In a SECOND terminal
celery -A celery_app worker --loglevel=info
```
Now, watch the output in the **worker's terminal**. You should see logs appearing periodically as the worker receives and executes the tasks sent by Beat:
```text
# Output in the WORKER terminal (example)
[2023-10-27 11:00:15,000: INFO/MainProcess] Task tasks.add[task-id-1] received
Task 'add' starting with (16, 16)
Task 'add' finished with result: 32
[2023-10-27 11:00:17,050: INFO/MainProcess] Task tasks.add[task-id-1] succeeded in 2.05s: 32
[2023-10-27 11:01:00,000: INFO/MainProcess] Task tasks.send_report[task-id-2] received
Task 'send_report' starting for report: daily-summary
[2023-10-27 11:01:00,000: INFO/MainProcess] Task tasks.add[task-id-3] received # Another 'add' task might arrive while 'send_report' runs
Task 'add' starting with (16, 16)
Task 'add' finished with result: 32
[2023-10-27 11:01:02,050: INFO/MainProcess] Task tasks.add[task-id-3] succeeded in 2.05s: 32
Report 'daily-summary' supposedly sent.
[2023-10-27 11:01:05,100: INFO/MainProcess] Task tasks.send_report[task-id-2] succeeded in 5.10s: "Report daily-summary sent."
... and so on ...
```
You have successfully set up scheduled tasks!
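For quick local experiments, you can also embed the Beat scheduler inside a worker process with the `-B` flag, so one command both schedules and executes tasks. This is convenient for development but not recommended for production (and you must never run more than one scheduler):

```bash
# Development only: one process that schedules AND executes tasks
celery -A celery_app worker -B --loglevel=info
```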
## How It Works Internally (Simplified)
1. **Startup:** You run `celery -A celery_app beat`. The Beat process starts.
2. **Load Config:** It loads the Celery app (`celery_app`) and reads its configuration, paying special attention to `beat_schedule`.
3. **Load State:** It opens the schedule file (e.g., `celerybeat-schedule`) to see when each task was last run. If the file doesn't exist, it creates it.
4. **Main Loop (Tick):** Beat enters its main loop (the "ticker").
5. **Calculate Due Tasks:** In each tick, Beat looks at every entry in `beat_schedule`. For each entry, it compares the current time with the task's `schedule` definition and its `last_run_at` time (from the schedule file). It calculates which tasks are due to run *right now*.
6. **Send Task Message:** If a task (e.g., `add-every-15-seconds`) is due, Beat constructs a task message (containing `'tasks.add'`, `args=(16, 16)`, etc.) just like `.delay()` would. It sends this message to the configured **Broker**.
7. **Update State:** Beat updates the `last_run_at` time for the task it just sent in its internal state and saves this back to the schedule file.
8. **Sleep:** Beat calculates the time until the *next* scheduled task is due and sleeps for that duration (or up to a maximum interval, `beat_max_loop_interval`, usually 5 minutes, whichever is shorter).
9. **Repeat:** Go back to step 5.
Meanwhile, a **Worker** process is connected to the same **Broker**, picks up the task messages sent by Beat, and executes them.
```mermaid
sequenceDiagram
participant Beat as Celery Beat Process
participant ScheduleCfg as beat_schedule Config
participant ScheduleDB as celerybeat-schedule File
participant Broker as Message Broker
participant Worker as Celery Worker
Beat->>ScheduleCfg: Load schedule definitions on startup
Beat->>ScheduleDB: Load last run times on startup
loop Tick Loop (e.g., every second or more)
Beat->>Beat: Check current time
Beat->>ScheduleCfg: Get definition for 'add-every-15'
Beat->>ScheduleDB: Get last run time for 'add-every-15'
Beat->>Beat: Calculate if 'add-every-15' is due now
alt Task 'add-every-15' is due
Beat->>Broker: Send task message('tasks.add', (16, 16))
Broker-->>Beat: Ack (Message Queued)
Beat->>ScheduleDB: Update last run time for 'add-every-15'
ScheduleDB-->>Beat: Ack (Saved)
end
Beat->>Beat: Calculate time until next task is due
Beat->>Beat: Sleep until next check
end
Worker->>Broker: Fetch task message ('tasks.add', ...)
Broker-->>Worker: Deliver message
Worker->>Worker: Execute task add(16, 16)
Worker->>Broker: Ack message complete
```
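If you're curious what Beat persists between runs (the "last run times" from steps 3 and 7 above), you can open the schedule file with Python's `shelve` module. A small sketch, assuming the default `celerybeat-schedule` file in the current directory and that Beat isn't writing to it at the same time:

```python
# Inspect the state Beat saves between runs (illustration only).
import shelve

with shelve.open('celerybeat-schedule') as store:
    entries = store.get('entries', {})
    for name, entry in entries.items():
        print(f"{name}: last run at {entry.last_run_at}")
```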
## Code Dive: Where Beat Lives
* **Command Line (`celery/bin/beat.py`):** Handles the `celery beat` command, parses arguments (`-A`, `-s`, `-S`, `--loglevel`), and creates/runs the `Beat` service object.
* **Beat Service Runner (`celery/apps/beat.py`):** The `Beat` class sets up the environment, loads the app, initializes logging, creates the actual scheduler service (`celery.beat.Service`), installs signal handlers, and starts the service.
* **Beat Service (`celery/beat.py:Service`):** This class manages the lifecycle of the scheduler. Its `start()` method contains the main loop that repeatedly calls `scheduler.tick()`. It loads the scheduler class specified in the configuration (defaulting to `PersistentScheduler`).
* **Scheduler (`celery/beat.py:Scheduler` / `PersistentScheduler`):** This is the core logic.
* `Scheduler` is the base class. Its `tick()` method calculates the time until the next event, finds due tasks, calls `apply_entry` for due tasks, and returns the sleep interval.
* `PersistentScheduler` inherits from `Scheduler` and adds the logic to load/save the schedule state (last run times) using `shelve` (the `celerybeat-schedule` file). It overrides methods like `setup_schedule`, `sync`, `close`, and `schedule` property to interact with the `shelve` store (`self._store`).
* **Schedule Types (`celery/schedules.py`):** Defines classes like `schedule` (for `timedelta` intervals) and `crontab`. These classes implement the `is_due(last_run_at)` method, which the `Scheduler.tick()` method uses to determine if a task entry should run.
A simplified conceptual look at the `beat_schedule` config structure:
```python
# Example structure from celeryconfig.py
beat_schedule = {
'schedule-name-1': { # Unique name for this entry
'task': 'my_app.tasks.task1', # Task to run (module.task_name)
'schedule': 30.0, # When to run (e.g., seconds, timedelta, crontab)
'args': (arg1, arg2), # Optional: Positional arguments
'kwargs': {'key': 'value'}, # Optional: Keyword arguments
'options': {'queue': 'hipri'},# Optional: Execution options
},
'schedule-name-2': {
'task': 'my_app.tasks.task2',
'schedule': crontab(minute=0, hour=0), # e.g., Run at midnight
# ... other options ...
},
}
```
And a very simplified concept of the `Scheduler.tick()` method:
```python
# Simplified conceptual logic of Scheduler.tick()
def tick(self):
remaining_times = []
due_tasks = []
# 1. Iterate through schedule entries
for entry in self.schedule.values(): # self.schedule reads from PersistentScheduler._store['entries']
# 2. Check if entry is due using its schedule object (e.g., crontab)
is_due, next_time_to_run = entry.is_due() # Calls schedule.is_due(entry.last_run_at)
if is_due:
due_tasks.append(entry)
else:
remaining_times.append(next_time_to_run) # Store time until next check
# 3. Apply due tasks (send message to broker)
for entry in due_tasks:
self.apply_entry(entry) # Sends task message and updates entry's last_run_at in schedule store
# 4. Calculate minimum sleep time until next event
return min(remaining_times + [self.max_interval])
```
## Conclusion
Celery Beat is your tool for automating task execution within the Celery ecosystem.
* It acts as a **scheduler**, like an alarm clock or `cron` for Celery tasks.
* It runs as a **separate process** (`celery beat`).
* You define the schedule using the `beat_schedule` setting in your configuration, specifying **what** tasks run, **when** (using intervals or crontabs), and with what **arguments**.
* Beat **sends task messages** to the broker at the scheduled times.
* Running **Workers** are still required to pick up and execute these tasks.
Beat allows you to reliably automate recurring background jobs, from simple periodic checks to complex, time-specific operations.
Now that we know how to run individual tasks, get their results, and schedule them automatically, what if we want to create more complex workflows involving multiple tasks that depend on each other? That's where Celery's Canvas comes in.
**Next:** [Chapter 8: Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,343 @@
# Chapter 8: Canvas (Signatures & Primitives) - Building Task Workflows
In the previous chapter, [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md), we learned how to schedule tasks to run automatically at specific times using Celery Beat. This is great for recurring jobs. But what if you need to run a sequence of tasks, where one task depends on the result of another? Or run multiple tasks in parallel and then collect their results?
Imagine you're building a feature where a user uploads an article, and you need to:
1. Fetch the article content from a URL.
2. Process the text to extract keywords.
3. Process the text to detect the language.
4. Once *both* processing steps are done, save the article and the extracted metadata to your database.
Simply running these tasks independently won't work. Keyword extraction and language detection can happen at the same time, but only *after* the content is fetched. Saving can only happen *after* both processing steps are complete. How do you orchestrate this multi-step workflow?
This is where **Celery Canvas** comes in. It provides the building blocks to design complex task workflows.
## What Problem Does Canvas Solve?
Canvas helps you connect individual [Task](03_task.md)s together to form more sophisticated processes. It solves the problem of defining dependencies and flow control between tasks. Instead of just firing off tasks one by one and hoping they complete in the right order or manually checking results, Canvas lets you declare the desired workflow structure directly.
Think of it like having different types of Lego bricks:
* Some bricks represent a single task.
* Other bricks let you connect tasks end-to-end (run in sequence).
* Some let you stack bricks side-by-side (run in parallel).
* Others let you build a structure where several parallel steps must finish before the next piece is added.
Canvas gives you these connecting bricks for your Celery tasks.
## Key Concepts: Signatures and Primitives
The core ideas in Canvas are **Signatures** and **Workflow Primitives**.
1. **Signature (`signature` or `.s()`): The Basic Building Block**
* A `Signature` wraps up everything needed to call a single task: the task's name, the arguments (`args`), the keyword arguments (`kwargs`), and any execution options (like `countdown`, `eta`, queue name).
* Think of it as a **pre-filled request form** or a **recipe card** for a specific task execution. It doesn't *run* the task immediately; it just holds the plan for running it.
* The easiest way to create a signature is using the `.s()` shortcut on a task function.
```python
# tasks.py
from celery_app import app # Assuming app is defined in celery_app.py
@app.task
def add(x, y):
return x + y
# Create a signature for add(2, 3)
add_sig = add.s(2, 3)
# add_sig now holds the 'plan' to run add(2, 3)
print(f"Signature: {add_sig}")
print(f"Task name: {add_sig.task}")
print(f"Arguments: {add_sig.args}")
# To actually run it, you call .delay() or .apply_async() ON the signature
# result_promise = add_sig.delay()
```
**Output:**
```text
Signature: tasks.add(2, 3)
Task name: tasks.add
Arguments: (2, 3)
```
2. **Primitives: Connecting the Blocks**
Canvas provides several functions (primitives) to combine signatures into workflows:
* **`chain`:** Links tasks sequentially. The result of the first task is passed as the first argument to the second task, and so on.
* Analogy: An assembly line where each station passes its output to the next.
* Syntax: `(sig1 | sig2 | sig3)` or `chain(sig1, sig2, sig3)`
* **`group`:** Runs a list of tasks in parallel. It returns a special result object that helps track the group.
* Analogy: Hiring several workers to do similar jobs independently at the same time.
* Syntax: `group(sig1, sig2, sig3)`
* **`chord`:** Runs a group of tasks in parallel (the "header"), and *then*, once *all* tasks in the group have finished successfully, it runs a single callback task (the "body") with the results of the header tasks.
* Analogy: A team of researchers works on different parts of a project in parallel. Once everyone is done, a lead researcher collects all the findings to write the final report.
* Syntax: `chord(group(header_sigs), body_sig)`
There are other primitives like `chunks`, `xmap`, and `starmap`, but `chain`, `group`, and `chord` are the most fundamental ones for building workflows.
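Before building the full workflow, here is a tiny tour of `chain` and `group` using the `add` task from earlier chapters (a sketch; it assumes a worker and a result backend are running). `chord` needs a callback task that accepts a list, so it's demonstrated with `combine_results` in the workflow below:

```python
from celery import chain, group
from tasks import add  # the add(x, y) task from earlier chapters

# chain: the result of add(2, 2) (i.e. 4) becomes the first argument of the
# next signature, so the second step computes add(4, 4) = 8.
chain_promise = (add.s(2, 2) | add.s(4)).apply_async()
print("chain result:", chain_promise.get(timeout=10))    # -> 8

# group: run three add tasks in parallel and collect all their results.
group_promise = group(add.s(i, i) for i in range(3)).apply_async()
print("group results:", group_promise.get(timeout=10))   # -> [0, 2, 4]
```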
## How to Use Canvas: Building the Article Processing Workflow
Let's build the workflow we described earlier: Fetch -> (Process Keywords & Detect Language in parallel) -> Save.
**1. Define the Tasks**
First, we need our basic tasks. Let's create dummy versions in `tasks.py`:
```python
# tasks.py
from celery_app import app
import time
import random
@app.task
def fetch_data(url):
print(f"Fetching data from {url}...")
time.sleep(1)
# Simulate fetching some data
data = f"Content from {url} - {random.randint(1, 100)}"
print(f"Fetched: {data}")
return data
@app.task
def process_part_a(data):
print(f"Processing Part A for: {data}")
time.sleep(2)
result_a = f"Keywords for '{data}'"
print("Part A finished.")
return result_a
@app.task
def process_part_b(data):
print(f"Processing Part B for: {data}")
time.sleep(3) # Simulate slightly longer processing
result_b = f"Language for '{data}'"
print("Part B finished.")
return result_b
@app.task
def combine_results(results):
# 'results' will be a list containing the return values
# of process_part_a and process_part_b
print(f"Combining results: {results}")
time.sleep(1)
final_output = f"Combined: {results[0]} | {results[1]}"
print(f"Final Output: {final_output}")
return final_output
```
**2. Define the Workflow Using Canvas**
Now, in a separate script or Python shell, let's define the workflow using signatures and primitives.
```python
# run_workflow.py
from celery import chain, group, chord
from tasks import fetch_data, process_part_a, process_part_b, combine_results
# The URL we want to process
article_url = "http://example.com/article1"
# Create the workflow structure
# 1. Fetch data. The result (data) is passed to the next step.
# 2. The next step is a chord:
# - Header: A group running process_part_a and process_part_b in parallel.
# Both tasks receive the 'data' from fetch_data.
# - Body: combine_results receives a list of results from the group.
workflow = chain(
fetch_data.s(article_url), # Step 1: Fetch
chord( # Step 2: Chord
group(process_part_a.s(), process_part_b.s()), # Header: Parallel processing
combine_results.s() # Body: Combine results
)
)
print(f"Workflow definition:\n{workflow}")
# Start the workflow
print("\nSending workflow to Celery...")
result_promise = workflow.apply_async()
print(f"Workflow sent! Final result ID: {result_promise.id}")
print("Run a Celery worker to execute the tasks.")
# You can optionally wait for the final result:
# final_result = result_promise.get()
# print(f"\nWorkflow finished! Final result: {final_result}")
```
**Explanation:**
* We import `chain`, `group`, `chord` from `celery`.
* We import our task functions.
* `fetch_data.s(article_url)`: Creates a signature for the first step.
* `process_part_a.s()` and `process_part_b.s()`: Create signatures for the parallel tasks. Note that we *don't* provide the `data` argument here. `chain` automatically passes the result of `fetch_data` to the *next* task in the sequence. Since the next task is a `chord` containing a `group`, Celery cleverly passes the `data` to *each* task within that group.
* `combine_results.s()`: Creates the signature for the final step (the chord's body). It doesn't need arguments initially because the `chord` will automatically pass the list of results from the header group to it.
* `chain(...)`: Connects `fetch_data` to the `chord`.
* `chord(group(...), ...)`: Defines that the group must finish before `combine_results` is called.
* `group(...)`: Defines that `process_part_a` and `process_part_b` run in parallel.
* `workflow.apply_async()`: This sends the *first* task (`fetch_data`) to the broker. The rest of the workflow is encoded in the task's options (like `link` or `chord` information) so that Celery knows what to do next after each step completes.
If you run this script (and have a [Worker](05_worker.md) running), you'll see the tasks execute in the worker logs, respecting the defined dependencies and parallelism. `fetch_data` runs first, then `process_part_a` and `process_part_b` run concurrently, and finally `combine_results` runs after both A and B are done.
## How It Works Internally (Simplified Walkthrough)
Let's trace a simpler workflow: `my_chain = (add.s(2, 2) | add.s(4))`
1. **Workflow Definition:** When you create `my_chain`, Celery creates a `chain` object containing the signatures `add.s(2, 2)` and `add.s(4)`.
2. **Sending (`my_chain.apply_async()`):**
* Celery looks at the first task in the chain: `add.s(2, 2)`.
* It prepares to send this task message to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md).
* Crucially, it adds a special option to the message, often called `link` (or uses the `chain` field in newer protocols). This option contains the *signature* of the next task in the chain: `add.s(4)`.
* The message for `add(2, 2)` (with the link to `add(4)`) is sent to the broker.
3. **Worker 1 Executes First Task:**
* A [Worker](05_worker.md) picks up the message for `add(2, 2)`.
* It runs the `add` function with arguments `(2, 2)`. The result is `4`.
* The worker stores the result `4` in the [Result Backend](06_result_backend.md) (if configured).
* The worker notices the `link` option in the original message, pointing to `add.s(4)`.
4. **Worker 1 Sends Second Task:**
* The worker takes the result of the first task (`4`).
* It uses the linked signature `add.s(4)`.
* It *prepends* the result (`4`) to the arguments of the linked signature, making it effectively `add.s(4, 4)`. *(The first `4` is the result of the previous task; the second `4` comes from `add.s(4)` in the chain definition.)*
* It sends a *new* message to the broker for `add(4, 4)`.
5. **Worker 2 Executes Second Task:**
* Another (or the same) worker picks up the message for `add(4, 4)`.
* It runs `add(4, 4)`. The result is `8`.
* It stores the result `8` in the backend.
* There are no more links, so the chain is complete.
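Under the same assumption (the `add` task from earlier in this chapter), the chain traced above can be sketched like this:
```python
from celery import chain
from tasks import add  # the add task defined earlier

my_chain = add.s(2, 2) | add.s(4)   # equivalent to chain(add.s(2, 2), add.s(4))
promise = my_chain.apply_async()    # sends add(2, 2) with a link to add.s(4)

# promise is the AsyncResult of the *last* task in the chain
# promise.get() -> 8, once a worker has processed both tasks
```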
`group` works by sending all task messages in the group concurrently. `chord` is more complex; it involves the workers coordinating via the [Result Backend](06_result_backend.md) to count completed tasks in the header before the callback task is finally sent.
```mermaid
sequenceDiagram
participant Client as Your Code
participant Canvas as workflow = chain(...)
participant Broker as Message Broker
participant Worker as Celery Worker
Client->>Canvas: workflow.apply_async()
Note over Canvas: Prepare msg for add(2, 2) with link=add.s(4)
Canvas->>Broker: Send Task 1 msg ('add', (2, 2), link=add.s(4), id=T1)
Broker-->>Canvas: Ack
Canvas-->>Client: Return AsyncResult(id=T2) # ID of the *last* task in chain
Worker->>Broker: Fetch msg (T1)
Broker-->>Worker: Deliver Task 1 msg
Worker->>Worker: Execute add(2, 2) -> returns 4
Note over Worker: Store result 4 for T1 in Backend
Worker->>Worker: Check 'link' option -> add.s(4)
Note over Worker: Prepare msg for add(4, 4) using result 4 + linked args
Worker->>Broker: Send Task 2 msg ('add', (4, 4), id=T2)
Broker-->>Worker: Ack
Worker->>Broker: Ack Task 1 msg complete
Worker->>Broker: Fetch msg (T2)
Broker-->>Worker: Deliver Task 2 msg
Worker->>Worker: Execute add(4, 4) -> returns 8
Note over Worker: Store result 8 for T2 in Backend
Worker->>Broker: Ack Task 2 msg complete
```
## Code Dive: Canvas Implementation
The logic for signatures and primitives resides primarily in `celery/canvas.py`.
* **`Signature` Class:**
* Defined in `celery/canvas.py`. It's essentially a dictionary subclass holding `task`, `args`, `kwargs`, `options`, etc.
* The `.s()` method on a `Task` instance (in `celery/app/task.py`) is a shortcut to create a `Signature`.
* `apply_async`: Prepares arguments/options by calling `_merge` and then delegates to `self.type.apply_async` (the task's method) or `app.send_task`.
* `link`, `link_error`: Methods that modify the `options` dictionary to add callbacks.
* `__or__`: The pipe operator (`|`) overload. It checks the type of the right-hand operand (`other`) and constructs a `_chain` object accordingly.
```python
# Simplified from celery/canvas.py
class Signature(dict):
# ... methods like __init__, clone, set, apply_async ...
def link(self, callback):
# Appends callback signature to the 'link' list in options
return self.append_to_list_option('link', callback)
def link_error(self, errback):
# Appends errback signature to the 'link_error' list in options
return self.append_to_list_option('link_error', errback)
def __or__(self, other):
# Called when you use the pipe '|' operator
if isinstance(other, Signature):
# task | task -> chain
return _chain(self, other, app=self._app)
# ... other cases for group, chain ...
return NotImplemented
```
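As a small illustration of these methods (again assuming the `add` task from earlier), callbacks and the pipe operator can be used like this:
```python
from tasks import add

sig = add.s(2, 2)
sig.link(add.s(8))   # once sig runs and succeeds, the worker also sends add(result, 8)

pipeline = add.s(2, 2) | add.s(4)   # __or__ builds a chain from the two signatures
# pipeline.apply_async() sends add(2, 2) first; add(4, 4) follows automatically
```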
* **`_chain` Class:**
* Also in `celery/canvas.py`, inherits from `Signature`. Its `task` name is hardcoded to `'celery.chain'`. The actual task signatures are stored in `kwargs['tasks']`.
* `apply_async` / `run`: Contains the logic to handle sending the first task with the rest of the chain embedded in the options (either via `link` for protocol 1 or the `chain` message property for protocol 2).
* `prepare_steps`: This complex method recursively unwraps nested primitives (like a chain within a chain, or a group that needs to become a chord) and sets up the linking between steps.
```python
# Simplified concept from celery/canvas.py (chain execution)
class _chain(Signature):
# ... __init__, __or__ ...
def apply_async(self, args=None, kwargs=None, **options):
# ... handle always_eager ...
return self.run(args, kwargs, app=self.app, **options)
def run(self, args=None, kwargs=None, app=None, **options):
# ... setup ...
tasks, results = self.prepare_steps(...) # Unroll and freeze tasks
if results: # If there are tasks to run
first_task = tasks.pop() # Get the first task (list is reversed)
remaining_chain = tasks if tasks else None
# Determine how to pass the chain info (link vs. message field)
use_link = self._use_link # ... logic to decide ...
if use_link:
# Protocol 1: Link first task to the second task
if remaining_chain:
first_task.link(remaining_chain.pop())
# (Worker handles subsequent links)
options_to_apply = options # Pass original options
else:
# Protocol 2: Embed the rest of the reversed chain in options
options_to_apply = ChainMap({'chain': remaining_chain}, options)
# Send the *first* task only
result_from_apply = first_task.apply_async(**options_to_apply)
# Return AsyncResult of the *last* task in the original chain
return results[0]
```
* **`group` Class:**
* In `celery/canvas.py`. Its `task` name is `'celery.group'`.
* `apply_async`: Iterates through its `tasks`, freezes each one (assigning a common `group_id`), sends their messages, and collects the `AsyncResult` objects into a `GroupResult`. It uses a `barrier` (from the `vine` library) to track completion.
* **`chord` Class:**
* In `celery/canvas.py`. Its `task` name is `'celery.chord'`.
* `apply_async` / `run`: Coordinates with the result backend (`backend.apply_chord`). It typically runs the header `group` first, configuring it to notify the backend upon completion. The backend then triggers the `body` task once the count is reached.
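A sketch of how a chord is typically invoked, matching that description (`tsum`, a task that sums a list of numbers, is assumed to exist alongside `add`):
```python
from celery import chord
from tasks import add, tsum  # tsum: assumed task that sums a list of numbers

header = [add.s(i, i) for i in range(10)]
promise = chord(header)(tsum.s())   # shorthand for chord(header, tsum.s()).apply_async()
# promise.get() -> 90, once every header task has finished and the backend
# has triggered the tsum callback with the list of results
```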
## Conclusion
Celery Canvas transforms simple tasks into powerful workflow components.
* A **Signature** (`task.s()`) captures the details for a single task call without running it.
* Primitives like **`chain`** (`|`), **`group`**, and **`chord`** combine signatures to define complex execution flows:
* `chain`: Sequence (output of one to input of next).
* `group`: Parallel execution.
* `chord`: Parallel execution followed by a callback with all results.
* You compose these primitives like building with Lego bricks to model your application's logic.
* Calling `.apply_async()` on a workflow primitive starts the process by sending the first task(s), embedding the rest of the workflow logic in the task options or using backend coordination.
Canvas allows you to move complex orchestration logic out of your application code and into Celery, making your tasks more modular and your overall system more robust.
Now that you can build and run complex workflows, how do you monitor what's happening inside Celery? How do you know when tasks start, finish, or fail in real-time?
**Next:** [Chapter 9: Events](09_events.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

310
output/Celery/09_events.md Normal file
View File

@@ -0,0 +1,310 @@
# Chapter 9: Events - Listening to Celery's Heartbeat
In [Chapter 8: Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md), we saw how to build complex workflows by chaining tasks together or running them in parallel. But as your Celery system gets busier, you might wonder: "What are my workers doing *right now*? Which tasks have started? Which ones finished successfully or failed?"
Imagine you're running an important data processing job involving many tasks. Wouldn't it be great to have a live dashboard showing the progress, or get immediate notifications if something goes wrong? This is where **Celery Events** come in.
## What Problem Do Events Solve?
Celery Events provide a **real-time monitoring system** for your tasks and workers. Think of it like a live activity log or a notification system built into Celery.
Without events, finding out what happened requires checking logs or querying the [Result Backend](06_result_backend.md) for each task individually. This isn't ideal for getting a live overview of the entire cluster.
Events solve this by having workers broadcast messages (events) about important actions they take, such as:
* A worker coming online or going offline.
* A worker receiving a task.
* A worker starting to execute a task.
* A task succeeding or failing.
* A worker sending out a heartbeat signal.
Other programs can then listen to this stream of event messages to monitor the health and activity of the Celery cluster in real-time, build dashboards (like the popular tool Flower), or trigger custom alerts.
## Key Concepts
1. **Events:** Special messages sent by workers (and sometimes clients) describing an action. Each event has a `type` (e.g., `task-received`, `worker-online`) and contains details relevant to that action (like the task ID, worker hostname, timestamp).
2. **Event Exchange:** Events aren't sent to the regular task queues. They are published to a dedicated, named exchange on the [Broker Connection (AMQP)](04_broker_connection__amqp_.md). Think of it as a separate broadcast channel just for monitoring messages.
3. **Event Sender (`EventDispatcher`):** A component within the [Worker](05_worker.md) responsible for creating and sending event messages to the broker's event exchange. This is usually disabled by default for performance reasons.
4. **Event Listener (`EventReceiver`):** Any program that connects to the event exchange on the broker and consumes the stream of event messages. This could be the `celery events` command-line tool, Flower, or your own custom monitoring script.
5. **Event Types:** Celery defines many event types. Some common ones include:
* `worker-online`, `worker-offline`, `worker-heartbeat`: Worker status updates.
* `task-sent`: Client sent a task request (requires `task_send_sent_event` setting).
* `task-received`: Worker received the task message.
* `task-started`: Worker started executing the task code.
* `task-succeeded`: Task finished successfully.
* `task-failed`: Task failed with an error.
* `task-retried`: Task is being retried.
* `task-revoked`: Task was cancelled/revoked.
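Tasks can also broadcast custom event types of their own via `Task.send_event`. Here is a small sketch; the `task-progress` type and its `current`/`total` fields are our own invention, not something built into Celery:
```python
# tasks.py
from celery_app import app

@app.task(bind=True)
def long_job(self, n):
    for i in range(n):
        # publish a custom event that any event listener can pick up
        self.send_event('task-progress', current=i + 1, total=n)
    return n
```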
## How to Use Events: Simple Monitoring
Let's see how to enable events and watch the live stream using Celery's built-in tool.
**1. Enable Events in the Worker**
By default, workers don't send events to save resources. You need to explicitly tell them to start sending. You can do this in two main ways:
* **Command-line flag (`-E`):** When starting your worker, add the `-E` flag.
```bash
# Start a worker AND enable sending events
celery -A celery_app worker --loglevel=info -E
```
* **Configuration Setting:** Set `worker_send_task_events = True` in your Celery configuration ([Chapter 2: Configuration](02_configuration.md)). This is useful if you always want events enabled for workers using that configuration. You can also enable worker-specific events (`worker-online`, `worker-heartbeat`) with `worker_send_worker_events = True` (which defaults to True).
```python
# celeryconfig.py (example)
broker_url = 'redis://localhost:6379/0'
result_backend = 'redis://localhost:6379/1'
imports = ('tasks',)
# Enable sending task-related events
task_send_sent_event = False # Set to True if you also want task-sent events from clients
worker_send_task_events = True
worker_send_worker_events = True # Usually True by default
```
Now, any worker started with this configuration (or the `-E` flag) will publish events to the broker.
**2. Watch the Event Stream**
Celery provides a command-line tool called `celery events` that acts as a simple event listener and prints the events it receives to your console.
Open **another terminal** (while your worker with events enabled is running) and run:
```bash
# Watch for events associated with your app
celery -A celery_app events
```
You can also use the remote control commands `celery -A celery_app control enable_events` and `celery -A celery_app control disable_events` to tell already-running workers to start or stop sending events without restarting them.
**What You'll See:**
Initially, `celery events` might show nothing. Now, try sending a task from another script or shell (like the `run_tasks.py` from [Chapter 3: Task](03_task.md)):
```python
# In a third terminal/shell
from tasks import add
result = add.delay(5, 10)
print(f"Sent task {result.id}")
```
Switch back to the terminal running `celery events`. You should see output similar to this (details and timestamps will vary):
```text
-> celery events v5.x.x
-> connected to redis://localhost:6379/0
-------------- task-received celery@myhostname [2023-10-27 12:00:01.100]
uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef
name:tasks.add
args:[5, 10]
kwargs:{}
retries:0
eta:null
hostname:celery@myhostname
timestamp:1666872001.1
pid:12345
...
-------------- task-started celery@myhostname [2023-10-27 12:00:01.150]
uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef
hostname:celery@myhostname
timestamp:1666872001.15
pid:12345
...
-------------- task-succeeded celery@myhostname [2023-10-27 12:00:04.200]
uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef
result:'15'
runtime:3.05
hostname:celery@myhostname
timestamp:1666872004.2
pid:12345
...
```
**Explanation:**
* `celery events` connects to the broker defined in `celery_app`.
* It listens for messages on the event exchange.
* As the worker processes the `add(5, 10)` task, it sends `task-received`, `task-started`, and `task-succeeded` events.
* `celery events` receives these messages and prints their details.
This gives you a raw, real-time feed of what's happening in your Celery cluster!
**Flower: A Visual Monitor**
While `celery events` is useful, it's quite basic. A very popular tool called **Flower** uses the same event stream to provide a web-based dashboard for monitoring your Celery cluster. It shows running tasks, completed tasks, worker status, task details, and more, all updated in real-time thanks to Celery Events. You can typically install it (`pip install flower`) and run it (`celery -A celery_app flower`).
## How It Works Internally (Simplified)
1. **Worker Action:** A worker performs an action (e.g., starts executing task `T1`).
2. **Event Dispatch:** If events are enabled, the worker's internal `EventDispatcher` component is notified.
3. **Create Event Message:** The `EventDispatcher` creates a dictionary representing the event (e.g., `{'type': 'task-started', 'uuid': 'T1', 'hostname': 'worker1', ...}`).
4. **Publish to Broker:** The `EventDispatcher` uses its connection to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) to publish this event message to a specific **event exchange** (usually named `celeryev`). It uses a routing key based on the event type (e.g., `task.started`).
5. **Listener Connects:** A monitoring tool (like `celery events` or Flower) starts up. It creates an `EventReceiver`.
6. **Declare Queue:** The `EventReceiver` connects to the same broker and declares a temporary, unique queue bound to the event exchange (`celeryev`), often configured to receive all event types (`#` routing key).
7. **Consume Events:** The `EventReceiver` starts consuming messages from its dedicated queue.
8. **Process Event:** When an event message (like the `task-started` message for `T1`) arrives from the broker, the `EventReceiver` decodes it and passes it to a handler (e.g., `celery events` prints it, Flower updates its web UI).
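Putting that flow into code, a minimal custom listener might look like the sketch below. It uses `app.events.Receiver` and assumes the same `celery_app` module used in the earlier examples:
```python
# monitor.py
from celery_app import app

def on_task_succeeded(event):
    print(f"Task {event['uuid']} succeeded in {event['runtime']:.2f}s")

def on_any_event(event):
    print(f"{event['type']} from {event.get('hostname', '?')}")

with app.connection() as connection:
    receiver = app.events.Receiver(connection, handlers={
        'task-succeeded': on_task_succeeded,
        '*': on_any_event,  # catch-all handler for every other event type
    })
    receiver.capture(limit=None, timeout=None, wakeup=True)
```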
```mermaid
sequenceDiagram
participant Worker
participant Dispatcher as EventDispatcher (in Worker)
participant Broker as Message Broker
participant Receiver as EventReceiver (e.g., celery events tool)
participant Display as Console/UI
Worker->>Worker: Starts executing Task T1
Worker->>Dispatcher: Notify: Task T1 started
Dispatcher->>Dispatcher: Create event message {'type': 'task-started', ...}
Dispatcher->>Broker: Publish event msg to 'celeryev' exchange (routing_key='task.started')
Broker-->>Dispatcher: Ack (Message Sent)
Receiver->>Broker: Connect and declare unique queue bound to 'celeryev' exchange
Broker-->>Receiver: Queue ready
Broker->>Receiver: Deliver event message {'type': 'task-started', ...}
Receiver->>Receiver: Decode message
Receiver->>Display: Process event (e.g., print to console)
```
## Code Dive: Sending and Receiving Events
* **Enabling Events (`celery/worker/consumer/events.py`):** The `Events` bootstep in the worker process is responsible for initializing the `EventDispatcher`. The `-E` flag or configuration settings control whether this bootstep actually enables the dispatcher.
```python
# Simplified from worker/consumer/events.py
class Events(bootsteps.StartStopStep):
requires = (Connection,)
def __init__(self, c, task_events=True, # Controlled by config/flags
# ... other flags ...
**kwargs):
self.send_events = task_events # or other flags
self.enabled = self.send_events
# ...
super().__init__(c, **kwargs)
def start(self, c):
# ... gets connection ...
# Creates the actual dispatcher instance
dis = c.event_dispatcher = c.app.events.Dispatcher(
c.connection_for_write(),
hostname=c.hostname,
enabled=self.send_events, # Only sends if enabled
# ... other options ...
)
# ... flush buffer ...
```
* **Sending Events (`celery/events/dispatcher.py`):** The `EventDispatcher` class has the `send` method, which creates the event dictionary and calls `publish`.
```python
# Simplified from events/dispatcher.py
class EventDispatcher:
# ... __init__ setup ...
def send(self, type, blind=False, ..., **fields):
if self.enabled:
groups, group = self.groups, group_from(type)
if groups and group not in groups:
return # Don't send if this group isn't enabled
# ... potential buffering logic (omitted) ...
# Call publish to actually send
return self.publish(type, fields, self.producer, blind=blind,
Event=Event, ...)
def publish(self, type, fields, producer, blind=False, Event=Event, **kwargs):
# Create the event dictionary
clock = None if blind else self.clock.forward()
event = Event(type, hostname=self.hostname, utcoffset=utcoffset(),
pid=self.pid, clock=clock, **fields)
# Publish using the underlying Kombu producer
with self.mutex:
return self._publish(event, producer,
routing_key=type.replace('-', '.'), **kwargs)
def _publish(self, event, producer, routing_key, **kwargs):
exchange = self.exchange # The dedicated event exchange
try:
# Kombu's publish method sends the message
producer.publish(
event, # The dictionary payload
routing_key=routing_key,
exchange=exchange.name,
declare=[exchange], # Ensure exchange exists
serializer=self.serializer, # e.g., 'json'
headers=self.headers,
delivery_mode=self.delivery_mode, # e.g., transient
**kwargs
)
except Exception as exc:
# ... error handling / buffering ...
raise
```
* **Receiving Events (`celery/events/receiver.py`):** The `EventReceiver` class (used by tools like `celery events`) sets up a consumer to listen for messages on the event exchange.
```python
# Simplified from events/receiver.py
class EventReceiver(ConsumerMixin): # Uses Kombu's ConsumerMixin
def __init__(self, channel, handlers=None, routing_key='#', ...):
# ... setup app, channel, handlers ...
self.exchange = get_exchange(..., name=self.app.conf.event_exchange)
self.queue = Queue( # Create a unique, auto-deleting queue
'.'.join([self.queue_prefix, self.node_id]),
exchange=self.exchange,
routing_key=routing_key, # Often '#' to get all events
auto_delete=True, durable=False,
# ... other queue options ...
)
# ...
def get_consumers(self, Consumer, channel):
# Tell ConsumerMixin to consume from our event queue
return [Consumer(queues=[self.queue],
callbacks=[self._receive], # Method to call on message
no_ack=True, # Events usually don't need explicit ack
accept=self.accept)]
# This method is registered as the callback for new messages
def _receive(self, body, message):
# Decode message body (can be single event or list in newer Celery)
if isinstance(body, list):
process, from_message = self.process, self.event_from_message
[process(*from_message(event)) for event in body]
else:
self.process(*self.event_from_message(body))
# process() calls the appropriate handler from self.handlers
def process(self, type, event):
"""Process event by dispatching to configured handler."""
handler = self.handlers.get(type) or self.handlers.get('*')
handler and handler(event) # Call the handler function
```
## Conclusion
Celery Events provide a powerful mechanism for **real-time monitoring** of your distributed task system.
* Workers (when enabled via `-E` or configuration) send **event messages** describing their actions (like task start/finish, worker online).
* These messages go to a dedicated **event exchange** on the broker.
* Tools like `celery events` or Flower act as **listeners** (`EventReceiver`), consuming this stream to provide insights into the cluster's activity.
* Events are the foundation for building dashboards, custom monitoring, and diagnostic tools.
Understanding events helps you observe and manage your Celery application more effectively.
So far, we've explored the major components and concepts of Celery. But how does a worker actually start up? How does it initialize all these different parts like the connection, the consumer, the event dispatcher, and the execution pool in the right order? That's orchestrated by a system called Bootsteps.
**Next:** [Chapter 10: Bootsteps](10_bootsteps.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,227 @@
# Chapter 10: Bootsteps - How Celery Workers Start Up
In [Chapter 9: Events](09_events.md), we learned how to monitor the real-time activity within our Celery system. We've now covered most of the key parts of Celery: the [Celery App](01_celery_app.md), [Task](03_task.md)s, the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), the [Worker](05_worker.md), the [Result Backend](06_result_backend.md), [Beat (Scheduler)](07_beat__scheduler_.md), [Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md), and [Events](09_events.md).
But have you ever wondered how the Celery worker manages to get all these different parts working together when you start it? When you run `celery worker`, it needs to connect to the broker, set up the execution pool, start listening for tasks, maybe start the event dispatcher, and possibly even start an embedded Beat scheduler. How does it ensure all these things happen in the correct order? That's where **Bootsteps** come in.
## What Problem Do Bootsteps Solve?
Imagine you're assembling a complex piece of furniture. You have many parts and screws, and the instructions list a specific sequence of steps. You can't attach the tabletop before you've built the legs! Similarly, a Celery worker has many internal components that need to be initialized and started in a precise order.
For example, the worker needs to:
1. Establish a connection to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md).
2. *Then*, start the consumer logic that uses this connection to fetch tasks.
3. Set up the execution pool (like prefork or eventlet) that will actually run the tasks.
4. Start optional components like the [Events](09_events.md) dispatcher or the embedded [Beat (Scheduler)](07_beat__scheduler_.md).
If these steps happen out of order (e.g., trying to fetch tasks before connecting to the broker), the worker will fail.
**Bootsteps** provide a framework within Celery to define this startup (and shutdown) sequence. It's like the assembly instructions or a detailed checklist for the worker. Each major component or initialization phase is defined as a "step," and steps can declare dependencies on each other (e.g., "Step B requires Step A to be finished"). Celery uses this information to automatically figure out the correct order to start everything up and, just as importantly, the correct reverse order to shut everything down cleanly.
This makes the worker's internal structure more organized, modular, and easier for Celery developers to extend with new features. As a user, you generally don't write bootsteps yourself, but understanding the concept helps demystify the worker's startup process.
## Key Concepts
1. **Step (`Step`):** A single, distinct part of the worker's startup or shutdown logic. Think of it as one instruction in the assembly manual. Examples include initializing the broker connection, starting the execution pool, or starting the component that listens for task messages (the consumer).
2. **Blueprint (`Blueprint`):** A collection of related steps that manage a larger component. For instance, the main "Consumer" component within the worker has its own blueprint defining steps for connection, event handling, task fetching, etc.
3. **Dependencies (`requires`):** A step can declare that it needs other steps to be completed first. For example, the step that starts fetching tasks (`Tasks`) *requires* the step that establishes the broker connection (`Connection`).
4. **Order:** Celery analyzes the `requires` declarations of all steps within a blueprint (and potentially across blueprints) to build a dependency graph. It then sorts this graph to determine the exact order in which steps must be started. Shutdown usually happens in the reverse order.
## How It Works: The Worker Startup Sequence
You don't typically interact with bootsteps directly, but you see their effect every time you start a worker.
When you run:
`celery -A your_app worker --loglevel=info`
Celery initiates the **Worker Controller** (`WorkController`). This controller uses the Bootstep framework, specifically a main **Blueprint**, to manage its initialization.
Here's a simplified idea of what happens under the hood, orchestrated by Bootsteps:
1. **Load Blueprint:** The `WorkController` loads its main blueprint, which includes steps for core functionalities.
2. **Build Graph:** Celery looks at all the steps defined in the blueprint (e.g., `Connection`, `Pool`, `Consumer`, `Timer`, `Events`, potentially `Beat`) and their `requires` attributes. It builds a dependency graph.
3. **Determine Order:** It calculates the correct startup order from the graph (a "topological sort"). For example, it determines that `Connection` must start before `Consumer`, and `Pool` must start before `Consumer` can start dispatching tasks to it.
4. **Execute Steps:** The `WorkController` iterates through the steps in the determined order and calls each step's `start` method.
* The `Connection` step establishes the link to the broker.
* The `Timer` step sets up internal timers.
* The `Pool` step initializes the execution pool (e.g., starts prefork child processes).
* The `Events` step starts the event dispatcher (if `-E` was used).
* The `Consumer` step (usually last) starts the main loop that fetches tasks from the broker and dispatches them to the pool.
5. **Worker Ready:** Once all essential bootsteps have successfully started, the worker prints the "ready" message and begins processing tasks.
When you stop the worker (e.g., with Ctrl+C), a similar process happens in reverse using the steps' `stop` or `terminate` methods, ensuring connections are closed, pools are shut down, etc., in the correct order.
## Internal Implementation Walkthrough
Let's visualize the simplified startup flow managed by bootsteps:
```mermaid
sequenceDiagram
participant CLI as `celery worker ...`
participant WorkerMain as Worker Main Process
participant Blueprint as Main Worker Blueprint
participant DepGraph as Dependency Graph Builder
participant Step1 as Connection Step
participant Step2 as Pool Step
participant Step3 as Consumer Step
CLI->>WorkerMain: Start worker command
WorkerMain->>Blueprint: Load blueprint definition (steps & requires)
Blueprint->>DepGraph: Define steps and dependencies
DepGraph->>Blueprint: Return sorted startup order [Step1, Step2, Step3]
WorkerMain->>Blueprint: Iterate through sorted steps
Blueprint->>Step1: Call start()
Step1-->>Blueprint: Connection established
Blueprint->>Step2: Call start()
Step2-->>Blueprint: Pool initialized
Blueprint->>Step3: Call start()
Step3-->>Blueprint: Consumer loop started
Blueprint-->>WorkerMain: Startup complete
WorkerMain->>WorkerMain: Worker is Ready
```
The Bootstep framework relies on classes defined mainly in `celery/bootsteps.py`.
## Code Dive: Anatomy of a Bootstep
Bootsteps are defined as classes inheriting from `Step` or `StartStopStep`.
* **Defining a Step:** A step class defines its logic and dependencies.
```python
# Simplified concept from celery/bootsteps.py
# Base class for all steps
class Step:
# List of other Step classes needed before this one runs
requires = ()
def __init__(self, parent, **kwargs):
# Called when the blueprint is applied to the parent (e.g., Worker)
# Can be used to set initial attributes on the parent.
pass
def create(self, parent):
# Create the service/component managed by this step.
# Often returns an object to be stored.
pass
def include(self, parent):
# Logic to add this step to the parent's step list.
# Called after __init__.
if self.should_include(parent):
self.obj = self.create(parent) # Store created object if needed
parent.steps.append(self)
return True
return False
# A common step type with start/stop/terminate methods
class StartStopStep(Step):
obj = None # Holds the object created by self.create
def start(self, parent):
# Logic to start the component/service
if self.obj and hasattr(self.obj, 'start'):
self.obj.start()
def stop(self, parent):
# Logic to stop the component/service gracefully
if self.obj and hasattr(self.obj, 'stop'):
self.obj.stop()
def terminate(self, parent):
# Logic to force shutdown (if different from stop)
if self.obj:
term_func = getattr(self.obj, 'terminate', None) or getattr(self.obj, 'stop', None)
if term_func:
term_func()
# include() method adds self to parent.steps if created
```
**Explanation:**
* `requires`: A tuple of other Step classes that must be fully started *before* this step's `start` method is called. This defines the dependencies.
* `__init__`, `create`, `include`: Methods involved in setting up the step and potentially creating the component it manages.
* `start`, `stop`, `terminate`: Methods called during the worker's lifecycle (startup, graceful shutdown, forced shutdown).
* **Blueprint:** Manages a collection of steps.
```python
# Simplified concept from celery/bootsteps.py
from celery.utils.graph import DependencyGraph
class Blueprint:
# Set of default step classes (or string names) included in this blueprint
default_steps = set()
def __init__(self, steps=None, name=None, **kwargs):
self.name = name or self.__class__.__name__
# Combine default steps with any provided steps
self.types = set(steps or []) | set(self.default_steps)
self.steps = {} # Will hold step instances
self.order = [] # Will hold sorted step instances
# ... other callbacks ...
def apply(self, parent, **kwargs):
# 1. Load step classes from self.types
step_classes = self.claim_steps() # {name: StepClass, ...}
# 2. Build the dependency graph
self.graph = DependencyGraph(
((Cls, Cls.requires) for Cls in step_classes.values()),
# ... formatter options ...
)
# 3. Get the topologically sorted order
sorted_classes = self.graph.topsort()
# 4. Instantiate and include each step
self.order = []
for S in sorted_classes:
step = S(parent, **kwargs) # Call Step.__init__
self.steps[step.name] = step
self.order.append(step)
for step in self.order:
step.include(parent) # Call Step.include -> Step.create
return self
def start(self, parent):
# Called by the parent (e.g., Worker) to start all steps
for step in self.order: # Use the sorted order
if hasattr(step, 'start'):
step.start(parent)
def stop(self, parent):
# Called by the parent to stop all steps (in reverse order)
for step in reversed(self.order):
if hasattr(step, 'stop'):
step.stop(parent)
# ... other methods like close, terminate, restart ...
```
**Explanation:**
* `default_steps`: Defines the standard components managed by this blueprint.
* `apply`: The core method that takes the step definitions, builds the `DependencyGraph` based on `requires`, gets the sorted execution `order`, and then instantiates and includes each step.
* `start`/`stop`: Iterate through the calculated `order` (or its reverse) to start/stop the components managed by each step.
* **Example Usage (Worker Components):** The worker's main components are defined as bootsteps in `celery/worker/components.py`. You can see classes like `Pool`, `Consumer`, `Timer`, `Beat`, each inheriting from `bootsteps.Step` or `bootsteps.StartStopStep` and potentially defining `requires`. The `Consumer` blueprint in `celery/worker/consumer/consumer.py` then lists many of these (`Connection`, `Events`, `Tasks`, etc.) in its `default_steps`.
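For completeness, here is a sketch of what a user-defined bootstep can look like when registered on the worker blueprint. The `InfoStep` name and its print statements are made up for illustration; the `requires` path points at the `Pool` step from `celery/worker/components.py`:
```python
from celery import Celery, bootsteps

class InfoStep(bootsteps.StartStopStep):
    # only start after the execution pool step has started
    requires = {'celery.worker.components:Pool'}

    def start(self, worker):
        print(f'InfoStep: worker {worker.hostname} is starting')

    def stop(self, worker):
        print('InfoStep: worker is shutting down')

app = Celery('tasks', broker='redis://localhost:6379/0')
app.steps['worker'].add(InfoStep)  # register the step with the worker blueprint
```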
## Conclusion
You've learned about Bootsteps, the underlying framework that brings order to the Celery worker's startup and shutdown procedures.
* They act as an **assembly guide** or **checklist** for the worker.
* Each core function (connecting, starting pool, consuming tasks) is a **Step**.
* Steps declare **Dependencies** (`requires`) on each other.
* A **Blueprint** groups related steps.
* Celery uses a **Dependency Graph** to determine the correct **order** to start and stop steps.
* This ensures components like the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), [Worker](05_worker.md) pool, and task consumer initialize and terminate predictably.
While you typically don't write bootsteps as an end-user, understanding their role clarifies how the complex machinery of a Celery worker reliably comes to life and shuts down.
---
This concludes our introductory tour of Celery's core concepts! We hope these chapters have given you a solid foundation for understanding how Celery works and how you can use it to build robust and scalable distributed applications. Happy tasking!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

50
output/Celery/index.md Normal file
View File

@@ -0,0 +1,50 @@
# Tutorial: Celery
Celery is a system for running **distributed tasks** *asynchronously*. You define *units of work* (Tasks) in your Python code. When you want a task to run, you send a message using a **message broker** (like RabbitMQ or Redis). One or more **Worker** processes are running in the background, listening for these messages. When a worker receives a message, it executes the corresponding task. Optionally, the task's result (or any error) can be stored in a **Result Backend** (like Redis or a database) so you can check its status or retrieve the output later. Celery helps manage this whole process, making it easier to handle background jobs, scheduled tasks, and complex workflows.
**Source Repository:** [https://github.com/celery/celery/tree/d1c35bbdf014f13f4ab698d75e3ea381a017b090/celery](https://github.com/celery/celery/tree/d1c35bbdf014f13f4ab698d75e3ea381a017b090/celery)
```mermaid
flowchart TD
A0["Celery App"]
A1["Task"]
A2["Worker"]
A3["Broker Connection (AMQP)"]
A4["Result Backend"]
A5["Canvas (Signatures & Primitives)"]
A6["Beat (Scheduler)"]
A7["Configuration"]
A8["Events"]
A9["Bootsteps"]
A0 -- "Defines and sends" --> A1
A0 -- "Uses for messaging" --> A3
A0 -- "Uses for results" --> A4
A0 -- "Loads and uses" --> A7
A1 -- "Updates state in" --> A4
A2 -- "Executes" --> A1
A2 -- "Fetches tasks from" --> A3
A2 -- "Uses for lifecycle" --> A9
A5 -- "Represents task invocation" --> A1
A6 -- "Sends scheduled tasks via" --> A3
A8 -- "Sends events via" --> A3
A9 -- "Manages connection via" --> A3
```
## Chapters
1. [Celery App](01_celery_app.md)
2. [Configuration](02_configuration.md)
3. [Task](03_task.md)
4. [Broker Connection (AMQP)](04_broker_connection__amqp_.md)
5. [Worker](05_worker.md)
6. [Result Backend](06_result_backend.md)
7. [Beat (Scheduler)](07_beat__scheduler_.md)
8. [Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md)
9. [Events](09_events.md)
10. [Bootsteps](10_bootsteps.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,197 @@
# Chapter 1: Commands and Groups: The Building Blocks
Welcome to your first step in learning Click! Imagine you want to create your own command-line tool, maybe something like `git` or `docker`. How do you tell your program what to do when someone types `git commit` or `docker build`? That's where **Commands** and **Groups** come in. They are the fundamental building blocks for any Click application.
Think about a simple tool. Maybe you want a program that can greet someone. You'd type `greet Alice` in your terminal, and it would print "Hello Alice!". In Click, this single action, "greet", would be represented by a `Command`.
Now, what if your tool needed to do *more* than one thing? Maybe besides greeting, it could also say goodbye. You might want to type `mytool greet Alice` or `mytool goodbye Bob`. The main `mytool` part acts like a container or a menu, holding the different actions (`greet`, `goodbye`). This container is what Click calls a `Group`.
So:
* `Command`: Represents a single action your tool can perform.
* `Group`: Represents a collection of related actions (Commands or other Groups).
Let's dive in and see how to create them!
## Your First Command
Creating a command in Click is surprisingly simple. You basically write a normal Python function and then "decorate" it to tell Click it's a command-line command.
Let's make a command that just prints "Hello World!".
```python
# hello_app.py
import click
@click.command()
def hello():
"""A simple command that says Hello World"""
print("Hello World!")
if __name__ == '__main__':
hello()
```
Let's break this down:
1. `import click`: We need to import the Click library first.
2. `@click.command()`: This is the magic part! It's called a decorator. It transforms the Python function `hello()` right below it into a Click `Command` object. We'll learn more about [Decorators](02_decorators.md) in the next chapter, but for now, just know this line turns `hello` into something Click understands as a command.
3. `def hello(): ...`: This is a standard Python function. The code inside this function is what will run when you execute the command from your terminal.
4. `"""A simple command that says Hello World"""`: This is a docstring. Click cleverly uses the function's docstring as the help text for the command!
5. `if __name__ == '__main__': hello()`: This standard Python construct checks if the script is being run directly. If it is, it calls our `hello` command function (which is now actually a Click `Command` object).
**Try running it!** Save the code above as `hello_app.py`. Open your terminal in the same directory and run:
```bash
$ python hello_app.py
Hello World!
```
It works! You just created your first command-line command with Click.
**Bonus: Automatic Help!**
Click automatically generates help screens for you. Try running your command with `--help`:
```bash
$ python hello_app.py --help
Usage: hello_app.py [OPTIONS]
A simple command that says Hello World
Options:
--help Show this message and exit.
```
See? Click used the docstring we wrote (`A simple command that says Hello World`) and added a standard `--help` option for free!
## Grouping Commands
Okay, one command is nice, but real tools often have multiple commands. Like `git` has `commit`, `pull`, `push`, etc. Let's say we want our tool to have two commands: `hello` and `goodbye`.
We need a way to group these commands together. That's what `click.group()` is for. A `Group` acts as the main entry point and can have other commands attached to it.
```python
# multi_app.py
import click
# 1. Create the main group
@click.group()
def cli():
"""A simple tool with multiple commands."""
pass # The group function itself doesn't need to do anything
# 2. Define the 'hello' command
@click.command()
def hello():
"""Says Hello World"""
print("Hello World!")
# 3. Define the 'goodbye' command
@click.command()
def goodbye():
"""Says Goodbye World"""
print("Goodbye World!")
# 4. Attach the commands to the group
cli.add_command(hello)
cli.add_command(goodbye)
if __name__ == '__main__':
cli() # Run the main group
```
What's changed?
1. We created a function `cli` and decorated it with `@click.group()`. This makes `cli` our main entry point, a container for other commands. Notice the function body is just `pass`: often the group function itself doesn't need any logic; its job is to hold other commands.
2. We defined `hello` and `goodbye` just like before, using `@click.command()`.
3. Crucially, we *attached* our commands to the group: `cli.add_command(hello)` and `cli.add_command(goodbye)`. This tells Click that `hello` and `goodbye` are subcommands of `cli`.
4. Finally, in the `if __name__ == '__main__':` block, we run `cli()`, our main group.
**Let's run this!** Save it as `multi_app.py`.
First, check the main help screen:
```bash
$ python multi_app.py --help
Usage: multi_app.py [OPTIONS] COMMAND [ARGS]...
A simple tool with multiple commands.
Options:
--help Show this message and exit.
Commands:
goodbye Says Goodbye World
hello Says Hello World
```
Look! Click now lists `goodbye` and `hello` under "Commands". It automatically figured out their names from the function names (`goodbye`, `hello`) and their help text from their docstrings.
Now, run the specific commands:
```bash
$ python multi_app.py hello
Hello World!
$ python multi_app.py goodbye
Goodbye World!
```
You've successfully created a multi-command CLI tool!
*(Preview: there's an even shorter way to attach commands using decorators directly on the group, which we'll see in [Decorators](02_decorators.md)!)*
## How It Works Under the Hood
What's really happening when you use `@click.command()` or `@click.group()`?
1. **Decoration:** The decorator (`@click.command` or `@click.group`) takes your Python function (`hello`, `goodbye`, `cli`). It wraps this function inside a Click object: either a `Command` instance or a `Group` instance (which is actually a special type of `Command`). These objects store your original function as the `callback` to be executed later. They also store metadata like the command name (derived from the function name) and the help text (from the docstring). You can find the code for these decorators in `decorators.py` and the `Command`/`Group` classes in `core.py`.
2. **Execution:** When you run `python multi_app.py hello`, Python executes the `cli()` call at the bottom. Since `cli` is a `Group` object created by Click, it knows how to parse the command-line arguments (`hello` in this case).
3. **Parsing & Dispatch:** The `cli` group looks at the first argument (`hello`). It checks its list of registered subcommands (which we added using `cli.add_command`). It finds a match with the `hello` command object.
4. **Callback:** The `cli` group then invokes the `hello` command object. The `hello` command object, in turn, calls the original Python function (`hello()`) that it stored earlier as its `callback`.
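You can exercise this parse-and-dispatch flow in-process using Click's test runner, without touching a real terminal. A small sketch, assuming `multi_app.py` from above is importable:
```python
from click.testing import CliRunner
from multi_app import cli  # the Group object defined in multi_app.py

runner = CliRunner()
result = runner.invoke(cli, ['hello'])  # same dispatch path as `python multi_app.py hello`
print(result.output)     # Hello World!
print(result.exit_code)  # 0
```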
Here's a simplified view of what happens when you run `python multi_app.py hello`:
```mermaid
sequenceDiagram
participant User
participant Terminal
participant PythonScript (multi_app.py)
participant ClickRuntime
participant cli_Group as cli (Group Object)
participant hello_Command as hello (Command Object)
User->>Terminal: python multi_app.py hello
Terminal->>PythonScript: Executes script with args ["hello"]
PythonScript->>ClickRuntime: Calls cli() entry point
ClickRuntime->>cli_Group: Asks to handle args ["hello"]
cli_Group->>cli_Group: Parses args, identifies "hello" as subcommand
cli_Group->>hello_Command: Invokes the 'hello' command
hello_Command->>hello_Command: Executes its callback (the original hello() function)
hello_Command-->>PythonScript: Prints "Hello World!"
PythonScript-->>Terminal: Shows output
Terminal-->>User: Displays "Hello World!"
```
This process of parsing arguments and calling the right function based on the command structure is the core job of Click, making it easy for *you* to just focus on writing the functions for each command.
## Conclusion
You've learned about the two most fundamental concepts in Click:
* `Command`: Represents a single action, created by decorating a function with `@click.command()`.
* `Group`: Acts as a container for multiple commands (or other groups), created with `@click.group()`. Groups allow you to structure your CLI application logically.
We saw how Click uses decorators to transform simple Python functions into powerful command-line interface components, automatically handling things like help text generation and command dispatching.
Commands and Groups form the basic structure, but how do we pass information *into* our commands (like `git commit -m "My message"`)? And what other cool things can decorators do? We'll explore that starting with a deeper look at decorators in the next chapter!
Next up: [Chapter 2: Decorators](02_decorators.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,278 @@
# Chapter 2: Decorators: Magic Wands for Your Functions
In [Chapter 1: Commands and Groups](01_command___group.md), we learned how to create basic command-line actions (`Command`) and group them together (`Group`). You might have noticed those strange `@click.command()` and `@click.group()` lines above our functions. What are they, and why do we use them?
Those are **Decorators**, and they are the heart of how you build Click applications! Think of them as special annotations or modifiers you place *on top* of your Python functions to give them command-line superpowers.
## Why Decorators? Making Life Easier
Imagine you didn't have decorators. To create a simple command like `hello` from Chapter 1, you might have to write something like this (this is *not* real Click code, just an illustration):
```python
# NOT how Click works, but imagine...
import click
def hello_logic():
"""My command's help text"""
print("Hello World!")
# Manually create a Command object
hello_command = click.Command(
name='hello', # Give it a name
callback=hello_logic, # Tell it which function to run
help=hello_logic.__doc__ # Copy the help text
)
if __name__ == '__main__':
# Manually parse arguments and run
# (This part would be complex!)
pass
```
That looks like a lot more work! You have to:
1. Write the function (`hello_logic`).
2. Manually create a `Command` object.
3. Explicitly tell the `Command` object its name, which function to run (`callback`), and its help text.
Now, let's remember the Click way from Chapter 1:
```python
# The actual Click way
import click
@click.command() # <-- The Decorator!
def hello():
"""A simple command that says Hello World"""
print("Hello World!")
if __name__ == '__main__':
hello()
```
Much cleaner, right? The `@click.command()` decorator handles creating the `Command` object, figuring out the name (`hello`), and grabbing the help text from the docstring (`"""..."""`), all automatically!
Decorators let you *declare* what you want ("this function is a command") right next to the function's code, making your CLI definition much more readable and concise.
## What is a Decorator in Python? (A Quick Peek)
Before diving deeper into Click's decorators, let's understand what a decorator *is* in Python itself.
In Python, a decorator is essentially a function that takes another function as input and returns a *modified* version of that function. It's like wrapping a gift: you still have the original gift inside, but the wrapping adds something extra.
The `@` symbol is just syntactic sugar: a shortcut for applying a decorator.
Here's a super simple example (not using Click):
```python
# A simple Python decorator
def simple_decorator(func):
def wrapper():
print("Something is happening before the function is called.")
func() # Call the original function
print("Something is happening after the function is called.")
return wrapper # Return the modified function
@simple_decorator # Apply the decorator
def say_whee():
print("Whee!")
# Now, when we call say_whee...
say_whee()
```
Running this would print:
```
Something is happening before the function is called.
Whee!
Something is happening after the function is called.
```
See? `simple_decorator` took our `say_whee` function and wrapped it with extra print statements. The `@simple_decorator` line is equivalent to writing `say_whee = simple_decorator(say_whee)` after defining `say_whee`.
Click's decorators (`@click.command`, `@click.group`, etc.) do something similar, but instead of just printing, they wrap your function inside Click's `Command` or `Group` objects and configure them.
## Click's Main Decorators
Click provides several decorators. The most common ones you'll use are:
* `@click.command()`: Turns a function into a single CLI command.
* `@click.group()`: Turns a function into a container for other commands.
* `@click.option()`: Adds an *option* (like `--name` or `-v`) to your command. Options are typically optional parameters.
* `@click.argument()`: Adds an *argument* (like a required filename) to your command. Arguments are typically required and positional.
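To give a feel for how these compose, here is a tiny hypothetical `copy` command that uses two arguments and an option together (we'll cover parameters properly in [Parameter (Option / Argument)](03_parameter__option___argument_.md)):
```python
import click

@click.command()
@click.argument('src')   # required, positional
@click.argument('dst')
@click.option('--verbose', is_flag=True, help='Print what is being copied.')
def copy(src, dst, verbose):
    """Copy SRC to DST (illustration only; nothing is actually copied)."""
    if verbose:
        click.echo(f"Copying {src} -> {dst}")
    click.echo("Done!")

if __name__ == '__main__':
    copy()
```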
We already saw `@click.command` and `@click.group` in Chapter 1. Let's focus on how decorators streamline adding commands to groups and introduce options.
## Decorators in Action: Simplifying Groups and Adding Options
Remember the `multi_app.py` example from Chapter 1? We had to define the group `cli` and the commands `hello` and `goodbye` separately, then manually attach them using `cli.add_command()`.
```python
# multi_app_v1.py (from Chapter 1)
import click
@click.group()
def cli():
"""A simple tool with multiple commands."""
pass
@click.command()
def hello():
"""Says Hello World"""
print("Hello World!")
@click.command()
def goodbye():
"""Says Goodbye World"""
print("Goodbye World!")
# Manual attachment
cli.add_command(hello)
cli.add_command(goodbye)
if __name__ == '__main__':
cli()
```
Decorators provide a more elegant way! If you have a `@click.group()`, you can use *its* `.command()` method as a decorator to automatically attach the command.
Let's rewrite `multi_app.py` using this decorator pattern and also add a simple name option to the `hello` command using `@click.option`:
```python
# multi_app_v2.py (using decorators more effectively)
import click
# 1. Create the main group
@click.group()
def cli():
"""A simple tool with multiple commands."""
pass # Group function still doesn't need to do much
# 2. Define 'hello' and attach it to 'cli' using a decorator
@cli.command() # <-- Decorator from the 'cli' group object!
@click.option('--name', default='World', help='Who to greet.')
def hello(name): # The 'name' parameter matches the option
"""Says Hello"""
print(f"Hello {name}!")
# 3. Define 'goodbye' and attach it to 'cli' using a decorator
@cli.command() # <-- Decorator from the 'cli' group object!
def goodbye():
"""Says Goodbye"""
print("Goodbye World!")
# No need for cli.add_command() anymore!
if __name__ == '__main__':
cli()
```
What changed?
1. Instead of `@click.command()`, we used `@cli.command()` above `hello` and `goodbye`. This tells Click, "This function is a command, *and* it belongs to the `cli` group." No more manual `cli.add_command()` needed!
2. We added `@click.option('--name', default='World', help='Who to greet.')` right below `@cli.command()` for the `hello` function. This adds a command-line option named `--name`.
3. The `hello` function now accepts an argument `name`. Click automatically passes the value provided via the `--name` option to this function parameter. If the user doesn't provide `--name`, it uses the `default='World'`.
**Let's run this new version:**
Check the help for the main command:
```bash
$ python multi_app_v2.py --help
Usage: multi_app_v2.py [OPTIONS] COMMAND [ARGS]...
A simple tool with multiple commands.
Options:
--help Show this message and exit.
Commands:
goodbye Says Goodbye
hello Says Hello
```
Now check the help for the `hello` subcommand:
```bash
$ python multi_app_v2.py hello --help
Usage: multi_app_v2.py hello [OPTIONS]
Says Hello
Options:
--name TEXT Who to greet. [default: World]
--help Show this message and exit.
```
See? The `--name` option is listed, along with its help text and default value!
Finally, run `hello` with and without the option:
```bash
$ python multi_app_v2.py hello
Hello World!
$ python multi_app_v2.py hello --name Alice
Hello Alice!
```
It works! Decorators made adding the command to the group cleaner, and adding the option was as simple as adding another decorator line and a function parameter. We'll learn much more about configuring options and arguments in the next chapter, [Parameter (Option / Argument)](03_parameter__option___argument_.md).
## How Click Decorators Work (Under the Hood)
So what's the "magic" behind these `@` symbols in Click?
1. **Decorator Functions:** When you write `@click.command()` or `@click.option()`, you're calling functions defined in Click (specifically in `decorators.py`). These functions are designed to *return another function* (the actual decorator).
2. **Wrapping the User Function:** Python takes the function you defined (e.g., `hello`) and passes it to the decorator function returned in step 1.
3. **Attaching Information:**
* `@click.option` / `@click.argument`: These decorators typically don't create the final `Command` object immediately. Instead, they attach the parameter information (like the option name `--name`, type, default value) to your function object itself, often using a special temporary attribute (like `__click_params__`). They then return the *original function*, but now with this extra metadata attached.
* `@click.command` / `@click.group`: This decorator usually runs *last* (decorators are applied bottom-up). It looks for any parameter information attached by previous `@option` or `@argument` decorators (like `__click_params__`). It then creates the actual `Command` or `Group` object (defined in `core.py`), configures it with the command name, help text (from the docstring), the attached parameters, and stores your original function as the `callback` to be executed. It returns this newly created `Command` or `Group` object, effectively replacing your original function definition with the Click object.
4. **Group Attachment:** When you use `@cli.command()`, the decorator not only creates the `Command` object but also automatically calls `cli.add_command()` to register the new command with the `cli` group object.
Here's a simplified sequence diagram showing what happens when you define the `hello` command in `multi_app_v2.py`:
```mermaid
sequenceDiagram
participant PythonInterpreter
participant click_option as @click.option('--name')
participant hello_func as hello(name)
participant cli_command as @cli.command()
participant cli_Group as cli (Group Object)
participant hello_Command as hello (New Command Object)
Note over PythonInterpreter, hello_func: Python processes decorators bottom-up
PythonInterpreter->>click_option: Processes @click.option('--name', ...) decorator
click_option->>hello_func: Attaches Option info (like in __click_params__)
click_option-->>PythonInterpreter: Returns original hello_func (with attached info)
PythonInterpreter->>cli_command: Processes @cli.command() decorator
cli_command->>hello_func: Reads function name, docstring, attached params (__click_params__)
cli_command->>hello_Command: Creates new Command object for 'hello'
cli_command->>cli_Group: Calls cli.add_command(hello_Command)
cli_command-->>PythonInterpreter: Returns the new hello_Command object
Note over PythonInterpreter: 'hello' in the code now refers to the Command object
```
The key takeaway is that decorators allow Click to gather all the necessary information (function logic, command name, help text, options, arguments) right where you define the function, and build the corresponding Click objects behind the scenes. You can find the implementation details in `click/decorators.py` and `click/core.py`. The `_param_memo` helper function in `decorators.py` is often used internally by `@option` and `@argument` to attach parameter info to the function before `@command` processes it.
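To make the pattern concrete, here is a tiny toy sketch of the same idea (every name in it, such as `toy_option`, `toy_command`, `ToyCommand`, and `__toy_params__`, is invented for illustration and is *not* Click's real API): one decorator stashes metadata on the function and returns it unchanged, while the outermost decorator collects that metadata and replaces the function with an object.
```python
# toy_click.py - a toy re-creation of the metadata-attaching pattern, not Click's code
def toy_option(name, default=None):
    """Attach parameter metadata to the function and return it unchanged."""
    def decorator(f):
        params = getattr(f, "__toy_params__", [])
        params.append({"name": name, "default": default})
        f.__toy_params__ = params
        return f
    return decorator

class ToyCommand:
    """Stands in for click.Command: stores the callback and its parameters."""
    def __init__(self, callback, params):
        self.callback = callback
        self.params = params

def toy_command(f):
    """Runs last (decorators apply bottom-up) and collects the attached metadata."""
    params = getattr(f, "__toy_params__", [])
    return ToyCommand(f, params)

@toy_command
@toy_option("--name", default="World")
def hello(name):
    print(f"Hello {name}!")

print(type(hello).__name__)  # ToyCommand -- the function name now refers to an object
print(hello.params)          # [{'name': '--name', 'default': 'World'}]
```
Click does essentially the same dance, just with real `Option` and `Command` classes and a lot more bookkeeping.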
## Conclusion
Decorators are fundamental to Click's design philosophy. They provide a clean, readable, and *declarative* way to turn your Python functions into powerful command-line interface components.
You've learned:
* Decorators are Python features (`@`) that modify functions.
* Click uses decorators like `@click.command`, `@click.group`, `@click.option`, and `@click.argument` extensively.
* Decorators handle the creation and configuration of `Command`, `Group`, `Option`, and `Argument` objects for you.
* Using decorators like `@group.command()` automatically attaches commands to groups.
* They make defining your CLI structure intuitive and keep related code together.
We've only scratched the surface of `@click.option` and `@click.argument`. How do you make options required? How do you handle different data types (numbers, files)? How do you define arguments that take multiple values? We'll explore all of this in the next chapter!
Next up: [Chapter 3: Parameter (Option / Argument)](03_parameter__option___argument_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

@@ -0,0 +1,249 @@
# Chapter 3: Parameter (Option / Argument) - Giving Your Commands Input
In the last chapter, [Decorators](02_decorators.md), we saw how decorators like `@click.command()` and `@click.option()` act like magic wands, transforming our Python functions into CLI commands and adding features like command-line options.
But how do our commands actually *receive* information from the user? If we have a command `greet`, how do we tell it *who* to greet, like `greet --name Alice`? Or if we have a `copy` command, how do we specify the source and destination files, like `copy report.txt backup.txt`?
This is where **Parameters** come in. Parameters define the inputs your commands can accept, just like arguments define the inputs for a regular Python function. Click handles parsing these inputs from the command line, validating them, and making them available to your command function.
There are two main types of parameters in Click:
1. **Options:** These are usually preceded by flags like `--verbose` or `-f`. They are often optional and can either take a value (like `--name Alice`) or act as simple on/off switches (like `--verbose`). You define them using the `@click.option()` decorator.
2. **Arguments:** These are typically positional values that come *after* any options. They often represent required inputs, like a filename (`report.txt`). You define them using the `@click.argument()` decorator.
Let's see how to use them!
## Options: The Named Inputs (`@click.option`)
Think of options like keyword arguments in Python functions. In `def greet(name="World"):`, `name` is a keyword argument with a default value. Options serve a similar purpose for your CLI.
Let's modify our `hello` command from the previous chapter to accept a `--name` option.
```python
# greet_app.py
import click
@click.group()
def cli():
"""A simple tool with a greeting command."""
pass
@cli.command()
@click.option('--name', default='World', help='Who to greet.')
def hello(name): # <-- The 'name' parameter matches the option
"""Greets the person specified by the --name option."""
print(f"Hello {name}!")
if __name__ == '__main__':
cli()
```
Let's break down the new parts:
1. `@click.option('--name', default='World', help='Who to greet.')`: This decorator defines an option.
* `'--name'`: This is the primary name of the option on the command line.
* `default='World'`: If the user doesn't provide the `--name` option, the value `World` will be used.
* `help='Who to greet.'`: This text will appear in the help message for the `hello` command.
2. `def hello(name):`: Notice how the `hello` function now accepts an argument named `name`. Click cleverly matches the option name (`name`) to the function parameter name and passes the value automatically!
**Try running it!**
First, check the help message for the `hello` command:
```bash
$ python greet_app.py hello --help
Usage: greet_app.py hello [OPTIONS]
Greets the person specified by the --name option.
Options:
--name TEXT  Who to greet.
--help Show this message and exit.
```
See? Click added our `--name` option to the help screen, including the help text we provided (add `show_default=True` to the option if you also want the default value shown here). The `TEXT` part indicates the type of value expected (we'll cover types in [ParamType](04_paramtype.md)).
Now, run it with and without the option:
```bash
$ python greet_app.py hello
Hello World!
$ python greet_app.py hello --name Alice
Hello Alice!
```
It works perfectly! Click parsed the `--name Alice` option and passed `"Alice"` to our `hello` function's `name` parameter. When we didn't provide the option, it used the default value `"World"`.
### Option Flavors: Short Names and Flags
Options can have variations:
* **Short Names:** You can provide shorter aliases, like `-n` for `--name`.
* **Flags:** Options that don't take a value but act as switches (e.g., `--verbose`).
Let's add a short name `-n` to our `--name` option and a `--shout` flag to make the greeting uppercase.
```python
# greet_app_v2.py
import click
@click.group()
def cli():
"""A simple tool with a greeting command."""
pass
@cli.command()
@click.option('--name', '-n', default='World', help='Who to greet.') # Added '-n'
@click.option('--shout', is_flag=True, help='Greet loudly.') # Added '--shout' flag
def hello(name, shout): # <-- Function now accepts 'shout' too
"""Greets the person, optionally shouting."""
greeting = f"Hello {name}!"
if shout:
greeting = greeting.upper()
print(greeting)
if __name__ == '__main__':
cli()
```
Changes:
1. `@click.option('--name', '-n', ...)`: We added `'-n'` as the second argument to the decorator. Now, both `--name` and `-n` work.
2. `@click.option('--shout', is_flag=True, ...)`: This defines a flag. `is_flag=True` tells Click this option doesn't take a value; its presence makes the corresponding parameter `True`, otherwise it's `False`.
3. `def hello(name, shout):`: The function signature is updated to accept the `shout` parameter.
**Run it again!**
```bash
$ python greet_app_v2.py hello -n Bob
Hello Bob!
$ python greet_app_v2.py hello --name Carol --shout
HELLO CAROL!
$ python greet_app_v2.py hello --shout
HELLO WORLD!
```
Flags and short names make your CLI more flexible and conventional!
## Arguments: The Positional Inputs (`@click.argument`)
Arguments are like positional arguments in Python functions. In `def copy(src, dst):`, `src` and `dst` are required positional arguments. Click arguments usually represent mandatory inputs that follow the command and any options.
Let's create a simple command that takes two arguments, `SRC` and `DST`, representing source and destination files (though we'll just print them for now).
```python
# copy_app.py
import click
@click.command()
@click.argument('src') # Defines the first argument
@click.argument('dst') # Defines the second argument
def copy(src, dst): # Function parameters match argument names
"""Copies SRC file to DST."""
print(f"Pretending to copy '{src}' to '{dst}'")
if __name__ == '__main__':
copy()
```
What's happening here?
1. `@click.argument('src')`: Defines a positional argument named `src`. By default, arguments are required. The name `'src'` is used internally, and by convention it is shown capitalized (`SRC`) in help messages.
2. `@click.argument('dst')`: Defines the second required positional argument.
3. `def copy(src, dst):`: The function parameters `src` and `dst` receive the values provided on the command line in the order they appear.
**Let's try it!**
First, see what happens if we forget the arguments:
```bash
$ python copy_app.py
Usage: copy_app.py [OPTIONS] SRC DST
Try 'copy_app.py --help' for help.
Error: Missing argument 'SRC'.
```
Click automatically detects the missing argument and gives a helpful error message!
Now, provide the arguments:
```bash
$ python copy_app.py report.txt backup/report.txt
Pretending to copy 'report.txt' to 'backup/report.txt'
```
Click correctly captured the positional arguments and passed them to our `copy` function.
Arguments are essential for inputs that are fundamental to the command's operation, like the files to operate on. Options are better suited for modifying the command's behavior.
*(Note: Arguments can also be made optional or accept variable numbers of inputs, often involving the `required` and `nargs` settings, which tie into concepts we'll explore more in [ParamType](04_paramtype.md).)*
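For instance, here is a hedged sketch (the filename `multi_copy.py` is hypothetical) of a variadic argument: `nargs=-1` accepts any number of values, and `required=True` insists on at least one.
```python
# multi_copy.py (illustrative sketch)
import click

@click.command()
@click.argument('sources', nargs=-1, required=True)  # one or more source files
@click.argument('dst')                               # exactly one destination
def copy(sources, dst):
    """Copies one or more SOURCES to DST."""
    for src in sources:
        click.echo(f"Pretending to copy '{src}' to '{dst}'")

if __name__ == '__main__':
    copy()
```
Running `python multi_copy.py a.txt b.txt backup/` would report copying `a.txt` and `b.txt` into `backup/`, because Click hands all but the last positional value to `sources` and the final one to `dst`.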
## How Parameters Work Together
When you run a command like `python greet_app_v2.py hello --shout -n Alice`, Click performs a sequence of steps:
1. **Parsing:** Click looks at the command-line arguments (`sys.argv`) provided by the operating system: `['greet_app_v2.py', 'hello', '--shout', '-n', 'Alice']`.
2. **Command Identification:** It identifies `hello` as the command to execute.
3. **Parameter Matching:** It scans the remaining arguments (`['--shout', '-n', 'Alice']`).
* It sees `--shout`. It looks up the parameters defined for the `hello` command (using the `@click.option` and `@click.argument` decorators). It finds the `shout` option definition (which has `is_flag=True`). It marks the value for `shout` as `True`.
* It sees `-n`. It finds the `name` option definition (which includes `-n` as an alias and expects a value).
* It sees `Alice`. Since the previous token (`-n`) expected a value, Click associates `"Alice"` with the `-n` (and thus `--name`) option. It marks the value for `name` as `"Alice"`.
4. **Validation & Conversion:** Click checks if all required parameters are present (they are). It also performs type conversion (though in this case, the default is string, which matches "Alice"). We'll see more complex conversions in the next chapter.
5. **Function Call:** Finally, Click calls the command's underlying Python function (`hello`) with the collected values as keyword arguments: `hello(name='Alice', shout=True)`.
Here's a simplified view of the process:
```mermaid
sequenceDiagram
participant User
participant Terminal
participant PythonScript as python greet_app_v2.py
participant ClickRuntime
participant hello_func as hello(name, shout)
User->>Terminal: python greet_app_v2.py hello --shout -n Alice
Terminal->>PythonScript: Executes script with args ["hello", "--shout", "-n", "Alice"]
PythonScript->>ClickRuntime: Calls cli() entry point
ClickRuntime->>ClickRuntime: Parses args, finds 'hello' command
ClickRuntime->>ClickRuntime: Identifies '--shout' as flag for 'shout' parameter (value=True)
ClickRuntime->>ClickRuntime: Identifies '-n' as option for 'name' parameter
ClickRuntime->>ClickRuntime: Consumes 'Alice' as value for '-n'/'name' parameter (value="Alice")
ClickRuntime->>ClickRuntime: Validates parameters, performs type conversion
ClickRuntime->>hello_func: Calls callback: hello(name="Alice", shout=True)
hello_func-->>PythonScript: Prints "HELLO ALICE!"
PythonScript-->>Terminal: Shows output
Terminal-->>User: Displays "HELLO ALICE!"
```
## Under the Hood: Decorators and Parameter Objects
How do `@click.option` and `@click.argument` actually work with `@click.command`?
1. **Parameter Definition (`decorators.py`, `core.py`):** When you use `@click.option(...)` or `@click.argument(...)`, these functions (defined in `click/decorators.py`) create instances of the `Option` or `Argument` classes (defined in `click/core.py`). These objects store all the configuration you provided (like `--name`, `-n`, `default='World'`, `is_flag=True`, etc.).
2. **Attaching to Function (`decorators.py`):** Crucially, these decorators don't immediately add the parameters to a command. Instead, they attach the created `Option` or `Argument` object to the function they are decorating. Click uses a helper mechanism (like the internal `_param_memo` function which adds to a `__click_params__` list) to store these parameter objects *on* the function object temporarily.
3. **Command Creation (`decorators.py`, `core.py`):** The `@click.command()` decorator (or `@group.command()`) runs *after* all the `@option` and `@argument` decorators for that function. It looks for the attached parameter objects (the `__click_params__` list). It gathers these objects and passes them to the constructor of the `Command` (or `Group`) object it creates. The `Command` object stores these parameters in its `params` attribute.
4. **Parsing (`parser.py`, `core.py`):** When the command is invoked, the `Command` object uses its `params` list to configure an internal parser (historically based on Python's `optparse`, see `click/parser.py`). This parser processes the command-line string (`sys.argv`) according to the rules defined by the `Option` and `Argument` objects in the `params` list.
5. **Callback Invocation (`core.py`):** After parsing and validation, Click takes the resulting values and calls the original Python function (stored as the `Command.callback`), passing the values as arguments.
So, the decorators work together: `@option`/`@argument` define the parameters and temporarily attach them to the function, while `@command` collects these definitions and builds the final `Command` object, ready for parsing.
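You can verify this with a quick experiment (a sketch; the exact output may vary slightly between Click versions): after decoration, the name `demo` no longer refers to a plain function but to a `Command` whose `params` list holds the `Option` and `Argument` objects.
```python
# inspect_params.py (illustrative sketch)
import click

@click.command()
@click.option('--name', '-n', default='World')
@click.argument('src')
def demo(name, src):
    """A throwaway command used only for inspection."""

print(type(demo))                 # roughly: <class 'click.core.Command'>
for param in demo.params:
    print(param.name, type(param).__name__)
# Expected output is along the lines of:
#   name Option
#   src Argument
```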
## Conclusion
You've learned how to make your Click commands interactive by defining inputs using **Parameters**:
* **Options (`@click.option`):** Named inputs, often optional, specified with flags (`--name`, `-n`). Great for controlling behavior (like `--verbose`, `--shout`) or providing specific pieces of data (`--output file.txt`).
* **Arguments (`@click.argument`):** Positional inputs, often required, that follow options (`input.csv`). Ideal for core data the command operates on (like source/destination files).
You saw how Click uses decorators to define these parameters and automatically handles parsing the command line, providing default values, generating help messages, and passing the final values to your Python function.
But what if you want an option to accept only numbers? Or a choice from a predefined list? Or maybe an argument that represents a file path that must exist? Click handles this through **Parameter Types**. Let's explore those next!
Next up: [Chapter 4: ParamType](04_paramtype.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

@@ -0,0 +1,257 @@
# Chapter 4: ParamType - Checking and Converting Inputs
In [Chapter 3: Parameter (Option / Argument)](03_parameter__option___argument_.md), we learned how to define inputs for our commands using `@click.option` and `@click.argument`. Our `greet` command could take a `--name` option, and our `copy` command took `SRC` and `DST` arguments.
But what if we need more control? What if our command needs a *number* as input, like `--count 3`? Or what if an option should only accept specific words, like `--level easy` or `--level hard`? Right now, Click treats most inputs as simple text strings.
This is where **ParamType** comes in! Think of `ParamType`s as the **gatekeepers** and **translators** for your command-line inputs. They:
1. **Validate:** Check if the user's input looks correct (e.g., "Is this actually a number?").
2. **Convert:** Change the input text (which is always initially a string) into the Python type you need (e.g., the string `"3"` becomes the integer `3`).
`ParamType`s make your commands more robust by catching errors early and giving your Python code the data types it expects.
## Why Do We Need ParamTypes?
Imagine you're writing a command to repeat a message multiple times:
```bash
repeat --times 5 "Hello!"
```
Inside your Python function, you want the `times` variable to be an integer so you can use it in a loop. If the user types `repeat --times five "Hello!"`, your code might crash if it tries to use the string `"five"` like a number.
`ParamType` solves this. By telling Click that the `--times` option expects an integer, Click will automatically:
* Check if the input (`"5"`) can be turned into an integer.
* If yes, convert it to the integer `5` and pass it to your function.
* If no (like `"five"`), stop immediately and show the user a helpful error message *before* your function even runs!
## Using Built-in ParamTypes
Click provides several ready-to-use `ParamType`s. You specify which one to use with the `type` argument in `@click.option` or `@click.argument`.
Let's modify an example to use `click.INT`.
```python
# count_app.py
import click
@click.command()
@click.option('--count', default=1, type=click.INT, help='Number of times to print.')
@click.argument('message')
def repeat(count, message):
"""Prints MESSAGE the specified number of times."""
# 'count' is now guaranteed to be an integer!
for _ in range(count):
click.echo(message)
if __name__ == '__main__':
repeat()
```
Breakdown:
1. `import click`: As always.
2. `@click.option('--count', ..., type=click.INT, ...)`: This is the key change! We added `type=click.INT`. This tells Click that the value provided for `--count` must be convertible to an integer. `click.INT` is one of Click's built-in `ParamType` instances.
3. `def repeat(count, message):`: The `count` parameter in our function will receive the *converted* integer value.
**Let's run it!**
```bash
$ python count_app.py --count 3 "Woohoo!"
Woohoo!
Woohoo!
Woohoo!
```
It works! Click converted the input string `"3"` into the Python integer `3` before calling our `repeat` function.
Now, see what happens with invalid input:
```bash
$ python count_app.py --count five "Oh no"
Usage: count_app.py [OPTIONS] MESSAGE
Try 'count_app.py --help' for help.
Error: Invalid value for '--count': 'five' is not a valid integer.
```
Perfect! Click caught the error because `"five"` couldn't be converted by `click.INT`. It printed a helpful message and prevented our `repeat` function from running with bad data.
## Common Built-in Types
Click offers several useful built-in types:
* `click.STRING`: The default type. Converts the input to a string (usually doesn't change much unless the input was bytes).
* `click.INT`: Converts to an integer. Fails if the input isn't a valid whole number.
* `click.FLOAT`: Converts to a floating-point number. Fails if the input isn't a valid number (e.g., `3.14`, `-0.5`).
* `click.BOOL`: Converts to a boolean (`True`/`False`). It's clever and understands inputs like `'1'`, `'true'`, `'t'`, `'yes'`, `'y'`, `'on'` as `True`, and `'0'`, `'false'`, `'f'`, `'no'`, `'n'`, `'off'` as `False`. Usually used for options that aren't flags.
* `click.Choice`: Checks if the value is one of a predefined list of choices.
```python
# choice_example.py
import click
@click.command()
@click.option('--difficulty', type=click.Choice(['easy', 'medium', 'hard'], case_sensitive=False), default='easy')
def setup(difficulty):
click.echo(f"Setting up game with difficulty: {difficulty}")
if __name__ == '__main__':
setup()
```
Running `python choice_example.py --difficulty MeDiUm` works (because `case_sensitive=False`), but `python choice_example.py --difficulty expert` would fail.
* `click.Path`: Represents a filesystem path. It can check if the path exists, if it's a file or directory, and if it has certain permissions (read/write/execute). It returns the path as a string (or `pathlib.Path` if configured).
```python
# path_example.py
import click
@click.command()
@click.argument('output_dir', type=click.Path(exists=True, file_okay=False, dir_okay=True, writable=True))
def process(output_dir):
click.echo(f"Processing data into directory: {output_dir}")
# We know output_dir exists, is a directory, and is writable!
if __name__ == '__main__':
process()
```
* `click.File`: Similar to `Path`, but it *automatically opens* the file and passes the open file object to your function. It also handles closing the file automatically. You can specify the mode (`'r'`, `'w'`, `'rb'`, `'wb'`).
```python
# file_example.py
import click
@click.command()
@click.argument('input_file', type=click.File('r')) # Open for reading text
def cat(input_file):
# input_file is an open file handle!
click.echo(input_file.read())
# Click will close the file automatically after this function returns
if __name__ == '__main__':
cat()
```
These built-in types cover most common use cases for validating and converting command-line inputs.
## How ParamTypes Work Under the Hood
What happens when you specify `type=click.INT`?
1. **Parsing:** As described in [Chapter 3](03_parameter__option___argument_.md), Click's parser identifies the command-line arguments and matches them to your defined `Option`s and `Argument`s. It finds the raw string value provided by the user (e.g., `"3"` for `--count`).
2. **Type Retrieval:** The parser looks at the `Parameter` object (the `Option` or `Argument`) and finds the `type` you assigned to it (e.g., the `click.INT` instance).
3. **Conversion Attempt:** The parser calls the `convert()` method of the `ParamType` instance, passing the raw string value (`"3"`), the parameter object itself, and the current [Context](05_context.md).
4. **Validation & Conversion Logic (Inside `ParamType.convert`)**:
* The `click.INT.convert()` method tries to call Python's built-in `int("3")`.
* If this succeeds, it returns the result (the integer `3`).
* If it fails (e.g., `int("five")` would raise a `ValueError`), the `convert()` method catches this error.
5. **Success or Failure**:
* **Success:** The parser receives the converted value (`3`) and stores it. Later, it passes this value to your command function.
* **Failure:** The `convert()` method calls its `fail()` helper method. The `fail()` method raises a `click.BadParameter` exception with a helpful error message (e.g., "'five' is not a valid integer."). Click catches this exception, stops further processing, and displays the error message to the user along with usage instructions.
Here's a simplified view of the successful conversion process:
```mermaid
sequenceDiagram
participant User
participant CLI
participant ClickParser as Click Parser
participant IntType as click.INT
participant CommandFunc as Command Function
User->>CLI: python count_app.py --count 3 ...
CLI->>ClickParser: Parse args, find '--count' option with value '3'
ClickParser->>IntType: Call convert(value='3', param=..., ctx=...)
IntType->>IntType: Attempt int('3') -> Success! returns 3
IntType-->>ClickParser: Return converted value: 3
ClickParser->>CommandFunc: Call repeat(count=3, ...)
CommandFunc-->>CLI: Executes logic (prints message 3 times)
```
And here's the failure process:
```mermaid
sequenceDiagram
participant User
participant CLI
participant ClickParser as Click Parser
participant IntType as click.INT
participant ClickException as Click Exception Handling
User->>CLI: python count_app.py --count five ...
CLI->>ClickParser: Parse args, find '--count' option with value 'five'
ClickParser->>IntType: Call convert(value='five', param=..., ctx=...)
IntType->>IntType: Attempt int('five') -> Fails! (ValueError)
IntType->>ClickException: Catch error, call fail("'five' is not...") -> raises BadParameter
ClickException-->>ClickParser: BadParameter exception raised
ClickParser-->>CLI: Catch exception, stop processing
CLI-->>User: Display "Error: Invalid value for '--count': 'five' is not a valid integer."
```
The core logic for built-in types resides in `click/types.py`. Each type (like `IntParamType`, `Choice`, `Path`) inherits from the base `ParamType` class and implements its own `convert` method containing the specific validation and conversion rules.
```python
# Simplified structure from click/types.py
class ParamType:
name: str # Human-readable name like "integer" or "filename"
def convert(self, value, param, ctx):
# Must be implemented by subclasses
# Should return the converted value or call self.fail()
raise NotImplementedError
def fail(self, message, param, ctx):
# Raises a BadParameter exception
raise BadParameter(message, ctx=ctx, param=param)
class IntParamType(ParamType):
name = "integer"
def convert(self, value, param, ctx):
try:
# The core conversion logic!
return int(value)
except ValueError:
# If conversion fails, raise the standard error
self.fail(f"{value!r} is not a valid integer.", param, ctx)
# click.INT is just an instance of this class
INT = IntParamType()
```
## Custom Types
What if none of the built-in types do exactly what you need? Click allows you to create your own custom `ParamType`s! You can do this by subclassing `click.ParamType` and implementing the `name` attribute and the `convert` method. This is an advanced topic, but it provides great flexibility.
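As a taste, here is a hedged sketch of what that can look like (the `EvenIntType` class and the `--workers` option are made up for this example): a type that only accepts even integers.
```python
# even_type.py (illustrative sketch)
import click

class EvenIntType(click.ParamType):
    name = "even integer"  # shown in help and error messages

    def convert(self, value, param, ctx):
        try:
            number = int(value)
        except ValueError:
            self.fail(f"{value!r} is not a valid integer.", param, ctx)
        if number % 2 != 0:
            self.fail(f"{number} is not an even number.", param, ctx)
        return number

EVEN_INT = EvenIntType()

@click.command()
@click.option('--workers', type=EVEN_INT, default=2)
def run(workers):
    click.echo(f"Running with {workers} workers.")

if __name__ == '__main__':
    run()
```
Running `python even_type.py --workers 3` would produce a clean "Error: Invalid value for '--workers': 3 is not an even number." message instead of a traceback.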
## Shell Completion Hints
An added benefit of using specific `ParamType`s is that they can provide hints for shell completion (when the user presses Tab). For example:
* `click.Choice(['easy', 'medium', 'hard'])` can suggest `easy`, `medium`, or `hard`.
* `click.Path` can suggest file and directory names from the current location.
This makes your CLI even more user-friendly.
## Conclusion
`ParamType`s are a fundamental part of Click, acting as the bridge between raw command-line text input and the well-typed data your Python functions need. They handle the crucial tasks of:
* **Validating** user input against expected formats or rules.
* **Converting** input strings to appropriate Python types (integers, booleans, files, etc.).
* **Generating** user-friendly error messages for invalid input.
* Providing hints for **shell completion**.
By using built-in types like `click.INT`, `click.Choice`, `click.Path`, and `click.File`, you make your commands more robust, reliable, and easier to use.
So far, we've seen how commands are structured, how parameters get their values, and how those values are validated and converted. But how does Click manage the state *during* the execution of a command? How does it know which command is running or what the parent commands were? That's the job of the `Context`. Let's explore that next!
Next up: [Chapter 5: Context](05_context.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

output/Click/05_context.md Normal file
@@ -0,0 +1,271 @@
# Chapter 5: Context - The Command's Nervous System
In the last chapter, [ParamType](04_paramtype.md), we saw how Click helps validate and convert user input into the right Python types, making our commands more robust. We used types like `click.INT` and `click.Path` to ensure data correctness.
But what happens *while* a command is running? How does Click keep track of which command is being executed, what parameters were passed, or even shared information between different commands in a nested structure (like `git remote add ...`)?
This is where the **Context** object, often referred to as `ctx`, comes into play. Think of the Context as the central nervous system for a single command invocation. It carries all the vital information about the current state of execution.
## Why Do We Need a Context?
Imagine you have a command that needs to behave differently based on a global configuration, maybe a `--verbose` flag set on the main application group. Or perhaps one command needs to call another command within the same application. How do they communicate?
The Context object solves these problems by providing a central place to:
* Access parameters passed to the *current* command.
* Access parameters or settings from *parent* commands.
* Share application-level objects (like configuration settings or database connections) between commands.
* Manage resources that need cleanup (like automatically closing files opened with `click.File`).
* Invoke other commands programmatically.
Let's explore how to access and use this powerful object.
## Getting the Context: `@pass_context`
Click doesn't automatically pass the Context object to your command function. You need to explicitly ask for it using a special decorator: `@click.pass_context`.
When you add `@click.pass_context` *above* your function definition (but typically *below* the `@click.command` or `@click.option` decorators), Click will automatically **inject** the `Context` object as the **very first argument** to your function.
Let's see a simple example:
```python
# context_basics.py
import click
@click.group()
@click.pass_context # Request the context for the group function
def cli(ctx):
"""A simple CLI with context."""
# We can store arbitrary data on the context's 'obj' attribute
ctx.obj = {'verbose': False} # Initialize a shared dictionary
@cli.command()
@click.option('--verbose', is_flag=True, help='Enable verbose mode.')
@click.pass_context # Request the context for the command function
def info(ctx, verbose):
"""Prints info, possibly verbosely."""
# Access the command name from the context
click.echo(f"Executing command: {ctx.command.name}")
# Access parameters passed to *this* command
click.echo(f"Verbose flag (local): {verbose}")
# We can modify the shared object from the parent context
if verbose:
ctx.obj['verbose'] = True
# Access the shared object from the parent context
click.echo(f"Verbose setting (shared): {ctx.obj['verbose']}")
if __name__ == '__main__':
cli()
```
Let's break it down:
1. `@click.pass_context`: We apply this decorator to both the `cli` group function and the `info` command function.
2. `def cli(ctx): ...`: Because of `@pass_context`, the `cli` function now receives the `Context` object as its first argument, which we've named `ctx`.
3. `ctx.obj = {'verbose': False}`: The `ctx.obj` attribute is a special place designed for you to store and share *your own* application data. Here, the main `cli` group initializes it as a dictionary. This object will be automatically inherited by child command contexts.
4. `def info(ctx, verbose): ...`: The `info` command function also receives the `Context` (`ctx`) as its first argument, followed by its own parameters (`verbose`).
5. `ctx.command.name`: We access the `Command` object associated with the current context via `ctx.command` and get its name.
6. `ctx.obj['verbose'] = True`: We can *modify* the shared `ctx.obj` from within the subcommand.
7. `click.echo(f"Verbose setting (shared): {ctx.obj['verbose']}")`: We access the potentially modified shared state.
**Run it!**
```bash
$ python context_basics.py info
Executing command: info
Verbose flag (local): False
Verbose setting (shared): False
$ python context_basics.py info --verbose
Executing command: info
Verbose flag (local): True
Verbose setting (shared): True
```
You can see how `@pass_context` gives us access to the runtime environment (`ctx.command.name`) and allows us to use `ctx.obj` to share state between the parent group (`cli`) and the subcommand (`info`).
## Key Context Attributes
The `Context` object has several useful attributes:
* `ctx.command`: The [Command](01_command___group.md) object that this context belongs to. You can get its name (`ctx.command.name`), parameters, etc.
* `ctx.parent`: The context of the invoking command. If this is the top-level command, `ctx.parent` will be `None`. This forms a linked list or chain back to the root context.
* `ctx.params`: A dictionary mapping parameter names to the *final* values passed to the command, after parsing, type conversion, and defaults have been applied.
```python
# access_params.py
import click
@click.command()
@click.option('--name', default='Guest')
@click.pass_context
def hello(ctx, name):
click.echo(f"Hello, {name}!")
# Access the parameter value directly via ctx.params
click.echo(f"(Value from ctx.params: {ctx.params['name']})")
if __name__ == '__main__':
hello()
```
Running `python access_params.py --name Alice` would show `Hello, Alice!` and `(Value from ctx.params: Alice)`.
* `ctx.obj`: As seen before, this is an arbitrary object that gets passed down the context chain. It's commonly used for shared configuration, database connections, or other application-level state. You can also use `@click.pass_obj` as a shortcut if you *only* need `ctx.obj`.
* `ctx.info_name`: The name that was used on the command line to invoke this command or group (e.g., `info` in `python context_basics.py info`).
* `ctx.invoked_subcommand`: For groups, this holds the name of the subcommand that was invoked (or `None` if no subcommand was called).
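To see that last attribute in action, here is a small hedged sketch (the filename and command names are made up): `invoke_without_command=True` lets the group's own body run even when no subcommand is given, so it can check `ctx.invoked_subcommand`.
```python
# subcommand_info.py (illustrative sketch)
import click

@click.group(invoke_without_command=True)
@click.pass_context
def cli(ctx):
    if ctx.invoked_subcommand is None:
        click.echo("No subcommand given; showing status by default.")
    else:
        click.echo(f"About to run subcommand: {ctx.invoked_subcommand}")

@cli.command()
def sync():
    click.echo("Syncing...")

if __name__ == '__main__':
    cli()
```
Running `python subcommand_info.py` prints the default line, while `python subcommand_info.py sync` prints the "About to run subcommand: sync" line followed by "Syncing...".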
## Calling Other Commands
Sometimes, you want one command to trigger another. The Context provides methods for this:
* `ctx.invoke(other_command, **params)`: Calls another Click command (`other_command`) in a fresh context whose parent is the current context (`ctx`), using the provided `params` for the call.
* `ctx.forward(other_command)`: Similar to `invoke`, but it automatically passes all parameters from the *current* context (`ctx.params`) to the `other_command`. This is useful for creating alias commands.
```python
# invoke_example.py
import click
@click.group()
def cli():
pass
@cli.command()
@click.argument('text')
def print_it(text):
"""Prints the given text."""
click.echo(f"Printing: {text}")
@cli.command()
@click.argument('message')
@click.pass_context # Need context to call invoke
def shout(ctx, message):
"""Shouts the message by calling print_it."""
click.echo("About to invoke print_it...")
# Call the 'print_it' command, passing the uppercased message
ctx.invoke(print_it, text=message.upper())
click.echo("Finished invoking print_it.")
if __name__ == '__main__':
cli()
```
Running `python invoke_example.py shout "hello world"` will output:
```
About to invoke print_it...
Printing: HELLO WORLD
Finished invoking print_it.
```
The `shout` command successfully called the `print_it` command programmatically using `ctx.invoke()`.
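`ctx.forward()` shines for alias commands. Here is a hedged sketch (the `ping` command and its `p` alias are invented for this example): the alias declares the same `--count` option, and `forward` re-sends whatever the user passed straight to the real command.
```python
# forward_example.py (illustrative sketch)
import click

@click.group()
def cli():
    pass

@cli.command()
@click.option('--count', default=1, type=click.INT)
def ping(count):
    """Prints 'pong' COUNT times."""
    for _ in range(count):
        click.echo("pong")

@cli.command(name='p')
@click.option('--count', default=1, type=click.INT)
@click.pass_context
def ping_alias(ctx, count):
    """Alias for 'ping' that forwards all of its parameters."""
    ctx.forward(ping)

if __name__ == '__main__':
    cli()
```
Running `python forward_example.py p --count 2` would print `pong` twice, exactly as if the user had called `ping --count 2` directly.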
## Resource Management (`ctx.call_on_close`)
Click uses the context internally to manage resources. For instance, when you use `type=click.File('w')`, Click opens the file and registers a cleanup function using `ctx.call_on_close(file.close)`. This ensures the file is closed when the context is finished, even if errors occur.
You can use this mechanism yourself if you need custom resource cleanup tied to the command's lifecycle.
```python
# resource_management.py
import click
class MockResource:
def __init__(self, name):
self.name = name
click.echo(f"Resource '{self.name}' opened.")
def close(self):
click.echo(f"Resource '{self.name}' closed.")
@click.command()
@click.pass_context
def process(ctx):
"""Opens and closes a mock resource."""
res = MockResource("DataFile")
# Register the close method to be called when the context ends
ctx.call_on_close(res.close)
click.echo("Processing with resource...")
# Function ends, context tears down, call_on_close triggers
if __name__ == '__main__':
process()
```
Running this script will show:
```
Resource 'DataFile' opened.
Processing with resource...
Resource 'DataFile' closed.
```
The resource was automatically closed because we registered its `close` method with `ctx.call_on_close`.
## How Context Works Under the Hood
1. **Initial Context:** When you run your Click application (e.g., by calling `cli()`), Click creates the first `Context` object associated with the top-level command or group (`cli` in our examples).
2. **Parsing and Subcommand:** Click parses the command-line arguments. If a subcommand is identified (like `info` in `python context_basics.py info`), Click finds the corresponding `Command` object.
3. **Child Context Creation:** Before executing the subcommand's callback function, Click creates a *new* `Context` object for the subcommand. Crucially, it sets the `parent` attribute of this new context to the context of the invoking command (the `cli` context in our example).
4. **Object Inheritance:** The `ctx.obj` attribute is automatically passed down from the parent context to the child context *by reference* (unless the child explicitly sets its own `ctx.obj`).
5. **`@pass_context` Decorator:** This decorator (defined in `decorators.py`) wraps your callback function. When the wrapped function is called, the decorator uses `click.globals.get_current_context()` (which accesses a thread-local stack of contexts) to fetch the *currently active* context and inserts it as the first argument before calling your original function.
6. **`ctx.invoke`:** When you call `ctx.invoke(other_cmd, ...)`, Click finds the `other_cmd` object, creates a *new* context for it (setting its parent to the current context `ctx`), populates its `params` from the arguments you provided, and then executes `other_cmd`'s callback within that new context.
7. **Cleanup:** Once a command function finishes (or raises an exception that Click handles), its corresponding context is "torn down". This is when any functions registered with `ctx.call_on_close` are executed.
Here's a simplified diagram showing context creation and `ctx.obj` flow for `python context_basics.py info --verbose`:
```mermaid
sequenceDiagram
participant User
participant CLI as python context_basics.py
participant ClickRuntime
participant cli_ctx as cli Context
participant info_ctx as info Context
participant cli_func as cli(ctx)
participant info_func as info(ctx, verbose)
User->>CLI: info --verbose
CLI->>ClickRuntime: Calls cli() entry point
ClickRuntime->>cli_ctx: Creates root context for 'cli' group
Note over ClickRuntime, cli_func: ClickRuntime calls cli's callback (due to @click.group)
ClickRuntime->>cli_func: cli(ctx=cli_ctx)
cli_func->>cli_ctx: Sets ctx.obj = {'verbose': False}
cli_func-->>ClickRuntime: Returns
ClickRuntime->>ClickRuntime: Parses args, finds 'info' subcommand, '--verbose' option
ClickRuntime->>info_ctx: Creates child context for 'info' command
info_ctx->>cli_ctx: Sets info_ctx.parent = cli_ctx
info_ctx->>info_ctx: Inherits ctx.obj from parent (value = {'verbose': False})
Note over ClickRuntime, info_func: ClickRuntime prepares to call info's callback
ClickRuntime->>ClickRuntime: Uses @pass_context to get info_ctx
ClickRuntime->>info_func: info(ctx=info_ctx, verbose=True)
info_func->>info_ctx: Accesses ctx.command.name
info_func->>info_ctx: Accesses ctx.params['verbose'] (or local 'verbose')
info_func->>info_ctx: Modifies ctx.obj['verbose'] = True
info_func->>info_ctx: Accesses ctx.obj['verbose'] (now True)
info_func-->>ClickRuntime: Returns
ClickRuntime->>info_ctx: Tears down info_ctx (runs call_on_close)
ClickRuntime->>cli_ctx: Tears down cli_ctx (runs call_on_close)
ClickRuntime-->>CLI: Exits
```
The core `Context` class is defined in `click/core.py`. The decorators `pass_context` and `pass_obj` are in `click/decorators.py`, and the mechanism for tracking the current context is in `click/globals.py`.
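If you ever need the active context in a plain helper function that was not decorated with `@click.pass_context`, you can read that same thread-local stack yourself via `click.get_current_context()`. A small sketch (the `log` helper is made up for illustration):
```python
# current_context.py (illustrative sketch)
import click

def log(message):
    # A plain helper: fetch the active context from Click's thread-local stack.
    ctx = click.get_current_context()
    click.echo(f"[{ctx.info_name}] {message}")

@click.command()
def status():
    log("everything looks fine")

if __name__ == '__main__':
    status()
```
Running `python current_context.py` prints something like `[current_context.py] everything looks fine`, since `info_name` for a top-level command defaults to the script name.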
## Conclusion
The `Context` (`ctx`) is a cornerstone concept in Click, acting as the runtime carrier of information for a command invocation.
You've learned:
* The Context holds data like the current command, parameters, parent context, and shared application objects (`ctx.obj`).
* The `@click.pass_context` decorator injects the current Context into your command function.
* `ctx.obj` is essential for sharing state between nested commands.
* `ctx.invoke()` and `ctx.forward()` allow commands to call each other programmatically.
* Click uses the context for resource management (`ctx.call_on_close`), ensuring cleanup.
Understanding the Context is key to building more complex Click applications where commands need to interact with each other or with shared application state. It provides the structure and communication channels necessary for sophisticated CLI tools.
So far, we've focused on the logic and structure of commands. But how can we make the interaction in the terminal itself more engaging? How do we prompt users for input, show progress bars, or display colored output? Let's explore Click's terminal UI capabilities next!
Next up: [Chapter 6: Term UI (Terminal User Interface)](06_term_ui__terminal_user_interface_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

@@ -0,0 +1,290 @@
# Chapter 6: Term UI (Terminal User Interface)
Welcome back! In [Chapter 5: Context](05_context.md), we learned how Click uses the `Context` object (`ctx`) to manage the state of a command while it's running, allowing us to share information and call other commands.
So far, our commands have mostly just printed simple text. But what if we want to make our command-line tools more interactive and user-friendly? How can we:
* Ask the user for input (like their name or a filename)?
* Ask simple yes/no questions?
* Show a progress bar for long-running tasks?
* Make our output more visually appealing with colors or styles (like making errors red)?
This is where Click's **Terminal User Interface (Term UI)** functions come in handy. They are Click's toolkit for talking *back and forth* with the user through the terminal.
## Making Our Tools Talk: The Need for Term UI
Imagine you're building a tool that processes a large data file. A purely silent tool isn't very helpful. A better tool might:
1. Ask the user which file to process.
2. Ask for confirmation before starting a potentially long operation.
3. Show a progress bar while processing the data.
4. Print a nice, colored "Success!" message at the end, or a red "Error!" message if something went wrong.
Doing all this reliably across different operating systems (like Linux, macOS, and Windows) can be tricky. For example, getting colored text to work correctly on Windows requires special handling.
Click's Term UI functions wrap up these common interactive tasks into easy-to-use functions that work consistently everywhere. Let's explore some of the most useful ones!
## Printing with `click.echo()`
We've seen `print()` in Python, but Click provides its own version: `click.echo()`. Why use it?
* **Smarter:** It works better with different kinds of data (like Unicode text and raw bytes).
* **Cross-Platform:** It handles subtle differences between operating systems for you.
* **Color Aware:** It automatically strips out color codes if the output isn't going to a terminal (like if you redirect output to a file), preventing garbled text.
* **Integrated:** It works seamlessly with Click's other features, like redirecting output or testing.
Using it is just like `print()`:
```python
# echo_example.py
import click
@click.command()
def cli():
"""Demonstrates click.echo"""
click.echo("Hello from Click!")
# You can print errors to stderr easily
click.echo("Oops, something went wrong!", err=True)
if __name__ == '__main__':
cli()
```
Running this:
```bash
$ python echo_example.py
Hello from Click!
Oops, something went wrong! # (This line goes to stderr)
```
Simple! For most printing in Click apps, `click.echo()` is preferred over `print()`.
## Adding Style: `click.style()` and `click.secho()`
Want to make your output stand out? Click makes it easy to add colors and styles (like bold or underline) to your text.
* `click.style(text, fg='color', bg='color', bold=True, ...)`: Takes your text and wraps it with special codes that terminals understand to change its appearance. It returns the modified string.
* `click.secho(text, fg='color', ...)`: A shortcut that combines `style` and `echo`. It styles the text *and* prints it in one go.
Let's make our success and error messages more obvious:
```python
# style_example.py
import click
@click.command()
def cli():
"""Demonstrates styled output"""
# Style the text first, then echo it
success_message = click.style("Operation successful!", fg='green', bold=True)
click.echo(success_message)
# Or use secho for style + echo in one step
click.secho("Critical error!", fg='red', underline=True, err=True)
if __name__ == '__main__':
cli()
```
Running this (your terminal must support color):
```bash
$ python style_example.py
# Output will look something like:
# Operation successful! (in bold green)
# Critical error! (in underlined red, sent to stderr)
```
Click supports various colors (`'red'`, `'green'`, `'blue'`, etc.) and styles (`bold`, `underline`, `blink`, `reverse`). This makes your CLI output much more informative at a glance!
## Getting User Input: `click.prompt()`
Sometimes you need to ask the user for information. `click.prompt()` is designed for this. It shows a message and waits for the user to type something and press Enter.
```python
# prompt_example.py
import click
@click.command()
def cli():
"""Asks for user input"""
name = click.prompt("Please enter your name")
click.echo(f"Hello, {name}!")
# You can specify a default value
location = click.prompt("Enter location", default="Earth")
click.echo(f"Location: {location}")
# You can also require a specific type (like an integer)
age = click.prompt("Enter your age", type=int)
click.echo(f"You are {age} years old.")
if __name__ == '__main__':
cli()
```
Running this interactively:
```bash
$ python prompt_example.py
Please enter your name: Alice
Hello, Alice!
Enter location [Earth]: # Just press Enter here
Location: Earth
Enter your age: 30
You are 30 years old.
```
If you enter something that can't be converted to the `type` (like "abc" for age), `click.prompt` will automatically show an error and ask again! It can also hide input for passwords (`hide_input=True`).
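For example, here is a hedged sketch of a hidden, double-checked password prompt (`hide_input=True` hides the typing and `confirmation_prompt=True` asks the user to type it twice):
```python
# password_example.py (illustrative sketch)
import click

@click.command()
def login():
    password = click.prompt("Password", hide_input=True, confirmation_prompt=True)
    click.echo(f"Received a password of length {len(password)}.")

if __name__ == '__main__':
    login()
```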
## Asking Yes/No: `click.confirm()`
A common need is asking for confirmation before doing something potentially destructive or time-consuming. `click.confirm()` handles this nicely.
```python
# confirm_example.py
import click
import time
@click.command()
@click.option('--yes', is_flag=True, help='Assume Yes to confirmation.')
def cli(yes):
"""Asks for confirmation."""
click.echo("This might take a while or change things.")
# If --yes flag is given, `yes` is True, otherwise ask.
# abort=True means if user says No, stop the program.
if not yes:
click.confirm("Do you want to continue?", abort=True)
click.echo("Starting operation...")
time.sleep(2) # Simulate work
click.echo("Done!")
if __name__ == '__main__':
cli()
```
Running interactively:
```bash
$ python confirm_example.py
This might take a while or change things.
Do you want to continue? [y/N]: y # User types 'y'
Starting operation...
Done!
```
If the user types 'n' (or just presses Enter, since the default is No - indicated by `[y/N]`), the program will stop immediately because of `abort=True`. If you run `python confirm_example.py --yes`, it skips the question entirely.
## Showing Progress: `click.progressbar()`
For tasks that take a while, it's good practice to show the user that something is happening. `click.progressbar()` creates a visual progress bar. You typically use it with a Python `with` statement around a loop.
Let's simulate processing a list of items:
```python
# progress_example.py
import click
import time
items_to_process = range(100) # Simulate 100 items
@click.command()
def cli():
"""Shows a progress bar."""
# 'items_to_process' is the iterable
# 'label' is the text shown before the bar
with click.progressbar(items_to_process, label="Processing items") as bar:
for item in bar:
# Simulate work for each item
time.sleep(0.05)
# The 'bar' automatically updates with each iteration
click.echo("Finished processing!")
if __name__ == '__main__':
cli()
```
When you run this, you'll see a progress bar update in your terminal:
```bash
$ python progress_example.py
Processing items [####################################] 100% 00:00:05
Finished processing!
# (The bar animates in place while running)
```
The progress bar automatically figures out the percentage and estimated time remaining (ETA). It makes long tasks much less mysterious for the user. You can also use it without an iterable by manually calling the `bar.update(increment)` method inside the `with` block.
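Here is a hedged sketch of that manual style (the download scenario is made up): no iterable, just a known `length`, with `bar.update()` called as each chunk of simulated work completes.
```python
# manual_progress.py (illustrative sketch)
import click
import time

@click.command()
def download():
    total_bytes = 1000
    with click.progressbar(length=total_bytes, label="Downloading") as bar:
        received = 0
        while received < total_bytes:
            time.sleep(0.1)      # simulate receiving a chunk
            chunk = 100
            received += chunk
            bar.update(chunk)    # advance the bar by the chunk size

if __name__ == '__main__':
    download()
```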
## How Term UI Works Under the Hood
These functions seem simple, but they handle quite a bit behind the scenes:
1. **Abstraction:** They provide a high-level API for common terminal tasks, hiding the low-level details.
2. **Input Handling:** Functions like `prompt` and `confirm` use Python's built-in `input()` or `getpass.getpass()` (for hidden input). They add loops for retries, default value handling, and type conversion/validation (using [ParamType](04_paramtype.md) concepts internally).
3. **Output Handling (`echo`, `secho`):**
* They check if the output stream (`stdout` or `stderr`) is connected to a terminal (`isatty`).
* If not a terminal, or if color is disabled, `style` codes are automatically removed (`strip_ansi`); there is a small sketch of this just after this list.
* On Windows, if `colorama` is installed, Click wraps the output streams to translate ANSI color codes into Windows API calls, making colors work automatically.
4. **Progress Bar (`progressbar`):**
* It calculates the percentage complete based on the iterable's length (or the provided `length`).
* It estimates the remaining time (ETA) by timing recent iterations.
* It formats the bar (`#` and `-` characters) and info text.
* Crucially, it uses special terminal control characters (like `\r` - carriage return) to move the cursor back to the beginning of the line before printing the updated bar. This makes the bar *appear* to update in place rather than printing many lines. It also hides/shows the cursor during updates (`\033[?25l`, `\033[?25h`) on non-Windows systems for a smoother look.
5. **Cross-Platform Compatibility:** A major goal is to make these interactions work consistently across different operating systems and terminal types, handling quirks like Windows console limitations (`_winconsole.py`, `_compat.py`).
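A quick way to see point 3 for yourself (a small sketch; `click.unstyle()` applies the same ANSI-stripping that `echo` relies on when output is not going to a terminal):
```python
# style_strip_demo.py (illustrative sketch)
import click

styled = click.style("Warning!", fg="yellow", bold=True)
print(repr(styled))                 # the raw string including ANSI escape codes
print(repr(click.unstyle(styled)))  # 'Warning!' with the codes stripped out
# click.echo() performs this stripping automatically when stdout is redirected
# to a file or a pipe (i.e. not a TTY), so log files stay readable.
```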
Let's visualize what might happen when you call `click.secho("Error!", fg='red', err=True)`:
```mermaid
sequenceDiagram
participant UserCode as Your Code
participant ClickSecho as click.secho()
participant ClickStyle as click.style()
participant ClickEcho as click.echo()
participant CompatLayer as Click Compatibility Layer
participant Terminal
UserCode->>ClickSecho: secho("Error!", fg='red', err=True)
ClickSecho->>ClickStyle: style("Error!", fg='red', ...)
ClickStyle-->>ClickSecho: Returns "\033[31mError!\033[0m" (styled text)
ClickSecho->>ClickEcho: echo("\033[31mError!\033[0m", err=True)
ClickEcho->>CompatLayer: Check if output (stderr) is a TTY
CompatLayer-->>ClickEcho: Yes, it's a TTY
ClickEcho->>CompatLayer: Check if color is enabled
CompatLayer-->>ClickEcho: Yes, color is enabled
Note over ClickEcho, Terminal: On Windows, may wrap stream with Colorama here
ClickEcho->>CompatLayer: Write styled text to stderr
CompatLayer->>Terminal: Writes "\033[31mError!\033[0m\n"
Terminal-->>Terminal: Displays "Error!" in red
```
The key is that Click adds layers of checks and formatting (`style`, color stripping, platform adaptation) around the basic act of printing (`echo`) or getting input (`prompt`).
You can find the implementation details in:
* `click/termui.py`: Defines the main functions like `prompt`, `confirm`, `style`, `secho`, `progressbar`, `echo_via_pager`.
* `click/_termui_impl.py`: Contains the implementations for more complex features like `ProgressBar`, `Editor`, `pager`, and `getchar`.
* `click/utils.py`: Contains `echo` and helpers like `open_stream`.
* `click/_compat.py` & `click/_winconsole.py`: Handle differences between Python versions and operating systems, especially for terminal I/O and color support on Windows.
## Conclusion
Click's **Term UI** functions are essential for creating command-line applications that are interactive, informative, and pleasant to use. You've learned how to:
* Print output reliably with `click.echo`.
* Add visual flair with colors and styles using `click.style` and `click.secho`.
* Ask the user for input with `click.prompt`.
* Get yes/no confirmation using `click.confirm`.
* Show progress for long tasks with `click.progressbar`.
These tools handle many cross-platform complexities, letting you focus on building the core logic of your interactive CLI.
But what happens when things go wrong? How does Click handle errors, like invalid user input or missing files? That's where Click's exception handling comes in. Let's dive into that next!
Next up: [Chapter 7: Click Exceptions](07_click_exceptions.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

@@ -0,0 +1,251 @@
# Chapter 7: Click Exceptions - Handling Errors Gracefully
In the last chapter, [Chapter 6: Term UI (Terminal User Interface)](06_term_ui__terminal_user_interface_.md), we explored how to make our command-line tools interactive and visually appealing using functions like `click.prompt`, `click.confirm`, and `click.secho`. We learned how to communicate effectively *with* the user.
But what happens when the user doesn't communicate effectively with *us*? What if they type the wrong command, forget a required argument, or enter text when a number was expected? Our programs need a way to handle these errors without just crashing.
This is where **Click Exceptions** come in. They are Click's way of signaling that something went wrong, usually because of a problem with the user's input or how they tried to run the command.
## Why Special Exceptions? The Problem with Crashes
Imagine you have a command that needs a number, like `--count 5`. You used `type=click.INT` like we learned in [Chapter 4: ParamType](04_paramtype.md). What happens if the user types `--count five`?
If Click didn't handle this specially, the `int("five")` conversion inside Click would fail, raising a standard Python `ValueError`. This might cause your program to stop with a long, confusing Python traceback message that isn't very helpful for the end-user. They might not understand what went wrong or how to fix it.
Click wants to provide a better experience. When something like this happens, Click catches the internal error and raises one of its own **custom exception types**. These special exceptions tell Click exactly what kind of problem occurred (e.g., bad input, missing argument).
## Meet the Click Exceptions
Click has a family of exception classes designed specifically for handling command-line errors. The most important ones inherit from the base class `click.ClickException`. Here are some common ones you'll encounter (or use):
* `ClickException`: The base for all Click-handled errors.
* `UsageError`: A general error indicating the command was used incorrectly (e.g., wrong number of arguments). It usually prints the command's usage instructions.
* `BadParameter`: Raised when the value provided for an option or argument is invalid (e.g., "five" for an integer type, or a value not in a `click.Choice`).
* `MissingParameter`: Raised when a required option or argument is not provided.
* `NoSuchOption`: Raised when the user tries to use an option that doesn't exist (e.g., `--verrbose` instead of `--verbose`).
* `FileError`: Raised by `click.File` or `click.Path` if a file can't be opened or accessed correctly.
* `Abort`: A special exception you can raise to stop execution immediately (like after a failed `click.confirm`).
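For instance, here is a tiny sketch of raising `Abort` yourself after a declined confirmation (passing `abort=True` to `click.confirm` achieves the same effect automatically):

```python
# abort_example.py
import click

@click.command()
def wipe():
    """Deletes everything (not really)."""
    if not click.confirm("Really delete everything?"):
        # Stops immediately; Click prints "Aborted!" and exits with a non-zero code.
        raise click.Abort()
    click.echo("Everything deleted!")

if __name__ == '__main__':
    wipe()
```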
**The Magic:** The really neat part is that Click's main command processing logic is designed to *catch* these specific exceptions. When it catches one, it doesn't just crash. Instead, it:
1. **Formats a helpful error message:** Often using information from the exception itself (like which parameter was bad).
2. **Prints the message** (usually prefixed with "Error:") to the standard error stream (`stderr`).
3. **Often shows relevant help text** (like the command's usage synopsis).
4. **Exits the application cleanly** with a non-zero exit code (signaling to the system that an error occurred).
This gives the user clear feedback about what they did wrong and how to potentially fix it, without seeing scary Python tracebacks.
## Seeing Exceptions in Action (Automatically)
You've already seen Click exceptions working! Remember our `count_app.py` from [Chapter 4: ParamType](04_paramtype.md)?
```python
# count_app.py (from Chapter 4)
import click
@click.command()
@click.option('--count', default=1, type=click.INT, help='Number of times to print.')
@click.argument('message')
def repeat(count, message):
"""Prints MESSAGE the specified number of times."""
for _ in range(count):
click.echo(message)
if __name__ == '__main__':
repeat()
```
If you run this with invalid input for `--count`:
```bash
$ python count_app.py --count five "Oh no"
Usage: count_app.py [OPTIONS] MESSAGE
Try 'count_app.py --help' for help.
Error: Invalid value for '--count': 'five' is not a valid integer.
```
That clear "Error: Invalid value for '--count': 'five' is not a valid integer." message? That's Click catching a `BadParameter` exception (raised internally by `click.INT.convert`) and showing it nicely!
What if you forget the required `MESSAGE` argument?
```bash
$ python count_app.py --count 3
Usage: count_app.py [OPTIONS] MESSAGE
Try 'count_app.py --help' for help.
Error: Missing argument 'MESSAGE'.
```
Again, a clear error message! This time, Click caught a `MissingParameter` exception.
## Raising Exceptions Yourself: Custom Validation
Click raises exceptions automatically for many common errors. But sometimes, you have validation logic that's specific to your application. For example, maybe an `--age` option must be positive.
The standard way to report these custom validation errors is to **raise a `click.BadParameter` exception** yourself, usually from within a callback function.
Let's add a callback to our `count_app.py` to ensure `count` is positive.
```python
# count_app_validate.py
import click
# 1. Define a validation callback function
def validate_count(ctx, param, value):
"""Callback to ensure count is positive."""
if value <= 0:
# 2. Raise BadParameter if validation fails
raise click.BadParameter("Count must be a positive number.")
# 3. Return the value if it's valid
return value
@click.command()
# 4. Attach the callback to the --count option
@click.option('--count', default=1, type=click.INT, help='Number of times to print.',
callback=validate_count) # <-- Added callback
@click.argument('message')
def repeat(count, message):
"""Prints MESSAGE the specified number of times (must be positive)."""
for _ in range(count):
click.echo(message)
if __name__ == '__main__':
repeat()
```
Let's break down the changes:
1. `def validate_count(ctx, param, value):`: We defined a function that takes the [Context](05_context.md), the [Parameter](03_parameter__option___argument_.md) object, and the *already type-converted* value.
2. `raise click.BadParameter(...)`: If the `value` (which we know is an `int` thanks to `type=click.INT`) is not positive, we raise `click.BadParameter` with our custom error message.
3. `return value`: If the value is valid, the callback **must** return it.
4. `callback=validate_count`: We told the `--count` option to use our `validate_count` function after type conversion.
**Run it with invalid input:**
```bash
$ python count_app_validate.py --count 0 "Zero?"
Usage: count_app_validate.py [OPTIONS] MESSAGE
Try 'count_app_validate.py --help' for help.
Error: Invalid value for '--count': Count must be a positive number.
$ python count_app_validate.py --count -5 "Negative?"
Usage: count_app_validate.py [OPTIONS] MESSAGE
Try 'count_app_validate.py --help' for help.
Error: Invalid value for '--count': Count must be a positive number.
```
It works! Our custom validation logic triggered, we raised `click.BadParameter`, and Click caught it, displaying our specific error message cleanly. This is the standard way to integrate your own validation rules into Click's error handling.
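Callbacks work well for checks on a single parameter. For checks that involve several parameters at once, you can also raise a `UsageError` (or `BadParameter`) directly inside the command body. The snippet below is a sketch, not from the Click docs, showing that pattern:

```python
# usage_error_example.py
import click

@click.command()
@click.option('--count', default=1, type=click.INT, help='Number of times to print.')
@click.argument('message')
def repeat(count, message):
    """Prints MESSAGE, but never more times than it has characters."""
    if count > len(message):
        # Click catches this, prints "Error: ..." with usage info, and exits with code 2.
        raise click.UsageError("--count may not exceed the message length.")
    for _ in range(count):
        click.echo(message)

if __name__ == '__main__':
    repeat()
```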
## How Click Handles Exceptions (Under the Hood)
What exactly happens when a Click exception is raised, either by Click itself or by your code?
1. **Raise:** An operation fails (like type conversion, parsing finding a missing argument, or your custom callback). A specific `ClickException` subclass (e.g., `BadParameter`, `MissingParameter`) is instantiated and raised.
2. **Catch:** Click's main application runner (usually triggered when you call your top-level `cli()` function) has a `try...except ClickException` block around the command execution logic.
3. **Show:** When a `ClickException` is caught, the runner calls the exception object's `show()` method.
4. **Format & Print:** The `show()` method (defined in `exceptions.py` for each exception type) formats the error message.
* `UsageError` (and its subclasses like `BadParameter`, `MissingParameter`, `NoSuchOption`) typically includes the command's usage string (`ctx.get_usage()`) and a hint to try the `--help` option.
* `BadParameter` adds context like "Invalid value for 'PARAMETER_NAME':".
* `MissingParameter` formats "Missing argument/option 'PARAMETER_NAME'.".
* The formatted message is printed to `stderr` using `click.echo()`, respecting color settings from the context.
5. **Exit:** After showing the message, Click calls `sys.exit()` with the exception's `exit_code` (usually `1` for general errors, `2` for usage errors). This terminates the program and signals the error status to the calling shell or script.
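In plain Python terms, the core of this runner behaves roughly like the sketch below. This is an approximation for intuition only, not Click's actual source:

```python
import sys
import click

def run_command(cli, args):
    """Roughly what Click's standalone runner does around your command."""
    try:
        # With standalone_mode=False, Click lets exceptions reach our code.
        return cli.main(args, standalone_mode=False)
    except click.ClickException as e:
        e.show()               # prints "Error: ..." (UsageError also prints usage)
        sys.exit(e.exit_code)  # 1 for general errors, 2 for usage errors
    except click.Abort:
        click.echo("Aborted!", err=True)
        sys.exit(1)
```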
Here's a simplified sequence diagram for the `BadParameter` case when a user provides invalid input that fails type conversion:
```mermaid
sequenceDiagram
participant User
participant CLI as YourApp.py
participant ClickRuntime
participant ParamType as ParamType (e.g., click.INT)
participant ClickExceptionHandling
User->>CLI: python YourApp.py --count five
CLI->>ClickRuntime: Starts command execution
ClickRuntime->>ParamType: Calls convert(value='five', ...) for '--count'
ParamType->>ParamType: Tries int('five'), raises ValueError
ParamType->>ClickExceptionHandling: Catches ValueError, calls self.fail(...)
ClickExceptionHandling->>ClickExceptionHandling: Raises BadParameter("...'five' is not...")
ClickExceptionHandling-->>ClickRuntime: BadParameter propagates up
ClickRuntime->>ClickExceptionHandling: Catches BadParameter exception
ClickExceptionHandling->>ClickExceptionHandling: Calls exception.show()
ClickExceptionHandling->>CLI: Prints formatted "Error: Invalid value..." to stderr
ClickExceptionHandling->>CLI: Calls sys.exit(exception.exit_code)
CLI-->>User: Shows error message and exits
```
The core exception classes are defined in `click/exceptions.py`. You can see how `ClickException` defines the basic `show` method and `exit_code`, and how subclasses like `UsageError` and `BadParameter` override `format_message` to provide more specific output based on the context (`ctx`) and parameter (`param`) they might hold.
```python
# Simplified structure from click/exceptions.py
class ClickException(Exception):
exit_code = 1
def __init__(self, message: str) -> None:
# ... (stores message, gets color settings) ...
self.message = message
def format_message(self) -> str:
return self.message
def show(self, file=None) -> None:
# ... (gets stderr if file is None) ...
echo(f"Error: {self.format_message()}", file=file, color=self.show_color)
class UsageError(ClickException):
exit_code = 2
def __init__(self, message: str, ctx=None) -> None:
super().__init__(message)
self.ctx = ctx
# ...
def show(self, file=None) -> None:
# ... (gets stderr, color) ...
hint = ""
if self.ctx is not None and self.ctx.command.get_help_option(self.ctx):
hint = f"Try '{self.ctx.command_path} {self.ctx.help_option_names[0]}' for help.\n"
if self.ctx is not None:
echo(f"{self.ctx.get_usage()}\n{hint}", file=file, color=color)
# Call the base class's logic to print "Error: ..."
echo(f"Error: {self.format_message()}", file=file, color=color)
class BadParameter(UsageError):
def __init__(self, message: str, ctx=None, param=None, param_hint=None) -> None:
super().__init__(message, ctx)
self.param = param
self.param_hint = param_hint
def format_message(self) -> str:
# ... (logic to get parameter name/hint) ...
param_hint = self.param.get_error_hint(self.ctx) if self.param else self.param_hint
# ...
return f"Invalid value for {param_hint}: {self.message}"
# Other exceptions like MissingParameter, NoSuchOption follow similar patterns
```
By using this structured exception system, Click ensures that user errors are reported consistently and helpfully across any Click application.
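You can also define your own exception types that plug into this system. The sketch below (the class name and exit code are arbitrary choices for illustration) shows a custom `ClickException` subclass that Click catches and displays just like the built-in ones:

```python
# custom_exception_sketch.py
import click

class ConfigFileError(click.ClickException):
    exit_code = 3  # pick a distinct exit code for this failure mode

    def __init__(self, path, reason):
        super().__init__(f"could not load config '{path}': {reason}")

@click.command()
@click.option('--config', 'config_path', default='app.cfg', help='Path to the config file.')
def run(config_path):
    # Pretend the file could not be loaded.
    raise ConfigFileError(config_path, "file is missing")

if __name__ == '__main__':
    run()
```

Running this prints `Error: could not load config 'app.cfg': file is missing` and exits with code 3, with no traceback.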
## Conclusion
Click Exceptions are the standard mechanism for reporting errors related to command usage and user input within Click applications.
You've learned:
* Click uses custom exceptions like `UsageError`, `BadParameter`, and `MissingParameter` to signal specific problems.
* Click catches these exceptions automatically to display user-friendly error messages, usage hints, and exit cleanly.
* You can (and should) raise exceptions like `click.BadParameter` in your own validation callbacks to report custom errors in a standard way.
* This system prevents confusing Python tracebacks and provides helpful feedback to the user.
Understanding and using Click's exception hierarchy is key to building robust and user-friendly command-line interfaces that handle problems gracefully.
This concludes our journey through the core concepts of Click! We've covered everything from basic [Commands and Groups](01_command___group.md), [Decorators](02_decorators.md), [Parameters](03_parameter__option___argument_.md), and [Types](04_paramtype.md), to managing runtime state with the [Context](05_context.md), creating interactive [Terminal UIs](06_term_ui__terminal_user_interface_.md), and handling errors with [Click Exceptions](07_click_exceptions.md). Armed with this knowledge, you're well-equipped to start building your own powerful and elegant command-line tools with Click!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

44
output/Click/index.md Normal file

@@ -0,0 +1,44 @@
# Tutorial: Click
Click is a Python library that makes creating **command-line interfaces (CLIs)** *easy and fun*.
It uses simple Python **decorators** (`@click.command`, `@click.option`, etc.) to turn your functions into CLI commands with options and arguments.
Click handles parsing user input, generating help messages, validating data types, and managing the flow between commands, letting you focus on your application's logic.
It also provides tools for *terminal interactions* like prompting users and showing progress bars.
**Source Repository:** [https://github.com/pallets/click/tree/main/src/click](https://github.com/pallets/click/tree/main/src/click)
```mermaid
flowchart TD
A0["Context"]
A1["Command / Group"]
A2["Parameter (Option / Argument)"]
A3["ParamType"]
A4["Decorators"]
A5["Term UI (Terminal User Interface)"]
A6["Click Exceptions"]
A4 -- "Creates/Configures" --> A1
A4 -- "Creates/Configures" --> A2
A0 -- "Manages execution of" --> A1
A0 -- "Holds parsed values for" --> A2
A2 -- "Uses for validation/conversion" --> A3
A3 -- "Raises on conversion error" --> A6
A1 -- "Uses for user interaction" --> A5
A0 -- "Handles/Raises" --> A6
A4 -- "Injects via @pass_context" --> A0
```
## Chapters
1. [Command / Group](01_command___group.md)
2. [Decorators](02_decorators.md)
3. [Parameter (Option / Argument)](03_parameter__option___argument_.md)
4. [ParamType](04_paramtype.md)
5. [Context](05_context.md)
6. [Term UI (Terminal User Interface)](06_term_ui__terminal_user_interface_.md)
7. [Click Exceptions](07_click_exceptions.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,242 @@
# Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy
Welcome to the Crawl4AI tutorial series! Our goal is to build intelligent agents that can understand and extract information from the web. The very first step in this process is actually *getting* the content from a webpage. This chapter explains how Crawl4AI handles that fundamental task.
Imagine you need to pick up a package from a specific address. How do you get there and retrieve it?
* You could send a **simple, fast drone** that just grabs the package off the porch (if it's easily accessible). This is quick but might fail if the package is inside or requires a signature.
* Or, you could send a **full delivery truck with a driver**. The driver can ring the bell, wait, sign for the package, and even handle complex instructions. This is more versatile but takes more time and resources.
In Crawl4AI, the `AsyncCrawlerStrategy` is like choosing your delivery vehicle. It defines *how* the crawler fetches the raw content (like the HTML, CSS, and maybe JavaScript results) of a webpage.
## What Exactly is AsyncCrawlerStrategy?
`AsyncCrawlerStrategy` is a core concept in Crawl4AI that represents the **method** or **technique** used to download the content of a given URL. Think of it as a blueprint: it specifies *that* we need a way to fetch content, but the specific *details* of how it's done can vary.
This "blueprint" approach is powerful because it allows us to swap out the fetching mechanism depending on our needs, without changing the rest of our crawling logic.
## The Default: AsyncPlaywrightCrawlerStrategy (The Delivery Truck)
By default, Crawl4AI uses `AsyncPlaywrightCrawlerStrategy`. This strategy uses a real, automated web browser engine (like Chrome, Firefox, or WebKit) behind the scenes.
**Why use a full browser?**
* **Handles JavaScript:** Modern websites rely heavily on JavaScript to load content, change the layout, or fetch data after the initial page load. `AsyncPlaywrightCrawlerStrategy` runs this JavaScript, just like your normal browser does.
* **Simulates User Interaction:** It can wait for elements to appear, handle dynamic content, and see the page *after* scripts have run.
* **Gets the "Final" View:** It fetches the content as a user would see it in their browser.
This is our "delivery truck" powerful and capable of handling complex websites. However, like a real truck, it's slower and uses more memory and CPU compared to simpler methods.
You generally don't need to *do* anything to use it, as it's the default! When you start Crawl4AI, it picks this strategy automatically.
## Another Option: AsyncHTTPCrawlerStrategy (The Delivery Drone)
Crawl4AI also offers `AsyncHTTPCrawlerStrategy`. This strategy is much simpler. It directly requests the URL and downloads the *initial* HTML source code that the web server sends back.
**Why use this simpler strategy?**
* **Speed:** It's significantly faster because it doesn't need to start a browser, render the page, or execute JavaScript.
* **Efficiency:** It uses much less memory and CPU.
This is our "delivery drone" super fast and efficient for simple tasks.
**What's the catch?**
* **No JavaScript:** It won't run any JavaScript on the page. If content is loaded dynamically by scripts, this strategy will likely miss it.
* **Basic HTML Only:** You get the raw HTML source, not necessarily what a user *sees* after the browser processes everything.
This strategy is great for websites with simple, static HTML content or when you only need the basic structure and metadata very quickly.
## Why Have Different Strategies? (The Power of Abstraction)
Having `AsyncCrawlerStrategy` as a distinct concept offers several advantages:
1. **Flexibility:** You can choose the best tool for the job. Need to crawl complex, dynamic sites? Use the default `AsyncPlaywrightCrawlerStrategy`. Need to quickly fetch basic HTML from thousands of simple pages? Switch to `AsyncHTTPCrawlerStrategy`.
2. **Maintainability:** The logic for *fetching* content is kept separate from the logic for *processing* it.
3. **Extensibility:** Advanced users could even create their *own* custom strategies for specialized fetching needs (though that's beyond this beginner tutorial).
## How It Works Conceptually
When you ask Crawl4AI to crawl a URL, the main `AsyncWebCrawler` doesn't fetch the content itself. Instead, it delegates the task to the currently selected `AsyncCrawlerStrategy`.
Here's a simplified flow:
```mermaid
sequenceDiagram
participant C as AsyncWebCrawler
participant S as AsyncCrawlerStrategy
participant W as Website
C->>S: Please crawl("https://example.com")
Note over S: I'm using my method (e.g., Browser or HTTP)
S->>W: Request Page Content
W-->>S: Return Raw Content (HTML, etc.)
S-->>C: Here's the result (AsyncCrawlResponse)
```
The `AsyncWebCrawler` only needs to know how to talk to *any* strategy through a common interface (the `crawl` method). The strategy handles the specific details of the fetching process.
## Using the Default Strategy (You're Already Doing It!)
Let's see how you use the default `AsyncPlaywrightCrawlerStrategy` without even needing to specify it.
```python
# main_example.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
async def main():
# When you create AsyncWebCrawler without specifying a strategy,
# it automatically uses AsyncPlaywrightCrawlerStrategy!
async with AsyncWebCrawler() as crawler:
print("Crawler is ready using the default strategy (Playwright).")
# Let's crawl a simple page that just returns HTML
# We use CacheMode.BYPASS to ensure we fetch it fresh each time for this demo.
config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
result = await crawler.arun(
url="https://httpbin.org/html",
config=config
)
if result.success:
print("\nSuccessfully fetched content!")
# The strategy fetched the raw HTML.
# AsyncWebCrawler then processes it (more on that later).
print(f"First 100 chars of fetched HTML: {result.html[:100]}...")
else:
print(f"\nFailed to fetch content: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. We import `AsyncWebCrawler` and supporting classes.
2. We create an instance of `AsyncWebCrawler()` inside an `async with` block (this handles setup and cleanup). Since we didn't tell it *which* strategy to use, it defaults to `AsyncPlaywrightCrawlerStrategy`.
3. We call `crawler.arun()` to crawl the URL. Under the hood, the `AsyncPlaywrightCrawlerStrategy` starts a browser, navigates to the page, gets the content, and returns it.
4. We print the first part of the fetched HTML from the `result`.
## Explicitly Choosing the HTTP Strategy
What if you know the page is simple and want the speed of the "delivery drone"? You can explicitly tell `AsyncWebCrawler` to use `AsyncHTTPCrawlerStrategy`.
```python
# http_strategy_example.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
# Import the specific strategies we want to use
from crawl4ai.async_crawler_strategy import AsyncHTTPCrawlerStrategy
async def main():
# 1. Create an instance of the strategy you want
http_strategy = AsyncHTTPCrawlerStrategy()
# 2. Pass the strategy instance when creating the AsyncWebCrawler
async with AsyncWebCrawler(crawler_strategy=http_strategy) as crawler:
print("Crawler is ready using the explicit HTTP strategy.")
# Crawl the same simple page
config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
result = await crawler.arun(
url="https://httpbin.org/html",
config=config
)
if result.success:
print("\nSuccessfully fetched content using HTTP strategy!")
print(f"First 100 chars of fetched HTML: {result.html[:100]}...")
else:
print(f"\nFailed to fetch content: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. We now also import `AsyncHTTPCrawlerStrategy`.
2. We create an instance: `http_strategy = AsyncHTTPCrawlerStrategy()`.
3. We pass this instance to the `AsyncWebCrawler` constructor: `AsyncWebCrawler(crawler_strategy=http_strategy)`.
4. The rest of the code is the same, but now `crawler.arun()` will use the faster, simpler HTTP GET request method defined by `AsyncHTTPCrawlerStrategy`.
For a simple page like `httpbin.org/html`, both strategies will likely return the same HTML content, but the HTTP strategy would generally be faster and use fewer resources. On a complex JavaScript-heavy site, the HTTP strategy might fail to get the full content, while the Playwright strategy would handle it correctly.
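If you're curious about the difference, a rough timing sketch like the one below can make it visible. The numbers vary a lot by machine and network (and most of the gap for a single page comes from starting the browser); this is illustrative only, not a benchmark from the project:

```python
# strategy_timing_sketch.py - illustrative comparison of the two strategies
import asyncio
import time

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.async_crawler_strategy import AsyncHTTPCrawlerStrategy

async def timed_crawl(strategy=None):
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)  # always fetch fresh
    start = time.perf_counter()
    async with AsyncWebCrawler(crawler_strategy=strategy) as crawler:
        result = await crawler.arun(url="https://httpbin.org/html", config=config)
    return time.perf_counter() - start, result.success

async def main():
    playwright_time, _ = await timed_crawl()                      # default "truck"
    http_time, _ = await timed_crawl(AsyncHTTPCrawlerStrategy())  # the "drone"
    print(f"Playwright: {playwright_time:.2f}s | HTTP: {http_time:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())
```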
## A Glimpse Under the Hood
You don't *need* to know the deep internals to use the strategies, but it helps to understand the structure. Inside the `crawl4ai` library, you'd find a file like `async_crawler_strategy.py`.
It defines the "blueprint" (an Abstract Base Class):
```python
# Simplified from async_crawler_strategy.py
from abc import ABC, abstractmethod
from .models import AsyncCrawlResponse # Defines the structure of the result
class AsyncCrawlerStrategy(ABC):
"""
Abstract base class for crawler strategies.
"""
@abstractmethod
async def crawl(self, url: str, **kwargs) -> AsyncCrawlResponse:
"""Fetch content from the URL."""
pass # Each specific strategy must implement this
```
And then the specific implementations:
```python
# Simplified from async_crawler_strategy.py
from playwright.async_api import Page # Playwright library for browser automation
# ... other imports
class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
# ... (Initialization code to manage browsers)
async def crawl(self, url: str, config: CrawlerRunConfig, **kwargs) -> AsyncCrawlResponse:
# Uses Playwright to:
# 1. Get a browser page
# 2. Navigate to the url (page.goto(url))
# 3. Wait for content, run JS, etc.
# 4. Get the final HTML (page.content())
# 5. Optionally take screenshots, etc.
# 6. Return an AsyncCrawlResponse
# ... implementation details ...
pass
```
```python
# Simplified from async_crawler_strategy.py
import aiohttp # Library for making HTTP requests asynchronously
# ... other imports
class AsyncHTTPCrawlerStrategy(AsyncCrawlerStrategy):
# ... (Initialization code to manage HTTP sessions)
async def crawl(self, url: str, config: CrawlerRunConfig, **kwargs) -> AsyncCrawlResponse:
# Uses aiohttp to:
# 1. Make an HTTP GET (or other method) request to the url
# 2. Read the response body (HTML)
# 3. Get response headers and status code
# 4. Return an AsyncCrawlResponse
# ... implementation details ...
pass
```
The key takeaway is that both strategies implement the same `crawl` method, allowing `AsyncWebCrawler` to use them interchangeably.
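Because of this common interface, you could in principle write your own strategy. The sketch below is purely illustrative: it assumes that implementing `crawl` (plus simple `__aenter__`/`__aexit__` hooks, which the crawler calls for setup and cleanup) is enough, and that `AsyncCrawlResponse` accepts `html`, `response_headers`, and `status_code` keyword arguments, as the simplified snippets above suggest.

```python
# local_file_strategy_sketch.py (illustrative only)
import urllib.request

from crawl4ai.async_crawler_strategy import AsyncCrawlerStrategy
from crawl4ai.models import AsyncCrawlResponse

class LocalFileCrawlerStrategy(AsyncCrawlerStrategy):
    """A toy strategy that reads 'file://' URLs from disk instead of the web."""

    async def __aenter__(self):
        return self  # no setup needed for reading local files

    async def __aexit__(self, exc_type, exc, tb):
        pass  # no cleanup needed

    async def crawl(self, url: str, **kwargs) -> AsyncCrawlResponse:
        with urllib.request.urlopen(url) as fh:  # urlopen also handles file:// URLs
            html = fh.read().decode("utf-8", errors="replace")
        # Field names here are assumptions based on the simplified code above.
        return AsyncCrawlResponse(html=html, response_headers={}, status_code=200)
```

You would then pass it in the same way as the HTTP strategy earlier: `AsyncWebCrawler(crawler_strategy=LocalFileCrawlerStrategy())`.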
## Conclusion
You've learned about `AsyncCrawlerStrategy`, the core concept defining *how* Crawl4AI fetches webpage content.
* It's like choosing a vehicle: a powerful browser (`AsyncPlaywrightCrawlerStrategy`, the default) or a fast, simple HTTP request (`AsyncHTTPCrawlerStrategy`).
* This abstraction gives you flexibility to choose the right fetching method for your task.
* You usually don't need to worry about it, as the default handles most modern websites well.
Now that we understand how the raw content is fetched, the next step is to look at the main class that orchestrates the entire crawling process.
**Next:** Let's dive into the [AsyncWebCrawler](02_asyncwebcrawler.md) itself!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,339 @@
# Chapter 2: Meet the General Manager - AsyncWebCrawler
In [Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy](01_asynccrawlerstrategy.md), we learned about the different ways Crawl4AI can fetch the raw content of a webpage, like choosing between a fast drone (`AsyncHTTPCrawlerStrategy`) or a versatile delivery truck (`AsyncPlaywrightCrawlerStrategy`).
But who decides *which* delivery vehicle to use? Who tells it *which* address (URL) to go to? And who takes the delivered package (the raw HTML) and turns it into something useful?
That's where the `AsyncWebCrawler` comes in. Think of it as the **General Manager** of the entire crawling operation.
## What Problem Does `AsyncWebCrawler` Solve?
Imagine you want to get information from a website. You need to:
1. Decide *how* to fetch the page (like choosing the drone or truck from Chapter 1).
2. Actually *fetch* the page content.
3. Maybe *clean up* the messy HTML.
4. Perhaps *extract* specific pieces of information (like product prices or article titles).
5. Maybe *save* the results so you don't have to fetch them again immediately (caching).
6. Finally, give you the *final, processed result*.
Doing all these steps manually for every URL would be tedious and complex. `AsyncWebCrawler` acts as the central coordinator, managing all these steps for you. You just tell it what URL to crawl and maybe some preferences, and it handles the rest.
## What is `AsyncWebCrawler`?
`AsyncWebCrawler` is the main class you'll interact with when using Crawl4AI. It's the primary entry point for starting any crawling task.
**Key Responsibilities:**
* **Initialization:** Sets up the necessary components, like the browser (if needed).
* **Coordination:** Takes your request (a URL and configuration) and orchestrates the different parts:
* Delegates fetching to an [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md).
* Manages caching using [CacheContext / CacheMode](09_cachecontext___cachemode.md).
* Uses a [ContentScrapingStrategy](04_contentscrapingstrategy.md) to clean and parse HTML.
* Applies a [RelevantContentFilter](05_relevantcontentfilter.md) if configured.
* Uses an [ExtractionStrategy](06_extractionstrategy.md) to pull out specific data if needed.
* **Result Packaging:** Bundles everything up into a neat [CrawlResult](07_crawlresult.md) object.
* **Resource Management:** Handles starting and stopping resources (like browsers) cleanly.
It's the "conductor" making sure all the different instruments play together harmoniously.
## Your First Crawl: Using `arun`
Let's see the `AsyncWebCrawler` in action. The most common way to use it is with an `async with` block, which automatically handles setup and cleanup. The main method to crawl a single URL is `arun`.
```python
# chapter2_example_1.py
import asyncio
from crawl4ai import AsyncWebCrawler # Import the General Manager
async def main():
# Create the General Manager instance using 'async with'
# This handles setup (like starting a browser if needed)
# and cleanup (closing the browser).
async with AsyncWebCrawler() as crawler:
print("Crawler is ready!")
# Tell the manager to crawl a specific URL
url_to_crawl = "https://httpbin.org/html" # A simple example page
print(f"Asking the crawler to fetch: {url_to_crawl}")
result = await crawler.arun(url=url_to_crawl)
# Check if the crawl was successful
if result.success:
print("\nSuccess! Crawler got the content.")
# The result object contains the processed data
# We'll learn more about CrawlResult in Chapter 7
print(f"Page Title: {result.metadata.get('title', 'N/A')}")
print(f"First 100 chars of Markdown: {result.markdown.raw_markdown[:100]}...")
else:
print(f"\nFailed to crawl: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **`import AsyncWebCrawler`**: We import the main class.
2. **`async def main():`**: Crawl4AI uses Python's `asyncio` for efficiency, so our code needs to be in an `async` function.
3. **`async with AsyncWebCrawler() as crawler:`**: This is the standard way to create and manage the crawler. The `async with` statement ensures that resources (like the underlying browser used by the default `AsyncPlaywrightCrawlerStrategy`) are properly started and stopped, even if errors occur.
4. **`crawler.arun(url=url_to_crawl)`**: This is the core command. We tell our `crawler` instance (the General Manager) to run (`arun`) the crawling process for the specified `url`. `await` is used because fetching webpages takes time, and `asyncio` allows other tasks to run while waiting.
5. **`result`**: The `arun` method returns a `CrawlResult` object. This object contains all the information gathered during the crawl (HTML, cleaned text, metadata, etc.). We'll explore this object in detail in [Chapter 7: Understanding the Results - CrawlResult](07_crawlresult.md).
6. **`result.success`**: We check this boolean flag to see if the crawl completed without critical errors.
7. **Accessing Data:** If successful, we can access processed information like the page title (`result.metadata['title']`) or the content formatted as Markdown (`result.markdown.raw_markdown`).
## Configuring the Crawl
Sometimes, the default behavior isn't quite what you need. Maybe you want to use the faster "drone" strategy from Chapter 1, or perhaps you want to ensure you *always* fetch a fresh copy of the page, ignoring any saved cache.
You can customize the behavior of a specific `arun` call by passing a `CrawlerRunConfig` object. Think of this as giving specific instructions to the General Manager for *this particular job*.
```python
# chapter2_example_2.py
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai import CrawlerRunConfig # Import configuration class
from crawl4ai import CacheMode # Import cache options
async def main():
async with AsyncWebCrawler() as crawler:
print("Crawler is ready!")
url_to_crawl = "https://httpbin.org/html"
# Create a specific configuration for this run
# Tell the crawler to BYPASS the cache (fetch fresh)
run_config = CrawlerRunConfig(
cache_mode=CacheMode.BYPASS
)
print("Configuration: Bypass cache for this run.")
# Pass the config object to the arun method
result = await crawler.arun(
url=url_to_crawl,
config=run_config # Pass the specific instructions
)
if result.success:
print("\nSuccess! Crawler got fresh content (cache bypassed).")
print(f"Page Title: {result.metadata.get('title', 'N/A')}")
else:
print(f"\nFailed to crawl: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **`from crawl4ai import CrawlerRunConfig, CacheMode`**: We import the necessary classes for configuration.
2. **`run_config = CrawlerRunConfig(...)`**: We create an instance of `CrawlerRunConfig`. This object holds various settings for a specific crawl job.
3. **`cache_mode=CacheMode.BYPASS`**: We set the `cache_mode`. `CacheMode.BYPASS` tells the crawler to ignore any previously saved results for this URL and fetch it directly from the web server. We'll learn all about caching options in [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md).
4. **`crawler.arun(..., config=run_config)`**: We pass our custom `run_config` object to the `arun` method using the `config` parameter.
The `CrawlerRunConfig` is very powerful and lets you control many aspects of the crawl, including which scraping or extraction methods to use. We'll dive deep into it in the next chapter: [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md).
## What Happens When You Call `arun`? (The Flow)
When you call `crawler.arun(url="...")`, the `AsyncWebCrawler` (our General Manager) springs into action and coordinates several steps behind the scenes:
```mermaid
sequenceDiagram
participant U as User
participant AWC as AsyncWebCrawler (Manager)
participant CC as Cache Check
participant CS as AsyncCrawlerStrategy (Fetcher)
participant SP as Scraping/Processing
participant CR as CrawlResult (Final Report)
U->>AWC: arun("https://example.com", config)
AWC->>CC: Need content for "https://example.com"? (Respect CacheMode in config)
alt Cache Hit & Cache Mode allows reading
CC-->>AWC: Yes, here's the cached result.
AWC-->>CR: Package cached result.
AWC-->>U: Here is the CrawlResult
else Cache Miss or Cache Mode prevents reading
CC-->>AWC: No cached result / Cannot read cache.
AWC->>CS: Please fetch "https://example.com" (using configured strategy)
CS-->>AWC: Here's the raw response (HTML, etc.)
AWC->>SP: Process this raw content (Scrape, Filter, Extract based on config)
SP-->>AWC: Here's the processed data (Markdown, Metadata, etc.)
AWC->>CC: Cache this result? (Respect CacheMode in config)
CC-->>AWC: OK, cached.
AWC-->>CR: Package new result.
AWC-->>U: Here is the CrawlResult
end
```
**Simplified Steps:**
1. **Receive Request:** The `AsyncWebCrawler` gets the URL and configuration from your `arun` call.
2. **Check Cache:** It checks if a valid result for this URL is already saved (cached) and if the `CacheMode` allows using it. (See [Chapter 9](09_cachecontext___cachemode.md)).
3. **Fetch (if needed):** If no valid cached result exists or caching is bypassed, it asks the configured [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) (e.g., Playwright or HTTP) to fetch the raw page content.
4. **Process Content:** It takes the raw HTML and passes it through various processing steps based on the configuration:
* **Scraping:** Cleaning up HTML, extracting basic structure using a [ContentScrapingStrategy](04_contentscrapingstrategy.md).
* **Filtering:** Optionally filtering content for relevance using a [RelevantContentFilter](05_relevantcontentfilter.md).
* **Extraction:** Optionally extracting specific structured data using an [ExtractionStrategy](06_extractionstrategy.md).
5. **Cache Result (if needed):** If caching is enabled for writing, it saves the final processed result.
6. **Return Result:** It bundles everything into a [CrawlResult](07_crawlresult.md) object and returns it to you.
## Crawling Many Pages: `arun_many`
What if you have a whole list of URLs to crawl? Calling `arun` in a loop works, but it might not be the most efficient way. `AsyncWebCrawler` provides the `arun_many` method designed for this.
```python
# chapter2_example_3.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
async def main():
async with AsyncWebCrawler() as crawler:
urls_to_crawl = [
"https://httpbin.org/html",
"https://httpbin.org/links/10/0",
"https://httpbin.org/robots.txt"
]
print(f"Asking crawler to fetch {len(urls_to_crawl)} URLs.")
# Use arun_many for multiple URLs
# We can still pass a config that applies to all URLs in the batch
config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
results = await crawler.arun_many(urls=urls_to_crawl, config=config)
print(f"\nFinished crawling! Got {len(results)} results.")
for result in results:
status = "Success" if result.success else "Failed"
url_short = result.url.split('/')[-1] # Get last part of URL
print(f"- URL: {url_short:<10} | Status: {status:<7} | Title: {result.metadata.get('title', 'N/A')}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **`urls_to_crawl = [...]`**: We define a list of URLs.
2. **`await crawler.arun_many(urls=urls_to_crawl, config=config)`**: We call `arun_many`, passing the list of URLs. It handles crawling them concurrently (like dispatching multiple delivery trucks or drones efficiently).
3. **`results`**: `arun_many` returns a list where each item is a `CrawlResult` object corresponding to one of the input URLs.
`arun_many` is much more efficient for batch processing as it leverages `asyncio` to handle multiple fetches and processing tasks concurrently. It uses a [BaseDispatcher](10_basedispatcher.md) internally to manage this concurrency.
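Since each item in the returned list is an ordinary `CrawlResult`, a small post-processing helper (just a sketch, not part of the library) is often handy for separating successes from failures:

```python
# A tiny helper sketch for inspecting arun_many results.
def summarize(results):
    successes = [r for r in results if r.success]
    failures = [r for r in results if not r.success]
    print(f"{len(successes)} succeeded, {len(failures)} failed.")
    for r in failures:
        # error_message explains what went wrong for that URL
        print(f"Retry candidate: {r.url} ({r.error_message})")

# Inside main(), after: results = await crawler.arun_many(urls=urls_to_crawl, config=config)
# summarize(results)
```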
## Under the Hood (A Peek at the Code)
You don't need to know the internal details to use `AsyncWebCrawler`, but seeing the structure can help. Inside the `crawl4ai` library, the file `async_webcrawler.py` defines this class.
```python
# Simplified from async_webcrawler.py
# ... imports ...
from .async_crawler_strategy import AsyncCrawlerStrategy, AsyncPlaywrightCrawlerStrategy
from .async_configs import BrowserConfig, CrawlerRunConfig
from .models import CrawlResult
from .cache_context import CacheContext, CacheMode
# ... other strategy imports ...
class AsyncWebCrawler:
def __init__(
self,
crawler_strategy: AsyncCrawlerStrategy = None, # You can provide a strategy...
config: BrowserConfig = None, # Configuration for the browser
# ... other parameters like logger, base_directory ...
):
# If no strategy is given, it defaults to Playwright (the 'truck')
self.crawler_strategy = crawler_strategy or AsyncPlaywrightCrawlerStrategy(...)
self.browser_config = config or BrowserConfig()
# ... setup logger, directories, etc. ...
self.ready = False # Flag to track if setup is complete
async def __aenter__(self):
# This is called when you use 'async with'. It starts the strategy.
await self.crawler_strategy.__aenter__()
await self.awarmup() # Perform internal setup
self.ready = True
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
# This is called when exiting 'async with'. It cleans up.
await self.crawler_strategy.__aexit__(exc_type, exc_val, exc_tb)
self.ready = False
async def arun(self, url: str, config: CrawlerRunConfig = None) -> CrawlResult:
# 1. Ensure config exists, set defaults (like CacheMode.ENABLED)
crawler_config = config or CrawlerRunConfig()
if crawler_config.cache_mode is None:
crawler_config.cache_mode = CacheMode.ENABLED
# 2. Create CacheContext to manage caching logic
cache_context = CacheContext(url, crawler_config.cache_mode)
# 3. Try reading from cache if allowed
cached_result = None
if cache_context.should_read():
cached_result = await async_db_manager.aget_cached_url(url)
# 4. If cache hit and valid, return cached result
if cached_result and self._is_cache_valid(cached_result, crawler_config):
# ... log cache hit ...
return cached_result
# 5. If no cache hit or cache invalid/bypassed: Fetch fresh content
# Delegate to the configured AsyncCrawlerStrategy
async_response = await self.crawler_strategy.crawl(url, config=crawler_config)
# 6. Process the HTML (scrape, filter, extract)
# This involves calling other strategies based on config
crawl_result = await self.aprocess_html(
url=url,
html=async_response.html,
config=crawler_config,
# ... other details from async_response ...
)
# 7. Write to cache if allowed
if cache_context.should_write():
await async_db_manager.acache_url(crawl_result)
# 8. Return the final CrawlResult
return crawl_result
async def aprocess_html(self, url: str, html: str, config: CrawlerRunConfig, ...) -> CrawlResult:
# This internal method handles:
# - Getting the configured ContentScrapingStrategy
# - Calling its 'scrap' method
# - Getting the configured MarkdownGenerationStrategy
# - Calling its 'generate_markdown' method
# - Getting the configured ExtractionStrategy (if any)
# - Calling its 'run' method
# - Packaging everything into a CrawlResult
# ... implementation details ...
pass # Simplified
async def arun_many(self, urls: List[str], config: Optional[CrawlerRunConfig] = None, ...) -> List[CrawlResult]:
# Uses a Dispatcher (like MemoryAdaptiveDispatcher)
# to run self.arun for each URL concurrently.
# ... implementation details using a dispatcher ...
pass # Simplified
# ... other methods like awarmup, close, caching helpers ...
```
The key takeaway is that `AsyncWebCrawler` doesn't do the fetching or detailed processing *itself*. It acts as the central hub, coordinating calls to the various specialized `Strategy` classes based on the provided configuration.
## Conclusion
You've met the General Manager: `AsyncWebCrawler`!
* It's the **main entry point** for using Crawl4AI.
* It **coordinates** all the steps: fetching, caching, scraping, extracting.
* You primarily interact with it using `async with` and the `arun()` (single URL) or `arun_many()` (multiple URLs) methods.
* It takes a URL and an optional `CrawlerRunConfig` object to customize the crawl.
* It returns a comprehensive `CrawlResult` object.
Now that you understand the central role of `AsyncWebCrawler`, let's explore how to give it detailed instructions for each crawling job.
**Next:** Let's dive into the specifics of configuration with [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,277 @@
# Chapter 3: Giving Instructions - CrawlerRunConfig
In [Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md), we met the `AsyncWebCrawler`, the central coordinator for our web crawling tasks. We saw how to tell it *what* URL to crawl using the `arun` method.
But what if we want to tell the crawler *how* to crawl that URL? Maybe we want it to take a picture (screenshot) of the page? Or perhaps we only care about a specific section of the page? Or maybe we want to ignore the cache and get the very latest version?
Passing all these different instructions individually every time we call `arun` could get complicated and messy.
```python
# Imagine doing this every time - it gets long!
# result = await crawler.arun(
# url="https://example.com",
# take_screenshot=True,
# ignore_cache=True,
# only_look_at_this_part="#main-content",
# wait_for_this_element="#data-table",
# # ... maybe many more settings ...
# )
```
That's where `CrawlerRunConfig` comes in!
## What Problem Does `CrawlerRunConfig` Solve?
Think of `CrawlerRunConfig` as the **Instruction Manual** for a *specific* crawl job. Instead of giving the `AsyncWebCrawler` manager lots of separate instructions each time, you bundle them all neatly into a single `CrawlerRunConfig` object.
This object tells the `AsyncWebCrawler` exactly *how* to handle a particular URL or set of URLs for that specific run. It makes your code cleaner and easier to manage.
## What is `CrawlerRunConfig`?
`CrawlerRunConfig` is a configuration class that holds all the settings for a single crawl operation initiated by `AsyncWebCrawler.arun()` or `arun_many()`.
It allows you to customize various aspects of the crawl, such as:
* **Taking Screenshots:** Should the crawler capture an image of the page? (`screenshot`)
* **Waiting:** How long should the crawler wait for the page or specific elements to load? (`page_timeout`, `wait_for`)
* **Focusing Content:** Should the crawler only process a specific part of the page? (`css_selector`)
* **Extracting Data:** Should the crawler use a specific method to pull out structured data? ([ExtractionStrategy](06_extractionstrategy.md))
* **Caching:** How should the crawler interact with previously saved results? ([CacheMode](09_cachecontext___cachemode.md))
* **And much more!** (like handling JavaScript, filtering links, etc.)
## Using `CrawlerRunConfig`
Let's see how to use it. Remember our basic crawl from Chapter 2?
```python
# chapter3_example_1.py
import asyncio
from crawl4ai import AsyncWebCrawler
async def main():
async with AsyncWebCrawler() as crawler:
url_to_crawl = "https://httpbin.org/html"
print(f"Crawling {url_to_crawl} with default settings...")
# This uses the default behavior (no specific config)
result = await crawler.arun(url=url_to_crawl)
if result.success:
print("Success! Got the content.")
print(f"Screenshot taken? {'Yes' if result.screenshot else 'No'}") # Likely No
# We'll learn about CacheMode later, but it defaults to using the cache
else:
print(f"Failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
Now, let's say for this *specific* crawl, we want to bypass the cache (fetch fresh) and also take a screenshot.
We create a `CrawlerRunConfig` instance and pass it to `arun`:
```python
# chapter3_example_2.py
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai import CrawlerRunConfig # 1. Import the config class
from crawl4ai import CacheMode # Import cache options
async def main():
async with AsyncWebCrawler() as crawler:
url_to_crawl = "https://httpbin.org/html"
print(f"Crawling {url_to_crawl} with custom settings...")
# 2. Create an instance of CrawlerRunConfig with our desired settings
my_instructions = CrawlerRunConfig(
cache_mode=CacheMode.BYPASS, # Don't use the cache, fetch fresh
screenshot=True # Take a screenshot
)
print("Instructions: Bypass cache, take screenshot.")
# 3. Pass the config object to arun()
result = await crawler.arun(
url=url_to_crawl,
config=my_instructions # Pass our instruction manual
)
if result.success:
print("\nSuccess! Got the content with custom config.")
print(f"Screenshot taken? {'Yes' if result.screenshot else 'No'}") # Should be Yes
# Check if the screenshot file path exists in result.screenshot
if result.screenshot:
print(f"Screenshot saved to: {result.screenshot}")
else:
print(f"\nFailed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Import:** We import `CrawlerRunConfig` and `CacheMode`.
2. **Create Config:** We create an instance: `my_instructions = CrawlerRunConfig(...)`. We set `cache_mode` to `CacheMode.BYPASS` and `screenshot` to `True`. All other settings remain at their defaults.
3. **Pass Config:** We pass this `my_instructions` object to `crawler.arun` using the `config=` parameter.
Now, when `AsyncWebCrawler` runs this job, it will look inside `my_instructions` and follow those specific settings for *this run only*.
## Some Common `CrawlerRunConfig` Parameters
`CrawlerRunConfig` has many options, but here are a few common ones you might use:
* **`cache_mode`**: Controls caching behavior.
* `CacheMode.ENABLED` (Default): Use the cache if available, otherwise fetch and save.
* `CacheMode.BYPASS`: Always fetch fresh, ignoring any cached version (but still save the new result).
* `CacheMode.DISABLED`: Never read from or write to the cache.
* *(More details in [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md))*
* **`screenshot` (bool)**: If `True`, takes a screenshot of the fully rendered page. The path to the screenshot file will be in `CrawlResult.screenshot`. Default: `False`.
* **`pdf` (bool)**: If `True`, generates a PDF of the page. The path to the PDF file will be in `CrawlResult.pdf`. Default: `False`.
* **`css_selector` (str)**: If provided (e.g., `"#main-content"` or `.article-body`), the crawler will try to extract *only* the HTML content within the element(s) matching this CSS selector. This is great for focusing on the important part of a page. Default: `None` (process the whole page).
* **`wait_for` (str)**: A CSS selector (e.g., `"#data-loaded-indicator"`). The crawler will wait until an element matching this selector appears on the page before proceeding. Useful for pages that load content dynamically with JavaScript. Default: `None`.
* **`page_timeout` (int)**: Maximum time in milliseconds to wait for page navigation or certain operations. Default: `60000` (60 seconds).
* **`extraction_strategy`**: An object that defines how to extract specific, structured data (like product names and prices) from the page. Default: `None`. *(See [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md))*
* **`scraping_strategy`**: An object defining how the raw HTML is cleaned and basic content (like text and links) is extracted. Default: `WebScrapingStrategy()`. *(See [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md))*
Let's try combining a few: focus on a specific part of the page and wait for something to appear.
```python
# chapter3_example_3.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
async def main():
# This example site has a heading 'H1' inside a 'body' tag.
url_to_crawl = "https://httpbin.org/html"
async with AsyncWebCrawler() as crawler:
print(f"Crawling {url_to_crawl}, focusing on the H1 tag...")
# Instructions: Only get the H1 tag, wait max 10s for it
specific_config = CrawlerRunConfig(
css_selector="h1", # Only grab content inside <h1> tags
page_timeout=10000 # Set page timeout to 10 seconds
# We could also add wait_for="h1" if needed for dynamic loading
)
result = await crawler.arun(url=url_to_crawl, config=specific_config)
if result.success:
print("\nSuccess! Focused crawl completed.")
# The markdown should now ONLY contain the H1 content
print(f"Markdown content:\n---\n{result.markdown.raw_markdown.strip()}\n---")
else:
print(f"\nFailed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
This time, the `result.markdown` should only contain the text from the `<h1>` tag on that page, because we used `css_selector="h1"` in our `CrawlerRunConfig`.
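You can freely combine these options in one config. The sketch below (not one of the tutorial's numbered examples) waits for the heading, focuses on it, and also asks for a screenshot:

```python
# chapter3_combined_sketch.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,  # fetch fresh
        wait_for="h1",                # wait until an <h1> appears
        css_selector="h1",            # only process the <h1> content
        screenshot=True,              # also capture an image of the page
        page_timeout=30000,           # give up after 30 seconds
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://httpbin.org/html", config=config)
        if result.success:
            print(result.markdown.raw_markdown.strip())
            print("Screenshot captured!" if result.screenshot else "No screenshot.")

if __name__ == "__main__":
    asyncio.run(main())
```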
## How `AsyncWebCrawler` Uses the Config (Under the Hood)
You don't need to know the exact internal code, but it helps to understand the flow. When you call `crawler.arun(url, config=my_config)`, the `AsyncWebCrawler` essentially does this:
1. Receives the `url` and the `my_config` object.
2. Before fetching, it checks `my_config.cache_mode` to see if it should look in the cache first.
3. If fetching is needed, it passes `my_config` to the underlying [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md).
4. The strategy uses settings from `my_config` like `page_timeout`, `wait_for`, and whether to take a `screenshot`.
5. After getting the raw HTML, `AsyncWebCrawler` uses the `my_config.scraping_strategy` and `my_config.css_selector` to process the content.
6. If `my_config.extraction_strategy` is set, it uses that to extract structured data.
7. Finally, it bundles everything into a `CrawlResult` and returns it.
Here's a simplified view:
```mermaid
sequenceDiagram
participant User
participant AWC as AsyncWebCrawler
participant Config as CrawlerRunConfig
participant Fetcher as AsyncCrawlerStrategy
participant Processor as Scraping/Extraction
User->>AWC: arun(url, config=my_config)
AWC->>Config: Check my_config.cache_mode
alt Need to Fetch
AWC->>Fetcher: crawl(url, config=my_config)
Note over Fetcher: Uses my_config settings (timeout, wait_for, screenshot...)
Fetcher-->>AWC: Raw Response (HTML, screenshot?)
AWC->>Processor: Process HTML (using my_config.css_selector, my_config.extraction_strategy...)
Processor-->>AWC: Processed Data
else Use Cache
AWC->>AWC: Retrieve from Cache
end
AWC-->>User: Return CrawlResult
```
The `CrawlerRunConfig` acts as a messenger carrying your specific instructions throughout the crawling process.
Inside the `crawl4ai` library, in the file `async_configs.py`, you'll find the definition of the `CrawlerRunConfig` class. It looks something like this (simplified):
```python
# Simplified from crawl4ai/async_configs.py
from .cache_context import CacheMode
from .extraction_strategy import ExtractionStrategy
from .content_scraping_strategy import ContentScrapingStrategy, WebScrapingStrategy
# ... other imports ...
class CrawlerRunConfig():
"""
Configuration class for controlling how the crawler runs each crawl operation.
"""
def __init__(
self,
# Caching
cache_mode: CacheMode = CacheMode.BYPASS, # Default behavior if not specified
# Content Selection / Waiting
css_selector: str = None,
wait_for: str = None,
page_timeout: int = 60000, # 60 seconds
# Media
screenshot: bool = False,
pdf: bool = False,
# Processing Strategies
scraping_strategy: ContentScrapingStrategy = None, # Defaults internally if None
extraction_strategy: ExtractionStrategy = None,
# ... many other parameters omitted for clarity ...
**kwargs # Allows for flexibility
):
self.cache_mode = cache_mode
self.css_selector = css_selector
self.wait_for = wait_for
self.page_timeout = page_timeout
self.screenshot = screenshot
self.pdf = pdf
# Assign scraping strategy, ensuring a default if None is provided
self.scraping_strategy = scraping_strategy or WebScrapingStrategy()
self.extraction_strategy = extraction_strategy
# ... initialize other attributes ...
# Helper methods like 'clone', 'to_dict', 'from_kwargs' might exist too
# ...
```
The key idea is that it's a class designed to hold various settings together. When you create an instance `CrawlerRunConfig(...)`, you're essentially creating an object that stores your choices for these parameters.
## Conclusion
You've learned about `CrawlerRunConfig`, the "Instruction Manual" for individual crawl jobs in Crawl4AI!
* It solves the problem of passing many settings individually to `AsyncWebCrawler`.
* You create an instance of `CrawlerRunConfig` and set the parameters you want to customize (like `cache_mode`, `screenshot`, `css_selector`, `wait_for`).
* You pass this config object to `crawler.arun(url, config=your_config)`.
* This makes your code cleaner and gives you fine-grained control over *how* each crawl is performed.
Now that we know how to fetch content ([AsyncCrawlerStrategy](01_asynccrawlerstrategy.md)), manage the overall process ([AsyncWebCrawler](02_asyncwebcrawler.md)), and give specific instructions ([CrawlerRunConfig](03_crawlerrunconfig.md)), let's look at how the raw, messy HTML fetched from the web is initially cleaned up and processed.
**Next:** Let's explore [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,321 @@
# Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy
In [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md), we learned how to give specific instructions to our `AsyncWebCrawler` using `CrawlerRunConfig`. This included telling it *how* to fetch the page and potentially take screenshots or PDFs.
Now, imagine the crawler has successfully fetched the raw HTML content of a webpage. What's next? Raw HTML is often messy! It contains not just the main article or product description you might care about, but also:
* Navigation menus
* Advertisements
* Headers and footers
* Hidden code like JavaScript (`<script>`) and styling information (`<style>`)
* Comments left by developers
Before we can really understand the *meaning* of the page or extract specific important information, we need to clean up this mess and get a basic understanding of its structure.
## What Problem Does `ContentScrapingStrategy` Solve?
Think of the raw HTML fetched by the crawler as a very rough first draft of a book manuscript. It has the core story, but it's full of editor's notes, coffee stains, layout instructions for the printer, and maybe even doodles in the margins.
Before the *main* editor (who focuses on plot and character) can work on it, someone needs to do an initial cleanup. This "First Pass Editor" would:
1. Remove the coffee stains and doodles (irrelevant stuff like ads, scripts, styles).
2. Identify the basic structure: chapter headings (like the page title), paragraph text, image captions (image alt text), and maybe a list of illustrations (links).
3. Produce a tidier version of the manuscript, ready for more detailed analysis.
In Crawl4AI, the `ContentScrapingStrategy` acts as this **First Pass Editor**. It takes the raw HTML and performs an initial cleanup and structure extraction. Its job is to transform the messy HTML into a more manageable format, identifying key elements like text content, links, images, and basic page metadata (like the title).
## What is `ContentScrapingStrategy`?
`ContentScrapingStrategy` is an abstract concept (like a job description) in Crawl4AI that defines *how* the initial processing of raw HTML should happen. It specifies *that* we need a method to clean HTML and extract basic structure, but the specific tools and techniques used can vary.
This allows Crawl4AI to be flexible. Different strategies might use different underlying libraries or have different performance characteristics.
## The Implementations: Meet the Editors
Crawl4AI provides concrete implementations (the actual editors doing the work) of this strategy:
1. **`WebScrapingStrategy` (The Default Editor):**
* This is the strategy used by default if you don't specify otherwise.
* It uses a popular Python library called `BeautifulSoup` behind the scenes to parse and manipulate the HTML.
* It's generally robust and good at handling imperfect HTML.
* Think of it as a reliable, experienced editor who does a thorough job.
2. **`LXMLWebScrapingStrategy` (The Speedy Editor):**
* This strategy uses another powerful library called `lxml`.
* `lxml` is often faster than `BeautifulSoup`, especially on large or complex pages.
* Think of it as a very fast editor who might be slightly stricter about the manuscript's format but gets the job done quickly.
For most beginners, the default `WebScrapingStrategy` works perfectly fine! You usually don't need to worry about switching unless you encounter performance issues on very large-scale crawls (which is a more advanced topic).
## How It Works Conceptually
Here's the flow:
1. The [AsyncWebCrawler](02_asyncwebcrawler.md) receives the raw HTML from the [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) (the fetcher).
2. It looks at the [CrawlerRunConfig](03_crawlerrunconfig.md) to see which `ContentScrapingStrategy` to use (defaulting to `WebScrapingStrategy` if none is specified).
3. It hands the raw HTML over to the chosen strategy's `scrap` method.
4. The strategy parses the HTML, removes unwanted tags (like `<script>`, `<style>`, `<nav>`, `<aside>`, etc., based on its internal rules), extracts all links (`<a>` tags), images (`<img>` tags with their `alt` text), and metadata (like the `<title>` tag).
5. It returns the results packaged in a `ScrapingResult` object, containing the cleaned HTML, lists of links and media items, and extracted metadata.
6. The `AsyncWebCrawler` then takes this `ScrapingResult` and uses its contents (along with other info) to build the final [CrawlResult](07_crawlresult.md).
```mermaid
sequenceDiagram
participant AWC as AsyncWebCrawler (Manager)
participant Fetcher as AsyncCrawlerStrategy
participant HTML as Raw HTML
participant CSS as ContentScrapingStrategy (Editor)
participant SR as ScrapingResult (Cleaned Draft)
participant CR as CrawlResult (Final Report)
AWC->>Fetcher: Fetch("https://example.com")
Fetcher-->>AWC: Here's the Raw HTML
AWC->>CSS: Please scrap this Raw HTML (using config)
Note over CSS: Parsing HTML... Removing scripts, styles, ads... Extracting links, images, title...
CSS-->>AWC: Here's the ScrapingResult (Cleaned HTML, Links, Media, Metadata)
AWC->>CR: Combine ScrapingResult with other info
AWC-->>User: Return final CrawlResult
```
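If you want to see this cleanup step in isolation, the sketch below calls a scraping strategy directly on a small HTML string. This is purely illustrative: it assumes `WebScrapingStrategy` can be imported from the `crawl4ai` package (as `LXMLWebScrapingStrategy` is in the example further down) and that `ScrapingResult` exposes the fields described above.

```python
# scrap_directly_sketch.py - illustrative only, not one of the numbered examples
from crawl4ai import WebScrapingStrategy  # assumed package-level export

raw_html = """
<html>
  <head><title>Demo Page</title><style>body { color: red }</style></head>
  <body><h1>Hello</h1><script>console.log('noise')</script></body>
</html>
"""

strategy = WebScrapingStrategy()
# scrap(url, html) returns a ScrapingResult with cleaned HTML, links, media, metadata
scraping_result = strategy.scrap("https://example.com", raw_html)

print(scraping_result.metadata.get("title"))       # expected: "Demo Page"
print("<script>" in scraping_result.cleaned_html)  # expected: False (scripts removed)
```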
## Using the Default Strategy (`WebScrapingStrategy`)
You're likely already using it without realizing it! When you run a basic crawl, `AsyncWebCrawler` automatically employs `WebScrapingStrategy`.
```python
# chapter4_example_1.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
async def main():
# Uses the default AsyncPlaywrightCrawlerStrategy (fetching)
# AND the default WebScrapingStrategy (scraping/cleaning)
async with AsyncWebCrawler() as crawler:
url_to_crawl = "https://httpbin.org/html" # A very simple HTML page
# We don't specify a scraping_strategy in the config, so it uses the default
config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS) # Fetch fresh
print(f"Crawling {url_to_crawl} using default scraping strategy...")
result = await crawler.arun(url=url_to_crawl, config=config)
if result.success:
print("\nSuccess! Content fetched and scraped.")
# The 'result' object now contains info processed by WebScrapingStrategy
# 1. Metadata extracted (e.g., page title)
print(f"Page Title: {result.metadata.get('title', 'N/A')}")
# 2. Links extracted
print(f"Found {len(result.links.internal)} internal links and {len(result.links.external)} external links.")
# Example: print first external link if exists
if result.links.external:
print(f" Example external link: {result.links.external[0].href}")
# 3. Media extracted (images, videos, etc.)
print(f"Found {len(result.media.images)} images.")
# Example: print first image alt text if exists
if result.media.images:
print(f" Example image alt text: '{result.media.images[0].alt}'")
# 4. Cleaned HTML (scripts, styles etc. removed) - might still be complex
# print(f"\nCleaned HTML snippet:\n---\n{result.cleaned_html[:200]}...\n---")
# 5. Markdown representation (generated AFTER scraping)
print(f"\nMarkdown snippet:\n---\n{result.markdown.raw_markdown[:200]}...\n---")
else:
print(f"\nFailed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. We create `AsyncWebCrawler` and `CrawlerRunConfig` as usual.
2. We **don't** set the `scraping_strategy` parameter in `CrawlerRunConfig`. Crawl4AI automatically picks `WebScrapingStrategy`.
3. When `crawler.arun` executes, after fetching the HTML, it internally calls `WebScrapingStrategy.scrap()`.
4. The `result` (a [CrawlResult](07_crawlresult.md) object) contains fields populated by the scraping strategy:
* `result.metadata`: Contains things like the page title found in `<title>` tags.
* `result.links`: Contains lists of internal and external links found (`<a>` tags).
* `result.media`: Contains lists of images (`<img>`), videos (`<video>`), etc.
* `result.cleaned_html`: The HTML after the strategy removed unwanted tags and attributes (this is then used to generate the Markdown).
* `result.markdown`: While not *directly* created by the scraping strategy, the cleaned HTML it produces is the input for generating the Markdown representation.
## Explicitly Choosing a Strategy (e.g., `LXMLWebScrapingStrategy`)
What if you want to try the potentially faster `LXMLWebScrapingStrategy`? You can specify it in the `CrawlerRunConfig`.
```python
# chapter4_example_2.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
# 1. Import the specific strategy you want to use
from crawl4ai import LXMLWebScrapingStrategy
async def main():
# 2. Create an instance of the desired scraping strategy
lxml_editor = LXMLWebScrapingStrategy()
print(f"Using scraper: {lxml_editor.__class__.__name__}")
async with AsyncWebCrawler() as crawler:
url_to_crawl = "https://httpbin.org/html"
# 3. Create a CrawlerRunConfig and pass the strategy instance
config = CrawlerRunConfig(
cache_mode=CacheMode.BYPASS,
scraping_strategy=lxml_editor # Tell the config which strategy to use
)
print(f"Crawling {url_to_crawl} with explicit LXML scraping strategy...")
result = await crawler.arun(url=url_to_crawl, config=config)
if result.success:
print("\nSuccess! Content fetched and scraped using LXML.")
print(f"Page Title: {result.metadata.get('title', 'N/A')}")
print(f"Found {len(result.links.external)} external links.")
# Output should be largely the same as the default strategy for simple pages
else:
print(f"\nFailed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Import:** We import `LXMLWebScrapingStrategy` alongside the other classes.
2. **Instantiate:** We create an instance: `lxml_editor = LXMLWebScrapingStrategy()`.
3. **Configure:** We create `CrawlerRunConfig` and pass our instance to the `scraping_strategy` parameter: `CrawlerRunConfig(..., scraping_strategy=lxml_editor)`.
4. **Run:** Now, when `crawler.arun` is called with this config, it will use `LXMLWebScrapingStrategy` instead of the default `WebScrapingStrategy` for the initial HTML processing step.
For simple pages, the results from both strategies will often be very similar. The choice typically comes down to performance considerations in more advanced scenarios.
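If you ever do want to compare them, a rough, hedged timing sketch like the one below works on any locally saved HTML file (it assumes both strategy classes are importable from the package as shown above; real numbers depend heavily on page size and hardware):

```python
# timing_sketch.py - rough, illustrative comparison of the two scrapers
import time
from crawl4ai import WebScrapingStrategy, LXMLWebScrapingStrategy

with open("saved_page.html", encoding="utf-8") as f:  # any HTML you saved earlier
    html = f.read()

for strategy in (WebScrapingStrategy(), LXMLWebScrapingStrategy()):
    start = time.perf_counter()
    strategy.scrap("https://example.com", html)  # run the same cleanup with each editor
    elapsed = time.perf_counter() - start
    print(f"{strategy.__class__.__name__}: {elapsed:.3f}s")
```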
## A Glimpse Under the Hood
Inside the `crawl4ai` library, the file `content_scraping_strategy.py` defines the blueprint and the implementations.
**The Blueprint (Abstract Base Class):**
```python
# Simplified from crawl4ai/content_scraping_strategy.py
from abc import ABC, abstractmethod
from .models import ScrapingResult # Defines the structure of the result
class ContentScrapingStrategy(ABC):
"""Abstract base class for content scraping strategies."""
@abstractmethod
def scrap(self, url: str, html: str, **kwargs) -> ScrapingResult:
"""
Synchronous method to scrape content.
Takes raw HTML, returns structured ScrapingResult.
"""
pass
@abstractmethod
async def ascrap(self, url: str, html: str, **kwargs) -> ScrapingResult:
"""
Asynchronous method to scrape content.
Takes raw HTML, returns structured ScrapingResult.
"""
pass
```
**The Implementations:**
```python
# Simplified from crawl4ai/content_scraping_strategy.py
import asyncio  # used by ascrap() below
from bs4 import BeautifulSoup # Library used by WebScrapingStrategy
# ... other imports like models ...
class WebScrapingStrategy(ContentScrapingStrategy):
def __init__(self, logger=None):
self.logger = logger
# ... potentially other setup ...
def scrap(self, url: str, html: str, **kwargs) -> ScrapingResult:
# 1. Parse HTML using BeautifulSoup
soup = BeautifulSoup(html, 'lxml') # Or another parser
# 2. Find the main content area (maybe using kwargs['css_selector'])
# 3. Remove unwanted tags (scripts, styles, nav, footer, ads...)
# 4. Extract metadata (title, description...)
# 5. Extract all links (<a> tags)
# 6. Extract all images (<img> tags) and other media
# 7. Get the remaining cleaned HTML text content
# ... complex cleaning and extraction logic using BeautifulSoup methods ...
# 8. Package results into a ScrapingResult object
cleaned_html_content = "<html><body>Cleaned content...</body></html>" # Placeholder
links_data = Links(...)
media_data = Media(...)
metadata_dict = {"title": "Page Title"}
return ScrapingResult(
cleaned_html=cleaned_html_content,
links=links_data,
media=media_data,
metadata=metadata_dict,
success=True
)
async def ascrap(self, url: str, html: str, **kwargs) -> ScrapingResult:
# Often delegates to the synchronous version for CPU-bound tasks
return await asyncio.to_thread(self.scrap, url, html, **kwargs)
```
```python
# Simplified from crawl4ai/content_scraping_strategy.py
from lxml import html as lhtml # Library used by LXMLWebScrapingStrategy
# ... other imports like models ...
class LXMLWebScrapingStrategy(WebScrapingStrategy): # Often inherits for shared logic
def __init__(self, logger=None):
super().__init__(logger)
# ... potentially LXML specific setup ...
def scrap(self, url: str, html: str, **kwargs) -> ScrapingResult:
# 1. Parse HTML using lxml
doc = lhtml.document_fromstring(html)
# 2. Find main content, remove unwanted tags, extract info
# ... complex cleaning and extraction logic using lxml's XPath or CSS selectors ...
# 3. Package results into a ScrapingResult object
cleaned_html_content = "<html><body>Cleaned LXML content...</body></html>" # Placeholder
links_data = Links(...)
media_data = Media(...)
metadata_dict = {"title": "Page Title LXML"}
return ScrapingResult(
cleaned_html=cleaned_html_content,
links=links_data,
media=media_data,
metadata=metadata_dict,
success=True
)
# ascrap might also delegate or have specific async optimizations
```
The key takeaway is that both strategies implement the `scrap` (and `ascrap`) method, taking raw HTML and returning a structured `ScrapingResult`. The `AsyncWebCrawler` can use either one thanks to this common interface.
## Conclusion
You've learned about `ContentScrapingStrategy`, Crawl4AI's "First Pass Editor" for raw HTML.
* It tackles the problem of messy HTML by cleaning it and extracting basic structure.
* It acts as a blueprint, with `WebScrapingStrategy` (default, using BeautifulSoup) and `LXMLWebScrapingStrategy` (using lxml) as concrete implementations.
* It's used automatically by `AsyncWebCrawler` after fetching content.
* You can specify which strategy to use via `CrawlerRunConfig`.
* Its output (cleaned HTML, links, media, metadata) is packaged into a `ScrapingResult` and contributes significantly to the final `CrawlResult`.
Now that we have this initially cleaned and structured content, we might want to further filter it. What if we only care about the parts of the page that are *relevant* to a specific topic?
**Next:** Let's explore how to filter content for relevance with [Chapter 5: Focusing on What Matters - RelevantContentFilter](05_relevantcontentfilter.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 5: Focusing on What Matters - RelevantContentFilter
In [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md), we learned how Crawl4AI takes the raw, messy HTML from a webpage and cleans it up using a `ContentScrapingStrategy`. This gives us a tidier version of the HTML (`cleaned_html`) and extracts basic elements like links and images.
But even after this initial cleanup, the page might still contain a lot of "noise" relative to what we *actually* care about. Imagine a news article page: the `ContentScrapingStrategy` removes scripts and styles, but what remains still includes, alongside the main article text, related article links, user comments, sidebars with ads, and maybe a lengthy footer.
If our goal is just to get the main article content (e.g., to summarize it or feed it to an AI), all that extra stuff is just noise. How can we filter the cleaned content even further to keep only the truly relevant parts?
## What Problem Does `RelevantContentFilter` Solve?
Think of the `cleaned_html` from the previous step as flour that's been roughly sifted: the biggest lumps are gone, but there might still be smaller clumps or bran mixed in. If you want super fine flour for a delicate cake, you need a finer sieve.
`RelevantContentFilter` acts as this **finer sieve** or a **Relevance Sieve**. It's a strategy applied *after* the initial cleaning by `ContentScrapingStrategy` but *before* the final processing (like generating the final Markdown output or using an AI for extraction). Its job is to go through the cleaned content and decide which parts are truly relevant to our goal, removing the rest.
This helps us:
1. **Reduce Noise:** Eliminate irrelevant sections like comments, footers, navigation bars, or tangential "related content" blocks.
2. **Focus AI:** If we're sending the content to a Large Language Model (LLM), feeding it only the most relevant parts saves processing time (and potentially money) and can lead to better results.
3. **Improve Accuracy:** By removing distracting noise, subsequent steps like data extraction are less likely to grab the wrong information.
## What is `RelevantContentFilter`?
`RelevantContentFilter` is an abstract concept (a blueprint) in Crawl4AI representing a **method for identifying and retaining only the relevant portions of cleaned HTML content**. It defines *that* we need a way to filter for relevance, but the specific technique used can vary.
This allows us to choose different filtering approaches depending on the task and the type of content.
## The Different Filters: Tools for Sieving
Crawl4AI provides several concrete implementations (the actual sieves) of `RelevantContentFilter`:
1. **`BM25ContentFilter` (The Keyword Sieve):**
* **Analogy:** Like a mini search engine operating *within* the webpage.
* **How it Works:** You give it (or it figures out) some keywords related to what you're looking for (e.g., from a user query like "product specifications" or derived from the page title). It then uses a search algorithm called BM25 to score different chunks of the cleaned HTML based on how relevant they are to those keywords. Only the chunks scoring above a certain threshold are kept.
* **Good For:** Finding specific sections about a known topic within a larger page (e.g., finding only the paragraphs discussing "climate change impact" on a long environmental report page).
2. **`PruningContentFilter` (The Structural Sieve):**
* **Analogy:** Like a gardener pruning a bush, removing weak or unnecessary branches based on their structure.
* **How it Works:** This filter doesn't care about keywords. Instead, it looks at the *structure* and *characteristics* of the HTML elements. It removes elements that often represent noise, such as those with very little text compared to the number of links (low text density), elements with common "noise" words in their CSS classes or IDs (like `sidebar`, `comments`, `footer`), or elements deemed structurally insignificant.
* **Good For:** Removing common boilerplate sections (like headers, footers, simple sidebars, navigation) based purely on layout and density clues, even if you don't have a specific topic query.
3. **`LLMContentFilter` (The AI Sieve):**
* **Analogy:** Asking a smart assistant to read the cleaned content and pick out only the parts relevant to your request.
* **How it Works:** This filter sends the cleaned HTML (often broken into manageable chunks) to a Large Language Model (like GPT). You provide an instruction (e.g., "Extract only the main article content, removing all comments and related links" or "Keep only the sections discussing financial results"). The AI uses its understanding of language and context to identify and return only the relevant parts, often already formatted nicely (like in Markdown).
* **Good For:** Handling complex relevance decisions that require understanding meaning and context, following nuanced natural language instructions. (Note: Requires configuring LLM access, like API keys, and can be slower and potentially costlier than other methods).
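The three filters are constructed with different arguments, which the examples below walk through one at a time; as a quick, hedged preview (the `LlmConfig` details needed by the AI filter are covered in Example 3):

```python
# Constructing the three filter types (illustrative; see the examples below)
from crawl4ai import BM25ContentFilter, PruningContentFilter, LLMContentFilter

keyword_filter = BM25ContentFilter(user_query="solar power technology")  # keyword-based
structure_filter = PruningContentFilter()                                # no query needed
# ai_filter = LLMContentFilter(instruction="Keep only the main article.",
#                              llmConfig=my_llm_config)  # needs LLM provider setup
```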
## How `RelevantContentFilter` is Used (Via Markdown Generation)
In Crawl4AI, the `RelevantContentFilter` is typically integrated into the **Markdown generation** step. The standard markdown generator (`DefaultMarkdownGenerator`) can accept a `RelevantContentFilter` instance.
When configured this way:
1. The `AsyncWebCrawler` fetches the page and uses the `ContentScrapingStrategy` to get `cleaned_html`.
2. It then calls the `DefaultMarkdownGenerator` to produce the Markdown output.
3. The generator first creates the standard, "raw" Markdown from the *entire* `cleaned_html`.
4. **If** a `RelevantContentFilter` was provided to the generator, it then uses this filter on the `cleaned_html` to select only the relevant HTML fragments.
5. It converts *these filtered fragments* into Markdown. This becomes the `fit_markdown`.
So, the `CrawlResult` will contain *both*:
* `result.markdown.raw_markdown`: Markdown based on the full `cleaned_html`.
* `result.markdown.fit_markdown`: Markdown based *only* on the parts deemed relevant by the filter.
Let's see how to configure this.
### Example 1: Using `BM25ContentFilter` to find specific content
Imagine we crawled a page about renewable energy, but we only want the parts specifically discussing **solar power**.
```python
# chapter5_example_1.py
import asyncio
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
DefaultMarkdownGenerator, # The standard markdown generator
BM25ContentFilter # The keyword-based filter
)
async def main():
# 1. Create the BM25 filter with our query
solar_filter = BM25ContentFilter(user_query="solar power technology")
print(f"Filter created for query: '{solar_filter.user_query}'")
# 2. Create a Markdown generator that USES this filter
markdown_generator_with_filter = DefaultMarkdownGenerator(
content_filter=solar_filter
)
print("Markdown generator configured with BM25 filter.")
# 3. Create CrawlerRunConfig using this specific markdown generator
run_config = CrawlerRunConfig(
markdown_generator=markdown_generator_with_filter
)
# 4. Run the crawl
async with AsyncWebCrawler() as crawler:
# Example URL (replace with a real page having relevant content)
url_to_crawl = "https://en.wikipedia.org/wiki/Renewable_energy"
print(f"\nCrawling {url_to_crawl}...")
result = await crawler.arun(url=url_to_crawl, config=run_config)
if result.success:
print("\nCrawl successful!")
print(f"Raw Markdown length: {len(result.markdown.raw_markdown)}")
print(f"Fit Markdown length: {len(result.markdown.fit_markdown)}")
# The fit_markdown should be shorter and focused on solar power
print("\n--- Start of Fit Markdown (Solar Power Focus) ---")
# Print first 500 chars of the filtered markdown
print(result.markdown.fit_markdown[:500] + "...")
print("--- End of Fit Markdown Snippet ---")
else:
print(f"\nCrawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Create Filter:** We make an instance of `BM25ContentFilter`, telling it we're interested in "solar power technology".
2. **Create Generator:** We make an instance of `DefaultMarkdownGenerator` and pass our `solar_filter` to its `content_filter` parameter.
3. **Configure Run:** We create `CrawlerRunConfig` and tell it to use our special `markdown_generator_with_filter` for this run.
4. **Crawl & Check:** We run the crawl as usual. In the `result`, `result.markdown.raw_markdown` will have the markdown for the whole page, while `result.markdown.fit_markdown` will *only* contain markdown derived from the HTML parts that the `BM25ContentFilter` scored highly for relevance to "solar power technology". You'll likely see the `fit_markdown` is significantly shorter.
### Example 2: Using `PruningContentFilter` to remove boilerplate
Now, let's try removing common noise like sidebars or footers based on structure, without needing a specific query.
```python
# chapter5_example_2.py
import asyncio
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
DefaultMarkdownGenerator,
PruningContentFilter # The structural filter
)
async def main():
# 1. Create the Pruning filter (no query needed)
pruning_filter = PruningContentFilter()
print("Filter created: PruningContentFilter (structural)")
# 2. Create a Markdown generator that uses this filter
markdown_generator_with_filter = DefaultMarkdownGenerator(
content_filter=pruning_filter
)
print("Markdown generator configured with Pruning filter.")
# 3. Create CrawlerRunConfig using this generator
run_config = CrawlerRunConfig(
markdown_generator=markdown_generator_with_filter
)
# 4. Run the crawl
async with AsyncWebCrawler() as crawler:
# Example URL (replace with a real page that has boilerplate)
url_to_crawl = "https://www.python.org/" # Python homepage likely has headers/footers
print(f"\nCrawling {url_to_crawl}...")
result = await crawler.arun(url=url_to_crawl, config=run_config)
if result.success:
print("\nCrawl successful!")
print(f"Raw Markdown length: {len(result.markdown.raw_markdown)}")
print(f"Fit Markdown length: {len(result.markdown.fit_markdown)}")
# fit_markdown should have less header/footer/sidebar content
print("\n--- Start of Fit Markdown (Pruned) ---")
print(result.markdown.fit_markdown[:500] + "...")
print("--- End of Fit Markdown Snippet ---")
else:
print(f"\nCrawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
The structure is the same as the BM25 example, but:
1. We instantiate `PruningContentFilter()`, which doesn't require a `user_query`.
2. We pass this filter to the `DefaultMarkdownGenerator`.
3. The resulting `result.markdown.fit_markdown` should contain Markdown primarily from the main content areas of the page, with structurally identified boilerplate removed.
### Example 3: Using `LLMContentFilter` (Conceptual)
Using `LLMContentFilter` follows the same pattern, but requires setting up LLM provider details.
```python
# chapter5_example_3_conceptual.py
import asyncio
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
DefaultMarkdownGenerator,
LLMContentFilter,
# Assume LlmConfig is set up correctly (see LLM-specific docs)
# from crawl4ai.async_configs import LlmConfig
)
# Assume llm_config is properly configured with API keys, provider, etc.
# Example: llm_config = LlmConfig(provider="openai", api_token="env:OPENAI_API_KEY")
# For this example, we'll pretend it's ready.
class MockLlmConfig: # Mock for demonstration
provider = "mock_provider"
api_token = "mock_token"
base_url = None
llm_config = MockLlmConfig()
async def main():
# 1. Create the LLM filter with an instruction
instruction = "Extract only the main news article content. Remove headers, footers, ads, comments, and related links."
llm_filter = LLMContentFilter(
instruction=instruction,
llmConfig=llm_config # Pass the LLM configuration
)
print(f"Filter created: LLMContentFilter")
print(f"Instruction: '{llm_filter.instruction}'")
# 2. Create a Markdown generator using this filter
markdown_generator_with_filter = DefaultMarkdownGenerator(
content_filter=llm_filter
)
print("Markdown generator configured with LLM filter.")
# 3. Create CrawlerRunConfig
run_config = CrawlerRunConfig(
markdown_generator=markdown_generator_with_filter
)
# 4. Run the crawl
async with AsyncWebCrawler() as crawler:
# Example URL (replace with a real news article)
url_to_crawl = "https://httpbin.org/html" # Using simple page for demo
print(f"\nCrawling {url_to_crawl}...")
# In a real scenario, this would call the LLM API
result = await crawler.arun(url=url_to_crawl, config=run_config)
if result.success:
print("\nCrawl successful!")
# The fit_markdown would contain the AI-filtered content
print("\n--- Start of Fit Markdown (AI Filtered - Conceptual) ---")
# Because we used a mock LLM/simple page, fit_markdown might be empty or simple.
# On a real page with a real LLM, it would ideally contain just the main article.
print(result.markdown.fit_markdown[:500] + "...")
print("--- End of Fit Markdown Snippet ---")
else:
print(f"\nCrawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. We create `LLMContentFilter`, providing our natural language `instruction` and the necessary `llmConfig` (which holds provider details and API keys - mocked here for simplicity).
2. We integrate it into `DefaultMarkdownGenerator` and `CrawlerRunConfig` as before.
3. When `arun` is called, the `LLMContentFilter` would (in a real scenario) interact with the configured LLM API, sending chunks of the `cleaned_html` and the instruction, then assembling the AI's response into the `fit_markdown`.
## Under the Hood: How Filtering Fits In
The `RelevantContentFilter` doesn't run on its own; it's invoked by another component, typically the `DefaultMarkdownGenerator`.
Here's the sequence:
```mermaid
sequenceDiagram
participant User
participant AWC as AsyncWebCrawler
participant Config as CrawlerRunConfig
participant Scraper as ContentScrapingStrategy
participant MDGen as DefaultMarkdownGenerator
participant Filter as RelevantContentFilter
participant Result as CrawlResult
User->>AWC: arun(url, config=my_config)
Note over AWC: Config includes Markdown Generator with a Filter
AWC->>Scraper: scrap(raw_html)
Scraper-->>AWC: cleaned_html, links, etc.
AWC->>MDGen: generate_markdown(cleaned_html, config=my_config)
Note over MDGen: Uses html2text for raw markdown
MDGen-->>MDGen: raw_markdown = html2text(cleaned_html)
Note over MDGen: Now, check for content_filter
alt Filter Provided in MDGen
MDGen->>Filter: filter_content(cleaned_html)
Filter-->>MDGen: filtered_html_fragments
Note over MDGen: Uses html2text on filtered fragments
MDGen-->>MDGen: fit_markdown = html2text(filtered_html_fragments)
else No Filter Provided
MDGen-->>MDGen: fit_markdown = "" (or None)
end
Note over MDGen: Generate citations if needed
MDGen-->>AWC: MarkdownGenerationResult (raw, fit, references)
AWC->>Result: Package everything
AWC-->>User: Return CrawlResult
```
**Code Glimpse:**
Inside `crawl4ai/markdown_generation_strategy.py`, the `DefaultMarkdownGenerator`'s `generate_markdown` method has logic like this (simplified):
```python
# Simplified from markdown_generation_strategy.py
from .models import MarkdownGenerationResult
from .html2text import CustomHTML2Text
from .content_filter_strategy import RelevantContentFilter # Import filter base class
class DefaultMarkdownGenerator(MarkdownGenerationStrategy):
# ... __init__ stores self.content_filter ...
def generate_markdown(
self,
cleaned_html: str,
# ... other params like base_url, options ...
content_filter: Optional[RelevantContentFilter] = None,
**kwargs,
) -> MarkdownGenerationResult:
h = CustomHTML2Text(...) # Setup html2text converter
# ... apply options ...
# 1. Generate raw markdown from the full cleaned_html
raw_markdown = h.handle(cleaned_html)
# ... post-process raw_markdown ...
# 2. Convert links to citations (if enabled)
markdown_with_citations, references_markdown = self.convert_links_to_citations(...)
# 3. Generate fit markdown IF a filter is available
fit_markdown = ""
filtered_html = ""
# Use the filter passed directly, or the one stored during initialization
active_filter = content_filter or self.content_filter
if active_filter:
try:
# Call the filter's main method
filtered_html_fragments = active_filter.filter_content(cleaned_html)
# Join fragments (assuming filter returns list of HTML strings)
filtered_html = "\n".join(filtered_html_fragments)
# Convert ONLY the filtered HTML to markdown
fit_markdown = h.handle(filtered_html)
except Exception as e:
fit_markdown = f"Error during filtering: {e}"
# Log error...
return MarkdownGenerationResult(
raw_markdown=raw_markdown,
markdown_with_citations=markdown_with_citations,
references_markdown=references_markdown,
fit_markdown=fit_markdown, # Contains the filtered result
fit_html=filtered_html, # The HTML fragments kept by the filter
)
```
And inside `crawl4ai/content_filter_strategy.py`, you find the blueprint and implementations:
```python
# Simplified from content_filter_strategy.py
from abc import ABC, abstractmethod
from typing import List
# ... other imports like BeautifulSoup, BM25Okapi ...
class RelevantContentFilter(ABC):
"""Abstract base class for content filtering strategies"""
def __init__(self, user_query: str = None, ...):
self.user_query = user_query
# ... common setup ...
@abstractmethod
def filter_content(self, html: str) -> List[str]:
"""
Takes cleaned HTML, returns a list of HTML fragments
deemed relevant by the specific strategy.
"""
pass
# ... common helper methods like extract_page_query, is_excluded ...
class BM25ContentFilter(RelevantContentFilter):
def __init__(self, user_query: str = None, bm25_threshold: float = 1.0, ...):
super().__init__(user_query)
self.bm25_threshold = bm25_threshold
# ... BM25 specific setup ...
def filter_content(self, html: str) -> List[str]:
# 1. Parse HTML (e.g., with BeautifulSoup)
# 2. Extract text chunks (candidates)
# 3. Determine query (user_query or extracted)
# 4. Tokenize query and chunks
# 5. Calculate BM25 scores for chunks vs query
# 6. Filter chunks based on score and threshold
# 7. Return the HTML string of the selected chunks
# ... implementation details ...
relevant_html_fragments = ["<p>Relevant paragraph 1...</p>", "<h2>Relevant Section</h2>..."] # Placeholder
return relevant_html_fragments
# ... Implementations for PruningContentFilter and LLMContentFilter ...
```
The key is that each filter implements the `filter_content` method, returning the list of HTML fragments it considers relevant. The `DefaultMarkdownGenerator` then uses these fragments to create the `fit_markdown`.
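As a hedged illustration of that interface, you could call a filter directly on a snippet of cleaned HTML (outside the crawler) and inspect which fragments survive; this assumes the constructor and `filter_content` signature shown in the simplified code above:

```python
# filter_content_sketch.py - calling a filter directly (illustrative)
from crawl4ai import BM25ContentFilter

cleaned_html = """
<article><h2>Solar power</h2>
  <p>Photovoltaic panels convert sunlight directly into electricity.</p></article>
<footer><p>Contact us | Privacy policy | Newsletter signup</p></footer>
"""

bm25 = BM25ContentFilter(user_query="solar power technology")
fragments = bm25.filter_content(cleaned_html)  # list of relevant HTML fragments

print(f"Kept {len(fragments)} fragment(s)")
for fragment in fragments:
    print(fragment[:80])
```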
## Conclusion
You've learned about `RelevantContentFilter`, Crawl4AI's "Relevance Sieve"!
* It addresses the problem that even cleaned HTML can contain noise relative to a specific goal.
* It acts as a strategy to filter cleaned HTML, keeping only the relevant parts.
* Different filter types exist: `BM25ContentFilter` (keywords), `PruningContentFilter` (structure), and `LLMContentFilter` (AI/semantic).
* It's typically used *within* the `DefaultMarkdownGenerator` to produce a focused `fit_markdown` output in the `CrawlResult`, alongside the standard `raw_markdown`.
* You configure it by passing the chosen filter instance to the `DefaultMarkdownGenerator` and then passing that generator to the `CrawlerRunConfig`.
By using `RelevantContentFilter`, you can significantly improve the signal-to-noise ratio of the content you get from webpages, making downstream tasks like summarization or analysis more effective.
But what if just getting relevant *text* isn't enough? What if you need specific, *structured* data like product names, prices, and ratings from an e-commerce page, or names and affiliations from a list of conference speakers?
**Next:** Let's explore how to extract structured data with [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 6: Getting Specific Data - ExtractionStrategy
In the previous chapter, [Chapter 5: Focusing on What Matters - RelevantContentFilter](05_relevantcontentfilter.md), we learned how to sift through the cleaned webpage content to keep only the parts relevant to our query or goal, producing a focused `fit_markdown`. This is great for tasks like summarization or getting the main gist of an article.
But sometimes, we need more than just relevant text. Imagine you're analyzing an e-commerce website listing products. You don't just want the *description*; you need the exact **product name**, the specific **price**, the **customer rating**, and maybe the **SKU number**, all neatly organized. How do we tell Crawl4AI to find these *specific* pieces of information and return them in a structured format, like a JSON object?
## What Problem Does `ExtractionStrategy` Solve?
Think of the content we've processed so far (like the cleaned HTML or the generated Markdown) as a detailed report delivered by a researcher. `RelevantContentFilter` helped trim the report down to the most relevant pages.
Now, we need to give specific instructions to an **Analyst** to go through that focused report and pull out precise data points. We don't just want the report; we want a filled-in spreadsheet with columns for "Product Name," "Price," and "Rating."
`ExtractionStrategy` is the set of instructions we give to this Analyst. It defines *how* to locate and extract specific, structured information (like fields in a database or keys in a JSON object) from the content.
## What is `ExtractionStrategy`?
`ExtractionStrategy` is a core concept (a blueprint) in Crawl4AI that represents the **method used to extract structured data** from the processed content (which could be HTML or Markdown). It specifies *that* we need a way to find specific fields, but the actual *technique* used to find them can vary.
This allows us to choose the best "Analyst" for the job, depending on the complexity of the website and the data we need.
## The Different Analysts: Ways to Extract Data
Crawl4AI offers several concrete implementations (the different Analysts) for extracting structured data:
1. **The Precise Locator (`JsonCssExtractionStrategy` & `JsonXPathExtractionStrategy`)**
* **Analogy:** An analyst who uses very precise map coordinates (CSS Selectors or XPath expressions) to find information on a page. They need to be told exactly where to look. "The price is always in the HTML element with the ID `#product-price`."
* **How it works:** You define a **schema** (a Python dictionary) that maps the names of the fields you want (e.g., "product_name", "price") to the specific CSS selector (`JsonCssExtractionStrategy`) or XPath expression (`JsonXPathExtractionStrategy`) that locates that information within the HTML structure.
* **Pros:** Very fast and reliable if the website structure is consistent and predictable. Doesn't require external AI services.
* **Cons:** Can break easily if the website changes its layout (selectors become invalid). Requires you to inspect the HTML and figure out the correct selectors.
* **Input:** Typically works directly on the raw or cleaned HTML.
2. **The Smart Interpreter (`LLMExtractionStrategy`)**
* **Analogy:** A highly intelligent analyst who can *read and understand* the content. You give them a list of fields you need (a schema) or even just natural language instructions ("Find the product name, its price, and a short description"). They read the content (usually Markdown) and use their understanding of language and context to figure out the values, even if the layout isn't perfectly consistent.
* **How it works:** You provide a desired output schema (e.g., a Pydantic model or a dictionary structure) or a natural language instruction. The strategy sends the content (often the generated Markdown, possibly split into chunks) along with your schema/instruction to a configured Large Language Model (LLM) like GPT or Llama. The LLM reads the text and generates the structured data (usually JSON) according to your request.
* **Pros:** Much more resilient to website layout changes. Can understand context and handle variations. Can extract data based on meaning, not just location.
* **Cons:** Requires setting up access to an LLM (API keys, potentially costs). Can be significantly slower than selector-based methods. The quality of extraction depends on the LLM's capabilities and the clarity of your instructions/schema.
* **Input:** Often works best on the cleaned Markdown representation of the content, but can sometimes use HTML.
## How to Use an `ExtractionStrategy`
You tell the `AsyncWebCrawler` which extraction strategy to use (if any) by setting the `extraction_strategy` parameter within the [CrawlerRunConfig](03_crawlerrunconfig.md) object you pass to `arun` or `arun_many`.
### Example 1: Extracting Data with `JsonCssExtractionStrategy`
Let's imagine we want to extract the page title (from the `<title>` tag) and the main heading (from the `<h1>` tag) of the simple `httpbin.org/html` page.
```python
# chapter6_example_1.py
import asyncio
import json
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
JsonCssExtractionStrategy # Import the CSS strategy
)
async def main():
# 1. Define the extraction schema (Field Name -> CSS Selector)
extraction_schema = {
"baseSelector": "body", # Operate within the body tag
"fields": [
{"name": "page_title", "selector": "title", "type": "text"},
{"name": "main_heading", "selector": "h1", "type": "text"}
]
}
print("Extraction Schema defined using CSS selectors.")
# 2. Create an instance of the strategy with the schema
css_extractor = JsonCssExtractionStrategy(schema=extraction_schema)
print(f"Using strategy: {css_extractor.__class__.__name__}")
# 3. Create CrawlerRunConfig and set the extraction_strategy
run_config = CrawlerRunConfig(
extraction_strategy=css_extractor
)
# 4. Run the crawl
async with AsyncWebCrawler() as crawler:
url_to_crawl = "https://httpbin.org/html"
print(f"\nCrawling {url_to_crawl} to extract structured data...")
result = await crawler.arun(url=url_to_crawl, config=run_config)
if result.success and result.extracted_content:
print("\nExtraction successful!")
# The extracted data is stored as a JSON string in result.extracted_content
# Parse the JSON string to work with the data as a Python object
extracted_data = json.loads(result.extracted_content)
print("Extracted Data:")
# Print the extracted data nicely formatted
print(json.dumps(extracted_data, indent=2))
elif result.success:
print("\nCrawl successful, but no structured data extracted.")
else:
print(f"\nCrawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Schema Definition:** We create a Python dictionary `extraction_schema`.
* `baseSelector: "body"` tells the strategy to look for items within the `<body>` tag of the HTML.
* `fields` is a list of dictionaries, each defining a field to extract:
* `name`: The key for this field in the output JSON (e.g., "page_title").
* `selector`: The CSS selector to find the element containing the data (e.g., "title" finds the `<title>` tag, "h1" finds the `<h1>` tag).
* `type`: How to get the data from the selected element (`"text"` means get the text content).
2. **Instantiate Strategy:** We create an instance of `JsonCssExtractionStrategy`, passing our `extraction_schema`. This strategy knows its input format should be HTML.
3. **Configure Run:** We create a `CrawlerRunConfig` and assign our `css_extractor` instance to the `extraction_strategy` parameter.
4. **Crawl:** We run `crawler.arun`. After fetching and basic scraping, the `AsyncWebCrawler` will see the `extraction_strategy` in the config and call our `css_extractor`.
5. **Result:** The `CrawlResult` object now contains a field called `extracted_content`. This field holds the structured data found by the strategy, formatted as a **JSON string**. We use `json.loads()` to convert this string back into a Python list/dictionary.
**Expected Output (Conceptual):**
```
Extraction Schema defined using CSS selectors.
Using strategy: JsonCssExtractionStrategy
Crawling https://httpbin.org/html to extract structured data...
Extraction successful!
Extracted Data:
[
{
"page_title": "Herman Melville - Moby-Dick",
"main_heading": "Moby Dick"
}
]
```
*(Note: The actual output is a list containing one dictionary because `baseSelector: "body"` matches one element, and we extract fields relative to that.)*
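For pages with repeated items, the same schema format scales: every element matched by `baseSelector` becomes one entry in the output list, and fields can also pull attributes (the `"attribute"` type appears in the simplified implementation later in this chapter). Here's a hedged sketch for a hypothetical product-listing page; the class names are made up for illustration:

```python
# A hypothetical schema for a page laid out as <div class="product"> blocks
product_schema = {
    "baseSelector": "div.product",  # one extracted item per product block
    "fields": [
        {"name": "name",  "selector": "h2",         "type": "text"},
        {"name": "price", "selector": "span.price", "type": "text"},
        {"name": "image", "selector": "img", "type": "attribute", "attribute": "src"},
    ],
}
# Passing JsonCssExtractionStrategy(schema=product_schema) in CrawlerRunConfig
# would yield one dictionary per product in result.extracted_content.
```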
### Example 2: Extracting Data with `LLMExtractionStrategy` (Conceptual)
Now, let's imagine we want the same information (title, heading) but using an AI. We'll provide a schema describing what we want. (Note: This requires setting up LLM access separately, e.g., API keys).
```python
# chapter6_example_2.py
import asyncio
import json
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
LLMExtractionStrategy, # Import the LLM strategy
LlmConfig # Import LLM configuration helper
)
# Assume llm_config is properly configured with provider, API key, etc.
# This is just a placeholder - replace with your actual LLM setup
# E.g., llm_config = LlmConfig(provider="openai", api_token="env:OPENAI_API_KEY")
class MockLlmConfig: provider="mock"; api_token="mock"; base_url=None
llm_config = MockLlmConfig()
async def main():
# 1. Define the desired output schema (what fields we want)
# This helps guide the LLM.
output_schema = {
"page_title": "string",
"main_heading": "string"
}
print("Extraction Schema defined for LLM.")
# 2. Create an instance of the LLM strategy
# We pass the schema and the LLM configuration.
# We also specify input_format='markdown' (common for LLMs).
llm_extractor = LLMExtractionStrategy(
schema=output_schema,
llmConfig=llm_config, # Pass the LLM provider details
input_format="markdown" # Tell it to read the Markdown content
)
print(f"Using strategy: {llm_extractor.__class__.__name__}")
print(f"LLM Provider (mocked): {llm_config.provider}")
# 3. Create CrawlerRunConfig with the strategy
run_config = CrawlerRunConfig(
extraction_strategy=llm_extractor
)
# 4. Run the crawl
async with AsyncWebCrawler() as crawler:
url_to_crawl = "https://httpbin.org/html"
print(f"\nCrawling {url_to_crawl} using LLM to extract...")
# This would make calls to the configured LLM API
result = await crawler.arun(url=url_to_crawl, config=run_config)
if result.success and result.extracted_content:
print("\nExtraction successful (using LLM)!")
# Extracted data is a JSON string
try:
extracted_data = json.loads(result.extracted_content)
print("Extracted Data:")
print(json.dumps(extracted_data, indent=2))
except json.JSONDecodeError:
print("Could not parse LLM output as JSON:")
print(result.extracted_content)
elif result.success:
print("\nCrawl successful, but no structured data extracted by LLM.")
# This might happen if the mock LLM doesn't return valid JSON
# or if the content was too small/irrelevant for extraction.
else:
print(f"\nCrawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Schema Definition:** We define a simple dictionary `output_schema` telling the LLM we want fields named "page_title" and "main_heading", both expected to be strings.
2. **Instantiate Strategy:** We create `LLMExtractionStrategy`, passing:
* `schema=output_schema`: Our desired output structure.
* `llmConfig=llm_config`: The configuration telling the strategy *which* LLM to use and how to authenticate (here, it's mocked).
* `input_format="markdown"`: Instructs the strategy to feed the generated Markdown content (from `result.markdown.raw_markdown`) to the LLM, which is often easier for LLMs to parse than raw HTML.
3. **Configure Run & Crawl:** Same as before, we set the `extraction_strategy` in `CrawlerRunConfig` and run the crawl.
4. **Result:** The `AsyncWebCrawler` calls the `llm_extractor`. The strategy sends the Markdown content and the schema instructions to the configured LLM. The LLM analyzes the text and (hopefully) returns a JSON object matching the schema. This JSON is stored as a string in `result.extracted_content`.
**Expected Output (Conceptual, with a real LLM):**
```
Extraction Schema defined for LLM.
Using strategy: LLMExtractionStrategy
LLM Provider (mocked): mock
Crawling https://httpbin.org/html using LLM to extract...
Extraction successful (using LLM)!
Extracted Data:
[
{
"page_title": "Herman Melville - Moby-Dick",
"main_heading": "Moby Dick"
}
]
```
*(Note: LLM output format might vary slightly, but it aims to match the requested schema based on the content it reads.)*
## How It Works Inside (Under the Hood)
When you provide an `extraction_strategy` in the `CrawlerRunConfig`, how does `AsyncWebCrawler` use it?
1. **Fetch & Scrape:** The crawler fetches the raw HTML ([AsyncCrawlerStrategy](01_asynccrawlerstrategy.md)) and performs initial cleaning/scraping ([ContentScrapingStrategy](04_contentscrapingstrategy.md)) to get `cleaned_html`, links, etc.
2. **Markdown Generation:** It usually generates a Markdown representation ([DefaultMarkdownGenerator](05_relevantcontentfilter.md#how-relevantcontentfilter-is-used-via-markdown-generation)).
3. **Check for Strategy:** The `AsyncWebCrawler` (specifically in its internal `aprocess_html` method) checks if `config.extraction_strategy` is set.
4. **Execute Strategy:** If a strategy exists:
* It determines the required input format (e.g., "html" for `JsonCssExtractionStrategy`, "markdown" for `LLMExtractionStrategy` based on its `input_format` attribute).
* It retrieves the corresponding content (e.g., `result.cleaned_html` or `result.markdown.raw_markdown`).
* If the content is long and the strategy supports chunking (like `LLMExtractionStrategy`), it might first split the content into smaller chunks.
* It calls the strategy's `run` method, passing the content chunk(s).
* The strategy performs its logic (applying selectors, calling LLM API).
* The strategy returns the extracted data (typically as a list of dictionaries).
5. **Store Result:** The `AsyncWebCrawler` converts the returned structured data into a JSON string and stores it in `CrawlResult.extracted_content`.
Here's a simplified view:
```mermaid
sequenceDiagram
participant User
participant AWC as AsyncWebCrawler
participant Config as CrawlerRunConfig
participant Processor as HTML Processing
participant Extractor as ExtractionStrategy
participant Result as CrawlResult
User->>AWC: arun(url, config=my_config)
Note over AWC: Config includes an Extraction Strategy
AWC->>Processor: Process HTML (scrape, generate markdown)
Processor-->>AWC: Processed Content (HTML, Markdown)
AWC->>Extractor: Run extraction on content (using Strategy's input format)
Note over Extractor: Applying logic (CSS, XPath, LLM...)
Extractor-->>AWC: Structured Data (List[Dict])
AWC->>AWC: Convert data to JSON String
AWC->>Result: Store JSON String in extracted_content
AWC-->>User: Return CrawlResult
```
### Code Glimpse (`extraction_strategy.py`)
Inside the `crawl4ai` library, the file `extraction_strategy.py` defines the blueprint and the implementations.
**The Blueprint (Abstract Base Class):**
```python
# Simplified from crawl4ai/extraction_strategy.py
from abc import ABC, abstractmethod
from typing import List, Dict, Any
class ExtractionStrategy(ABC):
"""Abstract base class for all extraction strategies."""
def __init__(self, input_format: str = "markdown", **kwargs):
self.input_format = input_format # e.g., 'html', 'markdown'
# ... other common init ...
@abstractmethod
def extract(self, url: str, content_chunk: str, *q, **kwargs) -> List[Dict[str, Any]]:
"""Extract structured data from a single chunk of content."""
pass
def run(self, url: str, sections: List[str], *q, **kwargs) -> List[Dict[str, Any]]:
"""Process content sections (potentially chunked) and call extract."""
# Default implementation might process sections in parallel or sequentially
all_extracted_data = []
for section in sections:
all_extracted_data.extend(self.extract(url, section, **kwargs))
return all_extracted_data
```
**Example Implementation (`JsonCssExtractionStrategy`):**
```python
# Simplified from crawl4ai/extraction_strategy.py
from bs4 import BeautifulSoup # Uses BeautifulSoup for CSS selectors
class JsonCssExtractionStrategy(ExtractionStrategy):
def __init__(self, schema: Dict[str, Any], **kwargs):
# Force input format to HTML for CSS selectors
super().__init__(input_format="html", **kwargs)
self.schema = schema # Store the user-defined schema
def extract(self, url: str, html_content: str, *q, **kwargs) -> List[Dict[str, Any]]:
# Parse the HTML content chunk
soup = BeautifulSoup(html_content, "html.parser")
extracted_items = []
# Find base elements defined in the schema
base_elements = soup.select(self.schema.get("baseSelector", "body"))
for element in base_elements:
item = {}
# Extract fields based on schema selectors and types
fields_to_extract = self.schema.get("fields", [])
for field_def in fields_to_extract:
try:
# Find the specific sub-element using CSS selector
target_element = element.select_one(field_def["selector"])
if target_element:
if field_def["type"] == "text":
item[field_def["name"]] = target_element.get_text(strip=True)
elif field_def["type"] == "attribute":
item[field_def["name"]] = target_element.get(field_def["attribute"])
# ... other types like 'html', 'list', 'nested' ...
except Exception as e:
# Handle errors, maybe log them if verbose
pass
if item:
extracted_items.append(item)
return extracted_items
# run() method likely uses the default implementation from base class
```
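Based on this simplified blueprint, you could also exercise the strategy on its own, feeding it a list of HTML sections; here's a hedged sketch (the output shape follows the `run`/`extract` contract shown above):

```python
# Exercising JsonCssExtractionStrategy outside the crawler (illustrative)
import json
from crawl4ai import JsonCssExtractionStrategy

schema = {
    "baseSelector": "body",
    "fields": [{"name": "heading", "selector": "h1", "type": "text"}],
}
strategy = JsonCssExtractionStrategy(schema=schema)

sections = ["<html><body><h1>Moby Dick</h1></body></html>"]
data = strategy.run("https://example.com", sections)
print(json.dumps(data, indent=2))  # e.g. [{"heading": "Moby Dick"}]
```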
**Example Implementation (`LLMExtractionStrategy`):**
```python
# Simplified from crawl4ai/extraction_strategy.py
import json  # used to parse the LLM's JSON response below
# Needs imports for LLM interaction (e.g., perform_completion_with_backoff)
from .utils import perform_completion_with_backoff, chunk_documents, escape_json_string
from .prompts import PROMPT_EXTRACT_SCHEMA_WITH_INSTRUCTION # Example prompt
class LLMExtractionStrategy(ExtractionStrategy):
def __init__(self, schema: Dict = None, instruction: str = None, llmConfig=None, input_format="markdown", **kwargs):
super().__init__(input_format=input_format, **kwargs)
self.schema = schema
self.instruction = instruction
self.llmConfig = llmConfig # Contains provider, API key, etc.
# ... other LLM specific setup ...
def extract(self, url: str, content_chunk: str, *q, **kwargs) -> List[Dict[str, Any]]:
# Prepare the prompt for the LLM
prompt = self._build_llm_prompt(url, content_chunk)
# Call the LLM API
response = perform_completion_with_backoff(
provider=self.llmConfig.provider,
prompt_with_variables=prompt,
api_token=self.llmConfig.api_token,
base_url=self.llmConfig.base_url,
json_response=True # Often expect JSON from LLM for extraction
# ... pass other necessary args ...
)
# Parse the LLM's response (which should ideally be JSON)
try:
extracted_data = json.loads(response.choices[0].message.content)
# Ensure it's a list
if isinstance(extracted_data, dict):
extracted_data = [extracted_data]
return extracted_data
except Exception as e:
# Handle LLM response parsing errors
print(f"Error parsing LLM response: {e}")
return [{"error": "Failed to parse LLM output", "raw_output": response.choices[0].message.content}]
def _build_llm_prompt(self, url: str, content_chunk: str) -> str:
# Logic to construct the prompt using self.schema or self.instruction
# and the content_chunk. Example:
prompt_template = PROMPT_EXTRACT_SCHEMA_WITH_INSTRUCTION # Choose appropriate prompt
variable_values = {
"URL": url,
"CONTENT": escape_json_string(content_chunk), # Send Markdown or HTML chunk
"SCHEMA": json.dumps(self.schema) if self.schema else "{}",
"REQUEST": self.instruction if self.instruction else "Extract relevant data based on the schema."
}
prompt = prompt_template
for var, val in variable_values.items():
prompt = prompt.replace("{" + var + "}", str(val))
return prompt
# run() method might override the base to handle chunking specifically for LLMs
def run(self, url: str, sections: List[str], *q, **kwargs) -> List[Dict[str, Any]]:
# Potentially chunk sections based on token limits before calling extract
# chunked_content = chunk_documents(sections, ...)
# extracted_data = []
# for chunk in chunked_content:
# extracted_data.extend(self.extract(url, chunk, **kwargs))
# return extracted_data
# Simplified for now:
return super().run(url, sections, *q, **kwargs)
```
## Conclusion
You've learned about `ExtractionStrategy`, Crawl4AI's way of giving instructions to an "Analyst" to pull out specific, structured data from web content.
* It solves the problem of needing precise data points (like product names, prices) in an organized format, not just blocks of text.
* You can choose your "Analyst":
* **Precise Locators (`JsonCssExtractionStrategy`, `JsonXPathExtractionStrategy`):** Use exact CSS/XPath selectors defined in a schema. Fast but brittle.
* **Smart Interpreter (`LLMExtractionStrategy`):** Uses an AI (LLM) guided by a schema or instructions. More flexible but slower and needs setup.
* You configure the desired strategy within the [CrawlerRunConfig](03_crawlerrunconfig.md).
* The extracted structured data is returned as a JSON string in the `CrawlResult.extracted_content` field.
Now that we understand how to fetch, clean, filter, and extract data, let's put it all together and look at the final package that Crawl4AI delivers after a crawl.
**Next:** Let's dive into the details of the output with [Chapter 7: Understanding the Results - CrawlResult](07_crawlresult.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 7: Understanding the Results - CrawlResult
In the previous chapter, [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md), we learned how to teach Crawl4AI to act like an analyst, extracting specific, structured data points from a webpage using an `ExtractionStrategy`. We've seen how Crawl4AI can fetch pages, clean them, filter them, and even extract precise information.
But after all that work, where does all the gathered information go? When you ask the `AsyncWebCrawler` to crawl a URL using `arun()`, what do you actually get back?
## What Problem Does `CrawlResult` Solve?
Imagine you sent a research assistant to the library (a website) with a set of instructions: "Find this book (URL), make a clean copy of the relevant chapter (clean HTML/Markdown), list all the cited references (links), take photos of the illustrations (media), find the author and publication date (metadata), and maybe extract specific quotes (structured data)."
When the assistant returns, they wouldn't just hand you a single piece of paper. They'd likely give you a folder containing everything you asked for: the clean copy, the list of references, the photos, the metadata notes, and the extracted quotes, all neatly organized. They might also include a note if they encountered any problems (errors).
`CrawlResult` is exactly this **final report folder** or **delivery package**. It's a single object that neatly contains *all* the information Crawl4AI gathered and processed for a specific URL during a crawl operation. Instead of getting lots of separate pieces of data back, you get one convenient container.
## What is `CrawlResult`?
`CrawlResult` is a Python object (specifically, a Pydantic model, which is like a super-powered dictionary) that acts as a data container. It holds the results of a single crawl task performed by `AsyncWebCrawler.arun()` or one of the results from `arun_many()`.
Think of it as a toolbox filled with different tools and information related to the crawled page.
**Key Information Stored in `CrawlResult`:**
* **`url` (string):** The original URL that was requested.
* **`success` (boolean):** Did the crawl complete without critical errors? `True` if successful, `False` otherwise. **Always check this first!**
* **`html` (string):** The raw, original HTML source code fetched from the page.
* **`cleaned_html` (string):** The HTML after initial cleaning by the [ContentScrapingStrategy](04_contentscrapingstrategy.md) (e.g., scripts, styles removed).
* **`markdown` (object):** An object containing different Markdown representations of the content.
* `markdown.raw_markdown`: Basic Markdown generated from `cleaned_html`.
* `markdown.fit_markdown`: Markdown generated *only* from content deemed relevant by a [RelevantContentFilter](05_relevantcontentfilter.md) (if one was used). Might be empty if no filter was applied.
* *(Other fields like `markdown_with_citations` might exist)*
* **`extracted_content` (string):** If you used an [ExtractionStrategy](06_extractionstrategy.md), this holds the extracted structured data, usually formatted as a JSON string. `None` if no extraction was performed or nothing was found.
* **`metadata` (dictionary):** Information extracted from the page's metadata tags, like the page title (`metadata['title']`), description, keywords, etc.
* **`links` (object):** Contains lists of links found on the page.
* `links.internal`: List of links pointing to the same website.
* `links.external`: List of links pointing to other websites.
* **`media` (object):** Contains lists of media items found.
* `media.images`: List of images (`<img>` tags).
* `media.videos`: List of videos (`<video>` tags).
* *(Other media types might be included)*
* **`screenshot` (string):** If you requested a screenshot (`screenshot=True` in `CrawlerRunConfig`), this holds the file path to the saved image. `None` otherwise.
* **`pdf` (bytes):** If you requested a PDF (`pdf=True` in `CrawlerRunConfig`), this holds the PDF data as bytes. `None` otherwise. (Note: older versions may have stored a file path here; recent versions typically store the raw bytes.)
* **`error_message` (string):** If `success` is `False`, this field usually contains details about what went wrong.
* **`status_code` (integer):** The HTTP status code received from the server (e.g., 200 for OK, 404 for Not Found).
* **`response_headers` (dictionary):** The HTTP response headers sent by the server.
* **`redirected_url` (string):** If the original URL redirected, this shows the final URL the crawler landed on.
## Accessing the `CrawlResult`
You get a `CrawlResult` object back every time you `await` a call to `crawler.arun()`:
```python
# chapter7_example_1.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlResult
async def main():
async with AsyncWebCrawler() as crawler:
url = "https://httpbin.org/html"
print(f"Crawling {url}...")
# The 'arun' method returns a CrawlResult object
result: CrawlResult = await crawler.arun(url=url) # Type hint optional
print("Crawl finished!")
# Now 'result' holds all the information
print(f"Result object type: {type(result)}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. We call `crawler.arun(url=url)`.
2. The `await` keyword pauses execution until the crawl is complete.
3. The value returned by `arun` is assigned to the `result` variable.
4. This `result` variable is our `CrawlResult` object.
If you use `crawler.arun_many()`, it returns a list where each item is a `CrawlResult` object for one of the requested URLs (or an async generator if `stream=True`).
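For instance, here is a minimal sketch of consuming those results (the URLs are placeholders; with `stream=False` the return value is a plain list, as covered in later chapters):

```python
# Minimal sketch: arun_many with stream=False returns a list of CrawlResult objects.
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    urls = ["https://httpbin.org/html", "https://httpbin.org/links/5/0"]  # placeholder URLs
    async with AsyncWebCrawler() as crawler:
        config = CrawlerRunConfig(stream=False)
        results = await crawler.arun_many(urls=urls, config=config)
        for result in results:  # each item is a CrawlResult
            status = "✅" if result.success else "❌"
            print(f"{status} {result.url}")

if __name__ == "__main__":
    asyncio.run(main())
```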
## Exploring the Attributes: Using the Toolbox
Once you have the `result` object, you can access its attributes using dot notation (e.g., `result.success`, `result.markdown`).
**1. Checking for Success (Most Important!)**
Before you try to use any data, always check if the crawl was successful:
```python
# chapter7_example_2.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlResult # Import CrawlResult for type hint
async def main():
async with AsyncWebCrawler() as crawler:
url = "https://httpbin.org/html" # A working URL
# url = "https://httpbin.org/status/404" # Try this URL to see failure
result: CrawlResult = await crawler.arun(url=url)
# --- ALWAYS CHECK 'success' FIRST! ---
if result.success:
print(f"✅ Successfully crawled: {result.url}")
# Now it's safe to access other attributes
print(f" Page Title: {result.metadata.get('title', 'N/A')}")
else:
print(f"❌ Failed to crawl: {result.url}")
print(f" Error: {result.error_message}")
print(f" Status Code: {result.status_code}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We use an `if result.success:` block.
* If `True`, we proceed to access other data like `result.metadata`.
* If `False`, we print the `result.error_message` and `result.status_code` to understand why it failed.
**2. Accessing Content (HTML, Markdown)**
```python
# chapter7_example_3.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlResult
async def main():
async with AsyncWebCrawler() as crawler:
url = "https://httpbin.org/html"
result: CrawlResult = await crawler.arun(url=url)
if result.success:
print("--- Content ---")
# Print the first 150 chars of raw HTML
print(f"Raw HTML snippet: {result.html[:150]}...")
# Access the raw markdown
if result.markdown: # Check if markdown object exists
print(f"Markdown snippet: {result.markdown.raw_markdown[:150]}...")
else:
print("Markdown not generated.")
else:
print(f"Crawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We access `result.html` for the original HTML.
* We access `result.markdown.raw_markdown` for the main Markdown content. Note the two dots: `result.markdown` gives the `MarkdownGenerationResult` object, and `.raw_markdown` accesses the specific string within it. We also check `if result.markdown:` first, just in case markdown generation failed for some reason.
**3. Getting Metadata, Links, and Media**
```python
# chapter7_example_4.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlResult
async def main():
async with AsyncWebCrawler() as crawler:
url = "https://httpbin.org/links/10/0" # A page with links
result: CrawlResult = await crawler.arun(url=url)
if result.success:
print("--- Metadata & Links ---")
print(f"Title: {result.metadata.get('title', 'N/A')}")
print(f"Found {len(result.links.internal)} internal links.")
print(f"Found {len(result.links.external)} external links.")
if result.links.internal:
print(f" First internal link text: '{result.links.internal[0].text}'")
# Similarly access result.media.images etc.
else:
print(f"Crawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* `result.metadata` is a dictionary; use `.get()` for safe access.
* `result.links` and `result.media` are objects containing lists (`internal`, `external`, `images`, etc.). We can check their lengths (`len()`) and access individual items by index (e.g., `[0]`).
**4. Checking for Extracted Data, Screenshots, PDFs**
```python
# chapter7_example_5.py
import asyncio
import json
from crawl4ai import (
AsyncWebCrawler, CrawlResult, CrawlerRunConfig,
JsonCssExtractionStrategy # Example extractor
)
async def main():
# Define a simple extraction strategy (from Chapter 6)
schema = {"baseSelector": "body", "fields": [{"name": "heading", "selector": "h1", "type": "text"}]}
extractor = JsonCssExtractionStrategy(schema=schema)
# Configure the run to extract and take a screenshot
config = CrawlerRunConfig(
extraction_strategy=extractor,
screenshot=True
)
async with AsyncWebCrawler() as crawler:
url = "https://httpbin.org/html"
result: CrawlResult = await crawler.arun(url=url, config=config)
if result.success:
print("--- Extracted Data & Media ---")
# Check if structured data was extracted
if result.extracted_content:
print("Extracted Data found:")
data = json.loads(result.extracted_content) # Parse the JSON string
print(json.dumps(data, indent=2))
else:
print("No structured data extracted.")
# Check if a screenshot was taken
if result.screenshot:
print(f"Screenshot saved to: {result.screenshot}")
else:
print("Screenshot not taken.")
# Check for PDF (would be bytes if requested and successful)
if result.pdf:
print(f"PDF data captured ({len(result.pdf)} bytes).")
else:
print("PDF not generated.")
else:
print(f"Crawl failed: {result.error_message}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We check if `result.extracted_content` is not `None` or empty before trying to parse it as JSON.
* We check if `result.screenshot` is not `None` to see if the file path exists.
* We check if `result.pdf` is not `None` to see if the PDF data (bytes) was captured.
## How is `CrawlResult` Created? (Under the Hood)
You don't interact with the `CrawlResult` constructor directly. The `AsyncWebCrawler` creates it for you at the very end of the `arun` process, typically inside its internal `aprocess_html` method (or just before returning if fetching from cache).
Here's a simplified sequence:
1. **Fetch:** `AsyncWebCrawler` calls the [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) to get the raw `html`, `status_code`, `response_headers`, etc.
2. **Scrape:** It passes the `html` to the [ContentScrapingStrategy](04_contentscrapingstrategy.md) to get `cleaned_html`, `links`, `media`, `metadata`.
3. **Markdown:** It generates Markdown using the configured generator, possibly involving a [RelevantContentFilter](05_relevantcontentfilter.md), resulting in a `MarkdownGenerationResult` object.
4. **Extract (Optional):** If an [ExtractionStrategy](06_extractionstrategy.md) is configured, it runs it on the appropriate content (HTML or Markdown) to get `extracted_content`.
5. **Screenshot/PDF (Optional):** If requested, the fetching strategy captures the `screenshot` path or `pdf` data.
6. **Package:** `AsyncWebCrawler` gathers all these pieces (`url`, `html`, `cleaned_html`, the markdown object, `links`, `media`, `metadata`, `extracted_content`, `screenshot`, `pdf`, `success` status, `error_message`, etc.).
7. **Instantiate:** It creates the `CrawlResult` object, passing all the gathered data into its constructor (a sketch of this step follows below).
8. **Return:** It returns this fully populated `CrawlResult` object to your code.
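As a purely hypothetical illustration of step 7 (you never write this yourself), constructing a `CrawlResult` amounts to filling in the fields listed earlier; the markdown object is attached separately via the property shown in the code glimpse below:

```python
# Hypothetical sketch of step 7 ("Instantiate"): done internally by the crawler.
# The literal values stand in for the data gathered in steps 1-6.
from crawl4ai import CrawlResult

crawl_result = CrawlResult(
    url="https://example.com",
    html="<html>...</html>",           # raw HTML from the fetch step
    cleaned_html="<body>...</body>",    # output of the scraping step
    metadata={"title": "Example"},      # page metadata
    extracted_content=None,             # no ExtractionStrategy was run
    screenshot=None,                    # no screenshot was requested
    success=True,
    status_code=200,
)
print(crawl_result.success, crawl_result.url)
```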
## Code Glimpse (`models.py`)
The `CrawlResult` is defined in the `crawl4ai/models.py` file. It uses Pydantic, a library that helps define data structures with type hints and validation. Here's a simplified view:
```python
# Simplified from crawl4ai/models.py
from pydantic import BaseModel, HttpUrl
from typing import List, Dict, Optional, Any
# Other related models (simplified)
class MarkdownGenerationResult(BaseModel):
raw_markdown: str
fit_markdown: Optional[str] = None
# ... other markdown fields ...
class Links(BaseModel):
internal: List[Dict] = []
external: List[Dict] = []
class Media(BaseModel):
images: List[Dict] = []
videos: List[Dict] = []
# The main CrawlResult model
class CrawlResult(BaseModel):
url: str
html: str
success: bool
cleaned_html: Optional[str] = None
media: Media = Media() # Use the Media model
links: Links = Links() # Use the Links model
screenshot: Optional[str] = None
pdf: Optional[bytes] = None
# Uses a private attribute and property for markdown for compatibility
_markdown: Optional[MarkdownGenerationResult] = None # Actual storage
extracted_content: Optional[str] = None # JSON string
metadata: Optional[Dict[str, Any]] = None
error_message: Optional[str] = None
status_code: Optional[int] = None
response_headers: Optional[Dict[str, str]] = None
redirected_url: Optional[str] = None
# ... other fields like session_id, ssl_certificate ...
# Custom property to access markdown data
@property
def markdown(self) -> Optional[MarkdownGenerationResult]:
return self._markdown
# Configuration for Pydantic
class Config:
arbitrary_types_allowed = True
# Custom init and model_dump might exist for backward compatibility handling
# ... (omitted for simplicity) ...
```
**Explanation:**
* It's defined as a `class CrawlResult(BaseModel):`.
* Each attribute (like `url`, `html`, `success`) is defined with a type hint (like `str`, `bool`, `Optional[str]`). `Optional[str]` means the field can be a string or `None`.
* Some attributes are themselves complex objects defined by other Pydantic models (like `media: Media`, `links: Links`).
* The `markdown` field uses a common pattern (property wrapping a private attribute) to provide the `MarkdownGenerationResult` object while maintaining some backward compatibility. You access it simply as `result.markdown`.
## Conclusion
You've now met the `CrawlResult` object: the final, comprehensive report delivered by Crawl4AI after processing a URL.
* It acts as a **container** holding all gathered information (HTML, Markdown, metadata, links, media, extracted data, errors, etc.).
* It's the **return value** of `AsyncWebCrawler.arun()` and `arun_many()`.
* The most crucial attribute is **`success` (boolean)**, which you should always check first.
* You can easily **access** all the different pieces of information using dot notation (e.g., `result.metadata['title']`, `result.markdown.raw_markdown`, `result.links.external`).
Understanding the `CrawlResult` is key to effectively using the information Crawl4AI provides.
So far, we've focused on crawling single pages or lists of specific URLs. But what if you want to start at one page and automatically discover and crawl linked pages, exploring a website more deeply?
**Next:** Let's explore how to perform multi-page crawls with [Chapter 8: Exploring Websites - DeepCrawlStrategy](08_deepcrawlstrategy.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,378 @@
# Chapter 8: Exploring Websites - DeepCrawlStrategy
In [Chapter 7: Understanding the Results - CrawlResult](07_crawlresult.md), we saw the final report (`CrawlResult`) that Crawl4AI gives us after processing a single URL. This report contains cleaned content, links, metadata, and maybe even extracted data.
But what if you want to explore a website *beyond* just the first page? Imagine you land on a blog's homepage. You don't just want the homepage content; you want to automatically discover and crawl all the individual blog posts linked from it. How can you tell Crawl4AI to act like an explorer, following links and venturing deeper into the website?
## What Problem Does `DeepCrawlStrategy` Solve?
Think of the `AsyncWebCrawler.arun()` method we've used so far as visiting just the entrance hall of a vast library. You get information about that specific hall, but you don't automatically explore the adjoining rooms or different floors.
What if you want to systematically explore the library? You need a plan:
* Do you explore room by room on the current floor before going upstairs? (Level by level)
* Do you pick one wing and explore all its rooms down to the very end before exploring another wing? (Go deep first)
* Do you have a map highlighting potentially interesting sections and prioritize visiting those first? (Prioritize promising paths)
`DeepCrawlStrategy` provides this **exploration plan**. It defines the logic for how Crawl4AI should discover and crawl new URLs starting from the initial one(s) by following the links it finds on each page. It turns the crawler from a single-page visitor into a website explorer.
## What is `DeepCrawlStrategy`?
`DeepCrawlStrategy` is a concept (a blueprint) in Crawl4AI that represents the **method or logic used to navigate and crawl multiple pages by following links**. It tells the crawler *which links* to follow and in *what order* to visit them.
It essentially takes over the process when you call `arun()` if a deep crawl is requested, managing a queue or list of URLs to visit and coordinating the crawling of those URLs, potentially up to a certain depth or number of pages.
## Different Exploration Plans: The Strategies
Crawl4AI provides several concrete exploration plans (implementations) for `DeepCrawlStrategy`:
1. **`BFSDeepCrawlStrategy` (Level-by-Level Explorer):**
* **Analogy:** Like ripples spreading in a pond.
* **How it works:** It first crawls the starting URL (Level 0). Then, it crawls all the valid links found on that page (Level 1). Then, it crawls all the valid links found on *those* pages (Level 2), and so on. It explores the website layer by layer.
* **Good for:** Finding the shortest path to all reachable pages, getting a broad overview quickly near the start page.
2. **`DFSDeepCrawlStrategy` (Deep Path Explorer):**
* **Analogy:** Like exploring one specific corridor in a maze all the way to the end before backtracking and trying another corridor.
* **How it works:** It starts at the initial URL, follows one link, then follows a link from *that* page, and continues going deeper down one path as far as possible (or until a specified depth limit). Only when it hits a dead end or the limit does it backtrack and try another path.
* **Good for:** Exploring specific branches of a website thoroughly, potentially reaching deeper pages faster than BFS (if the target is down a specific path).
3. **`BestFirstCrawlingStrategy` (Priority Explorer):**
* **Analogy:** Like using a treasure map where some paths are marked as more promising than others.
* **How it works:** This strategy uses a **scoring system**. It looks at all the discovered (but not yet visited) links and assigns a score to each one based on how "promising" it seems (e.g., does the URL contain relevant keywords? Is it from a trusted domain?). It then crawls the link with the *best* score first, regardless of its depth.
* **Good for:** Focusing the crawl on the most relevant or important pages first, especially useful when you can't crawl the entire site and need to prioritize.
**Guiding the Explorer: Filters and Scorers**
Deep crawl strategies often work together with:
* **Filters:** Rules that decide *if* a discovered link should even be considered for crawling. Examples:
* `DomainFilter`: Only follow links within the starting website's domain.
* `URLPatternFilter`: Only follow links matching a specific pattern (e.g., `/blog/posts/...`).
* `ContentTypeFilter`: Avoid following links to non-HTML content like PDFs or images.
* **Scorers:** (Used mainly by `BestFirstCrawlingStrategy`) Rules that assign a score to a potential link to help prioritize it. Examples:
* `KeywordRelevanceScorer`: Scores links higher if the URL contains certain keywords.
* `PathDepthScorer`: Might score links differently based on how deep they are.
These act like instructions for the explorer: "Only explore rooms on this floor (filter)," "Ignore corridors marked 'Staff Only' (filter)," or "Check rooms marked with a star first (scorer)."
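As a rough sketch (not a verified recipe), combining a scorer with a filter for a prioritized crawl might look like the snippet below. The constructor parameters mirror the BFS example in the next section and the simplified code glimpse later in this chapter; the `keywords` argument for `KeywordRelevanceScorer` and the top-level import locations are assumptions for illustration.

```python
# Rough sketch: a prioritized deep crawl guided by a filter and a scorer.
# KeywordRelevanceScorer's parameters and import path are assumed for illustration.
from crawl4ai import (
    AsyncWebCrawler,
    CrawlerRunConfig,
    BestFirstCrawlingStrategy,
    KeywordRelevanceScorer,
    DomainFilter,
)

explorer = BestFirstCrawlingStrategy(
    max_depth=2,                        # crawl the start page plus two levels of links
    filter_chain=[DomainFilter()],      # only follow links on the starting domain
    url_scorer=KeywordRelevanceScorer(keywords=["blog", "post"]),  # visit promising URLs first
)
run_config = CrawlerRunConfig(deep_crawl_strategy=explorer, stream=True)
```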
## How to Use a `DeepCrawlStrategy`
You enable deep crawling by adding a `DeepCrawlStrategy` instance to your `CrawlerRunConfig`. Let's try exploring a website layer by layer using `BFSDeepCrawlStrategy`, going only one level deep from the start page.
```python
# chapter8_example_1.py
import asyncio
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
BFSDeepCrawlStrategy, # 1. Import the desired strategy
DomainFilter # Import a filter to stay on the same site
)
async def main():
# 2. Create an instance of the strategy
# - max_depth=1: Crawl start URL (depth 0) + links found (depth 1)
# - filter_chain: Use DomainFilter to only follow links on the same website
bfs_explorer = BFSDeepCrawlStrategy(
max_depth=1,
filter_chain=[DomainFilter()] # Stay within the initial domain
)
print(f"Strategy: BFS, Max Depth: {bfs_explorer.max_depth}")
# 3. Create CrawlerRunConfig and set the deep_crawl_strategy
# Also set stream=True to get results as they come in.
run_config = CrawlerRunConfig(
deep_crawl_strategy=bfs_explorer,
stream=True # Get results one by one using async for
)
# 4. Run the crawl - arun now handles the deep crawl!
async with AsyncWebCrawler() as crawler:
start_url = "https://httpbin.org/links/10/0" # A page with 10 internal links
print(f"\nStarting deep crawl from: {start_url}...")
crawl_results_generator = await crawler.arun(url=start_url, config=run_config)
crawled_count = 0
# Iterate over the results as they are yielded
async for result in crawl_results_generator:
crawled_count += 1
status = "✅" if result.success else "❌"
depth = result.metadata.get("depth", "N/A")
parent = result.metadata.get("parent_url", "Start")
url_short = result.url.split('/')[-1] # Show last part of URL
print(f" {status} Crawled: {url_short:<6} (Depth: {depth})")
print(f"\nFinished deep crawl. Total pages processed: {crawled_count}")
# Expecting 1 (start URL) + 10 (links) = 11 results
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Import:** We import `AsyncWebCrawler`, `CrawlerRunConfig`, `BFSDeepCrawlStrategy`, and `DomainFilter`.
2. **Instantiate Strategy:** We create `BFSDeepCrawlStrategy`.
* `max_depth=1`: We tell it to crawl the starting URL (depth 0) and any valid links it finds on that page (depth 1), but not to go any further.
* `filter_chain=[DomainFilter()]`: We provide a list containing `DomainFilter`. This tells the strategy to only consider following links that point to the same domain as the `start_url`. Links to external sites will be ignored.
3. **Configure Run:** We create a `CrawlerRunConfig` and pass our `bfs_explorer` instance to the `deep_crawl_strategy` parameter. We also set `stream=True` so we can process results as soon as they are ready, rather than waiting for the entire crawl to finish.
4. **Crawl:** We call `await crawler.arun(url=start_url, config=run_config)`. Because the config contains a `deep_crawl_strategy`, `arun` doesn't just crawl the single `start_url`. Instead, it activates the deep crawl logic defined by `BFSDeepCrawlStrategy`.
5. **Process Results:** Since we used `stream=True`, the return value is an asynchronous generator. We use `async for result in crawl_results_generator:` to loop through the `CrawlResult` objects as they are produced by the deep crawl. For each result, we print its status and depth.
You'll see the output showing the crawl starting, then processing the initial page (`links/10/0` at depth 0), followed by the 10 linked pages (e.g., `9`, `8`, ... `0` at depth 1).
## How It Works (Under the Hood)
How does simply putting a strategy in the config change `arun`'s behavior? It involves a bit of Python magic called a **decorator**.
1. **Decorator:** When you create an `AsyncWebCrawler`, its `arun` method is automatically wrapped by a `DeepCrawlDecorator`.
2. **Check Config:** When you call `await crawler.arun(url=..., config=...)`, this decorator checks if `config.deep_crawl_strategy` is set.
3. **Delegate or Run Original:**
* If a strategy **is set**, the decorator *doesn't* run the original single-page crawl logic. Instead, it calls the `arun` method of your chosen `DeepCrawlStrategy` instance (e.g., `bfs_explorer.arun(...)`), passing it the `crawler` itself, the `start_url`, and the `config`.
* If no strategy is set, the decorator simply calls the original `arun` logic to crawl the single page.
4. **Strategy Takes Over:** The `DeepCrawlStrategy`'s `arun` method now manages the crawl.
* It maintains a list or queue of URLs to visit (e.g., `current_level` in BFS, a stack in DFS, a priority queue in BestFirst).
* It repeatedly takes batches of URLs from its list/queue.
* For each batch, it calls `crawler.arun_many(urls=batch_urls, config=batch_config)` (with deep crawling disabled in `batch_config` to avoid infinite loops!).
* As results come back from `arun_many`, the strategy processes them:
* It yields the `CrawlResult` if running in stream mode.
* It extracts links using its `link_discovery` method.
* `link_discovery` uses `can_process_url` (which applies filters) to validate links.
* Valid new links are added to the list/queue for future crawling.
* This continues until the list/queue is empty, the max depth/pages limit is reached, or it's cancelled.
```mermaid
sequenceDiagram
participant User
participant Decorator as DeepCrawlDecorator
participant Strategy as DeepCrawlStrategy (e.g., BFS)
participant AWC as AsyncWebCrawler
User->>Decorator: arun(start_url, config_with_strategy)
Decorator->>Strategy: arun(start_url, crawler=AWC, config)
Note over Strategy: Initialize queue/level with start_url
loop Until Queue Empty or Limits Reached
Strategy->>Strategy: Get next batch of URLs from queue
Note over Strategy: Create batch_config (deep_crawl=None)
Strategy->>AWC: arun_many(batch_urls, config=batch_config)
AWC-->>Strategy: batch_results (List/Stream of CrawlResult)
loop For each result in batch_results
Strategy->>Strategy: Process result (yield if streaming)
Strategy->>Strategy: Discover links (apply filters)
Strategy->>Strategy: Add valid new links to queue
end
end
Strategy-->>Decorator: Final result (List or Generator)
Decorator-->>User: Final result
```
## Code Glimpse
Let's peek at the simplified structure:
**1. The Decorator (`deep_crawling/base_strategy.py`)**
```python
# Simplified from deep_crawling/base_strategy.py
from contextvars import ContextVar
from functools import wraps
# ... other imports
class DeepCrawlDecorator:
deep_crawl_active = ContextVar("deep_crawl_active", default=False)
def __init__(self, crawler: AsyncWebCrawler):
self.crawler = crawler
def __call__(self, original_arun):
@wraps(original_arun)
async def wrapped_arun(url: str, config: CrawlerRunConfig = None, **kwargs):
# Is a strategy present AND not already inside a deep crawl?
if config and config.deep_crawl_strategy and not self.deep_crawl_active.get():
# Mark that we are starting a deep crawl
token = self.deep_crawl_active.set(True)
try:
# Call the STRATEGY's arun method instead of the original
strategy_result = await config.deep_crawl_strategy.arun(
crawler=self.crawler,
start_url=url,
config=config
)
# Handle streaming if needed
if config.stream:
# Return an async generator that resets the context var on exit
async def result_wrapper():
try:
async for result in strategy_result: yield result
finally: self.deep_crawl_active.reset(token)
return result_wrapper()
else:
return strategy_result # Return the list of results directly
finally:
# Reset the context var if not streaming (or handled in wrapper)
if not config.stream: self.deep_crawl_active.reset(token)
else:
# No strategy or already deep crawling, call the original single-page arun
return await original_arun(url, config=config, **kwargs)
return wrapped_arun
```
**2. The Strategy Blueprint (`deep_crawling/base_strategy.py`)**
```python
# Simplified from deep_crawling/base_strategy.py
from abc import ABC, abstractmethod
# ... other imports
class DeepCrawlStrategy(ABC):
@abstractmethod
async def _arun_batch(self, start_url, crawler, config) -> List[CrawlResult]:
# Implementation for non-streaming mode
pass
@abstractmethod
async def _arun_stream(self, start_url, crawler, config) -> AsyncGenerator[CrawlResult, None]:
# Implementation for streaming mode
pass
async def arun(self, start_url, crawler, config) -> RunManyReturn:
# Decides whether to call _arun_batch or _arun_stream
if config.stream:
return self._arun_stream(start_url, crawler, config)
else:
return await self._arun_batch(start_url, crawler, config)
@abstractmethod
async def can_process_url(self, url: str, depth: int) -> bool:
# Applies filters to decide if a URL is valid to crawl
pass
@abstractmethod
async def link_discovery(self, result, source_url, current_depth, visited, next_level, depths):
# Extracts, validates, and prepares links for the next step
pass
@abstractmethod
async def shutdown(self):
# Cleanup logic
pass
```
**3. Example: BFS Implementation (`deep_crawling/bfs_strategy.py`)**
```python
# Simplified from deep_crawling/bfs_strategy.py
# ... imports ...
from .base_strategy import DeepCrawlStrategy # Import the base class
class BFSDeepCrawlStrategy(DeepCrawlStrategy):
def __init__(self, max_depth, filter_chain=None, url_scorer=None, ...):
self.max_depth = max_depth
self.filter_chain = filter_chain or FilterChain() # Use default if none
self.url_scorer = url_scorer
# ... other init ...
self._pages_crawled = 0
async def can_process_url(self, url: str, depth: int) -> bool:
# ... (validation logic using self.filter_chain) ...
is_valid = True # Placeholder
if depth != 0 and not await self.filter_chain.apply(url):
is_valid = False
return is_valid
async def link_discovery(self, result, source_url, current_depth, visited, next_level, depths):
# ... (logic to get links from result.links) ...
links = result.links.get("internal", []) # Example: only internal
for link_data in links:
url = link_data.get("href")
if url and url not in visited:
if await self.can_process_url(url, current_depth + 1):
# Check scoring, max_pages limit etc.
depths[url] = current_depth + 1
next_level.append((url, source_url)) # Add (url, parent) tuple
async def _arun_batch(self, start_url, crawler, config) -> List[CrawlResult]:
visited = set()
current_level = [(start_url, None)] # List of (url, parent_url)
depths = {start_url: 0}
all_results = []
while current_level: # While there are pages in the current level
next_level = []
urls_in_level = [url for url, parent in current_level]
visited.update(urls_in_level)
# Create config for this batch (no deep crawl recursion)
batch_config = config.clone(deep_crawl_strategy=None, stream=False)
# Crawl all URLs in the current level
batch_results = await crawler.arun_many(urls=urls_in_level, config=batch_config)
for result in batch_results:
# Add metadata (depth, parent)
depth = depths.get(result.url, 0)
result.metadata = result.metadata or {}
result.metadata["depth"] = depth
# ... find parent ...
all_results.append(result)
# Discover links for the *next* level
if result.success:
await self.link_discovery(result, result.url, depth, visited, next_level, depths)
current_level = next_level # Move to the next level
return all_results
async def _arun_stream(self, start_url, crawler, config) -> AsyncGenerator[CrawlResult, None]:
# Similar logic to _arun_batch, but uses 'yield result'
# and processes results as they come from arun_many stream
visited = set()
current_level = [(start_url, None)] # List of (url, parent_url)
depths = {start_url: 0}
while current_level:
next_level = []
urls_in_level = [url for url, parent in current_level]
visited.update(urls_in_level)
# Use stream=True for arun_many
batch_config = config.clone(deep_crawl_strategy=None, stream=True)
batch_results_gen = await crawler.arun_many(urls=urls_in_level, config=batch_config)
async for result in batch_results_gen:
# Add metadata
depth = depths.get(result.url, 0)
result.metadata = result.metadata or {}
result.metadata["depth"] = depth
# ... find parent ...
yield result # Yield result immediately
# Discover links for the next level
if result.success:
await self.link_discovery(result, result.url, depth, visited, next_level, depths)
current_level = next_level
# ... shutdown method ...
```
## Conclusion
You've learned about `DeepCrawlStrategy`, the component that turns Crawl4AI into a website explorer!
* It solves the problem of crawling beyond a single starting page by following links.
* It defines the **exploration plan**:
* `BFSDeepCrawlStrategy`: Level by level.
* `DFSDeepCrawlStrategy`: Deep paths first.
* `BestFirstCrawlingStrategy`: Prioritized by score.
* **Filters** and **Scorers** help guide the exploration.
* You enable it by setting `deep_crawl_strategy` in the `CrawlerRunConfig`.
* A decorator mechanism intercepts `arun` calls to activate the strategy.
* The strategy manages the queue of URLs and uses `crawler.arun_many` to crawl them in batches.
Deep crawling allows you to gather information from multiple related pages automatically. But how does Crawl4AI avoid re-fetching the same page over and over again, especially during these deeper crawls? The answer lies in caching.
**Next:** Let's explore how Crawl4AI smartly caches results with [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,346 @@
# Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode
In the previous chapter, [Chapter 8: Exploring Websites - DeepCrawlStrategy](08_deepcrawlstrategy.md), we saw how Crawl4AI can explore websites by following links, potentially visiting many pages. During such explorations, or even when you run the same crawl multiple times, the crawler might try to fetch the exact same webpage again and again. This can be slow and might unnecessarily put a load on the website you're crawling. Wouldn't it be smarter to remember the result from the first time and just reuse it?
## What Problem Does Caching Solve?
Imagine you need to download a large instruction manual (a webpage) from the internet.
* **Without Caching:** Every single time you need the manual, you download the entire file again. This takes time and uses bandwidth every time.
* **With Caching:** The first time you download it, you save a copy on your computer (the "cache"). The next time you need it, you first check your local copy. If it's there, you use it instantly! You only download it again if you specifically want the absolute latest version or if your local copy is missing.
Caching in Crawl4AI works the same way. It's a mechanism to **store the results** of crawling a webpage locally (in a database file). When asked to crawl a URL again, Crawl4AI can check its cache first. If a valid result is already stored, it can return that saved result almost instantly, saving time and resources.
## Introducing `CacheMode` and `CacheContext`
Crawl4AI uses two key concepts to manage this caching behavior:
1. **`CacheMode` (The Cache Policy):**
* Think of this like setting the rules for how you interact with your saved instruction manuals.
* It's an **instruction** you give the crawler for a specific run, telling it *how* to use the cache.
* **Analogy:** Should you *always* use your saved copy if you have one? (`ENABLED`) Should you *ignore* your saved copies and always download a fresh one? (`BYPASS`) Should you *never* save any copies? (`DISABLED`) Should you save new copies but never reuse old ones? (`WRITE_ONLY`)
* `CacheMode` lets you choose the caching behavior that best fits your needs for a particular task.
2. **`CacheContext` (The Decision Maker):**
* This is an internal helper that Crawl4AI uses *during* a crawl. You don't usually interact with it directly.
* It looks at the `CacheMode` you provided (the policy) and the type of URL being processed.
* **Analogy:** Imagine a librarian who checks the library's borrowing rules (`CacheMode`) and the type of item you're requesting (e.g., a reference book that can't be checked out, like `raw:` HTML which isn't cached). Based on these, the librarian (`CacheContext`) decides if you can borrow an existing copy (read from cache) or if a new copy should be added to the library (write to cache).
* It helps the main `AsyncWebCrawler` make the right decision about reading from or writing to the cache for each specific URL based on the active policy (a tiny sketch of these decisions follows below).
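Although you normally never create a `CacheContext` yourself, a tiny sketch shows the kind of decisions it makes; the import path and constructor follow the code glimpse later in this chapter:

```python
# Sketch: asking a CacheContext what it would allow for a given URL and policy.
from crawl4ai.cache_context import CacheContext, CacheMode

ctx = CacheContext("https://example.com", CacheMode.ENABLED)
print(ctx.should_read())    # True: ENABLED allows reading from the cache
print(ctx.should_write())   # True: ENABLED allows writing to the cache

raw_ctx = CacheContext("raw:<html></html>", CacheMode.ENABLED)
print(raw_ctx.should_read())  # False: raw HTML input is never cacheable
```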
## Setting the Cache Policy: Using `CacheMode`
You control the caching behavior by setting the `cache_mode` parameter within the `CrawlerRunConfig` object that you pass to `crawler.arun()` or `crawler.arun_many()`.
Let's explore the most common `CacheMode` options:
**1. `CacheMode.ENABLED` (The Default Behavior - If not specified)**
* **Policy:** "Use the cache if a valid result exists. If not, fetch the page, save the result to the cache, and then return it."
* This is the standard, balanced approach. It saves time on repeated crawls but ensures you get the content eventually.
* *Note: In recent versions, the default if `cache_mode` is left completely unspecified might be `CacheMode.BYPASS`. Always check the documentation or explicitly set the mode for clarity.* For this tutorial, let's assume we explicitly set it.
```python
# chapter9_example_1.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
async def main():
url = "https://httpbin.org/html"
async with AsyncWebCrawler() as crawler:
# Explicitly set the mode to ENABLED
config_enabled = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
print(f"Running with CacheMode: {config_enabled.cache_mode.name}")
# First run: Fetches, caches, and returns result
print("First run (ENABLED)...")
result1 = await crawler.arun(url=url, config=config_enabled)
print(f"Got result 1? {'Yes' if result1.success else 'No'}")
# Second run: Finds result in cache and returns it instantly
print("Second run (ENABLED)...")
result2 = await crawler.arun(url=url, config=config_enabled)
print(f"Got result 2? {'Yes' if result2.success else 'No'}")
# This second run should be much faster!
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We create a `CrawlerRunConfig` with `cache_mode=CacheMode.ENABLED`.
* The first `arun` call fetches the page from the web and saves the result in the cache.
* The second `arun` call (for the same URL and config affecting cache key) finds the saved result in the cache and returns it immediately, skipping the web fetch.
**2. `CacheMode.BYPASS`**
* **Policy:** "Ignore any existing saved copy. Always fetch a fresh copy from the web. After fetching, save this new result to the cache (overwriting any old one)."
* Useful when you *always* need the absolute latest version of the page, but you still want to update the cache for potential future use with `CacheMode.ENABLED`.
```python
# chapter9_example_2.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
import time
async def main():
url = "https://httpbin.org/html"
async with AsyncWebCrawler() as crawler:
# Set the mode to BYPASS
config_bypass = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
print(f"Running with CacheMode: {config_bypass.cache_mode.name}")
# First run: Fetches, caches, and returns result
print("First run (BYPASS)...")
start_time = time.perf_counter()
result1 = await crawler.arun(url=url, config=config_bypass)
duration1 = time.perf_counter() - start_time
print(f"Got result 1? {'Yes' if result1.success else 'No'} (took {duration1:.2f}s)")
# Second run: Ignores cache, fetches again, updates cache, returns result
print("Second run (BYPASS)...")
start_time = time.perf_counter()
result2 = await crawler.arun(url=url, config=config_bypass)
duration2 = time.perf_counter() - start_time
print(f"Got result 2? {'Yes' if result2.success else 'No'} (took {duration2:.2f}s)")
# Both runs should take a similar amount of time (fetching time)
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We set `cache_mode=CacheMode.BYPASS`.
* Both the first and second `arun` calls will fetch the page directly from the web, ignoring any previously cached result. They will still write the newly fetched result to the cache. Notice both runs take roughly the same amount of time (network fetch time).
**3. `CacheMode.DISABLED`**
* **Policy:** "Completely ignore the cache. Never read from it, never write to it."
* Useful when you don't want Crawl4AI to interact with the cache files at all, perhaps for debugging or if you have storage constraints.
```python
# chapter9_example_3.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
import time
async def main():
url = "https://httpbin.org/html"
async with AsyncWebCrawler() as crawler:
# Set the mode to DISABLED
config_disabled = CrawlerRunConfig(cache_mode=CacheMode.DISABLED)
print(f"Running with CacheMode: {config_disabled.cache_mode.name}")
# First run: Fetches, returns result (does NOT cache)
print("First run (DISABLED)...")
start_time = time.perf_counter()
result1 = await crawler.arun(url=url, config=config_disabled)
duration1 = time.perf_counter() - start_time
print(f"Got result 1? {'Yes' if result1.success else 'No'} (took {duration1:.2f}s)")
# Second run: Fetches again, returns result (does NOT cache)
print("Second run (DISABLED)...")
start_time = time.perf_counter()
result2 = await crawler.arun(url=url, config=config_disabled)
duration2 = time.perf_counter() - start_time
print(f"Got result 2? {'Yes' if result2.success else 'No'} (took {duration2:.2f}s)")
# Both runs fetch fresh, and nothing is ever saved to the cache.
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We set `cache_mode=CacheMode.DISABLED`.
* Both `arun` calls fetch fresh content from the web. Crucially, neither run reads from nor writes to the cache database.
**Other Modes (`READ_ONLY`, `WRITE_ONLY`):**
* `CacheMode.READ_ONLY`: Only uses existing cached results. If a result isn't in the cache, it will fail or return an empty result rather than fetching it. Never saves anything new (see the sketch below).
* `CacheMode.WRITE_ONLY`: Never reads from the cache (always fetches fresh). It *only* writes the newly fetched result to the cache.
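For example, a run with `READ_ONLY` serves results purely from the cache. This small sketch assumes an earlier run (say, with `CacheMode.ENABLED`) already stored a result for the URL:

```python
# Sketch: READ_ONLY only serves cached results and never fetches or writes.
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    url = "https://httpbin.org/html"
    async with AsyncWebCrawler() as crawler:
        config_read_only = CrawlerRunConfig(cache_mode=CacheMode.READ_ONLY)
        result = await crawler.arun(url=url, config=config_read_only)
        if result.success:
            print("Served from the cache.")
        else:
            print(f"No cached copy available: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())
```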
## How Caching Works Internally
When you call `crawler.arun(url="...", config=...)`:
1. **Create Context:** The `AsyncWebCrawler` creates a `CacheContext` instance using the `url` and the `config.cache_mode`.
2. **Check Read:** It asks the `CacheContext`, "Should I read from the cache?" (`cache_context.should_read()`).
3. **Try Reading:** If `should_read()` is `True`, it asks the database manager ([`AsyncDatabaseManager`](async_database.py)) to look for a cached result for the `url`.
4. **Cache Hit?**
* If a valid cached result is found: The `AsyncWebCrawler` returns this cached `CrawlResult` immediately. Done!
* If no cached result is found (or if `should_read()` was `False`): Proceed to fetching.
5. **Fetch:** The `AsyncWebCrawler` calls the appropriate [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) to fetch the content from the web.
6. **Process:** It processes the fetched HTML (scraping, filtering, extracting) to create a new `CrawlResult`.
7. **Check Write:** It asks the `CacheContext`, "Should I write this result to the cache?" (`cache_context.should_write()`).
8. **Write Cache:** If `should_write()` is `True`, it tells the database manager to save the new `CrawlResult` into the cache database.
9. **Return:** The `AsyncWebCrawler` returns the newly created `CrawlResult`.
```mermaid
sequenceDiagram
participant User
participant AWC as AsyncWebCrawler
participant Ctx as CacheContext
participant DB as DatabaseManager
participant Fetcher as AsyncCrawlerStrategy
User->>AWC: arun(url, config)
AWC->>Ctx: Create CacheContext(url, config.cache_mode)
AWC->>Ctx: should_read()?
alt Cache Read Allowed
Ctx-->>AWC: Yes
AWC->>DB: aget_cached_url(url)
DB-->>AWC: Cached Result (or None)
alt Cache Hit & Valid
AWC-->>User: Return Cached CrawlResult
else Cache Miss or Invalid
AWC->>AWC: Proceed to Fetch
end
else Cache Read Not Allowed
Ctx-->>AWC: No
AWC->>AWC: Proceed to Fetch
end
Note over AWC: Fetching Required
AWC->>Fetcher: crawl(url, config)
Fetcher-->>AWC: Raw Response
AWC->>AWC: Process HTML -> New CrawlResult
AWC->>Ctx: should_write()?
alt Cache Write Allowed
Ctx-->>AWC: Yes
AWC->>DB: acache_url(New CrawlResult)
DB-->>AWC: OK
else Cache Write Not Allowed
Ctx-->>AWC: No
end
AWC-->>User: Return New CrawlResult
```
## Code Glimpse
Let's look at simplified code snippets.
**Inside `async_webcrawler.py` (where `arun` uses caching):**
```python
# Simplified from crawl4ai/async_webcrawler.py
from .cache_context import CacheContext, CacheMode
from .async_database import async_db_manager
from .models import CrawlResult
# ... other imports
class AsyncWebCrawler:
# ... (init, other methods) ...
async def arun(self, url: str, config: CrawlerRunConfig = None) -> CrawlResult:
# ... (ensure config exists, set defaults) ...
if config.cache_mode is None:
config.cache_mode = CacheMode.ENABLED # Example default
# 1. Create CacheContext
cache_context = CacheContext(url, config.cache_mode)
cached_result = None
# 2. Check if cache read is allowed
if cache_context.should_read():
# 3. Try reading from database
cached_result = await async_db_manager.aget_cached_url(url)
# 4. If cache hit and valid, return it
if cached_result and self._is_cache_valid(cached_result, config):
self.logger.info("Cache hit for: %s", url) # Example log
return cached_result # Return early
# 5. Fetch fresh content (if no cache hit or read disabled)
async_response = await self.crawler_strategy.crawl(url, config=config)
html = async_response.html # ... and other data ...
# 6. Process the HTML to get a new CrawlResult
crawl_result = await self.aprocess_html(
url=url, html=html, config=config, # ... other params ...
)
# 7. Check if cache write is allowed
if cache_context.should_write():
# 8. Write the new result to the database
await async_db_manager.acache_url(crawl_result)
# 9. Return the new result
return crawl_result
def _is_cache_valid(self, cached_result: CrawlResult, config: CrawlerRunConfig) -> bool:
# Internal logic to check if cached result meets current needs
# (e.g., was screenshot requested now but not cached?)
if config.screenshot and not cached_result.screenshot: return False
if config.pdf and not cached_result.pdf: return False
# ... other checks ...
return True
```
**Inside `cache_context.py` (defining the concepts):**
```python
# Simplified from crawl4ai/cache_context.py
from enum import Enum
class CacheMode(Enum):
"""Defines the caching behavior for web crawling operations."""
ENABLED = "enabled" # Read and Write
DISABLED = "disabled" # No Read, No Write
READ_ONLY = "read_only" # Read Only, No Write
WRITE_ONLY = "write_only" # Write Only, No Read
BYPASS = "bypass" # No Read, Write Only (similar to WRITE_ONLY but explicit intention)
class CacheContext:
"""Encapsulates cache-related decisions and URL handling."""
def __init__(self, url: str, cache_mode: CacheMode, always_bypass: bool = False):
self.url = url
self.cache_mode = cache_mode
self.always_bypass = always_bypass # Usually False
# Determine if URL type is cacheable (e.g., not 'raw:')
self.is_cacheable = url.startswith(("http://", "https://", "file://"))
# ... other URL type checks ...
def should_read(self) -> bool:
"""Determines if cache should be read based on context."""
if self.always_bypass or not self.is_cacheable:
return False
# Allow read if mode is ENABLED or READ_ONLY
return self.cache_mode in [CacheMode.ENABLED, CacheMode.READ_ONLY]
def should_write(self) -> bool:
"""Determines if cache should be written based on context."""
if self.always_bypass or not self.is_cacheable:
return False
# Allow write if mode is ENABLED, WRITE_ONLY, or BYPASS
return self.cache_mode in [CacheMode.ENABLED, CacheMode.WRITE_ONLY, CacheMode.BYPASS]
@property
def display_url(self) -> str:
"""Returns the URL in display format."""
return self.url if not self.url.startswith("raw:") else "Raw HTML"
# Helper for backward compatibility (may be removed later)
def _legacy_to_cache_mode(...) -> CacheMode:
# ... logic to convert old boolean flags ...
pass
```
## Conclusion
You've learned how Crawl4AI uses caching to avoid redundant work and speed up repeated crawls!
* **Caching** stores results locally to reuse them later.
* **`CacheMode`** is the policy you set in `CrawlerRunConfig` to control *how* the cache is used (`ENABLED`, `BYPASS`, `DISABLED`, etc.).
* **`CacheContext`** is an internal helper that makes decisions based on the `CacheMode` and URL type.
* Using the cache effectively (especially `CacheMode.ENABLED`) can significantly speed up your crawling tasks, particularly during development or when dealing with many URLs, including deep crawls.
We've seen how Crawl4AI can crawl single pages, lists of pages (`arun_many`), and even explore websites (`DeepCrawlStrategy`). But how does `arun_many` or a deep crawl manage running potentially hundreds or thousands of individual crawl tasks efficiently without overwhelming your system or the target website?
**Next:** Let's explore the component responsible for managing concurrent tasks: [Chapter 10: Orchestrating the Crawl - BaseDispatcher](10_basedispatcher.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,387 @@
# Chapter 10: Orchestrating the Crawl - BaseDispatcher
In [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md), we learned how Crawl4AI uses caching to cleverly avoid re-fetching the same webpage multiple times, which is especially helpful when crawling many URLs. We've also seen how methods like `arun_many()` ([Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md)) or strategies like [DeepCrawlStrategy](08_deepcrawlstrategy.md) can lead to potentially hundreds or thousands of individual URLs needing to be crawled.
This raises a question: if we have 1000 URLs to crawl, does Crawl4AI try to crawl all 1000 simultaneously? That would likely overwhelm your computer's resources (like memory and CPU) and could also flood the target website with too many requests, potentially getting you blocked! How does Crawl4AI manage running many crawls efficiently and responsibly?
## What Problem Does `BaseDispatcher` Solve?
Imagine you're managing a fleet of delivery drones (`AsyncWebCrawler` tasks) that need to pick up packages from many different addresses (URLs). If you launch all 1000 drones at the exact same moment:
* Your control station (your computer) might crash due to the processing load.
* The central warehouse (the target website) might get overwhelmed by simultaneous arrivals.
* Some drones might collide or interfere with each other.
You need a **Traffic Controller** or a **Dispatch Center** to manage the fleet. This controller decides:
1. How many drones can be active in the air at any one time.
2. When to launch the next drone, maybe based on available airspace (system resources) or just a simple count limit.
3. How to handle potential delays or issues (like rate limiting from a specific website).
In Crawl4AI, the `BaseDispatcher` acts as this **Traffic Controller** or **Task Scheduler** for concurrent crawling operations, primarily when using `arun_many()`. It manages *how* multiple crawl tasks are executed concurrently, ensuring the process is efficient without overwhelming your system or the target websites.
## What is `BaseDispatcher`?
`BaseDispatcher` is an abstract concept (a blueprint or job description) in Crawl4AI. It defines *that* we need a system for managing the execution of multiple, concurrent crawling tasks. It specifies the *interface* for how the main `AsyncWebCrawler` interacts with such a system, but the specific *logic* for managing concurrency can vary.
Think of it as the control panel for our drone fleet: the panel exists, but the specific rules programmed into it determine how drones are dispatched.
## The Different Controllers: Ways to Dispatch Tasks
Crawl4AI provides concrete implementations (the actual traffic control systems) based on the `BaseDispatcher` blueprint:
1. **`SemaphoreDispatcher` (The Simple Counter):**
* **Analogy:** A parking garage with a fixed number of spots (e.g., 10). A gate (`asyncio.Semaphore`) only lets a new car in if one of the 10 spots is free.
* **How it works:** You tell it the maximum number of crawls that can run *at the same time* (e.g., `semaphore_count=10`). It uses a simple counter (a semaphore) to ensure that no more than this number of crawls are active simultaneously. When one crawl finishes, it allows another one from the queue to start.
* **Good for:** Simple, direct control over concurrency when you know a specific limit works well for your system and the target sites.
2. **`MemoryAdaptiveDispatcher` (The Resource-Aware Controller - Default):**
* **Analogy:** A smart parking garage attendant who checks not just the number of cars, but also the *total space* they occupy (system memory). They might stop letting cars in if the garage is nearing its memory capacity, even if some numbered spots are technically free.
* **How it works:** This dispatcher monitors your system's available memory. It tries to run multiple crawls concurrently (up to a configurable maximum like `max_session_permit`), but it will pause launching new crawls if the system memory usage exceeds a certain threshold (e.g., `memory_threshold_percent=90.0`). It adapts the concurrency level based on available resources.
* **Good for:** Automatically adjusting concurrency to prevent out-of-memory errors, especially when crawl tasks vary significantly in resource usage. **This is the default dispatcher used by `arun_many` if you don't specify one.**
These dispatchers can also optionally work with a `RateLimiter` component, which adds politeness rules for specific websites (e.g., slowing down requests to a domain if it returns "429 Too Many Requests").
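As a rough sketch, you could configure the default dispatcher explicitly and pair it with a `RateLimiter`. The `memory_threshold_percent` and `max_session_permit` names come from the description above; the `RateLimiter` arguments shown here are assumptions for illustration.

```python
# Rough sketch: an explicitly configured MemoryAdaptiveDispatcher with politeness rules.
# RateLimiter argument names (base_delay, max_retries) are assumed for illustration.
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, RateLimiter

dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=90.0,   # pause launching new crawls above 90% memory usage
    max_session_permit=10,           # never run more than 10 crawls at once
    rate_limiter=RateLimiter(base_delay=(1.0, 2.0), max_retries=2),  # back off on 429s
)
# Later, pass it to arun_many:
#   results = await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher)
```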
## How `arun_many` Uses the Dispatcher
When you call `crawler.arun_many(urls=...)`, here's the basic flow involving the dispatcher:
1. **Get URLs:** `arun_many` receives the list of URLs you want to crawl.
2. **Select Dispatcher:** It checks if you provided a specific `dispatcher` instance. If not, it creates an instance of the default `MemoryAdaptiveDispatcher`.
3. **Delegate Execution:** It hands over the list of URLs and the `CrawlerRunConfig` to the chosen dispatcher's `run_urls` (or `run_urls_stream`) method.
4. **Manage Tasks:** The dispatcher takes charge:
* It iterates through the URLs.
* For each URL, it decides *when* to start the actual crawl based on its rules (semaphore count, memory usage, rate limits).
* When ready, it typically calls the single-page `crawler.arun(url, config)` method internally for that specific URL, wrapped within its concurrency control mechanism.
* It manages the running tasks (e.g., using `asyncio.create_task` and `asyncio.wait`).
5. **Collect Results:** As individual `arun` calls complete, the dispatcher collects their `CrawlResult` objects.
6. **Return:** Once all URLs are processed, the dispatcher returns the list of results (or yields them if streaming).
```mermaid
sequenceDiagram
participant User
participant AWC as AsyncWebCrawler
participant Dispatcher as BaseDispatcher (e.g., MemoryAdaptive)
participant TaskPool as Concurrency Manager
User->>AWC: arun_many(urls, config, dispatcher?)
AWC->>Dispatcher: run_urls(crawler=AWC, urls, config)
Dispatcher->>TaskPool: Initialize (e.g., set max concurrency)
loop For each URL in urls
Dispatcher->>TaskPool: Can I start a new task? (Checks limits)
alt Yes
TaskPool-->>Dispatcher: OK
Note over Dispatcher: Create task: call AWC.arun(url, config) internally
Dispatcher->>TaskPool: Add new task
else No
TaskPool-->>Dispatcher: Wait
Note over Dispatcher: Waits for a running task to finish
end
end
Note over Dispatcher: Manages running tasks, collects results
Dispatcher-->>AWC: List of CrawlResults
AWC-->>User: List of CrawlResults
```
## Using the Dispatcher (Often Implicitly!)
Most of the time, you don't need to think about the dispatcher explicitly. When you use `arun_many`, the default `MemoryAdaptiveDispatcher` handles things automatically.
```python
# chapter10_example_1.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
async def main():
urls_to_crawl = [
"https://httpbin.org/html",
"https://httpbin.org/links/5/0", # Page with 5 links
"https://httpbin.org/robots.txt",
"https://httpbin.org/status/200",
]
# We DON'T specify a dispatcher here.
# arun_many will use the default MemoryAdaptiveDispatcher.
async with AsyncWebCrawler() as crawler:
print(f"Crawling {len(urls_to_crawl)} URLs using the default dispatcher...")
config = CrawlerRunConfig(stream=False) # Get results as a list at the end
# The MemoryAdaptiveDispatcher manages concurrency behind the scenes.
results = await crawler.arun_many(urls=urls_to_crawl, config=config)
print(f"\nFinished! Got {len(results)} results.")
for result in results:
status = "✅" if result.success else "❌"
url_short = result.url.split('/')[-1]
print(f" {status} {url_short:<15} | Title: {result.metadata.get('title', 'N/A')}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
* We call `crawler.arun_many` without passing a `dispatcher` argument.
* Crawl4AI automatically creates and uses a `MemoryAdaptiveDispatcher`.
* This dispatcher runs the crawls concurrently, adapting to your system's memory, and returns all the results once completed (because `stream=False`). You benefit from concurrency without explicit setup.
## Explicitly Choosing a Dispatcher
What if you want simpler, fixed concurrency? You can explicitly create and pass a `SemaphoreDispatcher`.
```python
# chapter10_example_2.py
import asyncio
from crawl4ai import (
AsyncWebCrawler,
CrawlerRunConfig,
SemaphoreDispatcher # 1. Import the specific dispatcher
)
async def main():
urls_to_crawl = [
"https://httpbin.org/delay/1", # Takes 1 second
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/1",
]
# 2. Create an instance of the SemaphoreDispatcher
# Allow only 2 crawls to run at the same time.
semaphore_controller = SemaphoreDispatcher(semaphore_count=2)
print(f"Using SemaphoreDispatcher with limit: {semaphore_controller.semaphore_count}")
async with AsyncWebCrawler() as crawler:
print(f"Crawling {len(urls_to_crawl)} URLs with explicit dispatcher...")
config = CrawlerRunConfig(stream=False)
# 3. Pass the dispatcher instance to arun_many
results = await crawler.arun_many(
urls=urls_to_crawl,
config=config,
dispatcher=semaphore_controller # Pass our controller
)
print(f"\nFinished! Got {len(results)} results.")
# This crawl likely took around 3 seconds (5 tasks, 1s each, 2 concurrent = ceil(5/2)*1s)
for result in results:
status = "✅" if result.success else "❌"
print(f" {status} {result.url}")
if __name__ == "__main__":
asyncio.run(main())
```
**Explanation:**
1. **Import:** We import `SemaphoreDispatcher`.
2. **Instantiate:** We create `SemaphoreDispatcher(semaphore_count=2)`, limiting concurrency to 2 simultaneous crawls.
3. **Pass Dispatcher:** We pass our `semaphore_controller` instance directly to the `dispatcher` parameter of `arun_many`.
4. **Execution:** Now, `arun_many` uses our `SemaphoreDispatcher`. It will start the first two crawls. As one finishes, it will start the next one from the list, always ensuring no more than two are running concurrently.
## A Glimpse Under the Hood
Where are these dispatchers defined? In `crawl4ai/async_dispatcher.py`.
**The Blueprint (`BaseDispatcher`):**
```python
# Simplified from crawl4ai/async_dispatcher.py
from abc import ABC, abstractmethod
from typing import AsyncGenerator, List, Optional
# ... other imports like CrawlerRunConfig, CrawlerTaskResult, AsyncWebCrawler ...
class BaseDispatcher(ABC):
def __init__(
self,
rate_limiter: Optional[RateLimiter] = None,
monitor: Optional[CrawlerMonitor] = None,
):
self.crawler = None # Will be set by arun_many
self.rate_limiter = rate_limiter
self.monitor = monitor
# ... other common state ...
@abstractmethod
async def crawl_url(
self,
url: str,
config: CrawlerRunConfig,
task_id: str,
# ... maybe other internal params ...
) -> CrawlerTaskResult:
"""Crawls a single URL, potentially handling concurrency primitives."""
# This is often the core worker method called by run_urls
pass
@abstractmethod
async def run_urls(
self,
urls: List[str],
crawler: "AsyncWebCrawler",
config: CrawlerRunConfig,
) -> List[CrawlerTaskResult]:
"""Manages the concurrent execution of crawl_url for multiple URLs."""
# This is the main entry point called by arun_many
pass
async def run_urls_stream(
self,
urls: List[str],
crawler: "AsyncWebCrawler",
config: CrawlerRunConfig,
) -> AsyncGenerator[CrawlerTaskResult, None]:
""" Streaming version of run_urls (might be implemented in base or subclasses) """
# Example default implementation (subclasses might override)
results = await self.run_urls(urls, crawler, config)
for res in results: yield res # Naive stream, real one is more complex
# ... other potential helper methods ...
```
**Example Implementation (`SemaphoreDispatcher`):**
```python
# Simplified from crawl4ai/async_dispatcher.py
import asyncio
import uuid
import psutil # For memory tracking in crawl_url
import time # For timing in crawl_url
# ... other imports ...
class SemaphoreDispatcher(BaseDispatcher):
def __init__(
self,
semaphore_count: int = 5,
# ... other params like rate_limiter, monitor ...
):
super().__init__(...) # Pass rate_limiter, monitor to base
self.semaphore_count = semaphore_count
async def crawl_url(
self,
url: str,
config: CrawlerRunConfig,
task_id: str,
semaphore: asyncio.Semaphore = None, # Takes the semaphore
) -> CrawlerTaskResult:
# ... (Code to track start time, memory usage - similar to MemoryAdaptiveDispatcher's version)
start_time = time.time()
error_message = ""
memory_usage = peak_memory = 0.0
result = None
try:
# Update monitor state if used
if self.monitor: self.monitor.update_task(task_id, status=CrawlStatus.IN_PROGRESS)
# Wait for rate limiter if used
if self.rate_limiter: await self.rate_limiter.wait_if_needed(url)
# --- Core Semaphore Logic ---
async with semaphore: # Acquire a spot from the semaphore
# Now that we have a spot, run the actual crawl
process = psutil.Process()
start_memory = process.memory_info().rss / (1024 * 1024)
# Call the single-page crawl method of the main crawler
result = await self.crawler.arun(url, config=config, session_id=task_id)
end_memory = process.memory_info().rss / (1024 * 1024)
memory_usage = peak_memory = end_memory - start_memory
# --- Semaphore spot is released automatically on exiting 'async with' ---
# Update rate limiter based on result status if used
if self.rate_limiter and result.status_code:
if not self.rate_limiter.update_delay(url, result.status_code):
# Handle retry limit exceeded
error_message = "Rate limit retry count exceeded"
# ... update monitor, prepare error result ...
# Update monitor status (success/fail)
if result and not result.success: error_message = result.error_message
if self.monitor: self.monitor.update_task(task_id, status=CrawlStatus.COMPLETED if result.success else CrawlStatus.FAILED)
except Exception as e:
# Handle unexpected errors during the crawl
error_message = str(e)
if self.monitor: self.monitor.update_task(task_id, status=CrawlStatus.FAILED)
# Create a failed CrawlResult if needed
if not result: result = CrawlResult(url=url, html="", success=False, error_message=error_message)
finally:
# Final monitor update with timing, memory etc.
end_time = time.time()
if self.monitor: self.monitor.update_task(...)
# Package everything into CrawlerTaskResult
return CrawlerTaskResult(...)
async def run_urls(
self,
crawler: "AsyncWebCrawler",
urls: List[str],
config: CrawlerRunConfig,
) -> List[CrawlerTaskResult]:
self.crawler = crawler # Store the crawler instance
if self.monitor: self.monitor.start()
try:
# Create the semaphore with the specified count
semaphore = asyncio.Semaphore(self.semaphore_count)
tasks = []
# Create a crawl task for each URL, passing the semaphore
for url in urls:
task_id = str(uuid.uuid4())
if self.monitor: self.monitor.add_task(task_id, url)
# Create an asyncio task to run crawl_url
task = asyncio.create_task(
self.crawl_url(url, config, task_id, semaphore=semaphore)
)
tasks.append(task)
# Wait for all created tasks to complete
# asyncio.gather runs them concurrently, respecting the semaphore limit
results = await asyncio.gather(*tasks, return_exceptions=True)
# Process results (handle potential exceptions returned by gather)
final_results = []
for res in results:
if isinstance(res, Exception):
# Handle case where gather caught an exception from a task
# You might create a failed CrawlerTaskResult here
pass
elif isinstance(res, CrawlerTaskResult):
final_results.append(res)
return final_results
finally:
if self.monitor: self.monitor.stop()
# run_urls_stream would have similar logic but use asyncio.as_completed
# or manage tasks manually to yield results as they finish.
```
The key takeaway is that the `Dispatcher` orchestrates calls to the single-page `crawler.arun` method, wrapping them with concurrency controls (like the `async with semaphore:` block) before running them using `asyncio`'s concurrency tools (`asyncio.create_task`, `asyncio.gather`, etc.).
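The streaming variant mentioned in the final comment (`run_urls_stream`) follows the same pattern but yields each result as soon as its task completes. Here is a generic sketch of that idea in plain `asyncio` (not the actual Crawl4AI implementation):
```python
import asyncio
async def worker(i: int) -> str:
    await asyncio.sleep(0.1 * i)  # tasks finish at different times
    return f"result-{i}"
async def stream_results():
    tasks = [asyncio.create_task(worker(i)) for i in range(5)]
    for finished in asyncio.as_completed(tasks):  # iterates in completion order
        result = await finished
        print("got", result)  # a real dispatcher would yield this to the caller
if __name__ == "__main__":
    asyncio.run(stream_results())
```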
## Conclusion
You've learned about `BaseDispatcher`, the crucial "Traffic Controller" that manages concurrent crawls in Crawl4AI, especially for `arun_many`.
* It solves the problem of efficiently running many crawls without overloading systems or websites.
* It acts as a **blueprint** for managing concurrency.
* Key implementations:
* **`SemaphoreDispatcher`**: Uses a simple count limit.
* **`MemoryAdaptiveDispatcher`**: Adjusts concurrency based on system memory (the default for `arun_many`).
* The dispatcher is used **automatically** by `arun_many`, but you can provide a specific instance if needed.
* It orchestrates the execution of individual crawl tasks, respecting defined limits.
Understanding the dispatcher helps appreciate how Crawl4AI handles large-scale crawling tasks responsibly and efficiently.
This concludes our tour of the core concepts in Crawl4AI! We've covered how pages are fetched, how the process is managed, how content is cleaned, filtered, and extracted, how deep crawls are performed, how caching optimizes fetches, and finally, how concurrency is managed. You now have a solid foundation to start building powerful web data extraction and processing applications with Crawl4AI. Happy crawling!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

52
output/Crawl4AI/index.md Normal file
View File

@@ -0,0 +1,52 @@
# Tutorial: Crawl4AI
`Crawl4AI` is a flexible Python library for *asynchronously crawling websites* and *extracting structured content*, specifically designed for **AI use cases**.
You primarily interact with the `AsyncWebCrawler`, which acts as the main coordinator. You provide it with URLs and a `CrawlerRunConfig` detailing *how* to crawl (e.g., using specific strategies for fetching, scraping, filtering, and extraction).
It can handle single pages or multiple URLs concurrently using a `BaseDispatcher`, optionally crawl deeper by following links via `DeepCrawlStrategy`, manage `CacheMode`, and apply `RelevantContentFilter` before finally returning a `CrawlResult` containing all the gathered data.
**Source Repository:** [https://github.com/unclecode/crawl4ai/tree/9c58e4ce2ee025debd3f36bf213330bd72b90e46/crawl4ai](https://github.com/unclecode/crawl4ai/tree/9c58e4ce2ee025debd3f36bf213330bd72b90e46/crawl4ai)
```mermaid
flowchart TD
A0["AsyncWebCrawler"]
A1["CrawlerRunConfig"]
A2["AsyncCrawlerStrategy"]
A3["ContentScrapingStrategy"]
A4["ExtractionStrategy"]
A5["CrawlResult"]
A6["BaseDispatcher"]
A7["DeepCrawlStrategy"]
A8["CacheContext / CacheMode"]
A9["RelevantContentFilter"]
A0 -- "Configured by" --> A1
A0 -- "Uses Fetching Strategy" --> A2
A0 -- "Uses Scraping Strategy" --> A3
A0 -- "Uses Extraction Strategy" --> A4
A0 -- "Produces" --> A5
A0 -- "Uses Dispatcher for `arun_m..." --> A6
A0 -- "Uses Caching Logic" --> A8
A6 -- "Calls Crawler's `arun`" --> A0
A1 -- "Specifies Deep Crawl Strategy" --> A7
A7 -- "Processes Links from" --> A5
A3 -- "Provides Cleaned HTML to" --> A9
A1 -- "Specifies Content Filter" --> A9
```
## Chapters
1. [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md)
2. [AsyncWebCrawler](02_asyncwebcrawler.md)
3. [CrawlerRunConfig](03_crawlerrunconfig.md)
4. [ContentScrapingStrategy](04_contentscrapingstrategy.md)
5. [RelevantContentFilter](05_relevantcontentfilter.md)
6. [ExtractionStrategy](06_extractionstrategy.md)
7. [CrawlResult](07_crawlresult.md)
8. [DeepCrawlStrategy](08_deepcrawlstrategy.md)
9. [CacheContext / CacheMode](09_cachecontext___cachemode.md)
10. [BaseDispatcher](10_basedispatcher.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

207
output/CrewAI/01_crew.md Normal file
View File

@@ -0,0 +1,207 @@
# Chapter 1: Crew - Your AI Team Manager
Welcome to the world of CrewAI! We're excited to help you build teams of AI agents that can work together to accomplish complex tasks.
Imagine you have a big project, like planning a surprise birthday trip for a friend. Doing it all yourself (researching destinations, checking flight prices, finding hotels, planning activities) can be overwhelming. Wouldn't it be great if you had a team to help? Maybe one person researches cool spots, another finds the best travel deals, and you coordinate everything.
That's exactly what a `Crew` does in CrewAI! It acts like the **project manager** or even the **entire team** itself, bringing together specialized AI assistants ([Agents](02_agent.md)) and telling them what [Tasks](03_task.md) to do and in what order.
**What Problem Does `Crew` Solve?**
Single AI models are powerful, but complex goals often require multiple steps and different kinds of expertise. A `Crew` allows you to break down a big goal into smaller, manageable [Tasks](03_task.md) and assign each task to the best AI [Agent](02_agent.md) for the job. It then manages how these agents work together to achieve the overall objective.
## What is a Crew?
Think of a `Crew` as the central coordinator. It holds everything together:
1. **The Team ([Agents](02_agent.md)):** It knows which AI agents are part of the team. Each agent might have a specific role (like 'Travel Researcher' or 'Booking Specialist').
2. **The Plan ([Tasks](03_task.md)):** It holds the list of tasks that need to be completed to achieve the final goal (e.g., 'Research European cities', 'Find affordable flights', 'Book hotel').
3. **The Workflow ([Process](05_process.md)):** It defines *how* the team works. Should they complete tasks one after another (`sequential`)? Or should there be a manager agent delegating work (`hierarchical`)?
4. **Collaboration:** It orchestrates how agents share information and pass results from one task to the next.
## Let's Build a Simple Crew!
Let's try building a very basic `Crew` for our trip planning example. For now, we'll just set up the structure. We'll learn more about creating sophisticated [Agents](02_agent.md) and [Tasks](03_task.md) in the next chapters.
```python
# Import necessary classes (we'll learn about these soon!)
from crewai import Agent, Task, Crew, Process
# Define our agents (don't worry about the details for now)
# Agent 1: The Researcher
researcher = Agent(
role='Travel Researcher',
goal='Find interesting cities in Europe for a birthday trip',
backstory='An expert travel researcher.',
# verbose=True, # Optional: Shows agent's thinking process
allow_delegation=False # This agent doesn't delegate work
# llm=your_llm # We'll cover LLMs later!
)
# Agent 2: The Planner
planner = Agent(
role='Activity Planner',
goal='Create a fun 3-day itinerary for the chosen city',
backstory='An experienced activity planner.',
# verbose=True,
allow_delegation=False
# llm=your_llm
)
```
**Explanation:**
* We import `Agent`, `Task`, `Crew`, and `Process` from the `crewai` library.
* We create two simple [Agents](02_agent.md). We give them a `role` and a `goal`. Think of these as job titles and descriptions for our AI assistants. (We'll dive deep into Agents in [Chapter 2](02_agent.md)).
Now, let's define the [Tasks](03_task.md) for these agents:
```python
# Define the tasks
task1 = Task(
description='Identify the top 3 European cities suitable for a sunny birthday trip in May.',
expected_output='A list of 3 cities with brief reasons.',
agent=researcher # Assign task1 to the researcher agent
)
task2 = Task(
description='Based on the chosen city from task 1, create a 3-day activity plan.',
expected_output='A detailed itinerary for 3 days.',
agent=planner # Assign task2 to the planner agent
)
```
**Explanation:**
* We create two [Tasks](03_task.md). Each task has a `description` (what to do) and an `expected_output` (what the result should look like).
* Crucially, we assign each task to an `agent`. `task1` goes to the `researcher`, and `task2` goes to the `planner`. (More on Tasks in [Chapter 3](03_task.md)).
Finally, let's assemble the `Crew`:
```python
# Create the Crew
trip_crew = Crew(
agents=[researcher, planner],
tasks=[task1, task2],
process=Process.sequential # Tasks will run one after another
# verbose=2 # Optional: Sets verbosity level for the crew execution
)
# Start the Crew's work!
result = trip_crew.kickoff()
print("\n\n########################")
print("## Here is the result")
print("########################\n")
print(result)
```
**Explanation:**
1. We create an instance of the `Crew` class.
2. We pass the list of `agents` we defined earlier.
3. We pass the list of `tasks`. The order in this list matters for the sequential process.
4. We set the `process` to `Process.sequential`. This means `task1` will be completed first by the `researcher`, and its output will *automatically* be available as context for `task2` when the `planner` starts working.
5. We call the `kickoff()` method. This is like saying "Okay team, start working!"
6. The `Crew` manages the execution, ensuring the `researcher` does `task1`, then the `planner` does `task2`.
7. The `result` will contain the final output from the *last* task (`task2` in this case).
**Expected Outcome (Conceptual):**
When you run this (assuming you have underlying AI models configured, which we'll cover in the [LLM chapter](06_llm.md)), the `Crew` will:
1. Ask the `researcher` agent to perform `task1`.
2. The `researcher` will (conceptually) think and produce a list like: "1. Barcelona (Sunny, vibrant) 2. Lisbon (Coastal, historic) 3. Rome (Iconic, warm)".
3. The `Crew` takes this output and gives it to the `planner` agent along with `task2`.
4. The `planner` agent uses the city list (and likely picks one, or you'd refine the task) and creates a 3-day itinerary.
5. The final `result` printed will be the 3-day itinerary generated by the `planner`.
## How Does `Crew.kickoff()` Work Inside?
You don't *need* to know the deep internals to use CrewAI, but understanding the basics helps! When you call `kickoff()`:
1. **Input Check:** It checks if you provided any starting inputs (we didn't in this simple example, but you could provide a starting topic or variable; see the sketch after this list).
2. **Agent & Task Setup:** It makes sure all agents and tasks are ready to go. It ensures agents have the necessary configurations ([LLMs](06_llm.md), [Tools](04_tool.md) - more on these later!).
3. **Process Execution:** It looks at the chosen `process` (e.g., `sequential`).
* **Sequential:** It runs tasks one by one. The output of task `N` is added to the context for task `N+1`.
* **Hierarchical (Advanced):** If you chose this process, the Crew would use a dedicated 'manager' agent to coordinate the other agents and decide who does what next. We'll stick to sequential for now.
4. **Task Execution Loop:**
* It picks the next task based on the process.
* It finds the assigned agent for that task.
* It gives the agent the task description and any relevant context (like outputs from previous tasks).
* The agent performs the task using its underlying AI model ([LLM](06_llm.md)).
* The agent returns the result (output) of the task.
* The Crew stores this output.
* Repeat until all tasks are done.
5. **Final Output:** The `Crew` packages the output from the final task (and potentially outputs from all tasks) and returns it.
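Here is a small sketch of that input step, assuming the `inputs` dictionary accepted by `kickoff` (see its signature further below) is interpolated into `{placeholder}` names in task descriptions:
```python
# Sketch: passing starting inputs to kickoff(). Assumes {destination_region}
# in the description is filled in from the inputs dictionary.
# researcher, planner and task2 are defined as earlier in this chapter.
task1 = Task(
    description='Identify the top 3 cities in {destination_region} suitable for a sunny birthday trip in May.',
    expected_output='A list of 3 cities with brief reasons.',
    agent=researcher
)
trip_crew = Crew(agents=[researcher, planner], tasks=[task1, task2], process=Process.sequential)
result = trip_crew.kickoff(inputs={"destination_region": "Southern Europe"})
```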
Let's visualize the `sequential` process:
```mermaid
sequenceDiagram
participant User
participant MyCrew as Crew
participant ResearcherAgent as Researcher
participant PlannerAgent as Planner
User->>MyCrew: kickoff()
MyCrew->>ResearcherAgent: Execute Task 1 ("Find cities...")
Note right of ResearcherAgent: Researcher thinks... generates city list.
ResearcherAgent-->>MyCrew: Task 1 Output ("Barcelona, Lisbon, Rome...")
MyCrew->>PlannerAgent: Execute Task 2 ("Create itinerary...") \nwith Task 1 Output as context
Note right of PlannerAgent: Planner thinks... uses city list, creates itinerary.
PlannerAgent-->>MyCrew: Task 2 Output ("Day 1: ..., Day 2: ...")
MyCrew-->>User: Final Result (Task 2 Output)
```
**Code Glimpse (`crew.py` simplified):**
The `Crew` class itself is defined in `crewai/crew.py`. It takes parameters like `agents`, `tasks`, and `process` when you create it.
```python
# Simplified view from crewai/crew.py
class Crew(BaseModel):
tasks: List[Task] = Field(default_factory=list)
agents: List[BaseAgent] = Field(default_factory=list)
process: Process = Field(default=Process.sequential)
# ... other configurations like memory, cache, etc.
def kickoff(self, inputs: Optional[Dict[str, Any]] = None) -> CrewOutput:
# ... setup steps ...
# Decides which execution path based on the process
if self.process == Process.sequential:
result = self._run_sequential_process()
elif self.process == Process.hierarchical:
result = self._run_hierarchical_process()
else:
# Handle other processes or errors
raise NotImplementedError(...)
# ... cleanup and formatting steps ...
return result # Returns a CrewOutput object
def _run_sequential_process(self) -> CrewOutput:
# Simplified loop logic
task_outputs = []
for task in self.tasks:
agent = task.agent # Find the agent for this task
context = self._get_context(task, task_outputs) # Get outputs from previous tasks
# Execute the task (sync or async)
output = task.execute_sync(agent=agent, context=context)
task_outputs.append(output)
# ... logging/callbacks ...
return self._create_crew_output(task_outputs) # Package final result
```
This simplified view shows how the `Crew` holds the `agents` and `tasks`, and the `kickoff` method directs traffic based on the chosen `process`, eventually looping through tasks sequentially if `Process.sequential` is selected.
## Conclusion
You've learned about the most fundamental concept in CrewAI: the `Crew`! It's the manager that brings your AI agents together, gives them tasks, and defines how they collaborate to achieve a larger goal. We saw how to define agents and tasks (at a high level) and assemble them into a `Crew` using a `sequential` process.
But a Crew is nothing without its members! In the next chapter, we'll dive deep into the first core component: the [Agent](02_agent.md). What makes an agent tick? How do you define their roles, goals, and capabilities? Let's find out!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

178
output/CrewAI/02_agent.md Normal file
View File

@@ -0,0 +1,178 @@
# Chapter 2: Agent - Your Specialized AI Worker
In [Chapter 1](01_crew.md), we learned about the `Crew`, the manager that organizes our AI team. But a manager needs a team to manage! That's where `Agent`s come in.
## Why Do We Need Agents?
Imagine our trip planning `Crew` again. The `Crew` knows the overall goal (plan a surprise trip), but it doesn't *do* the research or the planning itself. It needs specialists.
* One specialist could be excellent at researching travel destinations.
* Another could be fantastic at creating detailed itineraries.
In CrewAI, these specialists are called **`Agent`s**. Instead of having one super-smart AI try to juggle everything, we create multiple `Agent`s, each with its own focus and expertise. This makes complex tasks more manageable and often leads to better results.
**Problem Solved:** `Agent`s allow you to break down a large task into smaller pieces and assign each piece to an AI worker specifically designed for it.
## What is an Agent?
Think of an `Agent` as a **dedicated AI worker** on your `Crew`. Each `Agent` has a unique profile that defines who they are and what they do:
1. **`role`**: This is the Agent's job title. What function do they perform in the team? Examples: 'Travel Researcher', 'Marketing Analyst', 'Code Reviewer', 'Blog Post Writer'.
2. **`goal`**: This is the Agent's primary objective. What specific outcome are they trying to achieve within their role? Examples: 'Find the top 3 family-friendly European destinations', 'Analyze competitor website traffic', 'Identify bugs in Python code', 'Draft an engaging blog post about AI'.
3. **`backstory`**: This is the Agent's personality, skills, and history. It tells the AI *how* to behave and what expertise it possesses. It adds flavour and context. Examples: 'An expert travel agent with 20 years of experience in European travel.', 'A data-driven market analyst known for spotting emerging trends.', 'A meticulous senior software engineer obsessed with code quality.', 'A witty content creator known for simplifying complex topics.'
4. **`llm`** (Optional): This is the Agent's "brain", the specific Large Language Model (like GPT-4, Gemini, etc.) it uses to think, communicate, and execute tasks. We'll cover this more in the [LLM chapter](06_llm.md). If not specified, it usually inherits the `Crew`'s default LLM.
5. **`tools`** (Optional): These are special capabilities the Agent can use, like searching the web, using a calculator, or reading files. Think of them as the Agent's equipment. We'll explore these in the [Tool chapter](04_tool.md).
6. **`allow_delegation`** (Optional, default `False`): Can this Agent ask other Agents in the `Crew` for help with a sub-task? If `True`, it enables collaboration.
7. **`verbose`** (Optional, default `False`): If `True`, the Agent will print out its thought process as it works, which is great for debugging and understanding what's happening.
An Agent takes the [Tasks](03_task.md) assigned to it by the `Crew` and uses its `role`, `goal`, `backstory`, `llm`, and `tools` to complete them.
## Let's Define an Agent!
Let's revisit the `researcher` Agent from Chapter 1 and look closely at how it's defined.
```python
# Make sure you have crewai installed
# pip install crewai
from crewai import Agent
# Define our researcher agent
researcher = Agent(
role='Expert Travel Researcher',
goal='Find the most exciting and sunny European cities for a birthday trip in late May.',
backstory=(
"You are a world-class travel researcher with deep knowledge of "
"European destinations. You excel at finding hidden gems and understanding "
"weather patterns. Your recommendations are always insightful and tailored."
),
verbose=True, # We want to see the agent's thinking process
allow_delegation=False # This agent focuses on its own research
# tools=[...] # We'll add tools later!
# llm=your_llm # We'll cover LLMs later!
)
# (You would typically define other agents, tasks, and a crew here)
# print(researcher) # Just to see the object
```
**Explanation:**
* `from crewai import Agent`: We import the necessary `Agent` class.
* `role='Expert Travel Researcher'`: We clearly define the agent's job title. This tells the LLM its primary function.
* `goal='Find the most exciting...'`: We give it a specific, measurable objective. This guides its actions.
* `backstory='You are a world-class...'`: We provide context and personality. This influences the *style* and *quality* of its output. Notice the detailed description; this helps the LLM adopt the persona.
* `verbose=True`: We'll see detailed logs of this agent's thoughts and actions when it runs.
* `allow_delegation=False`: This researcher won't ask other agents for help; it will complete its task independently.
Running this code snippet creates an `Agent` object in Python. This object is now ready to be added to a [Crew](01_crew.md) and assigned [Tasks](03_task.md).
## How Agents Work "Under the Hood"
So, what happens when an `Agent` is given a task by the `Crew`?
1. **Receive Task & Context:** The `Agent` gets the task description (e.g., "Find 3 sunny cities") and potentially some context from previous tasks (e.g., "The user prefers coastal cities").
2. **Consult Profile:** It looks at its own `role`, `goal`, and `backstory`. This helps it frame *how* to tackle the task. Our 'Expert Travel Researcher' will approach this differently than a 'Budget Backpacker Blogger'.
3. **Think & Plan (Using LLM):** The `Agent` uses its assigned `llm` (its brain) to think. It breaks down the task, formulates a plan, and decides what information it needs. This often involves an internal "monologue" (which you can see if `verbose=True`).
4. **Use Tools (If Necessary):** If the plan requires external information or actions (like searching the web for current weather or calculating travel times), and the agent *has* the right [Tools](04_tool.md), it will use them.
5. **Delegate (If Allowed & Necessary):** If `allow_delegation=True` and the `Agent` decides a sub-part of the task is better handled by another specialist `Agent` in the `Crew`, it can ask the `Crew` to delegate that part.
6. **Generate Output (Using LLM):** Based on its thinking, tool results, and potentially delegated results, the `Agent` uses its `llm` again to formulate the final response or output for the task.
7. **Return Result:** The `Agent` passes its completed work back to the `Crew`.
Let's visualize this simplified flow:
```mermaid
sequenceDiagram
participant C as Crew
participant MyAgent as Agent (Researcher)
participant LLM as Agent's Brain
participant SearchTool as Tool
C->>MyAgent: Execute Task ("Find sunny cities in May")
MyAgent->>MyAgent: Consult profile (Role, Goal, Backstory)
MyAgent->>LLM: Formulate plan & Ask: "Best way to find sunny cities?"
LLM-->>MyAgent: Suggestion: "Search web for 'Europe weather May'"
MyAgent->>SearchTool: Use Tool(query="Europe weather May sunny cities")
SearchTool-->>MyAgent: Web search results (e.g., Lisbon, Seville, Malta)
MyAgent->>LLM: Consolidate results & Ask: "Format these 3 cities nicely"
LLM-->>MyAgent: Formatted list: "1. Lisbon..."
MyAgent-->>C: Task Result ("Here are 3 sunny cities: Lisbon...")
```
**Diving into the Code (`agent.py`)**
The core logic for the `Agent` resides in the `crewai/agent.py` file.
The `Agent` class itself inherits from `BaseAgent` (`crewai/agents/agent_builder/base_agent.py`) and primarily stores the configuration you provide:
```python
# Simplified view from crewai/agent.py
from crewai.agents.agent_builder.base_agent import BaseAgent
# ... other imports
class Agent(BaseAgent):
role: str = Field(description="Role of the agent")
goal: str = Field(description="Objective of the agent")
backstory: str = Field(description="Backstory of the agent")
llm: Any = Field(default=None, description="LLM instance")
tools: Optional[List[BaseTool]] = Field(default_factory=list)
allow_delegation: bool = Field(default=False)
verbose: bool = Field(default=False)
# ... other fields like memory, max_iter, etc.
def execute_task(
self,
task: Task,
context: Optional[str] = None,
tools: Optional[List[BaseTool]] = None,
) -> str:
# ... (steps 1 & 2: Prepare task prompt with context, memory, knowledge) ...
task_prompt = task.prompt() # Get base task description
if context:
task_prompt = f"{task_prompt}\nContext:\n{context}"
# Add memory, knowledge, tool descriptions etc. to the prompt...
# ... (Internal setup: Create AgentExecutor if needed) ...
self.create_agent_executor(tools=tools or self.tools)
# ... (Step 3-7: Run the execution loop via AgentExecutor) ...
result = self.agent_executor.invoke({
"input": task_prompt,
"tool_names": self._get_tool_names(self.agent_executor.tools),
"tools": self._get_tool_descriptions(self.agent_executor.tools),
# ... other inputs for the executor ...
})["output"] # Extract the final string output
return result
def create_agent_executor(self, tools: Optional[List[BaseTool]] = None) -> None:
# Sets up the internal CrewAgentExecutor which handles the actual
# interaction loop with the LLM and tools.
# It uses the agent's profile (role, goal, backstory) to build the main prompt.
pass
# ... other helper methods ...
```
Key takeaways from the code:
* The `Agent` class mainly holds the configuration (`role`, `goal`, `backstory`, `llm`, `tools`, etc.).
* The `execute_task` method is called by the `Crew` when it's the agent's turn.
* It prepares a detailed prompt for the underlying LLM, incorporating the task, context, the agent's profile, and available tools.
* It uses an internal object called `agent_executor` (specifically `CrewAgentExecutor` from `crewai/agents/crew_agent_executor.py`) to manage the actual step-by-step thinking, tool use, and response generation loop with the LLM.
You don't need to understand the `agent_executor` in detail right now, just know that it's the engine that drives the agent's execution based on the profile and task you provide.
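If you ever want to exercise an agent outside of a full crew, you could in principle call the same entry point the `Crew` uses. This is only a sketch based on the `execute_task` signature shown above, and it still needs a configured [LLM](06_llm.md) to actually run:
```python
# Sketch: calling the agent the way the Crew does (normally the Crew does this for you).
from crewai import Task
quick_task = Task(
    description="List 3 sunny European cities for a trip in late May.",
    expected_output="A short numbered list of cities.",
    agent=researcher  # the researcher agent defined earlier in this chapter
)
raw_answer = researcher.execute_task(
    task=quick_task,
    context="The traveller prefers coastal cities.",  # optional extra context
)
print(raw_answer)  # the raw string output the Crew would store as the task result
```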
## Conclusion
You've now met the core members of your AI team: the `Agent`s! You learned that each `Agent` is a specialized worker defined by its `role`, `goal`, and `backstory`. They use an [LLM](06_llm.md) as their brain and can be equipped with [Tools](04_tool.md) to perform specific actions.
We saw how to define an agent in code and got a glimpse into how they process information and execute the work assigned by the [Crew](01_crew.md).
But defining an `Agent` is only half the story. What specific work should they *do*? How do we describe the individual steps needed to achieve the `Crew`'s overall objective? That's where the next concept comes in: the [Task](03_task.md). Let's dive into defining the actual work!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

272
output/CrewAI/03_task.md Normal file
View File

@@ -0,0 +1,272 @@
# Chapter 3: Task - Defining the Work
In [Chapter 1](01_crew.md), we met the `Crew` - our AI team manager. In [Chapter 2](02_agent.md), we met the `Agent`s - our specialized AI workers. Now, we need to tell these agents *exactly* what to do. How do we give them specific assignments?
That's where the `Task` comes in!
## Why Do We Need Tasks?
Imagine our trip planning `Crew` again. We have a 'Travel Researcher' [Agent](02_agent.md) and an 'Activity Planner' [Agent](02_agent.md). Just having them isn't enough. We need to give them clear instructions:
* Researcher: "Find some sunny cities in Europe for May."
* Planner: "Create a 3-day plan for the city the Researcher found."
These specific instructions are **`Task`s** in CrewAI. Instead of one vague goal, we break the project down into smaller, concrete steps.
**Problem Solved:** `Task` allows you to define individual, actionable assignments for your [Agent](02_agent.md)s. It turns a big goal into a manageable checklist.
## What is a Task?
Think of a `Task` as a **work order** or a **specific assignment** given to an [Agent](02_agent.md). It clearly defines what needs to be done and what the expected result should look like.
Here are the key ingredients of a `Task`:
1. **`description`**: This is the most important part! It's a clear and detailed explanation of *what* the [Agent](02_agent.md) needs to accomplish. The more specific, the better.
2. **`expected_output`**: This tells the [Agent](02_agent.md) what a successful result should look like. It sets a clear target. Examples: "A list of 3 cities with pros and cons.", "A bulleted list of activities.", "A paragraph summarizing the key findings."
3. **`agent`**: This specifies *which* [Agent](02_agent.md) in your [Crew](01_crew.md) is responsible for completing this task. Each task is typically assigned to the agent best suited for it.
4. **`context`** (Optional but Important!): Tasks don't usually happen in isolation. A task might need information or results from *previous* tasks. The `context` allows the output of one task to be automatically fed as input/background information to the next task in a sequence.
5. **`tools`** (Optional): You can specify a list of [Tools](04_tool.md) that the [Agent](02_agent.md) is *allowed* to use specifically for *this* task. This can be useful to restrict or grant specific capabilities for certain assignments.
6. **`async_execution`** (Optional, Advanced): You can set this to `True` if you want the task to potentially run at the same time as other asynchronous tasks. We'll stick to synchronous (one after another) for now.
7. **`output_json` / `output_pydantic`** (Optional, Advanced): If you need the task's final output in a structured format like JSON, you can specify a model here.
8. **`output_file`** (Optional, Advanced): You can have the task automatically save its output to a file.
A `Task` bundles the instructions (`description`, `expected_output`) and assigns them to the right worker (`agent`), potentially giving them background info (`context`) and specific equipment (`tools`).
## Let's Define a Task!
Let's look again at the tasks we created for our trip planning [Crew](01_crew.md) in [Chapter 1](01_crew.md).
```python
# Import necessary classes
from crewai import Task, Agent # Assuming Agent class is defined as in Chapter 2
# Assume 'researcher' and 'planner' agents are already defined
# researcher = Agent(role='Travel Researcher', ...)
# planner = Agent(role='Activity Planner', ...)
# Define Task 1 for the Researcher
task1 = Task(
description=(
"Identify the top 3 European cities known for great sunny weather "
"around late May. Focus on cities with vibrant culture and good food."
),
expected_output=(
"A numbered list of 3 cities, each with a brief (1-2 sentence) justification "
"mentioning weather, culture, and food highlights."
),
agent=researcher # Assign this task to our researcher agent
)
# Define Task 2 for the Planner
task2 = Task(
description=(
"Using the list of cities provided by the researcher, select the best city "
"and create a detailed 3-day itinerary. Include morning, afternoon, and "
"evening activities, plus restaurant suggestions."
),
expected_output=(
"A markdown formatted 3-day itinerary for the chosen city. "
"Include timings, activity descriptions, and 2-3 restaurant ideas."
),
agent=planner # Assign this task to our planner agent
# context=[task1] # Optionally explicitly define context (often handled automatically)
)
# (You would then add these tasks to a Crew)
# print(task1)
# print(task2)
```
**Explanation:**
* `from crewai import Task`: We import the `Task` class.
* `description=...`: We write a clear instruction for the agent. Notice how `task1` specifies the criteria (sunny, May, culture, food). `task2` explicitly mentions using the output from the previous task.
* `expected_output=...`: We define what success looks like. `task1` asks for a numbered list with justifications. `task2` asks for a formatted itinerary. This helps the AI agent structure its response.
* `agent=researcher` / `agent=planner`: We link each task directly to the [Agent](02_agent.md) responsible for doing the work.
* `context=[task1]` (Commented Out): We *could* explicitly tell `task2` that it depends on `task1`. However, when using a `sequential` [Process](05_process.md) in the [Crew](01_crew.md), this dependency is usually handled automatically! The output of `task1` will be passed to `task2` as context.
Running this code creates `Task` objects, ready to be managed by a [Crew](01_crew.md).
## Task Workflow and Context: Connecting the Dots
Tasks are rarely standalone. They often form a sequence, where the result of one task is needed for the next. This is where `context` comes in.
Imagine our `Crew` is set up with a `sequential` [Process](05_process.md) (like in Chapter 1):
1. The `Crew` runs `task1` using the `researcher` agent.
2. The `researcher` completes `task1` and produces an output (e.g., "1. Lisbon...", "2. Seville...", "3. Malta..."). This output is stored.
3. The `Crew` moves to `task2`. Because it's sequential, it automatically takes the output from `task1` and provides it as *context* to `task2`.
4. The `planner` agent receives `task2`'s description *and* the list of cities from `task1` as context.
5. The `planner` uses this context to complete `task2` (e.g., creates an itinerary for Lisbon).
This automatic passing of information makes building workflows much easier!
```mermaid
graph LR
A["Task 1: Find Cities (Agent: Researcher)"] -->|Output: Lisbon, Seville, Malta| B[Context for Task 2]
B --> C["Task 2: Create Itinerary (Agent: Planner)"]
C -->|Output: Lisbon Itinerary...| D[Final Result]
style A fill:#f9f,stroke:#333,stroke-width:2px
style C fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#ccf,stroke:#333,stroke-width:1px,stroke-dasharray: 5 5
style D fill:#cfc,stroke:#333,stroke-width:2px
```
While the `sequential` process often handles context automatically, you *can* explicitly define dependencies using the `context` parameter in the `Task` definition if you need more control, especially with more complex workflows.
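For example, a hypothetical third task could explicitly pull in the outputs of both earlier tasks:
```python
# Hypothetical follow-up task that explicitly receives both earlier outputs as context
task3 = Task(
    description=(
        "Combine the city research and the itinerary into a one-page trip briefing "
        "for the birthday person."
    ),
    expected_output="A short, friendly briefing covering the chosen city and the 3-day plan.",
    agent=planner,
    context=[task1, task2]  # outputs of task1 and task2 are passed in as context
)
```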
## How Does a Task Execute "Under the Hood"?
When the [Crew](01_crew.md)'s `kickoff()` method runs a task, here's a simplified view of what happens:
1. **Selection:** The [Crew](01_crew.md) (based on its [Process](05_process.md)) picks the next `Task` to execute.
2. **Agent Assignment:** It identifies the `agent` assigned to this `Task`.
3. **Context Gathering:** It collects the output from any prerequisite tasks (like the previous task in a sequential process) to form the `context`.
4. **Execution Call:** The [Crew](01_crew.md) tells the assigned `Agent` to execute the `Task`, passing the `description`, `expected_output`, available `tools` (if any specified for the task), and the gathered `context`.
5. **Agent Work:** The [Agent](02_agent.md) uses its configuration ([LLM](06_llm.md), backstory, etc.) and the provided information (task details, context, tools) to perform the work.
6. **Result Return:** The [Agent](02_agent.md) generates the result and returns it as a `TaskOutput` object.
7. **Output Storage:** The [Crew](01_crew.md) receives this `TaskOutput` and stores it, making it available as potential context for future tasks.
Let's visualize the interaction:
```mermaid
sequenceDiagram
participant C as Crew
participant T1 as Task 1
participant R_Agent as Researcher Agent
participant T2 as Task 2
participant P_Agent as Planner Agent
C->>T1: Prepare to Execute
Note right of T1: Task 1 selected
C->>R_Agent: Execute Task(T1.description, T1.expected_output)
R_Agent->>R_Agent: Use LLM, Profile, Tools...
R_Agent-->>C: Return TaskOutput (Cities List)
C->>C: Store TaskOutput from T1
C->>T2: Prepare to Execute
Note right of T2: Task 2 selected
Note right of C: Get Context (Output from T1)
C->>P_Agent: Execute Task(T2.description, T2.expected_output, context=T1_Output)
P_Agent->>P_Agent: Use LLM, Profile, Tools, Context...
P_Agent-->>C: Return TaskOutput (Itinerary)
C->>C: Store TaskOutput from T2
```
**Diving into the Code (`task.py`)**
The `Task` class itself is defined in `crewai/task.py`. It's primarily a container for the information you provide:
```python
# Simplified view from crewai/task.py
from pydantic import BaseModel, Field
from typing import List, Optional, Type, Any
# Base agent and tool types used in the annotations below
from crewai.agents.agent_builder.base_agent import BaseAgent
from crewai.tools import BaseTool
class TaskOutput(BaseModel): # Simplified representation of the result
description: str
raw: str
agent: str
# ... other fields like pydantic, json_dict
class Task(BaseModel):
# Core attributes
description: str = Field(description="Description of the actual task.")
expected_output: str = Field(description="Clear definition of expected output.")
agent: Optional[BaseAgent] = Field(default=None, description="Agent responsible.")
# Optional attributes
context: Optional[List["Task"]] = Field(default=None, description="Context from other tasks.")
tools: Optional[List[BaseTool]] = Field(default_factory=list, description="Task-specific tools.")
async_execution: Optional[bool] = Field(default=False)
output_json: Optional[Type[BaseModel]] = Field(default=None)
output_pydantic: Optional[Type[BaseModel]] = Field(default=None)
output_file: Optional[str] = Field(default=None)
callback: Optional[Any] = Field(default=None) # Function to call after execution
# Internal state
output: Optional[TaskOutput] = Field(default=None, description="Task output after execution")
def execute_sync(
self,
agent: Optional[BaseAgent] = None,
context: Optional[str] = None,
tools: Optional[List[BaseTool]] = None,
) -> TaskOutput:
# 1. Identify the agent to use (passed or self.agent)
agent_to_execute = agent or self.agent
if not agent_to_execute:
raise Exception("No agent assigned to task.")
# 2. Prepare tools (task tools override agent tools if provided)
execution_tools = tools or self.tools or agent_to_execute.tools
# 3. Call the agent's execute_task method
# (The agent handles LLM calls, tool use, etc.)
raw_result = agent_to_execute.execute_task(
task=self, # Pass self (the task object)
context=context,
tools=execution_tools,
)
# 4. Format the output
# (Handles JSON/Pydantic conversion if requested)
pydantic_output, json_output = self._export_output(raw_result)
# 5. Create and return TaskOutput object
task_output = TaskOutput(
description=self.description,
raw=raw_result,
pydantic=pydantic_output,
json_dict=json_output,
agent=agent_to_execute.role,
# ... other fields
)
self.output = task_output # Store the output within the task object
# 6. Execute callback if defined
if self.callback:
self.callback(task_output)
# 7. Save to file if output_file is set
if self.output_file:
# ... logic to save file ...
pass
return task_output
def prompt(self) -> str:
# Combines description and expected output for the agent
return f"{self.description}\n\nExpected Output:\n{self.expected_output}"
# ... other methods like execute_async, _export_output, _save_file ...
```
Key takeaways from the code:
* The `Task` class holds the configuration (`description`, `expected_output`, `agent`, etc.).
* The `execute_sync` (and `execute_async`) method orchestrates the execution *by calling the assigned agent's `execute_task` method*. The task itself doesn't contain the AI logic; it delegates that to the agent.
* It takes the raw result from the agent and wraps it in a `TaskOutput` object, handling formatting (like JSON) and optional actions (callbacks, file saving).
* The `prompt()` method shows how the core instructions are formatted before being potentially combined with context and tool descriptions by the agent.
## Advanced Task Features (A Quick Peek)
While we focused on the basics, `Task` has more capabilities:
* **Asynchronous Execution (`async_execution=True`):** Allows multiple tasks to run concurrently, potentially speeding up your Crew if tasks don't strictly depend on each other's immediate output.
* **Structured Outputs (`output_json`, `output_pydantic`):** Force the agent to return data in a specific Pydantic model or JSON structure, making it easier to use the output programmatically. (See the sketch after this list.)
* **File Output (`output_file='path/to/output.txt'`):** Automatically save the task's result to a specified file.
* **Conditional Tasks (`ConditionalTask`):** A special type of task (defined in `crewai.tasks.conditional_task`) that only runs if a specific condition (based on the previous task's output) is met. This allows for branching logic in your workflows.
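As an illustration of structured outputs, here is a sketch that uses the `output_pydantic` field shown in the simplified `Task` class above; the assumption is that the parsed model ends up on the task's `TaskOutput` alongside the raw text:
```python
from pydantic import BaseModel
from typing import List
from crewai import Task
# Hypothetical schema for the itinerary we want back
class Itinerary(BaseModel):
    city: str
    days: List[str]  # one entry per day, e.g. "Day 1: ..."
structured_task = Task(
    description="Create a 3-day itinerary for the chosen city.",
    expected_output="The itinerary as structured data: the city name plus one plan per day.",
    agent=planner,             # the planner agent from earlier in this chapter
    output_pydantic=Itinerary  # ask CrewAI to parse the result into an Itinerary
)
# After the crew runs, structured_task.output.pydantic should hold an Itinerary
# instance (assumption based on the TaskOutput fields sketched above).
```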
## Conclusion
You've now learned about the `Task`, the fundamental unit of work in CrewAI. A `Task` defines *what* needs to be done (`description`), what the result should look like (`expected_output`), and *who* should do it (`agent`). Tasks are the building blocks of your Crew's plan, and their outputs often flow as `context` to subsequent tasks, creating powerful workflows.
We've seen how to define Agents and give them Tasks. But what if an agent needs a specific ability, like searching the internet, calculating something, or reading a specific document? How do we give our agents superpowers? That's where [Tools](04_tool.md) come in! Let's explore them in the next chapter.
**Next:** [Chapter 4: Tool - Equipping Your Agents](04_tool.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

273
output/CrewAI/04_tool.md Normal file
View File

@@ -0,0 +1,273 @@
# Chapter 4: Tool - Equipping Your Agents
In [Chapter 3: Task](03_task.md), we learned how to define specific assignments (`Task`s) for our AI `Agent`s. We told the 'Travel Researcher' agent to find sunny cities and the 'Activity Planner' agent to create an itinerary.
But wait... how does the 'Travel Researcher' actually *find* those cities? Can it browse the web? Can it look at weather data? By default, an [Agent](02_agent.md)'s "brain" ([LLM](06_llm.md)) is great at reasoning and generating text based on the information it already has, but it can't interact with the outside world on its own.
This is where `Tool`s come in! They are the **special equipment and abilities** we give our agents to make them more capable.
## Why Do We Need Tools?
Imagine you hire a brilliant researcher. They can think, analyze, and write reports. But if their task is "Find the best coffee shop near me right now," they need specific tools: maybe a map application, a business directory, or a review website. Without these tools, they can only guess or rely on outdated knowledge.
Similarly, our AI [Agent](02_agent.md)s need `Tool`s to perform actions beyond simple text generation.
* Want your agent to find current information? Give it a **web search tool**.
* Need it to perform calculations? Give it a **calculator tool**.
* Want it to read a specific document? Give it a **file reading tool**.
* Need it to ask another agent for help? Use the built-in **delegation tool** ([AgentTools](tools/agent_tools/agent_tools.py)).
**Problem Solved:** `Tool`s extend an [Agent](02_agent.md)'s capabilities beyond its built-in knowledge, allowing it to interact with external systems, perform specific computations, or access real-time information.
## What is a Tool?
Think of a `Tool` as a **function or capability** that an [Agent](02_agent.md) can choose to use while working on a [Task](03_task.md). Each `Tool` has a few key parts:
1. **`name`**: A short, unique name for the tool (e.g., `web_search`, `calculator`).
2. **`description`**: This is **very important**! It tells the [Agent](02_agent.md) *what the tool does* and *when it should be used*. The agent's [LLM](06_llm.md) reads this description to decide if the tool is appropriate for the current step of its task. A good description is crucial for the agent to use the tool correctly. Example: "Useful for searching the internet for current events or information."
3. **`args_schema`** (Optional): Defines the inputs the tool needs to work. For example, a `web_search` tool would likely need a `query` argument (the search term). This is often defined using Pydantic models.
4. **`_run` method**: This is the actual code that gets executed when the agent uses the tool. It takes the arguments defined in `args_schema` and performs the action (like calling a search API or performing a calculation).
Agents are given a list of `Tool`s they are allowed to use. When an agent is working on a task, its internal thought process might lead it to conclude that it needs a specific capability. It will then look through its available tools, read their descriptions, and if it finds a match, it will figure out the necessary arguments and execute the tool's `_run` method.
## Equipping an Agent with a Tool
CrewAI integrates with many existing toolkits, like `crewai_tools` (install separately: `pip install 'crewai[tools]'`). Let's give our 'Travel Researcher' agent a web search tool. We'll use `SerperDevTool` as an example, which uses the Serper.dev API for Google Search results.
*(Note: Using tools like this often requires API keys. You'll need to sign up for Serper.dev and set the `SERPER_API_KEY` environment variable for this specific example to run.)*
```python
# Make sure you have crewai and crewai_tools installed
# pip install crewai crewai_tools
import os
from crewai import Agent
from crewai_tools import SerperDevTool
# Set up your API key (replace with your actual key or environment variable setup)
# IMPORTANT: Do NOT hardcode keys in production code! Use environment variables.
# os.environ["SERPER_API_KEY"] = "YOUR_SERPER_API_KEY"
# 1. Instantiate the tool
# (It automatically gets a name and description)
search_tool = SerperDevTool()
# 2. Define the agent and provide the tool in the 'tools' list
researcher = Agent(
role='Expert Travel Researcher',
goal='Find the three most exciting and sunny European cities for a birthday trip in late May.',
backstory=(
"You are a world-class travel researcher with deep knowledge of "
"European destinations. You excel at finding hidden gems and understanding "
"weather patterns. Your recommendations are always insightful and tailored. "
"You MUST use the web search tool to find the most up-to-date information." # Added instruction
),
verbose=True,
allow_delegation=False,
tools=[search_tool] # <-- Give the agent access to the tool!
# llm=your_llm # Assumes an LLM is configured (see Chapter 6)
)
# (You would then create a Task and a Crew as shown in previous chapters)
# task1 = Task(description="Find 3 sunny European cities for May...", agent=researcher, ...)
# trip_crew = Crew(agents=[researcher], tasks=[task1], ...)
# result = trip_crew.kickoff()
# print(result)
```
**Explanation:**
1. We import `SerperDevTool` from `crewai_tools`.
2. We create an instance of the tool: `search_tool = SerperDevTool()`. This object now holds the `name` ("Search"), `description` ("A tool that can be used to search the internet..."), and the `_run` logic to call the Serper API.
3. When defining our `researcher` [Agent](02_agent.md), we pass a list containing `search_tool` to the `tools` parameter: `tools=[search_tool]`.
4. We also updated the agent's `backstory` to explicitly encourage using the tool, which can sometimes help guide the agent.
**Expected Outcome (Conceptual):**
When this `researcher` agent runs a [Task](03_task.md) like "Find 3 sunny European cities for May...", because `verbose=True`, you would see its thought process. It might look something like this (simplified):
```
Thought: I need to find sunny European cities for May. The best way to get current information is to search the web. I have a 'Search' tool available. I should use it.
Action: Search
Action Input: {"query": "best sunny European cities May weather culture food"}
[... Agent waits for the tool to run ...]
Observation: [Search results mentioning Lisbon, Seville, Malta, Athens, etc. with details]
Thought: Okay, the search results suggest Lisbon, Seville, and Malta are good options based on sun, culture, and food. I will summarize these findings as requested.
Final Answer: Here are the top 3 sunny European cities for May... 1. Lisbon... 2. Seville... 3. Malta...
```
The agent used the tool's `description` to know when to use it, formulated the necessary input (`query`), executed the tool, received the `Observation` (the tool's output), and then used that information to generate its `Final Answer`.
## How Tools Work "Under the Hood"
When an [Agent](02_agent.md) equipped with tools runs a [Task](03_task.md), a fascinating interaction happens between the Agent, its [LLM](06_llm.md) brain, and the Tools.
1. **Task Received:** The Agent gets the task description and any context.
2. **Initial Thought:** The Agent's [LLM](06_llm.md) thinks about the task and its profile (`role`, `goal`, `backstory`). It formulates an initial plan.
3. **Need for Capability:** The LLM might realize it needs information it doesn't have (e.g., "What's the weather like *right now*?") or needs to perform an action (e.g., "Calculate 5 factorial").
4. **Tool Selection:** The Agent provides its [LLM](06_llm.md) with the list of available `Tool`s, including their `name`s and crucially, their `description`s. The LLM checks if any tool description matches the capability it needs.
5. **Tool Invocation Decision:** If the LLM finds a suitable tool (e.g., it needs to search, and finds the `Search` tool whose description says "Useful for searching the internet"), it decides to use it. It outputs a special message indicating the tool name and the arguments (based on the tool's `args_schema`).
6. **Tool Execution:** The CrewAI framework intercepts this special message. It finds the corresponding `Tool` object and calls its `run()` method, passing the arguments the LLM provided.
7. **Action Performed:** The tool's `_run()` method executes its code (e.g., calls an external API, runs a calculation).
8. **Result Returned:** The tool's `_run()` method returns its result (e.g., the text of the search results, the calculated number).
9. **Observation Provided:** The CrewAI framework takes the tool's result and feeds it back to the Agent's [LLM](06_llm.md) as an "Observation".
10. **Continued Thought:** The LLM now has new information from the tool. It incorporates this observation into its thinking and continues working on the task, potentially deciding to use another tool or generate the final answer.
Let's visualize this flow for our researcher using the search tool:
```mermaid
sequenceDiagram
participant A as Agent
participant LLM as Agent's Brain
participant ST as Search Tool
A->>LLM: Task: "Find sunny cities..." Plan?
LLM-->>A: Plan: Need current info. Search web for "sunny European cities May".
A->>A: Check tools: Found 'Search' tool (description matches).
A->>LLM: Format request for 'Search' tool. Query?
LLM-->>A: Output: Use Tool 'Search' with args {"query": "sunny European cities May"}
A->>ST: run(query="sunny European cities May")
Note right of ST: ST._run() calls Serper API...
ST-->>A: Return results: "Lisbon (Sunny...), Seville (Hot...), Malta (Warm...)"
A->>LLM: Observation: Got results "Lisbon...", "Seville...", "Malta..."
LLM-->>A: Thought: Use these results to formulate the final list.
LLM-->>A: Final Answer: "Based on recent web search, the top cities are..."
```
**Diving into the Code (`tools/base_tool.py`)**
The foundation for all tools is the `BaseTool` class (found in `crewai/tools/base_tool.py`). When you use a pre-built tool or create your own, it typically inherits from this class.
```python
# Simplified view from crewai/tools/base_tool.py
from abc import ABC, abstractmethod
from typing import Type, Optional, Any
from pydantic import BaseModel, Field
class BaseTool(BaseModel, ABC):
# Configuration for the tool
name: str = Field(description="The unique name of the tool.")
description: str = Field(description="What the tool does, how/when to use it.")
args_schema: Optional[Type[BaseModel]] = Field(
default=None, description="Pydantic schema for the tool's arguments."
)
# ... other options like caching ...
# This method contains the actual logic
@abstractmethod
def _run(self, *args: Any, **kwargs: Any) -> Any:
"""The core implementation of the tool's action."""
pass
# This method is called by the agent execution framework
def run(self, *args: Any, **kwargs: Any) -> Any:
"""Executes the tool's core logic."""
# Could add logging, error handling, caching calls here
print(f"----- Executing Tool: {self.name} -----") # Example logging
result = self._run(*args, **kwargs)
print(f"----- Tool {self.name} Finished -----")
return result
# Helper method to generate a structured description for the LLM
def _generate_description(self):
# Creates a detailed description including name, args, and description
# This is what the LLM sees to decide if it should use the tool
pass
# ... other helper methods ...
# You can create a simple tool using the 'Tool' class directly
# or inherit from BaseTool for more complex logic.
from typing import Type
class SimpleTool(BaseTool):
name: str = "MySimpleTool"
description: str = "A very simple example tool."
# No args_schema needed if it takes no arguments
def _run(self) -> str:
return "This simple tool was executed successfully!"
```
Key takeaways:
* `BaseTool` requires `name` and `description`.
* `args_schema` defines the expected input structure (using Pydantic).
* The actual logic lives inside the `_run` method.
* The `run` method is the entry point called by the framework.
* The framework (`crewai/tools/tool_usage.py` and `crewai/agents/executor.py`) handles the complex part: presenting tools to the LLM, parsing the LLM's decision to use a tool, calling `tool.run()`, and feeding the result back.
A special mention goes to `AgentTools` (`crewai/tools/agent_tools/agent_tools.py`), which provides tools like `Delegate work to coworker` and `Ask question to coworker`, enabling agents within a [Crew](01_crew.md) to collaborate.
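The switch that makes these collaboration tools available to an agent is its `allow_delegation` flag. Here is a minimal sketch (the agent itself is a made-up example, not one from the earlier chapters):
```python
# A sketch: when allow_delegation=True and the crew has other agents,
# CrewAI can attach its built-in collaboration tools (delegate work to /
# ask questions of a coworker) to this agent at runtime.
# The role/goal/backstory below are illustrative only.
from crewai import Agent
senior_writer = Agent(
    role='Senior Travel Writer',
    goal='Produce the final trip report, delegating research when needed.',
    backstory='A seasoned writer who knows when to ask specialists for help.',
    allow_delegation=True,  # enables the coworker delegation tools
    verbose=True
)
```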
## Creating Your Own Simple Tool (Optional)
While CrewAI offers many pre-built tools, sometimes you need a custom one. Let's create a *very* basic calculator.
```python
from crewai.tools import BaseTool
from crewai import Agent  # needed for the example agent below
from pydantic import BaseModel, Field
from typing import Type
import math  # not strictly required here, given the character whitelist below
# 1. Define the input schema using Pydantic
class CalculatorInput(BaseModel):
expression: str = Field(description="The mathematical expression to evaluate (e.g., '2 + 2 * 4').")
# 2. Create the Tool class, inheriting from BaseTool
class CalculatorTool(BaseTool):
name: str = "Calculator"
description: str = "Useful for evaluating simple mathematical expressions involving numbers, +, -, *, /, and parentheses."
args_schema: Type[BaseModel] = CalculatorInput # Link the input schema
def _run(self, expression: str) -> str:
"""Evaluates the mathematical expression."""
allowed_chars = "0123456789+-*/(). "
if not all(c in allowed_chars for c in expression):
return "Error: Expression contains invalid characters."
try:
# VERY IMPORTANT: eval() is dangerous with arbitrary user input.
# In a real application, use a safer parsing library like 'numexpr' or build your own parser.
# This is a simplified example ONLY.
result = eval(expression, {"__builtins__": None}, {"math": math}) # Safer eval
return f"The result of '{expression}' is {result}"
except Exception as e:
return f"Error evaluating expression '{expression}': {e}"
# 3. Instantiate and use it in an agent
calculator = CalculatorTool()
math_agent = Agent(
role='Math Whiz',
goal='Calculate the results of mathematical expressions accurately.',
backstory='You are an expert mathematician agent.',
tools=[calculator], # Give the agent the calculator
verbose=True
)
# Example Task for this agent:
# math_task = Task(description="What is the result of (5 + 3) * 6 / 2?", agent=math_agent)
```
**Explanation:**
1. We define `CalculatorInput` using Pydantic to specify that the tool needs an `expression` string. The `description` here helps the LLM understand what kind of string to provide.
2. We create `CalculatorTool` inheriting from `BaseTool`. We set `name`, `description`, and link `args_schema` to our `CalculatorInput`.
3. The `_run` method takes the `expression` string. We added a basic safety check and used a slightly safer version of `eval`. **Again, `eval` is generally unsafe; prefer dedicated math parsing libraries in production.** It returns the result as a string.
4. We can now instantiate `CalculatorTool()` and add it to an agent's `tools` list.
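Before wiring the calculator into a crew, it can be handy to test it in isolation. This small check assumes the `calculator` instance created above:
```python
# Optional sanity check: call the tool directly, without any agent.
# This exercises run() -> _run() exactly as described earlier.
print(calculator.run(expression="(5 + 3) * 6 / 2"))
# -> The result of '(5 + 3) * 6 / 2' is 24.0
```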
## Conclusion
You've learned about `Tool`s: the essential equipment that gives your AI [Agent](02_agent.md)s superpowers! Tools allow agents to perform actions like searching the web, doing calculations, or interacting with other systems, making them vastly more useful than agents that can only generate text. We saw how to equip an agent with pre-built tools and even how to create a simple custom tool by defining its `name`, `description`, `args_schema`, and `_run` method. The `description` is key for the agent to know when and how to use its tools effectively.
Now that we have Agents equipped with Tools and assigned Tasks, how does the whole [Crew](01_crew.md) actually coordinate the work? Do agents work one after another? Is there a manager? That's determined by the `Process`. Let's explore that next!
**Next:** [Chapter 5: Process - Orchestrating the Workflow](05_process.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

297
output/CrewAI/05_process.md Normal file
View File

@@ -0,0 +1,297 @@
# Chapter 5: Process - Orchestrating the Workflow
In [Chapter 4: Tool](04_tool.md), we learned how to give our [Agent](02_agent.md)s special abilities using `Tool`s, like searching the web. Now we have specialized agents, defined tasks, and equipped agents. But how do they actually *work together*? Does Agent 1 finish its work before Agent 2 starts? Or is there a manager overseeing everything?
This coordination is handled by the **`Process`**.
## Why Do We Need a Process?
Think back to our trip planning [Crew](01_crew.md). We have a 'Travel Researcher' agent and an 'Activity Planner' agent.
* **Scenario 1:** Maybe the Researcher needs to find the city *first*, and *then* the Planner creates the itinerary for that specific city. The work happens in a specific order.
* **Scenario 2:** Maybe we have a more complex project with many agents (Researcher, Planner, Booker, Budgeter). Perhaps we want a 'Project Manager' agent to receive the main goal, decide which agent needs to do what first, review their work, and then assign the next step.
The way the agents collaborate and the order in which [Task](03_task.md)s are executed is crucial for success. A well-defined `Process` ensures work flows smoothly and efficiently.
**Problem Solved:** `Process` defines the strategy or workflow the [Crew](01_crew.md) uses to execute its [Task](03_task.md)s. It dictates how [Agent](02_agent.md)s collaborate and how information moves between them.
## What is a Process?
Think of the `Process` as the **project management style** for your [Crew](01_crew.md). It determines the overall flow of work. CrewAI primarily supports two types of processes:
1. **`Process.sequential`**:
* **Analogy:** Like following a recipe or a checklist.
* **How it works:** Tasks are executed one after another, in the exact order you list them in the `Crew` definition. The output of the first task automatically becomes available as context for the second task, the output of the second for the third, and so on.
* **Best for:** Simple, linear workflows where each step clearly follows the previous one.
2. **`Process.hierarchical`**:
* **Analogy:** Like a traditional company structure with a manager.
* **How it works:** You designate a "manager" [Agent](02_agent.md) (usually by providing a specific `manager_llm` or a custom `manager_agent` to the `Crew`). This manager receives the overall goal and the list of tasks. It then analyzes the tasks and decides which *worker* agent should perform which task, potentially breaking them down or reordering them. The manager delegates work, reviews results, and coordinates the team until the goal is achieved.
* **Best for:** More complex projects where task order might change, delegation is needed, or a central coordinator can optimize the workflow.
Choosing the right `Process` is key to structuring how your agents interact.
## How to Use Process
You define the process when you create your `Crew`, using the `process` parameter.
### Sequential Process
This is the default and simplest process. We already used it in [Chapter 1](01_crew.md)!
```python
# Assuming 'researcher' and 'planner' agents are defined (from Chapter 2)
# Assuming 'task1' (find cities) and 'task2' (create itinerary) are defined (from Chapter 3)
# task1 assigned to researcher, task2 assigned to planner
from crewai import Crew, Process
# Define the crew with a sequential process
trip_crew = Crew(
agents=[researcher, planner],
tasks=[task1, task2],
process=Process.sequential # Explicitly setting the sequential process
# verbose=2 # Optional verbosity
)
# Start the work
# result = trip_crew.kickoff()
# print(result)
```
**Explanation:**
* We import `Crew` and `Process`.
* When creating the `trip_crew`, we pass our list of `agents` and `tasks`.
* We set `process=Process.sequential`.
* When `kickoff()` is called:
1. `task1` (Find Cities) is executed by the `researcher`.
2. The output of `task1` (the list of cities) is automatically passed as context.
3. `task2` (Create Itinerary) is executed by the `planner`, using the cities list from `task1`.
4. The final output of `task2` is returned.
It's simple and predictable: Task 1 -> Task 2 -> Done.
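One convenience worth knowing here: task descriptions can contain `{placeholders}` that `kickoff()` fills in from an `inputs` dictionary before the sequential run starts. A small sketch, reusing the `researcher` agent and `trip_crew` from above (the `{month}` placeholder is just an illustration):
```python
from crewai import Task
# A variant of task1 with a placeholder; kickoff(inputs=...) interpolates it.
task1 = Task(
    description='Find top 3 European cities for a sunny {month} birthday trip.',
    expected_output='List of 3 cities with justifications.',
    agent=researcher  # the researcher agent defined earlier
)
# result = trip_crew.kickoff(inputs={"month": "May"})
```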
### Hierarchical Process
For this process, the `Crew` needs a manager. You usually specify the language model the manager should use (`manager_llm`). The manager agent is created internally by CrewAI using this LLM.
```python
# Assuming 'researcher' and 'planner' agents are defined
# Assuming 'task1' and 'task2' are defined (WITHOUT necessarily assigning agents initially)
# You need an LLM configured (e.g., from OpenAI, Ollama - see Chapter 6)
from langchain_openai import ChatOpenAI  # Example LLM (pip install langchain-openai)
from crewai import Crew, Process, Task
# Example tasks (agent assignment might be handled by the manager)
task1 = Task(description='Find top 3 European cities for a sunny May birthday trip.', expected_output='List of 3 cities with justifications.')
task2 = Task(description='Create a 3-day itinerary for the best city found.', expected_output='Detailed 3-day plan.')
# Define the crew with a hierarchical process and a manager LLM
hierarchical_crew = Crew(
agents=[researcher, planner], # The worker agents
tasks=[task1, task2], # The tasks to be managed
process=Process.hierarchical, # Set the process to hierarchical
manager_llm=ChatOpenAI(model="gpt-4") # Specify the LLM for the manager agent
# You could also provide a pre-configured manager_agent instance instead of manager_llm
)
# Start the work
# result = hierarchical_crew.kickoff()
# print(result)
```
**Explanation:**
* We set `process=Process.hierarchical`.
* We provide a list of worker `agents` (`researcher`, `planner`).
* We provide the `tasks` that need to be accomplished. Note that for the hierarchical process, you *might* not need to assign agents directly to tasks, as the manager can decide who is best suited. However, assigning them can still provide hints to the manager.
* Crucially, we provide `manager_llm`. CrewAI will use this LLM to create an internal 'Manager Agent'. This agent's implicit goal is to orchestrate the `agents` to complete the `tasks`.
* When `kickoff()` is called:
1. The internal Manager Agent analyzes `task1` and `task2` and the available agents (`researcher`, `planner`).
2. It decides which agent should do `task1` (likely the `researcher`). It delegates the task using internal tools (like `AgentTools`).
3. It receives the result from the `researcher`.
4. It analyzes the result and decides the next step, likely delegating `task2` to the `planner` and providing the context from `task1`.
5. It receives the result from the `planner`.
6. Once all tasks are deemed complete by the manager, it compiles and returns the final result.
This process is more dynamic, allowing the manager to adapt the workflow.
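As mentioned above, you can also supply your own `manager_agent` instead of a `manager_llm`. A sketch of what that might look like (the manager's role, goal, and backstory are illustrative):
```python
from crewai import Agent, Crew, Process
from langchain_openai import ChatOpenAI
# A hand-rolled manager agent; note it needs allow_delegation=True to delegate.
custom_manager = Agent(
    role='Trip Project Manager',
    goal='Coordinate the researcher and planner to deliver a complete trip plan.',
    backstory='An experienced travel project manager who delegates effectively.',
    llm=ChatOpenAI(model="gpt-4o"),
    allow_delegation=True
)
hierarchical_crew = Crew(
    agents=[researcher, planner],  # worker agents from earlier examples
    tasks=[task1, task2],
    process=Process.hierarchical,
    manager_agent=custom_manager   # used instead of manager_llm
)
```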
## How Process Works "Under the Hood"
When you call `crew.kickoff()`, the first thing the `Crew` does is check its `process` attribute to determine the execution strategy.
1. **Input & Setup:** `kickoff()` prepares the agents and tasks, interpolating any initial inputs.
2. **Process Check:** It looks at `crew.process`.
3. **Execution Path:**
* If `Process.sequential`, it calls an internal method like `_run_sequential_process()`.
* If `Process.hierarchical`, it first ensures a manager agent exists (creating one if `manager_llm` was provided) and then calls a method like `_run_hierarchical_process()`.
4. **Task Loop (Sequential):** `_run_sequential_process()` iterates through the `tasks` list in order. For each task, it finds the assigned agent, gathers context from the *previous* task's output, and asks the agent to execute the task.
5. **Managed Execution (Hierarchical):** `_run_hierarchical_process()` delegates control to the manager agent. The manager agent, using its LLM and specialized delegation tools (like `AgentTools`), decides which task to tackle next and which worker agent to assign it to. It manages the flow until all tasks are completed.
6. **Output:** The final result (usually the output of the last task) is packaged and returned.
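Step 4's context gathering is conceptually very simple. A rough sketch (not the exact CrewAI code) of the idea:
```python
# Rough sketch only: join the outputs of already-finished tasks into one string
# that becomes the context for the next task's agent.
def build_sequential_context(task_outputs) -> str:
    return "\n\n".join(str(output) for output in task_outputs if output)
```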
### Visualization
Let's visualize the difference:
**Sequential Process:**
```mermaid
sequenceDiagram
participant User
participant MyCrew as Crew (Sequential)
participant ResearcherAgent as Researcher
participant PlannerAgent as Planner
User->>MyCrew: kickoff()
MyCrew->>ResearcherAgent: Execute Task 1 ("Find cities")
ResearcherAgent-->>MyCrew: Task 1 Output (Cities List)
MyCrew->>PlannerAgent: Execute Task 2 ("Create itinerary")\nwith Task 1 Output context
PlannerAgent-->>MyCrew: Task 2 Output (Itinerary)
MyCrew-->>User: Final Result (Task 2 Output)
```
**Hierarchical Process:**
```mermaid
sequenceDiagram
participant User
participant MyCrew as Crew (Hierarchical)
participant ManagerAgent as Manager
participant ResearcherAgent as Researcher
participant PlannerAgent as Planner
User->>MyCrew: kickoff()
MyCrew->>ManagerAgent: Goal: Plan Trip (Tasks: Find Cities, Create Itinerary)
ManagerAgent->>ManagerAgent: Decide: Researcher should do Task 1
ManagerAgent->>ResearcherAgent: Delegate: Execute Task 1 ("Find cities")
ResearcherAgent-->>ManagerAgent: Task 1 Output (Cities List)
ManagerAgent->>ManagerAgent: Decide: Planner should do Task 2 with context
ManagerAgent->>PlannerAgent: Delegate: Execute Task 2 ("Create itinerary", Cities List)
PlannerAgent-->>ManagerAgent: Task 2 Output (Itinerary)
ManagerAgent->>MyCrew: Report Final Result (Itinerary)
MyCrew-->>User: Final Result (Itinerary)
```
### Diving into the Code (`crew.py`)
The `Crew` class in `crewai/crew.py` holds the logic.
```python
# Simplified view from crewai/crew.py
from crewai.process import Process
from crewai.task import Task
from crewai.agents.agent_builder.base_agent import BaseAgent
# ... other imports
class Crew(BaseModel):
# ... other fields like agents, tasks ...
process: Process = Field(default=Process.sequential)
manager_llm: Optional[Any] = Field(default=None)
manager_agent: Optional[BaseAgent] = Field(default=None)
# ... other fields ...
@model_validator(mode="after")
def check_manager_llm(self):
# Ensures manager_llm or manager_agent is set for hierarchical process
if self.process == Process.hierarchical:
if not self.manager_llm and not self.manager_agent:
raise PydanticCustomError(
"missing_manager_llm_or_manager_agent",
"Attribute `manager_llm` or `manager_agent` is required when using hierarchical process.",
{},
)
return self
def kickoff(self, inputs: Optional[Dict[str, Any]] = None) -> CrewOutput:
# ... setup, input interpolation, callback setup ...
# THE CORE DECISION BASED ON PROCESS:
if self.process == Process.sequential:
result = self._run_sequential_process()
elif self.process == Process.hierarchical:
# Ensure manager is ready before running
self._create_manager_agent() # Creates manager if needed
result = self._run_hierarchical_process()
else:
raise NotImplementedError(f"Process '{self.process}' not implemented.")
# ... calculate usage metrics, final formatting ...
return result
def _run_sequential_process(self) -> CrewOutput:
task_outputs = []
for task_index, task in enumerate(self.tasks):
agent = task.agent # Get assigned agent
# ... handle conditional tasks, async tasks ...
context = self._get_context(task, task_outputs) # Get previous output
output = task.execute_sync(agent=agent, context=context) # Run task
task_outputs.append(output)
# ... logging/callbacks ...
return self._create_crew_output(task_outputs)
def _run_hierarchical_process(self) -> CrewOutput:
# This actually delegates the orchestration to the manager agent.
# The manager agent uses its LLM and tools (AgentTools)
# to call the worker agents sequentially or in parallel as it sees fit.
manager = self.manager_agent
# Simplified concept: Manager executes a "meta-task"
# whose goal is to complete the crew's tasks using available agents.
# The actual implementation involves the manager agent's execution loop.
return self._execute_tasks(self.tasks) # The manager guides this execution internally
def _create_manager_agent(self):
# Logic to setup the self.manager_agent instance, either using
# the provided self.manager_agent or creating a default one
# using self.manager_llm and AgentTools(agents=self.agents).
if self.manager_agent is None and self.manager_llm:
# Simplified: Create a default manager agent here
# It gets tools to delegate work to self.agents
self.manager_agent = Agent(
role="Crew Manager",
goal="Coordinate the crew to achieve their goals.",
backstory="An expert project manager.",
llm=self.manager_llm,
tools=AgentTools(agents=self.agents).tools(), # Gives it delegation capability
allow_delegation=True, # Must be true for manager
verbose=self.verbose
)
self.manager_agent.crew = self # Link back to crew
# Ensure manager has necessary setup...
pass
def _execute_tasks(self, tasks: List[Task], ...) -> CrewOutput:
"""Internal method used by both sequential and hierarchical processes
to iterate through tasks. In hierarchical, the manager agent influences
which agent runs which task via delegation tools."""
# ... loops through tasks, gets agent (directly for seq, via manager for hier), executes ...
pass
# ... other helper methods like _get_context, _create_crew_output ...
```
Key takeaways from the code:
* The `Crew` stores the `process` type (`sequential` or `hierarchical`).
* A validation (`check_manager_llm`) ensures a manager (`manager_llm` or `manager_agent`) is provided if `process` is `hierarchical`.
* The `kickoff` method explicitly checks `self.process` to decide which internal execution method (`_run_sequential_process` or `_run_hierarchical_process`) to call.
* `_run_sequential_process` iterates through tasks in order.
* `_run_hierarchical_process` relies on the `manager_agent` (created by `_create_manager_agent` if needed) to manage the task execution flow, often using delegation tools.
## Conclusion
You've now learned about the `Process` - the crucial setting that defines *how* your [Crew](01_crew.md) collaborates.
* **`Sequential`** is like a checklist: tasks run one by one, in order, with outputs flowing directly to the next task. Simple and predictable.
* **`Hierarchical`** is like having a manager: a dedicated manager [Agent](02_agent.md) coordinates the worker agents, deciding who does what and when. More flexible for complex workflows.
Choosing the right process helps structure your agent interactions effectively.
So far, we've built the team ([Agent](02_agent.md)), defined the work ([Task](03_task.md)), given them abilities ([Tool](04_tool.md)), and decided on the workflow ([Process](05_process.md)). But what powers the "thinking" part of each agent? What is the "brain" that understands roles, goals, backstories, and uses tools? That's the Large Language Model, or [LLM](06_llm.md). Let's dive into that next!
**Next:** [Chapter 6: LLM - The Agent's Brain](06_llm.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

330
output/CrewAI/06_llm.md Normal file
View File

@@ -0,0 +1,330 @@
# Chapter 6: LLM - The Agent's Brain
In the [previous chapter](05_process.md), we explored the `Process` - how the `Crew` organizes the workflow for its `Agent`s, deciding whether they work sequentially or are managed hierarchically. We now have specialized agents ([Agent](02_agent.md)), defined work ([Task](03_task.md)), useful abilities ([Tool](04_tool.md)), and a workflow strategy ([Process](05_process.md)).
But what actually does the *thinking* inside an agent? When we give the 'Travel Researcher' agent the task "Find sunny European cities," what part of the agent understands this request, decides to use the search tool, interprets the results, and writes the final list?
This core thinking component is the **Large Language Model**, or **LLM**.
## Why Do Agents Need an LLM?
Imagine our 'Travel Researcher' agent again. It has a `role`, `goal`, and `backstory`. It has a `Task` to complete and maybe a `Tool` to search the web. But it needs something to:
1. **Understand:** Read the task description, its own role/goal, and any context from previous tasks.
2. **Reason:** Figure out a plan. "Okay, I need sunny cities. My description says I'm an expert. The task asks for 3. I should use the search tool to get current info."
3. **Act:** Decide *when* to use a tool and *what* input to give it (e.g., formulate the search query).
4. **Generate:** Take the information (search results, its own knowledge) and write the final output in the expected format.
The LLM is the engine that performs all these cognitive actions. It's the "brain" that drives the agent's behavior based on the instructions and tools provided.
**Problem Solved:** The LLM provides the core intelligence for each `Agent`. It processes language, makes decisions (like which tool to use or what text to generate), and ultimately enables the agent to perform its assigned `Task` based on its defined profile.
## What is an LLM in CrewAI?
Think of an LLM as a highly advanced, versatile AI assistant you can interact with using text. Models like OpenAI's GPT-4, Google's Gemini, Anthropic's Claude, or open-source models run locally via tools like Ollama are all examples of LLMs. They are trained on vast amounts of text data and can understand instructions, answer questions, write text, summarize information, and even make logical deductions.
In CrewAI, the `LLM` concept is an **abstraction**. CrewAI itself doesn't *include* these massive language models. Instead, it provides a standardized way to **connect to and interact with** various LLMs, whether they are hosted by companies like OpenAI or run on your own computer.
**How CrewAI Handles LLMs:**
* **`litellm` Integration:** CrewAI uses a fantastic library called `litellm` under the hood. `litellm` acts like a universal translator, allowing CrewAI to talk to over 100 different LLM providers (OpenAI, Azure OpenAI, Gemini, Anthropic, Ollama, Hugging Face, etc.) using a consistent interface. This means you can easily switch the "brain" of your agents without rewriting large parts of your code.
* **Standard Interface:** The CrewAI `LLM` abstraction (often represented by helper classes or configuration settings) simplifies how you specify which model to use and how it should behave. It handles common parameters like:
* `model`: The specific name of the LLM you want to use (e.g., `"gpt-4o"`, `"ollama/llama3"`, `"gemini-pro"`).
* `temperature`: Controls the randomness (creativity) of the output. Lower values (e.g., 0.1) make the output more deterministic and focused, while higher values (e.g., 0.8) make it more creative but potentially less factual.
* `max_tokens`: The maximum number of words (tokens) the LLM should generate in its response.
* **API Management:** It manages the technical details of sending requests to the chosen LLM provider and receiving the responses.
Essentially, CrewAI lets you plug in the LLM brain of your choice for your agents.
## Configuring an LLM for Your Crew
You need to tell CrewAI which LLM(s) your agents should use. There are several ways to do this, ranging from letting CrewAI detect settings automatically to explicitly configuring specific models.
**1. Automatic Detection (Environment Variables)**
Often the easiest way for common models like OpenAI's is to set environment variables. CrewAI (via `litellm`) can pick these up automatically.
If you set these in your system or a `.env` file:
```bash
# Example .env file
OPENAI_API_KEY="sk-your_openai_api_key_here"
# Optional: Specify the model, otherwise it uses a default like gpt-4o
OPENAI_MODEL_NAME="gpt-4o"
```
Then, often you don't need to specify the LLM explicitly in your code:
```python
# agent.py (simplified)
from crewai import Agent
# If OPENAI_API_KEY and OPENAI_MODEL_NAME are set in the environment,
# CrewAI might automatically configure an OpenAI LLM for this agent.
researcher = Agent(
role='Travel Researcher',
goal='Find interesting cities in Europe',
backstory='Expert researcher.',
# No 'llm=' parameter needed here if env vars are set
)
```
**2. Explicit Configuration (Recommended for Clarity)**
It's usually better to be explicit about which LLM you want to use. CrewAI integrates well with LangChain's LLM wrappers, which are commonly used.
**Example: Using OpenAI (GPT-4o)**
```python
# Make sure you have langchain_openai installed: pip install langchain-openai
import os
from langchain_openai import ChatOpenAI
from crewai import Agent
# Set the API key (best practice: use environment variables)
# os.environ["OPENAI_API_KEY"] = "sk-your_key_here"
# Instantiate the OpenAI LLM wrapper
openai_llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Pass the configured LLM to the Agent
researcher = Agent(
role='Travel Researcher',
goal='Find interesting cities in Europe',
backstory='Expert researcher.',
llm=openai_llm # Explicitly assign the LLM
)
# You can also assign a default LLM to the Crew
# from crewai import Crew
# trip_crew = Crew(
# agents=[researcher],
# tasks=[...],
# # Manager LLM for hierarchical process
# manager_llm=openai_llm
# # A function_calling_llm can also be set for tool use reasoning
# # function_calling_llm=openai_llm
# )
```
**Explanation:**
* We import `ChatOpenAI` from `langchain_openai`.
* We create an instance, specifying the `model` name and optionally other parameters like `temperature`.
* We pass this `openai_llm` object to the `llm` parameter when creating the `Agent`. This agent will now use GPT-4o for its thinking.
* You can also assign LLMs at the `Crew` level, especially the `manager_llm` for hierarchical processes or a default `function_calling_llm` which helps agents decide *which* tool to use.
**Example: Using a Local Model via Ollama (Llama 3)**
If you have Ollama running locally with a model like Llama 3 pulled (`ollama pull llama3`):
```python
# Make sure you have langchain_community installed: pip install langchain-community
from langchain_community.llms import Ollama
from crewai import Agent
# Instantiate the Ollama LLM wrapper
# Make sure Ollama server is running!
ollama_llm = Ollama(model="llama3", base_url="http://localhost:11434")
# temperature, etc. can also be set if supported by the model/wrapper
# Pass the configured LLM to the Agent
local_researcher = Agent(
role='Travel Researcher',
goal='Find interesting cities in Europe',
backstory='Expert researcher.',
llm=ollama_llm # Use the local Llama 3 model
)
```
**Explanation:**
* We import `Ollama` from `langchain_community.llms`.
* We create an instance, specifying the `model` name ("llama3" in this case, assuming it's available in your Ollama setup) and the `base_url` where your Ollama server is running.
* We pass `ollama_llm` to the `Agent`. Now, this agent's "brain" runs entirely on your local machine!
**CrewAI's `LLM` Class (Advanced/Direct `litellm` Usage)**
CrewAI also provides its own `LLM` class (`from crewai import LLM`) which allows more direct configuration using `litellm` parameters. This is less common for beginners than using the LangChain wrappers shown above, but offers fine-grained control.
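For completeness, here is a small sketch of that more direct route; the parameters map straight onto `litellm` settings:
```python
from crewai import LLM, Agent
# CrewAI's own LLM class: the model string follows litellm conventions,
# so "gpt-4o", "ollama/llama3", "gemini/gemini-pro", etc. all work here.
direct_llm = LLM(
    model="gpt-4o",
    temperature=0.2
    # max_tokens, api_key, base_url and other litellm options can be passed too
)
analyst = Agent(
    role='Data Analyst',
    goal='Summarize findings precisely.',
    backstory='A detail-oriented analyst.',
    llm=direct_llm
)
```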
**Passing LLMs to the Crew**
Besides assigning an LLM to each agent individually, you can set defaults or specific roles at the `Crew` level:
```python
from crewai import Crew, Process
from langchain_openai import ChatOpenAI
# Assume agents 'researcher', 'planner' and tasks 'task1', 'task2' are defined
openai_llm = ChatOpenAI(model="gpt-4o")
fast_llm = ChatOpenAI(model="gpt-3.5-turbo") # Maybe a faster/cheaper model
trip_crew = Crew(
agents=[researcher, planner], # Agents might have their own LLMs assigned too
tasks=[task1, task2],
process=Process.hierarchical,
# The Manager agent will use gpt-4o
manager_llm=openai_llm,
# Use gpt-3.5-turbo specifically for deciding which tool to use (can save costs)
function_calling_llm=fast_llm
)
```
* `manager_llm`: Specifies the brain for the manager agent in a hierarchical process.
* `function_calling_llm`: Specifies the LLM used by agents primarily to decide *which tool to call* and *with what arguments*. This can sometimes be a faster/cheaper model than the one used for generating the final detailed response. If not set, agents typically use their main `llm`.
If an agent doesn't have an `llm` explicitly assigned, it might inherit the `function_calling_llm` or default to environment settings. It's usually clearest to assign LLMs explicitly where needed.
## How LLM Interaction Works Internally
When an [Agent](02_agent.md) needs to think (e.g., execute a [Task](03_task.md)), the process looks like this:
1. **Prompt Assembly:** The `Agent` gathers all relevant information: its `role`, `goal`, `backstory`, the `Task` description, `expected_output`, any `context` from previous tasks, and the descriptions of its available `Tool`s. It assembles this into a detailed prompt.
2. **LLM Object Call:** The `Agent` passes this prompt to its configured `LLM` object (e.g., the `ChatOpenAI` instance or the `Ollama` instance we created).
3. **`litellm` Invocation:** The CrewAI/LangChain `LLM` object uses `litellm`'s `completion` function, passing the assembled prompt (formatted as messages), the target `model` name, and other parameters (`temperature`, `max_tokens`, `tools`, etc.).
4. **API Request:** `litellm` handles the specifics of communicating with the target LLM's API (e.g., sending a request to OpenAI's API endpoint or the local Ollama server).
5. **LLM Processing:** The actual LLM (GPT-4, Llama 3, etc.) processes the request.
6. **API Response:** The LLM provider sends back the response (which could be generated text or a decision to use a specific tool with certain arguments).
7. **`litellm` Response Handling:** `litellm` receives the API response and standardizes it.
8. **LLM Object Response:** The `LLM` object receives the standardized response from `litellm`.
9. **Result to Agent:** The `LLM` object returns the result (text or tool call information) back to the `Agent`.
10. **Agent Action:** The `Agent` then either uses the generated text as its output or, if the LLM decided to use a tool, it executes the specified tool.
Let's visualize this:
```mermaid
sequenceDiagram
participant Agent
participant LLM_Object as LLM Object (e.g., ChatOpenAI)
participant LiteLLM
participant ProviderAPI as Actual LLM API (e.g., OpenAI)
Agent->>Agent: Assemble Prompt (Role, Goal, Task, Tools...)
Agent->>LLM_Object: call(prompt, tools_schema)
LLM_Object->>LiteLLM: litellm.completion(model, messages, ...)
LiteLLM->>ProviderAPI: Send API Request
ProviderAPI-->>LiteLLM: Receive API Response (text or tool_call)
LiteLLM-->>LLM_Object: Standardized Response
LLM_Object-->>Agent: Result (text or tool_call)
Agent->>Agent: Process Result (Output text or Execute tool)
```
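To make step 3 concrete, this is roughly the kind of call CrewAI ends up making through `litellm` (shown here directly, purely for illustration):
```python
import litellm
# A bare-bones version of what the LLM object does internally: one completion
# call with the assembled messages and the chosen model name.
response = litellm.completion(
    model="gpt-4o",  # or "ollama/llama3", "gemini/gemini-pro", ...
    messages=[
        {"role": "system", "content": "You are a Travel Researcher..."},
        {"role": "user", "content": "Find 3 sunny European cities for a May trip."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```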
**Diving into the Code (`llm.py`, `utilities/llm_utils.py`)**
The primary logic resides in `crewai/llm.py` and the helper `crewai/utilities/llm_utils.py`.
* **`crewai/utilities/llm_utils.py`:** The `create_llm` function is key. It handles the logic of figuring out which LLM to instantiate based on environment variables, direct `LLM` object input, or string names. It tries to create an `LLM` instance.
* **`crewai/llm.py`:**
* The `LLM` class itself holds the configuration (`model`, `temperature`, etc.).
* The `call` method is the main entry point. It takes the `messages` (the prompt) and optional `tools`.
* It calls `_prepare_completion_params` to format the request parameters based on the LLM's requirements and the provided configuration.
* Crucially, it then calls `litellm.completion(**params)`. This is where the magic happens: `litellm` takes over communication with the actual LLM API.
* It handles the response from `litellm`, checking for text content or tool calls (`_handle_non_streaming_response` or `_handle_streaming_response`).
* It uses helper methods like `_format_messages_for_provider` to deal with quirks of different LLMs (like Anthropic needing a 'user' message first).
```python
# Simplified view from crewai/llm.py
# Import litellm and other necessary modules
import litellm
from typing import List, Dict, Optional, Union, Any
class LLM:
def __init__(self, model: str, temperature: Optional[float] = 0.7, **kwargs):
self.model = model
self.temperature = temperature
# ... store other parameters like max_tokens, api_key, base_url ...
self.additional_params = kwargs
self.stream = False # Default to non-streaming
def _prepare_completion_params(self, messages, tools=None) -> Dict[str, Any]:
# Formats messages based on provider (e.g., Anthropic)
formatted_messages = self._format_messages_for_provider(messages)
params = {
"model": self.model,
"messages": formatted_messages,
"temperature": self.temperature,
"tools": tools,
"stream": self.stream,
# ... add other stored parameters (max_tokens, api_key etc.) ...
**self.additional_params,
}
# Remove None values
return {k: v for k, v in params.items() if v is not None}
def call(self, messages, tools=None, callbacks=None, available_functions=None) -> Union[str, Any]:
# ... (emit start event, validate params) ...
try:
# Prepare the parameters for litellm
params = self._prepare_completion_params(messages, tools)
# Decide whether to stream or not (simplified here)
if self.stream:
# Handles chunk processing, tool calls from stream end
return self._handle_streaming_response(params, callbacks, available_functions)
else:
# Makes single call, handles tool calls from response
return self._handle_non_streaming_response(params, callbacks, available_functions)
except Exception as e:
# ... (emit failure event, handle exceptions like context window exceeded) ...
raise e
def _handle_non_streaming_response(self, params, callbacks, available_functions):
# THE CORE CALL TO LITELLM
response = litellm.completion(**params)
# Extract text content
text_response = response.choices[0].message.content or ""
# Check for tool calls in the response
tool_calls = getattr(response.choices[0].message, "tool_calls", [])
if not tool_calls or not available_functions:
# ... (emit success event) ...
return text_response # Return plain text
else:
# Handle the tool call (runs the actual function)
tool_result = self._handle_tool_call(tool_calls, available_functions)
if tool_result is not None:
return tool_result # Return tool output
else:
# ... (emit success event for text if tool failed?) ...
return text_response # Fallback to text if tool fails
def _handle_tool_call(self, tool_calls, available_functions):
# Extracts function name and args from tool_calls[0]
# Looks up function in available_functions
# Executes the function with args
# Returns the result
# ... (error handling) ...
pass
def _format_messages_for_provider(self, messages):
# Handles provider-specific message formatting rules
# (e.g., ensuring Anthropic starts with 'user' role)
pass
# ... other methods like _handle_streaming_response ...
```
This simplified view shows how the `LLM` class acts as a wrapper around `litellm`, preparing requests and processing responses, shielding the rest of CrewAI from the complexities of different LLM APIs.
## Conclusion
You've learned about the **LLM**, the essential "brain" powering your CrewAI [Agent](02_agent.md)s. It's the component that understands language, reasons about tasks, decides on actions (like using [Tool](04_tool.md)s), and generates text.
We saw that CrewAI uses the `litellm` library to provide a flexible way to connect to a wide variety of LLM providers (like OpenAI, Google Gemini, Anthropic Claude, or local models via Ollama). You can configure which LLM your agents or crew use, either implicitly through environment variables or explicitly by passing configured LLM objects (often using LangChain wrappers) during `Agent` or `Crew` creation.
This abstraction makes CrewAI powerful, allowing you to experiment with different models to find the best fit for your specific needs and budget.
But sometimes, agents need to remember things from past interactions or previous tasks within the same run. How does CrewAI handle short-term and potentially long-term memory? Let's explore that in the next chapter!
**Next:** [Chapter 7: Memory - Giving Agents Recall](07_memory.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

216
output/CrewAI/07_memory.md Normal file
View File

@@ -0,0 +1,216 @@
# Chapter 7: Memory - Giving Your Crew Recall
In the [previous chapter](06_llm.md), we looked at the Large Language Model ([LLM](06_llm.md)), the "brain" that allows each [Agent](02_agent.md) to understand, reason, and generate text. Now we have agents that can think, perform [Task](03_task.md)s using [Tool](04_tool.md)s, and follow a [Process](05_process.md).
But imagine a team working on a complex project over several days. What if every morning, they completely forgot everything they discussed and learned the previous day? They'd waste a lot of time repeating work and asking the same questions. By default, AI agents often behave like this: they only remember the immediate conversation.
How can we give our CrewAI team the ability to remember past information? That's where **Memory** comes in!
## Why Do We Need Memory?
AI Agents, especially when working together in a [Crew](01_crew.md), often need to build upon previous interactions or knowledge gained during their work. Without memory:
* An agent might ask for the same information multiple times.
* Context from an earlier task might be lost by the time a later task runs.
* The crew can't easily learn from past experiences across different projects or runs.
* Tracking specific details about key people, places, or concepts mentioned during the process becomes difficult.
**Problem Solved:** Memory provides [Agent](02_agent.md)s and the [Crew](01_crew.md) with the ability to store and recall past interactions, information, and insights. It's like giving your AI team shared notes, a collective memory, or institutional knowledge.
## What is Memory in CrewAI?
Think of Memory as the **storage system** for your Crew's experiences and knowledge. It allows the Crew to persist information beyond a single interaction or task execution. CrewAI implements different kinds of memory to handle different needs:
1. **`ShortTermMemory`**:
* **Analogy:** Like your computer's RAM or a person's short-term working memory.
* **Purpose:** Holds immediate context and information relevant *within the current run* of the Crew. What happened in the previous task? What was just discussed?
* **How it helps:** Ensures that the output of one task is available and easily accessible as context for the next task within the same `kickoff()` execution. It helps maintain the flow of conversation and information *during* a single job.
2. **`LongTermMemory`**:
* **Analogy:** Like a team's documented "lessons learned" database or a long-term knowledge base.
* **Purpose:** Stores insights, evaluations, and key takeaways *across multiple runs* of the Crew. Did a similar task succeed or fail in the past? What strategies worked well?
* **How it helps:** Allows the Crew to improve over time by recalling past performance on similar tasks. (Note: Effective use often involves evaluating task outcomes, which can be an advanced topic).
3. **`EntityMemory`**:
* **Analogy:** Like a CRM (Customer Relationship Management) system, a character sheet in a game, or index cards about important topics.
* **Purpose:** Tracks specific entities (like people, companies, projects, concepts) mentioned during the Crew's execution and stores details and relationships about them. Who is "Dr. Evans"? What is "Project Phoenix"?
* **How it helps:** Maintains consistency and detailed knowledge about key subjects, preventing the Crew from forgetting important details about who or what it's dealing with.
## How Does Memory Help?
Using memory makes your Crew more effective:
* **Better Context:** Agents have access to relevant past information, leading to more informed decisions and responses.
* **Efficiency:** Avoids redundant questions and re-work by recalling previously established facts or results.
* **Learning (LTM):** Enables the Crew to get better over time based on past performance.
* **Consistency (Entity):** Keeps track of important details about recurring topics or entities.
* **Shared Understanding:** Helps create a common ground of knowledge for all agents in the Crew.
## Using Memory in Your Crew
The simplest way to start using memory is by enabling it when you define your `Crew`. Setting `memory=True` activates the core memory components (ShortTerm and Entity Memory) for context building within a run.
Let's add memory to our trip planning `Crew`:
```python
# Assuming 'researcher' and 'planner' agents are defined (Chapter 2)
# Assuming 'task1' and 'task2' are defined (Chapter 3)
# Assuming an LLM is configured (Chapter 6)
from crewai import Crew, Process
# researcher = Agent(...)
# planner = Agent(...)
# task1 = Task(...)
# task2 = Task(...)
# Define the crew WITH memory enabled
trip_crew_with_memory = Crew(
agents=[researcher, planner],
tasks=[task1, task2],
process=Process.sequential,
memory=True # <-- Enable memory features!
# verbose=2
)
# Start the work. Agents will now leverage memory.
# result = trip_crew_with_memory.kickoff()
# print(result)
```
**Explanation:**
* We simply add the `memory=True` parameter when creating the `Crew`.
* **What does this do?** Behind the scenes, CrewAI initializes `ShortTermMemory` and `EntityMemory` for this crew.
* **How is it used?**
* **ShortTermMemory:** As tasks complete within this `kickoff()` run, their outputs and key interactions can be stored. When the next task starts, CrewAI automatically queries this memory for relevant recent context to add to the prompt for the next agent. This makes the context flow smoother than just passing the raw output of the previous task.
* **EntityMemory:** As agents discuss entities (e.g., "Lisbon," "May birthday trip"), the memory tries to capture details about them. If "Lisbon" is mentioned again later, the memory can provide the stored details ("Coastal city, known for trams and Fado music...") as context.
* **LongTermMemory:** While `memory=True` sets up the *potential* for LTM, actively using it to learn across multiple runs often requires additional steps like task evaluation or explicit saving mechanisms, which are more advanced topics beyond this basic introduction. For now, focus on the benefits of STM and Entity Memory for within-run context.
By just adding `memory=True`, your agents automatically get better at remembering what's going on *within the current job*.
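Memory relies on an embedding model to store and search those snippets. If you want to choose it yourself rather than rely on the default, the `Crew` accepts an `embedder` configuration. A hedged sketch (the exact config keys can vary between CrewAI versions):
```python
# Illustrative only: pick the embedding model used by the memory stores.
trip_crew_with_memory = Crew(
    agents=[researcher, planner],
    tasks=[task1, task2],
    process=Process.sequential,
    memory=True,
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)
```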
## How Memory Works Internally (Simplified)
So, what happens "under the hood" when `memory=True` and an agent starts a task?
1. **Task Execution Start:** The [Crew](01_crew.md) assigns a [Task](03_task.md) to an [Agent](02_agent.md).
2. **Context Gathering:** Before calling the [LLM](06_llm.md), the Crew interacts with its **Memory Module** (specifically, the `ContextualMemory` orchestrator). It asks, "What relevant memories do we have for this task, considering the description and any immediate context?"
3. **Memory Module Queries:** The `ContextualMemory` then queries the different active memory types:
* It asks `ShortTermMemory`: "Show me recent interactions or results related to this query." (Uses RAG/vector search on recent data).
* It asks `EntityMemory`: "Tell me about entities mentioned in this query." (Uses RAG/vector search on stored entity data).
* *If LTM were being actively queried (less common automatically):* "Any long-term insights related to this type of task?" (Usually queries a database like SQLite).
4. **Context Consolidation:** The Memory Module gathers the relevant snippets from each memory type.
5. **Prompt Augmentation:** This retrieved memory context is combined with the original task description, expected output, and any direct context (like the previous task's raw output).
6. **LLM Call:** This augmented, richer prompt is sent to the agent's [LLM](06_llm.md).
7. **Agent Response:** The agent generates its response, now informed by the retrieved memories.
8. **Memory Update:** As the task completes, its key interactions and outputs are processed and potentially saved back into ShortTermMemory and EntityMemory for future use within this run.
Let's visualize this context-building flow:
```mermaid
sequenceDiagram
participant C as Crew
participant A as Agent
participant CtxMem as ContextualMemory
participant STM as ShortTermMemory
participant EM as EntityMemory
participant LLM as Agent's LLM
C->>A: Execute Task(description, current_context)
Note over A: Need to build full prompt context.
A->>CtxMem: Get memory context for task query
CtxMem->>STM: Search(task_query)
STM-->>CtxMem: Recent memories (e.g., "Found Lisbon earlier")
CtxMem->>EM: Search(task_query)
EM-->>CtxMem: Entity details (e.g., "Lisbon: Capital of Portugal")
CtxMem-->>A: Combined Memory Snippets
A->>A: Assemble Final Prompt (Task Desc + Current Context + Memory Snippets)
A->>LLM: Process Augmented Prompt
LLM-->>A: Generate Response
A-->>C: Task Result
Note over C: Crew updates memories (STM, EM) with task results.
```
**Diving into the Code (High Level)**
* **`crewai/crew.py`:** When you set `memory=True` in the `Crew` constructor, the `create_crew_memory` validator method (triggered by Pydantic) initializes instances of `ShortTermMemory`, `LongTermMemory`, and `EntityMemory` and stores them in private attributes like `_short_term_memory`.
```python
# Simplified from crewai/crew.py
class Crew(BaseModel):
memory: bool = Field(default=False, ...)
_short_term_memory: Optional[InstanceOf[ShortTermMemory]] = PrivateAttr()
_long_term_memory: Optional[InstanceOf[LongTermMemory]] = PrivateAttr()
_entity_memory: Optional[InstanceOf[EntityMemory]] = PrivateAttr()
# ... other fields ...
@model_validator(mode="after")
def create_crew_memory(self) -> "Crew":
if self.memory:
# Simplified: Initializes memory objects if memory=True
self._long_term_memory = LongTermMemory(...)
self._short_term_memory = ShortTermMemory(crew=self, ...)
self._entity_memory = EntityMemory(crew=self, ...)
return self
```
* **`crewai/memory/contextual/contextual_memory.py`:** This class is responsible for orchestrating the retrieval from different memory types. Its `build_context_for_task` method takes the task information and queries the relevant memories.
```python
# Simplified from crewai/memory/contextual/contextual_memory.py
class ContextualMemory:
def __init__(self, stm: ShortTermMemory, ltm: LongTermMemory, em: EntityMemory, ...):
self.stm = stm
self.ltm = ltm
self.em = em
# ...
def build_context_for_task(self, task, context) -> str:
query = f"{task.description} {context}".strip()
if not query: return ""
memory_context = []
# Fetch relevant info from Short Term Memory
memory_context.append(self._fetch_stm_context(query))
# Fetch relevant info from Entity Memory
memory_context.append(self._fetch_entity_context(query))
# Fetch relevant info from Long Term Memory (if applicable)
# memory_context.append(self._fetch_ltm_context(task.description))
return "\n".join(filter(None, memory_context))
def _fetch_stm_context(self, query) -> str:
stm_results = self.stm.search(query)
# ... format results ...
return formatted_results if stm_results else ""
def _fetch_entity_context(self, query) -> str:
em_results = self.em.search(query)
# ... format results ...
return formatted_results if em_results else ""
```
* **Memory Types (`short_term_memory.py`, `entity_memory.py`, `long_term_memory.py`):**
* `ShortTermMemory` and `EntityMemory` typically use `RAGStorage` (`crewai/memory/storage/rag_storage.py`), which often relies on a vector database like ChromaDB to store embeddings of text snippets and find similar ones based on a query.
* `LongTermMemory` typically uses `LTMSQLiteStorage` (`crewai/memory/storage/ltm_sqlite_storage.py`) to save structured data about task evaluations (like descriptions, scores, suggestions) into an SQLite database file.
The key idea is that `memory=True` sets up these storage systems and the `ContextualMemory` orchestrator, which automatically enriches agent prompts with relevant remembered information.
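If you want more control over where these stores live, the `Crew` can also accept pre-configured memory objects. The sketch below follows the module paths mentioned above, but treat the exact constructor arguments as illustrative; they may differ between CrewAI versions:
```python
from crewai import Crew, Process
from crewai.memory import LongTermMemory
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage
# Illustrative: point long-term memory at a specific SQLite file.
crew_with_custom_ltm = Crew(
    agents=[researcher, planner],
    tasks=[task1, task2],
    process=Process.sequential,
    memory=True,
    long_term_memory=LongTermMemory(
        storage=LTMSQLiteStorage(db_path="./my_crew_ltm.db")
    )
)
```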
## Conclusion
You've learned about the crucial concept of **Memory** in CrewAI! Memory gives your agents the ability to recall past information, preventing them from being purely stateless. We explored the three main types:
* **`ShortTermMemory`**: For context within the current run.
* **`LongTermMemory`**: For insights across multiple runs (more advanced).
* **`EntityMemory`**: For tracking specific people, places, or concepts.
Enabling memory with `memory=True` in your `Crew` is the first step to making your agents more context-aware and efficient, primarily leveraging Short Term and Entity memory automatically.
But what if your agents need access to a large body of pre-existing information, like company documentation, technical manuals, or a specific set of research papers? That's static information, not necessarily memories of *interactions*. How do we provide that? That's where the concept of **Knowledge** comes in. Let's explore that next!
**Next:** [Chapter 8: Knowledge - Providing External Information](08_knowledge.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,266 @@
# Chapter 8: Knowledge - Providing External Information
In [Chapter 7: Memory](07_memory.md), we learned how to give our [Crew](01_crew.md) the ability to remember past interactions and details using `Memory`. This helps them maintain context within a single run and potentially across runs.
But what if your [Agent](02_agent.md) needs access to a large body of *existing* information that isn't derived from its own conversations? Think about company documents, technical manuals, specific research papers, or a product catalog. This information exists *before* the Crew starts working. How do we give our agents access to this specific library of information?
That's where **`Knowledge`** comes in!
## Why Do We Need Knowledge?
Imagine you have an [Agent](02_agent.md) whose job is to answer customer questions about a specific product, "Widget Pro". You want this agent to *only* use the official "Widget Pro User Manual" to answer questions, not its general knowledge from the internet (which might be outdated or wrong).
Without a way to provide the manual, the agent might hallucinate answers or use incorrect information. `Knowledge` allows us to load specific documents (like the user manual), process them, and make them searchable for our agents.
**Problem Solved:** `Knowledge` provides your [Agent](02_agent.md)s with access to specific, pre-defined external information sources (like documents or databases), allowing them to retrieve relevant context to enhance their understanding and task execution based on that specific information.
## What is Knowledge?
Think of `Knowledge` as giving your [Crew](01_crew.md) access to a **specialized, private library** full of specific documents or information. It consists of a few key parts:
1. **`KnowledgeSource`**: This represents the actual *source* of the information. It could be:
* A local file (PDF, DOCX, TXT, etc.)
* A website URL
* A database connection (more advanced)
CrewAI uses helpful classes like `CrewDoclingSource` to easily handle various file types and web content. You tell the `KnowledgeSource` *where* the information is (e.g., the file path to your user manual).
2. **Processing & Embedding**: When you create a `Knowledge` object with sources, the information is automatically:
* **Loaded**: The content is read from the source (e.g., text extracted from the PDF).
* **Chunked**: The long text is broken down into smaller, manageable pieces (chunks).
* **Embedded**: Each chunk is converted into a numerical representation (an embedding vector) that captures its meaning. This is done using an embedding model (often specified via the `embedder` configuration).
3. **`KnowledgeStorage` (Vector Database)**: These embedded chunks are then stored in a special kind of database called a vector database. CrewAI typically uses **ChromaDB** by default for this.
* **Why?** Vector databases are optimized for finding information based on *semantic similarity*. When an agent asks a question related to a topic, the database can quickly find the text chunks whose meanings (embeddings) are closest to the meaning of the question.
4. **Retrieval**: When an [Agent](02_agent.md) needs information for its [Task](03_task.md), it queries the `Knowledge` object. This query is also embedded, and the `KnowledgeStorage` efficiently retrieves the most relevant text chunks from the original documents. These chunks are then provided to the agent as context.
In short: `Knowledge` = Specific Info Sources + Processing/Embedding + Vector Storage + Retrieval.
## Using Knowledge in Your Crew
Let's give our 'Product Support Agent' access to a hypothetical "widget_pro_manual.txt" file.
**1. Prepare Your Knowledge Source File:**
Make sure you have a directory named `knowledge` in your project's root folder. Place your file (e.g., `widget_pro_manual.txt`) inside this directory.
```
your_project_root/
├── knowledge/
│ └── widget_pro_manual.txt
└── your_crewai_script.py
```
*(Make sure `widget_pro_manual.txt` contains some text about Widget Pro.)*
**2. Define the Knowledge Source and Knowledge Object:**
```python
# Make sure you have docling installed for file handling: pip install docling
from crewai import Agent, Task, Crew, Process, Knowledge
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
# Assume an LLM is configured (e.g., via environment variables or passed to Agent/Crew)
# from langchain_openai import ChatOpenAI
# Define the knowledge source - point to the file inside the 'knowledge' directory
# Use the relative path from within the 'knowledge' directory
manual_source = CrewDoclingSource(file_paths=["widget_pro_manual.txt"])
# Create the Knowledge object, give it a name and pass the sources
# This will load, chunk, embed, and store the manual's content
product_knowledge = Knowledge(
collection_name="widget_pro_manual", # Name for the storage collection
sources=[manual_source],
# embedder=... # Optional: specify embedding config, otherwise uses default
# storage=... # Optional: specify storage config, otherwise uses default ChromaDB
)
```
**Explanation:**
* We import `Knowledge` and `CrewDoclingSource`.
* `CrewDoclingSource(file_paths=["widget_pro_manual.txt"])`: We create a source pointing to our file. Note: The path is relative *within* the `knowledge` directory. `CrewDoclingSource` handles loading various file types.
* `Knowledge(collection_name="widget_pro_manual", sources=[manual_source])`: We create the main `Knowledge` object.
* `collection_name`: A unique name for this set of knowledge in the vector database.
* `sources`: A list containing the `manual_source` we defined.
* When this line runs, CrewAI automatically processes `widget_pro_manual.txt` and stores it in the vector database under the collection "widget\_pro\_manual".
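Before handing the knowledge base to an agent, you can sanity-check it with a direct query. As noted at the end of this chapter, `Knowledge.query()` simply delegates to the underlying vector storage; the exact signature may vary by version, so treat this as a sketch:
```python
# Illustrative: retrieve the stored chunks most similar to a question.
relevant_chunks = product_knowledge.query(["How do I reset my Widget Pro?"])
for chunk in relevant_chunks:
    print(chunk)
```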
**3. Equip an Agent with Knowledge:**
You can add the `Knowledge` object directly to an agent.
```python
# Define the agent and give it the knowledge base
support_agent = Agent(
role='Product Support Specialist',
goal='Answer customer questions accurately based ONLY on the Widget Pro manual.',
backstory='You are an expert support agent with deep knowledge of the Widget Pro, derived exclusively from its official manual.',
knowledge=product_knowledge, # <-- Assign the knowledge here!
verbose=True,
allow_delegation=False,
# llm=ChatOpenAI(model="gpt-4") # Example LLM
)
# Define a task for the agent
support_task = Task(
description="The customer asks: 'How do I reset my Widget Pro?' Use the manual to find the answer.",
expected_output="A clear, step-by-step answer based solely on the provided manual content.",
agent=support_agent
)
# Create and run the crew
support_crew = Crew(
agents=[support_agent],
tasks=[support_task],
process=Process.sequential
)
# result = support_crew.kickoff()
# print(result)
```
**Explanation:**
* When defining `support_agent`, we pass our `product_knowledge` object to the `knowledge` parameter: `knowledge=product_knowledge`.
* Now, whenever `support_agent` works on a `Task`, it will automatically query the `product_knowledge` base for relevant information *before* calling its [LLM](06_llm.md).
* The retrieved text chunks from `widget_pro_manual.txt` will be added to the context given to the [LLM](06_llm.md), strongly guiding it to answer based on the manual.
**Expected Outcome (Conceptual):**
When `support_crew.kickoff()` runs:
1. `support_agent` receives `support_task`.
2. The agent (internally) queries `product_knowledge` with something like "How do I reset my Widget Pro?".
3. The vector database finds chunks from `widget_pro_manual.txt` that are semantically similar (e.g., sections describing the reset procedure).
4. These relevant text chunks are retrieved.
5. The agent's [LLM](06_llm.md) receives the task description *plus* the retrieved manual excerpts as context.
6. The [LLM](06_llm.md) generates the answer based heavily on the provided manual text.
7. The final `result` will be the step-by-step reset instructions derived from the manual.
*(Alternatively, you can assign `Knowledge` at the `Crew` level using the `knowledge` parameter, making it available to all agents in the crew.)*
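For completeness, here is a minimal sketch of that crew-level setup. It reuses the agent, task, and knowledge objects defined above and assumes `Crew` accepts the `knowledge` parameter mentioned in the note; the exact parameter name may vary between CrewAI versions.
```python
# Sketch: attach the knowledge base to the whole crew instead of a single agent.
# Assumes Crew accepts a `knowledge` argument as noted above; `support_agent`,
# `support_task`, and `product_knowledge` come from the earlier snippets.
shared_crew = Crew(
    agents=[support_agent],
    tasks=[support_task],
    process=Process.sequential,
    knowledge=product_knowledge,  # available to every agent in this crew
)
# result = shared_crew.kickoff()
```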
## How Knowledge Retrieval Works Internally
When an [Agent](02_agent.md) with assigned `Knowledge` executes a [Task](03_task.md):
1. **Task Start:** The agent begins processing the task.
2. **Context Building:** The agent prepares the information needed for its [LLM](06_llm.md). This includes the task description, its role/goal/backstory, and any context from `Memory` (if enabled).
3. **Knowledge Query:** The agent identifies the need for information related to the task. It formulates a query (often based on the task description or key terms) and sends it to its assigned `Knowledge` object.
4. **Storage Search:** The `Knowledge` object passes the query to its underlying `KnowledgeStorage` (the vector database, e.g., ChromaDB).
5. **Vector Similarity Search:** The vector database converts the query into an embedding and searches for stored text chunks whose embeddings are closest (most similar) to the query embedding.
6. **Retrieve Chunks:** The database returns the top N most relevant text chunks (along with metadata and scores).
7. **Augment Prompt:** The agent takes these retrieved text chunks and adds them as specific context to the prompt it's preparing for the [LLM](06_llm.md). The prompt might now look something like: "Your task is: [...task description...]. Here is relevant information from the knowledge base: [...retrieved chunk 1...] [...retrieved chunk 2...] Now, provide the final answer."
8. **LLM Call:** The agent sends this augmented prompt to its [LLM](06_llm.md).
9. **Generate Response:** The [LLM](06_llm.md), now equipped with highly relevant context directly from the specified knowledge source, generates a more accurate and grounded response.
Let's visualize this retrieval process:
```mermaid
sequenceDiagram
participant A as Agent
participant K as Knowledge Object
participant KS as KnowledgeStorage (Vector DB)
participant LLM as Agent's LLM
A->>A: Start Task ('How to reset Widget Pro?')
A->>A: Prepare base prompt (Task, Role, Goal...)
A->>K: Query('How to reset Widget Pro?')
K->>KS: Search(query='How to reset Widget Pro?')
Note right of KS: Finds similar chunks via embeddings
KS-->>K: Return relevant chunks from manual
K-->>A: Provide relevant chunks
A->>A: Augment prompt with retrieved chunks
A->>LLM: Send augmented prompt
LLM-->>A: Generate answer based on task + manual excerpts
A->>A: Final Answer (Steps from manual)
```
## Diving into the Code (High Level)
* **`crewai/knowledge/knowledge.py`**:
* The `Knowledge` class holds the list of `sources` and the `storage` object.
* Its `__init__` method initializes the `KnowledgeStorage` (creating a default ChromaDB instance if none is provided) and then iterates through the `sources`, telling each one to `add()` its content to the storage.
* The `query()` method simply delegates the search request to the `self.storage.search()` method.
```python
# Simplified view from crewai/knowledge/knowledge.py
class Knowledge(BaseModel):
sources: List[BaseKnowledgeSource] = Field(default_factory=list)
storage: Optional[KnowledgeStorage] = Field(default=None)
embedder: Optional[Dict[str, Any]] = None
collection_name: Optional[str] = None
def __init__(self, collection_name: str, sources: List[BaseKnowledgeSource], ...):
# ... setup storage (e.g., KnowledgeStorage(...)) ...
self.sources = sources
self.storage.initialize_knowledge_storage()
self._add_sources() # Tell sources to load/chunk/embed/save
def query(self, query: List[str], limit: int = 3) -> List[Dict[str, Any]]:
if self.storage is None: raise ValueError("Storage not initialized.")
# Delegate search to the storage object
return self.storage.search(query, limit)
def _add_sources(self):
for source in self.sources:
source.storage = self.storage # Give source access to storage
source.add() # Source loads, chunks, embeds, and saves
```
* **`crewai/knowledge/source/`**: Contains different `KnowledgeSource` implementations.
* `base_knowledge_source.py`: Defines the `BaseKnowledgeSource` abstract class, including the `add()` method placeholder and helper methods like `_chunk_text()`.
* `crew_docling_source.py`: Implements loading from files and URLs using the `docling` library. Its `add()` method loads content, chunks it, and calls `self._save_documents()`.
* `_save_documents()` (in `base_knowledge_source.py` or subclasses) typically calls `self.storage.save(self.chunks)`.
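To make that shape concrete, here is a hypothetical custom source sketched against the interface described above (`_chunk_text`, `chunks`, `_save_documents`). Method names and required overrides may differ between CrewAI versions, so treat it as an illustration rather than working library code.
```python
# Hypothetical sketch of a custom knowledge source wrapping an in-memory string.
# It relies on the helpers described above; exact base-class requirements vary.
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource

class InlineTextSource(BaseKnowledgeSource):
    """Wraps a plain string so it can be chunked, embedded, and stored."""
    content: str = ""

    def validate_content(self) -> str:
        # Assumed hook: return the raw text that should be processed.
        return self.content

    def add(self) -> None:
        # Split the text into chunks, then hand them to the storage backend,
        # mirroring what CrewDoclingSource does for files.
        self.chunks = self._chunk_text(self.content)
        self._save_documents()
```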
* **`crewai/knowledge/storage/knowledge_storage.py`**:
* The `KnowledgeStorage` class acts as a wrapper around the actual vector database (ChromaDB by default).
* `initialize_knowledge_storage()`: Sets up the connection to ChromaDB and gets/creates the specified collection.
* `save()`: Takes the text chunks, gets their embeddings using the configured `embedder`, and `upsert`s them into the ChromaDB collection.
* `search()`: Takes a query, gets its embedding, and uses the ChromaDB collection's `query()` method to find and return similar documents.
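The wrapper itself is CrewAI-specific, but the underlying save/search pattern is plain ChromaDB. A standalone sketch of that pattern, written directly against ChromaDB's public API (not CrewAI's actual `KnowledgeStorage` code), looks roughly like this:
```python
# Standalone illustration of the save/search pattern (pip install chromadb).
# This is not CrewAI's KnowledgeStorage implementation, just the underlying idea.
import chromadb

client = chromadb.PersistentClient(path=".demo_knowledge_db")
collection = client.get_or_create_collection(name="widget_pro_manual")

# save(): store text chunks (ChromaDB embeds them with its default embedder here)
chunks = ["To reset the Widget Pro, hold the power button for 10 seconds..."]
collection.upsert(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# search(): embed the query and return the most similar stored chunks
results = collection.query(query_texts=["How do I reset my Widget Pro?"], n_results=3)
print(results["documents"])
```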
* **`crewai/agent.py`**:
* The `Agent` class has an optional `knowledge: Knowledge` attribute.
* In the `execute_task` method, before calling the LLM, if `self.knowledge` exists, it calls `self.knowledge.query()` using the task prompt (or parts of it) as the query.
* The results from `knowledge.query()` are formatted and added to the task prompt as additional context.
```python
# Simplified view from crewai/agent.py
class Agent(BaseAgent):
knowledge: Optional[Knowledge] = Field(default=None, ...)
# ... other fields ...
def execute_task(self, task: Task, context: Optional[str] = None, ...) -> str:
task_prompt = task.prompt()
# ... add memory context if applicable ...
# === KNOWLEDGE RETRIEVAL ===
if self.knowledge:
# Query the knowledge base using the task prompt
agent_knowledge_snippets = self.knowledge.query([task_prompt]) # Or task.description
if agent_knowledge_snippets:
# Format the snippets into context string
agent_knowledge_context = extract_knowledge_context(agent_knowledge_snippets)
if agent_knowledge_context:
# Add knowledge context to the prompt
task_prompt += agent_knowledge_context
# ===========================
# ... add crew knowledge context if applicable ...
# ... prepare tools, create agent_executor ...
# Call the LLM via agent_executor with the augmented task_prompt
result = self.agent_executor.invoke({"input": task_prompt, ...})["output"]
return result
```
## Conclusion
You've now learned about **`Knowledge`** in CrewAI! It's the mechanism for providing your agents with access to specific, pre-existing external information sources like documents or websites. By defining `KnowledgeSource`s, creating a `Knowledge` object, and assigning it to an [Agent](02_agent.md) or [Crew](01_crew.md), you enable your agents to retrieve relevant context from these sources using vector search. This makes their responses more accurate, grounded, and aligned with the specific information you provide, distinct from the general interaction history managed by [Memory](07_memory.md).
This concludes our introductory tour of the core concepts in CrewAI! You've learned about managing the team ([Crew](01_crew.md)), defining specialized workers ([Agent](02_agent.md)), assigning work ([Task](03_task.md)), equipping agents with abilities ([Tool](04_tool.md)), setting the workflow ([Process](05_process.md)), powering the agent's thinking ([LLM](06_llm.md)), giving them recall ([Memory](07_memory.md)), and providing external information ([Knowledge](08_knowledge.md)).
With these building blocks, you're ready to start creating sophisticated AI crews to tackle complex challenges! Happy building!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
46
output/CrewAI/index.md Normal file
View File
@@ -0,0 +1,46 @@
# Tutorial: CrewAI
**CrewAI** is a framework for orchestrating *autonomous AI agents*.
Think of it like building a specialized team (a **Crew**) where each member (**Agent**) has a role, goal, and tools.
You assign **Tasks** to Agents, defining what needs to be done. The **Crew** manages how these Agents collaborate, following a specific **Process** (like sequential steps).
Agents use their "brain" (an **LLM**) and can utilize **Tools** (like web search) and access shared **Memory** or external **Knowledge** bases to complete their tasks effectively.
**Source Repository:** [https://github.com/crewAIInc/crewAI/tree/e723e5ca3fb7e4cb890c4befda47746aedbd7408/src/crewai](https://github.com/crewAIInc/crewAI/tree/e723e5ca3fb7e4cb890c4befda47746aedbd7408/src/crewai)
```mermaid
flowchart TD
A0["Agent"]
A1["Task"]
A2["Crew"]
A3["Tool"]
A4["Process"]
A5["LLM"]
A6["Memory"]
A7["Knowledge"]
A2 -- "Manages" --> A0
A2 -- "Orchestrates" --> A1
A2 -- "Defines workflow" --> A4
A2 -- "Manages shared" --> A6
A0 -- "Executes" --> A1
A0 -- "Uses" --> A3
A0 -- "Uses as brain" --> A5
A0 -- "Queries" --> A7
A1 -- "Assigned to" --> A0
```
## Chapters
1. [Crew](01_crew.md)
2. [Agent](02_agent.md)
3. [Task](03_task.md)
4. [Tool](04_tool.md)
5. [Process](05_process.md)
6. [LLM](06_llm.md)
7. [Memory](07_memory.md)
8. [Knowledge](08_knowledge.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,263 @@
# Chapter 1: Modules and Programs: Building Blocks of DSPy
Welcome to the first chapter of our journey into DSPy! We're excited to have you here.
Imagine you want to build something cool with AI, like a smart assistant that can answer questions based on your documents. This involves several steps: understanding the question, finding the right information in the documents, and then crafting a clear answer. How do you organize all these steps in your code?
That's where **Modules** and **Programs** come in! They are the fundamental building blocks in DSPy, helping you structure your AI applications cleanly and effectively.
Think of it like building with **Lego bricks**:
* A **`Module`** is like a single Lego brick. It's a basic unit that performs a specific, small task.
* A **`Program`** is like your final Lego creation (a car, a house). It's built by combining several Lego bricks (`Module`s) together in a specific way to achieve a bigger goal.
In this chapter, we'll learn:
* What a `Module` is and what it does.
* How `Program`s use `Module`s to solve complex tasks.
* How they create structure and manage the flow of information.
Let's start building!
## What is a `Module`?
A `dspy.Module` is the most basic building block in DSPy. Think of it as:
* **A Function:** Like a function in Python, it takes some input, does something, and produces an output.
* **A Lego Brick:** It performs one specific job.
* **A Specialist:** It often specializes in one task, frequently involving interaction with a powerful AI model like a Language Model ([LM](05_lm__language_model_client_.md)) or a Retrieval Model ([RM](06_rm__retrieval_model_client_.md)). We'll learn more about LMs and RMs later!
The key idea is **encapsulation**. A `Module` bundles a piece of logic together, hiding the internal complexity. You just need to know what it does, not necessarily *every single detail* of how it does it.
Every `Module` has two main parts:
1. `__init__`: This is where you set up the module, like defining any internal components or settings it needs.
2. `forward`: This is where the main logic happens. It defines *what the module does* when you call it with some input.
Let's look at a conceptual example. DSPy provides pre-built modules. One common one is `dspy.Predict`, which is designed to call a Language Model to generate an output based on some input, following specific instructions.
```python
import dspy
# Conceptual structure of a simple Module like dspy.Predict
class BasicPredict(dspy.Module): # Inherits from dspy.Module
def __init__(self, instructions):
super().__init__() # Important initialization
self.instructions = instructions
# In a real DSPy module, we'd set up LM connection here
# self.lm = ... (connect to language model)
def forward(self, input_data):
# 1. Combine instructions and input_data
prompt = self.instructions + "\nInput: " + input_data + "\nOutput:"
# 2. Call the Language Model (LM) with the prompt
# lm_output = self.lm(prompt) # Simplified call
lm_output = f"Generated answer for '{input_data}' based on instructions." # Dummy output
# 3. Return the result
return lm_output
# How you might use it (conceptual)
# predictor = BasicPredict(instructions="Translate the input to French.")
# french_text = predictor(input_data="Hello")
# print(french_text) # Might output: "Generated answer for 'Hello' based on instructions."
```
In this simplified view:
* `BasicPredict` inherits from `dspy.Module`. All your custom modules will do this.
* `__init__` stores the `instructions`. Real DSPy modules might initialize connections to LMs or load settings here.
* `forward` defines the core task: combining instructions and input, (conceptually) calling an LM, and returning the result.
Don't worry about the LM details yet! The key takeaway is that a `Module` wraps a specific piece of work, defined in its `forward` method. DSPy provides useful pre-built modules like `dspy.Predict` and `dspy.ChainOfThought` (which encourages step-by-step reasoning), and you can also build your own.
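If you want a quick taste of a pre-built module, the sketch below uses `dspy.ChainOfThought` with a shorthand `"question -> answer"` task description (shorthand signatures are covered in [Chapter 2](02_signature.md)). It assumes a Language Model has already been configured, which we cover in [Chapter 5](05_lm__language_model_client_.md).
```python
import dspy

# A pre-built "reasoning" module: it asks the LM to think step by step
# before producing the answer field. Requires a configured LM to actually run.
qa = dspy.ChainOfThought("question -> answer")

# result = qa(question="What is the capital of France?")
# print(result.answer)  # e.g. "Paris"
```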
## What is a `Program`?
Now, what if your task is more complex than a single LM call? For instance, answering a question based on documents might involve:
1. Understanding the `question`.
2. Generating search queries based on the `question`.
3. Using a Retrieval Model ([RM](06_rm__retrieval_model_client_.md)) to find relevant `context` documents using the queries.
4. Using a Language Model ([LM](05_lm__language_model_client_.md)) to generate the final `answer` based on the `question` and `context`.
This is too much for a single simple `Module`. We need to combine multiple modules!
This is where a `Program` comes in. **Technically, a `Program` in DSPy is also just a `dspy.Module`!** The difference is in how we use it: a `Program` is typically a `Module` that *contains and coordinates other `Module`s*.
Think back to the Lego analogy:
* Small `Module`s are like bricks for the engine, wheels, and chassis.
* The `Program` is the main `Module` representing the whole car, defining how the engine, wheels, and chassis bricks connect and work together in its `forward` method.
A `Program` defines the **data flow** between its sub-modules. It orchestrates the sequence of operations.
Let's sketch out a simple `Program` for our question-answering example:
```python
import dspy
# Assume we have these pre-built or custom Modules (simplified)
class GenerateSearchQuery(dspy.Module):
def forward(self, question):
# Logic to create search queries from the question
print(f"Generating query for: {question}")
return f"search query for '{question}'"
class RetrieveContext(dspy.Module):
def forward(self, query):
# Logic to find documents using the query
print(f"Retrieving context for: {query}")
return f"Relevant context document about '{query}'"
class GenerateAnswer(dspy.Module):
def forward(self, question, context):
# Logic to generate answer using question and context
print(f"Generating answer for: {question} using context: {context}")
return f"Final answer about '{question}' based on context."
# Now, let's build the Program (which is also a Module!)
class RAG(dspy.Module): # RAG = Retrieval-Augmented Generation
def __init__(self):
super().__init__()
# Initialize the sub-modules it will use
self.generate_query = GenerateSearchQuery()
self.retrieve = RetrieveContext()
self.generate_answer = GenerateAnswer()
def forward(self, question):
# Define the flow of data through the sub-modules
print("\n--- RAG Program Start ---")
search_query = self.generate_query(question=question)
context = self.retrieve(query=search_query)
answer = self.generate_answer(question=question, context=context)
print("--- RAG Program End ---")
return answer
# How to use the Program
rag_program = RAG()
final_answer = rag_program(question="What is DSPy?")
print(f"\nFinal Output: {final_answer}")
```
If you run this conceptual code, you'd see output like:
```
--- RAG Program Start ---
Generating query for: What is DSPy?
Retrieving context for: search query for 'What is DSPy?'
Generating answer for: What is DSPy? using context: Relevant context document about 'search query for 'What is DSPy?''
--- RAG Program End ---
Final Output: Final answer about 'What is DSPy?' based on context.
```
See how the `RAG` program works?
1. In `__init__`, it creates instances of the smaller modules it needs (`GenerateSearchQuery`, `RetrieveContext`, `GenerateAnswer`).
2. In `forward`, it calls these modules *in order*, passing the output of one as the input to the next. It defines the workflow!
## Hierarchical Structure
Modules can contain other modules, which can contain *even more* modules! This allows you to build complex systems by breaking them down into manageable, hierarchical parts.
Imagine our `GenerateAnswer` module was actually quite complex. Maybe it first summarizes the context, then drafts an answer, then refines it. We could implement `GenerateAnswer` as *another* program containing these sub-modules!
```mermaid
graph TD
A[RAG Program] --> B(GenerateSearchQuery Module);
A --> C(RetrieveContext Module);
A --> D(GenerateAnswer Module / Program);
D --> D1(SummarizeContext Module);
D --> D2(DraftAnswer Module);
D --> D3(RefineAnswer Module);
```
This diagram shows how the `RAG` program uses `GenerateAnswer`, which itself could be composed of smaller modules like `SummarizeContext`, `DraftAnswer`, and `RefineAnswer`. This nesting makes complex systems easier to design, understand, and debug.
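Here is a conceptual sketch of that nesting, using the same dummy-logic style as the earlier modules (no real LM calls), just to show how `GenerateAnswer` could itself be a small program:
```python
import dspy

# Placeholder sub-modules with dummy logic, like the earlier examples.
class SummarizeContext(dspy.Module):
    def forward(self, context):
        return f"summary of ({context})"

class DraftAnswer(dspy.Module):
    def forward(self, question, summary):
        return f"draft answer to '{question}' using {summary}"

class RefineAnswer(dspy.Module):
    def forward(self, draft):
        return f"polished version of: {draft}"

# GenerateAnswer is itself a program that coordinates the three bricks above.
class GenerateAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = SummarizeContext()
        self.draft = DraftAnswer()
        self.refine = RefineAnswer()

    def forward(self, question, context):
        summary = self.summarize(context=context)
        draft = self.draft(question=question, summary=summary)
        return self.refine(draft=draft)
```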
## How It Works Under the Hood (A Tiny Peek)
You don't need to know the deep internals right now, but it's helpful to have a basic mental model.
1. **Foundation:** All DSPy modules, whether simple bricks or complex programs, inherit from a base class (`dspy.primitives.module.BaseModule`). This provides common functionality like saving, loading, and finding internal parameters (we'll touch on saving/loading later).
2. **Execution:** When you call a module (e.g., `rag_program(question="...")`), Python executes its `__call__` method. In DSPy, this typically just calls the `forward` method you defined.
3. **Orchestration:** If a module's `forward` method calls other modules (like in our `RAG` example), it simply executes their `forward` methods in turn, passing the data as defined in the code.
Here's a simplified sequence of what happens when we call `rag_program("What is DSPy?")`:
```mermaid
sequenceDiagram
participant User
participant RAGProgram as RAG Program (forward)
participant GenQuery as GenerateQuery (forward)
participant Retrieve as RetrieveContext (forward)
participant GenAnswer as GenerateAnswer (forward)
User->>RAGProgram: Call with "What is DSPy?"
RAGProgram->>GenQuery: Call with question="What is DSPy?"
GenQuery-->>RAGProgram: Return "search query..."
RAGProgram->>Retrieve: Call with query="search query..."
Retrieve-->>RAGProgram: Return "Relevant context..."
RAGProgram->>GenAnswer: Call with question, context
GenAnswer-->>RAGProgram: Return "Final answer..."
RAGProgram-->>User: Return "Final answer..."
```
The core files involved are:
* `primitives/module.py`: Defines the `BaseModule` class, the ancestor of all modules.
* `primitives/program.py`: Defines the `Module` class itself (the one you inherit from), adding core methods like `__call__`, which invokes `forward`.
You can see from the code snippets provided earlier (like `ChainOfThought` or `Predict`) that they inherit from `dspy.Module` and define `__init__` and `forward`, just like our examples.
```python
# Snippet from dspy/primitives/program.py (Simplified)
from dspy.primitives.module import BaseModule
class Module(BaseModule): # Inherits from BaseModule
def __init__(self):
super()._base_init()
# ... initialization ...
def forward(self, *args, **kwargs):
# This is where the main logic of the module goes.
# Users override this method in their own modules.
raise NotImplementedError # Needs to be implemented by subclasses
def __call__(self, *args, **kwargs):
# When you call module_instance(), this runs...
# ...and typically calls self.forward()
return self.forward(*args, **kwargs)
# You write classes like this:
class MyModule(dspy.Module):
def __init__(self):
super().__init__()
# Your setup
def forward(self, input_data):
# Your logic
result = ...
return result
```
The important part is the pattern: inherit from `dspy.Module`, set things up in `__init__`, and define the core logic in `forward`.
## Conclusion
Congratulations! You've learned about the fundamental organizing principle in DSPy: **Modules** and **Programs**.
* **Modules** are the basic building blocks, like Lego bricks, often handling a specific task (maybe calling an [LM](05_lm__language_model_client_.md) or [RM](06_rm__retrieval_model_client_.md)).
* **Programs** are also Modules, but they typically combine *other* modules to orchestrate a more complex workflow, defining how data flows between them.
* The `forward` method is key: it contains the logic of what a module *does*.
* This structure allows you to build complex AI systems in a clear, manageable, and hierarchical way.
Now that we understand how modules provide structure, how do they know what kind of input data they expect and what kind of output data they should produce? That's where **Signatures** come in!
Let's dive into that next!
**Next:** [Chapter 2: Signature](02_signature.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
201
output/DSPy/02_signature.md Normal file
View File
@@ -0,0 +1,201 @@
# Chapter 2: Signatures - Defining the Task
In [Chapter 1: Modules and Programs](01_module___program.md), we learned that `Module`s are like Lego bricks that perform specific tasks, often using Language Models ([LM](05_lm__language_model_client_.md)). We saw how `Program`s combine these modules.
But how does a `Module`, especially one using an LM like `dspy.Predict`, know *exactly* what job to do?
Imagine you ask a chef (our LM) to cook something. Just saying "cook" isn't enough! You need to tell them:
1. **What ingredients to use** (the inputs).
2. **What dish to make** (the outputs).
3. **The recipe or instructions** (how to make it).
This is precisely what a **`Signature`** does in DSPy!
A `Signature` acts like a clear recipe or contract for a DSPy `Module`. It defines:
* **Input Fields:** What information the module needs to start its work.
* **Output Fields:** What information the module is expected to produce.
* **Instructions:** Natural language guidance (like a recipe!) telling the underlying LM *how* to transform the inputs into the outputs.
Think of it as specifying the 'shape' and 'purpose' of a module, making sure everyone (you, DSPy, and the LM) understands the task.
## Why Do We Need Signatures?
Without a clear definition, how would a module like `dspy.Predict` know what to ask the LM?
Let's say we want a module to translate English text to French. We need to tell it:
* It needs an `english_sentence` as input.
* It should produce a `french_sentence` as output.
* The *task* is to translate the input sentence into French.
A `Signature` bundles all this information together neatly.
## Defining a Signature: The Recipe Card
The most common way to define a Signature is by creating a Python class that inherits from `dspy.Signature`.
Let's create our English-to-French translation signature:
```python
import dspy
from dspy.signatures.field import InputField, OutputField
class TranslateToFrench(dspy.Signature):
"""Translates English text to French.""" # <-- These are the Instructions!
# Define the Input Field the module expects
english_sentence = dspy.InputField(desc="The original sentence in English")
# Define the Output Field the module should produce
french_sentence = dspy.OutputField(desc="The translated sentence in French")
```
Let's break this down:
1. **`class TranslateToFrench(dspy.Signature):`**: We declare a new class named `TranslateToFrench` that inherits from `dspy.Signature`. This tells DSPy it's a signature definition.
2. **`"""Translates English text to French."""`**: This is the **docstring**. It's crucial! DSPy uses this docstring as the natural language **Instructions** for the LM. It tells the LM the *goal* of the task.
3. **`english_sentence = dspy.InputField(...)`**: We define an input field named `english_sentence`. `dspy.InputField` marks this as required input. The `desc` provides a helpful description (good for documentation and potentially useful for the LM later).
4. **`french_sentence = dspy.OutputField(...)`**: We define an output field named `french_sentence`. `dspy.OutputField` marks this as the expected output. The `desc` describes what this field should contain.
That's it! We've created a reusable "recipe card" that clearly defines our translation task.
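If you're curious, you can already peek at what DSPy extracted from this recipe card. The properties used below (`instructions`, `input_fields`, `output_fields`) are the ones described in the "Under the Hood" section later in this chapter.
```python
# Inspect what the signature declares (property names per the section below).
print(TranslateToFrench.instructions)                 # "Translates English text to French."
print(list(TranslateToFrench.input_fields.keys()))    # ['english_sentence']
print(list(TranslateToFrench.output_fields.keys()))   # ['french_sentence']
```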
## How Modules Use Signatures
Now, how does a `Module` like `dspy.Predict` use this `TranslateToFrench` signature?
`dspy.Predict` is a pre-built module designed to take a signature and use an LM to generate the output fields based on the input fields and instructions.
Here's how you might use our signature with `dspy.Predict` (we'll cover `dspy.Predict` in detail in [Chapter 4](04_predict.md)):
```python
# Assume 'lm' is a configured Language Model client (more in Chapter 5)
# lm = dspy.OpenAI(model='gpt-3.5-turbo')
# dspy.settings.configure(lm=lm)
# Create an instance of dspy.Predict, giving it our Signature
translator = dspy.Predict(TranslateToFrench)
# Call the predictor with the required input field
english = "Hello, how are you?"
result = translator(english_sentence=english)
# The result object will contain the output field defined in the signature
print(f"English: {english}")
# Assuming the LM works correctly, it might print:
# print(f"French: {result.french_sentence}") # => French: Bonjour, comment ça va?
```
In this (slightly simplified) example:
1. `translator = dspy.Predict(TranslateToFrench)`: We create a `Predict` module. Crucially, we pass our `TranslateToFrench` **class** itself to it. `dspy.Predict` now knows the input/output fields and the instructions from the signature.
2. `result = translator(english_sentence=english)`: When we call the `translator`, we provide the input data using the exact name defined in our signature (`english_sentence`).
3. `result.french_sentence`: `dspy.Predict` uses the LM, guided by the signature's instructions and fields, to generate the output. It then returns an object where you can access the generated French text using the output field name (`french_sentence`).
The `Signature` acts as the bridge, ensuring the `Predict` module knows its job specification.
## How It Works Under the Hood (A Peek)
You don't need to memorize this, but understanding the flow helps! When a module like `dspy.Predict` uses a `Signature`:
1. **Inspection:** The module looks at the `Signature` class (`TranslateToFrench` in our case).
2. **Extract Info:** It identifies the `InputField`s (`english_sentence`), `OutputField`s (`french_sentence`), and the `Instructions` (the docstring: `"Translates English text to French."`).
3. **Prompt Formatting:** When you call the module (e.g., `translator(english_sentence="Hello")`), it uses this information to build a prompt for the [LM](05_lm__language_model_client_.md). This prompt typically includes:
* The **Instructions**.
* Clearly labeled **Input Fields** and their values.
* Clearly labeled **Output Fields** (often just the names, indicating what the LM should generate).
4. **LM Call:** The formatted prompt is sent to the configured LM.
5. **Parsing Output:** The LM's response is received. DSPy tries to parse this response to extract the values for the defined `OutputField`s (like `french_sentence`).
6. **Return Result:** A structured result object containing the parsed outputs is returned.
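The exact template depends on the DSPy version and the adapter in use, but conceptually the prompt assembled in step 3 looks something like this (illustrative only):
```python
# Illustrative only: a conceptual stand-in for the prompt built in step 3.
conceptual_prompt = (
    "Translates English text to French.\n"
    "---\n"
    "English Sentence: Hello, how are you?\n"
    "French Sentence:"   # the LM is asked to fill in this field
)
```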
Let's visualize this flow:
```mermaid
sequenceDiagram
participant User
participant PredictModule as dspy.Predict(TranslateToFrench)
participant Signature as TranslateToFrench
participant LM as Language Model
User->>PredictModule: Call with english_sentence="Hello"
PredictModule->>Signature: Get Instructions, Input/Output Fields
Signature-->>PredictModule: Return structure ("Translates...", "english_sentence", "french_sentence")
PredictModule->>LM: Send formatted prompt (e.g., "Translate...\nEnglish: Hello\nFrench:")
LM-->>PredictModule: Return generated text (e.g., "Bonjour")
PredictModule->>Signature: Parse LM output into 'french_sentence' field
Signature-->>PredictModule: Return structured output {french_sentence: "Bonjour"}
PredictModule-->>User: Return structured output (Prediction object)
```
The core logic for defining signatures resides in:
* `dspy/signatures/signature.py`: Defines the base `Signature` class and the logic for handling instructions and fields.
* `dspy/signatures/field.py`: Defines `InputField` and `OutputField`.
Modules like `dspy.Predict` (in `dspy/predict/predict.py`) contain the code to *read* these Signatures and interact with LMs accordingly.
```python
# Simplified view inside dspy/signatures/signature.py
from pydantic import BaseModel
from pydantic.fields import FieldInfo
# ... other imports ...
class SignatureMeta(type(BaseModel)):
# Metaclass magic to handle fields and docstring
def __new__(mcs, name, bases, namespace, **kwargs):
# ... logic to find fields, handle docstring ...
cls = super().__new__(mcs, name, bases, namespace, **kwargs)
cls.__doc__ = cls.__doc__ or _default_instructions(cls) # Default instructions if none provided
# ... logic to validate fields ...
return cls
@property
def instructions(cls) -> str:
# Retrieves the docstring as instructions
return inspect.cleandoc(getattr(cls, "__doc__", ""))
@property
def input_fields(cls) -> dict[str, FieldInfo]:
# Finds fields marked as input
return cls._get_fields_with_type("input")
@property
def output_fields(cls) -> dict[str, FieldInfo]:
# Finds fields marked as output
return cls._get_fields_with_type("output")
class Signature(BaseModel, metaclass=SignatureMeta):
# The base class you inherit from
pass
# Simplified view inside dspy/signatures/field.py
import pydantic
def InputField(**kwargs):
# Creates a Pydantic field marked as input for DSPy
return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="input"))
def OutputField(**kwargs):
# Creates a Pydantic field marked as output for DSPy
return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="output"))
```
The key takeaway is that the `Signature` class structure (using `InputField`, `OutputField`, and the docstring) provides a standardized way for modules to understand the task specification.
## Conclusion
You've now learned about `Signatures`, the essential component for defining *what* a DSPy module should do!
* A `Signature` specifies the **Inputs**, **Outputs**, and **Instructions** for a task.
* It acts like a contract or recipe card for modules, especially those using LMs.
* You typically define them by subclassing `dspy.Signature`, using `InputField`, `OutputField`, and a descriptive **docstring** for instructions.
* Modules like `dspy.Predict` use Signatures to understand the task and generate appropriate prompts for the LM.
Signatures bring clarity and structure to LM interactions. But how do we provide concrete examples to help the LM learn or perform better? That's where `Examples` come in!
**Next:** [Chapter 3: Example](03_example.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
229
output/DSPy/03_example.md Normal file
View File
@@ -0,0 +1,229 @@
# Chapter 3: Example - Your Data Points
In [Chapter 2: Signature](02_signature.md), we learned how to define the *task* for a DSPy module using `Signatures`, specifying the inputs, outputs, and instructions. It's like writing a recipe card.
But sometimes, just giving instructions isn't enough. Imagine teaching someone to translate by just giving the rule "Translate English to French". They might struggle! It often helps to show them a few *examples* of correct translations.
This is where **`dspy.Example`** comes in! It's how you represent individual data points or examples within DSPy.
Think of a `dspy.Example` as:
* **A Single Row:** Like one row in a spreadsheet or database table.
* **A Flashcard:** Holding a specific question and its answer, or an input and its desired output.
* **A Test Case:** A concrete instance of the task defined by your `Signature`.
In this chapter, we'll learn:
* What a `dspy.Example` is and how it stores data.
* How to create `Example` objects.
* Why `Example`s are essential for few-shot learning, training, and evaluation.
* How to mark specific fields as inputs using `.with_inputs()`.
Let's dive into representing our data!
## What is a `dspy.Example`?
A `dspy.Example` is a fundamental data structure in DSPy designed to hold the information for a single instance of your task. It essentially acts like a flexible container (similar to a Python dictionary) where you store key-value pairs.
Crucially, the **keys** in your `Example` should generally match the **field names** you defined in your [Signature](02_signature.md).
Let's revisit our `TranslateToFrench` signature from Chapter 2:
```python
# From Chapter 2
import dspy
from dspy.signatures.field import InputField, OutputField
class TranslateToFrench(dspy.Signature):
"""Translates English text to French."""
english_sentence = dspy.InputField(desc="The original sentence in English")
french_sentence = dspy.OutputField(desc="The translated sentence in French")
```
This signature has two fields: `english_sentence` (input) and `french_sentence` (output).
An `Example` representing one instance of this task would need to contain values for these keys.
## Creating an Example
Creating a `dspy.Example` is straightforward. You can initialize it with keyword arguments, where the argument names match the fields you care about (usually your Signature fields).
```python
import dspy
# Create an example for our translation task
example1 = dspy.Example(
english_sentence="Hello, world!",
french_sentence="Bonjour le monde!"
)
# You can access the values like attributes
print(f"English: {example1.english_sentence}")
print(f"French: {example1.french_sentence}")
```
**Output:**
```
English: Hello, world!
French: Bonjour le monde!
```
See? `example1` now holds one complete data point for our translation task. It bundles the input (`english_sentence`) and the corresponding desired output (`french_sentence`) together.
You can also create examples from dictionaries:
```python
data_dict = {
"english_sentence": "How are you?",
"french_sentence": "Comment ça va?"
}
example2 = dspy.Example(data_dict)
print(f"Example 2 English: {example2.english_sentence}")
```
**Output:**
```
Example 2 English: How are you?
```
## Why Use Examples? The Three Main Roles
`Example` objects are the standard way DSPy handles data, and they are used in three critical ways:
1. **Few-Shot Demonstrations:** When using modules like `dspy.Predict` (which we'll see in [Chapter 4: Predict](04_predict.md)), you can provide a few `Example` objects directly in the prompt sent to the Language Model (LM). This shows the LM *exactly* how to perform the task, often leading to much better results than instructions alone. It's like showing the chef pictures of the final dish alongside the recipe.
2. **Training Data:** When you want to *optimize* your DSPy program (e.g., automatically find the best prompts or few-shot examples), you use **Teleprompters** ([Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md)). Teleprompters require a training set, which is simply a list of `dspy.Example` objects representing the tasks you want your program to learn to do well.
3. **Evaluation Data:** How do you know if your DSPy program is working correctly? You test it on a dataset! The `dspy.evaluate` module ([Chapter 7: Evaluate](07_evaluate.md)) takes a list of `dspy.Example` objects (your test set or development set) and measures your program's performance against the expected outputs (labels) in those examples.
In all these cases, `dspy.Example` provides a consistent way to package and manage your data points.
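As a tiny illustration of roles 2 and 3, a training set or dev set is literally just a Python list of `Example` objects. The `.with_inputs()` call used here is explained in the next section.
```python
import dspy

# A miniature training set and dev set for the translation task.
trainset = [
    dspy.Example(english_sentence="Good night.", french_sentence="Bonne nuit.").with_inputs("english_sentence"),
    dspy.Example(english_sentence="Thank you.", french_sentence="Merci.").with_inputs("english_sentence"),
]
devset = [
    dspy.Example(english_sentence="See you soon.", french_sentence="À bientôt.").with_inputs("english_sentence"),
]
```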
## Marking Inputs: `.with_inputs()`
Often, especially during training and evaluation, DSPy needs to know which fields in your `Example` represent the *inputs* to your program and which represent the *outputs* or *labels* (the ground truth answers).
The `.with_inputs()` method allows you to explicitly mark certain keys as input fields. This method returns a *new* `Example` object with this input information attached, leaving the original unchanged.
Let's mark `english_sentence` as the input for our `example1`:
```python
# Our original example
example1 = dspy.Example(
english_sentence="Hello, world!",
french_sentence="Bonjour le monde!"
)
# Mark 'english_sentence' as the input field
input_marked_example = example1.with_inputs("english_sentence")
# Let's check the inputs and labels (non-inputs)
print(f"Inputs: {input_marked_example.inputs()}")
print(f"Labels: {input_marked_example.labels()}")
```
**Output:**
```
Inputs: Example({'english_sentence': 'Hello, world!'}) (input_keys={'english_sentence'})
Labels: Example({'french_sentence': 'Bonjour le monde!'}) (input_keys=set())
```
Notice:
* `.with_inputs("english_sentence")` didn't change `example1`. It created `input_marked_example`.
* `input_marked_example.inputs()` returns a new `Example` containing only the fields marked as inputs.
* `input_marked_example.labels()` returns a new `Example` containing the remaining fields (the outputs/labels).
This distinction is vital for evaluation (comparing predictions against labels) and optimization (knowing what the program receives vs. what it should produce). Datasets loaded within DSPy often automatically handle marking inputs for you based on common conventions.
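Here is a rough sketch of how that split is typically used when checking a program's answers; `some_program` is a hypothetical module (we build real ones in [Chapter 4](04_predict.md)), and the real evaluation helpers are covered in [Chapter 7: Evaluate](07_evaluate.md).
```python
# Sketch: feed only the inputs to the program, compare against the labels.
program_inputs = dict(input_marked_example.inputs().items())  # {'english_sentence': 'Hello, world!'}
gold = input_marked_example.labels()                          # holds 'french_sentence'

# prediction = some_program(**program_inputs)                   # hypothetical program call
# correct = (prediction.french_sentence == gold.french_sentence)
```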
## How It Works Under the Hood (A Peek)
The `dspy.Example` object is fundamentally quite simple. It's designed to behave much like a Python dictionary but with some added conveniences like attribute-style access (`example.field`) and the special `.with_inputs()` method.
1. **Storage:** Internally, an `Example` uses a dictionary (often named `_store`) to hold all the key-value pairs you provide.
```python
# Conceptual internal structure
example = dspy.Example(question="What is DSPy?", answer="A framework...")
# example._store == {'question': 'What is DSPy?', 'answer': 'A framework...'}
```
2. **Attribute Access:** When you access `example.question`, Python's magic methods (`__getattr__`) look up `'question'` in the internal `_store`. Similarly, setting `example.new_field = value` uses `__setattr__` to update the `_store`.
3. **`.with_inputs()`:** This method creates a *copy* of the current `Example`'s `_store`. It then stores the provided input keys (like `{'english_sentence'}`) in a separate internal attribute (like `_input_keys`) on the *new* copied object. It doesn't modify the original `Example`.
4. **`.inputs()` and `.labels()`:** These methods check the `_input_keys` attribute. `.inputs()` creates a new `Example` containing only the key-value pairs whose keys are *in* `_input_keys`. `.labels()` creates a new `Example` containing the key-value pairs whose keys are *not* in `_input_keys`.
Let's look at a simplified view of the code from `dspy/primitives/example.py`:
```python
# Simplified view from dspy/primitives/example.py
class Example:
def __init__(self, base=None, **kwargs):
self._store = {} # The internal dictionary
self._input_keys = None # Stores the input keys after with_inputs()
# Simplified: Copy from base or dictionary if provided
if base and isinstance(base, dict): self._store = base.copy()
# Simplified: Update with keyword arguments
self._store.update(kwargs)
# Allows accessing self.key like dictionary lookup self._store[key]
def __getattr__(self, key):
if key in self._store: return self._store[key]
raise AttributeError(f"No attribute '{key}'")
# Allows setting self.key like dictionary assignment self._store[key] = value
def __setattr__(self, key, value):
if key.startswith("_"): super().__setattr__(key, value) # Handle internal attributes
else: self._store[key] = value
# Allows dictionary-style access example[key]
def __getitem__(self, key): return self._store[key]
# Creates a *copy* and marks input keys on the copy.
def with_inputs(self, *keys):
copied = self.copy() # Make a shallow copy
copied._input_keys = set(keys) # Store the input keys on the copy
return copied
# Returns a new Example containing only input fields.
def inputs(self):
if self._input_keys is None: raise ValueError("Inputs not set.")
# Create a dict with only input keys
input_dict = {k: v for k, v in self._store.items() if k in self._input_keys}
# Return a new Example wrapping this dict
return type(self)(base=input_dict).with_inputs(*self._input_keys)
# Returns a new Example containing only non-input fields (labels).
def labels(self):
input_keys = self.inputs().keys() if self._input_keys else set()
# Create a dict with only non-input keys
label_dict = {k: v for k, v in self._store.items() if k not in input_keys}
# Return a new Example wrapping this dict
return type(self)(base=label_dict)
# Helper to create a copy
def copy(self, **kwargs):
return type(self)(base=self, **kwargs)
# ... other helpful methods like keys(), values(), items(), etc. ...
```
The key idea is that `dspy.Example` provides a convenient and standardized wrapper around your data points, making it easy to use them for few-shot examples, training, and evaluation, while also allowing you to specify which parts are inputs versus labels.
## Conclusion
You've now mastered `dspy.Example`, the way DSPy represents individual data points!
* An `Example` holds key-value pairs, like a **row in a spreadsheet** or a **flashcard**.
* Its keys typically correspond to the fields defined in a [Signature](02_signature.md).
* `Example`s are essential for providing **few-shot demonstrations**, **training data** for optimizers ([Teleprompter / Optimizer](08_teleprompter___optimizer.md)), and **evaluation data** for testing ([Evaluate](07_evaluate.md)).
* The `.with_inputs()` method lets you mark which fields are inputs, crucial for distinguishing inputs from labels.
Now that we have `Signatures` to define *what* task to do, and `Examples` to hold the *data* for that task, how do we actually get a Language Model to *do* the task based on the signature? That's the job of the `dspy.Predict` module!
**Next:** [Chapter 4: Predict](04_predict.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
249
output/DSPy/04_predict.md Normal file
View File
@@ -0,0 +1,249 @@
# Chapter 4: Predict - The Basic LM Caller
In [Chapter 3: Example](03_example.md), we learned how to create `dspy.Example` objects to represent our data points, like flashcards holding an input and its corresponding desired output. We also saw in [Chapter 2: Signature](02_signature.md) how to define the *task* itself using `dspy.Signature`.
Now, we have the recipe (`Signature`) and some sample dishes (`Example`s). How do we actually get the chef (our Language Model or LM) to cook? How do we combine the instructions from the `Signature` and maybe some `Example`s to prompt the LM and get a result back?
This is where **`dspy.Predict`** comes in! It's the most fundamental way in DSPy to make a single call to a Language Model.
Think of `dspy.Predict` as:
* **A Basic Request:** Like asking the LM to do *one specific thing* based on instructions.
* **The Workhorse:** It handles formatting the input, calling the LM, and extracting the answer.
* **A Single Lego Brick:** It's the simplest "thinking" block in DSPy, directly using the LM's power.
In this chapter, we'll learn:
* What `dspy.Predict` does.
* How to use it with a `Signature`.
* How it turns your instructions and data into an LM call.
* How to get the generated output.
Let's make our first LM call!
## What is `dspy.Predict`?
`dspy.Predict` is a DSPy [Module](01_module___program.md). Its job is simple but essential:
1. **Takes a `Signature`:** When you create a `dspy.Predict` module, you tell it which `Signature` to use. This tells `Predict` what inputs to expect, what outputs to produce, and the instructions for the LM.
2. **Receives Inputs:** When you call the `Predict` module, you provide the input data (matching the `Signature`'s input fields).
3. **Formats a Prompt:** It combines the `Signature`'s instructions, the input data you provided, and potentially some `Example`s (called demonstrations or "demos") into a text prompt suitable for an LM.
4. **Calls the LM:** It sends this carefully crafted prompt to the configured Language Model ([Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md)).
5. **Parses the Output:** It takes the LM's generated text response and tries to extract the specific pieces of information defined by the `Signature`'s output fields.
6. **Returns a `Prediction`:** It gives you back a structured object (a `dspy.Prediction`) containing the extracted output fields.
It's the core mechanism for executing a single, defined prediction task using an LM.
## Using `dspy.Predict`
Let's use our `TranslateToFrench` signature from Chapter 2 to see `dspy.Predict` in action.
**1. Define the Signature (Recap):**
```python
import dspy
from dspy.signatures.field import InputField, OutputField
class TranslateToFrench(dspy.Signature):
"""Translates English text to French."""
english_sentence = dspy.InputField(desc="The original sentence in English")
french_sentence = dspy.OutputField(desc="The translated sentence in French")
```
This signature tells our module it needs `english_sentence` and should produce `french_sentence`, following the instruction "Translates English text to French."
**2. Configure the Language Model (A Sneak Peek):**
Before using `Predict`, DSPy needs to know *which* LM to talk to (like OpenAI's GPT-3.5, a local model, etc.). We'll cover this fully in [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md), but here's a quick example:
```python
# Assume you have an OpenAI API key configured
# We'll explain this properly in the next chapter!
gpt3_turbo = dspy.OpenAI(model='gpt-3.5-turbo')
dspy.settings.configure(lm=gpt3_turbo)
```
This tells DSPy to use the `gpt-3.5-turbo` model for any LM calls.
**3. Create and Use `dspy.Predict`:**
Now we can create our translator module using `dspy.Predict` and our signature.
```python
# Create a Predict module using our signature
translator = dspy.Predict(TranslateToFrench)
# Prepare the input data
english_input = "Hello, how are you?"
# Call the predictor with the input field name from the signature
result = translator(english_sentence=english_input)
# Access the output field name from the signature
print(f"English: {english_input}")
print(f"French: {result.french_sentence}")
```
**What happens here?**
1. `translator = dspy.Predict(TranslateToFrench)`: We create an instance of `Predict`, telling it to use the `TranslateToFrench` signature.
2. `result = translator(english_sentence=english_input)`: We *call* the `translator` module like a function. We pass the input using the keyword argument `english_sentence`, which matches the `InputField` name in our signature.
3. `result.french_sentence`: `Predict` works its magic! It builds a prompt (using the signature's instructions and the input), sends it to GPT-3.5 Turbo, gets the French translation back, parses it, and stores it in the `result` object. We access the translation using the `OutputField` name, `french_sentence`.
**Expected Output (might vary slightly based on the LM):**
```
English: Hello, how are you?
French: Bonjour, comment ça va?
```
It worked! `dspy.Predict` successfully used the LM to perform the translation task defined by our signature.
## Giving Examples (Few-Shot Learning)
Sometimes, just instructions aren't enough for the LM to understand the *exact format* or style you want. You can provide a few examples (`dspy.Example` objects from [Chapter 3: Example](03_example.md)) to guide it better. This is called "few-shot learning".
You pass these examples using the `demos` argument when calling the `Predict` module.
```python
# Create some example translations (from Chapter 3)
demo1 = dspy.Example(english_sentence="Good morning!", french_sentence="Bonjour!")
demo2 = dspy.Example(english_sentence="Thank you.", french_sentence="Merci.")
# Our translator module (same as before)
translator = dspy.Predict(TranslateToFrench)
# Input we want to translate
english_input = "See you later."
# Call the predictor, this time providing demos
result_with_demos = translator(
english_sentence=english_input,
demos=[demo1, demo2] # Pass our examples here!
)
print(f"English: {english_input}")
print(f"French (with demos): {result_with_demos.french_sentence}")
```
**What's different?**
* We created `demo1` and `demo2`, which are `dspy.Example` objects containing both the English and French sentences.
* We passed `demos=[demo1, demo2]` when calling `translator`.
Now, `dspy.Predict` will format the prompt to include these examples *before* asking the LM to translate the new input. This often leads to more accurate or better-formatted results, especially for complex tasks.
**Expected Output (likely similar, but potentially more consistent):**
```
English: See you later.
French (with demos): À plus tard.
```
## How It Works Under the Hood
What actually happens when you call `translator(english_sentence=...)`?
1. **Gather Information:** The `Predict` module (`translator`) gets the input value (`"Hello, how are you?"`) and any `demos` provided. It already knows its `Signature` (`TranslateToFrench`).
2. **Format Prompt:** It constructs a text prompt for the LM. This prompt usually includes:
* The `Signature`'s instructions (`"Translates English text to French."`).
* The `demos` (if provided), formatted clearly (e.g., "English: Good morning!\nFrench: Bonjour!\n---\nEnglish: Thank you.\nFrench: Merci.\n---").
* The current input, labeled according to the `Signature` (`"English: Hello, how are you?"`).
* A label indicating where the LM should put its answer (`"French:"`).
3. **LM Call:** The `Predict` module sends this complete prompt string to the configured [LM](05_lm__language_model_client_.md) (e.g., GPT-3.5 Turbo).
4. **Receive Completion:** The LM generates text based on the prompt (e.g., it might return `"Bonjour, comment ça va?"`).
5. **Parse Output:** `Predict` looks at the `Signature`'s `OutputField`s (`french_sentence`). It parses the LM's completion to extract the value corresponding to `french_sentence`.
6. **Return Prediction:** It bundles the extracted output(s) into a `dspy.Prediction` object and returns it. You can then access the results like `result.french_sentence`.
Let's visualize this flow:
```mermaid
sequenceDiagram
participant User
participant PredictModule as translator (Predict)
participant Signature as TranslateToFrench
participant LM as Language Model Client
User->>PredictModule: Call with english_sentence="Hello", demos=[...]
PredictModule->>Signature: Get Instructions, Input/Output Fields
Signature-->>PredictModule: Return structure ("Translate...", "english_sentence", "french_sentence")
PredictModule->>PredictModule: Format prompt (Instructions + Demos + Input + Output Label)
PredictModule->>LM: Send formatted prompt ("Translate...\nEnglish: ...\nFrench: ...\n---\nEnglish: Hello\nFrench:")
LM-->>PredictModule: Return completion text ("Bonjour, comment ça va?")
PredictModule->>Signature: Parse completion for 'french_sentence'
Signature-->>PredictModule: Return parsed value {"french_sentence": "Bonjour, comment ça va?"}
PredictModule-->>User: Return Prediction object (result)
```
The core logic resides in `dspy/predict/predict.py`.
```python
# Simplified view from dspy/predict/predict.py
from dspy.primitives.program import Module
from dspy.primitives.prediction import Prediction
from dspy.signatures.signature import ensure_signature
from dspy.dsp.utils import settings # To get the configured LM
class Predict(Module):
def __init__(self, signature, **config):
super().__init__()
# Store the signature and any extra configuration
self.signature = ensure_signature(signature)
self.config = config
# Other initializations (demos, etc.)
self.demos = []
self.lm = None # LM will be set later or taken from settings
def forward(self, **kwargs):
# Get signature, demos, and LM (either passed in or from settings)
signature = self.signature # Use the stored signature
demos = kwargs.pop("demos", self.demos) # Get demos if provided
lm = kwargs.pop("lm", self.lm) or settings.lm # Find the LM to use
# Prepare inputs for the LM call
inputs = kwargs # Remaining kwargs are the inputs
# --- This is where the magic happens ---
# 1. Format the prompt using signature, demos, inputs
# (Simplified - actual formatting is more complex)
prompt = format_prompt(signature, demos, inputs)
# 2. Call the Language Model
# (Simplified - handles retries, multiple generations etc.)
lm_output_text = lm(prompt, **self.config)
# 3. Parse the LM's output text based on the signature's output fields
# (Simplified - extracts fields like 'french_sentence')
parsed_output = parse_output(signature, lm_output_text)
# --- End Magic ---
# 4. Create and return a Prediction object
prediction = Prediction(signature=signature, **parsed_output)
# (Optionally trace the call)
# settings.trace.append(...)
return prediction
# (Helper functions format_prompt and parse_output would exist elsewhere)
```
This simplified code shows the key steps: initialize with a signature, and in the `forward` method, use the signature, demos, and inputs to format a prompt, call the LM, parse the output, and return a `Prediction`. The `dspy.Prediction` object itself (defined in `dspy/primitives/prediction.py`) is essentially a specialized container holding the results corresponding to the signature's output fields.
## Conclusion
You've now learned about `dspy.Predict`, the fundamental building block in DSPy for making a single call to a Language Model!
* `dspy.Predict` takes a `Signature` to understand the task (inputs, outputs, instructions).
* It formats a prompt, calls the LM, and parses the response.
* You call it like a function, passing inputs that match the `Signature`'s `InputField`s.
* It returns a `dspy.Prediction` object containing the results, accessible via the `Signature`'s `OutputField` names.
* You can provide few-shot `Example`s via the `demos` argument to guide the LM.
`Predict` is the simplest way to leverage an LM in DSPy. But how do we actually connect DSPy to different LMs like those from OpenAI, Anthropic, Cohere, or even models running on your own machine? That's what we'll explore next!
**Next:** [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
View File
@@ -0,0 +1,304 @@
# Chapter 5: LM (Language Model Client) - The Engine Room
In [Chapter 4: Predict](04_predict.md), we saw how `dspy.Predict` takes a [Signature](02_signature.md) and input data to magically generate an output. We used our `translator` example:
```python
# translator = dspy.Predict(TranslateToFrench)
# result = translator(english_sentence="Hello, how are you?")
# print(result.french_sentence) # --> Bonjour, comment ça va?
```
But wait... how did `dspy.Predict` *actually* produce that French sentence? It didn't just invent it! It needed to talk to a powerful Language Model (LM) like GPT-3.5, GPT-4, Claude, Llama, or some other AI brain.
How does DSPy connect your program (`dspy.Predict` in this case) to these external AI brains? That's the job of the **LM (Language Model Client)** abstraction!
Think of the LM Client as:
* **The Engine:** It's the core component that provides the "thinking" power to your DSPy modules.
* **The Translator:** It speaks the specific language (API calls, parameters) required by different LM providers (like OpenAI, Anthropic, Cohere, Hugging Face, or models running locally).
* **The Connection:** It bridges the gap between your abstract DSPy code and the concrete LM service.
In this chapter, you'll learn:
* What the LM Client does and why it's crucial.
* How to tell DSPy which Language Model to use.
* How this setup lets you easily switch between different LMs.
* A peek under the hood at how the connection works.
Let's connect our program to an AI brain!
## What Does the LM Client Do?
When a module like `dspy.Predict` needs an LM to generate text, it doesn't make the raw API call itself. Instead, it relies on the configured **LM Client**. The LM Client handles several important tasks:
1. **API Interaction:** It knows how to format the request (the prompt, parameters like `temperature`, `max_tokens`) in the exact way the target LM provider expects. It then makes the actual network call to the provider's API (or interacts with a local model).
2. **Parameter Management:** You can set standard parameters like `temperature` (controlling randomness) or `max_tokens` (limiting output length) when you configure the LM Client. It ensures these are sent correctly with each request.
3. **Authentication:** It usually handles sending your API keys securely (often by reading them from environment variables).
4. **Retries:** If an API call fails due to a temporary issue (like a network glitch or the LM service being busy), the LM Client often automatically retries the request a few times.
5. **Standard Interface:** It provides a consistent way for DSPy modules (`Predict`, `ChainOfThought`, etc.) to interact with *any* supported LM. This means you can swap the underlying LM without changing your module code.
6. **Caching:** To save time and money, the LM Client usually caches responses. If you make the exact same request again, it can return the saved result instantly instead of calling the LM API again.
Essentially, the LM Client abstracts away all the messy details of talking to different AI models, giving your DSPy program a clean and consistent engine to rely on.
## Configuring Which LM to Use
So, how do you tell DSPy *which* LM engine to use? You do this using `dspy.settings.configure`.
First, you need to import and create an instance of the specific client for your desired LM provider. DSPy integrates with many models primarily through the `litellm` library, but also provides direct wrappers for common ones like OpenAI.
**Example: Configuring OpenAI's GPT-3.5 Turbo**
Let's say you want to use OpenAI's `gpt-3.5-turbo` model.
1. **Import the client:**
```python
import dspy
```
*(Note: For many common providers like OpenAI, Anthropic, Cohere, etc., you can use the general `dspy.LM` client which leverages `litellm`)*
2. **Create an instance:** You specify the model name. API keys are typically picked up automatically from environment variables (e.g., `OPENAI_API_KEY`). You can also set default parameters here.
```python
# Use the generic dspy.LM for LiteLLM integration
# Model name follows 'provider/model_name' format for many models
turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=100)
# Or, if you prefer the dedicated OpenAI client wrapper (functionally similar for basic use)
# from dspy.models.openai import OpenAI
# turbo = OpenAI(model='gpt-3.5-turbo', max_tokens=100)
```
This creates an object `turbo` that knows how to talk to the `gpt-3.5-turbo` model via OpenAI's API (using `litellm`'s connection logic) and will limit responses to 100 tokens by default.
3. **Configure DSPy settings:** You tell DSPy globally that this is the LM engine to use for subsequent calls.
```python
dspy.settings.configure(lm=turbo)
```
That's it! Now, any DSPy module (like `dspy.Predict`) that needs to call an LM will automatically use the `turbo` instance we just configured.
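You can also call the client directly to sanity-check the connection before wiring it into modules. A hedged sketch: calling a `dspy.LM` instance returns a list of completion strings, and keyword arguments passed at call time override the defaults set at construction (mirroring the simplified `forward` shown later in this chapter).

```python
# Quick sanity check: call the configured client directly.
completions = turbo("Say hello in French.")
print(completions[0])  # e.g. "Bonjour !"

# Per-call keyword arguments override the construction-time defaults,
# so this call uses a higher temperature than usual.
creative = turbo("Say hello in French.", temperature=0.9)
print(creative[0])
```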
**Using Other Models (via `dspy.LM` and LiteLLM)**
The `dspy.LM` client is very powerful because it uses `litellm` under the hood, which supports a vast number of models from providers like Anthropic, Cohere, Google, Hugging Face, Ollama (for local models), and more. You generally just need to change the `model` string.
```python
# Example: Configure Anthropic's Claude 3 Haiku
# (Assumes ANTHROPIC_API_KEY environment variable is set)
# Note: Provider prefix 'anthropic/' is often optional if model name is unique
claude_haiku = dspy.LM(model='anthropic/claude-3-haiku-20240307', max_tokens=200)
dspy.settings.configure(lm=claude_haiku)
# Now DSPy modules will use Claude 3 Haiku
# Example: Configure a local model served via Ollama
# (Assumes Ollama server is running and has the 'llama3' model)
local_llama = dspy.LM(model='ollama/llama3', max_tokens=500, temperature=0.7)
dspy.settings.configure(lm=local_llama)
# Now DSPy modules will use the local Llama 3 model via Ollama
```
You only need to configure the LM **once** (usually at the start of your script).
## How Modules Use the Configured LM
Remember our `translator` module from [Chapter 4: Predict](04_predict.md)?
```python
# Define signature (same as before)
class TranslateToFrench(dspy.Signature):
"""Translates English text to French."""
english_sentence = dspy.InputField()
french_sentence = dspy.OutputField()
# Configure the LM (e.g., using OpenAI)
# turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=100)
# dspy.settings.configure(lm=turbo)
# Create the Predict module
translator = dspy.Predict(TranslateToFrench)
# Use the module - NO need to pass the LM here!
result = translator(english_sentence="Hello, how are you?")
print(result.french_sentence)
```
Notice that we didn't pass `turbo` or `claude_haiku` or `local_llama` directly to `dspy.Predict`. When `translator(...)` is called, `dspy.Predict` internally asks `dspy.settings` for the currently configured `lm`. It then uses that client object to handle the actual LM interaction.
## The Power of Swapping LMs
This setup makes it incredibly easy to experiment with different language models. Want to see if Claude does a better job at translation than GPT-3.5? Just change the configuration!
```python
# --- Experiment 1: Using GPT-3.5 Turbo ---
print("Testing with GPT-3.5 Turbo...")
turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=100)
dspy.settings.configure(lm=turbo)
translator = dspy.Predict(TranslateToFrench)
result_turbo = translator(english_sentence="Where is the library?")
print(f"GPT-3.5: {result_turbo.french_sentence}")
# --- Experiment 2: Using Claude 3 Haiku ---
print("\nTesting with Claude 3 Haiku...")
claude_haiku = dspy.LM(model='anthropic/claude-3-haiku-20240307', max_tokens=100)
dspy.settings.configure(lm=claude_haiku)
# We can reuse the SAME translator object, or create a new one
# It will pick up the NEWLY configured LM from settings
result_claude = translator(english_sentence="Where is the library?")
print(f"Claude 3 Haiku: {result_claude.french_sentence}")
```
**Expected Output:**
```
Testing with GPT-3.5 Turbo...
GPT-3.5: Où est la bibliothèque?
Testing with Claude 3 Haiku...
Claude 3 Haiku: Où se trouve la bibliothèque ?
```
Look at that! We changed the underlying AI brain just by modifying the `dspy.settings.configure` call. The core logic of our `translator` module remained untouched. This flexibility is a key advantage of DSPy.
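If you'd rather not keep overwriting the global configuration, DSPy's settings also support temporary, scoped overrides (the same `dspy.settings.context` block that appears later in this tutorial). A hedged sketch:

```python
# Keep GPT-3.5 as the global default...
dspy.settings.configure(lm=turbo)

# ...but use Claude 3 Haiku only inside this block.
with dspy.settings.context(lm=claude_haiku):
    result = translator(english_sentence="Good morning!")
    print("Claude 3 Haiku:", result.french_sentence)

# Outside the block, the global default (GPT-3.5) applies again.
result = translator(english_sentence="Good morning!")
print("GPT-3.5:", result.french_sentence)
```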
## How It Works Under the Hood (A Peek)
Let's trace what happens when `translator(english_sentence=...)` runs:
1. **Module Execution:** The `forward` method of the `dspy.Predict` module (`translator`) starts executing.
2. **Get LM Client:** Inside its logic, `Predict` needs to call an LM. It accesses `dspy.settings.lm`. This returns the currently configured LM client object (e.g., the `claude_haiku` instance we set).
3. **Format Prompt:** `Predict` uses the [Signature](02_signature.md) and the input (`english_sentence`) to prepare the text prompt.
4. **LM Client Call:** `Predict` calls the LM client object, passing the formatted prompt and any necessary parameters (like `max_tokens` which might come from the client's defaults or be overridden). Let's say it calls `claude_haiku(prompt, max_tokens=100, ...)`.
5. **API Interaction (Inside LM Client):**
* The `claude_haiku` object (an instance of `dspy.LM`) checks its cache first. If the same request was made recently, it might return the cached response directly.
* If not cached, it constructs the specific API request for Anthropic's Claude 3 Haiku model (using `litellm`). This includes setting headers, API keys, and formatting the prompt/parameters correctly for Anthropic.
* It makes the HTTPS request to the Anthropic API endpoint.
* It handles potential retries if the API returns specific errors.
* It receives the raw response from the API.
6. **Parse Response (Inside LM Client):** The client extracts the generated text content from the API response structure.
7. **Return to Module:** The LM client returns the generated text (e.g., `"Où se trouve la bibliothèque ?"`) back to the `dspy.Predict` module.
8. **Module Finishes:** `Predict` takes this text, parses it according to the `OutputField` (`french_sentence`) in the signature, and returns the final `Prediction` object.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant PredictModule as translator (Predict)
participant Settings as dspy.settings
participant LMClient as LM Client (e.g., dspy.LM instance)
participant ActualAPI as Actual LM API (e.g., Anthropic)
User->>PredictModule: Call translator(english_sentence="...")
PredictModule->>Settings: Get configured lm
Settings-->>PredictModule: Return LMClient instance
PredictModule->>PredictModule: Format prompt for LM
PredictModule->>LMClient: __call__(prompt, **params)
LMClient->>LMClient: Check Cache (Cache Miss)
LMClient->>ActualAPI: Send formatted API request (prompt, key, params)
ActualAPI-->>LMClient: Return API response
LMClient->>LMClient: Parse response, extract text
LMClient-->>PredictModule: Return generated text
PredictModule->>PredictModule: Parse text into output fields
PredictModule-->>User: Return Prediction object
```
**Relevant Code Files:**
* `dspy/clients/lm.py`: Defines the main `dspy.LM` class which uses `litellm` for broad compatibility. It handles caching (in-memory and disk via `litellm`), retries, parameter mapping, and calling the appropriate `litellm` functions.
* `dspy/clients/base_lm.py`: Defines the `BaseLM` abstract base class that all LM clients inherit from. It includes the basic `__call__` structure, history tracking, and requires subclasses to implement the core `forward` method for making the actual API call. It also defines `inspect_history`.
* `dspy/models/openai.py` (and others like `anthropic.py`, `cohere.py` - though `dspy.LM` is often preferred now): Specific client implementations (often inheriting from `BaseLM` or using `dspy.LM` internally).
* `dspy/dsp/utils/settings.py`: Defines the `Settings` singleton object where the configured `lm` (and other components like `rm`) are stored and accessed globally or via thread-local context.
```python
# Simplified structure from dspy/clients/base_lm.py
class BaseLM:
def __init__(self, model, **kwargs):
self.model = model
self.kwargs = kwargs # Default params like temp, max_tokens
self.history = [] # Stores records of calls
@with_callbacks # Handles logging, potential custom hooks
def __call__(self, prompt=None, messages=None, **kwargs):
# 1. Call the actual request logic (implemented by subclasses)
response = self.forward(prompt=prompt, messages=messages, **kwargs)
# 2. Extract the output text(s)
outputs = [choice.message.content for choice in response.choices] # Simplified
# 3. Log the interaction (prompt, response, cost, etc.)
# (self.history.append(...))
# 4. Return the list of generated texts
return outputs
def forward(self, prompt=None, messages=None, **kwargs):
# Subclasses MUST implement this method to make the actual API call
# It should return an object similar to OpenAI's API response structure
raise NotImplementedError
# Simplified structure from dspy/clients/lm.py
import litellm
class LM(BaseLM): # Inherits from BaseLM
def __init__(self, model, model_type="chat", ..., num_retries=8, **kwargs):
super().__init__(model=model, **kwargs)
self.model_type = model_type
self.num_retries = num_retries
# ... other setup ...
def forward(self, prompt=None, messages=None, **kwargs):
# Combine default and call-specific kwargs
request_kwargs = {**self.kwargs, **kwargs}
messages = messages or [{"role": "user", "content": prompt}]
# Use litellm to make the call, handles different providers
# Simplified - handles caching, retries, model types under the hood
if self.model_type == "chat":
response = litellm.completion(
model=self.model,
messages=messages,
# Pass combined parameters
**request_kwargs,
# Configure retries and caching via litellm
num_retries=self.num_retries,
# cache=...
)
else: # Text completion model type
response = litellm.text_completion(...) # Simplified
# LiteLLM returns an object compatible with BaseLM's expectations
return response
# Simplified Usage in a Module (like Predict)
# from dspy.dsp.utils import settings
# Inside Predict's forward method:
# lm_client = settings.lm # Get the globally configured client
# prompt_text = self._generate_prompt(...) # Format the prompt
# parameters = self.config # Get parameters specific to this Predict instance
# generated_texts = lm_client(prompt_text, **parameters) # Call the LM Client!
# output_text = generated_texts[0]
# parsed_result = self._parse_output(output_text) # Parse based on signature
# return Prediction(**parsed_result)
```
The key is that modules interact with the standard `BaseLM` interface (primarily its `__call__` method), and the specific LM client implementation handles the rest.
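Because every client records its calls in `self.history` (and `BaseLM` defines `inspect_history`, as noted above), you can peek at the exact prompt and response of the most recent call. A small sketch, assuming `turbo` is the configured client:

```python
# Run something so there is at least one recorded call.
translator(english_sentence="Good night!")

# Pretty-print the most recent LM interaction (prompt and completion).
turbo.inspect_history(n=1)

# Or look at the raw record yourself; each entry typically stores the
# prompt/messages, the generated outputs, and usage information.
print(turbo.history[-1].keys())
```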
## Conclusion
You've now demystified the **LM (Language Model Client)**! It's the essential engine connecting your DSPy programs to the power of large language models.
* The LM Client acts as a **translator** and **engine**, handling API calls, parameters, retries, and caching.
* You configure which LM to use **globally** via `dspy.settings.configure(lm=...)`, usually using `dspy.LM` for broad compatibility via `litellm`.
* DSPy modules like `dspy.Predict` automatically **use the configured LM** without needing it passed explicitly.
* This makes it easy to **swap out different LMs** (like GPT-4, Claude, Llama) with minimal code changes, facilitating experimentation.
Now that we know how to connect to the "brain" (LM), what about connecting to external knowledge sources like databases or document collections? That's where the **RM (Retrieval Model Client)** comes in.
**Next:** [Chapter 6: RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,345 @@
# Chapter 6: RM (Retrieval Model Client) - Your Program's Librarian
In [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md), we learned how to connect our DSPy programs to the powerful "brain" of a Language Model (LM) using the LM Client. The LM is great at generating creative text, answering questions based on its vast training data, and reasoning.
But what if your program needs information that the LM wasn't trained on?
* Maybe it's very recent news (LMs often have knowledge cut-offs).
* Maybe it's private information from your company's documents.
* Maybe it's specific details from a large technical manual.
LMs can't know *everything*. Sometimes, your program needs to **look things up** in an external knowledge source before it can generate an answer.
Imagine you're building a chatbot that answers questions about your company's latest product manuals. The LM itself probably hasn't read them. Your program needs a way to:
1. Receive the user's question (e.g., "How do I reset the Frobozz device?").
2. **Search** through the product manuals for relevant sections about resetting the Frobozz.
3. Give those relevant sections to the LM as **context**.
4. Ask the LM to generate a final answer based on the user's question *and* the context it just found.
This "search" step is where the **RM (Retrieval Model Client)** comes in!
Think of the RM as:
* **A Specialized Librarian:** Your program asks it to find relevant information on a topic (the query).
* **A Search Engine Interface:** It connects your DSPy program to an external search system or database.
* **The Knowledge Fetcher:** It retrieves relevant text snippets (passages) to help the LM.
In this chapter, you'll learn:
* What an RM Client does and why it's essential for knowledge-intensive tasks.
* How to configure DSPy to use a specific Retrieval Model.
* How DSPy modules can use the configured RM to find information.
* A glimpse into how the RM fetches data behind the scenes.
Let's give our program access to external knowledge!
## What Does the RM Client Do?
The RM Client acts as a bridge between your DSPy program and an external knowledge source. Its main job is to:
1. **Receive a Search Query:** Your program gives it a text query (e.g., "reset Frobozz device").
2. **Interface with a Retrieval System:** It talks to the actual search engine or database. This could be:
* A **Vector Database:** Like Pinecone, Weaviate, Chroma, Milvus (great for searching based on meaning).
* A **Specialized Retrieval API:** Like ColBERTv2 (a powerful neural search model), You.com Search API, or a custom company search API.
* A **Local Index:** A search index built over your own files (e.g., using ColBERT locally).
3. **Fetch Relevant Passages:** It asks the retrieval system to find the top `k` most relevant text documents or passages based on the query.
4. **Return the Passages:** It gives these retrieved passages back to your DSPy program, usually as a list of text strings or structured objects.
The key goal is to provide **relevant context** that the [LM (Language Model Client)](05_lm__language_model_client_.md) can then use to perform its task more accurately, often within a structure called Retrieval-Augmented Generation (RAG).
## Configuring Which RM to Use
Just like we configured the LM in the previous chapter, we need to tell DSPy which RM to use. This is done using `dspy.settings.configure`.
First, you import and create an instance of the specific RM client you want to use. DSPy has built-in clients for several common retrieval systems.
**Example: Configuring ColBERTv2 (a hosted endpoint)**
ColBERTv2 is a powerful retrieval model. Let's imagine there's a public server running ColBERTv2 that has indexed Wikipedia.
1. **Import the client:**
```python
import dspy
```
*(For many RMs like ColBERTv2, Pinecone, Weaviate, the client is directly available under `dspy` or `dspy.retrieve`)*
2. **Create an instance:** You need to provide the URL and port (if applicable) of the ColBERTv2 server.
```python
# Assume a ColBERTv2 server is running at this URL indexing Wikipedia
colbertv2_wiki = dspy.ColBERTv2(url='http://your-colbertv2-endpoint.com:8893', port=None)
```
This creates an object `colbertv2_wiki` that knows how to talk to that specific ColBERTv2 server.
3. **Configure DSPy settings:** Tell DSPy globally that this is the RM to use.
```python
dspy.settings.configure(rm=colbertv2_wiki)
```
Now, any DSPy module that needs to retrieve information will automatically use the `colbertv2_wiki` instance.
**Using Other RMs (e.g., Pinecone, Weaviate)**
Configuring other RMs follows a similar pattern. You'll typically need to provide details like index names, API keys (often via environment variables), and the client object for that specific service.
```python
# Example: Configuring Pinecone (Conceptual - requires setup)
# from dspy.retrieve.pinecone_rm import PineconeRM
# Assumes PINECONE_API_KEY and PINECONE_ENVIRONMENT are set in environment
# pinecone_retriever = PineconeRM(
# pinecone_index_name='my-company-docs-index',
# # Assuming embeddings are done via OpenAI's model
# openai_embed_model='text-embedding-ada-002'
# )
# dspy.settings.configure(rm=pinecone_retriever)
# Example: Configuring Weaviate (Conceptual - requires setup)
# import weaviate
# from dspy.retrieve.weaviate_rm import WeaviateRM
# weaviate_client = weaviate.connect_to_local() # Or connect_to_wcs, etc.
# weaviate_retriever = WeaviateRM(
# weaviate_collection_name='my_manuals',
# weaviate_client=weaviate_client
# )
# dspy.settings.configure(rm=weaviate_retriever)
```
*(Don't worry about the specifics of connecting to Pinecone or Weaviate here; the key takeaway is the `dspy.settings.configure(rm=...)` pattern.)*
## How Modules Use the Configured RM: `dspy.Retrieve`
Usually, you don't call `dspy.settings.rm(...)` directly in your main program logic. Instead, you use a DSPy module designed for retrieval. The most basic one is `dspy.Retrieve`.
The `dspy.Retrieve` module is a simple [Module](01_module___program.md) whose job is to:
1. Take a query as input.
2. Call the currently configured RM (`dspy.settings.rm`).
3. Return the retrieved passages.
Here's how you typically use it within a DSPy `Program`:
```python
import dspy
# Assume RM is already configured (e.g., colbertv2_wiki from before)
# dspy.settings.configure(rm=colbertv2_wiki)
class SimpleRAG(dspy.Module):
def __init__(self, num_passages=3):
super().__init__()
# Initialize the Retrieve module, asking for top 3 passages
self.retrieve = dspy.Retrieve(k=num_passages)
# Initialize a Predict module to generate the answer
self.generate_answer = dspy.Predict('context, question -> answer')
def forward(self, question):
# 1. Retrieve relevant context using the configured RM
context = self.retrieve(query=question).passages # Note: Pass query=...
# 2. Generate the answer using the LM, providing context
prediction = self.generate_answer(context=context, question=question)
return prediction
# --- Let's try it ---
# Assume LM is also configured (e.g., gpt3_turbo from Chapter 5)
# dspy.settings.configure(lm=gpt3_turbo)
rag_program = SimpleRAG()
question = "What is the largest rodent?"
result = rag_program(question=question)
print(f"Question: {question}")
# The retrieve module would fetch passages about rodents...
# print(f"Context: {context}") # (Would show passages about capybaras, etc.)
print(f"Answer: {result.answer}")
```
**What's happening?**
1. `self.retrieve = dspy.Retrieve(k=3)`: Inside our `SimpleRAG` program, we create an instance of `dspy.Retrieve`. We tell it we want the top `k=3` passages.
2. `context = self.retrieve(query=question).passages`: In the `forward` method, we call the `retrieve` module with the input `question` as the `query`.
* **Crucially:** The `dspy.Retrieve` module automatically looks up `dspy.settings.rm` (our configured `colbertv2_wiki`).
* It calls `colbertv2_wiki(question, k=3)`.
* The RM client fetches the passages.
* `dspy.Retrieve` returns a `dspy.Prediction` object, and we access the list of passage texts using `.passages`.
3. `self.generate_answer(context=context, question=question)`: We then pass the fetched `context` (along with the original `question`) to our `generate_answer` module (a `dspy.Predict` instance), which uses the configured [LM](05_lm__language_model_client_.md) to produce the final answer.
**Expected Output (using a Wikipedia RM and a capable LM):**
```
Question: What is the largest rodent?
Answer: The largest rodent is the capybara.
```
The `dspy.Retrieve` module handles the interaction with the configured RM seamlessly.
## Calling the RM Directly (for Testing)
While `dspy.Retrieve` is the standard way, you *can* call the configured RM directly if you want to quickly test it or see what it returns.
```python
import dspy
# Assume colbertv2_wiki is configured as the RM
# dspy.settings.configure(rm=colbertv2_wiki)
query = "Stanford University mascot"
k = 2 # Ask for top 2 passages
# Call the configured RM directly
retrieved_passages = dspy.settings.rm(query, k=k)
# Print the results
print(f"Query: {query}")
print(f"Retrieved Passages (Top {k}):")
for i, passage in enumerate(retrieved_passages):
# RM clients often return dotdict objects with 'long_text'
print(f"--- Passage {i+1} ---")
print(passage.long_text) # Access the text content
```
**Expected Output (might vary depending on the RM and its index):**
```
Query: Stanford University mascot
Retrieved Passages (Top 2):
--- Passage 1 ---
Stanford Tree | Stanford University Athletics The Stanford Tree is the Stanford Band's mascot and the unofficial mascot of Stanford University. Stanford's team name is "Cardinal", referring to the vivid red color (not the bird as at several other schools). The Tree, in various versions, has been called one of America's most bizarre and controversial college mascots. The tree costume is created anew by the Band member selected to be the Tree each year. The Tree appears at football games, basketball games, and other Stanford Athletic events. Any current student may petition to become the Tree for the following year....
--- Passage 2 ---
Stanford Cardinal | The Official Site of Stanford Athletics Stanford University is home to 36 varsity sports programs, 20 for women and 16 for men. Stanford participates in the NCAA's Division I (Football Bowl Subdivision subdivision for football). Stanford is a member of the Pac-12 Conference in most sports; the men's and women's water polo teams are members of the Mountain Pacific Sports Federation, the men's volleyball team is a member of the Mountain Pacific Sports Federation, the field hockey team is a member of the America East Conference, and the sailing team competes in the Pacific Coast Collegiate Sailing Conference....
```
This shows how you can directly interact with the RM client configured in `dspy.settings`. Notice the output is often a list of `dspy.dsp.utils.dotdict` objects, where the actual text is usually in the `long_text` attribute. `dspy.Retrieve` conveniently extracts just the text into its `.passages` list.
## How It Works Under the Hood
Let's trace the journey of a query when using `dspy.Retrieve` within our `SimpleRAG` program:
1. **Module Call:** The `SimpleRAG` program's `forward` method calls `self.retrieve(query="What is the largest rodent?")`.
2. **Get RM Client:** The `dspy.Retrieve` module (`self.retrieve`) needs an RM. It looks up `dspy.settings.rm`. This returns the configured RM client object (e.g., our `colbertv2_wiki` instance).
3. **RM Client Call:** The `Retrieve` module calls the RM client object's `forward` (or `__call__`) method, passing the query and `k` (e.g., `colbertv2_wiki("What is the largest rodent?", k=3)`).
4. **External Interaction (Inside RM Client):**
* The `colbertv2_wiki` object (an instance of `dspy.ColBERTv2`) constructs an HTTP request to the ColBERTv2 server URL (`http://your-colbertv2-endpoint.com:8893`). The request includes the query and `k`.
* It sends the request over the network.
* The external ColBERTv2 server receives the request, searches its index (e.g., Wikipedia), and finds the top 3 relevant passages.
* The server sends the passages back in the HTTP response (often as JSON).
5. **Parse Response (Inside RM Client):** The `colbertv2_wiki` client receives the response, parses the JSON, and converts the passages into a list of `dspy.dsp.utils.dotdict` objects (each containing `long_text`, potentially `pid`, `score`, etc.).
6. **Return to Module:** The RM client returns this list of `dotdict` passages back to the `dspy.Retrieve` module.
7. **Extract Text:** The `Retrieve` module takes the list of `dotdict` objects and extracts the `long_text` from each, creating a simple list of strings.
8. **Return Prediction:** It packages this list of strings into a `dspy.Prediction` object under the `passages` key and returns it to the `SimpleRAG` program.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant RAGProgram as SimpleRAG (forward)
participant RetrieveMod as dspy.Retrieve
participant Settings as dspy.settings
participant RMClient as RM Client (e.g., ColBERTv2)
participant ExtSearch as External Search (e.g., ColBERT Server)
User->>RAGProgram: Call with question="..."
RAGProgram->>RetrieveMod: Call retrieve(query=question)
RetrieveMod->>Settings: Get configured rm
Settings-->>RetrieveMod: Return RMClient instance
RetrieveMod->>RMClient: __call__(query, k=3)
RMClient->>ExtSearch: Send Search Request (query, k)
ExtSearch-->>RMClient: Return Found Passages
RMClient->>RMClient: Parse Response into dotdicts
RMClient-->>RetrieveMod: Return list[dotdict]
RetrieveMod->>RetrieveMod: Extract 'long_text' into list[str]
RetrieveMod-->>RAGProgram: Return Prediction(passages=list[str])
RAGProgram->>RAGProgram: Use context for LM call...
RAGProgram-->>User: Return final answer
```
**Relevant Code Files:**
* `dspy/retrieve/retrieve.py`: Defines the `dspy.Retrieve` module. Its `forward` method gets the query, retrieves the RM from `dspy.settings`, calls the RM, and processes the results into a `Prediction`.
* `dspy/dsp/colbertv2.py`: Defines the `dspy.ColBERTv2` client. Its `__call__` method makes HTTP requests (`requests.get` or `requests.post`) to a ColBERTv2 endpoint and parses the JSON response. (Other clients like `dspy/retrieve/pinecone_rm.py` or `dspy/retrieve/weaviate_rm.py` contain logic specific to those services).
* `dspy/dsp/utils/settings.py`: Where the configured `rm` instance is stored and accessed globally (as seen in [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md)).
```python
# Simplified view from dspy/retrieve/retrieve.py
from typing import Optional

import dspy
from dspy.primitives.prediction import Prediction
class Retrieve(dspy.Module):
def __init__(self, k=3):
super().__init__()
self.k = k
def forward(self, query: str, k: Optional[int] = None) -> Prediction:
# Determine how many passages to retrieve
k = k if k is not None else self.k
# Get the configured RM client from global settings
rm_client = dspy.settings.rm
if not rm_client:
raise AssertionError("No RM is loaded. Configure with dspy.settings.configure(rm=...).")
# Call the RM client instance
# The RM client handles communication with the actual search system
passages_or_dotdicts = rm_client(query, k=k) # e.g., calls colbertv2_wiki(query, k=k)
# Ensure output is iterable and extract text
# (Simplified - handles different return types from RMs)
if isinstance(passages_or_dotdicts, list) and hasattr(passages_or_dotdicts[0], 'long_text'):
passages = [psg.long_text for psg in passages_or_dotdicts]
else:
# Assume it's already a list of strings or handle other cases
passages = list(passages_or_dotdicts)
# Return passages wrapped in a Prediction object
return Prediction(passages=passages)
# Simplified view from dspy/dsp/colbertv2.py
import requests
from dspy.dsp.utils import dotdict
class ColBERTv2:
def __init__(self, url: str, port: Optional[int] = None, **kwargs):
self.url = f"{url}:{port}" if port else url
# ... other init ...
def __call__(self, query: str, k: int = 10, **kwargs) -> list[dotdict]:
# Construct the payload for the API request
payload = {"query": query, "k": k}
try:
# Make the HTTP GET request to the ColBERTv2 server
res = requests.get(self.url, params=payload, timeout=10)
res.raise_for_status() # Raise an exception for bad status codes
# Parse the JSON response
json_response = res.json()
topk = json_response.get("topk", [])[:k]
# Convert results into dotdict objects for consistency
passages = [dotdict({**d, "long_text": d.get("text", "")}) for d in topk]
return passages
except requests.exceptions.RequestException as e:
print(f"Error calling ColBERTv2 server: {e}")
return [] # Return empty list on error
```
The key idea is abstraction: `dspy.Retrieve` uses whatever RM is configured in `dspy.settings`, and the specific RM client hides the details of talking to its particular backend search system.
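Because `dspy.Retrieve` only relies on that contract — a callable that accepts a query and `k` and returns passage objects with a `long_text` field — you can plug in your own retriever. Here is a hedged sketch of a toy in-memory RM over a few hard-coded documents (not a real DSPy class, just something that satisfies the interface):

```python
import dspy
from dspy.dsp.utils import dotdict

class TinyKeywordRM:
    """Toy retriever: ranks a fixed list of documents by keyword overlap."""

    def __init__(self, documents):
        self.documents = documents

    def __call__(self, query, k=3):
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(doc.lower().split())), doc) for doc in self.documents]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [dotdict({"long_text": doc}) for _, doc in scored[:k]]

docs = [
    "The capybara is the largest living rodent.",
    "The Stanford Tree is the mascot of the Stanford Band.",
    "Paris is the capital of France.",
]

dspy.settings.configure(rm=TinyKeywordRM(docs))
retrieve = dspy.Retrieve(k=2)
print(retrieve(query="largest rodent").passages)
```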
## Conclusion
You've now met the **RM (Retrieval Model Client)**, your DSPy program's connection to external knowledge sources!
* An RM acts like a **librarian** or **search engine interface**.
* It takes a **query** and fetches **relevant text passages** from systems like vector databases (Pinecone, Weaviate) or APIs (ColBERTv2).
* It provides crucial **context** for LMs, enabling tasks like answering questions about recent events or private documents (Retrieval-Augmented Generation - RAG).
* You configure it globally using `dspy.settings.configure(rm=...)`.
* The `dspy.Retrieve` module is the standard way to use the configured RM within your programs.
With LMs providing reasoning and RMs providing knowledge, we can build powerful DSPy programs. But how do we know if our program is actually working well? How do we measure its performance? That's where evaluation comes in!
**Next:** [Chapter 7: Evaluate](07_evaluate.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,315 @@
# Chapter 7: Evaluate - Grading Your Program
In the previous chapter, [Chapter 6: RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md), we learned how to connect our DSPy program to external knowledge sources using Retrieval Models (RMs). We saw how combining RMs with Language Models (LMs) allows us to build sophisticated programs like Retrieval-Augmented Generation (RAG) systems.
Now that we can build these powerful programs, a crucial question arises: **How good are they?** If we build a RAG system to answer questions, how often does it get the answer right? How do we measure its performance objectively?
This is where **`dspy.Evaluate`** comes in! It's DSPy's built-in tool for testing and grading your programs.
Think of `dspy.Evaluate` as:
* **An Automated Grader:** Like a teacher grading a batch of homework assignments based on an answer key.
* **A Test Suite Runner:** Similar to how software developers use test suites to check if their code works correctly.
* **Your Program's Report Card:** It gives you a score that tells you how well your DSPy program is performing on a specific set of tasks.
In this chapter, you'll learn:
* What you need to evaluate a DSPy program.
* How to define a metric (a grading rule).
* How to use `dspy.Evaluate` to run the evaluation and get a score.
* How it works behind the scenes.
Let's learn how to grade our DSPy creations!
## The Ingredients for Evaluation
To grade your program using `dspy.Evaluate`, you need three main ingredients:
1. **Your DSPy `Program`:** The program you want to test. This could be a simple `dspy.Predict` module or a complex multi-step program like the `SimpleRAG` we sketched out in the last chapter.
2. **A Dataset (`devset`):** A list of `dspy.Example` objects ([Chapter 3: Example](03_example.md)). Crucially, these examples must contain not only the **inputs** your program expects but also the **gold standard outputs** (the correct answers or desired results) that you want to compare against. This dataset is often called a "development set" or "dev set".
3. **A Metric Function (`metric`):** A Python function you define. This function takes one gold standard `Example` and the `Prediction` generated by your program for that example's inputs. It then compares them and returns a score indicating how well the prediction matched the gold standard. The score is often `1.0` for a perfect match and `0.0` for a mismatch, but it can also be a fractional score (e.g., for F1 score).
`dspy.Evaluate` takes these three ingredients, runs your program on all examples in the dataset, uses your metric function to score each prediction against the gold standard, and finally reports the average score across the entire dataset.
## Evaluating a Simple Question Answering Program
Let's illustrate this with a simple example. Suppose we have a basic DSPy program that's supposed to answer simple questions.
```python
import dspy
# Assume we have configured an LM client (Chapter 5)
# gpt3_turbo = dspy.LM(model='openai/gpt-3.5-turbo')
# dspy.settings.configure(lm=gpt3_turbo)
# A simple program using dspy.Predict (Chapter 4)
class BasicQA(dspy.Module):
def __init__(self):
super().__init__()
# Use a simple signature: question -> answer
self.predictor = dspy.Predict('question -> answer')
def forward(self, question):
return self.predictor(question=question)
# Create an instance of our program
qa_program = BasicQA()
```
Now, let's prepare the other ingredients for evaluation.
**1. Prepare the Dataset (`devset`)**
We need a list of `dspy.Example` objects, each containing a `question` (input) and the correct `answer` (gold standard output).
```python
# Create example data points with questions and gold answers
dev_example1 = dspy.Example(question="What color is the sky?", answer="blue")
dev_example2 = dspy.Example(question="What is 2 + 2?", answer="4")
dev_example3 = dspy.Example(question="What is the capital of France?", answer="Paris")
dev_example_wrong = dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare") # Let's assume our QA program might get this wrong
# Create the development set (list of examples)
devset = [dev_example1, dev_example2, dev_example3, dev_example_wrong]
# We need to tell DSPy which fields are inputs vs outputs for evaluation
# The .with_inputs() method marks the input keys.
# The remaining keys ('answer' in this case) are treated as labels.
devset = [d.with_inputs('question') for d in devset]
```
Here, we've created a small dataset `devset` with four question-answer pairs. We used `.with_inputs('question')` to mark the `question` field as the input; `dspy.Evaluate` will automatically treat the remaining field (`answer`) as the gold label to compare against.
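To see what `.with_inputs()` actually does, here is a small sketch: the example can then be split into its input part and its label part (the same split `Evaluate` relies on internally, as we'll see below).

```python
example = devset[0]

# Only the fields marked as inputs:
print(example.inputs())  # Example with just {'question': 'What color is the sky?'}

# Everything else is treated as a label (the gold output):
print(example.labels())  # Example with just {'answer': 'blue'}
```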
**2. Define a Metric Function (`metric`)**
We need a function that compares the program's predicted answer to the gold answer in an example. Let's create a simple "exact match" metric.
```python
def simple_exact_match_metric(gold_example, prediction, trace=None):
# Does the predicted 'answer' EXACTLY match the gold 'answer'?
# '.answer' field comes from our Predict signature 'question -> answer'
# 'gold_example.answer' is the gold label from the devset example
return prediction.answer == gold_example.answer
# Note: DSPy often provides common metrics too, like dspy.evaluate.answer_exact_match
# import dspy.evaluate
# metric = dspy.evaluate.answer_exact_match
```
Our `simple_exact_match_metric` function takes the gold `dspy.Example` (`gold_example`) and the program's output `dspy.Prediction` (`prediction`). It returns `True` (which Python treats as `1.0`) if the predicted `answer` matches the gold `answer`, and `False` (`0.0`) otherwise. The `trace` argument is optional and can be ignored for basic metrics; it sometimes contains information about the program's execution steps.
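Exact match is strict. Since a metric may return any score between 0 and 1 (as noted above), here is a hedged sketch of a more forgiving variant that ignores case and gives partial credit when the gold answer merely appears inside the prediction:

```python
def lenient_match_metric(gold_example, prediction, trace=None):
    pred = prediction.answer.strip().lower()
    gold = gold_example.answer.strip().lower()
    if pred == gold:
        return 1.0  # perfect match
    if gold in pred:
        return 0.5  # partial credit: gold answer contained in the prediction
    return 0.0
```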
**3. Create and Run `dspy.Evaluate`**
Now we have all the ingredients: `qa_program`, `devset`, and `simple_exact_match_metric`. Let's use `dspy.Evaluate`.
```python
from dspy.evaluate import Evaluate
# 1. Create the Evaluator instance
evaluator = Evaluate(
devset=devset, # The dataset to evaluate on
metric=simple_exact_match_metric, # The function to score predictions
num_threads=4, # Run 4 evaluations in parallel (optional)
display_progress=True, # Show a progress bar (optional)
display_table=True # Display results in a table (optional)
)
# 2. Run the evaluation by calling the evaluator with the program
# This will run qa_program on each example in devset,
# score it using simple_exact_match_metric, and return the average score.
average_score = evaluator(qa_program)
print(f"Average Score: {average_score}%")
```
**What happens here?**
1. We create an `Evaluate` object, providing our dataset and metric. We also request parallel execution (`num_threads=4`) for speed and ask for progress/table display.
2. We call the `evaluator` instance with our `qa_program`.
3. `Evaluate` iterates through `devset`:
* For `dev_example1`, it calls `qa_program(question="What color is the sky?")`. Let's assume the program predicts `answer="blue"`.
* It calls `simple_exact_match_metric(dev_example1, predicted_output)`. Since `"blue" == "blue"`, the score is `1.0`.
* It does the same for `dev_example2` (input: "What is 2 + 2?"). Assume prediction is `answer="4"`. Score: `1.0`.
* It does the same for `dev_example3` (input: "What is the capital of France?"). Assume prediction is `answer="Paris"`. Score: `1.0`.
* It does the same for `dev_example_wrong` (input: "Who wrote Hamlet?"). Maybe the simple LM messes up and predicts `answer="William Shakespeare"`. Since `"William Shakespeare" != "Shakespeare"`, the score is `0.0`.
4. `Evaluate` calculates the average score: `(1.0 + 1.0 + 1.0 + 0.0) / 4 = 0.75`.
5. It prints the average score as a percentage.
**Expected Output:**
A progress bar will be shown (if `tqdm` is installed), followed by a table like this (requires `pandas`):
```text
Average Metric: 3 / 4 (75.0%)
question answer simple_exact_match_metric
0 What color is the sky? blue ✔️ [True]
1 What is 2 + 2? 4 ✔️ [True]
2 What is the capital of France? Paris ✔️ [True]
3 Who wrote Hamlet? Shakespeare William Shakespeare [False]
```
*(Note: The table shows the predicted answer if different, otherwise just the metric outcome. The exact table format might vary slightly).*
And finally:
```text
Average Score: 75.0%
```
This tells us our simple QA program achieved 75% accuracy on our small development set using the exact match criterion.
## Getting More Details (Optional Flags)
Sometimes, just the average score isn't enough. You might want to see the score for each individual example or the actual predictions made by the program. `Evaluate` provides flags for this:
* `return_all_scores=True`: Returns the average score *and* a list containing the individual score for each example.
* `return_outputs=True`: Returns the average score *and* a list of tuples, where each tuple contains `(example, prediction, score)`.
```python
# Re-run evaluation asking for more details
evaluator_detailed = Evaluate(devset=devset, metric=simple_exact_match_metric)
# Get individual scores
avg_score, individual_scores = evaluator_detailed(qa_program, return_all_scores=True)
print(f"Individual Scores: {individual_scores}") # Output: [True, True, True, False]
# Get full outputs
avg_score, outputs_list = evaluator_detailed(qa_program, return_outputs=True)
# outputs_list[0] would be roughly: (dev_example1, Prediction(answer='blue'), True)
# outputs_list[3] would be roughly: (dev_example_wrong, Prediction(answer='William Shakespeare'), False)
print(f"Number of outputs returned: {len(outputs_list)}") # Output: 4
```
These flags are useful for more detailed error analysis to understand *where* your program is failing.
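For example, a quick error-analysis loop over `outputs_list` might look like this sketch (assuming the `(example, prediction, score)` tuple shape described above):

```python
for example, prediction, score in outputs_list:
    if not score:  # keep only the failures
        print("Question:  ", example.question)
        print("Gold:      ", example.answer)
        print("Predicted: ", prediction.answer)
        print("---")
```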
## How It Works Under the Hood
What happens internally when you call `evaluator(program)`?
1. **Initialization:** The `Evaluate` instance stores the `devset`, `metric`, `num_threads`, and other settings.
2. **Parallel Executor:** It creates a `ParallelExecutor` (if `num_threads > 1`) to manage running the evaluations concurrently.
3. **Iteration:** It iterates through each `example` in the `devset`.
4. **Program Execution:** For each `example`, it calls `program(**example.inputs())` (e.g., `qa_program(question=example.question)`). This runs your DSPy program's `forward` method to get a `prediction`.
5. **Metric Calculation:** It calls the provided `metric` function, passing it the original `example` (which contains the gold labels) and the `prediction` object returned by the program (e.g., `metric(example, prediction)`). This yields a `score`.
6. **Error Handling:** If running the program or the metric causes an error for a specific example, `Evaluate` catches it (up to `max_errors`), records a default `failure_score` (usually 0.0), and continues with the rest of the dataset.
7. **Aggregation:** It collects all the individual scores (including failure scores).
8. **Calculate Average:** It computes the average score by summing all scores and dividing by the total number of examples in the `devset`.
9. **Return Results:** It returns the average score (and optionally the individual scores or full output tuples based on the flags).
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant Evaluator as dspy.Evaluate
participant Executor as ParallelExecutor
participant Program as Your DSPy Program
participant Metric as Your Metric Function
User->>Evaluator: __call__(program)
Evaluator->>Executor: Create (manages threads)
loop For each example in devset
Executor->>Executor: Assign task to a thread
Note over Executor, Program: In parallel thread:
Executor->>Program: Call program(**example.inputs())
Program-->>Executor: Return prediction
Executor->>Metric: Call metric(example, prediction)
Metric-->>Executor: Return score
end
Executor->>Evaluator: Collect all results (predictions, scores)
Evaluator->>Evaluator: Calculate average score
Evaluator-->>User: Return average score (and other requested data)
```
**Relevant Code Files:**
* `dspy/evaluate/evaluate.py`: Defines the `Evaluate` class.
* The `__init__` method stores the configuration.
* The `__call__` method orchestrates the evaluation: sets up the `ParallelExecutor`, defines the `process_item` function (which runs the program and metric for one example), executes it over the `devset`, aggregates results, and handles display/return logic.
* `dspy/utils/parallelizer.py`: Contains the `ParallelExecutor` class used for running tasks concurrently across multiple threads or processes.
* `dspy/evaluate/metrics.py`: Contains implementations of common metrics like `answer_exact_match`.
```python
# Simplified view from dspy/evaluate/evaluate.py
# ... imports ...
from dspy.utils.parallelizer import ParallelExecutor
class Evaluate:
def __init__(self, devset, metric, num_threads=1, ..., failure_score=0.0):
self.devset = devset
self.metric = metric
self.num_threads = num_threads
self.display_progress = ...
self.display_table = ...
# ... store other flags ...
self.failure_score = failure_score
# @with_callbacks # Decorator handles optional logging/callbacks
def __call__(self, program, metric=None, devset=None, ...):
# Use provided args or fall back to instance attributes
metric = metric if metric is not None else self.metric
devset = devset if devset is not None else self.devset
num_threads = ... # Similar logic for other args
# Create executor for parallelism
executor = ParallelExecutor(num_threads=num_threads, ...)
# Define the work to be done for each example
def process_item(example):
try:
# Run the program with the example's inputs
prediction = program(**example.inputs())
# Call the metric function with the gold example and prediction
score = metric(example, prediction)
return prediction, score
except Exception as e:
# Handle errors during program/metric execution
# Log error, return None or failure score
print(f"Error processing example: {e}")
return None # Executor will handle None later
# Execute process_item for all examples in devset using the executor
raw_results = executor.execute(process_item, devset)
# Process results, handle failures (replace None with failure score)
results = []
for i, r in enumerate(raw_results):
example = devset[i]
if r is None: # Execution failed for this example
prediction, score = dspy.Prediction(), self.failure_score
else:
prediction, score = r
results.append((example, prediction, score))
# Calculate the average score
total_score = sum(score for *_, score in results)
num_examples = len(devset)
average_score = round(100 * total_score / num_examples, 2) if num_examples > 0 else 0
# Display table if requested
if self.display_table:
self._display_result_table(...) # Internal helper function
# Return results based on flags (return_all_scores, return_outputs)
# ... logic to construct return tuple ...
return average_score # Base return value
```
The core logic involves running the program and the metric function for each data point, handling potential errors, and averaging the results, with parallel processing to speed things up.
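Stripped of parallelism, error handling, and display, the whole evaluation boils down to a loop you could write yourself. This sketch is conceptually equivalent to what `Evaluate` computes:

```python
def evaluate_sequentially(program, devset, metric):
    scores = []
    for example in devset:
        prediction = program(**example.inputs())    # run the program on the inputs
        scores.append(metric(example, prediction))  # grade it against the gold labels
    return 100 * sum(scores) / len(scores)          # average score as a percentage

# evaluate_sequentially(qa_program, devset, simple_exact_match_metric)  # -> 75.0
```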
## Conclusion
You've now learned about `dspy.Evaluate`, the standard way to measure the performance of your DSPy programs!
* `Evaluate` acts as an **automated grader** for your DSPy modules.
* It requires three ingredients: your **program**, a **dataset (`devset`)** with gold labels, and a **metric function** to compare predictions against labels.
* It runs the program on the dataset, applies the metric, and reports the **average score**.
* It supports **parallel execution** for speed and offers options to display progress, show results tables, and return detailed outputs.
Knowing how well your program performs is essential. But what if the score isn't good enough? How can we *improve* the program, perhaps by automatically finding better prompts or few-shot examples?
That's precisely what **Teleprompters** (Optimizers) are designed for! Let's dive into how DSPy can help automatically optimize your programs next.
**Next:** [Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,314 @@
# Chapter 8: Teleprompter / Optimizer - Your Program's Coach
Welcome to Chapter 8! In [Chapter 7: Evaluate](07_evaluate.md), we learned how to grade our DSPy programs using metrics and datasets to see how well they perform. That's great for knowing our score, but what if the score isn't high enough?
Think about building our `BasicQA` program from the last chapter. Maybe we tried running it and found it only got 75% accuracy. How do we improve it?
Traditionally, we might start **manually tweaking prompts**:
* "Maybe I should rephrase the instructions?"
* "Should I add some examples (few-shot demonstrations)?"
* "Which examples work best?"
This manual process, often called "prompt engineering," can be slow, tedious, and requires a lot of guesswork. Wouldn't it be amazing if DSPy could **automatically figure out the best prompts and examples** for us?
That's exactly what **Teleprompters** (also called Optimizers) do! They are DSPy's built-in automated prompt engineers and program tuners.
Think of a Teleprompter as a **coach** for your DSPy program (the 'student'):
* The coach observes how the student performs on practice drills (a dataset).
* It uses feedback (a metric) to figure out weaknesses.
* It suggests new strategies (better instructions, better examples) to improve performance.
* It repeats this until the student performs much better!
In this chapter, we'll learn:
* What a Teleprompter is and the problem it solves.
* The key ingredients needed to use a Teleprompter.
* How to use a simple Teleprompter (`BootstrapFewShot`) to automatically find good few-shot examples.
* The basic idea behind how Teleprompters optimize programs.
Let's automate the improvement process!
## What is a Teleprompter / Optimizer?
A `Teleprompter` in DSPy is an algorithm that takes your DSPy [Program](01_module___program.md) (the 'student') and automatically tunes its internal parameters to maximize performance on a given task. These parameters are most often:
1. **Instructions:** The natural language guidance given to the Language Models ([LM](05_lm__language_model_client_.md)) within your program's modules (like `dspy.Predict`).
2. **Few-Shot Examples (Demos):** The `dspy.Example` objects provided in prompts to show the LM how to perform the task.
Some advanced Teleprompters can even fine-tune the weights of the LM itself!
To work its magic, a Teleprompter needs three things (sound familiar? They're similar to evaluation!):
1. **The Student Program:** The DSPy program you want to improve.
2. **A Training Dataset (`trainset`):** A list of `dspy.Example` objects ([Chapter 3: Example](03_example.md)) representing the task. The Teleprompter will use this data to practice and learn.
3. **A Metric Function (`metric`):** The same kind of function we used in [Chapter 7: Evaluate](07_evaluate.md). It tells the Teleprompter how well the student program is doing on each example in the `trainset`.
The Teleprompter uses the `metric` to guide its search for better instructions or demos, trying different combinations and keeping the ones that yield the highest score on the `trainset`. The output is an **optimized version of your student program**.
## Use Case: Automatically Finding Good Few-Shot Examples with `BootstrapFewShot`
Let's revisit our `BasicQA` program and the evaluation setup from Chapter 7.
```python
import dspy
from dspy.evaluate import Evaluate
# Assume LM is configured (e.g., dspy.settings.configure(lm=...))
# Our simple program
class BasicQA(dspy.Module):
def __init__(self):
super().__init__()
self.predictor = dspy.Predict('question -> answer')
def forward(self, question):
return self.predictor(question=question)
# Our metric from Chapter 7 (tweaked here to ignore case)
def simple_exact_match_metric(gold, prediction, trace=None):
return prediction.answer.lower() == gold.answer.lower()
# Our dataset from Chapter 7 (let's use it as a trainset now)
dev_example1 = dspy.Example(question="What color is the sky?", answer="blue")
dev_example2 = dspy.Example(question="What is 2 + 2?", answer="4")
dev_example3 = dspy.Example(question="What is the capital of France?", answer="Paris")
# Example our program might struggle with initially
dev_example_hard = dspy.Example(question="Who painted the Mona Lisa?", answer="Leonardo da Vinci")
trainset = [dev_example1, dev_example2, dev_example3, dev_example_hard]
trainset = [d.with_inputs('question') for d in trainset]
# Let's evaluate the initial program (likely imperfect)
initial_program = BasicQA()
evaluator = Evaluate(devset=trainset, metric=simple_exact_match_metric, display_progress=False)
initial_score = evaluator(initial_program)
print(f"Initial Score (on trainset): {initial_score}%")
# Might output: Initial Score (on trainset): 75.0% (assuming it fails the last one)
```
Our initial program gets 75%. We could try adding few-shot examples manually, but which ones? And how many?
Let's use `dspy.teleprompt.BootstrapFewShot`. This Teleprompter automatically creates and selects few-shot demonstrations for the predictors in your program.
**1. Import the Teleprompter:**
```python
from dspy.teleprompt import BootstrapFewShot
```
**2. Instantiate the Teleprompter:**
We need to give it the `metric` function it should use to judge success. We can also specify how many candidate demos (`max_bootstrapped_demos`) it should try to find for each predictor.
```python
# Configure the BootstrapFewShot optimizer
# It will use the metric to find successful demonstrations
# max_bootstrapped_demos=4 means it will try to find up to 4 good examples for EACH predictor
config = dict(max_bootstrapped_demos=4, metric=simple_exact_match_metric)
teleprompter = BootstrapFewShot(**config)
```
**3. Compile the Program:**
This is the main step. We call the Teleprompter's `compile` method, giving it our initial `student` program and the `trainset`. It returns a *new*, optimized program.
```python
# Compile the program!
# This runs the optimization process using the trainset.
# It uses a 'teacher' model (often the student itself or a copy)
# to generate traces, finds successful ones via the metric,
# and adds them as demos to the student's predictors.
compiled_program = teleprompter.compile(student=initial_program, trainset=trainset)
# The 'compiled_program' is a new instance of BasicQA,
# but its internal predictor now has few-shot examples added!
```
**What just happened?**
Behind the scenes, `BootstrapFewShot` (conceptually):
* Used a "teacher" program (often a copy of the student or another specified LM configuration) to run each example in the `trainset`.
* For each example, it checked if the teacher's output was correct using our `simple_exact_match_metric`.
* If an example was processed correctly, the Teleprompter saved the input/output pair as a potential "demonstration" (a good example).
* It collected these successful demonstrations.
* It assigned a selection of these good demonstrations (`max_bootstrapped_demos`) to the `demos` attribute of the corresponding predictor inside our `compiled_program`.
**4. Evaluate the Compiled Program:**
Now, let's see if the optimized program performs better on the same `trainset`.
```python
# Evaluate the compiled program
compiled_score = evaluator(compiled_program)
print(f"Compiled Score (on trainset): {compiled_score}%")
# If the optimization worked, the score should be higher!
# Might output: Compiled Score (on trainset): 100.0%
```
If `BootstrapFewShot` found good examples (like the "Mona Lisa" one after the teacher model successfully answered it), the `compiled_program` now has these examples embedded in its prompts, helping the LM perform better on similar questions. We automated the process of finding effective few-shot examples!
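You can verify what the optimizer actually changed by inspecting the predictor's `demos` attribute, and keep the result for later. A hedged sketch (`save` and `load` take a JSON path in recent DSPy versions):

```python
# Look at the few-shot examples BootstrapFewShot attached to the predictor.
for demo in compiled_program.predictor.demos:
    print(demo.question, "->", demo.answer)

# Persist the optimized program so you don't have to re-compile it...
compiled_program.save("compiled_qa.json")

# ...and load it back into a fresh instance later.
reloaded = BasicQA()
reloaded.load("compiled_qa.json")
```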
## How Optimization Works (Conceptual)
Different Teleprompters use different strategies, but the core idea is usually:
1. **Goal:** Find program parameters (instructions, demos) that maximize the `metric` score on the `trainset`.
2. **Search Space:** The "space" of all possible instructions or combinations of demos.
3. **Search Strategy:** How the Teleprompter explores this space.
* `BootstrapFewShot`: Generates candidate demos based on successful teacher executions.
* Other optimizers (like `COPRO` or `MIPROv2` mentioned in the code snippets) might use an LM to *propose* new instructions, evaluate them, and iterate. Some use sophisticated search algorithms like Bayesian Optimization or random search.
4. **Evaluation:** Use the `metric` and `trainset` to score each candidate configuration (e.g., a program with specific demos or instructions).
5. **Selection:** Keep the configuration that resulted in the best score.
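Putting these pieces together, most optimizers reduce to a search loop over candidate configurations. Here is a rough, illustrative sketch of that loop; this is pseudocode for the general idea, not the actual implementation of any specific DSPy teleprompter:

```python
# Illustrative pseudocode for the generic optimization loop.
# `propose_candidates` stands in for whatever search strategy the optimizer uses.
def optimize(student, trainset, metric, propose_candidates):
    best_program, best_score = student, float("-inf")
    for candidate in propose_candidates(student, trainset):      # 3. search strategy
        # 4. evaluation: score this candidate configuration on the training set
        score = sum(metric(example, candidate(**example.inputs())) for example in trainset)
        # 5. selection: keep the best-scoring configuration found so far
        if score > best_score:
            best_program, best_score = candidate, score
    return best_program
```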
**Analogy Revisited:**
* **Coach:** The Teleprompter algorithm (`BootstrapFewShot`).
* **Student:** Your DSPy `Program` (`initial_program`).
* **Practice Drills:** The `trainset`.
* **Scoring:** The `metric` function (`simple_exact_match_metric`).
* **Trying Techniques:** Generating/selecting different demos or instructions.
* **Adopting Best Techniques:** Creating the `compiled_program` with the highest-scoring demos/instructions found.
## How It Works Under the Hood (`BootstrapFewShot` Peek)
Let's briefly look at the internal flow for `BootstrapFewShot.compile()`:
1. **Prepare Teacher:** It sets up a 'teacher' program. This is often a copy of the student program, sometimes configured with specific settings (like a higher temperature for more exploration) or potentially using labeled examples if provided (`LabeledFewShot` within `BootstrapFewShot`).
2. **Iterate Trainset:** It goes through each `example` in the `trainset`.
3. **Teacher Execution:** For each `example`, it runs the `teacher` program (`teacher(**example.inputs())`). This happens within a `dspy.settings.context` block to capture the execution `trace`.
4. **Metric Check:** It uses the provided `metric` to compare the `teacher`'s prediction against the `example`'s gold label (`metric(example, prediction, trace)`).
5. **Collect Demos:** If the `metric` returns success (e.g., `True` or a score above a threshold), the Teleprompter extracts the input/output steps from the execution `trace`. Each successful trace step can become a candidate `dspy.Example` demonstration.
6. **Assign Demos:** After iterating through the `trainset`, it takes the collected successful demonstrations (up to `max_bootstrapped_demos`) and assigns them to the `demos` attribute of the corresponding predictors in the `student` program instance.
7. **Return Compiled Student:** It returns the modified `student` program, which now contains the bootstrapped few-shot examples.
```mermaid
sequenceDiagram
participant User
participant Teleprompter as BootstrapFewShot
participant StudentProgram as Student Program
participant TeacherProgram as Teacher Program
participant LM as Language Model
participant Metric as Metric Function
participant CompiledProgram as Compiled Program (Student with Demos)
User->>Teleprompter: compile(student=StudentProgram, trainset=...)
Teleprompter->>TeacherProgram: Set up (copy of student, potentially modified)
loop For each example in trainset
Teleprompter->>TeacherProgram: Run example.inputs()
TeacherProgram->>LM: Make calls (via Predictors)
LM-->>TeacherProgram: Return predictions
TeacherProgram-->>Teleprompter: Return final prediction & trace
Teleprompter->>Metric: Evaluate(example, prediction, trace)
Metric-->>Teleprompter: Return score (success/fail)
alt Metric returns success
Teleprompter->>Teleprompter: Extract demo from trace
end
end
Teleprompter->>StudentProgram: Assign selected demos to predictors
StudentProgram-->>CompiledProgram: Create compiled version
Teleprompter-->>User: Return CompiledProgram
```
**Relevant Code Files:**
* `dspy/teleprompt/teleprompt.py`: Defines the base `Teleprompter` class.
* `dspy/teleprompt/bootstrap.py`: Contains the implementation for `BootstrapFewShot`. Key methods include `compile` (orchestrates the process) and `_bootstrap_one_example` (handles running the teacher and checking the metric for a single training example).
```python
# Simplified view from dspy/teleprompt/bootstrap.py
# ... imports ...
from .teleprompt import Teleprompter
from .vanilla import LabeledFewShot # Used for teacher setup if labeled demos are needed
import dspy
class BootstrapFewShot(Teleprompter):
def __init__(self, metric=None, max_bootstrapped_demos=4, ...):
self.metric = metric
self.max_bootstrapped_demos = max_bootstrapped_demos
# ... other initializations ...
def compile(self, student, *, teacher=None, trainset):
self.trainset = trainset
self._prepare_student_and_teacher(student, teacher) # Sets up self.student and self.teacher
self._prepare_predictor_mappings() # Links student predictors to teacher predictors
self._bootstrap() # Runs the core bootstrapping logic
self.student = self._train() # Assigns collected demos to the student
self.student._compiled = True
return self.student
def _bootstrap(self):
# ... setup ...
self.name2traces = {name: [] for name in self.name2predictor} # Store successful traces per predictor
for example_idx, example in enumerate(tqdm.tqdm(self.trainset)):
# ... logic to stop early if enough demos found ...
success = self._bootstrap_one_example(example, round_idx=0) # Try to get a demo from this example
# ... potentially multiple rounds ...
# ... logging ...
def _bootstrap_one_example(self, example, round_idx=0):
# ... setup teacher context (e.g., temperature) ...
try:
with dspy.settings.context(trace=[], **self.teacher_settings):
# Optionally modify teacher LM settings for exploration
# ...
# Run the teacher program
prediction = self.teacher(**example.inputs())
trace = dspy.settings.trace # Get the execution trace
# Evaluate the prediction using the metric
if self.metric:
metric_val = self.metric(example, prediction, trace)
# Determine success based on metric value/threshold
success = bool(metric_val) # Simplified
else:
success = True # Assume success if no metric provided
except Exception:
success = False
# ... error handling ...
if success:
# If successful, extract demos from the trace
for step in trace:
predictor, inputs, outputs = step
demo = dspy.Example(augmented=True, **inputs, **outputs)
try:
predictor_name = self.predictor2name[id(predictor)]
# Store the successful demo example
self.name2traces[predictor_name].append(demo)
except KeyError:
continue # Handle potential issues finding the predictor
return success
def _train(self):
# Assign the collected demos to the student's predictors
for name, predictor in self.student.named_predictors():
demos_for_predictor = self.name2traces[name][:self.max_bootstrapped_demos]
# Potentially mix with labeled demos if configured
# ...
predictor.demos = demos_for_predictor # Assign the demos!
return self.student
```
This simplified view shows the core loop: run the teacher, check the metric, collect successful traces as demos, and finally assign those demos to the student program.
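Once you are happy with a compiled program, you usually want to reuse it without re-running the optimization. DSPy modules can save and load their state (including the bootstrapped demos). A minimal sketch, assuming the `BasicQA` program and `question -> answer` signature from this chapter:

```python
# Persist the optimized state (demos, etc.) to a file.
compiled_program.save("compiled_basic_qa.json")

# Later, or in another process: rebuild the same program class, then load the state.
restored = BasicQA()
restored.load("compiled_basic_qa.json")

result = restored(question="Who painted the Mona Lisa?")
print(result.answer)
```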
## Conclusion
You've now learned about DSPy's **Teleprompters / Optimizers**, the powerful tools for automating prompt engineering!
* Teleprompters act like **coaches**, automatically tuning your DSPy programs (students).
* They optimize parameters like **instructions** and **few-shot examples (demos)**.
* They require a **student program**, a **training dataset**, and a **metric** function.
* We saw how `BootstrapFewShot` automatically finds effective few-shot examples by running a teacher model and collecting successful execution traces.
* The result of `teleprompter.compile()` is an **optimized program** instance, ready to be used or evaluated further.
Teleprompters save you from the tedious process of manual tuning, allowing you to build high-performing LM-based programs more efficiently.
Now that we understand how to build, evaluate, and automatically optimize DSPy programs, how can we make them interact smoothly with different data formats or models, especially when integrating with other systems? That's where **Adapters** come in.
**Next:** [Chapter 9: Adapter](09_adapter.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

362
output/DSPy/09_adapter.md Normal file
View File

@@ -0,0 +1,362 @@
# Chapter 9: Adapter - The Universal Translator
Welcome to Chapter 9! In [Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md), we saw how DSPy can automatically optimize our programs by finding better prompts or few-shot examples. We ended up with a `compiled_program` that should perform better.
Now, this optimized program needs to communicate with a Language Model ([LM](05_lm__language_model_client_.md)) to actually do its work. But here's a potential challenge: different types of LMs expect different kinds of input!
* Older **Completion Models** (like GPT-3 `davinci`) expect a single, long text prompt.
* Newer **Chat Models** (like GPT-4, Claude 3, Llama 3 Chat) expect a structured list of messages, each with a role (like "system", "user", or "assistant").
Our DSPy program, using its [Signature](02_signature.md), defines the task in an abstract way (inputs, outputs, instructions). How does this abstract definition get translated into the specific format required by the LM we're using, especially these modern chat models?
That's where the **`Adapter`** comes in! It acts like a universal translator.
Think of it like this:
* Your DSPy program (using a `Signature`) has a message it wants to send to the LM.
* The LM speaks a specific language (e.g., "chat message list" language).
* The `Adapter` translates your program's message into the LM's language, handles the conversation, and translates the LM's reply back into a format your DSPy program understands.
In this chapter, you'll learn:
* What problem Adapters solve.
* What an `Adapter` does (formatting and parsing).
* How they allow your DSPy code to work with different LMs seamlessly.
* How they work behind the scenes (mostly automatically!).
Let's meet the translator!
## The Problem: Different LMs, Different Languages
Imagine you have a DSPy Signature for summarizing text:
```python
import dspy
class Summarize(dspy.Signature):
"""Summarize the given text."""
text = dspy.InputField(desc="The text to summarize.")
summary = dspy.OutputField(desc="A concise summary.")
```
And you use it in a `dspy.Predict` module:
```python
# Assume LM is configured (Chapter 5)
summarizer = dspy.Predict(Summarize)
long_text = "DSPy is a framework for programming foundation models..." # (imagine longer text)
result = summarizer(text=long_text)
# We expect result.summary to contain the summary
```
Now, if the configured LM is a **completion model**, the `summarizer` needs to create a single prompt like:
```text
Summarize the given text.
---
Follow the following format.
Text: ${text}
Summary: ${summary}
---
Text: DSPy is a framework for programming foundation models...
Summary:
```
But if the configured LM is a **chat model**, it needs a structured list of messages, perhaps like this:
```python
[
{"role": "system", "content": "Summarize the given text.\n\nFollow the following format.\n\nText: ${text}\nSummary: ${summary}"},
{"role": "user", "content": "Text: DSPy is a framework for programming foundation models...\nSummary:"}
]
```
*(Simplified - actual chat formatting can be more complex)*
How does `dspy.Predict` know which format to use? And how does it extract the `summary` from the potentially differently formatted responses? It doesn't! That's the job of the **Adapter**.
## What Does an Adapter Do?
An `Adapter` is a component that sits between your DSPy module (like `dspy.Predict`) and the [LM Client](05_lm__language_model_client_.md). Its main tasks are:
1. **Formatting:** It takes the abstract information from DSPy: the [Signature](02_signature.md) (instructions, input/output fields), any few-shot `demos` ([Example](03_example.md)), and the current `inputs`. It then **formats** all of this into the specific structure the target LM expects (either a single string or a list of chat messages).
2. **Parsing:** After the LM generates its response (which is usually just raw text), the `Adapter` **parses** this text to extract the values for the output fields defined in the `Signature` (like extracting the generated `summary` text).
The most common adapter is the `dspy.adapters.ChatAdapter`, which is specifically designed to translate between the DSPy format and the message list format expected by chat models.
## Why Use Adapters? Flexibility!
The main benefit of using Adapters is **flexibility**.
* **Write Once, Run Anywhere:** Your core DSPy program logic (your `Module`s, `Program`s, and `Signature`s) remains the same regardless of whether you're using a completion LM or a chat LM.
* **Easy Switching:** You can switch the underlying [LM Client](05_lm__language_model_client_.md) (e.g., from OpenAI GPT-3 to Anthropic Claude 3) in `dspy.settings`, and the appropriate Adapter (usually the default `ChatAdapter`) handles the communication differences automatically.
* **Standard Interface:** Adapters ensure that modules like `dspy.Predict` have a consistent way to interact with LMs, hiding the complexities of different API formats.
## How Adapters Work: Format and Parse
Let's look conceptually at what the `ChatAdapter` does:
**1. Formatting (`format` method):**
Imagine calling our `summarizer` with one demo example:
```python
# Demo example
demo = dspy.Example(
text="Long article about cats.",
summary="Cats are popular pets."
).with_inputs("text")
# Call the summarizer with the demo
result = summarizer(text=long_text, demos=[demo])
```
The `ChatAdapter`'s `format` method might take the `Summarize` signature, the `demo`, and the `long_text` input and produce a list of messages like this:
```python
# Conceptual Output of ChatAdapter.format()
[
# 1. System message from Signature instructions
{"role": "system", "content": "Summarize the given text.\n\n---\n\nFollow the following format.\n\nText: ${text}\nSummary: ${summary}\n\n---\n\n"},
# 2. User turn for the demo input
{"role": "user", "content": "Text: Long article about cats.\nSummary:"},
# 3. Assistant turn for the demo output
{"role": "assistant", "content": "Summary: Cats are popular pets."}, # (Might use special markers like [[ ## Summary ## ]])
# 4. User turn for the actual input
{"role": "user", "content": "Text: DSPy is a framework for programming foundation models...\nSummary:"}
]
```
*(Note: `ChatAdapter` uses specific markers like `[[ ## field_name ## ]]` to clearly separate fields in the content, making parsing easier)*
This message list is then passed to the chat-based LM Client.
**2. Parsing (`parse` method):**
The chat LM responds, likely mimicking the format. Its response might be a string like:
```text
[[ ## summary ## ]]
DSPy helps build and optimize language model pipelines.
```
The `ChatAdapter`'s `parse` method takes this string. It looks for the markers (`[[ ## summary ## ]]`) defined by the `Summarize` signature's output fields. It extracts the content associated with each marker and returns a dictionary:
```python
# Conceptual Output of ChatAdapter.parse()
{
"summary": "DSPy helps build and optimize language model pipelines."
}
```
This dictionary is then packaged into the `dspy.Prediction` object (as `result.summary`) that your `summarizer` module returns.
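You normally never call the adapter yourself, but seeing both halves in isolation can make the idea concrete. The sketch below follows the simplified `format`/`parse` signatures shown later in this chapter; the exact argument names and output markers may differ slightly between DSPy versions:

```python
import dspy

adapter = dspy.adapters.ChatAdapter()

# 1. Formatting: Signature + demos + inputs -> list of chat messages.
#    Demos and inputs are plain dicts here, matching the simplified type hints below.
demo_dict = {"text": "Long article about cats.", "summary": "Cats are popular pets."}
messages = adapter.format(Summarize, [demo_dict], {"text": long_text})
for message in messages:
    print(message["role"], "->", message["content"][:60], "...")

# 2. Parsing: raw LM output text -> dict of output field values.
raw_completion = "[[ ## summary ## ]]\nDSPy helps build and optimize LM pipelines."
parsed = adapter.parse(Summarize, raw_completion)
print(parsed["summary"])
```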
## Using Adapters (It's Often Automatic!)
The good news is that you usually don't interact with Adapters directly. Modules like `dspy.Predict` are designed to use the currently configured adapter automatically.
DSPy sets a default adapter (usually `ChatAdapter`) in its global `dspy.settings`. When you configure your [LM Client](05_lm__language_model_client_.md) like this:
```python
import dspy
# Configure LM (Chapter 5)
# turbo = dspy.LM(model='openai/gpt-3.5-turbo')
# dspy.settings.configure(lm=turbo)
# Default Adapter (ChatAdapter) is usually active automatically!
# You typically DON'T need to configure it unless you want a different one.
# dspy.settings.configure(adapter=dspy.adapters.ChatAdapter())
```
Now, when you use `dspy.Predict` or other modules that call LMs, they will internally use `dspy.settings.adapter` (the `ChatAdapter` in this case) to handle the formatting and parsing needed to talk to the configured `dspy.settings.lm` (`turbo`).
```python
# The summarizer automatically uses the configured LM and Adapter
summarizer = dspy.Predict(Summarize)
result = summarizer(text=long_text) # Adapter works its magic here!
print(result.summary)
```
You write your DSPy code at a higher level of abstraction, and the Adapter handles the translation details for you.
## How It Works Under the Hood
Let's trace the flow when `summarizer(text=long_text)` is called, assuming a chat LM and the `ChatAdapter` are configured:
1. **`Predict.__call__`:** The `summarizer` (`dspy.Predict`) instance is called.
2. **Get Components:** It retrieves the `Signature` (`Summarize`), `demos`, `inputs` (`text`), the configured `LM` client, and the configured `Adapter` (e.g., `ChatAdapter`) from `dspy.settings`.
3. **`Adapter.__call__`:** `Predict` calls the `Adapter` instance, passing it the LM, signature, demos, and inputs.
4. **`Adapter.format`:** The `Adapter`'s `__call__` method first calls its own `format` method. `ChatAdapter.format` generates the list of chat messages (system prompt, demo turns, final user turn).
5. **`LM.__call__`:** The `Adapter`'s `__call__` method then passes the formatted messages to the `LM` client instance (e.g., `turbo(messages=...)`).
6. **API Call:** The `LM` client sends the messages to the actual LM API (e.g., OpenAI API).
7. **API Response:** The LM API returns the generated completion text (e.g., `[[ ## summary ## ]]\nDSPy helps...`).
8. **`LM.__call__` Returns:** The `LM` client returns the raw completion string(s) back to the `Adapter`.
9. **`Adapter.parse`:** The `Adapter`'s `__call__` method calls its own `parse` method with the completion string. `ChatAdapter.parse` extracts the content based on the `[[ ## ... ## ]]` markers and the `Signature`'s output fields.
10. **`Adapter.__call__` Returns:** The `Adapter` returns a list of dictionaries, each representing a parsed completion (e.g., `[{'summary': 'DSPy helps...'}]`).
11. **`Predict.__call__` Returns:** `Predict` packages these parsed dictionaries into `dspy.Prediction` objects and returns the result.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant PredictMod as dspy.Predict (summarizer)
participant Adapter as Adapter (e.g., ChatAdapter)
participant LMClient as LM Client (e.g., turbo)
participant LMApi as Actual LM API
User->>PredictMod: Call summarizer(text=...)
PredictMod->>Adapter: __call__(lm=LMClient, signature, demos, inputs)
Adapter->>Adapter: format(signature, demos, inputs)
Adapter-->>Adapter: Return formatted_messages (list)
Adapter->>LMClient: __call__(messages=formatted_messages)
LMClient->>LMApi: Send API Request
LMApi-->>LMClient: Return raw_completion_text
LMClient-->>Adapter: Return raw_completion_text
Adapter->>Adapter: parse(signature, raw_completion_text)
Adapter-->>Adapter: Return parsed_output (dict)
Adapter-->>PredictMod: Return list[parsed_output]
PredictMod->>PredictMod: Create Prediction object(s)
PredictMod-->>User: Return Prediction object(s)
```
**Relevant Code Files:**
* `dspy/adapters/base.py`: Defines the abstract `Adapter` class.
* Requires subclasses to implement `format` and `parse`.
* The `__call__` method orchestrates the format -> LM call -> parse sequence.
* `dspy/adapters/chat_adapter.py`: Defines `ChatAdapter`, the default implementation.
* `format`: Implements logic to create the system/user/assistant message list, using `[[ ## ... ## ]]` markers. Includes helper functions like `format_turn` and `prepare_instructions`.
* `parse`: Implements logic to find the `[[ ## ... ## ]]` markers in the LM's output string and extract the corresponding values.
* `dspy/predict/predict.py`: The `Predict` module's `forward` method retrieves the adapter from `dspy.settings` and calls it.
```python
# Simplified view from dspy/adapters/base.py
from abc import ABC, abstractmethod
# ... other imports ...
class Adapter(ABC):
# ... init ...
# The main orchestration method
def __call__(
self,
lm: "LM",
lm_kwargs: dict[str, Any],
signature: Type[Signature],
demos: list[dict[str, Any]],
inputs: dict[str, Any],
) -> list[dict[str, Any]]:
# 1. Format the inputs for the LM
# Returns either a string or list[dict] (for chat)
formatted_input = self.format(signature, demos, inputs)
# Prepare arguments for the LM call
lm_call_args = dict(prompt=formatted_input) if isinstance(formatted_input, str) else dict(messages=formatted_input)
# 2. Call the Language Model Client
outputs = lm(**lm_call_args, **lm_kwargs) # Returns list of strings or dicts
# 3. Parse the LM outputs
parsed_values = []
for output in outputs:
# Extract raw text (simplified)
raw_text = output if isinstance(output, str) else output["text"]
# Parse the raw text based on the signature
value = self.parse(signature, raw_text)
# Validate fields (simplified)
# ...
parsed_values.append(value)
return parsed_values
@abstractmethod
def format(self, signature, demos, inputs) -> list[dict[str, Any]] | str:
# Subclasses must implement this to format input for the LM
raise NotImplementedError
@abstractmethod
def parse(self, signature: Type[Signature], completion: str) -> dict[str, Any]:
# Subclasses must implement this to parse the LM's output string
raise NotImplementedError
# ... other helper methods (format_fields, format_turn, etc.) ...
# Simplified view from dspy/adapters/chat_adapter.py
# ... imports ...
import re
field_header_pattern = re.compile(r"\[\[ ## (\w+) ## \]\]") # Matches [[ ## field_name ## ]]
class ChatAdapter(Adapter):
# ... init ...
def format(self, signature, demos, inputs) -> list[dict[str, Any]]:
messages = []
# 1. Create system message from signature instructions
# (Uses helper `prepare_instructions`)
prepared_instructions = prepare_instructions(signature)
messages.append({"role": "system", "content": prepared_instructions})
# 2. Format demos into user/assistant turns
# (Uses helper `format_turn`)
for demo in demos:
messages.append(self.format_turn(signature, demo, role="user"))
messages.append(self.format_turn(signature, demo, role="assistant"))
# 3. Format final input into a user turn
# (Handles chat history if present, uses `format_turn`)
# ... logic for chat history or simple input ...
messages.append(self.format_turn(signature, inputs, role="user"))
# Expand image tags if needed
messages = try_expand_image_tags(messages)
return messages
def parse(self, signature: Type[Signature], completion: str) -> dict[str, Any]:
# Logic to split completion string by [[ ## field_name ## ]] markers
# Finds matches using `field_header_pattern`
sections = self._split_completion_by_markers(completion)
fields = {}
for field_name, field_content in sections:
if field_name in signature.output_fields:
try:
# Use helper `parse_value` to cast string to correct type
fields[field_name] = parse_value(field_content, signature.output_fields[field_name].annotation)
except Exception as e:
# Handle parsing errors
# ...
pass
# Check if all expected output fields were found
# ...
return fields
# ... helper methods: format_turn, format_fields, _split_completion_by_markers ...
```
The key takeaway is that `Adapter` subclasses provide concrete implementations for `format` (DSPy -> LM format) and `parse` (LM output -> DSPy format), enabling smooth communication.
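If you ever needed to target an LM with its own peculiar input/output format, you could in principle write your own adapter by subclassing `Adapter` and implementing these two methods. The following is a purely hypothetical, minimal sketch of a "plain prompt string" adapter; it is not an adapter that ships with DSPy, and it ignores many details (types, multiple outputs, error handling):

```python
from dspy.adapters.base import Adapter

class TinyCompletionAdapter(Adapter):
    """Hypothetical adapter that builds a single plain prompt string."""

    def format(self, signature, demos, inputs):
        lines = [signature.instructions, ""]
        # Render each demo as "field: value" lines
        for demo in demos:
            for name in signature.input_fields:
                lines.append(f"{name}: {demo[name]}")
            for name in signature.output_fields:
                lines.append(f"{name}: {demo[name]}")
            lines.append("")
        # Render the current inputs and ask the LM to continue with the first output field
        for name, value in inputs.items():
            lines.append(f"{name}: {value}")
        first_output = next(iter(signature.output_fields))
        lines.append(f"{first_output}:")
        return "\n".join(lines)

    def parse(self, signature, completion):
        # Naively treat the whole completion as the value of the single output field
        first_output = next(iter(signature.output_fields))
        return {first_output: completion.strip()}
```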
## Conclusion
You've now met the **`Adapter`**, DSPy's universal translator!
* Adapters solve the problem of **different LMs expecting different input formats** (e.g., completion prompts vs. chat messages).
* They act as a bridge, **formatting** DSPy's abstract [Signature](02_signature.md), demos, and inputs into the LM-specific format, and **parsing** the LM's raw output back into structured DSPy data.
* The primary benefit is **flexibility**, allowing you to use the same DSPy program with various LM types without changing your core logic.
* Adapters like `ChatAdapter` usually work **automatically** behind the scenes, configured via `dspy.settings`.
With Adapters handling the translation, LM Clients providing the connection, and RMs fetching knowledge, we have a powerful toolkit. But how do we manage all these configurations globally? That's the role of `dspy.settings`.
**Next:** [Chapter 10: Settings](10_settings.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

367
output/DSPy/10_settings.md Normal file
View File

@@ -0,0 +1,367 @@
# Chapter 10: Settings - Your Program's Control Panel
Welcome to the final chapter of our introductory DSPy tutorial! In [Chapter 9: Adapter](09_adapter.md), we saw how Adapters act as translators, allowing our DSPy programs to communicate seamlessly with different types of Language Models (LMs).
Throughout the previous chapters, we've seen snippets like `dspy.settings.configure(lm=...)` and `dspy.settings.configure(rm=...)`. We mentioned that modules like `dspy.Predict` or `dspy.Retrieve` automatically find and use these configured components. But how does this central configuration work? How do we manage these important defaults for our entire project?
That's where **`dspy.settings`** comes in! It's the central control panel for your DSPy project.
Think of `dspy.settings` like the **Defaults menu** in a software application:
* You set your preferred font, theme, or language once in the settings.
* The entire application then uses these defaults unless you specifically choose something different for a particular document or window.
`dspy.settings` does the same for your DSPy programs. It holds the default [LM (Language Model Client)](05_lm__language_model_client_.md), [RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md), and [Adapter](09_adapter.md) that your modules will use.
In this chapter, you'll learn:
* Why a central settings object is useful.
* How to configure global defaults using `dspy.settings.configure`.
* How modules automatically use these settings.
* How to temporarily override settings for specific parts of your code using `dspy.context`.
Let's learn how to manage our program's defaults!
## Why Use `dspy.settings`?
Imagine building a complex DSPy [Program](01_module___program.md) with many sub-modules that need to call an LM or an RM. Without a central settings object, you might have to pass the LM and RM instances explicitly to every single module during initialization or when calling them. This would be tedious and make your code harder to manage.
```python
# --- WITHOUT dspy.settings (Conceptual - DON'T DO THIS) ---
import dspy
# Assume lm_instance and rm_instance are created somewhere
class GenerateSearchQuery(dspy.Module):
def __init__(self, lm): # Needs LM passed in
self.predictor = dspy.Predict('question -> query', lm=lm) # Pass LM to Predict
# ... forward ...
class RetrieveContext(dspy.Module):
def __init__(self, rm): # Needs RM passed in
self.retriever = dspy.Retrieve(rm=rm, k=3) # Pass RM to Retrieve
# ... forward ...
# ... other modules needing lm or rm ...
class ComplexRAG(dspy.Module):
def __init__(self, lm, rm): # Needs LM and RM passed in
self.gen_query = GenerateSearchQuery(lm=lm) # Pass LM down
self.retrieve = RetrieveContext(rm=rm) # Pass RM down
# ... other sub-modules needing lm or rm ...
def forward(self, question, lm=None, rm=None): # Maybe pass them here too? Messy!
# ... use sub-modules ...
```
This gets complicated quickly!
`dspy.settings` solves this by providing a single, global place to store these configurations. You configure it once, and all DSPy modules can access the defaults they need automatically.
## Configuring Global Defaults
The primary way to set defaults is using the `dspy.settings.configure` method. You typically do this once near the beginning of your script or application.
Let's set up a default LM and RM:
```python
import dspy
# 1. Create your LM and RM instances (as seen in Chapters 5 & 6)
# Example using OpenAI and a dummy RM
try:
# Assumes OPENAI_API_KEY is set
turbo = dspy.LM(model='openai/gpt-3.5-turbo-instruct', max_tokens=100)
except ImportError:
print("Note: dspy[openai] not installed. Using dummy LM.")
# Define a dummy LM if OpenAI isn't available
class DummyLM(dspy.LM):
def __init__(self): super().__init__(model="dummy")
def basic_request(self, prompt, **kwargs): return {"choices": [{"text": "Dummy LM Response"}]}
def __call__(self, prompt, **kwargs): return ["Dummy LM Response"]
turbo = DummyLM()
# Dummy RM for demonstration
class DummyRM(dspy.Retrieve):
def __init__(self, k=3): super().__init__(k=k)
def forward(self, query, k=None):
k = k if k is not None else self.k
return dspy.Prediction(passages=[f"Dummy passage {i+1} for '{query}'" for i in range(k)])
my_rm = DummyRM(k=3)
# 2. Configure dspy.settings with these instances
dspy.settings.configure(lm=turbo, rm=my_rm)
# That's it! Defaults are now set globally.
print(f"Default LM: {dspy.settings.lm}")
print(f"Default RM: {dspy.settings.rm}")
```
**Output (example):**
```text
Default LM: LM(model='openai/gpt-3.5-turbo-instruct', temperature=0.0, max_tokens=100, ...) # Or DummyLM
Default RM: Retrieve(k=3) # Or DummyRM
```
Now, any `dspy.Predict`, `dspy.ChainOfThought`, or `dspy.Retrieve` module created *after* this configuration will automatically use `turbo` as the LM and `my_rm` as the RM, unless told otherwise explicitly.
## How Modules Use the Settings
Modules like `dspy.Predict` and `dspy.Retrieve` are designed to look for their required components (LM or RM) in `dspy.settings` if they aren't provided directly.
Consider `dspy.Predict`:
```python
import dspy
# Assume settings were configured as above
# Create a Predict module WITHOUT passing 'lm' explicitly
simple_predictor = dspy.Predict('input -> output')
# When we call it, it will automatically use dspy.settings.lm
result = simple_predictor(input="Tell me a fact.")
print(result.output)
```
**Output (using DummyLM):**
```text
Dummy LM Response
```
Inside its `forward` method, `dspy.Predict` essentially does this (simplified):
```python
# Simplified internal logic of dspy.Predict.forward()
def forward(self, **kwargs):
# ... get signature, demos, config ...
# Get the LM: Use 'lm' passed in kwargs, OR self.lm (if set), OR dspy.settings.lm
lm_to_use = kwargs.pop("lm", self.lm) or dspy.settings.lm
assert lm_to_use is not None, "No LM configured!"
# ... format prompt using signature/demos/inputs ...
# ... call lm_to_use(prompt, ...) ...
# ... parse output ...
# ... return Prediction ...
```
Similarly, `dspy.Retrieve` looks for `dspy.settings.rm`:
```python
import dspy
# Assume settings were configured as above
# Create a Retrieve module WITHOUT passing 'rm' explicitly
retriever = dspy.Retrieve()  # dspy.Retrieve defaults to k=3 (the same k we gave our DummyRM)
# When called, it uses dspy.settings.rm
results = retriever(query="DSPy benefits")
print(results.passages)
```
**Output (using DummyRM):**
```text
["Dummy passage 1 for 'DSPy benefits'", "Dummy passage 2 for 'DSPy benefits'", "Dummy passage 3 for 'DSPy benefits'"]
```
This automatic lookup makes your program code much cleaner, as you don't need to thread the `lm` and `rm` objects through every part of your application.
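To see the difference, here is a rough sketch of how the earlier `ComplexRAG` idea might look once modules simply rely on `dspy.settings` instead of having the LM and RM threaded through every constructor (the module names and signatures here are illustrative, not a complete RAG implementation):

```python
# --- WITH dspy.settings (modules find the defaults themselves) ---
import dspy

class GenerateSearchQuery(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.Predict('question -> query')   # no lm passed in

    def forward(self, question):
        return self.predictor(question=question)

class ComplexRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.gen_query = GenerateSearchQuery()
        self.retrieve = dspy.Retrieve(k=3)                    # no rm passed in

    def forward(self, question):
        query = self.gen_query(question=question).query
        context = self.retrieve(query=query).passages
        return dspy.Prediction(query=query, context=context)
```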
## Temporary Overrides with `dspy.context`
Sometimes, you might want to use a *different* LM or RM for just a specific part of your code, without changing the global default. For example, maybe you want to use a more powerful (and expensive) LM like GPT-4 for a critical reasoning step, while using a cheaper LM like GPT-3.5 for the rest of the program.
You can achieve this using the `dspy.settings.context` context manager. Changes made inside a `with dspy.settings.context(...)` block are **thread-local** and only last until the block exits.
```python
import dspy
# Assume global settings have 'turbo' (GPT-3.5 or Dummy) as the LM
# dspy.settings.configure(lm=turbo, rm=my_rm)
print(f"Outside context: {dspy.settings.lm}")
# Let's create a more powerful (dummy) LM for demonstration
class DummyGPT4(dspy.LM):
def __init__(self): super().__init__(model="dummy-gpt4")
def basic_request(self, prompt, **kwargs): return {"choices": [{"text": "GPT-4 Dummy Response"}]}
def __call__(self, prompt, **kwargs): return ["GPT-4 Dummy Response"]
gpt4_dummy = DummyGPT4()
# Use dspy.context to temporarily switch the LM
with dspy.settings.context(lm=gpt4_dummy, rm=None): # Temporarily set lm, unset rm
print(f"Inside context: {dspy.settings.lm}")
print(f"Inside context (RM): {dspy.settings.rm}")
# Modules used inside this block will use the temporary settings
predictor_in_context = dspy.Predict('input -> output')
result_in_context = predictor_in_context(input="Complex reasoning task")
print(f"Prediction in context: {result_in_context.output}")
# Trying to use RM here would fail as it's None in this context
# retriever_in_context = dspy.Retrieve()
# retriever_in_context(query="something") # This would raise an error
# Settings revert back automatically outside the block
print(f"Outside context again: {dspy.settings.lm}")
print(f"Outside context again (RM): {dspy.settings.rm}")
```
**Output (example):**
```text
Outside context: LM(model='openai/gpt-3.5-turbo-instruct', ...) # Or DummyLM
Inside context: LM(model='dummy-gpt4', ...)
Inside context (RM): None
Prediction in context: GPT-4 Dummy Response
Outside context again: LM(model='openai/gpt-3.5-turbo-instruct', ...) # Or DummyLM
Outside context again (RM): Retrieve(k=3) # Or DummyRM
```
Inside the `with` block, `dspy.settings.lm` temporarily pointed to `gpt4_dummy`, and `dspy.settings.rm` was temporarily `None`. The `predictor_in_context` used the temporary LM. Once the block ended, the settings automatically reverted to the global defaults.
This is crucial for writing clean code where different parts might need different configurations, and also essential for how DSPy's optimizers ([Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md)) work internally to manage different model configurations during optimization.
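One common pattern is switching models for a single step inside a program's `forward` method. A hypothetical sketch, reusing the dummy LMs defined earlier in this chapter:

```python
import dspy

class TwoStagePipeline(dspy.Module):
    """Hypothetical program: cheap default LM for drafting, stronger LM for the final step."""

    def __init__(self):
        super().__init__()
        self.draft = dspy.Predict('question -> draft_answer')
        self.check = dspy.Predict('question, draft_answer -> answer')

    def forward(self, question):
        draft = self.draft(question=question)            # uses the global default LM
        with dspy.settings.context(lm=gpt4_dummy):        # temporarily switch LM
            final = self.check(question=question, draft_answer=draft.draft_answer)
        return final                                      # settings revert automatically
```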
## How It Works Under the Hood
`dspy.settings` uses a combination of global variables and thread-local storage to manage configurations.
1. **Global Defaults:** There's a primary configuration dictionary (`main_thread_config`) that holds the settings configured by `dspy.settings.configure()`.
2. **Ownership:** To prevent race conditions in multi-threaded applications, only the *first* thread that calls `configure` becomes the "owner" and is allowed to make further global changes using `configure`.
3. **Thread-Local Overrides:** `dspy.settings.context()` uses Python's `threading.local` storage. When you enter a `with dspy.settings.context(...)` block, it stores the specified overrides (`lm=gpt4_dummy`, etc.) in a place specific to the current thread.
4. **Attribute Access:** When code accesses `dspy.settings.lm`, the `Settings` object first checks if there's an override for `lm` in the current thread's local storage.
* If yes, it returns the thread-local override.
* If no, it returns the value from the global `main_thread_config`.
5. **Context Exit:** When the `with` block finishes, the `context` manager restores the thread-local storage to its state *before* the block was entered, effectively removing the temporary overrides for that thread.
**Sequence Diagram: Module Accessing Settings**
```mermaid
sequenceDiagram
participant User
participant Module as Your Module (e.g., Predict)
participant Settings as dspy.settings
participant ThreadLocalStorage as Thread-Local Storage
participant GlobalConfig as Global Defaults
User->>Module: Call module(input=...)
Module->>Settings: Get configured lm (`settings.lm`)
Settings->>ThreadLocalStorage: Check for 'lm' override?
alt Override Exists
ThreadLocalStorage-->>Settings: Return thread-local lm
Settings-->>Module: Return thread-local lm
else No Override
ThreadLocalStorage-->>Settings: No override found
Settings->>GlobalConfig: Get global 'lm'
GlobalConfig-->>Settings: Return global lm
Settings-->>Module: Return global lm
end
Module->>Module: Use the returned lm for processing...
Module-->>User: Return result
```
This mechanism ensures that global settings are the default, but thread-specific overrides via `dspy.context` take precedence when active, providing both convenience and flexibility.
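Because the overrides live in thread-local storage, a context active in one thread is not visible from another. Here is a small sketch based on the mechanism described above (using `gpt4_dummy` from earlier; exact behavior may vary slightly across DSPy versions):

```python
import threading
import dspy

def report(label):
    # Resolves dspy.settings.lm in whichever thread runs this function
    print(label, dspy.settings.lm)

with dspy.settings.context(lm=gpt4_dummy):
    report("main thread, inside context:")    # sees the thread-local override (gpt4_dummy)

    worker = threading.Thread(target=report, args=("worker thread:",))
    worker.start()
    worker.join()                             # worker has no override -> falls back to global default

report("main thread, after context:")         # back to the global default
```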
**Relevant Code Files:**
* `dspy/dsp/utils/settings.py`: Defines the `Settings` class, the `DEFAULT_CONFIG`, manages global state (`main_thread_config`, `config_owner_thread_id`), uses `threading.local` for overrides, and implements the `configure` method and the `context` context manager.
```python
# Simplified view from dspy/dsp/utils/settings.py
import copy
import threading
from contextlib import contextmanager
# from dspy.dsp.utils.utils import dotdict # Simplified as dict
DEFAULT_CONFIG = dict(lm=None, rm=None, adapter=None, ...) # Default values
# Global state
main_thread_config = copy.deepcopy(DEFAULT_CONFIG)
config_owner_thread_id = None
global_lock = threading.Lock()
# Thread-local storage for overrides
class ThreadLocalOverrides(threading.local):
def __init__(self):
self.overrides = {}
thread_local_overrides = ThreadLocalOverrides()
class Settings:
_instance = None
def __new__(cls): # Singleton pattern
if cls._instance is None: cls._instance = super().__new__(cls)
return cls._instance
# When you access settings.lm or settings['lm']
def __getattr__(self, name):
# Check thread-local overrides first
overrides = getattr(thread_local_overrides, "overrides", {})
if name in overrides: return overrides[name]
# Fall back to global config
elif name in main_thread_config: return main_thread_config[name]
else: raise AttributeError(f"'Settings' object has no attribute '{name}'")
def __getitem__(self, key): return self.__getattr__(key)
# dspy.settings.configure(...)
def configure(self, **kwargs):
global main_thread_config, config_owner_thread_id
current_thread_id = threading.get_ident()
with global_lock: # Ensure thread safety for configuration
if config_owner_thread_id is None: config_owner_thread_id = current_thread_id
elif config_owner_thread_id != current_thread_id:
raise RuntimeError("dspy.settings can only be changed by the thread that initially configured it.")
# Update global config
for k, v in kwargs.items(): main_thread_config[k] = v
# with dspy.settings.context(...)
@contextmanager
def context(self, **kwargs):
# Save current overrides
original_overrides = getattr(thread_local_overrides, "overrides", {}).copy()
# Create new overrides for this context (combining global + old local + new)
new_overrides = {**main_thread_config, **original_overrides, **kwargs}
# Apply new overrides to thread-local storage
thread_local_overrides.overrides = new_overrides
try:
yield # Code inside the 'with' block runs here
finally:
# Restore original overrides when exiting the block
thread_local_overrides.overrides = original_overrides
# The global instance you use
settings = Settings()
```
This structure elegantly handles both global defaults and safe, temporary, thread-specific overrides.
## Conclusion
Congratulations! You've reached the end of this introductory DSPy tutorial and learned about `dspy.settings`, the central control panel.
* `dspy.settings` holds **global default configurations** like the [LM](05_lm__language_model_client_.md), [RM](06_rm__retrieval_model_client_.md), and [Adapter](09_adapter.md).
* You configure it **once** using `dspy.settings.configure(lm=..., rm=...)`.
* DSPy modules like `dspy.Predict` and `dspy.Retrieve` automatically **use these defaults**, simplifying your code.
* `dspy.context` allows for **temporary, thread-local overrides**, providing flexibility without affecting the global state.
By mastering these 10 chapters, you've gained a solid foundation in the core concepts of DSPy:
1. Structuring programs with [Modules and Programs](01_module___program.md).
2. Defining tasks with [Signatures](02_signature.md).
3. Representing data with [Examples](03_example.md).
4. Making basic LM calls with [Predict](04_predict.md).
5. Connecting to AI brains with [LM Clients](05_lm__language_model_client_.md).
6. Accessing external knowledge with [RM Clients](06_rm__retrieval_model_client_.md).
7. Measuring performance with [Evaluate](07_evaluate.md).
8. Automating optimization with [Teleprompters](08_teleprompter___optimizer.md).
9. Ensuring compatibility with [Adapters](09_adapter.md).
10. Managing configuration with [Settings](10_settings.md).
You're now equipped to start building, evaluating, and optimizing your own sophisticated language model pipelines with DSPy. Happy programming!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

58
output/DSPy/index.md Normal file
View File

@@ -0,0 +1,58 @@
# Tutorial: DSPy
DSPy helps you build and optimize *programs* that use **Language Models (LMs)** and **Retrieval Models (RMs)**.
Think of it like composing Lego bricks (**Modules**) where each brick performs a specific task (like generating text or retrieving information).
**Signatures** define what each Module does (its inputs and outputs), and **Teleprompters** automatically tune these modules (like optimizing prompts or examples) to get the best performance on your data.
**Source Repository:** [https://github.com/stanfordnlp/dspy/tree/7cdfe988e6404289b896d946d957f17bb4d9129b/dspy](https://github.com/stanfordnlp/dspy/tree/7cdfe988e6404289b896d946d957f17bb4d9129b/dspy)
```mermaid
flowchart TD
A0["Module / Program"]
A1["Signature"]
A2["Predict"]
A3["LM (Language Model Client)"]
A4["RM (Retrieval Model Client)"]
A5["Teleprompter / Optimizer"]
A6["Example"]
A7["Evaluate"]
A8["Adapter"]
A9["Settings"]
A0 -- "Contains / Composes" --> A0
A0 -- "Uses (via Retrieve)" --> A4
A1 -- "Defines structure for" --> A6
A2 -- "Implements" --> A1
A2 -- "Calls" --> A3
A2 -- "Uses demos from" --> A6
A2 -- "Formats prompts using" --> A8
A5 -- "Optimizes" --> A0
A5 -- "Fine-tunes" --> A3
A5 -- "Uses training data from" --> A6
A5 -- "Uses metric from" --> A7
A7 -- "Tests" --> A0
A7 -- "Evaluates on dataset of" --> A6
A8 -- "Translates" --> A1
A8 -- "Formats demos from" --> A6
A9 -- "Configures default" --> A3
A9 -- "Configures default" --> A4
A9 -- "Configures default" --> A8
```
## Chapters
1. [Module / Program](01_module___program.md)
2. [Signature](02_signature.md)
3. [Example](03_example.md)
4. [Predict](04_predict.md)
5. [LM (Language Model Client)](05_lm__language_model_client_.md)
6. [RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md)
7. [Evaluate](07_evaluate.md)
8. [Teleprompter / Optimizer](08_teleprompter___optimizer.md)
9. [Adapter](09_adapter.md)
10. [Settings](10_settings.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,226 @@
# Chapter 1: FastAPI Application & Routing
Welcome to your first adventure with FastAPI! 👋
Imagine you want to build a small website or an API (Application Programming Interface) - a way for computers to talk to each other. How do you tell your program, "When someone visits this specific web address, run this specific piece of Python code"? That's where FastAPI comes in!
**Our Goal Today:** We'll build the simplest possible web application. When you visit the main page in your web browser, it will just say "Hello, World!". This tiny example will teach us the absolute basics of FastAPI.
## What Problem Does This Solve?
Think about a big airport. There's a central control tower that manages all the planes landing and taking off. It knows which runway corresponds to which flight number.
In the world of web applications, the `FastAPI` application object is like that **control tower**. It's the central piece of your project. You need a way to tell this control tower: "Hey, if a request comes in for the main web address (`/`) using the `GET` method (which browsers use when you just visit a page), please run *this* specific Python function."
This process of connecting URLs (web addresses) and HTTP methods (like `GET`, `POST`) to your Python functions is called **Routing**. FastAPI makes this super easy and efficient.
## Your First FastAPI Application
Let's start with the absolute minimum code needed.
1. **Create a file:** Make a file named `main.py`.
2. **Write the code:**
```python
# main.py
from fastapi import FastAPI
# Create the main FastAPI application object
# Think of this as initializing the 'control tower'
app = FastAPI()
# Define a 'route'
# This tells FastAPI: If someone sends a GET request to '/', run the function below
@app.get("/")
async def read_root():
# This function will be executed for requests to '/'
# It returns a simple Python dictionary
return {"message": "Hello World"}
```
**Explanation:**
* `from fastapi import FastAPI`: We import the main `FastAPI` class. This class provides all the core functionality.
* `app = FastAPI()`: We create an *instance* of the `FastAPI` class. By convention, we call this instance `app`. This `app` variable is our central control tower.
* `@app.get("/")`: This is a Python **decorator**. It modifies the function defined right below it. Specifically, `@app.get(...)` tells FastAPI that the function `read_root` should handle incoming web requests that:
* Use the `GET` HTTP method. This is the most common method, used by your browser when you type a URL.
* Are for the path `/`. This is the "root" path, the main address of your site (like `http://www.example.com/`).
* `async def read_root(): ...`: This is the Python function that will actually run when someone accesses `/`.
* `async def`: This declares an "asynchronous" function. FastAPI is built for high performance using `asyncio`. Don't worry too much about `async` right now; just know that you'll often use `async def` for your route functions.
* `return {"message": "Hello World"}`: The function returns a standard Python dictionary. FastAPI is smart enough to automatically convert this dictionary into JSON format, which is the standard way APIs send data over the web.
## Running Your Application
Okay, we have the code, but how do we actually *run* it so we can see "Hello, World!" in our browser? We need a web server. FastAPI applications are served by ASGI servers like **Uvicorn**.
1. **Install necessary libraries:**
Open your terminal or command prompt and run:
```bash
pip install fastapi uvicorn[standard]
```
This installs FastAPI itself and Uvicorn with helpful extras.
2. **Run the server:**
In the same directory where you saved `main.py`, run this command in your terminal:
```bash
uvicorn main:app --reload
```
**Explanation of the command:**
* `uvicorn`: This calls the Uvicorn server program.
* `main:app`: This tells Uvicorn where to find your FastAPI application.
* `main`: Refers to the Python file `main.py`.
* `app`: Refers to the object named `app` you created inside `main.py` (`app = FastAPI()`).
* `--reload`: This is super helpful during development! It tells Uvicorn to automatically restart your server whenever you save changes to your `main.py` file.
You should see output similar to this in your terminal:
```bash
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [xxxxx] using StatReload
INFO: Started server process [xxxxx]
INFO: Waiting for application startup.
INFO: Application startup complete.
```
Now, open your web browser and go to `http://127.0.0.1:8000`.
**Result:** You should see this JSON response in your browser:
```json
{"message":"Hello World"}
```
Congratulations! You've just created and run your first FastAPI application! 🎉
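You don't have to rely on your browser to check the response. FastAPI ships a test client (built on Starlette) that calls your app directly, without starting a server. A quick sketch, assuming your app lives in `main.py` as above (you may need `pip install httpx pytest` to run it):

```python
# test_main.py
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_read_root():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Hello World"}
```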
## Organizing Your Routes with `APIRouter`
Our "Hello World" example is tiny. Real applications have many different routes (like `/users/`, `/items/`, `/orders/`, etc.). Putting *all* of them in the single `main.py` file using `@app.get(...)`, `@app.post(...)` would quickly become messy and hard to manage.
Imagine our airport analogy again. Instead of one giant control tower managing *everything*, large airports have different terminals (Terminal A for domestic flights, Terminal B for international, etc.) to organize things.
FastAPI provides `APIRouter` for this exact purpose. Think of `APIRouter` as creating a **mini-application** or a **chapter** for your routes. You can group related routes together in separate files using `APIRouter`, and then "include" these routers into your main `app`.
**Let's organize!**
1. **Create a new file:** Let's say we want to manage routes related to "items". Create a file named `routers/items.py`. (You might need to create the `routers` directory first).
2. **Write the router code:**
```python
# routers/items.py
from fastapi import APIRouter
# Create an APIRouter instance
# This is like a mini-FastAPI app for item-related routes
router = APIRouter()
# Define a route on the router, not the main app
@router.get("/items/")
async def read_items():
# A simple example returning a list of items
return [{"name": "Item Foo"}, {"name": "Item Bar"}]
@router.get("/items/{item_id}")
async def read_item(item_id: str):
# We'll learn about path parameters like {item_id} later!
# See [Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md)
return {"item_id": item_id, "name": f"Item {item_id}"}
```
**Explanation:**
* `from fastapi import APIRouter`: We import `APIRouter`.
* `router = APIRouter()`: We create an instance of `APIRouter`.
* `@router.get("/items/")`: Notice we use `@router.get` instead of `@app.get`. We are defining this route *on the router*.
3. **Modify `main.py` to include the router:**
```python
# main.py
from fastapi import FastAPI
from routers import items # Import the items router
# Create the main FastAPI application
app = FastAPI()
# Include the router from the items module
# All routes defined in items.router will now be part of the main app
app.include_router(items.router)
# You can still define routes directly on the app if needed
@app.get("/")
async def read_root():
return {"message": "Hello Main App!"}
```
**Explanation:**
* `from routers import items`: We import the `items` module (which contains our `items.py` file).
* `app.include_router(items.router)`: This is the crucial line! It tells the main `app` to incorporate all the routes defined in `items.router`. Now, requests to `/items/` and `/items/{item_id}` will be handled correctly.
Now, if you run `uvicorn main:app --reload` again:
* Visiting `http://127.0.0.1:8000/` still shows `{"message":"Hello Main App!"}`.
* Visiting `http://127.0.0.1:8000/items/` will show `[{"name":"Item Foo"},{"name":"Item Bar"}]`.
* Visiting `http://127.0.0.1:8000/items/abc` will show `{"item_id":"abc","name":"Item abc"}`. (We'll cover `{item_id}` properly in the [next chapter](02_path_operations___parameter_declaration.md)).
Using `APIRouter` helps keep your project organized as it grows!
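As a side note, `app.include_router()` also accepts options like `prefix` and `tags`, which become handy as routers grow. In that style, the router defines its paths *without* the `/items` part; a small sketch:

```python
# routers/items.py  (alternative style: no "/items" inside the router itself)
from fastapi import APIRouter

router = APIRouter()

@router.get("/")              # will become GET /items/ once included with a prefix
async def read_items():
    return [{"name": "Item Foo"}, {"name": "Item Bar"}]
```

```python
# main.py
app.include_router(items.router, prefix="/items", tags=["items"])
```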
## How it Works Under the Hood (Simplified)
What actually happens when you visit `http://127.0.0.1:8000/`?
1. **Browser Request:** Your browser sends an HTTP `GET` request to the address `127.0.0.1` on port `8000`, asking for the path `/`.
2. **Uvicorn Receives:** The Uvicorn server is listening on that address and port. It receives the raw request.
3. **Uvicorn to FastAPI:** Uvicorn understands the ASGI standard, which is how it communicates with FastAPI. It passes the request details (method=`GET`, path=`/`, headers, etc.) to your `FastAPI` `app` instance.
4. **FastAPI Routing:** Your `FastAPI` application (`app`) looks at its internal list of routes. This list was built when you used decorators like `@app.get("/")` or included routers like `app.include_router(items.router)`.
5. **Match Found:** FastAPI finds a route that matches:
* HTTP Method: `GET`
* Path: `/`
It sees that this route is connected to your `read_root` function.
6. **Function Execution:** FastAPI calls your `async def read_root()` function.
7. **Function Returns:** Your function runs and returns the Python dictionary `{"message": "Hello World"}`.
8. **Response Processing:** FastAPI takes the returned dictionary. Because the route didn't specify a different response type, FastAPI automatically converts the dictionary into a JSON string. It also creates the necessary HTTP headers (like `Content-Type: application/json`).
9. **FastAPI to Uvicorn:** FastAPI sends the complete HTTP response (status code 200 OK, headers, JSON body) back to Uvicorn.
10. **Uvicorn to Browser:** Uvicorn sends the response over the network back to your browser.
11. **Browser Displays:** Your browser receives the response, sees it's JSON, and displays it.
Here's a diagram showing the flow:
```mermaid
sequenceDiagram
participant User Browser
participant ASGI Server (Uvicorn)
participant FastAPI App
participant Route Handler (read_root)
User Browser->>+ASGI Server (Uvicorn): GET / HTTP/1.1
ASGI Server (Uvicorn)->>+FastAPI App: Pass Request (method='GET', path='/')
FastAPI App->>FastAPI App: Lookup route for GET /
FastAPI App->>+Route Handler (read_root): Call async def read_root()
Route Handler (read_root)-->>-FastAPI App: Return {"message": "Hello World"}
FastAPI App->>FastAPI App: Convert dict to JSON Response (status 200)
FastAPI App-->>-ASGI Server (Uvicorn): Send HTTP Response
ASGI Server (Uvicorn)-->>-User Browser: HTTP/1.1 200 OK\nContent-Type: application/json\n\n{"message":"Hello World"}
```
Internally, FastAPI uses (and builds upon) the routing capabilities of the Starlette framework. When you use `@app.get()` or `@router.get()`, these functions register the path, method, and your handler function into a list of `Route` objects (defined conceptually in `fastapi/routing.py` and `starlette/routing.py`). When `app.include_router()` is called, the routes from the router are added to the main app's list, often with a path prefix if specified. When a request arrives, FastAPI iterates through this list, performs pattern matching on the path, checks the method, and calls the first matching handler.
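If you're curious, you can inspect this route table yourself. A tiny sketch (add it to the bottom of `main.py`, purely for exploration):

```python
# main.py (at the bottom, for exploration only)
if __name__ == "__main__":
    # Each entry is a Route/APIRoute object built from your decorators and include_router calls
    for route in app.routes:
        print(route.path, getattr(route, "methods", None))
```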
## Conclusion
You've taken your first steps into the world of FastAPI!
* You learned that the `FastAPI` class is the core of your application, like a central control tower.
* You saw how to define **routes** using decorators like `@app.get("/")` to connect URL paths and HTTP methods to your Python functions.
* You wrote and ran your first simple "Hello World" API using `uvicorn`.
* You discovered `APIRouter` as a way to organize your routes into logical groups (like chapters or terminals), making your code cleaner as your project grows.
You now have the fundamental building blocks to create web APIs. In the next chapter, we'll dive deeper into defining routes, specifically how to handle data that comes *in* the URL path itself.
Ready to learn more? Let's move on to [Chapter 2: Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,393 @@
# Chapter 2: Path Operations & Parameter Declaration
Welcome back! In [Chapter 1: FastAPI Application & Routing](01_fastapi_application___routing.md), we learned how to set up a basic FastAPI application and organize our code using `APIRouter`. We saw how to connect a URL like `/` to a Python function using `@app.get("/")`.
But what if we need more information from the user? Imagine you're building an API for an online store. You don't just want a single "hello" page; you want users to be able to:
1. Get information about a *specific* item, like `/items/5` (where 5 is the item ID).
2. Search or filter items, like `/items/?query=socks` (search for "socks").
3. Add a *new* item by sending its details (name, price, etc.).
How do we tell FastAPI to expect this extra information (like the item ID `5`, the search query `"socks"`, or the new item's details) and make it available inside our Python function?
That's exactly what **Path Operations** and **Parameter Declaration** are for!
**Our Goal Today:** Learn how FastAPI uses function parameters and type hints to automatically handle data coming from different parts of the web request (URL path, query string, request body) and even validate it!
## What Problem Does This Solve?
Think of your API endpoint (like `/items/`) as a specific room in a building. To get into the room or ask for something specific within it, you often need to provide information:
* Maybe the room number is part of the address (`/items/10` - room number 10). This is like a **Path Parameter**.
* Maybe you need to fill out a small form asking optional questions ("Any specific colour?", "Sort by price?"). This is like **Query Parameters**.
* Maybe you need to hand over a detailed document with instructions or data (like the specs for a new item). This is like the **Request Body**.
FastAPI needs a way to understand these different types of information, extract them from the incoming request, check if they are the correct type (e.g., is the item ID *really* a number?), and give them to your Python function in a clean, easy-to-use way. It does this magic using standard Python type hints and special functions we'll learn about.
## Path Operations: More Than Just GET
In Chapter 1, we used `@app.get("/")`. The `get` part refers to the HTTP **method**. Browsers use `GET` when you simply visit a URL. But there are other common methods for different actions:
* `GET`: Retrieve data.
* `POST`: Create new data.
* `PUT`: Update existing data completely.
* `PATCH`: Partially update existing data.
* `DELETE`: Remove data.
FastAPI provides decorators for all these: `@app.post()`, `@app.put()`, `@app.patch()`, `@app.delete()`. You use them just like `@app.get()` to link a path and an HTTP method to your function.
```python
# main.py (continuing from Chapter 1, maybe add this to routers/items.py)
from fastapi import FastAPI
app = FastAPI()
# A GET operation (read)
@app.get("/items/")
async def read_items():
return [{"item_id": 1, "name": "Thingamajig"}]
# A POST operation (create)
@app.post("/items/")
async def create_item():
# We'll see how to get data *into* here later
return {"message": "Item received!"} # Placeholder
# We'll focus on GET for now, but others work similarly!
```
**Explanation:**
* We define different functions for different *actions* on the same path (`/items/`).
* `@app.get("/items/")` handles requests to *get* the list of items.
* `@app.post("/items/")` handles requests to *create* a new item. FastAPI knows which function to call based on the HTTP method used in the request.
## Path Parameters: Getting Data from the URL Path
Let's say you want an endpoint to get a *single* item by its ID. The URL might look like `http://127.0.0.1:8000/items/5`. Here, `5` is the ID we want to capture.
You define this in FastAPI by putting the variable name in curly braces `{}` within the path string:
```python
# main.py or routers/items.py
from fastapi import FastAPI
app = FastAPI() # Or use your APIRouter
@app.get("/items/{item_id}") # Path parameter defined here
async def read_item(item_id: int): # Parameter name MUST match! Type hint is key!
# FastAPI automatically converts the 'item_id' from the path (which is a string)
# into an integer because of the 'int' type hint.
# It also validates if it *can* be converted to an int.
return {"item_id": item_id, "name": f"Item {item_id} Name"}
```
**Explanation:**
* `@app.get("/items/{item_id}")`: The `{item_id}` part tells FastAPI: "Expect some value here in the URL path, and call it `item_id`."
* `async def read_item(item_id: int)`:
* We declare a function parameter named **exactly** `item_id`. FastAPI connects the path variable to this function argument.
* We use the Python type hint `: int`. This is crucial! FastAPI uses this to:
1. **Convert:** The value from the URL (`"5"`) is automatically converted to an integer (`5`).
2. **Validate:** If you visit `/items/foo`, FastAPI knows `"foo"` cannot be converted to an `int`, and it automatically returns a helpful error response *before* your function even runs!
**Try it:**
1. Run `uvicorn main:app --reload`.
2. Visit `http://127.0.0.1:8000/items/5`. You should see:
```json
{"item_id":5,"name":"Item 5 Name"}
```
3. Visit `http://127.0.0.1:8000/items/abc`. You should see an error like:
```json
{
"detail": [
{
"type": "int_parsing",
"loc": [
"path",
"item_id"
],
"msg": "Input should be a valid integer, unable to parse string as an integer",
"input": "abc",
"url": "..."
}
]
}
```
See? Automatic validation!
Path parameters are *required* parts of the path. The URL simply won't match the route if that part is missing.
## Query Parameters: Optional Info After "?"
What if you want to provide optional filtering or configuration in the URL? Like getting items, but maybe skipping the first 10 and limiting the results to 5: `http://127.0.0.1:8000/items/?skip=10&limit=5`.
These `key=value` pairs after the `?` are called **Query Parameters**.
In FastAPI, you declare them as function parameters that are *not* part of the path string. You can provide default values to make them optional.
```python
# main.py or routers/items.py
from fastapi import FastAPI
app = FastAPI() # Or use your APIRouter
# A simple fake database of items
fake_items_db = [{"item_name": "Foo"}, {"item_name": "Bar"}, {"item_name": "Baz"}]
@app.get("/items/")
# 'skip' and 'limit' are NOT in the path "/items/"
# They have default values, making them optional query parameters
async def read_items(skip: int = 0, limit: int = 10):
# FastAPI automatically gets 'skip' and 'limit' from the query string.
# If they are not provided in the URL, it uses the defaults (0 and 10).
# It also converts them to integers and validates them!
return fake_items_db[skip : skip + limit]
```
**Explanation:**
* `async def read_items(skip: int = 0, limit: int = 10)`:
* `skip` and `limit` are *not* mentioned in `@app.get("/items/")`. FastAPI knows they must be query parameters.
* They have default values (`= 0`, `= 10`). This makes them optional. If the user doesn't provide them in the URL, these defaults are used.
* The type hints `: int` ensure automatic conversion and validation, just like with path parameters.
**Try it:**
1. Make sure `uvicorn` is running.
2. Visit `http://127.0.0.1:8000/items/`. Result (uses defaults `skip=0`, `limit=10`):
```json
[{"item_name":"Foo"},{"item_name":"Bar"},{"item_name":"Baz"}]
```
3. Visit `http://127.0.0.1:8000/items/?skip=1&limit=1`. Result:
```json
[{"item_name":"Bar"}]
```
4. Visit `http://127.0.0.1:8000/items/?limit=abc`. Result: Automatic validation error because `abc` is not an integer.
You can also declare query parameters without default values. In that case, they become *required* query parameters.
```python
# Example: Required query parameter 'query_str'
@app.get("/search/")
async def search_items(query_str: str): # No default value means it's required
return {"search_query": query_str}
# Visiting /search/ will cause an error
# Visiting /search/?query_str=hello will work
```
You can also use other types like `bool` or `float`, and even optional types like `str | None = None` (or `Optional[str] = None` in older Python).
```python
@app.get("/users/{user_id}/items")
async def read_user_items(
user_id: int, # Path parameter
show_details: bool = False, # Optional query parameter (e.g., ?show_details=true)
category: str | None = None # Optional query parameter (e.g., ?category=books)
):
# ... function logic ...
return {"user_id": user_id, "show_details": show_details, "category": category}
```
## Request Body: Sending Complex Data
Sometimes, the data you need to send is too complex for the URL path or query string (like the name, description, price, tax, and tags for a new item). For `POST`, `PUT`, and `PATCH` requests, data is usually sent in the **Request Body**, often as JSON.
FastAPI uses **Pydantic models** to define the structure of the data you expect in the request body. We'll dive deep into Pydantic in [Chapter 3: Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md), but here's a sneak peek:
```python
# main.py or a new models.py file
from pydantic import BaseModel
# Define the structure of an Item using Pydantic
class Item(BaseModel):
name: str
description: str | None = None # Optional field
price: float
tax: float | None = None # Optional field
# Now use it in a path operation
# main.py or routers/items.py
from fastapi import FastAPI
# Assume Item is defined as above (maybe import it)
app = FastAPI() # Or use your APIRouter
@app.post("/items/")
async def create_item(item: Item): # Declare the body parameter using the Pydantic model
# FastAPI automatically:
# 1. Reads the request body.
# 2. Parses the JSON data.
# 3. Validates the data against the 'Item' model (Are 'name' and 'price' present? Are types correct?).
# 4. If valid, provides the data as the 'item' argument (an instance of the Item class).
# 5. If invalid, returns an automatic validation error.
print(f"Received item: {item.name}, Price: {item.price}")
item_dict = item.model_dump() # Convert Pydantic model back to dict if needed
if item.tax:
price_with_tax = item.price + item.tax
item_dict["price_with_tax"] = price_with_tax
return item_dict
```
**Explanation:**
* `class Item(BaseModel): ...`: We define a class `Item` that inherits from Pydantic's `BaseModel`. We declare the expected fields (`name`, `description`, `price`, `tax`) and their types.
* `async def create_item(item: Item)`: We declare a *single* parameter `item` with the type hint `Item`. Because `Item` is a Pydantic model, FastAPI knows it should expect this data in the **request body** as JSON.
* FastAPI handles all the parsing and validation. If the incoming JSON doesn't match the `Item` structure, the client gets an error. If it matches, your function receives a ready-to-use `item` object.
You typically use request bodies with `POST`, `PUT`, and `PATCH` requests. A single Pydantic model parameter maps to the whole JSON body and can contain nested structures; if you declare more than one Pydantic body parameter, FastAPI instead expects a JSON object with one key per parameter. A small sketch of a nested body model follows below.
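Here's a minimal sketch (the `Product` and `Image` models are made up for illustration) of a single body parameter whose model nests another model:

```python
# A minimal sketch of a nested request body, assuming illustrative model names.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Image(BaseModel):
    url: str
    name: str

class Product(BaseModel):
    name: str
    price: float
    image: Image | None = None  # a nested model inside the single body parameter

@app.post("/products/")
async def create_product(product: Product):
    # FastAPI validates the nested JSON, e.g.
    # {"name": "Camera", "price": 199.0, "image": {"url": "https://example.com/cam.png", "name": "front"}}
    return product
```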
## Fine-tuning Parameters with `Path`, `Query`, `Body`, etc.
Type hints are great for basic validation (like `int`, `str`, `bool`). But what if you need more specific rules?
* The `item_id` must be greater than 0.
* A query parameter `q` should have a maximum length of 50 characters.
* A `description` in the request body should have a minimum length.
FastAPI provides functions like `Path`, `Query`, `Body`, `Header`, `Cookie`, and `File` (imported directly from `fastapi`) that you can use alongside type hints (using `typing.Annotated`) to add these extra validation rules and metadata.
Let's enhance our previous examples:
```python
# main.py or routers/items.py
from typing import Annotated # Use Annotated for extra metadata
from fastapi import FastAPI, Path, Query
# Assume Item Pydantic model is defined/imported
app = FastAPI() # Or use your APIRouter
# Fake DB
fake_items_db = [{"item_name": "Foo"}, {"item_name": "Bar"}, {"item_name": "Baz"}]
@app.get("/items/{item_id}")
async def read_item(
# Use Annotated[type, Path(...)] for path parameters
item_id: Annotated[int, Path(
title="The ID of the item to get",
description="The item ID must be a positive integer.",
gt=0, # gt = Greater Than 0
le=1000 # le = Less Than or Equal to 1000
)]
):
return {"item_id": item_id, "name": f"Item {item_id} Name"}
@app.get("/items/")
async def read_items(
# Use Annotated[type | None, Query(...)] for optional query parameters
q: Annotated[str | None, Query(
title="Query string",
description="Optional query string to search items.",
min_length=3,
max_length=50
)] = None, # Default value still makes it optional
skip: Annotated[int, Query(ge=0)] = 0, # ge = Greater Than or Equal to 0
limit: Annotated[int, Query(gt=0, le=100)] = 10
):
results = fake_items_db[skip : skip + limit]
if q:
results = [item for item in results if q.lower() in item["item_name"].lower()]
return results
# Using Body works similarly, often used inside Pydantic models (Chapter 3)
# or if you need to embed a single body parameter
@app.post("/items/")
async def create_item(item: Item): # Pydantic model handles body structure
# Validation for item fields is defined within the Item model itself (See Chapter 3)
# For simple body params without Pydantic, you might use:
# importance: Annotated[int, Body(gt=0)]
return item
```
**Explanation:**
* **`Annotated`**: This is the standard Python way (Python 3.9+) to add extra context to type hints. FastAPI uses this to associate `Path`, `Query`, etc., with your parameters.
* **`Path(...)`**: Used for path parameters.
* `title`, `description`: Add metadata that will appear in the automatic documentation (see [Chapter 4](04_openapi___automatic_docs.md)).
* `gt`, `ge`, `lt`, `le`: Numeric validation (greater than, greater than or equal, less than, less than or equal).
* **`Query(...)`**: Used for query parameters.
* Takes similar arguments to `Path` for metadata and numeric validation.
* `min_length`, `max_length`: String length validation.
* The default value (`= None`, `= 0`, `= 10`) still determines if the parameter is optional or required.
* **`Body(...)`**: Used for request body parameters (often implicitly handled by Pydantic models). Can add metadata or validation similar to `Query`.
* **Others**: `Header()`, `Cookie()`, `File()` work similarly for data from request headers, cookies, or uploaded files.
Using `Path`, `Query`, etc., gives you fine-grained control over data validation and adds useful information to your API documentation automatically.
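As a quick illustration of `Header()` and `Cookie()`, here's a minimal sketch (the cookie name `session_id` is just an example):

```python
# A minimal sketch: reading values from request headers and cookies.
from typing import Annotated
from fastapi import Cookie, FastAPI, Header

app = FastAPI()

@app.get("/profile/")
async def read_profile(
    user_agent: Annotated[str | None, Header()] = None,   # filled from the "User-Agent" header
    session_id: Annotated[str | None, Cookie()] = None,   # filled from a "session_id" cookie
):
    return {"user_agent": user_agent, "session_id": session_id}
```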
## How it Works Under the Hood (Simplified)
How does FastAPI magically connect URL parts and request data to your function arguments and validate them?
1. **App Startup:** When you run your app, FastAPI (using Starlette's routing) inspects all the functions decorated with `@app.get`, `@app.post`, etc.
2. **Function Signature Inspection:** For each function, FastAPI looks at its parameters (`item_id`, `skip`, `limit`, `item`, `q`).
3. **Parameter Type Analysis:** It checks the type hints (`int`, `str`, `bool`, `Item`, `Annotated[...]`).
4. **Location Determination:**
* If a parameter name matches a variable in the path string (`{item_id}`), it's a **Path Parameter**.
* If a parameter has a type hint that's a Pydantic model (`item: Item`), it's a **Body Parameter**.
* Otherwise, it's a **Query Parameter** (`skip`, `limit`, `q`).
* If `Annotated` is used with `Path`, `Query`, `Body`, `Header`, `Cookie`, or `File`, that explicitly defines the location and adds extra validation rules.
5. **Request Arrives:** A request comes in (e.g., `GET /items/5?q=search`).
6. **Routing:** Uvicorn passes the request to FastAPI. FastAPI/Starlette matches the path (`/items/5`) and method (`GET`) to the `read_item` function (or `read_items` if the path was `/items/`). Let's assume it matches `read_item` for `/items/{item_id}`.
7. **Data Extraction:** FastAPI extracts data from the request based on the parameter definitions found in step 4:
* Path: Extracts `"5"` for `item_id`.
* Query: Extracts `"search"` for `q` (if the route was `/items/` and the function `read_items`).
* Body: Reads and parses JSON (if it was a POST/PUT request with a body parameter).
8. **Validation & Conversion:** FastAPI uses the type hints and any extra rules from `Path`, `Query`, `Body` (often leveraging Pydantic internally):
* Converts `"5"` to the integer `5` for `item_id`. Checks `gt=0`.
* Converts `"search"` to a string for `q`. Checks `max_length`.
* Validates the JSON body against the `Item` model.
9. **Error Handling:** If any validation or conversion fails, FastAPI *immediately* stops and sends back a 422 "Unprocessable Entity" error response with details about what went wrong. Your function is *not* called.
10. **Function Call:** If everything is valid, FastAPI calls your function (`read_item` or `read_items`) with the extracted, converted, and validated data as arguments (`read_item(item_id=5)` or `read_items(q="search", skip=0, limit=10)`).
11. **Response:** Your function runs and returns a result. FastAPI processes the result into an HTTP response.
Here's a simplified diagram for a `GET /items/5?limit=10` request (imagining that `read_item` also accepted an optional `limit` query parameter):
```mermaid
sequenceDiagram
participant Client
participant ASGI Server (Uvicorn)
participant FastAPI App
participant Param Processor
participant Route Handler (read_item)
Client->>+ASGI Server (Uvicorn): GET /items/5?limit=10
ASGI Server (Uvicorn)->>+FastAPI App: Pass Request (method='GET', path='/items/5', query='limit=10')
FastAPI App->>FastAPI App: Match route for GET /items/{item_id}
FastAPI App->>+Param Processor: Process params for read_item(item_id: Annotated[int, Path(gt=0)], limit: Annotated[int, Query(gt=0)]=10)
Param Processor->>Param Processor: Extract '5' from path for item_id
Param Processor->>Param Processor: Extract '10' from query for limit
Param Processor->>Param Processor: Validate/Convert: item_id = 5 (int, >0) -> OK
Param Processor->>Param Processor: Validate/Convert: limit = 10 (int, >0) -> OK
Param Processor-->>-FastAPI App: Validated Params: {item_id: 5, limit: 10}
FastAPI App->>+Route Handler (read_item): Call read_item(item_id=5, limit=10)
Route Handler (read_item)-->>-FastAPI App: Return {"item_id": 5, ...}
FastAPI App->>FastAPI App: Convert result to JSON Response
FastAPI App-->>-ASGI Server (Uvicorn): Send HTTP Response
ASGI Server (Uvicorn)-->>-Client: HTTP 200 OK Response
```
FastAPI cleverly uses Python's type hinting system, Pydantic, and Starlette's request handling to automate the tedious tasks of parsing, validation, and documentation.
## Conclusion
You've now learned the core mechanics of defining API endpoints (Path Operations) and extracting data from requests in FastAPI!
* You know how to use decorators like `@app.get`, `@app.post` for different HTTP methods.
* You can define **Path Parameters** using `{}` in the path string and matching function arguments with type hints (`item_id: int`).
* You can define **Query Parameters** using function arguments *not* in the path, making them optional with default values (`skip: int = 0`).
* You understand the basics of receiving JSON **Request Bodies** using Pydantic models (`item: Item`).
* You saw how to add extra validation and metadata using `Annotated` with `Path()`, `Query()`, and `Body()`.
* You got a glimpse of how FastAPI uses type hints and these tools to automatically parse, validate, and document your API parameters.
This powerful parameter declaration system is a cornerstone of FastAPI's ease of use and robustness. In the next chapter, we'll explore Pydantic models in much more detail, unlocking even more powerful data validation and serialization capabilities for your request bodies and responses.
Ready to master data shapes? Let's move on to [Chapter 3: Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 3: Data Validation & Serialization (Pydantic)
Welcome back! In [Chapter 2: Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md), we learned how FastAPI uses type hints to understand path parameters (like `/items/{item_id}`) and query parameters (like `/?skip=0&limit=10`). We even saw a sneak peek of how Pydantic models can define the structure of a JSON request body.
Now, let's dive deep into that magic! How does FastAPI *really* handle complex data coming into your API and the data you send back?
**Our Goal Today:** Understand how FastAPI uses the powerful **Pydantic** library to automatically validate incoming data (making sure it's correct) and serialize outgoing data (converting it to JSON).
## What Problem Does This Solve?
Imagine you're building the API for an online store, specifically the part where a user can add a new product. They need to send you information like the product's name, price, and maybe an optional description. This information usually comes as JSON in the request body.
You need to make sure:
1. **The data arrived:** Did the user actually send the product details?
2. **It has the right shape:** Does the JSON contain a `name` and a `price`? Is the `description` there, or is it okay if it's missing?
3. **It has the right types:** Is the `name` a string? Is the `price` a number (like a float or decimal)?
4. **It meets certain rules (optional):** Maybe the price must be positive? Maybe the name can't be empty?
Doing these checks manually for every API endpoint would be tedious and error-prone.
Similarly, when you send data *back* (like the details of the newly created product), you need to convert your internal Python objects (like dictionaries or custom class instances) into standard JSON that the user's browser or application can understand. You might also want to control *which* information gets sent back (e.g., maybe hide internal cost fields).
**FastAPI solves both problems using Pydantic:**
* **Validation (Gatekeeper):** Pydantic models act like strict blueprints or forms. You define the expected structure and types of incoming data using a Pydantic model. FastAPI uses this model to automatically parse the incoming JSON, check if it matches the blueprint (validate it), and provide you with a clean Python object. If the data doesn't match, FastAPI automatically sends back a clear error message saying exactly what's wrong. Think of it as a meticulous gatekeeper checking IDs and forms at the entrance.
* **Serialization (Translator):** When you return data from your API function, FastAPI can use a Pydantic model (specified as a `response_model`) or its built-in `jsonable_encoder` to convert your Python objects (Pydantic models, database objects, dictionaries, etc.) into JSON format. Think of it as a helpful translator converting your application's internal language into the common language of JSON for the outside world.
## Your First Pydantic Model
Pydantic models are simply Python classes that inherit from `pydantic.BaseModel`. You define the "fields" of your data as class attributes with type hints.
Let's define a model for our product item:
1. **Create a file (optional but good practice):** You could put this in a file like `models.py`.
2. **Write the model:**
```python
# models.py (or within your main.py/routers/items.py)
from pydantic import BaseModel
class Item(BaseModel):
name: str
description: str | None = None # Optional field with a default of None
price: float
tax: float | None = None # Optional field with a default of None
```
**Explanation:**
* `from pydantic import BaseModel`: We import the necessary `BaseModel` from Pydantic.
* `class Item(BaseModel):`: We define our model class `Item`, inheriting from `BaseModel`.
* `name: str`: We declare a field named `name`. The type hint `: str` tells Pydantic that this field is **required** and must be a string.
* `description: str | None = None`:
* `str | None`: This type hint (using the pipe `|` operator for Union) means `description` can be either a string OR `None`.
* `= None`: This sets the **default value** to `None`. Because it has a default value, this field is **optional**. If the incoming data doesn't include `description`, Pydantic will automatically set it to `None`.
* `price: float`: A required field that must be a floating-point number.
* `tax: float | None = None`: An optional field that can be a float or `None`, defaulting to `None`.
This simple class definition now acts as our data blueprint!
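You can even try the model on its own, without FastAPI. A minimal sketch (reusing the `Item` model defined just above, with Pydantic v2 methods):

```python
# A minimal sketch: validating data directly with the Item model.
from pydantic import ValidationError

good = Item.model_validate({"name": "Gadget", "price": "19.99"})
print(good)        # name='Gadget' description=None price=19.99 tax=None
print(good.price)  # the string "19.99" was coerced to the float 19.99

try:
    Item.model_validate({"name": "Gadget"})  # missing the required 'price' field
except ValidationError as err:
    print(err.errors()[0]["loc"], err.errors()[0]["type"])  # ('price',) missing
```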
## Using Pydantic for Request Body Validation
Now, let's use this `Item` model in a `POST` request to create a new item. We saw this briefly in Chapter 2.
```python
# main.py (or routers/items.py)
from fastapi import FastAPI
# Assume 'Item' model is defined above or imported: from models import Item
app = FastAPI() # Or use your APIRouter
@app.post("/items/")
# Declare 'item' parameter with type hint 'Item'
async def create_item(item: Item):
# If the code reaches here, FastAPI + Pydantic already did:
# 1. Read the request body (as JSON bytes).
# 2. Parsed the JSON into a Python dict.
# 3. Validated the dict against the 'Item' model.
# - Checked required fields ('name', 'price').
# - Checked types (name is str, price is float, etc.).
# - Assigned default values for optional fields if missing.
# 4. Created an 'Item' instance from the valid data.
# 'item' is now a Pydantic 'Item' object with validated data!
print(f"Received item name: {item.name}")
print(f"Received item price: {item.price}")
if item.description:
print(f"Received item description: {item.description}")
if item.tax:
print(f"Received item tax: {item.tax}")
# You can easily convert the Pydantic model back to a dict if needed
item_dict = item.model_dump() # Pydantic v2 method
# ... here you would typically save the item to a database ...
# Return the created item's data
return item_dict
```
**Explanation:**
* `async def create_item(item: Item)`: By declaring the function parameter `item` with the type hint `Item` (our Pydantic model), FastAPI automatically knows it should:
* Expect JSON in the request body.
* Validate that JSON against the `Item` model.
* **Automatic Validation:** If the client sends JSON like `{"name": "Thingamajig", "price": 49.99}`, FastAPI/Pydantic validates it, creates an `Item` object (`item`), and passes it to your function. Inside your function, `item.name` will be `"Thingamajig"`, `item.price` will be `49.99`, and `item.description` and `item.tax` will be `None` (their defaults).
* **Automatic Errors:** If the client sends invalid JSON, like `{"name": "Gadget"}` (missing `price`) or `{"name": "Gizmo", "price": "expensive"}` (`price` is not a float), FastAPI will **not** call your `create_item` function. Instead, it will automatically send back a `422 Unprocessable Entity` HTTP error response with a detailed JSON body explaining the validation errors.
**Example 422 Error Response (if `price` was missing):**
```json
{
"detail": [
{
"type": "missing",
"loc": [
"body",
"price"
],
"msg": "Field required",
"input": { // The invalid data received
"name": "Gadget"
},
"url": "..." // Pydantic v2 URL to error details
}
]
}
```
This automatic validation saves you a *ton* of boilerplate code and provides clear feedback to API consumers.
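If you want to see this behaviour without opening a browser, here's a minimal sketch using FastAPI's `TestClient` (assuming the `app` and `create_item` endpoint defined above):

```python
# A minimal sketch: exercising the endpoint with TestClient to observe the automatic 422.
from fastapi.testclient import TestClient

client = TestClient(app)

ok = client.post("/items/", json={"name": "Gadget", "price": 9.5})
print(ok.status_code)  # 200

bad = client.post("/items/", json={"name": "Gadget"})  # 'price' is missing
print(bad.status_code)                  # 422
print(bad.json()["detail"][0]["loc"])   # ['body', 'price']
```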
## Using Pydantic for Response Serialization (`response_model`)
We just saw how Pydantic validates *incoming* data. It's also incredibly useful for shaping *outgoing* data.
Let's say when we create an item, we want to return the item's data, but maybe we have some internal fields in our Pydantic model that we *don't* want to expose in the API response. Or, we just want to be absolutely sure the response *always* conforms to the `Item` structure.
We can use the `response_model` parameter in the path operation decorator:
```python
# main.py (or routers/items.py, modified version)
from fastapi import FastAPI
from pydantic import BaseModel # Assuming Item is defined here or imported
# Let's add an internal field to our model for demonstration
class Item(BaseModel):
name: str
description: str | None = None
price: float
tax: float | None = None
internal_cost: float = 0.0 # Field we DON'T want in the response
app = FastAPI() # Or use your APIRouter
# Add response_model=Item to the decorator
@app.post("/items/", response_model=Item)
async def create_item(item: Item):
# item is the validated input Item object
print(f"Processing item: {item.name} with internal cost {item.internal_cost}")
# ... save item to database ...
# Let's imagine we return the same item object we received
# (in reality, you might return an object fetched from the DB)
return item # FastAPI will handle serialization based on response_model
```
**Explanation:**
* `@app.post("/items/", response_model=Item)`: By adding `response_model=Item`, we tell FastAPI:
1. **Filter:** Whatever data is returned by the `create_item` function, filter it so that only the fields defined in the `Item` model (`name`, `description`, `price`, `tax`, `internal_cost`) are included in the final JSON response. **Wait!** Actually, Pydantic V2 by default includes all fields from the returned object *that are also in the response model*. In this case, since we return `item` which *is* an `Item` instance, all fields (`name`, `description`, `price`, `tax`, `internal_cost`) would be included *if* the returned object *was* an `Item` instance. *Correction:* Let's refine the example to show filtering. Let's define a *different* response model.
```python
# models.py
from pydantic import BaseModel
# Input model (can include internal fields)
class ItemCreate(BaseModel):
name: str
description: str | None = None
price: float
tax: float | None = None
internal_cost: float # Required input, but we won't return it
# Output model (defines what the client sees)
class ItemPublic(BaseModel):
name: str
description: str | None = None
price: float
tax: float | None = None
# Note: internal_cost is NOT defined here
# ---- In main.py or routers/items.py ----
from fastapi import FastAPI
from models import ItemCreate, ItemPublic # Import both models
app = FastAPI()
items_db = [] # Simple in-memory "database"
@app.post("/items/", response_model=ItemPublic) # Use ItemPublic for response
async def create_item(item_input: ItemCreate): # Use ItemCreate for input
print(f"Received internal cost: {item_input.internal_cost}")
# Convert input model to a dict (or create DB model instance)
item_data = item_input.model_dump()
# Simulate saving to DB and getting back the saved data
# In a real app, the DB might assign an ID, etc.
saved_item_data = item_data.copy()
saved_item_data["id"] = len(items_db) + 1 # Add a simulated ID
items_db.append(saved_item_data)
# Return the *dictionary* of saved data. FastAPI will use response_model
# ItemPublic to filter and serialize this dictionary.
return saved_item_data
```
**Explanation (Revised):**
* `ItemCreate`: Defines the structure we expect for *creating* an item, including `internal_cost`.
* `ItemPublic`: Defines the structure we want to *return* to the client, notably *excluding* `internal_cost`.
* `create_item(item_input: ItemCreate)`: We accept the full `ItemCreate` model as input.
* `@app.post("/items/", response_model=ItemPublic)`: We declare that the response should conform to the `ItemPublic` model.
* `return saved_item_data`: We return a Python dictionary containing all fields (including `internal_cost` and the simulated `id`).
* **Automatic Filtering & Serialization:** FastAPI takes the returned dictionary (`saved_item_data`). Because `response_model=ItemPublic` is set, it does the following *before* sending the response:
1. It looks at the fields defined in `ItemPublic` (`name`, `description`, `price`, `tax`).
2. It takes only those fields from the `saved_item_data` dictionary. The `internal_cost` and `id` fields are automatically dropped because they are not in `ItemPublic`.
3. It ensures the values for the included fields match the types expected by `ItemPublic` (this also provides some output validation).
4. It converts the resulting filtered data into a JSON string using `jsonable_encoder` internally.
**Example Interaction:**
1. **Client sends `POST /items/` with body:**
```json
{
"name": "Super Gadget",
"price": 120.50,
"internal_cost": 55.25,
"description": "The best gadget ever!"
}
```
2. **FastAPI:** Validates this against `ItemCreate` (Success).
3. **`create_item` function:** Runs, prints `internal_cost`, prepares `saved_item_data` dictionary.
4. **FastAPI (Response processing):** Takes the returned dictionary, filters it using `ItemPublic`.
5. **Client receives `200 OK` with body:**
```json
{
"name": "Super Gadget",
"description": "The best gadget ever!",
"price": 120.50,
"tax": null
}
```
Notice `internal_cost` and `id` are gone!
The `response_model` gives you precise control over your API's output contract, enhancing security and clarity.
## How it Works Under the Hood (Simplified)
Let's trace the journey of a `POST /items/` request using our `ItemCreate` input model and `ItemPublic` response model.
1. **Request In:** Client sends `POST /items/` with JSON body to the Uvicorn server.
2. **FastAPI Routing:** Uvicorn passes the request to FastAPI. FastAPI matches the path and method to our `create_item` function.
3. **Parameter Analysis:** FastAPI inspects `create_item(item_input: ItemCreate)`. It sees `item_input` is type-hinted with a Pydantic model (`ItemCreate`), so it knows to look for the data in the request body.
4. **Body Reading & Parsing:** FastAPI reads the raw bytes from the request body and attempts to parse them as JSON into a Python dictionary. If JSON parsing fails, an error is returned.
5. **Pydantic Validation:** FastAPI passes the parsed dictionary to Pydantic, essentially calling `ItemCreate.model_validate(parsed_dict)`.
* **Success:** Pydantic checks types, required fields, etc. If valid, it returns a populated `ItemCreate` instance.
* **Failure:** Pydantic raises a `ValidationError`. FastAPI catches this.
6. **Error Handling (if validation failed):** FastAPI converts the Pydantic `ValidationError` into a user-friendly JSON response (status code 422) and sends it back immediately. The `create_item` function is *never called*.
7. **Function Execution (if validation succeeded):** FastAPI calls `create_item(item_input=<ItemCreate instance>)`. Your function logic runs.
8. **Return Value:** Your function returns a value (e.g., the `saved_item_data` dictionary).
9. **Response Model Processing:** FastAPI sees `response_model=ItemPublic` in the decorator.
10. **Filtering/Validation:** FastAPI uses the `ItemPublic` model to filter the returned dictionary (`saved_item_data`), keeping only fields defined in `ItemPublic`. It may also perform type coercion/validation based on `ItemPublic`.
11. **Serialization (`jsonable_encoder`):** FastAPI passes the filtered data to `jsonable_encoder`. This function recursively walks through the data, converting Pydantic models, `datetime` objects, `UUID`s, Decimals, etc., into basic JSON-compatible types (strings, numbers, booleans, lists, dicts, null).
12. **Response Out:** FastAPI creates the final HTTP response with the correct status code, headers (`Content-Type: application/json`), and the JSON string body. Uvicorn sends this back to the client.
Here's a diagram summarizing the flow:
```mermaid
sequenceDiagram
participant Client
participant ASGI Server (Uvicorn)
participant FastAPI App
participant Pydantic Validator
participant Route Handler (create_item)
participant Pydantic Serializer (via response_model)
participant JsonableEncoder
Client->>ASGI Server (Uvicorn): POST /items/ (with JSON body)
ASGI Server (Uvicorn)->>FastAPI App: Pass Request
FastAPI App->>FastAPI App: Find route, see param `item_input: ItemCreate`
FastAPI App->>FastAPI App: Read & Parse JSON body
FastAPI App->>Pydantic Validator: Validate data with ItemCreate model
alt Validation Fails
Pydantic Validator-->>FastAPI App: Raise ValidationError
FastAPI App->>FastAPI App: Format 422 Error Response
FastAPI App-->>ASGI Server (Uvicorn): Send 422 Response
ASGI Server (Uvicorn)-->>Client: HTTP 422 Response
else Validation Succeeds
Pydantic Validator-->>FastAPI App: Return ItemCreate instance
FastAPI App->>Route Handler (create_item): Call create_item(item_input=...)
Route Handler (create_item)-->>FastAPI App: Return result (e.g., dict)
FastAPI App->>FastAPI App: Check response_model=ItemPublic
FastAPI App->>Pydantic Serializer (via response_model): Filter/Validate result using ItemPublic
Pydantic Serializer (via response_model)-->>FastAPI App: Return filtered data
FastAPI App->>JsonableEncoder: Convert filtered data to JSON types
JsonableEncoder-->>FastAPI App: Return JSON-compatible data
FastAPI App->>FastAPI App: Create 200 OK JSON Response
FastAPI App-->>ASGI Server (Uvicorn): Send 200 Response
ASGI Server (Uvicorn)-->>Client: HTTP 200 OK Response
end
```
## Internal Code Connections
While FastAPI hides the complexity, here's roughly where things happen:
* **Model Definition:** You use `pydantic.BaseModel`.
* **Parameter Analysis:** FastAPI's `fastapi.dependencies.utils.analyze_param` identifies parameters type-hinted with Pydantic models as potential body parameters.
* **Request Body Handling:** `fastapi.dependencies.utils.request_body_to_args` coordinates reading, parsing, and validation (using Pydantic's validation methods internally, like `model_validate` in v2).
* **Validation Errors:** Pydantic raises `pydantic.ValidationError`, which FastAPI catches and handles using default exception handlers (see `fastapi.exception_handlers`) to create the 422 response.
* **Response Serialization:** The `fastapi.routing.APIRoute` class handles the `response_model`. If present, it uses it to process the return value before passing it to `fastapi.encoders.jsonable_encoder`.
* **JSON Conversion:** `fastapi.encoders.jsonable_encoder` is the workhorse that converts various Python types into JSON-compatible formats. It knows how to handle Pydantic models (calling their `.model_dump(mode='json')` method in v2), datetimes, UUIDs, etc.
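You can also call `jsonable_encoder` yourself to see what it does. A minimal sketch (reusing the `ItemPublic` model from earlier):

```python
# A minimal sketch: converting Python objects into JSON-compatible types.
from datetime import datetime
from uuid import uuid4

from fastapi.encoders import jsonable_encoder

data = {
    "id": uuid4(),                                    # UUID -> str
    "created_at": datetime(2024, 1, 1),               # datetime -> ISO 8601 string
    "item": ItemPublic(name="Gadget", price=19.99),   # Pydantic model -> dict
}
print(jsonable_encoder(data))
# {'id': '…', 'created_at': '2024-01-01T00:00:00',
#  'item': {'name': 'Gadget', 'description': None, 'price': 19.99, 'tax': None}}
```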
## Conclusion
You've unlocked one of FastAPI's superpowers: seamless data validation and serialization powered by Pydantic!
* You learned to define data shapes using **Pydantic models** (`BaseModel`).
* You saw how FastAPI **automatically validates** incoming request bodies against these models using simple type hints in your function parameters (`item: Item`).
* You learned how to use the `response_model` parameter in path operation decorators to **filter and serialize** outgoing data, ensuring your API responses have a consistent and predictable structure.
* You understood the basic flow: FastAPI acts as the orchestrator, using Pydantic as the expert validator and `jsonable_encoder` as the expert translator.
This automatic handling drastically reduces boilerplate code, prevents common errors, and makes your API development faster and more robust.
But there's another huge benefit to defining your data with Pydantic models: FastAPI uses them to generate interactive API documentation automatically! Let's see how that works in the next chapter.
Ready to see your API document itself? Let's move on to [Chapter 4: OpenAPI & Automatic Docs](04_openapi___automatic_docs.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 4: OpenAPI & Automatic Docs
Welcome back! In [Chapter 3: Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md), we saw how FastAPI uses Pydantic models to automatically validate incoming data and serialize outgoing data, making our API robust and predictable. But how do we tell others (or remind ourselves later) how to actually *use* our API? What endpoints exist? What data should they send? What will they get back?
**Our Goal Today:** Discover how FastAPI automatically generates API documentation that is interactive and always stays synchronized with your code, using the OpenAPI standard.
## What Problem Does This Solve?
Imagine you've built an amazing, complex machine, maybe a fantastic coffee maker. You know exactly how it works, which buttons to press, and where to put the beans and water. But if someone else wants to use it, or even if you forget some details after a few months, you need a **user manual**.
An API is similar. It's a way for different software components (like a web frontend and a backend server, or two different backend services) to communicate. Without a clear "manual", it's hard for developers to know:
* What specific URLs (paths) are available? (`/items/`, `/users/{user_id}`)
* What HTTP methods can be used for each path? (`GET`, `POST`, `DELETE`)
* What data needs to be sent in the URL path or query string? (`item_id`, `?q=search`)
* What data needs to be sent in the request body (often as JSON)? (`{"name": "...", "price": ...}`)
* What does the data returned by the API look like?
* How does security work?
Manually writing and updating this documentation is a chore. It's easy to make mistakes, and even easier for the documentation to become outdated as the code changes. This leads to confusion, errors, and wasted time.
FastAPI solves this beautifully by automatically generating this "manual" based directly on your Python code. It uses an industry standard called **OpenAPI**.
## Key Concepts
### 1. OpenAPI Specification
* **What it is:** OpenAPI (formerly known as Swagger Specification) is a standard, language-agnostic way to describe RESTful APIs. Think of it as a universal blueprint for APIs.
* **Format:** It's usually written in JSON or YAML format. This format is machine-readable, meaning tools can automatically process it.
* **Content:** An OpenAPI document details everything about your API: available paths, allowed operations (GET, POST, etc.) on those paths, expected parameters (path, query, header, cookie, body), data formats (using JSON Schema, which Pydantic models map to), security requirements, and more.
FastAPI automatically generates this OpenAPI schema for your entire application.
### 2. Automatic Generation: From Code to Docs
How does FastAPI create this OpenAPI schema? It intelligently inspects your code:
* **Paths and Methods:** It looks at your path operation decorators like `@app.get("/items/")`, `@app.post("/items/")`, `@app.get("/users/{user_id}")`.
* **Parameters:** It examines your function parameters, their type hints (`item_id: int`, `q: str | None = None`), and any extra information provided using `Path()`, `Query()` as seen in [Chapter 2: Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md).
* **Request Bodies:** It uses the Pydantic models you declare as type hints for request body parameters (`item: Item`) from [Chapter 3: Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md).
* **Responses:** It uses the `response_model` you define in decorators and the status codes to describe possible responses.
* **Metadata:** It reads docstrings from your functions and metadata like `title`, `description`, `tags`, `summary`, `deprecated` that you add to your path operations or parameters.
Because the documentation is generated *from* the code, it stays **synchronized**. If you change a parameter type or add a new endpoint, the documentation updates automatically the next time you run the app!
### 3. Interactive API Documentation UIs
Having the OpenAPI schema (the blueprint) is great, but it's just a JSON file. FastAPI goes a step further and provides two beautiful, interactive web interfaces *out-of-the-box* that use this schema:
* **Swagger UI (at `/docs`):** This interface provides a rich, interactive environment where you can:
* Browse all your API endpoints, grouped by tags.
* See details for each endpoint: description, parameters, request body structure, possible responses.
* **Try it out!** You can directly make API calls from your browser, fill in parameters, and see the actual responses. This is incredibly useful for testing and debugging.
* **ReDoc (at `/redoc`):** This provides an alternative documentation view, often considered cleaner for pure documentation reading, presenting a three-panel layout with navigation, documentation, and code samples. It's less focused on interactive "try it out" functionality compared to Swagger UI but excellent for understanding the API structure.
## Using the Automatic Docs
The best part? You barely have to do anything to get basic documentation! Let's use a simple example building on previous chapters.
```python
# main.py
from fastapi import FastAPI, Path, Query
from pydantic import BaseModel
from typing import Annotated
# Define a Pydantic model (like in Chapter 3)
class Item(BaseModel):
name: str
description: str | None = None
price: float
tax: float | None = None
app = FastAPI(
title="My Super API",
description="This is a very fancy API built with FastAPI",
version="1.0.0",
)
# Simple in-memory storage
fake_items_db = {}
@app.post("/items/", response_model=Item, tags=["Items"])
async def create_item(item: Item):
"""
Create a new item and store it.
- **name**: Each item must have a name.
- **description**: A long description.
- **price**: Price must be positive.
"""
item_id = len(fake_items_db) + 1
fake_items_db[item_id] = item
return item # Return the created item
@app.get("/items/{item_id}", response_model=Item, tags=["Items"])
async def read_item(
item_id: Annotated[int, Path(
title="The ID of the item to get",
description="The ID of the item you want to retrieve.",
gt=0
)]
):
"""
Retrieve a single item by its ID.
"""
if item_id not in fake_items_db:
# We'll cover proper error handling in Chapter 6
from fastapi import HTTPException
raise HTTPException(status_code=404, detail="Item not found")
return fake_items_db[item_id]
@app.get("/items/", tags=["Items"])
async def read_items(
skip: Annotated[int, Query(description="Number of items to skip")] = 0,
limit: Annotated[int, Query(description="Maximum number of items to return")] = 10
):
"""
Retrieve a list of items with pagination.
"""
items = list(fake_items_db.values())
return items[skip : skip + limit]
```
**Running the App:**
Save this as `main.py` and run it with Uvicorn:
```bash
uvicorn main:app --reload
```
Now, open your web browser and go to these URLs:
1. **`http://127.0.0.1:8000/docs`**
You'll see the **Swagger UI**:
* The API title ("My Super API"), version, and description you provided when creating `FastAPI()` are shown at the top.
* Endpoints are grouped under the "Items" tag (because we added `tags=["Items"]`).
* Expand an endpoint (e.g., `POST /items/`). You'll see:
* The description from the function's docstring (`Create a new item...`).
* A "Parameters" section (empty for this POST, but would show path/query params if present).
* A "Request body" section showing the required JSON structure based on the `Item` Pydantic model, including descriptions if you add them to the model fields.
* A "Responses" section showing the expected `200 OK` response (based on `response_model=Item`) and the automatic `422 Validation Error` response.
* A "Try it out" button! Click it, edit the example JSON body, and click "Execute" to send a real request to your running API.
2. **`http://127.0.0.1:8000/redoc`**
You'll see the **ReDoc** interface:
* A cleaner, more static documentation layout.
* It displays the same information derived from your code and the OpenAPI schema (paths, parameters, schemas, descriptions) but in a different presentation format.
3. **`http://127.0.0.1:8000/openapi.json`**
You'll see the raw **OpenAPI schema** in JSON format. This is the machine-readable definition that powers both `/docs` and `/redoc`. Tools can use this URL to automatically generate client code, run tests, and more.
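The same schema is also available from Python via `app.openapi()`, which is exactly what the `/openapi.json` endpoint serves. A minimal sketch (using the example app above):

```python
# A minimal sketch: inspecting the generated OpenAPI schema programmatically.
schema = app.openapi()

print(schema["openapi"])               # e.g. "3.1.0"
print(schema["info"]["title"])         # "My Super API"
print(sorted(schema["paths"].keys()))  # ['/items/', '/items/{item_id}']
```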
**Enhancing the Docs:**
Notice how FastAPI used:
* `title`, `description`, `version` in `app = FastAPI(...)` for the overall API info.
* `tags=["Items"]` to group related operations.
* Docstrings (`"""Create a new item..."""`) for operation descriptions.
* Pydantic models (`Item`) for request body and response schemas.
* Type hints and `Path`/`Query` for parameter definitions, including their `title` and `description`.
You can make your documentation even richer by adding more details like examples, summaries, and descriptions to your Pydantic models and parameters.
```python
# Example: Adding more detail to the Pydantic model
from pydantic import BaseModel, Field
# ... other imports ...
class Item(BaseModel):
name: str = Field(..., # ... means required
title="Item Name",
description="The name of the item.",
example="Super Gadget")
description: str | None = Field(default=None,
title="Item Description",
max_length=300,
example="A very useful gadget.")
price: float = Field(...,
gt=0, # Price must be greater than 0
title="Price",
description="The price of the item in USD.",
example=19.99)
tax: float | None = Field(default=None,
ge=0, # Tax >= 0 if provided
title="Tax",
description="Optional sales tax.",
example=1.60)
# ... rest of your FastAPI app ...
```
With these `Field` annotations, your documentation (especially in the "Schemas" section at the bottom of `/docs`) will become even more descriptive and helpful.
## How it Works Under the Hood (Simplified)
How does FastAPI pull off this magic?
1. **App Initialization:** When your `FastAPI()` application starts up, it doesn't just prepare to handle requests; it also sets up the documentation system.
2. **Route Inspection:** FastAPI iterates through all the path operations you've defined (like `@app.post("/items/")`, `@app.get("/items/{item_id}")`). It uses Python's `inspect` module and its own logic to analyze each route.
3. **Metadata Extraction:** For each route, it gathers all relevant information:
* The URL path (`/items/`, `/items/{item_id}`)
* The HTTP method (`POST`, `GET`)
* Function parameters (name, type hint, default value, `Path`/`Query`/`Body` info)
* Pydantic models used for request bodies and `response_model`.
* Status codes.
* Docstrings, tags, summary, description, operation ID, deprecation status.
4. **OpenAPI Model Building:** FastAPI uses this extracted information to populate a set of Pydantic models that represent the structure of an OpenAPI document (these models live in `fastapi.openapi.models`, like `OpenAPI`, `Info`, `PathItem`, `Operation`, `Schema`, etc.). The core function doing this heavy lifting is `fastapi.openapi.utils.get_openapi`.
5. **Schema Generation:** Pydantic models used in request/response bodies or parameters are converted into JSON Schema definitions, which are embedded within the OpenAPI structure under `components.schemas`. This describes the expected data shapes.
6. **Docs Endpoint Creation:** FastAPI automatically adds three special routes to your application:
* `/openapi.json`: This endpoint is configured to call `get_openapi` when requested, generate the complete OpenAPI schema as a Python dictionary, and return it as a JSON response.
* `/docs`: This endpoint uses the `fastapi.openapi.docs.get_swagger_ui_html` function. This function generates an HTML page that includes the necessary JavaScript and CSS for Swagger UI (usually loaded from a CDN). Crucially, this HTML tells the Swagger UI JavaScript to fetch the API definition from `/openapi.json`.
* `/redoc`: Similarly, this endpoint uses `fastapi.openapi.docs.get_redoc_html` to generate an HTML page that loads ReDoc and tells it to fetch the API definition from `/openapi.json`.
7. **Serving Docs:** When you visit `/docs` or `/redoc` in your browser:
* The browser first receives the basic HTML page from FastAPI.
* The JavaScript (Swagger UI or ReDoc) within that page then makes a *separate* request back to your FastAPI application, asking for `/openapi.json`.
* FastAPI responds with the generated OpenAPI JSON schema.
* The JavaScript in your browser parses this schema and dynamically renders the interactive documentation interface you see.
Here's a simplified view of the process when you access `/docs`:
```mermaid
sequenceDiagram
participant Browser
participant FastAPIApp as FastAPI App (Python Backend)
participant RouteInspector as Route Inspector (Internal)
participant OpenAPIGenerator as OpenAPI Generator (Internal - get_openapi)
participant SwaggerUIHandler as /docs Handler (Internal)
participant OpenAPISchemaHandler as /openapi.json Handler (Internal)
Note over FastAPIApp: App Starts & Inspects Routes
FastAPIApp->>RouteInspector: Analyze @app.post("/items/"), @app.get("/items/{id}") etc.
RouteInspector-->>FastAPIApp: Extracted Route Metadata
Note over Browser: User navigates to /docs
Browser->>+FastAPIApp: GET /docs
FastAPIApp->>SwaggerUIHandler: Process request for /docs
SwaggerUIHandler-->>FastAPIApp: Generate HTML page loading Swagger UI JS/CSS (points JS to /openapi.json)
FastAPIApp-->>-Browser: Send Swagger UI HTML page
Note over Browser: Browser renders HTML, Swagger UI JS executes
Browser->>+FastAPIApp: GET /openapi.json (requested by Swagger UI JS)
FastAPIApp->>OpenAPISchemaHandler: Process request for /openapi.json
OpenAPISchemaHandler->>OpenAPIGenerator: Use stored route metadata to build OpenAPI schema dict
OpenAPIGenerator-->>OpenAPISchemaHandler: Return OpenAPI Schema (dict)
OpenAPISchemaHandler-->>FastAPIApp: Convert schema dict to JSON
FastAPIApp-->>-Browser: Send JSON Response (The OpenAPI Schema)
Note over Browser: Swagger UI JS receives schema and renders interactive docs
Browser->>Browser: Display Interactive API Documentation
```
This integration means your documentation isn't just an afterthought; it's a first-class citizen derived directly from the code that runs your API.
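If you ever need to tweak the generated schema, you can hook into the same machinery FastAPI uses internally. A minimal sketch (the title and version values are just examples) that overrides `app.openapi` with `get_openapi`:

```python
# A minimal sketch: customizing the OpenAPI schema with FastAPI's own generator.
from fastapi.openapi.utils import get_openapi

def custom_openapi():
    if app.openapi_schema:               # cache: build the schema only once
        return app.openapi_schema
    schema = get_openapi(
        title="My Super API (custom)",
        version="1.0.0",
        description="Schema generated with a custom title.",
        routes=app.routes,
    )
    app.openapi_schema = schema
    return app.openapi_schema

app.openapi = custom_openapi  # /openapi.json, /docs and /redoc now use this schema
```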
## Conclusion
You've now seen how FastAPI leverages the OpenAPI standard and your own Python code (type hints, Pydantic models, docstrings) to provide automatic, interactive API documentation.
* You learned about the **OpenAPI specification** as a standard way to describe APIs.
* You saw that FastAPI **automatically generates** this specification by inspecting your path operations, parameters, and models.
* You explored the **interactive documentation UIs** provided by Swagger UI (`/docs`) and ReDoc (`/redoc`), which make understanding and testing your API much easier.
* You understood that because the docs are generated from code, they **stay up-to-date** automatically.
This feature significantly improves the developer experience for both the creators and consumers of your API.
In the next chapter, we'll explore a powerful FastAPI feature called Dependency Injection. It helps manage complex dependencies (like database connections or authentication logic) that your path operations might need, and it also integrates neatly with the OpenAPI documentation system.
Ready to manage dependencies like a pro? Let's move on to [Chapter 5: Dependency Injection](05_dependency_injection.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 5: Dependency Injection
Welcome back! In [Chapter 4: OpenAPI & Automatic Docs](04_openapi___automatic_docs.md), we saw how FastAPI automatically generates interactive documentation for our API, making it easy for others (and ourselves!) to understand and use. This works because FastAPI understands the structure of our paths, parameters, and Pydantic models.
Now, let's explore another powerful feature that helps us write cleaner, more reusable, and better-organized code: **Dependency Injection**.
## What Problem Does This Solve?
Imagine you're building several API endpoints, and many of them need the same piece of information or the same setup step performed before they can do their main job. For example:
* **Database Connection:** Many endpoints might need to talk to a database. You need to get a database "session" or connection first.
* **User Authentication:** Many endpoints might require the user to be logged in. You need to check their credentials (like a token in a header) and fetch their user details.
* **Common Parameters:** Maybe several endpoints share common query parameters like `skip` and `limit` for pagination.
You *could* write the code to get the database session, check the user, or parse the pagination parameters inside *each* path operation function. But that would be very repetitive (violating the DRY - Don't Repeat Yourself - principle) and hard to maintain. If you need to change how you get a database session, you'd have to update it in many places!
FastAPI's **Dependency Injection (DI)** system provides an elegant solution to this. It allows you to define these common pieces of logic (like getting a user or a DB session) as separate, reusable functions called "dependencies". Then, you simply "declare" that your path operation function needs the result of that dependency, and FastAPI automatically takes care of running the dependency and providing ("injecting") the result into your function.
**Our Goal Today:** Learn how to use FastAPI's `Depends` function to manage dependencies, reuse code, and make our API logic cleaner and more modular.
**Analogy:** Think of your path operation function as the main chef preparing a dish (handling the request). Before the chef can cook, they might need specific ingredients prepared or tools set up. Dependency Injection is like having specialized assistants (dependencies):
* One assistant fetches fresh vegetables (e.g., gets common query parameters).
* Another assistant prepares the cooking station (e.g., gets a database session).
* Another assistant checks the order ticket to see who the dish is for (e.g., authenticates the user).
The chef simply tells the head waiter (`Depends`) what they need ("I need prepared vegetables", "I need the cooking station ready"), and the assistants automatically provide them just in time. The chef doesn't need to know the details of *how* the vegetables were fetched or the station prepared; they just get the result.
## Key Concepts
1. **Dependency:** A function (or other callable) that provides some value needed by your path operation function (or even by another dependency). Examples: a function to get the current user, a function to connect to the database, a function to parse common query parameters.
2. **`Depends`:** A special function imported from `fastapi` (`from fastapi import Depends`) that you use in the parameters of your path operation function to signal that it requires a dependency. You use it like this: `parameter_name: Annotated[ReturnType, Depends(dependency_function)]`.
3. **Injection:** FastAPI "injects" the *result* returned by the dependency function into the parameter of your path operation function. If `dependency_function()` returns the value `10`, then `parameter_name` will be `10` inside your path function.
4. **Automatic Execution:** FastAPI automatically figures out which dependencies are needed for a given request, calls them in the correct order (if dependencies depend on others), and manages their results.
5. **Reusability:** Define a dependency once, and use `Depends(your_dependency)` in multiple path operations.
6. **Caching (Per Request):** By default, if a dependency is declared multiple times for the *same request* (e.g., if multiple path operation parameters need it, or if other dependencies need it), FastAPI will only run the dependency function *once* per request and reuse the result. This is efficient, especially for things like database connections or fetching user data. You can disable this cache if needed.
7. **Hierarchy:** Dependencies can depend on other dependencies using `Depends` in their own parameters, forming a chain or tree of dependencies. FastAPI resolves this entire structure.
## Using Dependencies: A Simple Example
Let's start with a very common scenario: having shared query parameters for pagination.
1. **Define the Dependency Function:** Create a regular Python function that takes the parameters you want to share.
```python
# common_dependencies.py (or within your router file)
from typing import Annotated
from fastapi import Query
# This is our dependency function
# It takes the common query parameters
async def common_parameters(
q: Annotated[str | None, Query(description="Optional query string")] = None,
skip: Annotated[int, Query(description="Items to skip", ge=0)] = 0,
limit: Annotated[int, Query(description="Max items to return", le=100)] = 100,
):
# It simply returns a dictionary containing these parameters
return {"q": q, "skip": skip, "limit": limit}
```
**Explanation:**
* This looks like a normal function that could handle path operation parameters.
* It takes `q`, `skip`, and `limit` as arguments, using `Query` for validation and documentation just like we learned in [Chapter 2: Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md).
* It returns a dictionary containing the values it received. This dictionary will be the "result" injected into our path functions.
2. **Use `Depends` in Path Operations:** Now, import `Depends` and your dependency function, and use it in your path operation parameters.
```python
# routers/items.py (example)
from typing import Annotated
from fastapi import APIRouter, Depends
# Assume common_parameters is defined in common_dependencies.py
from ..common_dependencies import common_parameters
router = APIRouter()
# Fake data for demonstration
fake_items = [{"item_name": "Foo"}, {"item_name": "Bar"}, {"item_name": "Baz"}]
@router.get("/items/")
# Here's the magic! Declare 'commons' parameter using Depends
async def read_items(
commons: Annotated[dict, Depends(common_parameters)] # Dependency Injection!
):
# Inside this function, 'commons' will be the dictionary returned
# by common_parameters after FastAPI calls it with the query params.
print(f"Received common parameters: {commons}")
# Use the values from the dependency
q = commons["q"]
skip = commons["skip"]
limit = commons["limit"]
response_items = fake_items[skip : skip + limit]
if q:
response_items = [item for item in response_items if q in item["item_name"]]
return response_items
@router.get("/users/")
# We can reuse the SAME dependency here!
async def read_users(
commons: Annotated[dict, Depends(common_parameters)] # Reusing the dependency
):
# 'commons' will again be the dict returned by common_parameters
print(f"Received common parameters for users: {commons}")
# Imagine fetching users using commons['skip'], commons['limit']...
return {"message": "Users endpoint", "params": commons}
```
**Explanation:**
* `from fastapi import Depends`: We import `Depends`.
* `from ..common_dependencies import common_parameters`: We import our dependency function.
* `commons: Annotated[dict, Depends(common_parameters)]`: This is the key part!
* We declare a parameter named `commons`.
* Its type hint is `dict` (because our dependency returns a dictionary). *Note: FastAPI doesn't validate the dependency's return value against this annotation; it's mainly there for your editor and for readers of the code.* For dependencies with richer return values, annotate with the exact return type.
* We wrap the type hint and `Depends(common_parameters)` in `Annotated`. This is the standard way to use `Depends`.
* `Depends(common_parameters)` tells FastAPI: "Before running `read_items`, call the `common_parameters` function. Take the query parameters `q`, `skip`, `limit` from the incoming request, pass them to `common_parameters`, get its return value, and assign it to the `commons` variable."
* **Reusability:** Notice how `read_users` uses the *exact same* dependency declaration `Annotated[dict, Depends(common_parameters)]`. We didn't have to repeat the `q`, `skip`, `limit` definitions.
**How it Behaves:**
1. Run your app (`uvicorn main:app --reload`, assuming `main.py` includes this router).
2. Visit `http://127.0.0.1:8000/items/?skip=1&limit=1`.
* FastAPI sees `Depends(common_parameters)`.
* It extracts `skip=1` and `limit=1` (and `q=None`) from the query string.
* It calls `common_parameters(q=None, skip=1, limit=1)`.
* `common_parameters` returns `{"q": None, "skip": 1, "limit": 1}`.
* FastAPI calls `read_items(commons={"q": None, "skip": 1, "limit": 1})`.
* You see the print statement and get the response `[{"item_name":"Bar"}]`.
3. Visit `http://127.0.0.1:8000/users/?q=test`.
* FastAPI calls `common_parameters(q="test", skip=0, limit=100)`.
* `common_parameters` returns `{"q": "test", "skip": 0, "limit": 100}`.
* FastAPI calls `read_users(commons={"q": "test", "skip": 0, "limit": 100})`.
* You see the print statement and get the JSON response.
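By the way, a dependency doesn't have to be a function. FastAPI also accepts other callables, such as classes, which give you attribute access (`commons.skip`) instead of dictionary lookups. Here is a minimal sketch of the same shared parameters as a class (the `/items-class/` path is just an illustrative name):
```python
from typing import Annotated
from fastapi import Depends, FastAPI

app = FastAPI()

# The same shared parameters, expressed as a class instead of a function
class CommonQueryParams:
    def __init__(self, q: str | None = None, skip: int = 0, limit: int = 100):
        self.q = q
        self.skip = skip
        self.limit = limit

@app.get("/items-class/")
async def read_items_class(
    # Depends() with no argument reuses the annotated class as the dependency
    commons: Annotated[CommonQueryParams, Depends()]
):
    return {"q": commons.q, "skip": commons.skip, "limit": commons.limit}
```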
## Dependencies Can Depend on Other Dependencies
The real power comes when dependencies themselves need other dependencies. Let's sketch a simplified example for getting an item from a fake database.
1. **Define a "DB Session" Dependency:** (This will be fake, just returning a string).
```python
# common_dependencies.py
async def get_db_session():
print("Getting DB Session")
# In reality, this would connect to a DB and yield/return a session object
session = "fake_db_session_123"
# You might use 'yield' here for setup/teardown (see FastAPI docs)
return session
```
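(A quick aside on the `yield` comment above: a dependency can use `yield` instead of `return` to run setup code before the request and teardown code afterwards. A minimal sketch, reusing the same fake session string:)
```python
# Alternative sketch of get_db_session using yield for setup/teardown
async def get_db_session_with_cleanup():
    session = "fake_db_session_123"   # setup: open the connection/session here
    try:
        yield session                 # this value gets injected into dependants
    finally:
        print("Closing DB Session")   # teardown: runs after the response is sent
```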
2. **Define a Dependency that Uses the DB Session:**
```python
# common_dependencies.py
from typing import Annotated
from fastapi import Depends, HTTPException
# get_db_session is defined above in this same file, so no import is needed
async def get_item_from_db(
item_id: int, # Takes a regular path parameter
db: Annotated[str, Depends(get_db_session)] # Depends on get_db_session!
):
print(f"Getting item {item_id} using DB session: {db}")
# Fake database interaction
fake_db = {1: "Item One", 2: "Item Two"}
if item_id not in fake_db:
raise HTTPException(status_code=404, detail="Item not found in DB")
return fake_db[item_id]
```
**Explanation:**
* `get_item_from_db` takes a regular `item_id` (which FastAPI will get from the path).
* It *also* takes `db: Annotated[str, Depends(get_db_session)]`. It declares its *own* dependency on `get_db_session`.
* When FastAPI needs to run `get_item_from_db`, it first sees the `Depends(get_db_session)`. It runs `get_db_session`, gets `"fake_db_session_123"`, and then calls `get_item_from_db(item_id=..., db="fake_db_session_123")`.
3. **Use the High-Level Dependency in a Path Operation:**
```python
# routers/items.py
# ... other imports ...
from ..common_dependencies import get_item_from_db
@router.get("/db_items/{item_id}")
# This endpoint depends on get_item_from_db
async def read_db_item(
item_id: int, # Path parameter for get_item_from_db
item_name: Annotated[str, Depends(get_item_from_db)] # Inject result here!
):
# 'item_name' will be the string returned by get_item_from_db
# after it used the result from get_db_session.
return {"item_id": item_id, "name_from_db": item_name}
```
**Explanation:**
* The `read_db_item` function only needs to declare `Depends(get_item_from_db)`.
* FastAPI automatically handles the whole chain: `read_db_item` -> `get_item_from_db` -> `get_db_session`.
* Notice the `item_id: int` path parameter is declared in *both* `read_db_item` and `get_item_from_db`. FastAPI is smart enough to pass the path parameter value to the dependency that needs it.
**Caching in Action:**
If `get_db_session` was also needed directly by `read_db_item` (e.g., `db_session: Annotated[str, Depends(get_db_session)]`), FastAPI would *still* only call `get_db_session` **once** for the entire request to `/db_items/{item_id}` because of the default caching (`use_cache=True` in `Depends`). The result `"fake_db_session_123"` would be shared.
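If you ever need to opt out of this per-request caching, `Depends` accepts `use_cache=False`. A minimal sketch, reusing the `get_db_session` dependency from above:
```python
from typing import Annotated
from fastapi import Depends

# Declared twice in the same request: the first copy is cached and reused,
# while use_cache=False forces get_db_session to run again.
async def needs_two_sessions(
    cached_db: Annotated[str, Depends(get_db_session)],
    fresh_db: Annotated[str, Depends(get_db_session, use_cache=False)],
):
    return {"cached": cached_db, "fresh": fresh_db}
```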
## How it Works Under the Hood (Simplified)
Let's trace a request to `/db_items/2` using the example above:
1. **Request:** Client sends `GET /db_items/2`.
2. **Routing:** FastAPI matches the request to the `read_db_item` path operation function.
3. **Dependency Analysis:** FastAPI inspects the signature of `read_db_item`:
* `item_id: int` -> Needs value from path. Value is `2`.
* `item_name: Annotated[str, Depends(get_item_from_db)]` -> Needs the result of `get_item_from_db`.
4. **Solving `get_item_from_db`:** FastAPI inspects `get_item_from_db`:
* `item_id: int` -> Needs a value. FastAPI sees `item_id` is also needed by the parent (`read_db_item`) and comes from the path. Value is `2`.
* `db: Annotated[str, Depends(get_db_session)]` -> Needs the result of `get_db_session`.
5. **Solving `get_db_session`:** FastAPI inspects `get_db_session`:
* It has no parameters.
* Checks cache: Has `get_db_session` run for this request? No.
* Calls `get_db_session()`. It prints "Getting DB Session" and returns `"fake_db_session_123"`.
* Stores `get_db_session` -> `"fake_db_session_123"` in the request cache.
6. **Calling `get_item_from_db`:** FastAPI now has the dependencies for `get_item_from_db`:
* `item_id` = `2` (from path)
* `db` = `"fake_db_session_123"` (from `get_db_session` result)
* Calls `get_item_from_db(item_id=2, db="fake_db_session_123")`.
* It prints "Getting item 2 using DB session: fake_db_session_123", looks up `2` in its fake DB, and returns `"Item Two"`.
* Stores `get_item_from_db` -> `"Item Two"` in the request cache.
7. **Calling `read_db_item`:** FastAPI now has the dependencies for `read_db_item`:
* `item_id` = `2` (from path)
* `item_name` = `"Item Two"` (from `get_item_from_db` result)
* Calls `read_db_item(item_id=2, item_name="Item Two")`.
8. **Response:** The function returns `{"item_id": 2, "name_from_db": "Item Two"}`, which FastAPI sends back to the client as JSON.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant Client
participant FastAPIApp as FastAPI App
participant DepSolver as Dependency Solver
participant GetItemFunc as get_item_from_db
participant GetDBFunc as get_db_session
participant PathOpFunc as read_db_item
Client->>+FastAPIApp: GET /db_items/2
FastAPIApp->>+DepSolver: Solve dependencies for read_db_item(item_id, Depends(get_item_from_db))
DepSolver->>DepSolver: Need path param 'item_id' (value=2)
DepSolver->>DepSolver: Need result of get_item_from_db
DepSolver->>+DepSolver: Solve dependencies for get_item_from_db(item_id, Depends(get_db_session))
DepSolver->>DepSolver: Need 'item_id' (value=2, from path)
DepSolver->>DepSolver: Need result of get_db_session
DepSolver->>DepSolver: Check cache for get_db_session: Miss
DepSolver->>+GetDBFunc: Call get_db_session()
GetDBFunc-->>-DepSolver: Return "fake_db_session_123"
DepSolver->>DepSolver: Cache: get_db_session -> "fake_db_session_123"
DepSolver-->>-DepSolver: Dependencies for get_item_from_db ready
DepSolver->>+GetItemFunc: Call get_item_from_db(item_id=2, db="fake_db_session_123")
GetItemFunc-->>-DepSolver: Return "Item Two"
DepSolver->>DepSolver: Cache: get_item_from_db -> "Item Two"
DepSolver-->>-FastAPIApp: Dependencies for read_db_item ready
FastAPIApp->>+PathOpFunc: Call read_db_item(item_id=2, item_name="Item Two")
PathOpFunc-->>-FastAPIApp: Return {"item_id": 2, "name_from_db": "Item Two"}
FastAPIApp-->>-Client: Send JSON Response
```
### Code Connections
* **`fastapi.Depends`** (`fastapi/param_functions.py`): This class is mostly a marker. When FastAPI analyzes function parameters, it looks for instances of `Depends`.
* **`fastapi.dependencies.utils.get_dependant`**: This crucial function takes a callable (like your path operation function or another dependency) and inspects its signature. It identifies which parameters are path/query/body parameters and which are dependencies (marked with `Depends`). It builds a `Dependant` object representing this.
* **`fastapi.dependencies.models.Dependant`**: A data structure (dataclass) that holds information about a callable: its name, the callable itself, its path/query/header/cookie/body parameters, and importantly, a list of *other* `Dependant` objects for its sub-dependencies. This creates the dependency tree/graph.
* **`fastapi.dependencies.utils.solve_dependencies`**: This is the engine that recursively traverses the `Dependant` graph for a given request. It figures out the order, checks the cache (`dependency_cache`), calls the dependency functions (using `run_in_threadpool` for sync functions or awaiting async ones), handles results from generators (`yield`), and gathers all the computed values needed to finally call the target path operation function.
FastAPI intelligently combines Python's introspection capabilities with this structured dependency resolution system.
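As a rough illustration of the introspection involved (not FastAPI's actual code), Python's `inspect` module can recover a function's parameters and defaults, which is how `Depends` markers get discovered. This sketch uses the older default-value style of `Depends` for simplicity:
```python
import inspect
from fastapi import Depends

async def common_parameters(q: str | None = None, skip: int = 0, limit: int = 100):
    return {"q": q, "skip": skip, "limit": limit}

async def read_items(commons: dict = Depends(common_parameters)):
    return commons

# Inspecting read_items reveals the Depends(...) marker sitting in the default value
for name, param in inspect.signature(read_items).parameters.items():
    print(name, "->", param.default)  # e.g. prints: commons -> Depends(common_parameters)
```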
## Conclusion
You've learned about FastAPI's powerful Dependency Injection system!
* You saw how to define reusable logic in **dependency functions**.
* You learned to use **`Depends`** in your path operation function parameters to tell FastAPI what dependencies are needed.
* You understood that FastAPI automatically **calls** dependencies and **injects** their results into your function.
* You saw how dependencies can **depend on other dependencies**, creating manageable hierarchies.
* You learned that results are **cached per request** by default for efficiency.
* You grasped the core idea: separating concerns and promoting **reusable code**.
Dependency Injection is fundamental to building complex, maintainable applications in FastAPI. It's used extensively for things like database connections, authentication, authorization, and processing complex parameter sets.
While dependencies help manage complexity, sometimes things inevitably go wrong: a database might be unavailable, validation might fail within a dependency, or an unexpected error might occur. How should our API handle these situations gracefully? That's what we'll cover next.
Ready to handle errors like a pro? Let's move on to [Chapter 6: Error Handling](06_error_handling.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,326 @@
# Chapter 6: Error Handling
Welcome back! In [Chapter 5: Dependency Injection](05_dependency_injection.md), we learned how to structure our code using dependencies to manage common tasks like pagination or database sessions. This helps keep our code clean and reusable.
But what happens when things don't go as planned? A user might request data that doesn't exist, or they might send invalid input. Our API needs a way to gracefully handle these situations and inform the client about what went wrong.
**Our Goal Today:** Learn how FastAPI helps us manage errors effectively, both for problems we expect (like "item not found") and for unexpected issues like invalid input data.
## What Problem Does This Solve?
Imagine our online store API. We have an endpoint like `/items/{item_id}` to fetch details about a specific item. What should happen if a user tries to access `/items/9999` but there's no item with ID 9999 in our database?
If we don't handle this, our application might crash or return a confusing, generic server error (like `500 Internal Server Error`). This isn't helpful for the person using our API. They need clear feedback: "The item you asked for doesn't exist."
Similarly, if a user tries to *create* an item (`POST /items/`) but forgets to include the required `price` field in the JSON body, we shouldn't just crash. We need to tell them, "You forgot the price field!"
FastAPI provides a structured way to handle these different types of errors, ensuring clear communication with the client. Think of it as setting up clear emergency procedures for your API.
## Key Concepts
1. **`HTTPException` for Expected Errors:**
* These are errors you anticipate might occur based on the client's request, like requesting a non-existent resource or lacking permissions.
* You can **raise** `HTTPException` directly in your code.
* You specify an appropriate HTTP **status code** (like `404 Not Found`, `403 Forbidden`) and a helpful **detail message** (like `"Item not found"`).
* FastAPI catches this exception and automatically sends a properly formatted JSON error response to the client.
2. **`RequestValidationError` for Invalid Input:**
* This error occurs when the data sent by the client in the request (path parameters, query parameters, or request body) fails the validation rules defined by your type hints and Pydantic models (as seen in [Chapter 2: Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md) and [Chapter 3: Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md)).
* FastAPI **automatically** catches these validation errors.
* It sends back a `422 Unprocessable Entity` response containing detailed information about *which* fields were invalid and *why*. You usually don't need to write extra code for this!
3. **Custom Exception Handlers:**
* For more advanced scenarios, you can define your *own* functions to handle specific types of exceptions (either built-in Python exceptions or custom ones you create).
* This gives you full control over how errors are logged and what response is sent back to the client.
## Using `HTTPException` for Expected Errors
Let's solve our "item not found" problem using `HTTPException`.
1. **Import `HTTPException`:**
```python
# main.py or your router file
from fastapi import FastAPI, HTTPException
app = FastAPI() # Or use your APIRouter
# Simple in-memory storage (like from Chapter 4)
fake_items_db = {1: {"name": "Foo"}, 2: {"name": "Bar"}}
```
**Explanation:** We import `HTTPException` directly from `fastapi`.
2. **Check and Raise in Your Path Operation:**
```python
@app.get("/items/{item_id}")
async def read_item(item_id: int):
# Check if the requested item_id exists in our "database"
if item_id not in fake_items_db:
# If not found, raise HTTPException!
raise HTTPException(status_code=404, detail="Item not found")
# If found, proceed normally
return {"item": fake_items_db[item_id]}
```
**Explanation:**
* Inside `read_item`, we check if the `item_id` exists as a key in our `fake_items_db` dictionary.
* If `item_id` is *not* found, we `raise HTTPException(...)`.
* `status_code=404`: We use the standard HTTP status code `404 Not Found`. For readability, you can also use the named constants that FastAPI re-exports: `from fastapi import status; raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, ...)`.
* `detail="Item not found"`: We provide a human-readable message explaining the error. This will be sent back to the client in the JSON response body.
* If the item *is* found, the `raise` statement is skipped, and the function returns the item details as usual.
**How it Behaves:**
* **Request:** Client sends `GET /items/1`
* **Response (Status Code 200):**
```json
{"item": {"name": "Foo"}}
```
* **Request:** Client sends `GET /items/99`
* **Response (Status Code 404):**
```json
{"detail": "Item not found"}
```
FastAPI automatically catches the `HTTPException` you raised and sends the correct HTTP status code along with the `detail` message formatted as JSON.
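`HTTPException` also accepts an optional `headers` argument if you need to attach custom response headers to the error. A small sketch reusing the `app` and `fake_items_db` from above (the header name is just an illustrative choice):
```python
@app.get("/items-with-header/{item_id}")
async def read_item_with_header(item_id: int):
    if item_id not in fake_items_db:
        # The 'headers' dict is added to the error response's HTTP headers
        raise HTTPException(
            status_code=404,
            detail="Item not found",
            headers={"X-Error-Source": "items-lookup"},
        )
    return {"item": fake_items_db[item_id]}
```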
## Automatic Handling of `RequestValidationError`
You've already seen this in action without realizing it! When you define Pydantic models for your request bodies or use type hints for path/query parameters, FastAPI automatically validates incoming data.
Let's revisit the `create_item` example from [Chapter 3: Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md):
```python
# main.py or your router file
from fastapi import FastAPI
from pydantic import BaseModel
app = FastAPI()
# Pydantic model requiring name and price
class Item(BaseModel):
name: str
price: float
description: str | None = None
@app.post("/items/")
# Expects request body matching the Item model
async def create_item(item: Item):
# If execution reaches here, validation PASSED automatically.
return {"message": "Item received!", "item_data": item.model_dump()}
```
**How it Behaves (Automatically):**
* **Request:** Client sends `POST /items/` with a *valid* JSON body:
```json
{
"name": "Gadget",
"price": 19.95
}
```
* **Response (Status Code 200):**
```json
{
"message": "Item received!",
"item_data": {
"name": "Gadget",
"price": 19.95,
"description": null
}
}
```
* **Request:** Client sends `POST /items/` with an *invalid* JSON body (missing `price`):
```json
{
"name": "Widget"
}
```
* **Response (Status Code 422):** FastAPI *automatically* intercepts this before `create_item` runs and sends:
```json
{
"detail": [
{
"type": "missing",
"loc": [
"body",
"price"
],
"msg": "Field required",
"input": {
"name": "Widget"
},
"url": "..." // Link to Pydantic error docs
}
]
}
```
* **Request:** Client sends `POST /items/` with an *invalid* JSON body (wrong type for `price`):
```json
{
"name": "Doohickey",
"price": "cheap"
}
```
* **Response (Status Code 422):** FastAPI automatically sends:
```json
{
"detail": [
{
"type": "float_parsing",
"loc": [
"body",
"price"
],
"msg": "Input should be a valid number, unable to parse string as a number",
"input": "cheap",
"url": "..."
}
]
}
```
Notice that we didn't write any `try...except` blocks or `if` statements in `create_item` to handle these validation issues. FastAPI and Pydantic take care of it, providing detailed error messages that tell the client exactly what went wrong and where (`loc`). This is a huge time saver!
## Custom Exception Handlers (A Quick Look)
Sometimes, you might want to handle specific errors in a unique way. Maybe you want to log a particular error to a monitoring service, or perhaps you need to return error responses in a completely custom format different from FastAPI's default.
FastAPI allows you to register **exception handlers** using the `@app.exception_handler()` decorator.
**Example:** Imagine you have a custom error `UnicornNotFound` and want to return a `418 I'm a teapot` status code when it occurs.
1. **Define the Custom Exception:**
```python
# Can be in your main file or a separate exceptions.py
class UnicornNotFound(Exception):
def __init__(self, name: str):
self.name = name
```
2. **Define the Handler Function:**
```python
# main.py
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
# Assuming UnicornNotFound is defined above or imported
app = FastAPI()
# Decorator registers this function to handle UnicornNotFound errors
@app.exception_handler(UnicornNotFound)
async def unicorn_exception_handler(request: Request, exc: UnicornNotFound):
# This function runs whenever UnicornNotFound is raised
return JSONResponse(
status_code=418, # I'm a teapot!
content={"message": f"Oops! Can't find unicorn named: {exc.name}."},
)
```
**Explanation:**
* `@app.exception_handler(UnicornNotFound)`: This tells FastAPI that the `unicorn_exception_handler` function should be called whenever an error of type `UnicornNotFound` is raised *and not caught* elsewhere.
* The handler function receives the `request` object and the exception instance (`exc`).
* It returns a `JSONResponse` with the desired status code (418) and a custom content dictionary.
3. **Raise the Custom Exception in a Path Operation:**
```python
@app.get("/unicorns/{name}")
async def read_unicorn(name: str):
if name == "yolo":
# Raise our custom exception
raise UnicornNotFound(name=name)
return {"unicorn_name": name, "message": "Unicorn exists!"}
```
**How it Behaves:**
* **Request:** `GET /unicorns/sparklehoof`
* **Response (Status Code 200):**
```json
{"unicorn_name": "sparklehoof", "message": "Unicorn exists!"}
```
* **Request:** `GET /unicorns/yolo`
* **Response (Status Code 418):** (Handled by `unicorn_exception_handler`)
```json
{"message": "Oops! Can't find unicorn named: yolo."}
```
Custom handlers provide flexibility, but for most common API errors, `HTTPException` and the automatic `RequestValidationError` handling are sufficient.
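The same decorator can also override FastAPI's built-in handlers. As a minimal sketch (reusing the `app` instance from above), re-registering a handler for `RequestValidationError` lets you reshape the automatic 422 response:
```python
from fastapi import Request
from fastapi.encoders import jsonable_encoder
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse

# Re-registering a handler for RequestValidationError replaces the default one
@app.exception_handler(RequestValidationError)
async def custom_validation_handler(request: Request, exc: RequestValidationError):
    return JSONResponse(
        status_code=422,
        content={"message": "Validation failed", "errors": jsonable_encoder(exc.errors())},
    )
```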
## How it Works Under the Hood (Simplified)
When an error occurs during a request, FastAPI follows a process to decide how to respond:
**Scenario 1: Raising `HTTPException`**
1. **Raise:** Your path operation code (e.g., `read_item`) executes `raise HTTPException(status_code=404, detail="Item not found")`.
2. **Catch:** FastAPI's internal request/response cycle catches this specific `HTTPException`.
3. **Find Handler:** FastAPI checks if there's a custom handler registered for `HTTPException`. If not (which is usually the case unless you override it), it uses its **default handler** for `HTTPException`.
4. **Default Handler Executes:** The default handler (`fastapi.exception_handlers.http_exception_handler`) takes the `status_code` and `detail` from the exception you raised.
5. **Create Response:** It creates a `starlette.responses.JSONResponse` containing `{"detail": exc.detail}` and sets the status code to `exc.status_code`.
6. **Send Response:** This JSON response is sent back to the client.
```mermaid
sequenceDiagram
participant Client
participant FastAPIApp as FastAPI App
participant RouteHandler as Route Handler (read_item)
participant DefaultHTTPExceptionHandler as Default HTTPException Handler
Client->>+FastAPIApp: GET /items/99
FastAPIApp->>+RouteHandler: Call read_item(item_id=99)
RouteHandler->>RouteHandler: Check DB: item 99 not found
RouteHandler-->>-FastAPIApp: raise HTTPException(404, "Item not found")
Note over FastAPIApp: Catches HTTPException
FastAPIApp->>+DefaultHTTPExceptionHandler: Handle the exception instance
DefaultHTTPExceptionHandler->>DefaultHTTPExceptionHandler: Extract status_code=404, detail="Item not found"
DefaultHTTPExceptionHandler-->>-FastAPIApp: Return JSONResponse(status=404, content={"detail": "..."})
FastAPIApp-->>-Client: Send 404 JSON Response
```
**Scenario 2: Automatic `RequestValidationError`**
1. **Request:** Client sends `POST /items/` with invalid data (e.g., missing `price`).
2. **Parameter/Body Parsing:** FastAPI tries to parse the request body and validate it against the `Item` Pydantic model before calling `create_item`.
3. **Pydantic Raises:** Pydantic's validation fails and raises a `pydantic.ValidationError`.
4. **FastAPI Wraps:** FastAPI catches the `pydantic.ValidationError` and wraps it inside its own `fastapi.exceptions.RequestValidationError` to add context.
5. **Catch:** FastAPI's internal request/response cycle catches the `RequestValidationError`.
6. **Find Handler:** FastAPI looks for a handler for `RequestValidationError` and finds its default one.
7. **Default Handler Executes:** The default handler (`fastapi.exception_handlers.request_validation_exception_handler`) takes the `RequestValidationError`.
8. **Extract & Format Errors:** It calls the `.errors()` method on the exception to get the list of validation errors provided by Pydantic. It then formats this list into the standard structure (with `loc`, `msg`, `type`).
9. **Create Response:** It creates a `JSONResponse` with status code `422` and the formatted error details as the content.
10. **Send Response:** This 422 JSON response is sent back to the client. Your `create_item` function was never even called.
### Code Connections
* **`fastapi.exceptions.HTTPException`**: The class you import and raise for expected client errors. Defined in `fastapi/exceptions.py`. It inherits from `starlette.exceptions.HTTPException`.
* **`fastapi.exception_handlers.http_exception_handler`**: The default function that handles `HTTPException`. Defined in `fastapi/exception_handlers.py`. It creates a `JSONResponse`.
* **`fastapi.exceptions.RequestValidationError`**: The exception FastAPI raises internally when Pydantic validation fails for request data. Defined in `fastapi/exceptions.py`.
* **`fastapi.exception_handlers.request_validation_exception_handler`**: The default function that handles `RequestValidationError`. Defined in `fastapi/exception_handlers.py`. It calls `jsonable_encoder(exc.errors())` and creates a 422 `JSONResponse`.
* **`@app.exception_handler(ExceptionType)`**: The decorator used on the `FastAPI` app instance to register your own custom handler functions. The `exception_handler` method is part of the `FastAPI` class in `fastapi/applications.py`.
## Conclusion
You've learned how FastAPI helps you manage errors gracefully!
* You can handle **expected client errors** (like "not found") by raising **`HTTPException`** with a specific `status_code` and `detail` message.
* FastAPI **automatically handles validation errors** (`RequestValidationError`) when incoming data doesn't match your Pydantic models or type hints, returning detailed `422` responses.
* You can define **custom exception handlers** for fine-grained control over error responses and logging using `@app.exception_handler()`.
Using these tools makes your API more robust, predictable, and easier for clients to interact with, even when things go wrong. Clear error messages are a crucial part of a good API design.
Now that we know how to handle errors, let's think about another critical aspect: security. How do we protect our endpoints, ensuring only authorized users can access certain data or perform specific actions?
Ready to secure your API? Let's move on to [Chapter 7: Security Utilities](07_security_utilities.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,355 @@
# Chapter 7: Security Utilities
Hi there! 👋 In [Chapter 6: Error Handling](06_error_handling.md), we learned how to handle situations where things go wrong in our API, like when a user requests an item that doesn't exist. Now, let's talk about protecting our API endpoints.
Imagine our online store API. Anyone should be able to browse items (`GET /items/`). But maybe only registered, logged-in users should be allowed to *create* new items (`POST /items/`) or view their own profile (`GET /users/me`). How do we ensure only the right people can access certain parts of our API?
That's where **Security Utilities** come in!
**Our Goal Today:** Learn how FastAPI provides ready-made tools to implement common security mechanisms like username/password checks or API keys, making it easy to protect your endpoints.
## What Problem Does This Solve?
When you build an API, some parts might be public, but others need protection. You need a way to:
1. **Identify the User:** Figure out *who* is making the request. Are they logged in? Do they have a valid API key? This process is called **Authentication** (AuthN - proving who you are).
2. **Check Permissions (Optional but related):** Once you know who the user is, you might need to check if they have permission to do what they're asking. Can user "Alice" delete user "Bob"? This is called **Authorization** (AuthZ - checking what you're allowed to do). (We'll focus mainly on Authentication in this beginner chapter).
3. **Ask for Credentials:** How does the user provide their identity? Common ways include:
* **HTTP Basic Authentication:** Sending a username and password directly (encoded) in the request headers. Simple, but less secure over plain HTTP.
* **API Keys:** Sending a secret key (a long string) in the headers, query parameters, or cookies. Common for server-to-server communication.
* **OAuth2 Bearer Tokens:** Sending a temporary token (obtained after logging in) in the headers. Very common for web and mobile apps.
4. **Document Security:** How do you tell users of your API (in the `/docs`) that certain endpoints require authentication and how to provide it?
Implementing these security schemes from scratch can be complex and tricky. FastAPI gives you pre-built components (like different types of locks and keys) that handle the common patterns for asking for and receiving credentials.
## Key Concepts
1. **Security Schemes:** These are the standard protocols or methods used for authentication, like HTTP Basic, API Keys (in different locations), and OAuth2. FastAPI provides classes that represent these schemes (e.g., `HTTPBasic`, `APIKeyHeader`, `OAuth2PasswordBearer`). Think of these as the *type* of lock mechanism you want to install on your door.
2. **`fastapi.security` Module:** This module contains all the pre-built security scheme classes. You'll import things like `HTTPBasic`, `APIKeyHeader`, `APIKeyQuery`, `APIKeyCookie`, `OAuth2PasswordBearer` from here.
3. **Credentials:** The actual "secret" information the user provides to prove their identity (username/password, the API key string, the OAuth2 token string).
4. **Verifier Dependency:** A function you write (a dependency, like we learned about in [Chapter 5: Dependency Injection](05_dependency_injection.md)) that takes the credentials extracted by the security scheme and checks if they are valid. It might check a username/password against a database or validate an API key. This function decides if the "key" fits the "lock".
5. **`Security()` Function:** This is a special function imported from `fastapi` (`from fastapi import Security`). It works almost exactly like `Depends()`, but it's specifically designed for security dependencies. You use it like this: `user: Annotated[UserType, Security(your_verifier_dependency)]`.
* **Main Difference from `Depends()`:** Using `Security()` tells FastAPI to automatically add the corresponding security requirements to your OpenAPI documentation (`/docs`). This means `/docs` will show a little lock icon on protected endpoints and provide UI elements for users to enter their credentials (like username/password or a token) when trying out the API.
**Analogy:**
* **Security Scheme (`HTTPBasic`, `APIKeyHeader`):** The type of lock on the door (e.g., a key lock, a combination lock).
* **Scheme Instance (`security = HTTPBasic()`):** Installing that specific lock on a particular door frame.
* **Credentials (`username/password`, `API key`):** The key or combination provided by the person trying to open the door.
* **Verifier Dependency (`get_current_user`):** The person or mechanism that takes the key/combination, checks if it's correct, and decides whether to let the person in.
* **`Security(get_current_user)`:** Declaring that the door requires the verifier to check the key/combination before allowing entry, and also putting a "Lock" sign on the door in the building map (`/docs`).
## Using Security Utilities: HTTP Basic Auth Example
Let's protect an endpoint using the simplest method: HTTP Basic Authentication. We'll create an endpoint `/users/me` that requires a valid username and password.
**Step 1: Import necessary tools**
We need `HTTPBasic` (the scheme), `HTTPBasicCredentials` (a Pydantic model to hold the extracted username/password), `Security` (to declare the dependency), `Annotated`, and `HTTPException` (for errors).
```python
# main.py (or your router file)
from typing import Annotated
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials
```
**Step 2: Create an instance of the security scheme**
We create an instance of `HTTPBasic`. This object knows *how* to ask the browser/client for username/password via standard HTTP mechanisms.
```python
# Right after imports
security = HTTPBasic()
app = FastAPI() # Or use your APIRouter
```
**Step 3: Define the "Verifier" Dependency Function**
This function will receive the credentials extracted by `security` and check if they are valid. For this beginner example, we'll use hardcoded values. In a real app, you'd check against a database.
```python
# Our "verifier" function
def get_current_username(credentials: Annotated[HTTPBasicCredentials, Depends(security)]):
# NOTE: In a real app, NEVER hardcode credentials like this!
# Always use secure password hashing (e.g., with passlib)
# and check against a database.
correct_username = "stanley"
correct_password = "password123" # Don't do this in production!
# Basic check (insecure comparison for demonstration)
is_correct_username = credentials.username == correct_username
is_correct_password = credentials.password == correct_password # Insecure!
if not (is_correct_username and is_correct_password):
# If credentials are bad, raise an exception
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Incorrect email or password",
headers={"WWW-Authenticate": "Basic"}, # Required header for 401 Basic Auth
)
# If credentials are okay, return the username
return credentials.username
```
**Explanation:**
* `get_current_username` is our dependency function.
* `credentials: Annotated[HTTPBasicCredentials, Depends(security)]`: It depends on our `security` object (`HTTPBasic`). FastAPI will run `security` first. `security` will extract the username and password from the `Authorization: Basic ...` header and provide them as an `HTTPBasicCredentials` object to this function.
* Inside, we perform a (very insecure, for demo only!) check against hardcoded values.
* If the check fails, we `raise HTTPException` with status `401 Unauthorized`. The `headers={"WWW-Authenticate": "Basic"}` part is important; it tells the browser *how* it should ask for credentials (using the Basic scheme).
* If the check passes, we return the validated username.
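As those comments warn, plain `==` string comparison and hardcoded values are for demonstration only. A slightly more careful variant (still hardcoded, so still not production code) uses constant-time comparison from Python's `secrets` module, reusing the imports and `security` instance from Steps 1 and 2:
```python
import secrets

def get_current_username(credentials: Annotated[HTTPBasicCredentials, Depends(security)]):
    # compare_digest runs in constant time, which helps mitigate timing attacks
    correct_username = secrets.compare_digest(credentials.username.encode("utf8"), b"stanley")
    correct_password = secrets.compare_digest(credentials.password.encode("utf8"), b"password123")
    if not (correct_username and correct_password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect email or password",
            headers={"WWW-Authenticate": "Basic"},
        )
    return credentials.username
```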
**Step 4: Use `Security()` in the Path Operation**
Now, let's create our protected endpoint `/users/me`. Instead of `Depends`, we use `Security` with our verifier function.
```python
@app.get("/users/me")
async def read_current_user(
# Use Security() with the verifier function
username: Annotated[str, Security(get_current_username)]
):
# If the code reaches here, get_current_username ran successfully
# and returned the validated username.
# 'username' variable now holds the result from get_current_username.
return {"username": username}
```
**Explanation:**
* `username: Annotated[str, Security(get_current_username)]`: We declare that this path operation requires the `get_current_username` dependency, using `Security`.
* FastAPI will first run `get_current_username`.
* `get_current_username` will, in turn, trigger `security` (`HTTPBasic`) to get the credentials.
* If `get_current_username` succeeds (doesn't raise an exception), its return value (the username string) will be injected into the `username` parameter of `read_current_user`.
* If `get_current_username` (or the underlying `HTTPBasic`) raises an `HTTPException`, the request stops, the error response is sent, and `read_current_user` is never called.
* Crucially, `Security()` also adds the HTTP Basic security requirement to the OpenAPI schema for this endpoint.
**How it Behaves:**
1. **Run the App:** `uvicorn main:app --reload`
2. **Visit `/docs`:** Go to `http://127.0.0.1:8000/docs`.
* You'll see the `/users/me` endpoint now has a **padlock icon** 🔒 next to it.
* Click the "Authorize" button (usually near the top right). A popup will appear asking for Username and Password for the "HTTPBasic" scheme.
* Enter `stanley` and `password123` and click Authorize.
* Now, try out the `/users/me` endpoint. Click "Try it out", then "Execute". It should work and return `{"username": "stanley"}`. The browser automatically added the correct `Authorization` header because you authorized in the UI.
* Click "Authorize" again and "Logout". Now try executing `/users/me` again. You'll get a `401 Unauthorized` error with `{"detail": "Not authenticated"}` (this default comes from `HTTPBasic` when no credentials are provided).
3. **Use `curl` (Command Line):**
* `curl http://127.0.0.1:8000/users/me` -> Returns `{"detail":"Not authenticated"}` (401).
* `curl -u wronguser:wrongpass http://127.0.0.1:8000/users/me` -> Returns `{"detail":"Incorrect email or password"}` (401). The `-u` flag makes `curl` use HTTP Basic Auth.
* `curl -u stanley:password123 http://127.0.0.1:8000/users/me` -> Returns `{"username": "stanley"}` (200 OK).
You've successfully protected an endpoint using HTTP Basic Auth!
## Other Common Schemes (Briefly)
The pattern is very similar for other schemes.
### API Key in Header
The flow mirrors the HTTP Basic example: the scheme extracts the key from an `X-API-KEY` header, and a verifier dependency validates it (here against a hardcoded value, for demonstration only).
```python
# --- Imports ---
from typing import Annotated
from fastapi import FastAPI, HTTPException, Security, status
from fastapi.security import APIKeyHeader

app = FastAPI()  # Or use your APIRouter

# --- Scheme Instance ---
# Expect the key in an "X-API-KEY" header.
# auto_error=False means a missing header arrives here as None instead of
# triggering an automatic 403, so the verifier can decide what to do.
api_key_header_scheme = APIKeyHeader(name="X-API-KEY", auto_error=False)

# --- Verifier Dependency ---
async def verify_api_key(
    api_key: Annotated[str | None, Security(api_key_header_scheme)]
):
    if api_key is None:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN, detail="X-API-KEY header missing"
        )
    if api_key != "SECRET_API_KEY":  # In a real app, check against securely stored keys!
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN, detail="Invalid API Key"
        )
    return api_key  # Could also return user/account info associated with the key

# --- Path Operation ---
@app.get("/secure-data")
async def get_secure_data(
    # Use Security() with the VERIFIER function
    verified_key: Annotated[str, Security(verify_api_key)]
):
    # 'verified_key' holds the value returned by verify_api_key
    return {"data": "sensitive data", "key": verified_key}
```
### OAuth2 Password Bearer Flow
This is common for user logins in web apps. It usually involves two endpoints: one to exchange username/password for a token (`/token`), and protected endpoints that require the token.
```python
# --- Imports ---
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
# --- Scheme Instance ---
# The 'tokenUrl' points to the path operation where users get the token
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
# --- Token Endpoint (Example) ---
@app.post("/token")
async def login_for_access_token(
form_data: Annotated[OAuth2PasswordRequestForm, Depends()]
):
# 1. Verify form_data.username and form_data.password (check DB)
# 2. If valid, create an access token (e.g., a JWT)
# 3. Return the token
# (Skipping implementation details for brevity)
access_token = f"token_for_{form_data.username}" # Fake token
return {"access_token": access_token, "token_type": "bearer"}
# --- Verifier Dependency (Example: decode token and get user) ---
async def get_current_user(token: Annotated[str, Security(oauth2_scheme)]):
# In a real app:
# 1. Decode the token (e.g., JWT)
# 2. Validate the token (check expiry, signature)
# 3. Extract user identifier from token payload
# 4. Fetch user from database
# 5. Raise HTTPException if token is invalid or user doesn't exist
if token == "token_for_stanley": # Fake check
return {"username": "stanley", "email": "stanley@example.com"}
else:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid authentication credentials",
headers={"WWW-Authenticate": "Bearer"},
)
# --- Protected Path Operation ---
@app.get("/users/me/oauth")
async def read_users_me_oauth(
# Use Security() with the user verifier function
current_user: Annotated[dict, Security(get_current_user)]
):
# current_user holds the dict returned by get_current_user
return current_user
```
The core pattern remains: Instantiate the scheme -> Define a verifier dependency that uses the scheme -> Protect endpoints using `Security(verifier_dependency)`.
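To see the flow end to end, here's a hedged sketch using FastAPI's `TestClient` against the `/token` and `/users/me/oauth` endpoints above (it assumes `httpx` and `python-multipart` are installed, which the form-based token endpoint needs):
```python
from fastapi.testclient import TestClient

client = TestClient(app)  # 'app' is the FastAPI instance defined above

# Step 1: exchange username/password for a token (sent as form data, per OAuth2)
token_response = client.post(
    "/token", data={"username": "stanley", "password": "password123"}
)
access_token = token_response.json()["access_token"]

# Step 2: call the protected endpoint with the Bearer token
me_response = client.get(
    "/users/me/oauth", headers={"Authorization": f"Bearer {access_token}"}
)
print(me_response.json())  # {"username": "stanley", "email": "stanley@example.com"}
```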
## How it Works Under the Hood (Simplified)
Let's trace the HTTP Basic Auth example (`GET /users/me` requiring `stanley`/`password123`):
1. **Request:** Client sends `GET /users/me` with header `Authorization: Basic c3RhbmxleTpwYXNzd29yZDEyMw==` (where `c3Rh...` is base64("stanley:password123")).
2. **Routing:** FastAPI matches the request to `read_current_user`.
3. **Dependency Analysis:** FastAPI sees `username: Annotated[str, Security(get_current_username)]`. It knows it needs to resolve the `get_current_username` dependency using the `Security` mechanism.
4. **Security Dependency Resolution:**
* FastAPI looks inside `get_current_username` and sees its dependency: `credentials: Annotated[HTTPBasicCredentials, Depends(security)]`.
* It needs to resolve `security` (our `HTTPBasic()` instance).
5. **Scheme Execution (`HTTPBasic.__call__`)**:
* FastAPI calls the `security` object (which is callable).
* The `HTTPBasic` object's `__call__` method executes. It reads the `Authorization` header from the request.
* It finds the `Basic` scheme and the parameter `c3RhbmxleTpwYXNzd29yZDEyMw==`.
* It base64-decodes the parameter to get `stanley:password123`.
* It splits this into username (`stanley`) and password (`password123`).
* It creates and returns an `HTTPBasicCredentials(username="stanley", password="password123")` object.
* *(If the header was missing or malformed, `HTTPBasic.__call__` would raise `HTTPException(401)` here, stopping the process).*
6. **Verifier Execution (`get_current_username`)**:
* FastAPI now has the result from `security`. It calls `get_current_username(credentials=<HTTPBasicCredentials object>)`.
* Your verifier code runs. It compares the credentials. They match the hardcoded values.
* The function returns the username `"stanley"`.
* *(If the credentials didn't match, your code would raise `HTTPException(401)` here, stopping the process).*
7. **Path Operation Execution (`read_current_user`)**:
* FastAPI now has the result from `get_current_username`. It calls `read_current_user(username="stanley")`.
* Your path operation function runs and returns `{"username": "stanley"}`.
8. **Response:** FastAPI sends the 200 OK JSON response back to the client.
9. **OpenAPI Generation:** Separately, when generating `/openapi.json`, FastAPI sees `Security(get_current_username)` -> `Depends(security)` -> `security` is `HTTPBasic`. It adds the "HTTPBasic" security requirement definition to the global `components.securitySchemes` and references it in the security requirements for the `/users/me` path operation. This is what makes the lock icon appear in `/docs`.
Here's a simplified diagram:
```mermaid
sequenceDiagram
participant Client
participant FastAPIApp as FastAPI App
participant HTTPBasicInst as security (HTTPBasic Instance)
participant VerifierFunc as get_current_username
participant PathOpFunc as read_current_user
Client->>+FastAPIApp: GET /users/me (Authorization: Basic ...)
FastAPIApp->>FastAPIApp: Match route, see Security(get_current_username)
FastAPIApp->>FastAPIApp: Resolve get_current_username dependencies: Depends(security)
FastAPIApp->>+HTTPBasicInst: Call security(request)
HTTPBasicInst->>HTTPBasicInst: Read header, decode base64, split user/pass
HTTPBasicInst-->>-FastAPIApp: Return HTTPBasicCredentials(user="stanley", pass="...")
FastAPIApp->>+VerifierFunc: Call get_current_username(credentials=...)
VerifierFunc->>VerifierFunc: Check credentials -> OK
VerifierFunc-->>-FastAPIApp: Return username "stanley"
FastAPIApp->>+PathOpFunc: Call read_current_user(username="stanley")
PathOpFunc-->>-FastAPIApp: Return {"username": "stanley"}
FastAPIApp-->>-Client: Send 200 OK JSON Response
```
### Code Connections
* **`fastapi.Security`**: The function you import and use. It's a thin wrapper around `fastapi.params.Security`. (`fastapi/param_functions.py`)
* **`fastapi.params.Security`**: The class that signals a security dependency, inheriting from `Depends` but adding the `scopes` parameter. (`fastapi/params.py`)
* **`fastapi.security.*`**: This package contains the scheme implementations:
* `fastapi.security.http`: Contains `HTTPBase`, `HTTPBasic`, `HTTPBearer`, `HTTPDigest`, and the `HTTPBasicCredentials`, `HTTPAuthorizationCredentials` models.
* `fastapi.security.api_key`: Contains `APIKeyHeader`, `APIKeyQuery`, `APIKeyCookie`.
* `fastapi.security.oauth2`: Contains `OAuth2`, `OAuth2PasswordBearer`, `OAuth2AuthorizationCodeBearer`, `OAuth2PasswordRequestForm`, `SecurityScopes`.
* **Scheme `__call__` methods**: Each scheme class (e.g., `HTTPBasic`, `APIKeyHeader`, `OAuth2PasswordBearer`) implements `async def __call__(self, request: Request)` which contains the logic to extract credentials from the specific request location (headers, query, etc.).
* **Dependency Injection System**: The core system described in [Chapter 5: Dependency Injection](05_dependency_injection.md) resolves the dependencies, calling the scheme instance and then your verifier function.
* **OpenAPI Integration**: FastAPI's OpenAPI generation logic specifically checks for `Security` dependencies and uses the associated scheme model (`security.model`) to add the correct security requirements to the schema.
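One more note on the `scopes` parameter mentioned above: `Security()` can declare required scopes, and a verifier can read them through `SecurityScopes`. A minimal sketch (the scope names and the token's granted scopes are made up for illustration):
```python
from typing import Annotated
from fastapi import FastAPI, HTTPException, Security, status
from fastapi.security import OAuth2PasswordBearer, SecurityScopes

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token", scopes={"items:read": "Read items"})

async def get_current_user_scoped(
    security_scopes: SecurityScopes,                 # scopes the endpoint asked for
    token: Annotated[str, Security(oauth2_scheme)],  # the raw bearer token
):
    # A real app would decode the token and read the scopes it actually grants;
    # here we pretend the token only grants "items:read".
    token_scopes = ["items:read"]
    for scope in security_scopes.scopes:
        if scope not in token_scopes:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail="Not enough permissions",
                headers={"WWW-Authenticate": f'Bearer scope="{security_scopes.scope_str}"'},
            )
    return {"username": "stanley", "scopes": security_scopes.scopes}

@app.get("/scoped-items/")
async def read_scoped_items(
    user: Annotated[dict, Security(get_current_user_scoped, scopes=["items:read"])]
):
    return user
```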
## Conclusion
You've now learned the basics of securing your FastAPI endpoints!
* You understand the need for **authentication** (who is the user?).
* You know about common **security schemes** like HTTP Basic, API Keys, and OAuth2 Bearer tokens.
* You learned that FastAPI provides **utility classes** (e.g., `HTTPBasic`, `APIKeyHeader`, `OAuth2PasswordBearer`) in the `fastapi.security` module to handle these schemes.
* You saw how to use the **`Security()`** function (similar to `Depends()`) to integrate these schemes into your path operations via **verifier dependencies**.
* You understand that `Security()` automatically adds security requirements to your **OpenAPI documentation** (`/docs`).
* You grasped the core pattern: **Scheme Instance -> Verifier Dependency -> `Security(verifier)`**.
Using these tools allows you to easily add robust security layers to your API without reinventing the wheel.
Sometimes, after handling a request and sending a response, you might need to perform some follow-up actions, like sending a notification email or processing some data, without making the user wait. How can we do that?
Ready to run tasks in the background? Let's move on to [Chapter 8: Background Tasks](08_background_tasks.md)!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,180 @@
# Chapter 8: Background Tasks
Welcome back! In [Chapter 7: Security Utilities](07_security_utilities.md), we learned how to protect our API endpoints using FastAPI's security features. Now, let's explore how to perform actions *after* we've already sent a response back to the user.
## What Problem Does This Solve?
Imagine a user registers on your website. When they submit their registration form, your API endpoint needs to:
1. Create the new user account in the database.
2. Send a welcome email to the user.
3. Send a notification to an admin.
4. Return a "Success!" message to the user.
Creating the user (step 1) is quick and essential before confirming success. But sending emails or notifications (steps 2 and 3) can sometimes be slow. Should the user have to wait several extra seconds just for the emails to be sent before they see the "Success!" message? Probably not! It would be much better if the API could send the "Success!" response immediately after creating the user, and then handle sending the emails *in the background*.
This is exactly what **Background Tasks** allow you to do in FastAPI. They let you define operations that need to happen *after* the response has been sent to the client, ensuring your users get a fast response time for the main action.
**Analogy:** Think of your path operation function as having a conversation with the user (sending the response). Once the main conversation is finished, you might hand off a follow-up task (like mailing a letter) to an assistant to complete later, so you don't keep the user waiting. Background Tasks are like that helpful assistant.
## Key Concepts
1. **`BackgroundTasks` Object:** A special object provided by FastAPI that holds a list of tasks to be run later.
2. **Dependency Injection:** You get access to this object by declaring it as a parameter in your path operation function, just like we learned in [Chapter 5: Dependency Injection](05_dependency_injection.md). Example: `def my_endpoint(background_tasks: BackgroundTasks): ...`.
3. **`add_task()` Method:** You use the `add_task()` method on the `BackgroundTasks` object to schedule a function to run in the background. You provide the function itself and any arguments it needs. Example: `background_tasks.add_task(send_welcome_email, user.email, user.name)`.
4. **Post-Response Execution:** FastAPI (specifically, the underlying Starlette framework) ensures that all functions added via `add_task()` are executed *only after* the response has been successfully sent back to the client.
## Using Background Tasks
Let's create a simple example. Imagine we want to write a message to a log file *after* sending a notification response to the user.
**Step 1: Import `BackgroundTasks`**
First, import the necessary class from `fastapi`.
```python
# main.py (or your router file)
from fastapi import BackgroundTasks, FastAPI
app = FastAPI()
```
**Step 2: Define the Task Function**
This is the function you want to run in the background. It can be a regular `def` function or an `async def` function.
```python
# A function to simulate writing to a log
# In a real app, this might send an email, process data, etc.
def write_log(message: str):
# Simulate writing to a file
with open("log.txt", mode="a") as log_file:
log_file.write(message + "\n")
print(f"Log written: {message}") # Also print to console for demo
```
**Explanation:**
* This is a simple Python function `write_log` that takes a `message` string.
* It opens a file named `log.txt` in "append" mode (`a`) and writes the message to it.
* We also print to the console so we can easily see when it runs during testing.
**Step 3: Inject `BackgroundTasks` and use `add_task`**
Now, modify your path operation function to accept `BackgroundTasks` as a parameter and use its `add_task` method.
```python
@app.post("/send-notification/{email}")
async def send_notification(
email: str,
background_tasks: BackgroundTasks # Inject BackgroundTasks
):
# The message we want to log in the background
log_message = f"Notification sent to: {email}"
# Add the task to run after the response
background_tasks.add_task(write_log, log_message) # Schedule write_log
# Return the response immediately
return {"message": "Notification sent successfully!"}
```
**Explanation:**
* `background_tasks: BackgroundTasks`: We declare a parameter named `background_tasks` with the type hint `BackgroundTasks`. FastAPI's dependency injection system will automatically create and provide a `BackgroundTasks` object here.
* `background_tasks.add_task(write_log, log_message)`: This is the crucial line.
* We call the `add_task` method on the injected `background_tasks` object.
* The first argument is the function we want to run in the background (`write_log`).
* The subsequent arguments (`log_message`) are the arguments that will be passed to our `write_log` function when it's eventually called.
* `return {"message": "Notification sent successfully!"}`: The function returns its response *without* waiting for `write_log` to finish.
**How it Behaves:**
1. **Run the App:** `uvicorn main:app --reload`
2. **Send a Request:** Use `curl` or the `/docs` UI to send a `POST` request to `/send-notification/test@example.com`.
```bash
curl -X POST http://127.0.0.1:8000/send-notification/test@example.com
```
3. **Immediate Response:** You will immediately receive the JSON response:
```json
{"message":"Notification sent successfully!"}
```
4. **Background Execution:** *After* the response above has been sent, look at your Uvicorn console output. You will see the message:
```
Log written: Notification sent to: test@example.com
```
Also, check your project directory. A file named `log.txt` will have been created (or appended to) with the content:
```
Notification sent to: test@example.com
```
This demonstrates that the `write_log` function ran *after* the client received the success message, preventing any delay for the user.
## How it Works Under the Hood (Simplified)
What's happening behind the scenes when you use `BackgroundTasks`?
1. **Request In:** A request arrives at your FastAPI application (e.g., `POST /send-notification/test@example.com`).
2. **Dependency Injection:** FastAPI processes the request, routes it to `send_notification`, and prepares its dependencies. It sees the `background_tasks: BackgroundTasks` parameter and creates an empty `BackgroundTasks` object instance.
3. **Path Function Runs:** Your `send_notification` function is called with the `email` and the empty `background_tasks` object.
4. **`add_task` Called:** Your code calls `background_tasks.add_task(write_log, log_message)`. This doesn't *run* `write_log` yet; it just adds the function (`write_log`) and its arguments (`log_message`) to an internal list within the `background_tasks` object.
5. **Response Returned:** Your path function finishes and returns the dictionary `{"message": "Notification sent successfully!"}`.
6. **Middleware Magic (Starlette):** FastAPI (using Starlette middleware) takes the response object *and* the `background_tasks` object (which now contains the scheduled task).
7. **Response Sent:** The middleware sends the HTTP response (`200 OK` with the JSON body) back to the client over the network.
8. **Tasks Executed:** *After* the response has been sent, the Starlette middleware iterates through the tasks stored in the `background_tasks` object. For each task, it calls the stored function (`write_log`) with the stored arguments (`log_message`). This happens in the server's process, separate from the initial request-response flow.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant Client
participant FastAPIApp as FastAPI App (via Starlette)
participant PathFunc as send_notification
participant BGTasks as BackgroundTasks Object
participant BGExecutor as Background Task Executor (Starlette)
participant TaskFunc as write_log
Client->>+FastAPIApp: POST /send-notification/test@example.com
FastAPIApp->>FastAPIApp: Route to send_notification
FastAPIApp->>+PathFunc: Call send_notification(email="...", background_tasks=BGTasks)
PathFunc->>+BGTasks: background_tasks.add_task(write_log, "...")
BGTasks-->>-PathFunc: Task added to internal list
PathFunc-->>-FastAPIApp: Return response {"message": "..."}
Note over FastAPIApp: FastAPI/Starlette prepares to send response AND notes background tasks
FastAPIApp-->>-Client: Send HTTP 200 OK Response
Note over FastAPIApp: Response sent, now run background tasks
FastAPIApp->>+BGExecutor: Execute tasks from BGTasks object
BGExecutor->>+TaskFunc: Call write_log("...")
TaskFunc->>TaskFunc: Write to log.txt
TaskFunc-->>-BGExecutor: Task finished
BGExecutor-->>-FastAPIApp: All tasks finished
```
### Code Connections
* **`fastapi.BackgroundTasks`**: This class (in `fastapi/background.py`) inherits directly from `starlette.background.BackgroundTasks`. It mostly just provides type hints and documentation specific to FastAPI.
* **`BackgroundTasks.add_task`**: This method simply calls the `add_task` method of the parent Starlette class.
* **`starlette.background.BackgroundTasks`**: This is where the core logic resides (in the `starlette` library, which FastAPI builds upon). It stores tasks as tuples of `(callable, args, kwargs)`.
* **`starlette.middleware.exceptions.ExceptionMiddleware` (and potentially others):** Starlette's middleware stack, particularly around exception handling and response sending, is responsible for checking if a `BackgroundTasks` object exists on the response object after the main endpoint code has run. If tasks exist, the middleware ensures they are executed *after* the response is sent using `anyio.create_task_group().start_soon()` or similar mechanisms. See `starlette.responses.Response.__call__`.
Essentially, FastAPI provides a convenient way (via dependency injection) to access Starlette's background task functionality.
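As a rough mental model (a minimal sketch, not Starlette's actual code; the class and method names below are invented for illustration), the whole mechanism boils down to recording calls now and replaying them after the response goes out:

```python
import asyncio
from typing import Any, Callable


class SimpleBackgroundTasks:
    """Toy stand-in for starlette.background.BackgroundTasks (illustrative only)."""

    def __init__(self) -> None:
        # Each scheduled task is stored as (function, positional args, keyword args).
        self.tasks: list[tuple[Callable[..., Any], tuple, dict]] = []

    def add_task(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        # Nothing runs here; the call is only recorded.
        self.tasks.append((func, args, kwargs))

    async def run_all(self) -> None:
        # In the real framework this happens only after the response body is sent.
        for func, args, kwargs in self.tasks:
            result = func(*args, **kwargs)
            if asyncio.iscoroutine(result):  # support async def task functions
                await result
```

The real `BackgroundTasks` keeps the same `(callable, args, kwargs)` information and awaits each task once the response has been delivered.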
## Conclusion
You've learned how to use FastAPI's `BackgroundTasks` to perform operations *after* sending a response to the client!
* You understand that this is useful for **slow or non-critical tasks** (like sending emails or notifications) that shouldn't delay the user's primary action.
* You learned to inject the **`BackgroundTasks`** object as a dependency.
* You saw how to schedule functions using the **`add_task(func, *args, **kwargs)`** method.
* You understand that these tasks run **after the response** has been delivered.
This feature helps you build more responsive APIs by deferring non-essential work.
This chapter concludes our core introduction to FastAPI! We've covered setting up applications, defining routes, handling parameters and data validation, using dependency injection, handling errors, securing endpoints, and now running background tasks. With these building blocks, you can create powerful and efficient web APIs.
Where do you go from here? You can dive deeper into the official FastAPI documentation to explore advanced topics like WebSockets, middleware, bigger application structures, testing, and deployment. Happy coding!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

48
output/FastAPI/index.md Normal file
View File

@@ -0,0 +1,48 @@
# Tutorial: FastAPI
FastAPI is a modern, *high-performance* web framework for building APIs with Python.
It's designed to be **easy to use**, fast to code, and ready for production.
Key features include **automatic data validation** (using Pydantic), **dependency injection**, and **automatic interactive API documentation** (OpenAPI and Swagger UI).
**Source Repository:** [https://github.com/fastapi/fastapi/tree/628c34e0cae200564d191c95d7edea78c88c4b5e/fastapi](https://github.com/fastapi/fastapi/tree/628c34e0cae200564d191c95d7edea78c88c4b5e/fastapi)
```mermaid
flowchart TD
A0["FastAPI Application & Routing"]
A1["Path Operations & Parameter Declaration"]
A2["Data Validation & Serialization (Pydantic)"]
A3["Dependency Injection"]
A4["OpenAPI & Automatic Docs"]
A5["Error Handling"]
A6["Security Utilities"]
A7["Background Tasks"]
A0 -- "Defines Routes for" --> A1
A1 -- "Uses for parameter/body val..." --> A2
A1 -- "Uses Depends() for dependen..." --> A3
A0 -- "Generates API spec for" --> A4
A0 -- "Manages global" --> A5
A3 -- "Injects BackgroundTasks object" --> A7
A6 -- "Uses Depends mechanism (Sec..." --> A3
A6 -- "Raises HTTPException on fai..." --> A5
A4 -- "Reads definitions from" --> A1
A4 -- "Reads Pydantic models for s..." --> A2
A4 -- "Reads security scheme defin..." --> A6
A5 -- "Handles RequestValidationEr..." --> A2
```
## Chapters
1. [FastAPI Application & Routing](01_fastapi_application___routing.md)
2. [Path Operations & Parameter Declaration](02_path_operations___parameter_declaration.md)
3. [Data Validation & Serialization (Pydantic)](03_data_validation___serialization__pydantic_.md)
4. [OpenAPI & Automatic Docs](04_openapi___automatic_docs.md)
5. [Dependency Injection](05_dependency_injection.md)
6. [Error Handling](06_error_handling.md)
7. [Security Utilities](07_security_utilities.md)
8. [Background Tasks](08_background_tasks.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,155 @@
# Chapter 1: Application Object (`Flask`)
Welcome to your first step into the world of Flask! Flask is a "microframework" for building web applications in Python. "Micro" doesn't mean it's limited; it means Flask provides the essentials to get started quickly, letting you add features as needed.
In this chapter, we'll explore the absolute heart of any Flask application: the **Application Object**.
## What Problem Does It Solve? The Need for a Control Tower
Imagine you're building a simple website. Maybe it just needs to show "Hello, World!" when someone visits the homepage. How does the web server know *what* Python code to run when a request comes in for `/` (the homepage)? How does it manage different pages (like `/about` or `/contact`)? How does it handle settings or connect to other tools?
You need a central place to manage all these tasks. Think of a busy airport: you need a **control tower** to direct planes (incoming web requests), manage runways (URL paths), and coordinate ground crew (other parts of your application).
In Flask, the `Flask` object is that control tower. It's the main object you create that represents your entire web application.
## Creating Your First Flask Application
Let's create the simplest possible Flask app. You'll need a Python file (let's call it `hello.py`).
1. **Import Flask:** First, you need to bring the `Flask` class into your code.
2. **Create an Instance:** Then, you create an *instance* of this class. This instance *is* your application.
```python
# hello.py
from flask import Flask
# Create the application object
app = Flask(__name__)
# We'll add more here soon!
```
Let's break down `app = Flask(__name__)`:
* `from flask import Flask`: This line imports the necessary `Flask` class from the Flask library you installed.
* `app = Flask(...)`: This creates the actual application object. We usually call the variable `app`, but you could name it something else.
* `__name__`: This is a special Python variable. When you run a Python script directly, Python sets `__name__` to the string `"__main__"`. If the script is imported by another script, `__name__` is set to the module's name (e.g., `"hello"` if your file is `hello.py`).
* **Why `__name__`?** Flask uses this argument to figure out the *location* of your application. This helps it find other files like templates and static assets (images, CSS) later on. For simple, single-module applications, using `__name__` is standard practice and almost always correct. The Flask documentation notes that if you're building a larger application structured as a Python package, you might hardcode the package name instead (like `app = Flask('yourapplication')`), but for beginners, `__name__` is the way to go.
This `app` object is now ready to be configured and run.
## Adding a Basic Route
Our `app` object doesn't do anything yet. Let's tell it what to do when someone visits the homepage (`/`). We do this using a *route*. We'll cover routing in detail in the next chapter, but here's a taste:
```python
# hello.py (continued)
from flask import Flask
app = Flask(__name__)
# Define what happens when someone visits the homepage ("/")
@app.route('/')
def index():
return 'Hello, World!'
# More code to run the app below...
```
* `@app.route('/')`: This is a Python decorator. It modifies the function defined right below it (`index`). It tells our `app` object: "When a web request comes in for the URL path `/`, call the `index` function."
* `def index(): ...`: This is a simple Python function. Flask calls these "view functions."
* `return 'Hello, World!'`: Whatever the view function returns is sent back to the user's web browser as the response.
## Running Your Application
How do we start the web server so people can actually visit our page? We use the `app` object's `run()` method. It's common practice to put this inside a special `if` block:
```python
# hello.py (end of the file)
from flask import Flask
app = Flask(__name__)
@app.route('/')
def index():
return 'Hello, World!'
# This block runs the app only when the script is executed directly
if __name__ == '__main__':
# Start the built-in development server
app.run(debug=True)
```
* `if __name__ == '__main__':`: This standard Python construct ensures that the code inside it only runs when you execute `hello.py` directly (like typing `python hello.py` in your terminal). It prevents the server from starting if you were to *import* `hello.py` into another Python file.
* `app.run()`: This method starts Flask's built-in development web server. This server is great for testing but **not** suitable for production (live websites).
* `debug=True`: This enables Flask's "debug mode". It provides helpful error messages in the browser and automatically restarts the server whenever you save changes to your code, making development much easier. **Never use debug mode in production!**
**To run this:**
1. Save the complete code as `hello.py`.
2. Open your terminal or command prompt.
3. Navigate to the directory where you saved the file.
4. Run the command: `python hello.py`
5. You'll see output like this:
```
* Serving Flask app 'hello'
* Debug mode: on
* Running on http://127.0.0.1:5000 (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: ...
```
6. Open your web browser and go to `http://127.0.0.1:5000/`.
7. You should see the text "Hello, World!"
You've just created and run your first Flask application! The `app = Flask(__name__)` line was the crucial first step, creating the central object that manages everything.
## Under the Hood: What Happens When You Create `Flask(__name__)`?
While you don't *need* to know the deep internals right away, a little insight helps understanding. When you call `app = Flask(__name__)`, several things happen inside Flask (simplified):
1. **Initialization:** The `Flask` class's `__init__` method (found in `app.py`, inheriting from `App` in `sansio/app.py`) is called.
2. **Path Determination:** It uses the `import_name` (`__name__`) you passed to figure out the application's `root_path`. This is like finding the main hangar at the airport. (See `get_root_path` in `helpers.py` and `find_package` in `sansio/scaffold.py`).
3. **Configuration Setup:** It creates a configuration object (`self.config`), usually an instance of the `Config` class (from `config.py`). This object holds settings like `DEBUG`, `SECRET_KEY`, etc. We'll cover this in [Configuration (`Config`)](06_configuration___config__.md).
4. **URL Map Creation:** It creates a `URL Map` (`self.url_map`), which is responsible for matching incoming request URLs to your view functions. This is core to the [Routing System](02_routing_system.md).
5. **Internal Structures:** It sets up various internal dictionaries to store things like your view functions (`self.view_functions`), error handlers (`self.error_handler_spec`), functions to run before/after requests, etc.
6. **Static Route (Optional):** If you configured a `static_folder` (Flask does by default), it automatically adds a URL rule (like `/static/<filename>`) to serve static files like CSS and JavaScript.
Here's a simplified diagram of the process:
```mermaid
sequenceDiagram
participant UserCode as hello.py
participant Flask as Flask(__init__)
participant App as Base App(__init__)
participant Config as Config()
participant URLMap as URL Map()
UserCode->>+Flask: app = Flask(__name__)
Flask->>+App: Initialize base features (paths, folders)
App-->>-Flask: Base initialized
Flask->>+Config: Create config object (self.config)
Config-->>-Flask: Config ready
Flask->>+URLMap: Create URL map (self.url_map)
URLMap-->>-Flask: Map ready
Flask-->>-UserCode: Return Flask instance (app)
```
The `app` object returned is now the fully initialized "control tower," ready to register routes and handle requests.
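If you're curious, you can poke at a few of the attributes mentioned above right after creating the app (a quick sketch; the exact output depends on your setup):

```python
# inspect_app.py
from flask import Flask

app = Flask(__name__)

print(app.root_path)        # The directory Flask resolved from __name__
print(app.url_map)          # Only the built-in /static/<path:filename> rule so far
print(app.view_functions)   # Just the 'static' endpoint until you register routes
print(app.config["DEBUG"])  # False by default
```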
## Conclusion
You've learned about the most fundamental concept in Flask: the **Application Object**, created by instantiating the `Flask` class (usually as `app = Flask(__name__)`). This object acts as the central registry and controller for your entire web application. It's where you define URL routes, manage configuration, and connect various components.
We saw how to create a minimal application, add a simple route using `@app.route()`, and run the development server using `app.run()`.
Now that you have your central `app` object, the next logical step is to understand how Flask directs incoming web requests to the correct Python functions. That's the job of the routing system.
Ready to direct some traffic? Let's move on to [Routing System](02_routing_system.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,206 @@
# Chapter 2: Routing System
Welcome back! In [Chapter 1: Application Object (`Flask`)](01_application_object___flask__.md), we learned how to create the central `app` object, the control tower for our Flask application. We even added a simple "Hello, World!" page using `@app.route('/')`.
But how did Flask know that visiting the homepage (`/`) should run our `index()` function? And how can we create more pages, like an "About Us" page at `/about`? That's where the **Routing System** comes in.
## What Problem Does It Solve? The Need for Directions
Imagine you have a website with multiple pages: a homepage, an about page, a contact page, maybe even pages for individual user profiles. When a user types a URL like `http://yourwebsite.com/about` into their browser, how does your Flask application know *which* piece of Python code should handle this request and generate the "About Us" content?
You need a system to map these incoming URLs to the specific Python functions that generate the response for each page. Think of it like a city map's index:
* **URL:** The street address you want to find (e.g., `/about`).
* **Routing System:** The index in the map book.
* **View Function:** The specific page number in the map book that shows the details for that address.
Flask's routing system, largely powered by a library called Werkzeug, acts as this index. It lets you define URL patterns (like `/` or `/about` or `/user/<username>`) and connect them to your Python functions (called **view functions**).
## Defining Routes with `@app.route()`
In Flask, the most common way to define these URL-to-function mappings is using the `@app.route()` decorator, which we briefly saw in Chapter 1.
Let's revisit our `hello.py` and add an "About" page.
1. We keep the route for the homepage (`/`).
2. We add a *new* route for `/about`.
```python
# hello.py
from flask import Flask
# Create the application object from Chapter 1
app = Flask(__name__)
# Route for the homepage
@app.route('/')
def index():
return 'Welcome to the Homepage!'
# NEW: Route for the about page
@app.route('/about')
def about():
return 'This is the About Us page.'
# Code to run the app (from Chapter 1)
if __name__ == '__main__':
app.run(debug=True)
```
**Explanation:**
* `@app.route('/')`: This tells Flask: "If a request comes in for the URL path `/`, execute the function directly below (`index`)."
* `@app.route('/about')`: This tells Flask: "If a request comes in for the URL path `/about`, execute the function directly below (`about`)."
* `def index(): ...` and `def about(): ...`: These are our **view functions**. They contain the Python code that runs for their respective routes and must return the response to send back to the browser.
**Running this:**
1. Save the code as `hello.py`.
2. Run `python hello.py` in your terminal.
3. Visit `http://127.0.0.1:5000/` in your browser. You should see "Welcome to the Homepage!".
4. Visit `http://127.0.0.1:5000/about`. You should see "This is the About Us page.".
See? The routing system directed each URL to the correct view function!
## Dynamic Routes: Using Variables in URLs
What if you want pages that change based on the URL? For example, a profile page for different users like `/user/alice` and `/user/bob`. You don't want to write a new view function for every single user!
Flask allows you to define *variable parts* in your URL rules using angle brackets `< >`.
Let's create a dynamic route to greet users:
```python
# hello.py (continued)
# ... (keep Flask import, app creation, index, and about routes) ...
# NEW: Dynamic route for user profiles
@app.route('/user/<username>')
def show_user_profile(username):
# The 'username' variable from the URL is passed to the function!
return f'Hello, {username}!'
# ... (keep the if __name__ == '__main__': block) ...
```
**Explanation:**
* `@app.route('/user/<username>')`:
* The `/user/` part is fixed.
* `<username>` is a **variable placeholder**. Flask will match any text here (like `alice`, `bob`, `123`) and capture it.
* `def show_user_profile(username):`:
* Notice the function now accepts an argument named `username`. This **must match** the variable name used in the angle brackets in the route.
* Flask automatically passes the value captured from the URL to this argument.
* `return f'Hello, {username}!'`: We use an f-string to include the captured username in the response.
**Running this:**
1. Save the updated `hello.py` (make sure `debug=True` is still set so the server restarts).
2. Visit `http://127.0.0.1:5000/user/Alice`. You should see "Hello, Alice!".
3. Visit `http://127.0.0.1:5000/user/Bob`. You should see "Hello, Bob!".
Flask's routing system matched both URLs to the same rule (`/user/<username>`) and passed the different usernames (`'Alice'`, `'Bob'`) to the `show_user_profile` function.
## Specifying Data Types: Converters
By default, variables captured from the URL are treated as strings. But what if you need a number? For example, displaying blog post number 5 at `/post/5`. You might want Flask to ensure that only numbers are accepted for that part of the URL.
You can specify a **converter** inside the angle brackets using `<converter:variable_name>`.
Let's add a route for blog posts using the `int` converter:
```python
# hello.py (continued)
# ... (keep previous code) ...
# NEW: Route for displaying a specific blog post by ID
@app.route('/post/<int:post_id>')
def show_post(post_id):
# Flask ensures post_id is an integer and passes it here
# Note: We are just showing the ID, not actually fetching a post
return f'Showing Post Number: {post_id} (Type: {type(post_id).__name__})'
# ... (keep the if __name__ == '__main__': block) ...
```
**Explanation:**
* `@app.route('/post/<int:post_id>')`:
* `<int:post_id>` tells Flask: "Match this part of the URL, but only if it looks like an integer. Convert it to an integer and pass it as the `post_id` variable."
* `def show_post(post_id):`: The `post_id` argument will now receive an actual Python `int`.
**Running this:**
1. Save the updated `hello.py`.
2. Visit `http://127.0.0.1:5000/post/123`. You should see "Showing Post Number: 123 (Type: int)".
3. Visit `http://127.0.0.1:5000/post/abc`. You'll get a "Not Found" error! Why? Because `abc` doesn't match the `int` converter, so Flask doesn't consider this URL to match the rule.
Common converters include:
* `string`: (Default) Accepts any text without a slash.
* `int`: Accepts positive integers.
* `float`: Accepts positive floating-point values.
* `path`: Like `string` but also accepts slashes (useful for matching file paths).
* `uuid`: Accepts UUID strings.
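For instance, extra routes using the `float` and `path` converters (illustrative additions, not part of the chapter's running example) could look like this:

```python
# hello.py (illustrative additions)

# 'float' only matches values like 9.99 and passes them in as Python floats
@app.route('/price/<float:amount>')
def show_price(amount):
    return f'Price: {amount:.2f} (Type: {type(amount).__name__})'

# 'path' also accepts slashes, so /files/docs/readme.txt captures 'docs/readme.txt'
@app.route('/files/<path:subpath>')
def show_file(subpath):
    return f'Requested file path: {subpath}'
```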
## Under the Hood: How Does Routing Work?
You don't *need* to know the deep internals, but understanding the basics helps.
When you define routes using `@app.route()`, Flask doesn't immediately check URLs. Instead, it builds a map, like pre-compiling that map index we talked about.
1. **Building the Map:**
* When you create your `app = Flask(__name__)` ([Chapter 1](01_application_object___flask__.md)), Flask initializes an empty `URLMap` object (from the Werkzeug library, stored in `app.url_map`). See `Flask.__init__` in `app.py` which calls `super().__init__` in `sansio/app.py`, which creates the `self.url_map`.
* Each time you use `@app.route('/some/rule', ...)` or directly call `app.add_url_rule(...)` (see `sansio/scaffold.py`), Flask creates a `Rule` object (like `Rule('/user/<username>')`) describing the pattern, the allowed HTTP methods (GET, POST, etc.), the endpoint name (usually the function name), and any converters.
* This `Rule` object is added to the `app.url_map`.
2. **Matching a Request:**
* When a request like `GET /user/Alice` arrives, Flask's `wsgi_app` method (in `app.py`) gets called.
* It uses the `app.url_map` and the incoming request environment (URL path, HTTP method) to find a matching `Rule`. Werkzeug's `MapAdapter.match()` method (created via `app.create_url_adapter` which calls `url_map.bind_to_environ`) does the heavy lifting here.
* If a match is found for `/user/<username>`, `match()` returns the endpoint name (e.g., `'show_user_profile'`) and a dictionary of the extracted variables (e.g., `{'username': 'Alice'}`). These get stored on the `request` object ([Chapter 3](03_request_and_response_objects.md)) as `request.url_rule` and `request.view_args`.
* If no rule matches, a "Not Found" (404) error is raised.
3. **Dispatching to the View Function:**
* Flask's `app.dispatch_request()` method (in `app.py`) takes the endpoint name from `request.url_rule.endpoint`.
* It looks up the actual Python view function associated with that endpoint name in the `app.view_functions` dictionary (which `@app.route` also populated).
* It calls the view function, passing the extracted variables from `request.view_args` as keyword arguments (e.g., `show_user_profile(username='Alice')`).
* The return value of the view function becomes the response.
Here's a simplified diagram of the matching process:
```mermaid
sequenceDiagram
participant Browser
participant FlaskApp as app.wsgi_app
participant URLMap as url_map.bind(...).match()
participant ViewFunc as show_user_profile()
Browser->>+FlaskApp: GET /user/Alice
FlaskApp->>+URLMap: Match path '/user/Alice' and method 'GET'?
URLMap-->>-FlaskApp: Match found! Endpoint='show_user_profile', Args={'username': 'Alice'}
FlaskApp->>+ViewFunc: Call show_user_profile(username='Alice')
ViewFunc-->>-FlaskApp: Return 'Hello, Alice!'
FlaskApp-->>-Browser: Send response 'Hello, Alice!'
```
The key takeaway is that `@app.route` builds a map upfront, and Werkzeug efficiently searches this map for each incoming request to find the right function and extract any variable parts.
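In fact, `@app.route` is just a thin shortcut around `app.add_url_rule`. A decorator-free version (a standalone sketch, shown with its own app so it can run on its own) looks roughly like this:

```python
from flask import Flask

app = Flask(__name__)

def about():
    return 'This is the About Us page.'

# Equivalent to decorating 'about' with @app.route('/about')
app.add_url_rule('/about', endpoint='about', view_func=about)

print(app.url_map)  # Shows the /about rule plus the built-in /static rule
```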
## Conclusion
You've learned how Flask's **Routing System** acts as a map between URLs and the Python functions (view functions) that handle them.
* We use the `@app.route()` decorator to define URL rules.
* We can create static routes (like `/about`) and dynamic routes using variables (`/user/<username>`).
* Converters (`<int:post_id>`) allow us to specify the expected data type for URL variables, providing automatic validation and conversion.
* Under the hood, Flask and Werkzeug build a `URLMap` from these rules and use it to efficiently dispatch incoming requests to the correct view function.
Now that we know how to direct requests to the right functions, what information comes *with* a request (like form data or query parameters)? And how do we properly format the data we send *back*? That's where the Request and Response objects come in.
Let's dive into [Chapter 3: Request and Response Objects](03_request_and_response_objects.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,257 @@
# Chapter 3: Request and Response Objects
Welcome back! In [Chapter 2: Routing System](02_routing_system.md), we learned how Flask uses routes (`@app.route(...)`) to direct incoming web requests to the correct Python view functions. We saw how to create static routes like `/about` and dynamic routes like `/user/<username>`.
But what exactly *is* a "web request"? And how do we send back something more sophisticated than just a plain string like `'Hello, World!'`? That's where **Request** and **Response** objects come into play.
## What Problem Do They Solve? The Need for Envelopes
Think about sending and receiving mail. When you receive a letter, it's not just the message inside that matters. The envelope has important information: the sender's address, the recipient's address, maybe a stamp indicating priority. When you send a letter back, you also need an envelope to put your message in, address it correctly, and maybe specify if it's regular mail or express.
In the world of web applications (specifically HTTP, the language browsers and servers speak):
* The **Request** object is like the *incoming mail*. It contains everything the client (usually a web browser) sent to your server: the URL they requested, any data they submitted (like in a search box or login form), special instructions (HTTP headers), the method they used (like GET for fetching data or POST for submitting data), and more.
* The **Response** object is like the *outgoing mail* you send back. It contains the content you want to show the user (like an HTML page), the status of the request (like "OK" or "Not Found"), and any special instructions for the browser (HTTP headers, like instructions on how to cache the page).
Flask provides easy-to-use objects to represent these two sides of the communication.
## The Request Object: Unpacking the Incoming Mail
Inside your view functions, Flask makes a special object called `request` available. You need to import it from the `flask` library first. This object holds all the information about the incoming request that triggered your view function.
```python
# hello.py (continued)
from flask import Flask, request # Import request
app = Flask(__name__)
@app.route('/')
def index():
# Access the HTTP method (GET, POST, etc.)
method = request.method
# Access the browser's user agent string (an HTTP header)
user_agent = request.headers.get('User-Agent')
return f'Hello! You used the {method} method. Your browser is: {user_agent}'
# ... (rest of the app, including if __name__ == '__main__': ...)
```
**Explanation:**
* `from flask import request`: We import the `request` object.
* `request.method`: This attribute tells you *how* the user made the request (e.g., 'GET', 'POST'). Visiting a page normally uses GET.
* `request.headers`: This is a dictionary-like object containing HTTP headers sent by the browser. We use `.get('User-Agent')` to safely get the browser identification string.
**Running this:**
1. Save and run `hello.py`.
2. Visit `http://127.0.0.1:5000/` in your browser.
3. You'll see something like: "Hello! You used the GET method. Your browser is: Mozilla/5.0 (..." (your specific browser details will vary).
### Getting Data from the URL (Query Parameters)
Often, data is included directly in the URL after a `?`, like `http://127.0.0.1:5000/search?query=flask`. These are called query parameters. The `request` object provides the `args` attribute to access them.
```python
# hello.py (continued)
from flask import Flask, request
app = Flask(__name__)
@app.route('/search')
def search():
# Get the value of the 'query' parameter from the URL
# request.args.get() is safer than request.args[] as it returns None if the key doesn't exist
search_term = request.args.get('query')
if search_term:
return f'You searched for: {search_term}'
else:
return 'Please provide a search term using ?query=...'
# ... (rest of the app)
```
**Running this:**
1. Save and run `hello.py`.
2. Visit `http://127.0.0.1:5000/search?query=python+web+framework`.
3. You should see: "You searched for: python web framework".
4. Visit `http://127.0.0.1:5000/search`.
5. You should see: "Please provide a search term using ?query=..."
### Getting Data from Forms (POST Requests)
When a user submits an HTML form, the browser usually sends the data using the POST method. This data isn't in the URL; it's in the body of the request. The `request` object provides the `form` attribute to access this data.
Let's create a simple login page (we won't actually log anyone in yet).
First, a route to *show* the form (using GET):
```python
# hello.py (continued)
from flask import Flask, request, make_response # Import make_response
app = Flask(__name__)
@app.route('/login', methods=['GET']) # Only allow GET for this view
def show_login_form():
# Just return the raw HTML for the form
return '''
<form method="POST">
Username: <input type="text" name="username"><br>
Password: <input type="password" name="password"><br>
<input type="submit" value="Log In">
</form>
'''
# ... (add the next route below)
```
Now, a route to *handle* the form submission (using POST):
```python
# hello.py (continued)
@app.route('/login', methods=['POST']) # Only allow POST for this view
def process_login():
# Access form data using request.form
username = request.form.get('username')
password = request.form.get('password') # In a real app, NEVER just display a password!
if username and password:
return f'Attempting login for username: {username}'
else:
return 'Missing username or password', 400 # Return an error status code
# ... (rest of the app, including if __name__ == '__main__': ...)
```
**Explanation:**
* `@app.route('/login', methods=['GET'])`: We specify that `show_login_form` only handles GET requests.
* `@app.route('/login', methods=['POST'])`: We specify that `process_login` only handles POST requests. This allows the same URL (`/login`) to do different things based on the HTTP method.
* `<form method="POST">`: The HTML form is set to use the POST method when submitted.
* `request.form.get('username')`: Inside `process_login`, we access the submitted form data using the `name` attributes of the input fields (`name="username"`).
* `return 'Missing...', 400`: Here we return not just a string, but also a number. Flask understands this as `(body, status_code)`. `400` means "Bad Request".
**Running this:**
1. Save and run `hello.py`.
2. Visit `http://127.0.0.1:5000/login`. You'll see the simple login form.
3. Enter a username and password and click "Log In".
4. The browser will send a POST request to `/login`. The `process_login` function will handle it, and you'll see: "Attempting login for username: [your username]".
The `request` object is your window into the data sent by the client. You'll use `request.args` for URL parameters (GET) and `request.form` for form data (POST) most often.
## The Response Object: Crafting the Outgoing Mail
We've seen that Flask takes the return value of your view function and turns it into the HTTP response sent back to the browser.
* Returning a string: Flask creates a Response with that string as the body, a `200 OK` status code, and a `text/html` content type.
* Returning a tuple `(body, status)`: Flask uses the `body` (string) and the specified `status` code (integer).
* Returning a tuple `(body, status, headers)`: Flask uses the body, status, and adds the specified `headers` (a dictionary or list of tuples).
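For example, the three-part tuple form lets you set a header without building a Response object yourself (a small illustrative route, not part of the chapter's running example):

```python
# hello.py (illustrative addition)
@app.route('/plain')
def plain_text():
    # (body, status, headers): Flask assembles the Response for you
    return 'Just plain text', 200, {'Content-Type': 'text/plain'}
```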
For more control, you can explicitly create a Response object using the `make_response` helper function.
```python
# hello.py (continued)
from flask import Flask, make_response # Import make_response
app = Flask(__name__)
@app.route('/custom')
def custom_response():
# Create a response object from a string
response = make_response("This response has custom headers!")
# Set a custom header
response.headers['X-My-Custom-Header'] = 'Flask is Fun!'
# Set a cookie (we'll learn more about sessions/cookies later)
response.set_cookie('mycookie', 'some_value')
# Set a specific status code (optional, defaults to 200)
response.status_code = 201 # 201 means "Created"
return response # Return the fully configured response object
# ... (rest of the app)
```
**Explanation:**
* `from flask import make_response`: We import the helper function.
* `response = make_response(...)`: Creates a Response object. You can pass the body content here.
* `response.headers['...'] = '...'`: Allows setting custom HTTP headers. Browsers might use these for caching, security, or other purposes. Your own JavaScript code could also read them.
* `response.set_cookie(...)`: A convenient way to set a cookie to be stored by the browser.
* `response.status_code = 201`: Sets the HTTP status code. While `200` means "OK", other codes have specific meanings (`404` Not Found, `403` Forbidden, `500` Server Error, `201` Created, `302` Redirect, etc.).
* `return response`: We return the response object we manually configured.
Using `make_response` gives you fine-grained control over exactly what gets sent back to the client.
## Under the Hood: Werkzeug and the Request/Response Cycle
Flask doesn't reinvent the wheel for handling low-level HTTP details. It uses another excellent Python library called **Werkzeug** (pronounced "verk-zoyg", German for "tool"). Flask's `Request` and `Response` objects are actually subclasses of Werkzeug's base `Request` and `Response` classes, adding some Flask-specific conveniences.
Here's a simplified view of what happens when a request comes in:
1. **Incoming Request:** Your web server (like the Flask development server, or a production server like Gunicorn/uWSGI) receives the raw HTTP request from the browser.
2. **WSGI Environment:** The server translates this raw request into a standard Python dictionary called the WSGI `environ`. This dictionary contains all the request details (path, method, headers, input stream, etc.).
3. **Flask App Called:** The server calls your Flask application object (`app`) as a WSGI application, passing it the `environ`. (See `app.wsgi_app` in `app.py`).
4. **Request Context:** Flask creates a **Request Context**. This involves:
* Creating a `Request` object (usually `flask.wrappers.Request`) by feeding it the `environ`. Werkzeug does the heavy lifting of parsing the environment. (See `app.request_context` in `app.py` which uses `app.request_class`).
* Making this `request` object (and other context-specific things like `session`) easily accessible. (We'll cover contexts in detail in [Chapter 5](05_context_globals___current_app____request____session____g__.md) and [Chapter 7](07_application_and_request_contexts.md)).
5. **Routing:** Flask's routing system ([Chapter 2](02_routing_system.md)) uses `request.path` and `request.method` to find the correct view function via the `app.url_map`.
6. **View Function Call:** Flask calls your view function, possibly passing arguments extracted from the URL (like `username` in `/user/<username>`).
7. **Accessing Request Data:** Inside your view function, you access data using the `request` object (e.g., `request.args`, `request.form`).
8. **View Return Value:** Your view function returns a value (string, tuple, Response object).
9. **Response Creation:** Flask calls `app.make_response()` (see `app.py`) on the return value. This either uses the Response object you returned directly, or constructs a new one (`flask.wrappers.Response` or `app.response_class`) based on the string/tuple you returned. Werkzeug's `Response` handles formatting the body, status, and headers correctly.
10. **Response Sent:** Flask returns the Response object's details (status, headers, body) back to the WSGI server.
11. **Outgoing Response:** The server transmits the HTTP response back to the browser.
12. **Context Teardown:** The Request Context is cleaned up.
```mermaid
sequenceDiagram
participant Browser
participant WSGIServer as WSGI Server
participant FlaskApp as Flask App (wsgi_app)
participant RequestCtx as Request Context
participant ReqObj as Request Object
participant Routing
participant ViewFunc as Your View Function
participant RespObj as Response Object
Browser->>+WSGIServer: Sends HTTP Request (e.g., GET /search?query=flask)
WSGIServer->>+FlaskApp: Calls app(environ, start_response)
FlaskApp->>+RequestCtx: Creates Request Context(environ)
RequestCtx->>+ReqObj: Creates Request(environ)
RequestCtx-->>-FlaskApp: Request Context ready (request is now available)
FlaskApp->>+Routing: Matches request.path, request.method
Routing-->>-FlaskApp: Finds view_func=search, args={}
FlaskApp->>+ViewFunc: Calls search()
ViewFunc->>ReqObj: Accesses request.args.get('query')
ViewFunc-->>-FlaskApp: Returns "You searched for: flask" (string)
FlaskApp->>+RespObj: Calls make_response("...")
RespObj-->>-FlaskApp: Response object created (status=200, body="...", headers={...})
FlaskApp-->>-WSGIServer: Returns Response (via start_response, iterable body)
WSGIServer-->>-Browser: Sends HTTP Response
Note right of FlaskApp: Request Context is torn down
```
The key takeaway is that Flask uses Werkzeug to wrap the raw incoming request data into a convenient `Request` object and helps you format your return value into a proper `Response` object to send back.
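You can watch this whole cycle without opening a browser by using Flask's built-in test client, which drives the WSGI application directly (a quick sketch assuming the `/search` route from earlier in this chapter):

```python
# hello.py (add at the bottom, or run in an interactive session)
with app.test_client() as client:
    resp = client.get('/search?query=flask')
    print(resp.status_code)             # 200
    print(resp.get_data(as_text=True))  # You searched for: flask
```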
## Conclusion
In this chapter, we explored the fundamental Request and Response objects in Flask.
* The **`request` object** (imported from `flask`) gives you access to incoming data within your view functions, like URL parameters (`request.args`), form data (`request.form`), HTTP methods (`request.method`), and headers (`request.headers`). It's like opening the incoming mail.
* Flask automatically converts the return value of your view functions into a **Response object**. You can return strings, tuples `(body, status)` or `(body, status, headers)`, or use `make_response` to create and customize a `Response` object directly (setting status codes, headers, cookies). This is like preparing your outgoing mail.
* These objects are built upon Werkzeug's robust foundation.
Now you know how to receive data from the user and how to send back customized responses. But writing HTML directly inside Python strings (like in our form example) gets messy very quickly. How can we separate our presentation logic (HTML) from our application logic (Python)? That's where templating comes in!
Let's move on to [Chapter 4: Templating (Jinja2 Integration)](04_templating__jinja2_integration_.md) to see how Flask makes generating HTML much easier.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,304 @@
# Chapter 4: Templating (Jinja2 Integration)
Welcome back! In [Chapter 3: Request and Response Objects](03_request_and_response_objects.md), we saw how to handle incoming requests and craft outgoing responses. We even created a simple HTML form, but we had to write the HTML code directly as a string inside our Python function. Imagine building a whole website like that; it would get very messy very quickly!
How can we separate the design and structure of our web pages (HTML) from the Python code that generates the dynamic content? This chapter introduces **Templating**.
## What Problem Does It Solve? Mixing Code and Design is Messy
Think about writing a personalized email newsletter. You have a standard letter format (the design), but you need to insert specific details for each recipient (the dynamic data), like their name. You wouldn't want to write the entire letter from scratch in your code for every single person!
Similarly, when building a web page, you have the HTML structure (the design), but parts of it need to change based on data from your application (like showing the currently logged-in user's name, a list of products, or search results). Putting complex HTML directly into your Python view functions makes the code hard to read, hard to maintain, and difficult for web designers (who might not know Python) to work on.
We need a way to create HTML "templates" with special placeholders for the dynamic parts, and then have our Python code fill in those placeholders with actual data.
Flask uses a powerful template engine called **Jinja2** to solve this problem. Jinja2 lets you create HTML files (or other text files) that include variables and simple logic (like loops and conditions) directly within the template itself. Flask provides a convenient function, `render_template`, to take one of these template files, fill in the data, and give you back the final HTML ready to send to the user's browser.
It's exactly like **mail merge**:
* **Template File (`.html`):** Your standard letter format.
* **Placeholders (`{{ variable }}`):** The spots where you'd put `<<Name>>` or `<<Address>>`.
* **Context Variables (Python dictionary):** The actual data (e.g., `name="Alice"`, `address="..."`).
* **`render_template` Function:** The mail merge tool itself.
* **Final HTML:** The personalized letter ready to be sent.
## Creating Your First Template
By default, Flask looks for template files in a folder named `templates` right next to your main application file (like `hello.py`).
1. Create a folder named `templates` in the same directory as your `hello.py` file.
2. Inside the `templates` folder, create a file named `hello.html`.
```html
<!-- templates/hello.html -->
<!doctype html>
<html>
<head>
<title>Hello Flask!</title>
</head>
<body>
<h1>Hello, {{ name_in_template }}!</h1>
<p>Welcome to our templated page.</p>
</body>
</html>
```
**Explanation:**
* This is mostly standard HTML.
* `{{ name_in_template }}`: This is a Jinja2 **placeholder** or **expression**. It tells Jinja2: "When this template is rendered, replace this part with the value of the variable named `name_in_template` that the Python code provides."
## Rendering Templates with `render_template`
Now, let's modify our Python code (`hello.py`) to use this template. We need to:
1. Import the `render_template` function from Flask.
2. Call `render_template` in our view function, passing the name of the template file and any variables we want to make available in the template.
```python
# hello.py
# Make sure 'request' is imported if you use it elsewhere,
# otherwise remove it for this example.
from flask import Flask, render_template
app = Flask(__name__)
# Route for the homepage
@app.route('/')
def index():
# The name we want to display in the template
user_name = "World"
# Render the template, passing the user_name as a variable
# The key on the left ('name_in_template') is how we access it in HTML.
# The value on the right (user_name) is the Python variable.
return render_template('hello.html', name_in_template=user_name)
# NEW Route to greet a specific user using the same template
@app.route('/user/<username>')
def greet_user(username):
# Here, 'username' comes from the URL
# We still use 'name_in_template' as the key for the template
return render_template('hello.html', name_in_template=username)
# Code to run the app (from Chapter 1)
if __name__ == '__main__':
app.run(debug=True)
```
**Explanation:**
* `from flask import render_template`: We import the necessary function.
* `render_template('hello.html', ...)`: This tells Flask to find the `hello.html` file (it looks in the `templates` folder).
* `name_in_template=user_name`: This is the crucial part where we pass data *into* the template. This creates a "context" dictionary like `{'name_in_template': 'World'}` (or `{'name_in_template': 'Alice'}` in the second route). Jinja2 uses this context to fill in the placeholders. The keyword argument name (`name_in_template`) **must match** the variable name used inside the `{{ }}` in the HTML file.
**Running this:**
1. Make sure you have the `templates` folder with `hello.html` inside it.
2. Save the updated `hello.py`.
3. Run `python hello.py` in your terminal.
4. Visit `http://127.0.0.1:5000/`. Your browser will receive and display HTML generated from `hello.html`, showing: "Hello, World!".
5. Visit `http://127.0.0.1:5000/user/Alice`. Your browser will receive HTML generated from the *same* `hello.html` template, but this time showing: "Hello, Alice!".
See how we reused the same HTML structure but dynamically changed the content using `render_template` and variables!
## Basic Jinja2 Syntax: Variables, Conditionals, and Loops
Jinja2 offers more than just variable substitution. You can use basic programming constructs right inside your HTML.
There are two main types of delimiters:
* `{{ ... }}`: Used for **expressions**. This is where you put variables you want to display, or even simple calculations or function calls. The result is inserted into the HTML.
* `{% ... %}`: Used for **statements**. This includes things like `if`/`else` blocks, `for` loops, and other control structures. These don't directly output text but control how the template is rendered.
Let's look at some examples.
### Example: Using `if`/`else`
Imagine you want to show different content depending on whether a user is logged in.
**Python (`hello.py`):**
```python
# hello.py (add this route)
@app.route('/profile')
def profile():
# Simulate a logged-in user for demonstration
current_user = {'name': 'Charlie', 'is_logged_in': True}
# Simulate no user logged in
# current_user = None
return render_template('profile.html', user=current_user)
# ... (keep other routes and run code)
```
**Template (`templates/profile.html`):**
```html
<!-- templates/profile.html -->
<!doctype html>
<html>
<head><title>User Profile</title></head>
<body>
{% if user and user.is_logged_in %}
<h1>Welcome back, {{ user.name }}!</h1>
<p>You are logged in.</p>
{% else %}
<h1>Welcome, Guest!</h1>
<p>Please log in.</p>
{% endif %}
</body>
</html>
```
**Explanation:**
* `{% if user and user.is_logged_in %}`: Starts an `if` block. Jinja2 checks if the `user` variable exists and if its `is_logged_in` attribute is true.
* `{% else %}`: If the `if` condition is false, the code under `else` is used.
* `{% endif %}`: Marks the end of the `if` block.
* `{{ user.name }}`: Accesses the `name` attribute of the `user` dictionary passed from Python.
If you run this and visit `/profile`, you'll see the "Welcome back, Charlie!" message. If you change `current_user` to `None` in the Python code and refresh, you'll see the "Welcome, Guest!" message.
### Example: Using `for` Loops
Let's say you want to display a list of items.
**Python (`hello.py`):**
```python
# hello.py (add this route)
@app.route('/items')
def show_items():
item_list = ['Apple', 'Banana', 'Cherry']
return render_template('items.html', items=item_list)
# ... (keep other routes and run code)
```
**Template (`templates/items.html`):**
```html
<!-- templates/items.html -->
<!doctype html>
<html>
<head><title>Item List</title></head>
<body>
<h2>Available Items:</h2>
<ul>
{% for fruit in items %}
<li>{{ fruit }}</li>
{% else %}
<li>No items available.</li>
{% endfor %}
</ul>
</body>
</html>
```
**Explanation:**
* `{% for fruit in items %}`: Starts a `for` loop. It iterates over the `items` list passed from Python. In each iteration, the current item is assigned to the variable `fruit`.
* `<li>{{ fruit }}</li>`: Inside the loop, we display the current `fruit`.
* `{% else %}`: This optional block is executed if the `items` list was empty.
* `{% endfor %}`: Marks the end of the `for` loop.
Visiting `/items` will show a bulleted list of the fruits.
## Generating URLs within Templates using `url_for`
Just as `url_for` can be used in Python code to avoid hardcoding URLs (see the routes and endpoints defined in [Chapter 2: Routing System](02_routing_system.md)), we often need to generate URLs within our HTML templates (e.g., for links or form actions). Flask automatically makes the `url_for` function available inside your Jinja2 templates.
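If you want to see what these calls produce from the Python side, you can try `url_for` inside a test request context (a quick sketch assuming the `index` and `greet_user` routes defined earlier in this chapter):

```python
from flask import url_for

with app.test_request_context():
    print(url_for('index'))                       # /
    print(url_for('greet_user', username='Bob'))  # /user/Bob
```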
**Template (`templates/navigation.html`):**
```html
<!-- templates/navigation.html -->
<nav>
<ul>
<li><a href="{{ url_for('index') }}">Home</a></li>
<li><a href="{{ url_for('show_items') }}">Items</a></li>
<li><a href="{{ url_for('greet_user', username='Admin') }}">Admin Profile</a></li>
<!-- Example link that might require login -->
{% if user and user.is_logged_in %}
<li><a href="{{ url_for('profile') }}">My Profile</a></li>
{% else %}
<li><a href="#">Login</a></li> {# Replace # with login URL later #}
{% endif %}
</ul>
</nav>
```
**Explanation:**
* `{{ url_for('index') }}`: Generates the URL for the view function associated with the endpoint `'index'` (which is likely `/`).
* `{{ url_for('show_items') }}`: Generates the URL for the `show_items` endpoint (likely `/items`).
* `{{ url_for('greet_user', username='Admin') }}`: Generates the URL for the `greet_user` endpoint, filling in the `username` variable (likely `/user/Admin`).
Using `url_for` in templates ensures that your links will always point to the correct place, even if you change the URL rules in your Python code later.
## Under the Hood: How `render_template` Works
When you call `render_template('some_template.html', var=value)`, here's a simplified sequence of what happens inside Flask and Jinja2:
1. **Get Jinja Environment:** Flask accesses its configured Jinja2 environment (`current_app.jinja_env`). This environment holds the settings, filters, globals, and crucially, the **template loader**. (See `templating.py:render_template` which accesses `current_app.jinja_env`).
2. **Find Template:** The environment asks its loader (`app.jinja_env.loader`, which is typically a `DispatchingJinjaLoader` as created in `app.py:create_jinja_environment` and `templating.py:Environment`) to find the template file (`'some_template.html'`).
3. **Loader Search:** The `DispatchingJinjaLoader` knows where to look:
* It first checks the application's `template_folder` (usually `./templates`).
* If not found, it checks the `template_folder` of any registered Blueprints (more on those in [Chapter 8: Blueprints](08_blueprints.md)). (See `templating.py:DispatchingJinjaLoader._iter_loaders`).
4. **Load and Parse:** Once the loader finds the file, Jinja2 reads its content, parses it, and compiles it into an internal representation (a `Template` object) for efficient rendering. This might be cached. (Handled by `jinja_env.get_or_select_template`).
5. **Update Context:** Flask calls `app.update_template_context(context)` to add standard variables like `request`, `session`, `g`, and `config` to the dictionary of variables you passed (`{'var': value}`). This is done using "context processors" (more in [Chapter 5](05_context_globals___current_app____request____session____g__.md)). (See `templating.py:_render`).
6. **Signal:** Flask sends the `before_render_template` signal.
7. **Render:** The `Template` object's `render()` method is called with the combined context dictionary. Jinja2 processes the template, executing statements (`{% %}`) and substituting expressions (`{{ }}`) with values from the context.
8. **Return HTML:** The `render()` method returns the final, fully rendered HTML string.
9. **Signal:** Flask sends the `template_rendered` signal.
10. **Send Response:** Flask takes this HTML string and builds an HTTP Response object to send back to the browser ([Chapter 3](03_request_and_response_objects.md)).
```mermaid
sequenceDiagram
participant ViewFunc as Your View Function
participant RenderFunc as flask.render_template()
participant JinjaEnv as app.jinja_env
participant Loader as DispatchingJinjaLoader
participant TemplateObj as Template Object
participant Response as Flask Response
ViewFunc->>+RenderFunc: render_template('hello.html', name_in_template='Alice')
RenderFunc->>+JinjaEnv: get_or_select_template('hello.html')
JinjaEnv->>+Loader: Find 'hello.html'
Loader-->>-JinjaEnv: Found template file content
JinjaEnv-->>-RenderFunc: Return compiled TemplateObj
Note over RenderFunc, Response: Update context (add request, g, etc.)
RenderFunc->>+TemplateObj: render({'name_in_template': 'Alice', 'request': ..., ...})
TemplateObj-->>-RenderFunc: Return "<html>...Hello, Alice!...</html>"
RenderFunc-->>-ViewFunc: Return HTML string
ViewFunc->>+Response: Create Response from HTML string
Response-->>-ViewFunc: Response object
ViewFunc-->>Browser: Return Response
```
The key players are the `Flask` application instance (which holds the Jinja2 environment configuration), the `render_template` function, and the Jinja2 `Environment` itself, which uses loaders to find templates and context processors to enrich the data available during rendering.
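The same machinery also powers `render_template_string`, which renders an inline template string; it's handy for quick experiments (a small sketch assuming the `app` object from this chapter, run inside an application context):

```python
from flask import render_template_string

with app.app_context():
    html = render_template_string('Hello, {{ name_in_template }}!',
                                  name_in_template='Alice')
    print(html)  # Hello, Alice!
```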
## Conclusion
Templating is a fundamental technique for building dynamic web pages. Flask integrates seamlessly with the powerful Jinja2 template engine.
* We learned that templating separates HTML structure from Python logic.
* Flask looks for templates in a `templates` folder by default.
* The `render_template()` function is used to load a template file and pass data (context variables) to it.
* Jinja2 templates use `{{ variable }}` to display data and `{% statement %}` for control flow (like `if` and `for`).
* The `url_for()` function is available in templates for generating URLs dynamically.
Now you can create clean, maintainable HTML pages driven by your Flask application's data and logic.
But how do functions like `url_for`, and variables like `request` and `session`, magically become available inside templates without us explicitly passing them every time? This happens through Flask's context system and context processors. Let's explore these "magic" variables in the next chapter.
Ready to uncover the context? Let's move on to [Chapter 5: Context Globals (`current_app`, `request`, `session`, `g`)](05_context_globals___current_app____request____session____g__.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 5: Context Globals (`current_app`, `request`, `session`, `g`)
Welcome back! In [Chapter 4: Templating (Jinja2 Integration)](04_templating__jinja2_integration_.md), we learned how to separate our HTML structure from our Python code using templates and the `render_template` function. We saw how variables like `request` and functions like `url_for` seemed to be magically available in our templates.
But how does that work? And more importantly, how can we easily access important information like the current application instance or the details of the incoming web request *inside* our Python view functions without passing these objects around manually to every single function? Imagine having to add `app` and `request` as arguments to all your helper functions; it would be very repetitive!
This chapter introduces Flask's solution: **Context Globals**.
## What Problem Do They Solve? Avoiding Tedious Parameter Passing
Think about working on a team project. There are certain tools or pieces of information everyone on the team needs access to frequently: the project plan, the shared calendar, the main contact person. It would be inefficient if every time someone needed the project plan, they had to specifically ask someone else to pass it to them. Instead, you might have a central place or a well-known name (like "The Plan") that everyone knows how to find.
Similarly, in a Flask application, several objects are very commonly needed while handling a web request:
* The application instance itself (to access configuration, loggers, etc.).
* The incoming request object (to get form data, query parameters, headers, etc.).
* A way to store temporary information related to the current user across multiple requests (the session).
* A temporary storage space just for the *current* request.
Passing these objects explicitly as parameters to every function that might need them (especially view functions, `before_request` functions, `after_request` functions, template context processors) would make our code cluttered and harder to manage.
Flask provides special "global" variables **`current_app`**, **`request`**, **`session`**, and **`g`** that act like smart pointers. They automatically find and give you access to the *correct* object relevant to the specific request you are currently handling, without you needing to pass anything around. They feel like magic variables!
## Meet the Context Globals
These special variables are technically called **proxies**. Think of a proxy as a stand-in or an agent. When you talk to the `request` proxy, it secretly finds the *actual* request object for the HTTP request that is currently being processed and acts on its behalf. This magic happens using Flask's "context" system, which we'll touch on later and explore more in [Chapter 7](07_application_and_request_contexts.md).
Let's meet the main context globals:
1. **`request`**: Represents the incoming HTTP request from the client (browser). It contains all the data the client sent, like form data, URL parameters, HTTP headers, the requested URL, etc. We already used this in [Chapter 3: Request and Response Objects](03_request_and_response_objects.md).
2. **`session`**: A dictionary-like object that lets you store information specific to a user *across multiple requests*. It's commonly used for things like remembering if a user is logged in, or storing items in a shopping cart. Flask typically uses secure cookies to handle this.
3. **`current_app`**: Represents the *instance* of your Flask application that is handling the current request. This is useful for accessing application-wide configurations, resources, or extensions. It points to the same object you created with `app = Flask(__name__)` in [Chapter 1](01_application_object___flask__.md), but you can access it from anywhere *during* a request without needing the `app` variable directly.
4. **`g`**: A simple namespace object (think of it like an empty box or scratchpad) that is available only for the duration of the *current request*. You can use it to store temporary data that multiple functions within the same request cycle might need access to, without passing it around. For example, you might store the current logged-in user object or a database connection here. It gets reset for every new request. The 'g' stands for "global", but it's global *only within the request context*.
## Using the Context Globals
First, you usually need to import them from the `flask` package:
```python
from flask import Flask, request, session, current_app, g, render_template
import os # For generating a secret key
# Create the application object
app = Flask(__name__)
# !! IMPORTANT !! Sessions require a secret key for security.
# In a real app, set this from an environment variable or config file!
# Never hardcode it like this in production.
app.config['SECRET_KEY'] = os.urandom(24)
# We'll learn more about config in Chapter 6: Configuration (Config)
```
Now let's see how to use them.
### `request`: Accessing Incoming Data
We saw this in Chapter 3. Notice how the `index` function can use `request` directly without it being passed as an argument.
```python
# hello.py (continued)
@app.route('/')
def index():
user_agent = request.headers.get('User-Agent', 'Unknown')
method = request.method
return f'Welcome! Method: {method}, Browser: {user_agent}'
```
**Explanation:**
* `request.headers.get(...)`: Accesses the HTTP headers from the incoming request.
* `request.method`: Gets the HTTP method used (e.g., 'GET', 'POST').
Flask automatically makes the correct `request` object available here when the `/` route is visited.
### `current_app`: Accessing Application Settings
Imagine you want to log something using the application's logger or access a configuration value.
```python
# hello.py (continued)
# Add another config value for demonstration
app.config['MY_SETTING'] = 'Flask is Cool'
@app.route('/app-info')
def app_info():
# Access the application's logger
current_app.logger.info('Someone accessed the app-info page.')
# Access a configuration value
setting = current_app.config.get('MY_SETTING', 'Default Value')
debug_mode = current_app.config['DEBUG'] # Accessing debug status
return f'My Setting: {setting}<br>Debug Mode: {debug_mode}'
# Make sure debug is enabled for the logger example to show easily
# if __name__ == '__main__':
# app.run(debug=True)
```
**Explanation:**
* `current_app.logger.info(...)`: Uses the logger configured on the `app` object.
* `current_app.config.get(...)`: Accesses the application's configuration dictionary.
Again, `app_info` doesn't need `app` passed in; `current_app` provides access to it within the request context.
### `session`: Remembering Things Across Requests
Sessions allow you to store data associated with a specific user's browser session. Flask uses a secret key (`app.secret_key` or `app.config['SECRET_KEY']`) to cryptographically sign the session cookie, preventing users from modifying it. **Always set a strong, random secret key!**
Let's create a simple view counter that increments each time the *same* user visits the page.
```python
# hello.py (continued)
@app.route('/counter')
def counter():
# Get the current count from the session, default to 0 if not found
count = session.get('view_count', 0)
# Increment the count
count += 1
# Store the new count back in the session
session['view_count'] = count
# Log the session content (for demonstration)
current_app.logger.info(f"Session data: {session}")
return f'You have visited this page {count} times during this session.'
```
**Explanation:**
* `session.get('view_count', 0)`: Reads the `view_count` value from the session. If it's the first visit, it doesn't exist yet, so we default to `0`.
* `session['view_count'] = count`: Stores the updated count back into the session.
* Flask handles sending the updated session data back to the browser in a secure cookie behind the scenes.
**Running this:**
1. Make sure `app.config['SECRET_KEY']` is set in your `hello.py`.
2. Run `python hello.py`.
3. Visit `http://127.0.0.1:5000/counter`. You'll see "You have visited this page 1 times...".
4. Refresh the page. You'll see "You have visited this page 2 times...".
5. Refresh again. It will become 3, and so on.
6. If you close your browser completely and reopen it (or use a private/incognito window), the count will reset to 1 because the session cookie is typically cleared or different.
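If you want to reset the counter without closing the browser, a small (hypothetical) `/reset-counter` route can simply remove the value from the session:

```python
# hello.py (continued) - hypothetical route to clear the counter
@app.route('/reset-counter')
def reset_counter():
    # Remove the value if present; Flask updates the session cookie automatically
    session.pop('view_count', None)
    return 'Counter reset. Visit /counter to start counting again.'
```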
### `g`: Temporary Storage for a Single Request
The `g` object is useful for storing data that needs to be accessed by multiple functions *within the same request cycle*. A common example is loading the current user's information from a database or verifying an API key. You might do this in a `@app.before_request` function and then access the result in your view function using `g`.
Let's simulate loading some data before the request and accessing it in the view.
```python
# hello.py (continued)
import time
# This function runs BEFORE every request
@app.before_request
def load_request_data():
# Imagine loading data from a database or external source here
g.request_time = time.time()
g.user = 'Guest' # Default user
# Maybe check for an API key or user session here and set g.user accordingly
# For example: if session.get('logged_in_user'): g.user = session['logged_in_user']
current_app.logger.info(f"Before request: Set g.user to {g.user}")
@app.route('/show-g')
def show_g():
# Access the data stored in 'g' by the before_request handler
req_time = g.get('request_time', 'Not Set')
current_user = g.get('user', 'Unknown')
# Check if it's still there after the request (it shouldn't be for the *next* request)
# We can't easily show this here, but g is cleared between requests.
return f'Data from g:<br>Request Time: {req_time}<br>User: {current_user}'
# This function runs AFTER every request, even if errors occur
# It receives the exception that was raised during the request (or None), not the response
@app.teardown_request
def teardown_request_data(exception=None):
# This is a good place to clean up resources stored in g, like DB connections
req_time = g.pop('request_time', None) # Safely remove request_time
user = g.pop('user', None) # Safely remove user
if req_time:
duration = time.time() - req_time
current_app.logger.info(f"Teardown request: User={user}, Duration={duration:.4f}s")
else:
current_app.logger.info("Teardown request: g values already popped or not set.")
# ... (rest of the app, including if __name__ == '__main__': app.run(debug=True))
```
**Explanation:**
* `@app.before_request`: This decorator registers `load_request_data` to run before each request is processed.
* `g.request_time = ...` and `g.user = ...`: We store arbitrary data on the `g` object. It acts like a Python object where you can set attributes.
* `g.get('request_time', ...)`: In the view function `show_g`, we retrieve the data stored on `g`. Using `.get()` is safer as it allows providing a default if the attribute wasn't set.
* `@app.teardown_request`: This decorator registers `teardown_request_data` to run after the request has been handled and the response sent, even if an exception occurred. It's a good place to clean up resources stored in `g`. `g.pop()` is used to get the value and remove it, preventing potential issues if the teardown runs multiple times in complex scenarios.
When you visit `/show-g`, the `before_request` function runs first, setting `g.user` and `g.request_time`. Then `show_g` runs and reads those values from `g`. Finally, `teardown_request` runs. If you make another request, `g` will be empty again until `before_request` runs for that *new* request.
## Why "Context"? The Magic Behind the Scenes
How do these globals always know which `request` or `app` to point to, especially if your web server is handling multiple requests at the same time?
Flask manages this using **Contexts**. There are two main types:
1. **Application Context:** Holds information about the application itself. When an application context is active, `current_app` and `g` point to the correct application instance and its request-global storage (`g`). An application context is automatically created when a request context is pushed, or you can create one manually using `with app.app_context():`. This is needed for tasks that aren't tied to a specific request but need the application, like running background jobs or initializing database tables via a script.
2. **Request Context:** Holds information about a single, specific HTTP request. When a request context is active, `request` and `session` point to the correct request object and session data for *that specific request*. Flask automatically creates and activates (pushes) a request context when it receives an incoming HTTP request and removes (pops) it when the request is finished.
Think of these contexts like temporary bubbles or environments. When Flask handles a request, it inflates a request context bubble (which automatically includes an application context bubble inside it). Inside this bubble, the names `request`, `session`, `current_app`, and `g` are set up to point to the objects belonging to *that specific bubble*. If another request comes in concurrently (in a different thread or process), Flask creates a *separate* bubble for it, and the context globals inside that second bubble point to *its* own request, session, app, and g objects.
This system ensures that even with multiple simultaneous requests, `request` in the code handling request A always refers to request A's data, while `request` in the code handling request B always refers to request B's data.
We will explore contexts in more detail in [Chapter 7: Application and Request Contexts](07_application_and_request_contexts.md).
## Under the Hood: Proxies and `contextvars`
How do these variables like `request` actually *do* the lookup within the current context?
Flask uses a concept called **Local Proxies**, specifically `werkzeug.local.LocalProxy`. These proxy objects are essentially clever stand-ins. When you access an attribute or method on a proxy (like `request.method`), the proxy doesn't have the method itself. Instead, it performs a lookup to find the *real* object it should be representing *at that moment* based on the current context.
Under the hood, modern Flask (leveraging Werkzeug's context-local machinery) uses Python's built-in `contextvars` module. `contextvars` provides special kinds of variables (`ContextVar`) that can hold different values depending on the current execution context (like the specific request/thread/async task being handled).
1. Flask defines context variables, for example, `_cv_request` in `flask.globals`.
2. When a request context is pushed (`RequestContext.push()` in `ctx.py`), Flask stores the actual `Request` object for the current request into `_cv_request` *for the current context*.
3. The `request` global variable (defined in `flask.globals`) is a `LocalProxy` that is configured to look up the object stored in `_cv_request`.
4. When your code uses `request.method`, the proxy sees it needs the real request object, looks at the current context's value for `_cv_request`, gets the real `Request` object stored there, and then calls the `.method` attribute on *that* object.
A similar process happens for `session` (which is also looked up via `_cv_request`, since it lives on the request context), and for `current_app` and `g`, which use `_cv_app`.
Here's how `request` and `session` are defined in `flask/globals.py`:
```python
# flask/globals.py (simplified)
from contextvars import ContextVar
from werkzeug.local import LocalProxy
# ... other imports
# Context Variables hold the actual context objects
_cv_app: ContextVar[AppContext] = ContextVar("flask.app_ctx")
_cv_request: ContextVar[RequestContext] = ContextVar("flask.request_ctx")
# Proxies point to objects within the currently active context
# The LocalProxy is told how to find the real object (e.g., via _cv_request)
# and which attribute on that context object to return (e.g., 'request')
request: Request = LocalProxy(_cv_request, "request") # type: ignore
session: SessionMixin = LocalProxy(_cv_request, "session") # type: ignore
current_app: Flask = LocalProxy(_cv_app, "app") # type: ignore
g: _AppCtxGlobals = LocalProxy(_cv_app, "g") # type: ignore
```
This proxy mechanism allows you to write clean code using simple global names, while Flask handles the complexity of ensuring those names point to the correct, context-specific objects behind the scenes.
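To make this concrete, here is a tiny Flask-free sketch of the same idea, wrapping a `ContextVar` in a `LocalProxy`. This assumes Werkzeug 2.x, where `LocalProxy` accepts a `ContextVar` directly.

```python
# A minimal, Flask-free sketch of proxy resolution, assuming Werkzeug 2.x.
from contextvars import ContextVar
from werkzeug.local import LocalProxy

_cv_user = ContextVar("demo.user")

# The proxy stores no data itself; every access looks up _cv_user's current value.
current_user = LocalProxy(_cv_user)

token = _cv_user.set({"name": "Alice"})
print(current_user["name"])   # -> 'Alice', resolved at access time
_cv_user.reset(token)
```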
Here's a diagram showing two concurrent requests and how the `request` proxy resolves differently in each context:
```mermaid
sequenceDiagram
participant UserCodeA as View Func (Req A)
participant Proxy as request (LocalProxy)
participant ContextVars as Context Storage
participant UserCodeB as View Func (Req B)
Note over UserCodeA, UserCodeB: Requests A and B handled concurrently
UserCodeA->>+Proxy: Access request.method
Proxy->>+ContextVars: Get current value of _cv_request
ContextVars-->>-Proxy: Return RequestContext A
Proxy->>RequestContextA: Get 'request' attribute (Real Request A)
RequestContextA-->>Proxy: Return Real Request A
Proxy->>RealRequestA: Access 'method' attribute
RealRequestA-->>Proxy: Return 'GET'
Proxy-->>-UserCodeA: Return 'GET'
UserCodeB->>+Proxy: Access request.form['name']
Proxy->>+ContextVars: Get current value of _cv_request
ContextVars-->>-Proxy: Return RequestContext B
Proxy->>RequestContextB: Get 'request' attribute (Real Request B)
RequestContextB-->>Proxy: Return Real Request B
Proxy->>RealRequestB: Access 'form' attribute
RealRequestB-->>Proxy: Return FormDict B
Proxy->>FormDictB: Get item 'name'
FormDictB-->>Proxy: Return 'Bob'
Proxy-->>-UserCodeB: Return 'Bob'
```
## Conclusion
You've learned about Flask's Context Globals: `current_app`, `request`, `session`, and `g`. These are powerful proxy objects that simplify your code by providing easy access to application- or request-specific information without needing to pass objects around manually.
* **`request`**: Accesses incoming request data.
* **`session`**: Stores user-specific data across requests (requires `SECRET_KEY`).
* **`current_app`**: Accesses the active application instance and its config/resources.
* **`g`**: A temporary storage space for the duration of a single request.
These globals work their magic through Flask's **context** system (Application Context and Request Context) and **proxies** that look up the correct object in the currently active context, often powered by Python's `contextvars`.
Understanding these globals is key to writing idiomatic Flask code. You'll frequently use `request` to handle user input, `session` for user state, `current_app` for configuration, and `g` for managing request-scoped resources like database connections.
Speaking of configuration, how exactly do we set things like the `SECRET_KEY`, database URLs, or other settings for our application? That's the topic of our next chapter.
Let's learn how to manage settings effectively in [Chapter 6: Configuration (`Config`)](06_configuration___config__.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 6: Configuration (`Config`)
Welcome back! In [Chapter 5: Context Globals (`current_app`, `request`, `session`, `g`)](05_context_globals___current_app____request____session____g__.md), we saw how Flask uses context globals like `current_app` and `session`. We even learned that using the `session` requires setting a `SECRET_KEY` on our application object. But where is the best place to put settings like the secret key, or maybe a database connection string, or a flag to turn debugging features on or off? We definitely don't want to hardcode these directly into our main application logic!
This chapter introduces Flask's built-in solution: the **Configuration** system.
## What Problem Does It Solve? The Need for a Settings Panel
Imagine building a piece of electronic equipment, like a stereo amplifier. It has various knobs and switches: volume, bass, treble, input source selectors. These controls allow you to adjust the amplifier's behavior without opening it up and rewiring things.
A web application also needs settings to control its behavior:
* **Security:** A `SECRET_KEY` is needed for secure sessions.
* **Debugging:** Should detailed error messages be shown (useful for development, dangerous for production)?
* **Database:** Where is the database located? What are the login credentials?
* **External Services:** What are the API keys for services like email sending or payment processing?
Hardcoding these values directly in your view functions or application setup code is messy and inflexible. If you need to change the database location when deploying your app from your laptop to a real server, you'd have to find and change the code. This is prone to errors and makes managing different environments (development, testing, production) difficult.
Flask provides a central object, usually accessed via `app.config`, that acts like your application's main **settings panel**. It's a dictionary-like object where you can store all your configuration values. Flask itself uses this object for its own settings (like `DEBUG` or `SECRET_KEY`), and you can add your own custom settings too. Crucially, Flask provides convenient ways to load these settings from different places, like files or environment variables, keeping your configuration separate from your code.
Our primary use case right now is setting the `SECRET_KEY` properly so we can use the `session` object securely, as discussed in [Chapter 5](05_context_globals___current_app____request____session____g__.md).
## Meet `app.config`
When you create a Flask application object (`app = Flask(__name__)`), Flask automatically creates a configuration object for you, accessible as `app.config`.
* It works like a standard Python dictionary: you can store values using keys (e.g., `app.config['SECRET_KEY'] = '...'`) and retrieve them (e.g., `key = app.config['SECRET_KEY']`).
* Keys are typically uppercase strings (e.g., `DEBUG`, `DATABASE_URI`). Flask's built-in settings follow this convention, and it's recommended for your own settings too.
* It comes pre-populated with some default values.
* It has special methods to load configuration from various sources.
## Populating the Configuration
There are several ways to add settings to `app.config`. Let's explore the most common ones.
### 1. Directly from Code (In-Place)
You can set configuration values directly like you would with a dictionary. This is often done right after creating the `app` object.
```python
# hello.py (or your main app file)
from flask import Flask
import os
app = Flask(__name__)
# Setting configuration directly
app.config['DEBUG'] = True # Turn on debug mode
app.config['SECRET_KEY'] = os.urandom(24) # Generate a random key (OK for simple dev)
app.config['MY_CUSTOM_SETTING'] = 'Hello Config!'
print(f"Debug mode is: {app.config['DEBUG']}")
print(f"My custom setting: {app.config.get('MY_CUSTOM_SETTING')}")
# Using .get() is safer if the key might not exist
print(f"Another setting: {app.config.get('NON_EXISTENT_KEY', 'Default Value')}")
# ... rest of your app (routes, etc.) ...
# Example route accessing config
@app.route('/config-example')
def config_example():
custom_val = app.config.get('MY_CUSTOM_SETTING', 'Not set')
return f'The custom setting is: {custom_val}'
if __name__ == '__main__':
# The app.run(debug=True) argument also sets app.config['DEBUG'] = True
# but setting it explicitly ensures it's set even if run differently.
app.run()
```
**Explanation:**
* We directly assign values to keys in `app.config`.
* `os.urandom(24)` generates a random byte string suitable for a secret key during development. **Never hardcode a predictable secret key, especially in production!**
* We can access values using `[]` or the safer `.get()` method which allows providing a default.
**When to use:** Good for setting Flask's built-in defaults (like `DEBUG`) temporarily during development or setting simple, non-sensitive values. **Not ideal for secrets or complex configurations**, especially for deployment, as it mixes configuration with code.
### 2. From a Python Object (`from_object`)
You can define your configuration in a separate Python object (like a class) or a dedicated module (`.py` file) and then load it using `app.config.from_object()`. This method only loads attributes whose names are **all uppercase**.
First, create a configuration file, say `config.py`:
```python
# config.py
# Note: Only uppercase variables will be loaded by from_object
DEBUG = True # Set debug mode
SECRET_KEY = 'a-very-secret-and-complex-key-loaded-from-object' # KEEP SECRET IN REAL APPS
DATABASE_URI = 'sqlite:///mydatabase.db'
# This lowercase variable will NOT be loaded into app.config
internal_value = 'ignore me'
```
Now, load it in your main application file:
```python
# hello.py
from flask import Flask
app = Flask(__name__)
# Load configuration from the config.py file (using its import path as a string)
app.config.from_object('config')
# Alternatively, if you imported the module:
# import config
# app.config.from_object(config)
print(f"Loaded Debug: {app.config.get('DEBUG')}")
print(f"Loaded Secret Key: {app.config.get('SECRET_KEY')}")
print(f"Loaded DB URI: {app.config.get('DATABASE_URI')}")
print(f"Internal Value (should be None): {app.config.get('internal_value')}")
# ... rest of your app ...
if __name__ == '__main__':
app.run()
```
**Explanation:**
* `app.config.from_object('config')` tells Flask to import the module named `config` (which corresponds to `config.py`) and look for any uppercase attributes (`DEBUG`, `SECRET_KEY`, `DATABASE_URI`).
* It copies the values of these uppercase attributes into the `app.config` dictionary.
* `internal_value` is ignored because it's lowercase.
**When to use:** Great for organizing your default configuration or different configurations (e.g., `DevelopmentConfig`, `ProductionConfig` classes) within your project structure. Helps keep settings separate from application logic.
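For example, a class-based layout might look roughly like this (the class names here are illustrative, not part of Flask):

```python
# config.py - a sketch of class-based configuration (illustrative names)
class BaseConfig:
    DEBUG = False
    SECRET_KEY = 'override-me'

class DevelopmentConfig(BaseConfig):
    DEBUG = True

class ProductionConfig(BaseConfig):
    # In a real deployment, load this from a secure source instead
    SECRET_KEY = 'a-strong-production-key'

# In app.py you would then pick one, for example:
#   app.config.from_object('config.DevelopmentConfig')
```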
### 3. From a Python File (`from_pyfile`)
Similar to `from_object`, but instead of importing a module, `app.config.from_pyfile()` executes a Python file (it doesn't have to end in `.py`, often `.cfg` is used by convention) and loads its uppercase variables.
Create a configuration file, say `settings.cfg`:
```python
# settings.cfg
# This file will be executed by Python
SECRET_KEY = 'secret-key-loaded-from-pyfile'
SERVER_NAME = '127.0.0.1:5000' # Example setting
# You can even have simple logic if needed
import os
APP_ROOT = os.path.dirname(__file__)
```
Load it in your application:
```python
# hello.py
from flask import Flask
import os
app = Flask(__name__)
# Construct the path to the config file relative to this file
# __file__ is the path to the current python script (hello.py)
# os.path.dirname gets the directory containing hello.py
# os.path.join creates the full path to settings.cfg
config_file_path = os.path.join(os.path.dirname(__file__), 'settings.cfg')
# Load configuration from the file
# Pass silent=True to ignore errors if the file doesn't exist; here we use silent=False so problems surface
loaded = app.config.from_pyfile(config_file_path, silent=False)
if loaded:
print("Loaded config from settings.cfg")
print(f"Loaded Secret Key: {app.config.get('SECRET_KEY')}")
print(f"Loaded Server Name: {app.config.get('SERVER_NAME')}")
print(f"Calculated APP_ROOT: {app.config.get('APP_ROOT')}")
else:
print("Could not load settings.cfg")
# ... rest of your app ...
if __name__ == '__main__':
app.run()
```
**Explanation:**
* `app.config.from_pyfile('settings.cfg')` reads the specified file, executes it as Python code, and loads the uppercase variables into `app.config`.
* This allows configuration files to be simple variable assignments but also include basic Python logic if needed.
* The `silent=True` argument is useful if the config file is optional.
**When to use:** Very flexible. Good for separating configuration completely from your application package. Often used for instance-specific configurations (settings for a particular deployment).
### 4. From Environment Variables (`from_envvar`)
This is a common pattern, especially for production deployment. Instead of hardcoding the *path* to a configuration file, you store the path in an environment variable. `app.config.from_envvar()` reads the filename from the specified environment variable and then loads that file using `from_pyfile`.
Imagine you have your `settings.cfg` from the previous example.
Before running your app, you set an environment variable in your terminal:
* **Linux/macOS:** `export YOURAPP_SETTINGS=/path/to/your/settings.cfg`
* **Windows (cmd):** `set YOURAPP_SETTINGS=C:\path\to\your\settings.cfg`
* **Windows (PowerShell):** `$env:YOURAPP_SETTINGS="C:\path\to\your\settings.cfg"`
Then, in your code:
```python
# hello.py
from flask import Flask
app = Flask(__name__)
# Load configuration from the file specified by the YOURAPP_SETTINGS env var
# Set silent=True to allow the app to run even if the env var isn't set
loaded = app.config.from_envvar('YOURAPP_SETTINGS', silent=True)
if loaded:
print(f"Loaded config from file specified in YOURAPP_SETTINGS: {app.config.get('SECRET_KEY')}")
else:
print("YOURAPP_SETTINGS environment variable not set or file not found.")
# You might want to set default configs here or raise an error
# ... rest of your app ...
if __name__ == '__main__':
app.run()
```
**Explanation:**
* `app.config.from_envvar('YOURAPP_SETTINGS')` looks for the environment variable `YOURAPP_SETTINGS`.
* If found, it takes the value (which should be a file path, e.g., `/path/to/your/settings.cfg`) and loads that file using `from_pyfile()`.
* This decouples the *location* of the config file from your application code.
**When to use:** Excellent for production and deployment. Allows operators to specify the configuration file location without modifying the application code. Essential for managing different environments (development, staging, production) where configuration files might reside in different places or contain different values (especially secrets).
### Loading Order and Overrides
You can use multiple loading methods. Each subsequent method will **override** any values set by previous methods if the keys are the same.
A common pattern is:
1. Set default values directly in `app.config` or load from a default `config.py` using `from_object`.
2. Load settings from an instance-specific file (e.g., `settings.cfg`) using `from_pyfile` or `from_envvar`. This allows deployment-specific settings (like database URLs or secret keys) to override the defaults.
```python
# hello.py
from flask import Flask
import os
app = Flask(__name__)
# 1. Set built-in defaults maybe? Or load from a base config object.
app.config['DEBUG'] = False # Default to False for safety
app.config['SECRET_KEY'] = 'default-insecure-key' # Default bad key
# You could load more defaults from an object here:
# app.config.from_object('yourapp.default_config')
# 2. Try to load from an environment variable pointing to a deployment-specific file
config_file_path = os.environ.get('YOURAPP_SETTINGS')
if config_file_path:
try:
app.config.from_pyfile(config_file_path)
print(f"Loaded overrides from {config_file_path}")
except OSError as e:
print(f"Warning: Could not load config file {config_file_path}: {e}")
else:
print("Info: YOURAPP_SETTINGS environment variable not set, using defaults.")
print(f"Final Debug value: {app.config['DEBUG']}")
print(f"Final Secret Key: {app.config['SECRET_KEY']}")
# ... rest of your app ...
if __name__ == '__main__':
app.run()
```
Now, if `YOURAPP_SETTINGS` points to a file containing `DEBUG = True` and a different `SECRET_KEY`, those values will override the defaults set earlier.
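For instance, a deployment-specific override file might look like this (a hypothetical `production.cfg`; point `YOURAPP_SETTINGS` at its path before starting the app):

```python
# production.cfg - hypothetical override file loaded via YOURAPP_SETTINGS
# e.g. on Linux/macOS:  export YOURAPP_SETTINGS=/srv/yourapp/production.cfg
DEBUG = True
SECRET_KEY = 'a-long-random-key-from-your-secret-store'
```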
## Accessing Configuration Values
Once loaded, you can access configuration values anywhere you have access to the application object (`app`) or the `current_app` proxy (within a request or application context, see [Chapter 5](05_context_globals___current_app____request____session____g__.md)).
```python
from flask import current_app, session
# Inside a view function or other request-context code:
@app.route('/some-route')
def some_view():
# Using current_app proxy
api_key = current_app.config.get('MY_API_KEY')
if not api_key:
return "Error: API Key not configured!", 500
# Flask extensions often use app.config too
session['user_id'] = 123 # Uses current_app.config['SECRET_KEY'] implicitly
# ... use api_key ...
return f"Using API Key starting with: {api_key[:5]}..."
# Accessing outside a request context (e.g., in setup code)
# Requires the app object directly or an app context
with app.app_context():
print(f"Accessing SECRET_KEY via current_app: {current_app.config['SECRET_KEY']}")
# Or directly via the app object if available
print(f"Accessing SECRET_KEY via app: {app.config['SECRET_KEY']}")
```
## Under the Hood: The `Config` Object
What's happening when you call these methods?
1. **`app.config` Object:** When you create `Flask(__name__)`, the `Flask` constructor creates an instance of `app.config_class` (which defaults to `flask.Config`) and assigns it to `app.config`. The constructor passes the application's `root_path` and the `default_config` dictionary. (See `Flask.__init__` in `app.py` calling `self.make_config`, which uses `self.config_class` defined in `sansio/app.py`).
2. **`Config` Class:** The `flask.Config` class (in `config.py`) inherits directly from Python's built-in `dict`. This is why you can use standard dictionary methods like `[]`, `.get()`, `.update()`, etc.
3. **Loading Methods:**
* `from_object(obj)`: If `obj` is a string, it imports it using `werkzeug.utils.import_string`. Then, it iterates through the attributes of the object (`dir(obj)`) and copies any attribute whose name is entirely uppercase into the config dictionary (`self[key] = getattr(obj, key)`).
* `from_pyfile(filename)`: It constructs the full path to the file using `os.path.join(self.root_path, filename)`. It creates a temporary module object (`types.ModuleType`). It opens and reads the file, compiles the content (`compile()`), and then executes it within the temporary module's dictionary (`exec(..., d.__dict__)`). Finally, it calls `self.from_object()` on the temporary module object to load the uppercase variables.
* `from_envvar(variable_name)`: It simply reads the environment variable (`os.environ.get(variable_name)`). If the variable exists and is not empty, it calls `self.from_pyfile()` using the value of the environment variable as the filename.
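The uppercase-filtering idea at the core of `from_object` is simple enough to sketch in a few lines. This is a toy version only; the real `flask.Config` also handles import strings, file loading, and the `root_path`.

```python
# A toy version of the uppercase-filtering behind Config.from_object.
class TinyConfig(dict):
    def from_object(self, obj):
        for key in dir(obj):
            if key.isupper():           # only ALL-UPPERCASE names are copied
                self[key] = getattr(obj, key)

class Settings:
    DEBUG = True
    SECRET_KEY = 'demo'
    internal_value = 'ignored because it is lowercase'

cfg = TinyConfig()
cfg.from_object(Settings)
print(cfg)   # {'DEBUG': True, 'SECRET_KEY': 'demo'}
```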
Here's a simplified diagram for `from_pyfile`:
```mermaid
sequenceDiagram
participant UserCode as Your App Code
participant AppConfig as app.config (Config obj)
participant OS as File System
participant PythonExec as Python Interpreter
UserCode->>+AppConfig: app.config.from_pyfile('settings.cfg')
AppConfig->>+OS: Find file 'settings.cfg' relative to root_path
OS-->>-AppConfig: Return file handle
AppConfig->>+PythonExec: Compile and Execute file content in a temporary module scope
PythonExec-->>-AppConfig: Execution complete (vars defined in temp scope)
AppConfig->>AppConfig: Iterate temp scope, copy UPPERCASE vars to self (dict)
AppConfig-->>-UserCode: Return True (if successful)
```
The key takeaway is that `app.config` is fundamentally a Python dictionary enhanced with convenient methods for populating itself from common configuration sources like Python objects, files, and environment variables, filtering for uppercase keys.
## Conclusion
Configuration is essential for any non-trivial Flask application. The `app.config` object provides a centralized, dictionary-like store for all your application settings.
* We learned that configuration helps separate settings (like `SECRET_KEY`, `DEBUG`, database URLs) from application code.
* `app.config` is the central object, behaving like a dictionary.
* We explored various ways to load configuration: directly in code, from Python objects (`from_object`), from Python files (`from_pyfile`), and via environment variables pointing to files (`from_envvar`).
* We saw that loading order matters, allowing defaults to be overridden by deployment-specific settings.
* Configuration can be accessed using `app.config` or `current_app.config`.
Properly managing configuration makes your application more secure, flexible, and easier to deploy and maintain across different environments.
Now that we've covered the main building blocks (the application object, routing, request/response handling, templating, context globals, and configuration), you might be wondering about the "magic" behind those context globals (`request`, `current_app`, etc.). How does Flask manage their state, especially when handling multiple requests? Let's delve deeper into the mechanics of contexts.
Ready to understand the context lifecycle? Let's move on to [Chapter 7: Application and Request Contexts](07_application_and_request_contexts.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 7: Application and Request Contexts
Welcome back! In [Chapter 6: Configuration (`Config`)](06_configuration___config__.md), we learned how to manage settings for our Flask application using the `app.config` object. And in [Chapter 5: Context Globals (`current_app`, `request`, `session`, `g`)](05_context_globals___current_app____request____session____g__.md), we met special variables like `request` and `current_app` that seem to magically know about the current request or application.
But how does Flask keep track of which request is which, especially if multiple users are accessing our web app at the same time? How does it ensure that `request` refers to *User A's* request when handling User A, and *User B's* request when handling User B? This magic is managed by **Application and Request Contexts**.
## What Problem Do They Solve? Keeping Things Separate
Imagine you're working at a busy service desk. Many people come up asking for different things simultaneously. You need a way to keep each person's request and related information separate from everyone else's. You can't just use one shared notepad for everyone; that would be chaos! Instead, for each person, you might create a temporary folder or workspace to hold their specific documents and details while you help them.
In a web application, your Flask server might be handling requests from many different users at the same time. Each request has its own data (like form submissions or URL parameters) and potentially its own user session. Storing this information in simple global variables in your Python code would be disastrous, as data from one request could overwrite or interfere with data from another.
Flask uses **Contexts** to solve this problem. Contexts act like those temporary, isolated workspaces. They ensure that variables like `request`, `session`, `current_app`, and `g` always point to the information relevant to the *specific task* Flask is currently working on (usually, handling one particular incoming web request).
## The Two Main Types of Contexts
Flask has two primary types of contexts:
1. **Application Context (`AppContext`):**
* **Analogy:** Think of this as the main office building or the overall project workspace.
* **Purpose:** It holds information related to the application instance itself, regardless of any specific web request. It binds the `current_app` proxy (pointing to your `Flask` app instance) and the `g` proxy (a temporary storage space).
* **When is it active?** It's automatically active *during* a web request. It's also needed for tasks *outside* of web requests that still need access to the application, such as running command-line interface (CLI) commands (like database migrations) or background jobs.
2. **Request Context (`RequestContext`):**
* **Analogy:** Think of this as a specific meeting room set up just for handling one client's request (one incoming web request).
* **Purpose:** It holds information specific to *one single incoming web request*. It binds the `request` proxy (containing details of the HTTP request) and the `session` proxy (for user-specific session data).
* **When is it active?** Flask automatically creates and activates a Request Context when a web request comes in, and removes it after the request is handled.
* **Relationship:** A Request Context *always* includes an Application Context within it. You can't have a meeting room (`RequestContext`) without being inside the main office building (`AppContext`).
Here's a simple breakdown:
| Context Type | Analogy | Key Globals Bound | Typical Use Case | Lifespan |
| :---------------- | :------------------- | :---------------- | :----------------------------------- | :---------------------------------------------- |
| Application | Main Office Building | `current_app`, `g` | CLI commands, background tasks | Active during requests, or manually activated |
| Request | Temporary Meeting Room | `request`, `session` | Handling a single web request | Created/destroyed for each web request |
## How Flask Uses Contexts Automatically (During Requests)
Most of the time, you don't need to worry about manually managing contexts. When a browser sends a request to your Flask application:
1. **Request Arrives:** Your WSGI server (like the Flask development server) receives the HTTP request.
2. **Context Creation:** Flask automatically creates a `RequestContext` object based on the incoming request details (the WSGI environment).
3. **Context Pushing:** Flask *pushes* this `RequestContext`. This does two things:
* It makes the `request` and `session` proxies point to the specific request and session objects for *this* request.
* It *also* pushes an `AppContext` (if one isn't already active for this thread/task), making `current_app` and `g` point to the correct application and a fresh `g` object. "Pushing" is like activating that temporary workspace.
4. **Code Execution:** Your view function runs. Because the contexts are active, you can freely use `request`, `session`, `current_app`, and `g` inside your function, and they will refer to the correct objects for the current request.
5. **Response Sent:** Your view function returns a response.
6. **Context Popping:** After the response is sent, Flask *pops* the `RequestContext` (and the `AppContext` if it was pushed along with it). This cleans up the workspace, effectively deactivating those specific `request`, `session`, and `g` objects for that request.
This automatic push/pop mechanism ensures that each request is handled in its own isolated context, preventing data clashes between concurrent requests.
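Conceptually, what Flask does for every request boils down to something like the following sketch. It uses `test_request_context` to fabricate a request; in real code you would use a `with` block instead of calling `push()`/`pop()` yourself.

```python
# A sketch of the push/pop cycle Flask performs around each request.
from flask import Flask, request

app = Flask(__name__)

ctx = app.test_request_context('/hello?who=world')
ctx.push()                     # step 3: request, session, current_app, g become available
print(request.args['who'])     # step 4: your code runs and sees *this* request -> 'world'
ctx.pop()                      # step 6: the workspace is cleaned up again
```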
## Manually Pushing Contexts (Outside Requests)
What if you need to access application settings or resources *outside* of a typical web request? For example, maybe you have a separate Python script (`init_db.py`) that needs to initialize your database using configuration stored in `app.config`. Since there's no incoming web request, Flask won't automatically create any contexts.
In these cases, you need to manually push an **Application Context** using `app.app_context()`.
```python
# init_db.py (Example script to run from command line)
from flask import current_app
# Assume your main Flask app object is defined in hello.py
# We need to import it here.
# In a real project, you'd structure this better, maybe using a factory function.
try:
# Let's assume hello.py has app = Flask(__name__)
from hello import app
except ImportError:
print("Could not import 'app' from hello.py")
print("Make sure hello.py exists and defines the Flask app.")
exit(1)
# Define a function that needs app access
def setup_database():
    # We need an application context to access current_app.config
    # Without the 'with' block, current_app would not be available here.
    with app.app_context():
        # Now we can safely access app configuration via current_app
        db_uri = current_app.config.get('DATABASE_URI', 'No DB URI Set!')
        print("Inside app context: Accessing config...")
        print(f"Database URI found: {db_uri}")
        # Imagine database setup code here that uses the URI
        print("Database initialization logic would run here.")
# ---- Main execution part of the script ----
if __name__ == "__main__":
print("Running database setup script...")
setup_database()
print("Script finished.")
```
**Explanation:**
* `from hello import app`: We import the actual `Flask` application instance.
* `with app.app_context():`: This is the key part! It creates an application context for the `app` instance and pushes it, making it active within the `with` block.
* Inside the block, `current_app` becomes available and correctly points to our `app` object. We can now safely access `current_app.config`.
* When the `with` block exits, the application context is automatically popped.
**To run this (assuming `hello.py` exists and defines `app`):**
1. Save the code above as `init_db.py` in the same directory as `hello.py`.
2. Optionally, add `app.config['DATABASE_URI'] = 'sqlite:///mydatabase.db'` to `hello.py` to see it picked up.
3. Run from your terminal: `python init_db.py`
4. You'll see output showing that the config was accessed successfully *inside* the context.
Similarly, if you need to simulate a request environment (perhaps for testing helper functions that rely on `request`), you can use `app.test_request_context()` which pushes both a Request and Application context.
```python
# example_test_context.py
from hello import app # Assuming hello.py defines app = Flask(__name__)
# A helper function that might be used inside a view
def get_user_agent_info():
# This function relies on the 'request' context global
from flask import request
user_agent = request.headers.get('User-Agent', 'Unknown')
return f"Request came from: {user_agent}"
# --- Simulate calling the function outside a real request ---
if __name__ == "__main__":
# Create a test request context for a fake GET request to '/'
# This pushes both Request and App contexts
with app.test_request_context('/', method='GET'):
# Now, inside this block, 'request' is available!
print("Inside test request context...")
agent_info = get_user_agent_info()
print(agent_info)
print("Outside context.")
# Trying to call get_user_agent_info() here would fail because
# the request context has been popped.
```
## Under the Hood: Context Locals and Stacks
How does Flask actually manage these contexts and make the globals like `request` point to the right object?
Historically, Flask used thread-local storage and maintained stacks of contexts for each thread. When `request` was accessed, it would look at the top of the request context stack *for the current thread*.
Modern Flask (leveraging updates in its core dependency, Werkzeug) relies on Python's built-in `contextvars` module. This module provides a more robust way to manage context-specific state that works correctly with both threads and modern asynchronous programming (like `async`/`await`).
Here's a simplified conceptual idea:
1. **Context Variables:** Flask defines special "context variables" (using `contextvars.ContextVar`) for the application context (`_cv_app`) and the request context (`_cv_request`). Think of these like special slots that can hold different values depending on the current execution context (the specific request being handled).
2. **Pushing:** When Flask pushes a context (e.g., `RequestContext.push()`), it stores the actual context object (like the `RequestContext` instance for the current request) into the corresponding context variable (`_cv_request.set(the_request_context)`).
3. **Proxies:** The context globals (`request`, `session`, `current_app`, `g`) are special `LocalProxy` objects (from Werkzeug). They don't hold the data directly.
4. **Proxy Access:** When you access something like `request.args`, the `request` proxy does the following:
* Looks up the *current* value stored in the `_cv_request` context variable. This gives it the *actual* `RequestContext` object for the currently active request.
* Retrieves the real `request` object stored *within* that `RequestContext`.
* Finally, accesses the `.args` attribute on that real request object.
5. **Popping:** When Flask pops a context (e.g., `RequestContext.pop()`), it resets the context variable (`_cv_request.reset(token)`), effectively clearing that slot for the current context.
This `contextvars` mechanism ensures that even if your server is handling many requests concurrently (in different threads or async tasks), each one has its own isolated value for `_cv_app` and `_cv_request`, so the proxies always resolve to the correct objects for the task at hand.
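Here is a tiny, Flask-free sketch of that isolation using `contextvars` directly. Each thread runs in its own context, so the values never clash.

```python
# A Flask-free sketch: contextvars keep a value isolated per execution
# context (here, per thread), the same mechanism Flask relies on.
import threading
from contextvars import ContextVar

_cv_request_id = ContextVar("demo.request_id")

def handle(request_id):
    _cv_request_id.set(request_id)          # "push" a value for this thread's context only
    # Any helper called from here sees the value set for *this* context:
    print(f"{threading.current_thread().name} sees {_cv_request_id.get()}")

threads = [
    threading.Thread(target=handle, args=(f"req-{i}",), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```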
Let's visualize the request lifecycle with contexts:
```mermaid
sequenceDiagram
participant Browser
participant FlaskApp as Flask App (WSGI)
participant Contexts as Context Management
participant YourView as Your View Function
participant Globals as request Proxy
Browser->>+FlaskApp: Sends GET /user/alice
FlaskApp->>+Contexts: Request arrives, create RequestContext (incl. AppContext)
Contexts->>Contexts: Push RequestContext (sets _cv_request)
Contexts->>Contexts: Push AppContext (sets _cv_app)
Note over Contexts: request, session, current_app, g are now active
FlaskApp->>+YourView: Calls view_func(username='alice')
YourView->>+Globals: Access request.method
Globals->>Contexts: Lookup _cv_request -> finds current RequestContext
Globals-->>YourView: Returns 'GET' (from real request object)
YourView-->>-FlaskApp: Returns Response("Hello Alice")
FlaskApp->>+Contexts: Response sent, Pop RequestContext (resets _cv_request)
Contexts->>Contexts: Pop AppContext (resets _cv_app)
Note over Contexts: Context globals are now unbound for this request
FlaskApp-->>-Browser: Sends HTTP Response
```
This diagram shows that Flask sets up (pushes) the context before calling your view and tears it down (pops) afterwards, allowing the proxies like `request` to find the right data while your code runs.
## Conclusion
Contexts are fundamental to how Flask manages state during the lifecycle of the application and individual requests. They provide isolated workspaces to prevent data from different requests interfering with each other.
* **Application Context (`AppContext`):** Provides access to the application (`current_app`) and global storage (`g`). Used implicitly during requests and manually via `app.app_context()` for tasks like CLI commands.
* **Request Context (`RequestContext`):** Provides access to request-specific data (`request`) and the user session (`session`). Automatically managed by Flask during the web request cycle. Contains an `AppContext`.
* **Context Globals:** Proxies like `request` and `current_app` rely on the currently active contexts to find the correct objects.
* **Management:** Flask usually handles context push/pop automatically for web requests. Manual pushing (`app.app_context()`, `app.test_request_context()`) is needed for specific scenarios like scripts, background jobs, or testing.
Understanding contexts helps explain how Flask allows convenient access to request and application data through globals while maintaining safety and isolation between concurrent operations.
Now that we understand how Flask manages state and configuration for the core application, how do we organize larger applications with multiple sections or features? That's where Blueprints come in.
Let's learn how to structure our projects in [Chapter 8: Blueprints](08_blueprints.md).
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 8: Blueprints
Welcome back! In [Chapter 7: Application and Request Contexts](07_application_and_request_contexts.md), we explored the "magic" behind Flask's context system, understanding how variables like `request` and `current_app` work reliably even with multiple concurrent requests.
Now, imagine your simple "Hello, World!" application starts growing. You add user profiles, an admin section, maybe a blog. Putting all your routes, view functions, and related logic into a single Python file (like our `hello.py`) quickly becomes messy and hard to manage. How can we organize our growing Flask application into smaller, more manageable pieces?
That's where **Blueprints** come in!
## What Problem Do They Solve? Organizing a Growing House
Think about building a house. You wouldn't try to build the kitchen, bathroom, and bedrooms all mixed together in one big pile. Instead, you might have separate plans or even pre-fabricated modules for each section. The kitchen module has its specific plumbing and electrical needs, the bathroom has its fixtures, etc. Once these modules are ready, you assemble them into the main structure of the house.
Similarly, as your Flask application grows, you want to group related features together. For example:
* All the routes related to user authentication (`/login`, `/logout`, `/register`).
* All the routes for an admin control panel (`/admin/dashboard`, `/admin/users`).
* All the routes for a public-facing blog (`/blog`, `/blog/<post_slug>`).
Trying to manage all these in one file leads to:
* **Clutter:** The main application file becomes huge and hard to navigate.
* **Confusion:** It's difficult to see which routes belong to which feature.
* **Poor Reusability:** If you wanted to reuse the "blog" part in another project, it would be hard to extract just that code.
**Blueprints** provide Flask's solution for this. They let you define collections of routes, view functions, templates, and static files as separate modules. You can develop these modules independently and then "register" them with your main Flask application, potentially multiple times or under different URL prefixes.
They are like the **prefabricated sections of your house**. You build the "user authentication module" (a blueprint) separately, then plug it into your main application structure.
## Creating and Using a Simple Blueprint
Let's see how this works. Imagine we want to create a separate section for user-related pages.
1. **Create a Blueprint Object:** Instead of using `@app.route()`, we first create a `Blueprint` object.
2. **Define Routes on the Blueprint:** We use decorators like `@bp.route()` (where `bp` is our blueprint object) to define routes *within* that blueprint.
3. **Register the Blueprint with the App:** In our main application file, we tell the Flask `app` object about our blueprint using `app.register_blueprint()`.
Let's structure our project. We'll have our main `app.py` and a separate file for our user routes, maybe inside a `blueprints` folder:
```
yourproject/
├── app.py                 # Main Flask application setup
├── blueprints/
│   ├── __init__.py        # Makes 'blueprints' a Python package (can be empty)
│   └── user.py            # Our user blueprint routes
└── templates/
    └── user/
        └── profile.html   # Template for the user profile
```
**Step 1 & 2: Define the Blueprint (`blueprints/user.py`)**
```python
# blueprints/user.py
from flask import Blueprint, render_template, abort
# 1. Create the Blueprint object
# 'user' is the name of the blueprint. Used internally by Flask.
# __name__ helps locate the blueprint's resources (like templates).
# template_folder specifies where to look for this blueprint's templates.
user_bp = Blueprint('user', __name__, template_folder='../templates/user')
# Sample user data (replace with database logic in a real app)
users = {
"alice": {"name": "Alice", "email": "alice@example.com"},
"bob": {"name": "Bob", "email": "bob@example.com"},
}
# 2. Define routes ON THE BLUEPRINT using @user_bp.route()
@user_bp.route('/profile/<username>')
def profile(username):
    user_info = users.get(username)
    if not user_info:
        abort(404)  # User not found
    # Note: render_template will also search this blueprint's template folder
    # ('templates/user/', set via template_folder='../templates/user'),
    # after checking the application's own 'templates/' folder.
    return render_template('profile.html', user=user_info)
@user_bp.route('/')
def user_list():
# A simple view within the user blueprint
return f"List of users: {', '.join(users.keys())}"
```
**Explanation:**
* `from flask import Blueprint`: We import the `Blueprint` class.
* `user_bp = Blueprint('user', __name__, template_folder='../templates/user')`: We create an instance.
* `'user'`: The name of this blueprint. This is used later for generating URLs (`url_for`).
* `__name__`: Helps Flask determine the blueprint's root path, similar to how it works for the main `Flask` app object ([Chapter 1](01_application_object___flask__.md)).
* `template_folder='../templates/user'`: Tells this blueprint where its specific templates are located relative to `user.py`.
* `@user_bp.route(...)`: We define routes using the blueprint object, *not* the main `app` object.
**Step 3: Register the Blueprint (`app.py`)**
Now, we need to tell our main Flask application about this blueprint.
```python
# app.py
from flask import Flask
from blueprints.user import user_bp  # Import the blueprint object

app = Flask(__name__)
# We might have other config here, like SECRET_KEY from Chapter 6
# app.config['SECRET_KEY'] = 'your secret key'

# Register the blueprint with the main application
# We can add a url_prefix here!
app.register_blueprint(user_bp, url_prefix='/users')

# Maybe add a simple homepage route directly on the app
@app.route('/')
def home():
    return 'Welcome to the main application!'

if __name__ == '__main__':
    app.run(debug=True)
```
**Explanation:**
* `from blueprints.user import user_bp`: We import the `Blueprint` instance we created in `user.py`.
* `app.register_blueprint(user_bp, url_prefix='/users')`: This is the crucial step.
* It tells the `app` object to include all the routes defined in `user_bp`.
* `url_prefix='/users'`: This is very useful! It means all routes defined *within* the `user_bp` will automatically be prefixed with `/users`.
* The `/profile/<username>` route in `user.py` becomes `/users/profile/<username>`.
* The `/` route in `user.py` becomes `/users/`.
**Template (`templates/user/profile.html`)**
```html
<!-- templates/user/profile.html -->
<!doctype html>
<html>
<head><title>User Profile</title></head>
<body>
<h1>Profile for {{ user.name }}</h1>
<p>Email: {{ user.email }}</p>
<p><a href="{{ url_for('user.user_list') }}">Back to User List</a></p>
<p><a href="{{ url_for('home') }}">Back to Home</a></p>
</body>
</html>
```
**Running this:**
1. Create the directory structure and files as shown above.
2. Run `python app.py` in your terminal.
3. Visit `http://127.0.0.1:5000/`. You'll see "Welcome to the main application!" (Handled by `app.py`).
4. Visit `http://127.0.0.1:5000/users/`. You'll see "List of users: alice, bob" (Handled by `user.py`, route `/`, with prefix `/users`).
5. Visit `http://127.0.0.1:5000/users/profile/alice`. You'll see the profile page for Alice (Handled by `user.py`, route `/profile/<username>`, with prefix `/users`).
6. Visit `http://127.0.0.1:5000/users/profile/charlie`. You'll get a 404 Not Found error, as handled by `profile()` in `user.py`.
Notice how the blueprint allowed us to neatly separate the user-related code into `blueprints/user.py`, keeping `app.py` cleaner. The `url_prefix` made it easy to group all user routes under `/users/`.
## Generating URLs with `url_for` and Blueprints
How does `url_for` work when routes are defined in blueprints? You need to prefix the endpoint name with the **blueprint name**, followed by a dot (`.`).
Look back at the `profile.html` template:
* `{{ url_for('user.user_list') }}`: Generates the URL for the `user_list` view function *within* the `user` blueprint. Because of the `url_prefix='/users'`, this generates `/users/`.
* `{{ url_for('user.profile', username='alice') }}` (if used in Python): Would generate `/users/profile/alice`.
* `{{ url_for('home') }}`: Generates the URL for the `home` view function, which is registered directly on the `app`, not a blueprint. This generates `/`.
If you are generating a URL for an endpoint *within the same blueprint*, you can use a dot prefix for a relative link:
```python
# Inside blueprints/user.py
from flask import url_for

@user_bp.route('/link-example')
def link_example():
    # Generate URL for 'profile' endpoint within the *same* blueprint ('user')
    alice_url = url_for('.profile', username='alice')  # Note the leading dot!
    # alice_url will be '/users/profile/alice'

    # Generate URL for the main app's 'home' endpoint
    home_url = url_for('home')  # No dot needed for app routes
    # home_url will be '/'

    return f'Alice profile: {alice_url}<br>Homepage: {home_url}'
```
Using the blueprint name (`user.profile`) or the relative dot (`.profile`) ensures `url_for` finds the correct endpoint, even if multiple blueprints happen to use the same view function name (like `index`).
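To see that namespacing in isolation, here is a small hypothetical sketch (separate from the project above) with two blueprints that each define a view called `index`:

```python
# Hypothetical example: in a real project these views would live in separate
# modules, but the endpoint namespacing works exactly the same way.
from flask import Flask, Blueprint, url_for

user_bp = Blueprint('user', __name__)
admin_bp = Blueprint('admin', __name__)

@user_bp.route('/')
def index():
    return "User home"

@admin_bp.route('/')
def index():  # same view name, different blueprint (and module, in practice)
    return "Admin home"

app = Flask(__name__)
app.register_blueprint(user_bp, url_prefix='/users')
app.register_blueprint(admin_bp, url_prefix='/admin')

with app.test_request_context():
    print(url_for('user.index'))   # -> /users/
    print(url_for('admin.index'))  # -> /admin/
```

Because the endpoints are `user.index` and `admin.index`, there is no collision even though both views share the name `index`.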
## Blueprint Resources: Templates and Static Files
As we saw, you can specify `template_folder` when creating a `Blueprint`. When `render_template('profile.html')` is called from within the `user_bp`'s `profile` view, Flask (via Jinja2's `DispatchingJinjaLoader`, see [Chapter 4](04_templating__jinja2_integration_.md)) will look for `profile.html` in this order:
1. The application's template folder (`templates/`).
2. The blueprint's template folder (`templates/user/` in our example).
This lets blueprints bundle their own templates. Because the application's folder is searched first, the application can override a blueprint's template by shipping one with the same name; usually, though, blueprint templates simply keep things organized.
Similarly, you can specify a `static_folder` and `static_url_path` for a blueprint. This allows a blueprint to bundle its own CSS, JavaScript, or image files.
```python
# blueprints/admin/__init__.py  (the 'admin' blueprint defined as its own package)
admin_bp = Blueprint('admin', __name__,
static_folder='static', # Look in blueprints/admin/static/
static_url_path='/admin-static', # URL like /admin-static/style.css
template_folder='templates') # Look in blueprints/admin/templates/
# Then register with the app:
# app.register_blueprint(admin_bp, url_prefix='/admin')
```
Accessing blueprint static files uses `url_for` with the special `static` endpoint, prefixed by the blueprint name:
```html
<!-- Inside an admin blueprint template -->
<link rel="stylesheet" href="{{ url_for('admin.static', filename='style.css') }}">
<!-- Generates a URL like: /admin-static/style.css -->
```
## Under the Hood: How Registration Works
What actually happens when you call `app.register_blueprint(bp)`?
1. **Deferred Functions:** When you use decorators like `@bp.route`, `@bp.before_request`, `@bp.errorhandler`, etc., on a `Blueprint` object, the blueprint doesn't immediately tell the application about them. Instead, it stores these actions as "deferred functions" in a list (`bp.deferred_functions`). See `Blueprint.route` calling `Blueprint.add_url_rule`, which calls `Blueprint.record`.
2. **Registration Call:** `app.register_blueprint(bp, url_prefix='/users')` is called.
3. **State Creation:** The application creates a `BlueprintSetupState` object. This object holds references to the blueprint (`bp`), the application (`app`), and the options passed during registration (like `url_prefix='/users'`).
4. **Recording the Blueprint:** The app adds the blueprint to its `app.blueprints` dictionary. This is important for routing and `url_for`.
5. **Executing Deferred Functions:** The app iterates through the list of `deferred_functions` stored in the blueprint. For each deferred function, it calls it, passing the `BlueprintSetupState` object.
6. **Applying Settings:** Inside the deferred function (which was created back when you used, e.g., `@bp.route`), the function now has access to both the original arguments (`'/'`, `view_func`, etc.) and the setup state (`state`).
* For a route, the deferred function typically calls `state.add_url_rule(...)`.
* `state.add_url_rule` then calls `app.add_url_rule(...)`, but it *modifies* the arguments first:
* It prepends the `url_prefix` from the `state` (e.g., `/users`) to the route's `rule`.
* It prepends the blueprint's name (`state.name`, e.g., `user`) plus a dot to the route's `endpoint` (e.g., `profile` becomes `user.profile`).
* It applies other options like `subdomain`.
* For other decorators like `@bp.before_request`, the deferred function registers the handler function in the appropriate application dictionary (e.g., `app.before_request_funcs`) but uses the blueprint's name as the key (or `None` for app-wide handlers added via the blueprint).
7. **Nested Blueprints:** If the blueprint being registered itself contains nested blueprints, the registration process is called recursively for those nested blueprints, adjusting prefixes and names accordingly.
Here's a simplified diagram for registering a route via a blueprint:
```mermaid
sequenceDiagram
participant Code as Your Code (e.g., user.py)
participant BP as user_bp (Blueprint obj)
participant App as Main App (Flask obj)
participant State as BlueprintSetupState
Code->>+BP: @user_bp.route('/profile/<name>')
BP->>BP: record(deferred_add_rule_func)
BP-->>-Code: Decorator applied
Note over App: Later, in app.py...
App->>App: app.register_blueprint(user_bp, url_prefix='/users')
App->>+State: Create BlueprintSetupState(bp=user_bp, app=app, options={...})
State-->>-App: Return state object
App->>BP: For func in user_bp.deferred_functions:
Note right of BP: func = deferred_add_rule_func
App->>BP: func(state)
BP->>+State: deferred_add_rule_func calls state.add_url_rule('/profile/<name>', ...)
State->>App: Calls app.add_url_rule('/users/profile/<name>', endpoint='user.profile', ...)
App->>App: Adds rule to app.url_map
State-->>-BP: add_url_rule finished
BP-->>App: Deferred function finished
```
The key idea is **deferral**. Blueprints record actions but don't apply them until they are registered on an actual application, using the `BlueprintSetupState` to correctly prefix routes and endpoints.
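If a code-shaped mental model helps, here is a heavily condensed toy sketch of the deferral mechanism. It is not Flask's actual implementation (the real classes are `Blueprint` and `BlueprintSetupState` in `flask/blueprints.py`); it only mirrors the idea of recording actions and replaying them against a setup state at registration time:

```python
# A toy sketch of the deferral mechanism -- not Flask's real code.
class TinyApp:
    def __init__(self):
        self.url_map = {}  # endpoint -> (rule, view_func)

    def add_url_rule(self, rule, endpoint, view_func):
        self.url_map[endpoint] = (rule, view_func)

class TinySetupState:
    def __init__(self, blueprint, app, url_prefix):
        self.blueprint, self.app, self.url_prefix = blueprint, app, url_prefix

    def add_url_rule(self, rule, endpoint, view_func):
        # Prefix the rule and namespace the endpoint, then hand off to the app.
        full_rule = (self.url_prefix or '') + rule
        full_endpoint = f"{self.blueprint.name}.{endpoint}"
        self.app.add_url_rule(full_rule, full_endpoint, view_func)

class TinyBlueprint:
    def __init__(self, name):
        self.name = name
        self.deferred_functions = []

    def route(self, rule):
        def decorator(view_func):
            # Record the action; nothing touches the app yet.
            self.deferred_functions.append(
                lambda state: state.add_url_rule(rule, view_func.__name__, view_func))
            return view_func
        return decorator

    def register(self, app, url_prefix=None):
        state = TinySetupState(self, app, url_prefix)
        for deferred in self.deferred_functions:
            deferred(state)

# Usage
bp = TinyBlueprint('user')

@bp.route('/profile/<username>')
def profile(username):
    return f"Profile of {username}"

app = TinyApp()
bp.register(app, url_prefix='/users')
print(app.url_map)
# {'user.profile': ('/users/profile/<username>', <function profile at ...>)}
```

Running this prints a map with the prefixed rule and the namespaced endpoint, which is exactly what the real `state.add_url_rule` achieves on the Flask application.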
## Conclusion
Blueprints are Flask's powerful solution for organizing larger applications. They allow you to group related routes, views, templates, and static files into modular, reusable components.
* We learned how to **create** a `Blueprint` object.
* We saw how to **define routes** and other handlers using blueprint decorators (`@bp.route`, `@bp.before_request`, etc.).
* We learned how to **register** a blueprint with the main application using `app.register_blueprint()`, optionally specifying a `url_prefix`.
* We understood how `url_for` works with blueprint endpoints (using `blueprint_name.endpoint_name` or `.endpoint_name`).
* Blueprints help keep your codebase **organized, maintainable, and modular**.
By breaking down your application into logical blueprints, you can manage complexity much more effectively as your project grows. This structure also makes it easier for teams to work on different parts of the application simultaneously.
This concludes our core tutorial on Flask's fundamental concepts! You now have a solid understanding of the Application Object, Routing, Request/Response, Templating, Context Globals, Configuration, Contexts, and Blueprints. With these tools, you're well-equipped to start building your own web applications with Flask.
From here, you might explore Flask extensions for common tasks (like database integration with Flask-SQLAlchemy, user authentication with Flask-Login, form handling with Flask-WTF), delve into testing your Flask applications, or learn about different deployment strategies. Happy Flasking!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

47
output/Flask/index.md Normal file
View File

@@ -0,0 +1,47 @@
# Tutorial: Flask
Flask is a lightweight **web framework** for Python.
It helps you build web applications by handling incoming *web requests* and sending back *responses*.
Flask provides tools for **routing** URLs to your Python functions, managing *request data*, creating *responses*, and using *templates* to generate HTML.
**Source Repository:** [https://github.com/pallets/flask/tree/ab8149664182b662453a563161aa89013c806dc9/src/flask](https://github.com/pallets/flask/tree/ab8149664182b662453a563161aa89013c806dc9/src/flask)
```mermaid
flowchart TD
A0["0: Application Object (Flask)"]
A1["1: Blueprints"]
A2["2: Routing System"]
A3["3: Request and Response Objects"]
A4["4: Application and Request Contexts"]
A5["5: Context Globals (current_app, request, session, g)"]
A6["6: Configuration (Config)"]
A7["7: Templating (Jinja2 Integration)"]
A0 -- "Registers" --> A1
A0 -- "Uses" --> A2
A0 -- "Handles" --> A3
A0 -- "Manages" --> A4
A0 -- "Holds" --> A6
A0 -- "Integrates" --> A7
A1 -- "Defines routes using" --> A2
A2 -- "Matches URL from" --> A3
A3 -- "Bound within" --> A4
A4 -- "Enables access to" --> A5
A7 -- "Accesses" --> A5
```
## Chapters
1. [Application Object (`Flask`)](01_application_object___flask__.md)
2. [Routing System](02_routing_system.md)
3. [Request and Response Objects](03_request_and_response_objects.md)
4. [Templating (Jinja2 Integration)](04_templating__jinja2_integration_.md)
5. [Context Globals (`current_app`, `request`, `session`, `g`)](05_context_globals___current_app____request____session____g__.md)
6. [Configuration (`Config`)](06_configuration___config__.md)
7. [Application and Request Contexts](07_application_and_request_contexts.md)
8. [Blueprints](08_blueprints.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,306 @@
# Chapter 1: Graph / StateGraph - The Blueprint of Your Application
Welcome to the LangGraph tutorial! We're excited to help you learn how to build powerful, stateful applications with Large Language Models (LLMs).
Imagine you're building an application, maybe a chatbot, an agent that performs tasks, or something that processes data in multiple steps. As these applications get more complex, just calling an LLM once isn't enough. You need a way to structure the flow: maybe call an LLM, then a tool, then another LLM based on the result. How do you manage this sequence of steps and the information passed between them?
That's where **Graphs** come in!
## What Problem Do Graphs Solve?
Think of a complex task like baking a cake. You don't just throw all the ingredients in the oven. There's a sequence: mix dry ingredients, mix wet ingredients, combine them, pour into a pan, bake, cool, frost. Each step depends on the previous one.
LangGraph helps you define these steps and the order they should happen in. It provides a way to create a **flowchart** or a **blueprint** for your application's logic.
The core idea is to break down your application into:
1. **Nodes:** These are the individual steps or actions (like "mix dry ingredients" or "call the LLM").
2. **Edges:** These are the connections or transitions between the steps, defining the order (after mixing dry ingredients, mix wet ingredients).
LangGraph provides different types of graphs, but the most common and useful one for building stateful applications is the `StateGraph`.
## Core Concepts: `Graph`, `StateGraph`, and `MessageGraph`
Let's look at the main types of graphs you'll encounter:
1. **`Graph` (The Basic Blueprint)**
* This is the most fundamental type. You define nodes (steps) and edges (connections).
* It's like a basic flowchart diagram.
* You explicitly define how information passes from one node to the next.
* While foundational, you'll often use the more specialized `StateGraph` for convenience.
```python
# This is a conceptual example - we usually use StateGraph
from langgraph.graph import Graph

# Define simple functions or Runnables as nodes
def step_one(input_data):
    print("Running Step 1")
    return input_data * 2

def step_two(processed_data):
    print("Running Step 2")
    return processed_data + 5

# Create a basic graph
basic_graph_builder = Graph()

# Add nodes
basic_graph_builder.add_node("A", step_one)
basic_graph_builder.add_node("B", step_two)

# Add edges (connections)
basic_graph_builder.add_edge("A", "B")  # Run B after A
basic_graph_builder.set_entry_point("A")  # Start at A
# basic_graph_builder.set_finish_point("B") # Not needed for this simple Graph type
```
2. **`StateGraph` (The Collaborative Whiteboard)**
* This is the workhorse for most LangGraph applications. It's a specialized `Graph`.
* **Key Idea:** Nodes communicate *implicitly* by reading from and writing to a shared **State** object.
* **Analogy:** Imagine a central whiteboard (the State). Each node (person) can read what's on the whiteboard, do some work, and then update the whiteboard with new information or changes.
* You define the *structure* of this shared state first (e.g., what keys it holds).
* Each node receives the *current* state and returns a *dictionary* containing only the parts of the state it wants to *update*. LangGraph handles merging these updates into the main state.
3. **`MessageGraph` (The Chatbot Specialist)**
* This is a further specialization of `StateGraph`, designed specifically for building chatbots or conversational agents.
* It automatically manages a `messages` list within its state.
* Nodes typically take the current list of messages and return new messages to be added.
* It uses a special function (`add_messages`) to append messages while handling potential duplicates or updates based on message IDs. This makes building chat flows much simpler.
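As a quick taste (we will not use it again in this chapter), here is a minimal sketch of a `MessageGraph`, assuming `MessageGraph` is exported from `langgraph.graph` and that `langchain_core` is installed for the message classes:

```python
from langgraph.graph import MessageGraph, END
from langchain_core.messages import HumanMessage, AIMessage

# A node receives the accumulated list of messages and returns new ones to append
def echo(messages):
    last = messages[-1].content
    return AIMessage(content=f"You said: {last}")

builder = MessageGraph()
builder.add_node("echo", echo)
builder.set_entry_point("echo")
builder.add_edge("echo", END)
chat = builder.compile()

result = chat.invoke([HumanMessage(content="hello")])
# result is the full message list:
# [HumanMessage('hello'), AIMessage('You said: hello')]
```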
For the rest of this chapter, we'll focus on `StateGraph` as it introduces the core concepts most clearly.
## Building a Simple `StateGraph`
Let's build a tiny application that takes a number, adds 1 to it, and then multiplies it by 2.
**Step 1: Define the State**
First, we define the "whiteboard" the structure of the data our graph will work with. We use Python's `TypedDict` for this.
```python
from typing import TypedDict

class MyState(TypedDict):
    # Our state will hold a single number called 'value'
    value: int
```
This tells our `StateGraph` that the shared information will always contain an integer named `value`.
**Step 2: Define the Nodes**
Nodes are functions (or LangChain Runnables) that perform the work. They take the current `State` as input and return a dictionary containing the *updates* to the state.
```python
# Node 1: Adds 1 to the value
def add_one(state: MyState) -> dict:
    print("--- Running Adder Node ---")
    current_value = state['value']
    new_value = current_value + 1
    print(f"Input value: {current_value}, Output value: {new_value}")
    # Return *only* the key we want to update
    return {"value": new_value}

# Node 2: Multiplies the value by 2
def multiply_by_two(state: MyState) -> dict:
    print("--- Running Multiplier Node ---")
    current_value = state['value']
    new_value = current_value * 2
    print(f"Input value: {current_value}, Output value: {new_value}")
    # Return the update
    return {"value": new_value}
```
Notice how each function takes `state` and returns a `dict` specifying which part of the state (`"value"`) should be updated and with what new value.
**Step 3: Create the Graph and Add Nodes/Edges**
Now we assemble our blueprint using `StateGraph`.
```python
from langgraph.graph import StateGraph, END, START
# Create a StateGraph instance linked to our state definition
workflow = StateGraph(MyState)
# Add the nodes to the graph
workflow.add_node("adder", add_one)
workflow.add_node("multiplier", multiply_by_two)
# Set the entry point --> where does the flow start?
workflow.set_entry_point("adder")
# Add edges --> how do the nodes connect?
workflow.add_edge("adder", "multiplier") # After adder, run multiplier
# Set the finish point --> where does the flow end?
# We use the special identifier END
workflow.add_edge("multiplier", END)
```
* `StateGraph(MyState)`: Creates the graph, telling it to use our `MyState` structure.
* `add_node("name", function)`: Registers our functions as steps in the graph with unique names.
* `set_entry_point("adder")`: Specifies that the `adder` node should run first. This implicitly creates an edge from a special `START` point to `adder`.
* `add_edge("adder", "multiplier")`: Creates a connection. After `adder` finishes, `multiplier` will run.
* `add_edge("multiplier", END)`: Specifies that after `multiplier` finishes, the graph execution should stop. `END` is a special marker for the graph's conclusion.
**Step 4: Compile the Graph**
Before we can run it, we need to `compile` the graph. This finalizes the structure and makes it executable.
```python
# Compile the workflow into an executable object
app = workflow.compile()
```
**Step 5: Run It!**
Now we can invoke our compiled graph (`app`) with some initial state.
```python
# Define the initial state
initial_state = {"value": 5}
# Run the graph
final_state = app.invoke(initial_state)
# Print the final result
print("\n--- Final State ---")
print(final_state)
```
**Expected Output:**
```text
--- Running Adder Node ---
Input value: 5, Output value: 6
--- Running Multiplier Node ---
Input value: 6, Output value: 12
--- Final State ---
{'value': 12}
```
As you can see, the graph executed the nodes in the defined order (`adder` then `multiplier`), automatically passing the updated state between them!
## How Does `StateGraph` Work Under the Hood?
You defined the nodes and edges, but what actually happens when you call `invoke()`?
1. **Initialization:** LangGraph takes your initial input (`{"value": 5}`) and puts it onto the "whiteboard" (the internal state).
2. **Execution Engine:** A powerful internal component called the [Pregel Execution Engine](05_pregel_execution_engine.md) takes over. It looks at the current state and the graph structure.
3. **Following Edges:** It starts at the `START` node and follows the edge to the entry point (`adder`).
4. **Node Execution:** It runs the `adder` function, passing it the current state (`{"value": 5}`).
5. **State Update:** The `adder` function returns `{"value": 6}`. The Pregel engine uses special mechanisms called [Channels](03_channels.md) to update the value associated with the `"value"` key on the "whiteboard". The state is now `{"value": 6}`.
6. **Next Step:** The engine sees the edge from `adder` to `multiplier`.
7. **Node Execution:** It runs the `multiplier` function, passing it the *updated* state (`{"value": 6}`).
8. **State Update:** `multiplier` returns `{"value": 12}`. The engine updates the state again via the [Channels](03_channels.md). The state is now `{"value": 12}`.
9. **Following Edges:** The engine sees the edge from `multiplier` to `END`.
10. **Finish:** Reaching `END` signals the execution is complete. The final state (`{"value": 12}`) is returned.
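You can watch this cycle step by step by streaming the compiled graph instead of invoking it. A small sketch, assuming the `app` compiled above and LangGraph's `stream_mode="updates"` option:

```python
# Stream the run to see each node's state update as the engine applies it.
for step in app.stream({"value": 5}, stream_mode="updates"):
    print(step)

# Expected output: one dict per node, keyed by the node's name
# {'adder': {'value': 6}}
# {'multiplier': {'value': 12}}
```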
Here's a simplified visual:
```mermaid
sequenceDiagram
participant User
participant App (CompiledGraph)
participant State
participant AdderNode as adder
participant MultiplierNode as multiplier
User->>App: invoke({"value": 5})
App->>State: Initialize state = {"value": 5}
App->>AdderNode: Execute(state)
AdderNode->>State: Read value (5)
AdderNode-->>App: Return {"value": 6}
App->>State: Update state = {"value": 6}
App->>MultiplierNode: Execute(state)
MultiplierNode->>State: Read value (6)
MultiplierNode-->>App: Return {"value": 12}
App->>State: Update state = {"value": 12}
App->>User: Return final state {"value": 12}
```
Don't worry too much about the details of Pregel or Channels yet; we'll cover them in later chapters. The key takeaway is that `StateGraph` manages the state and orchestrates the execution based on your defined nodes and edges.
## A Peek at the Code (`graph/state.py`, `graph/graph.py`)
Let's briefly look at the code snippets provided to see how these concepts map to the implementation:
* **`StateGraph.__init__` (`graph/state.py`)**:
```python
# Simplified view
class StateGraph(Graph):
def __init__(self, state_schema: Optional[Type[Any]] = None, ...):
super().__init__()
# ... stores the state_schema ...
self.schema = state_schema
# ... analyzes the schema to understand state keys and how to update them ...
self._add_schema(state_schema)
# ... sets up internal dictionaries for channels, nodes etc. ...
```
This code initializes the graph, crucially storing the `state_schema` you provide. It analyzes this schema to figure out the "keys" on your whiteboard (like `"value"`) and sets up the internal structures ([Channels](03_channels.md)) needed to manage updates to each key.
* **`StateGraph.add_node` (`graph/state.py`)**:
```python
# Simplified view
def add_node(self, node: str, action: RunnableLike, ...):
# ... basic checks for name conflicts, reserved names (START, END) ...
if node in self.channels: # Cannot use a state key name as a node name
raise ValueError(...)
# ... wrap the provided action (function/runnable) ...
runnable = coerce_to_runnable(action, ...)
# ... store the node details (runnable, input type etc.) ...
self.nodes[node] = StateNodeSpec(runnable, ..., input=input or self.schema, ...)
return self
```
When you add a node, it stores the associated function (`action`) and links it to the provided `node` name. It also figures out what input schema the node expects (usually the main graph state schema).
* **`Graph.add_edge` (`graph/graph.py`)**:
```python
# Simplified view from the base Graph class
def add_edge(self, start_key: str, end_key: str):
# ... checks for invalid edges (e.g., starting from END) ...
# ... basic validation ...
# Stores the connection as a simple pair
self.edges.add((start_key, end_key))
return self
```
Adding an edge is relatively simple: it just records the `(start_key, end_key)` pair in a set, representing the connection.
* **`StateGraph.compile` (`graph/state.py`)**:
```python
# Simplified view
def compile(self, ...):
# ... validation checks ...
self.validate(...)
# ... create the CompiledStateGraph instance ...
compiled = CompiledStateGraph(builder=self, ...)
# ... add nodes, edges, branches to the compiled version ...
for key, node in self.nodes.items():
compiled.attach_node(key, node)
for start, end in self.edges:
compiled.attach_edge(start, end)
# ... more setup for branches, entry/exit points ...
# ... finalize and return the compiled graph ...
return compiled.validate()
```
Compilation takes your defined nodes and edges and builds the final, executable `CompiledStateGraph`. It sets up the internal machinery ([Pregel](05_pregel_execution_engine.md), [Channels](03_channels.md)) based on your blueprint.
## Conclusion
You've learned the fundamental concept in LangGraph: the **Graph**.
* Graphs define the structure and flow of your application using **Nodes** (steps) and **Edges** (connections).
* **`StateGraph`** is the most common type, where nodes communicate implicitly by reading and updating a shared **State** object (like a whiteboard).
* **`MessageGraph`** is a specialized `StateGraph` for easily building chatbots.
* You define the state structure, write node functions that update parts of the state, connect them with edges, and `compile` the graph to make it runnable.
Now that you understand how to define the overall *structure* of your application using `StateGraph`, the next step is to dive deeper into what constitutes a **Node**.
Let's move on to [Chapter 2: Nodes (`PregelNode`)](02_nodes___pregelnode__.md) to explore how individual steps are defined and executed.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,224 @@
# Chapter 2: Nodes (`PregelNode`) - The Workers of Your Graph
In [Chapter 1: Graph / StateGraph](01_graph___stategraph.md), we learned how `StateGraph` acts as a blueprint or a flowchart for our application. It defines the overall structure and the shared "whiteboard" (the State) that holds information.
But who actually does the work? If the `StateGraph` is the assembly line blueprint, who are the workers on the line?
That's where **Nodes** come in!
## What Problem Do Nodes Solve?
Think back to our cake baking analogy from Chapter 1. We had steps like "mix dry ingredients," "mix wet ingredients," "combine," etc. Each of these distinct actions needs to be performed by someone or something.
In LangGraph, **Nodes** represent these individual units of work or computation steps within your graph.
* **Analogy:** Imagine chefs in a kitchen (the graph). Each chef (node) has a specific task: one chops vegetables, another mixes the sauce, another cooks the main course. They all work with shared ingredients (the state) from the pantry and fridge, and they put their finished components back for others to use.
Nodes are the core building blocks that perform the actual logic of your application.
## Key Concepts: What Makes a Node?
1. **The Action:** At its heart, a node is usually a Python function or a LangChain Runnable. This is the code that gets executed when the node runs.
2. **Input:** A node typically reads data it needs from the shared graph **State**. It receives the *current* state when it's invoked. In our `StateGraph` example from Chapter 1, both `add_one` and `multiply_by_two` received the `state` dictionary containing the current `value`.
3. **Execution:** The node runs its defined logic (the function or Runnable).
4. **Output:** After executing, a node in a `StateGraph` returns a dictionary. This dictionary specifies *which parts* of the shared state the node wants to *update* and what the new values should be. LangGraph takes care of merging these updates back into the main state.
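To illustrate that last point, here is a small sketch with a made-up multi-key state: the node returns only the keys it changes, and LangGraph leaves the rest of the state untouched when it merges the update.

```python
from typing import TypedDict

class ReportState(TypedDict):
    raw_text: str
    summary: str
    word_count: int

def summarize(state: ReportState) -> dict:
    text = state["raw_text"]
    # A stand-in "summary" purely for illustration
    return {
        "summary": text[:40] + "...",
        "word_count": len(text.split()),
        # 'raw_text' is not returned, so it keeps its current value
    }
```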
## Adding Nodes to Your Graph (`add_node`)
How do we tell our `StateGraph` about these workers? We use the `add_node` method.
Let's revisit the code from Chapter 1:
**Step 1: Define the Node Functions**
These are our "workers". They take the state and return updates.
```python
from typing import TypedDict

# Define the state structure (the whiteboard)
class MyState(TypedDict):
    value: int

# Node 1: Adds 1 to the value
def add_one(state: MyState) -> dict:
    print("--- Running Adder Node ---")
    current_value = state['value']
    new_value = current_value + 1
    print(f"Input value: {current_value}, Output value: {new_value}")
    # Return *only* the key we want to update
    return {"value": new_value}

# Node 2: Multiplies the value by 2
def multiply_by_two(state: MyState) -> dict:
    print("--- Running Multiplier Node ---")
    current_value = state['value']
    new_value = current_value * 2
    print(f"Input value: {current_value}, Output value: {new_value}")
    # Return the update
    return {"value": new_value}
```
**Step 2: Create the Graph and Add Nodes**
Here's where we hire our workers and assign them names on the assembly line.
```python
from langgraph.graph import StateGraph
# Create the graph builder linked to our state
workflow = StateGraph(MyState)
# Add the first node:
# Give it the name "adder" and tell it to use the 'add_one' function
workflow.add_node("adder", add_one)
# Add the second node:
# Give it the name "multiplier" and tell it to use the 'multiply_by_two' function
workflow.add_node("multiplier", multiply_by_two)
# (Edges like set_entry_point, add_edge, etc. define the flow *between* nodes)
# ... add edges and compile ...
```
* `workflow.add_node("adder", add_one)`: This line registers the `add_one` function as a node within the `workflow` graph. We give it the unique name `"adder"`. When the graph needs to execute the "adder" step, it will call our `add_one` function.
* `workflow.add_node("multiplier", multiply_by_two)`: Similarly, this registers the `multiply_by_two` function under the name `"multiplier"`.
It's that simple! You define what a step does (the function) and then register it with `add_node`, giving it a name so you can connect it using edges later.
## How Do Nodes Actually Run? (Under the Hood)
You've defined the functions and added them as nodes. What happens internally when the graph executes?
1. **Triggering:** The [Pregel Execution Engine](05_pregel_execution_engine.md) (LangGraph's internal coordinator) determines which node should run next based on the graph's structure (edges) and the current state. For example, after the `START` point, it knows to run the entry point node ("adder" in our example).
2. **Reading State:** Before running the node's function (`add_one`), the engine reads the necessary information from the shared state. It knows what the function needs (the `MyState` dictionary). This reading happens via mechanisms called [Channels](03_channels.md), which manage the shared state.
3. **Invoking the Function:** The engine calls the node's function (e.g., `add_one`), passing the state it just read (`{'value': 5}`).
4. **Executing Logic:** Your function's code runs (e.g., `5 + 1`).
5. **Receiving Updates:** The engine receives the dictionary returned by the function (e.g., `{'value': 6}`).
6. **Writing State:** The engine uses [Channels](03_channels.md) again to update the shared state with the information from the returned dictionary. The state on the "whiteboard" is now modified (e.g., becomes `{'value': 6}`).
7. **Next Step:** The engine then looks for the next edge originating from the completed node ("adder") to determine what runs next ("multiplier").
Here's a simplified view of the "adder" node executing:
```mermaid
sequenceDiagram
participant Engine as Pregel Engine
participant State (via Channels)
participant AdderNode as adder (add_one func)
Engine->>State (via Channels): Read 'value' (current state is {'value': 5})
State (via Channels)-->>Engine: Returns {'value': 5}
Engine->>AdderNode: Invoke add_one({'value': 5})
Note over AdderNode: Function executes: 5 + 1 = 6
AdderNode-->>Engine: Return {'value': 6}
Engine->>State (via Channels): Write update: 'value' = 6
State (via Channels)-->>Engine: Acknowledge (state is now {'value': 6})
Engine->>Engine: Find next node based on edge from "adder"
```
## A Peek at the Code (`graph/state.py`, `pregel/read.py`)
Let's look at simplified snippets to see how this maps to the code:
* **`StateGraph.add_node` (`graph/state.py`)**:
```python
# Simplified view
class StateGraph(Graph):
# ... (other methods) ...
def add_node(
self,
node: str, # The name you give the node (e.g., "adder")
action: RunnableLike, # The function or Runnable (e.g., add_one)
*,
# ... other optional parameters ...
input: Optional[Type[Any]] = None, # Optional: specific input type for this node
) -> Self:
# ... (checks for valid name, etc.) ...
if node in self.channels: # Can't use a state key name as a node name
raise ValueError(...)
# Converts your function into a standard LangChain Runnable if needed
runnable = coerce_to_runnable(action, ...)
# Stores the node's details, including the runnable and input schema
self.nodes[node] = StateNodeSpec(
runnable=runnable,
metadata=None, # Optional metadata
input=input or self.schema, # Default to graph's main state schema
# ... other details ...
)
return self
```
When you call `add_node`, LangGraph stores your function (`action`) under the given `node` name. It wraps your function into a standard `Runnable` object (`coerce_to_runnable`) and keeps track of what input schema it expects (usually the graph's main state schema). This stored information is a `StateNodeSpec`.
* **`CompiledStateGraph.attach_node` (`graph/state.py`)**:
```python
# Simplified view (during graph.compile())
class CompiledStateGraph(CompiledGraph):
# ... (other methods) ...
def attach_node(self, key: str, node: Optional[StateNodeSpec]) -> None:
# ... (handles START node specially) ...
if node is not None:
# Determine what parts of the state this node needs to read
input_schema = node.input
input_values = list(self.builder.schemas[input_schema]) # Keys to read
# Create the internal representation: PregelNode
self.nodes[key] = PregelNode(
triggers=[f"branch:to:{key}"], # When should this node run? (Connected via Channels)
channels=input_values, # What state keys does it read?
mapper=_pick_mapper(...), # How to format the input state for the function
writers=[ChannelWrite(...)], # How to write the output back to state (via Channels)
bound=node.runnable, # The actual function/Runnable to execute!
# ... other internal details ...
)
# ...
```
During the `compile()` step, the information stored in `StateNodeSpec` is used to create the actual operational node object, which is internally called `PregelNode`. This `PregelNode` is the real "worker" managed by the execution engine.
* **`PregelNode` (`pregel/read.py`)**:
```python
# Simplified view
class PregelNode(Runnable):
channels: Union[list[str], Mapping[str, str]] # State keys to read as input
triggers: list[str] # Channel updates that activate this node
mapper: Optional[Callable[[Any], Any]] # Function to format input state
writers: list[Runnable] # Runnables to write output back to Channels
bound: Runnable[Any, Any] # << THE ACTUAL FUNCTION/RUNNABLE YOU PROVIDED >>
# ... other attributes like retry policy, tags, etc. ...
def __init__(self, *, channels, triggers, writers, bound, ...) -> None:
self.channels = channels
self.triggers = list(triggers)
self.writers = writers or []
self.bound = bound # Your code lives here!
# ... initialize other attributes ...
# ... (methods for execution, handled by the Pregel engine) ...
```
The `PregelNode` object encapsulates everything needed to run your node:
* `bound`: This holds the actual function or Runnable you passed to `add_node`.
* `channels`: Specifies which parts of the state (managed by [Channels](03_channels.md)) to read as input.
* `triggers`: Specifies which [Channels](03_channels.md) must be updated to make this node eligible to run.
* `writers`: Defines how the output of `bound` should be written back to the state using [Channels](03_channels.md).
Don't worry too much about `PregelNode` details right now. The key idea is that `add_node` registers your function, and `compile` turns it into an executable component (`PregelNode`) that the graph engine can manage, telling it when to run, what state to read, and how to write results back.
## Conclusion
You've now learned about the "workers" in your LangGraph application: **Nodes**.
* Nodes are the individual computational steps defined by Python functions or LangChain Runnables.
* They read from the shared `StateGraph` state.
* They execute their logic.
* They return dictionaries specifying updates to the state.
* You add them to your graph using `graph.add_node("node_name", your_function)`.
* Internally, they are represented as `PregelNode` objects, managed by the execution engine.
We have the blueprint (`StateGraph`) and the workers (`Nodes`). But how exactly does information get passed around? How does the "adder" node's output (`{'value': 6}`) reliably get to the "multiplier" node? How is the state managed efficiently?
That's the role of [Chapter 3: Channels](03_channels.md), the communication system of the graph.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,426 @@
# Chapter 3: Channels - The Communication System
In [Chapter 1: Graph / StateGraph](01_graph___stategraph.md), we learned about the `StateGraph` as the blueprint for our application, holding the shared "whiteboard" or state. In [Chapter 2: Nodes (`PregelNode`)](02_nodes___pregelnode__.md), we met the "workers" or Nodes that perform tasks and read/write to this whiteboard.
But how does this "whiteboard" *actually* work? How does the information written by one node reliably get seen by the next? What happens if multiple nodes try to write to the *same part* of the whiteboard at roughly the same time?
This is where **Channels** come in. They are the fundamental mechanism for communication and state management within a `StateGraph`.
## What Problem Do Channels Solve?
Imagine our simple graph from Chapter 1:
```python
# State: {'value': int}
# Node 1: adder (reads 'value', returns {'value': value + 1})
# Node 2: multiplier (reads 'value', returns {'value': value * 2})
# Flow: START -> adder -> multiplier -> END
```
When `adder` runs with `{'value': 5}`, it returns `{'value': 6}`. How does this update the central state so that `multiplier` receives `{'value': 6}` and not the original `{'value': 5}`?
Furthermore, what if we had a more complex graph where two different nodes, say `node_A` and `node_B`, both finished their work and *both* wanted to update the `value` key in the same step? Should the final `value` be the one from `node_A`, the one from `node_B`, their sum, or something else?
**Channels** solve these problems by defining:
1. **Storage:** How the value for a specific key in the state is stored.
2. **Update Logic:** How incoming updates for that key are combined or processed.
## Channels: Mailboxes for Your State
Think of the shared state (our "whiteboard") not as one big surface, but as a collection of **mailboxes**.
* **Each key in your state dictionary (`MyState`) gets its own dedicated mailbox.** In our example, there's a mailbox labeled `"value"`.
* When a Node finishes and returns a dictionary (like `{'value': 6}`), the [Pregel Execution Engine](05_pregel_execution_engine.md) acts like a mail carrier. It takes the value `6` and puts it into the mailbox labeled `"value"`.
* When another Node needs to read the state, the engine goes to the relevant mailboxes (like `"value"`) and gets the current contents.
This mailbox concept ensures that updates intended for `"value"` only affect `"value"`, and updates for another key (say, `"messages"`) would go into *its* own separate mailbox.
**Crucially, each mailbox (Channel) has specific rules about how incoming mail (updates) is handled.** Does the new mail replace the old one? Is it added to a list? Is it mathematically combined with the previous value? These rules are defined by the **Channel Type**.
## How Channels Work: The Update Cycle
Here's a step-by-step view of how channels manage state during graph execution:
1. **Node Returns Update:** A node (e.g., `adder`) finishes and returns a dictionary (e.g., `{'value': 6}`).
2. **Engine Routes Update:** The [Pregel Execution Engine](05_pregel_execution_engine.md) sees the key `"value"` and routes the update `6` to the Channel associated with `"value"`.
3. **Channel Receives Update(s):** The `"value"` Channel receives `6`. If other nodes also returned updates for `"value"` in the same step, the Channel would receive all of them in a sequence (e.g., `[6, maybe_another_update]`).
4. **Channel Applies Update Logic:** The Channel uses its specific rule (its type) to process the incoming update(s). For example, a `LastValue` channel would just keep the *last* update it received in the sequence. A `BinaryOperatorAggregate` channel might *sum* all the updates with its current value.
5. **State is Updated:** The Channel now holds the new, processed value.
6. **Node Reads State:** When the next node (e.g., `multiplier`) needs the state, the Engine queries the relevant Channels (e.g., the `"value"` Channel).
7. **Channel Provides Value:** The Channel provides its current stored value (e.g., `6`) to the Engine, which passes it to the node.
This ensures that state updates are handled consistently according to predefined rules for each piece of state.
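To make step 4 above concrete, suppose two nodes wrote to the same key in the same step. In plain Python (outside LangGraph), the two most common update rules look like this:

```python
import operator
from functools import reduce

updates = [6, 7]  # two updates for the same key, arriving in the same step

# LastValue-style rule: keep only the last update received
last_value = updates[-1]                           # -> 7

# BinaryOperatorAggregate-style rule (addition): fold the updates together
running_total = reduce(operator.add, updates, 0)   # -> 13
```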
## Common Channel Types: Defining the Rules
LangGraph provides several types of Channels, each with different update logic. You usually define which channel type to use for a state key when you define your state `TypedDict`, often using `typing.Annotated`.
Here are the most common ones:
1. **`LastValue[T]`** (The Default Overwriter)
* **Rule:** Keeps only the **last** value it received. If multiple updates arrive in the same step, the final value is simply the last one in the sequence processed by the engine.
* **Analogy:** Like a standard variable assignment (`my_variable = new_value`). The old value is discarded.
* **When to Use:** This is the **default** for keys in your `TypedDict` state unless you specify otherwise with `Annotated`. It's perfect for state values that should be replaced entirely, like the current step's result or a user's latest query.
* **Code:** `langgraph.channels.LastValue` (from `channels/last_value.py`)
```python
# channels/last_value.py (Simplified)
class LastValue(Generic[Value], BaseChannel[Value, Value, Value]):
# ... (initializer, etc.)
value: Any = MISSING # Stores the single, last value
def update(self, values: Sequence[Value]) -> bool:
if len(values) == 0: # No updates this step
return False
# If multiple updates in one step, only the last one matters!
# Example: if values = [update1, update2], self.value becomes update2
self.value = values[-1]
return True
def get(self) -> Value:
if self.value is MISSING:
raise EmptyChannelError()
return self.value # Return the currently stored last value
```
* **How to Use (Implicitly):**
```python
from typing import TypedDict

class MyState(TypedDict):
    # Because we didn't use Annotated, LangGraph defaults to LastValue[int]
    value: int
    user_query: str  # Also defaults to LastValue[str]
```
2. **`BinaryOperatorAggregate[T]`** (The Combiner)
* **Rule:** Takes an initial "identity" value (like `0` for addition, `1` for multiplication) and a **binary operator** function (e.g., `+`, `*`, `operator.add`). When it receives updates, it applies the operator between its current value and each new update, accumulating the result.
* **Analogy:** Like a running total (`total += new_number`).
* **When to Use:** Useful for accumulating scores, counts, or combining numerical results.
* **Code:** `langgraph.channels.BinaryOperatorAggregate` (from `channels/binop.py`)
```python
# channels/binop.py (Simplified)
import operator
from typing import Callable
class BinaryOperatorAggregate(Generic[Value], BaseChannel[Value, Value, Value]):
# ... (initializer stores the operator and identity value)
value: Any = MISSING
operator: Callable[[Value, Value], Value]
def update(self, values: Sequence[Value]) -> bool:
if not values:
return False
# Start with the first value if the channel was empty
if self.value is MISSING:
self.value = values[0]
values = values[1:]
# Apply the operator for all subsequent values
for val in values:
self.value = self.operator(self.value, val)
return True
def get(self) -> Value:
# ... (return self.value, handling MISSING)
```
* **How to Use (Explicitly with `Annotated`):**
```python
import operator
from typing import TypedDict, Annotated
from langgraph.channels import BinaryOperatorAggregate

class AgentState(TypedDict):
    # Use Annotated to specify the channel type and operator
    total_score: Annotated[int, BinaryOperatorAggregate(int, operator.add)]
    # ^^^ state key 'total_score' will use BinaryOperatorAggregate with addition
```
3. **`Topic[T]`** (The Collector)
* **Rule:** Collects all updates it receives into a **list**. By default (`accumulate=False`), it clears the list after each step, so `get()` returns only the updates from the *immediately preceding* step. If `accumulate=True`, it keeps adding to the list across multiple steps.
* **Analogy:** Like appending to a log file or a list (`my_list.append(new_item)`).
* **When to Use:** Great for gathering messages in a conversation (`MessageGraph` uses this internally!), collecting events, or tracking a sequence of results.
* **Code:** `langgraph.channels.Topic` (from `channels/topic.py`)
```python
# channels/topic.py (Simplified)
from typing import Sequence, List, Union
class Topic(Generic[Value], BaseChannel[Sequence[Value], Union[Value, list[Value]], list[Value]]):
# ... (initializer sets accumulate flag)
values: list[Value]
accumulate: bool
def update(self, updates: Sequence[Union[Value, list[Value]]]) -> bool:
old_len = len(self.values)
# Clear list if not accumulating
if not self.accumulate:
self.values = []
# Flatten and extend the list with new updates
new_values = list(flatten(updates)) # flatten handles list-of-lists
self.values.extend(new_values)
return len(self.values) != old_len # Return True if list changed
def get(self) -> Sequence[Value]:
# ... (return list(self.values), handling empty)
```
* **How to Use (Explicitly with `Annotated`):**
```python
from typing import TypedDict, Annotated, List
from langgraph.channels import Topic

class ChatState(TypedDict):
    # Use Annotated to specify the Topic channel
    # The final type hint for the state is List[str]
    chat_history: Annotated[List[str], Topic(str, accumulate=True)]
    # ^^^ state key 'chat_history' will use Topic to accumulate strings
```
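Here is a small end-to-end sketch of `Topic` inside a graph (the node names and strings are made up for illustration, and the behavior assumes `accumulate=True` works as described above):

```python
from typing import TypedDict, Annotated, List
from langgraph.graph import StateGraph, END
from langgraph.channels import Topic

class ChatState(TypedDict):
    chat_history: Annotated[List[str], Topic(str, accumulate=True)]

def greet(state: ChatState) -> dict:
    return {"chat_history": "bot: hello!"}

def follow_up(state: ChatState) -> dict:
    return {"chat_history": "bot: how can I help?"}

workflow = StateGraph(ChatState)
workflow.add_node("greet", greet)
workflow.add_node("follow_up", follow_up)
workflow.set_entry_point("greet")
workflow.add_edge("greet", "follow_up")
workflow.add_edge("follow_up", END)
app = workflow.compile()

print(app.invoke({"chat_history": ["user: hi"]}))
# Expected: {'chat_history': ['user: hi', 'bot: hello!', 'bot: how can I help?']}
```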
There are other specialized channels like `EphemeralValue` (clears after reading) and `Context` (allows passing values down without modifying state), but `LastValue`, `BinaryOperatorAggregate`, and `Topic` are the most fundamental.
## Channels in Action: Our Simple Graph Revisited
Let's trace our `adder` -> `multiplier` graph again, focusing on the implicit `LastValue` channel for the `"value"` key:
```python
from typing import TypedDict
from langgraph.graph import StateGraph, END, START

# State uses implicit LastValue[int] for 'value'
class MyState(TypedDict):
    value: int

# Nodes (same as before)
def add_one(state: MyState) -> dict:
    return {"value": state['value'] + 1}

def multiply_by_two(state: MyState) -> dict:
    return {"value": state['value'] * 2}

# Graph setup (same as before)
workflow = StateGraph(MyState)
workflow.add_node("adder", add_one)
workflow.add_node("multiplier", multiply_by_two)
workflow.set_entry_point("adder")
workflow.add_edge("adder", "multiplier")
workflow.add_edge("multiplier", END)
app = workflow.compile()

# Execution with initial state {"value": 5}
initial_state = {"value": 5}
final_state = app.invoke(initial_state)
```
Here's the flow with the Channel involved:
```mermaid
sequenceDiagram
participant User
participant App as CompiledGraph
participant Engine as Pregel Engine
participant ValueChannel as "value" (LastValue)
participant AdderNode as adder
participant MultiplierNode as multiplier
User->>App: invoke({"value": 5})
App->>Engine: Start execution
Engine->>ValueChannel: Initialize/Set state from input (value = 5)
App->>Engine: Entry point is "adder"
Engine->>ValueChannel: Read current value (5)
ValueChannel-->>Engine: Returns 5
Engine->>AdderNode: Execute(state={'value': 5})
AdderNode-->>Engine: Return {"value": 6}
Engine->>ValueChannel: Update with [6]
Note over ValueChannel: LastValue rule: value becomes 6
ValueChannel-->>Engine: Acknowledge update
Engine->>Engine: Follow edge "adder" -> "multiplier"
Engine->>ValueChannel: Read current value (6)
ValueChannel-->>Engine: Returns 6
Engine->>MultiplierNode: Execute(state={'value': 6})
MultiplierNode-->>Engine: Return {"value": 12}
Engine->>ValueChannel: Update with [12]
Note over ValueChannel: LastValue rule: value becomes 12
ValueChannel-->>Engine: Acknowledge update
Engine->>Engine: Follow edge "multiplier" -> END
Engine->>ValueChannel: Read final value (12)
ValueChannel-->>Engine: Returns 12
Engine->>App: Execution finished, final state {'value': 12}
App->>User: Return final state {'value': 12}
```
The `LastValue` channel ensures that the output of `adder` correctly overwrites the initial state before `multiplier` reads it.
## Example: Using `BinaryOperatorAggregate` Explicitly
Let's modify the state to *sum* values instead of overwriting them.
```python
import operator
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END, START
# Import the channel type
from langgraph.channels import BinaryOperatorAggregate

# Define state with an explicitly configured channel
class SummingState(TypedDict):
    # Use Annotated to specify the channel and its operator (addition)
    value: Annotated[int, BinaryOperatorAggregate(int, operator.add)]

# Node 1: Returns 5 to be ADDED to the current value
def add_five(state: SummingState) -> dict:
    print(f"--- Running Adder Node 1 (current value: {state.get('value', 0)}) ---")
    # Note: We return the *increment*, not the new total
    return {"value": 5}

# Node 2: Returns 10 to be ADDED to the current value
def add_ten(state: SummingState) -> dict:
    print(f"--- Running Adder Node 2 (current value: {state['value']}) ---")
    # Note: We return the *increment*, not the new total
    return {"value": 10}

# Create graph
workflow = StateGraph(SummingState)
workflow.add_node("adder1", add_five)
workflow.add_node("adder2", add_ten)
workflow.set_entry_point("adder1")
workflow.add_edge("adder1", "adder2")
workflow.add_edge("adder2", END)
app = workflow.compile()

# Run with initial state value = 0 (BinaryOperatorAggregate defaults int to 0)
print("Invoking graph...")
# You could also provide an initial value: app.invoke({"value": 100})
final_state = app.invoke({})
print("\n--- Final State ---")
print(final_state)
```
**Expected Output:**
```text
Invoking graph...
--- Running Adder Node 1 (current value: 0) ---
--- Running Adder Node 2 (current value: 5) ---
--- Final State ---
{'value': 15}
```
Because we used `Annotated[int, BinaryOperatorAggregate(int, operator.add)]`, the `"value"` channel now *adds* incoming updates (`5` then `10`) to its current state, resulting in a final sum of `15`.
## How `StateGraph` Finds the Right Channel
You might wonder how `StateGraph` knows whether to use `LastValue` or something else. When you initialize `StateGraph(MyState)`, it inspects your state schema (`MyState`).
* It uses Python's `get_type_hints(MyState, include_extras=True)` to look at each field (like `value`).
* If a field has `Annotated[SomeType, SomeChannelConfig]`, it uses `SomeChannelConfig` (e.g., `BinaryOperatorAggregate(...)`, `Topic(...)`) to create the channel for that key.
* If a field is just `SomeType` (like `value: int`), it defaults to creating a `LastValue[SomeType]` channel for that key.
This logic is primarily handled within the `StateGraph._add_schema` method, which calls internal helpers like `_get_channels`.
```python
# graph/state.py (Simplified view of channel detection)
def _get_channels(schema: Type[dict]) -> tuple[...]:
# ... gets type hints including Annotated metadata ...
type_hints = get_type_hints(schema, include_extras=True)
all_keys = {}
for name, typ in type_hints.items():
# Checks if the annotation specifies a channel or binop
if channel := _is_field_channel(typ) or _is_field_binop(typ):
channel.key = name
all_keys[name] = channel
else:
# Default case: Use LastValue
fallback = LastValue(typ)
fallback.key = name
all_keys[name] = fallback
# ... separate BaseChannel instances from ManagedValueSpec ...
return channels, managed_values, type_hints
def _is_field_channel(typ: Type[Any]) -> Optional[BaseChannel]:
# Checks if Annotated metadata contains a BaseChannel instance or class
if hasattr(typ, "__metadata__"):
meta = typ.__metadata__
if len(meta) >= 1 and isinstance(meta[-1], BaseChannel):
return meta[-1] # Return the channel instance directly
# ... (handle channel classes too) ...
return None
def _is_field_binop(typ: Type[Any]) -> Optional[BinaryOperatorAggregate]:
# Checks if Annotated metadata contains a callable (the reducer function)
if hasattr(typ, "__metadata__"):
meta = typ.__metadata__
if len(meta) >= 1 and callable(meta[-1]):
# ... (validate function signature) ...
return BinaryOperatorAggregate(typ, meta[-1]) # Create binop channel
return None
# --- In StateGraph.__init__ ---
# self._add_schema(state_schema) # This calls _get_channels
```
## Under the Hood: `BaseChannel`
All channel types inherit from a base class called `BaseChannel`. This class defines the common interface that the [Pregel Execution Engine](05_pregel_execution_engine.md) uses to interact with any channel.
```python
# channels/base.py (Simplified Abstract Base Class)
from abc import ABC, abstractmethod
from typing import Generic, Sequence, TypeVar
Value = TypeVar("Value") # The type of the stored state
Update = TypeVar("Update") # The type of incoming updates
Checkpoint = TypeVar("Checkpoint") # The type of saved state
class BaseChannel(Generic[Value, Update, Checkpoint], ABC):
# ... (init, type properties) ...
@abstractmethod
def update(self, values: Sequence[Update]) -> bool:
"""Combines the sequence of updates with the current channel value."""
# Must be implemented by subclasses (like LastValue, Topic)
pass
@abstractmethod
def get(self) -> Value:
"""Returns the current value of the channel."""
# Must be implemented by subclasses
pass
@abstractmethod
def checkpoint(self) -> Checkpoint:
"""Returns a serializable representation of the channel's state."""
# Used by the Checkpointer
pass
@abstractmethod
def from_checkpoint(self, checkpoint: Checkpoint) -> Self:
"""Creates a new channel instance from a saved checkpoint."""
# Used by the Checkpointer
pass
```
The specific logic for `LastValue`, `Topic`, `BinaryOperatorAggregate`, etc., is implemented within their respective `update` and `get` methods, adhering to this common interface. The `checkpoint` and `from_checkpoint` methods are crucial for saving and loading the graph's state, which we'll explore more in [Chapter 6: Checkpointer (`BaseCheckpointSaver`)](06_checkpointer___basecheckpointsaver__.md).
## Conclusion
You've learned about **Channels**, the crucial communication and state management system within LangGraph's `StateGraph`.
* Channels act like **mailboxes** for each key in your graph's state.
* They define **how updates are combined** when nodes write to the state.
* The default channel is **`LastValue`**, which overwrites the previous value.
* You can use `typing.Annotated` in your state definition to specify other channel types like **`BinaryOperatorAggregate`** (for combining values, e.g., summing) or **`Topic`** (for collecting updates into a list).
* `StateGraph` automatically creates the correct channel for each state key based on your type hints.
Understanding channels helps you control precisely how information flows and accumulates in your stateful applications.
Now that we know how the state is managed (Channels) and how work gets done (Nodes), how do we control the *flow* of execution? What if we want to go to different nodes based on the current state? That's where conditional logic comes in.
Let's move on to [Chapter 4: Control Flow Primitives (`Branch`, `Send`, `Interrupt`)](04_control_flow_primitives___branch____send____interrupt__.md) to learn how to direct the traffic within our graph.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,608 @@
# Chapter 4: Control Flow Primitives (`Branch`, `Send`, `Interrupt`)
In [Chapter 3: Channels](03_channels.md), we saw how information is stored and updated in our graph's shared state using Channels. We have the blueprint ([`StateGraph`](01_graph___stategraph.md)), the workers ([`Nodes`](02_nodes___pregelnode__.md)), and the communication system ([Channels](03_channels.md)).
But what if we don't want our graph to follow a single, fixed path? What if we need it to make decisions? For example, imagine a chatbot: sometimes it needs to use a tool (like a search engine), and other times it can answer directly. How do we tell the graph *which* path to take based on the current situation?
This is where **Control Flow Primitives** come in. They are special mechanisms that allow you to dynamically direct the execution path of your graph, making it much more flexible and powerful.
## What Problem Do Control Flow Primitives Solve?
Think of our graph like a train system. So far, we've only built tracks that go in a straight line from one station (node) to the next. Control flow primitives are like the **switches** and **signals** that allow the train (our execution flow) to:
1. **Choose a path:** Decide whether to go left or right at a junction based on some condition (like an "if" statement).
2. **Dispatch specific trains:** Send a specific piece of cargo directly to a particular station, maybe even multiple pieces to the same station to be processed in parallel.
3. **Wait for instructions:** Pause the train journey until an external signal (like human approval) is given.
LangGraph provides three main primitives for this:
* **`Branch`**: Acts like a conditional router or switch ("if/else"). It directs the flow to different nodes based on the current state.
* **`Send`**: Allows a node to directly trigger another node with specific input, useful for parallel processing patterns like map-reduce.
* **`Interrupt`**: Pauses the graph execution, usually to wait for external input (like a human clicking "Approve") before continuing.
Let's explore each one.
## 1. `Branch` - The Conditional Router
Imagine our chatbot needs to decide: "Should I use the search tool, or can I answer from my knowledge?" This decision depends on the conversation history or the user's specific question stored in the graph's state.
The `Branch` primitive allows us to implement this kind of conditional logic. You add it using the `graph.add_conditional_edges()` method.
**How it Works:**
1. You define a regular node (let's call it `should_i_search`).
2. You define a separate **routing function**. This function takes the current state and decides *which node* should run next. It returns the name of the next node (or a list of names).
3. You connect the `should_i_search` node to the routing function using `add_conditional_edges`. You tell it: "After `should_i_search` finishes, call this routing function to decide where to go next."
4. You provide a mapping (a dictionary) that links the possible return values of your routing function to the actual node names in your graph.
**Example: Chatbot Deciding to Search**
Let's build a tiny graph that decides whether to go to a `search_tool` node or a `respond_directly` node.
**Step 1: Define State**
```python
from typing import TypedDict, Annotated, List
import operator
class ChatState(TypedDict):
user_query: str
# We'll store the decision here
next_action: str
# Keep track of intermediate results
search_result: Annotated[List[str], operator.add] # Use Topic or add if accumulating
final_response: str
```
Our state holds the user's query and a field `next_action` to store the decision.
**Step 2: Define Nodes**
```python
# Node that decides the next step
def determine_action(state: ChatState) -> dict:
print("--- Determining Action ---")
query = state['user_query']
if "weather" in query.lower():
print("Decision: Need to use search tool for weather.")
return {"next_action": "USE_TOOL"}
else:
print("Decision: Can respond directly.")
return {"next_action": "RESPOND"}
# Node representing the search tool
def run_search_tool(state: ChatState) -> dict:
print("--- Using Search Tool ---")
query = state['user_query']
# Simulate finding a result
result = f"Search result for '{query}': It's sunny!"
# We return the result to be ADDED to the state list
return {"search_result": [result]} # Return as list for operator.add
# Node that generates a final response
def generate_response(state: ChatState) -> dict:
print("--- Generating Response ---")
if state.get("search_result"):
response = f"Based on my search: {state['search_result'][-1]}"
else:
response = f"Responding directly to: {state['user_query']}"
return {"final_response": response}
```
**Step 3: Define the Routing Function**
This function reads the `next_action` from the state and returns the *key* we'll use in our mapping.
```python
def route_based_on_action(state: ChatState) -> str:
print("--- Routing ---")
action = state['next_action']
print(f"Routing based on action: {action}")
if action == "USE_TOOL":
return "route_to_tool" # This key must match our path_map
else:
return "route_to_respond" # This key must match our path_map
```
**Step 4: Build the Graph with Conditional Edges**
```python
from langgraph.graph import StateGraph, END, START
workflow = StateGraph(ChatState)
workflow.add_node("decider", determine_action)
workflow.add_node("search_tool", run_search_tool)
workflow.add_node("responder", generate_response)
workflow.set_entry_point("decider")
# After 'decider', call 'route_based_on_action' to choose the next step
workflow.add_conditional_edges(
"decider", # Start node
route_based_on_action, # The routing function
{
# Map the routing function's output to actual node names
"route_to_tool": "search_tool",
"route_to_respond": "responder"
}
)
# Define what happens *after* the conditional paths
workflow.add_edge("search_tool", "responder") # After searching, generate response
workflow.add_edge("responder", END) # After responding, end
# Compile
app = workflow.compile()
```
* `add_conditional_edges("decider", route_based_on_action, ...)`: This is the key part. It tells LangGraph: after the "decider" node runs, execute the `route_based_on_action` function.
* `path_map = {"route_to_tool": "search_tool", ...}`: This dictionary maps the string returned by `route_based_on_action` to the actual next node to execute.
**Step 5: Run It!**
```python
# Scenario 1: Query needs the tool
print("--- Scenario 1: Weather Query ---")
input1 = {"user_query": "What's the weather like?"}
final_state1 = app.invoke(input1)
print("Final State 1:", final_state1)
print("\n--- Scenario 2: Direct Response ---")
# Scenario 2: Query doesn't need the tool
input2 = {"user_query": "Tell me a joke."}
final_state2 = app.invoke(input2)
print("Final State 2:", final_state2)
```
**Expected Output:**
```text
--- Scenario 1: Weather Query ---
--- Determining Action ---
Decision: Need to use search tool for weather.
--- Routing ---
Routing based on action: USE_TOOL
--- Using Search Tool ---
--- Generating Response ---
Final State 1: {'user_query': "What's the weather like?", 'next_action': 'USE_TOOL', 'search_result': ["Search result for 'What's the weather like?': It's sunny!"], 'final_response': "Based on my search: Search result for 'What's the weather like?': It's sunny!"}
--- Scenario 2: Direct Response ---
--- Determining Action ---
Decision: Can respond directly.
--- Routing ---
Routing based on action: RESPOND
--- Generating Response ---
Final State 2: {'user_query': 'Tell me a joke.', 'next_action': 'RESPOND', 'search_result': [], 'final_response': 'Responding directly to: Tell me a joke.'}
```
See how the graph took different paths based on the `next_action` set by the `decider` node and interpreted by the `route_based_on_action` function!
**Visualizing the Branch:**
```mermaid
graph TD
Start[START] --> Decider(decider);
Decider -- route_based_on_action --> Route{Routing Logic};
Route -- "route_to_tool" --> Search(search_tool);
Route -- "route_to_respond" --> Respond(responder);
Search --> Respond;
Respond --> End(END);
```
**Internals (`graph/branch.py`)**
* When you call `add_conditional_edges`, LangGraph stores a `Branch` object (`graph/branch.py`). This object holds your routing function (`path`) and the mapping (`path_map` / `ends`).
* During execution, after the source node ("decider") finishes, the [Pregel Execution Engine](05_pregel_execution_engine.md) runs the `Branch` object.
* The `Branch.run()` method eventually calls your routing function (`_route` or `_aroute` internally) with the current state.
* It takes the return value (e.g., "route_to_tool"), looks it up in the `ends` dictionary to get the actual node name ("search_tool"), and tells the engine to schedule that node next.
```python
# graph/branch.py (Simplified view)
class Branch(NamedTuple):
path: Runnable # Your routing function wrapped as a Runnable
ends: Optional[dict[Hashable, str]] # Your path_map
# ... other fields ...
def _route(self, input: Any, config: RunnableConfig, ...) -> Runnable:
# ... reads current state if needed ...
value = ... # Get the state
result = self.path.invoke(value, config) # Call your routing function
# ... determines destination node(s) using self.ends mapping ...
destinations = [self.ends[r] for r in result]
# ... tells the engine (via writer) which node(s) to run next ...
return writer(destinations, config) or input # writer is a callback to the engine
# graph/state.py (Simplified view)
class StateGraph(Graph):
# ...
def add_conditional_edges(self, source, path, path_map, ...):
# ... wrap 'path' into a Runnable ...
runnable_path = coerce_to_runnable(path, ...)
# Create and store the Branch object
self.branches[source][name] = Branch.from_path(runnable_path, path_map, ...)
return self
```
## 2. `Send` - Directing Specific Traffic
Sometimes, you don't just want to choose *one* path, but you want to trigger a *specific* node with *specific* data, possibly multiple times. This is common in "map-reduce" patterns where you split a task into smaller pieces, process each piece independently, and then combine the results.
The `Send` primitive allows a node (or a conditional edge function) to directly "send" a piece of data to another node, telling the engine: "Run *this* node next, and give it *this* input."
**How it Works:**
1. You import `Send` from `langgraph.graph` (or `langgraph.types`).
2. In a node or a conditional edge function, instead of just returning a state update or a node name, you return `Send(target_node_name, data_for_that_node)`.
3. You can return a list of `Send` objects to trigger multiple node executions, potentially in parallel (depending on the executor).
**Example: Simple Map-Reduce**
Let's imagine we want to process a list of items. One node splits the list, another node processes each item individually (the "map" step), and a final node aggregates the results (the "reduce" step).
**Step 1: Define State**
```python
from typing import TypedDict, List, Annotated
import operator
class MapReduceState(TypedDict):
items_to_process: List[str]
# Use Topic or operator.add to collect results from worker nodes
processed_items: Annotated[List[str], operator.add]
final_result: str
```
**Step 2: Define Nodes**
```python
# Node to prepare items (not really needed here, but shows the flow)
def prepare_items(state: MapReduceState) -> dict:
print("--- Preparing Items (No change) ---")
# In a real scenario, this might fetch or generate the items
return {}
# Node to process a single item (Our "Worker")
def process_single_item(state: dict) -> dict:
# Note: This node receives the dict passed via Send, NOT the full MapReduceState
item = state['item']
print(f"--- Processing Item: {item} ---")
processed = f"Processed_{item.upper()}"
# Return the processed item to be ADDED to the list in the main state
return {"processed_items": [processed]} # Return list for operator.add
# Node to aggregate results
def aggregate_results(state: MapReduceState) -> dict:
print("--- Aggregating Results ---")
all_processed = state['processed_items']
final = ", ".join(all_processed)
return {"final_result": final}
```
**Step 3: Define the Dispatching Function (using `Send`)**
This function will run after `prepare_items` and will use `Send` to trigger `process_single_item` for each item.
```python
from langgraph.graph import Send # Import Send
def dispatch_work(state: MapReduceState) -> List[Send]:
print("--- Dispatching Work ---")
items = state['items_to_process']
send_packets = []
for item in items:
print(f"Sending item '{item}' to worker node.")
# Create a Send object for each item
# Target node: "worker"
# Data payload: a dictionary {'item': current_item}
packet = Send("worker", {"item": item})
send_packets.append(packet)
return send_packets # Return a list of Send objects
```
**Step 4: Build the Graph**
```python
from langgraph.graph import StateGraph, END, START
workflow = StateGraph(MapReduceState)
workflow.add_node("preparer", prepare_items)
workflow.add_node("worker", process_single_item) # The node targeted by Send
workflow.add_node("aggregator", aggregate_results)
workflow.set_entry_point("preparer")
# After 'preparer', call 'dispatch_work' which returns Send packets
workflow.add_conditional_edges("preparer", dispatch_work)
# NOTE: We don't need a path_map here because dispatch_work directly
# returns Send objects specifying the target node.
# The 'worker' node outputs are aggregated implicitly by the 'processed_items' channel.
# We need an edge to tell the graph when to run the aggregator.
# Let's wait until ALL workers triggered by Send are done.
# We can achieve this implicitly if the aggregator reads state written by workers.
# A simple edge ensures aggregator runs *after* the step involving workers.
# (More complex aggregation might need explicit barrier channels)
workflow.add_edge("worker", "aggregator")
workflow.add_edge("aggregator", END)
# Compile
app = workflow.compile()
```
**Step 5: Run It!**
```python
input_state = {"items_to_process": ["apple", "banana", "cherry"]}
final_state = app.invoke(input_state)
print("\nFinal State:", final_state)
```
**Expected Output (order of processing might vary):**
```text
--- Preparing Items (No change) ---
--- Dispatching Work ---
Sending item 'apple' to worker node.
Sending item 'banana' to worker node.
Sending item 'cherry' to worker node.
--- Processing Item: apple ---
--- Processing Item: banana ---
--- Processing Item: cherry ---
--- Aggregating Results ---
Final State: {'items_to_process': ['apple', 'banana', 'cherry'], 'processed_items': ['Processed_APPLE', 'Processed_BANANA', 'Processed_CHERRY'], 'final_result': 'Processed_APPLE, Processed_BANANA, Processed_CHERRY'}
```
The `dispatch_work` function returned three `Send` objects. The LangGraph engine then scheduled the "worker" node to run three times, each time with a different input dictionary (`{'item': 'apple'}`, `{'item': 'banana'}`, `{'item': 'cherry'}`). The results were automatically collected in `processed_items` thanks to the `operator.add` reducer on our `Annotated` state key. Finally, the `aggregator` ran.
**Internals (`types.py`, `constants.py`)**
* `Send(node, arg)` is a simple data class defined in `langgraph/types.py`.
* When a node or branch returns `Send` objects, the engine collects them. Internally, these are often associated with a special channel key like `TASKS` (defined in `langgraph/constants.py`).
* The [Pregel Execution Engine](05_pregel_execution_engine.md) processes these `TASKS`. For each `Send(node, arg)`, it schedules the target `node` to run in the *next* step, passing `arg` as its input.
* This allows for dynamic, data-driven invocation of nodes outside the standard edge connections.
```python
# types.py (Simplified view)
class Send:
__slots__ = ("node", "arg")
node: str # Target node name
arg: Any # Data payload for the node
def __init__(self, /, node: str, arg: Any) -> None:
self.node = node
self.arg = arg
# ... repr, eq, hash ...
# constants.py (Simplified view)
TASKS = sys.intern("__pregel_tasks") # Internal key for Send objects
# pregel/algo.py (Conceptual idea during task processing)
# if write is for TASKS channel:
# packet = write_value # This is the Send object
# # Schedule packet.node to run in the next step with packet.arg
# schedule_task(node=packet.node, input=packet.arg, ...)
```
## 3. `Interrupt` - Pausing for Instructions
Sometimes, your graph needs to stop and wait for external input before proceeding. A common case is Human-in-the-Loop (HITL), where an AI agent proposes a plan or an action, and a human needs to approve it.
The `Interrupt` primitive allows a node to pause the graph's execution and wait. This requires a [Checkpointer](06_checkpointer___basecheckpointsaver__.md) to be configured, as the graph needs to save its state to be resumable later.
**How it Works:**
1. You import `interrupt` from `langgraph.types`.
2. Inside a node, you call `interrupt(value_to_send_to_client)`.
3. This immediately raises a special `GraphInterrupt` exception.
4. The LangGraph engine catches this, saves the current state using the checkpointer, and returns control to your calling code, often signaling that an interrupt occurred. The `value_to_send_to_client` is included in the information returned.
5. Later, you can resume the graph execution by providing a value. This is typically done by invoking the compiled graph again with a special `Command(resume=value_for_interrupt)` object (from `langgraph.types`) and the same configuration (including the thread ID for the checkpointer).
6. When resumed, the graph loads the saved state. The execution engine restarts the *interrupted node from the beginning*. When the code reaches the `interrupt()` call again, instead of raising an exception, it *returns* the `value_for_interrupt` that you provided when resuming. The node then continues executing from that point.
**Example: Human Approval Step**
Let's create a graph where a node plans an action, another node presents it for human approval (using `interrupt`), and a final node executes it if approved.
**Step 1: Define State**
```python
from typing import TypedDict, Optional
class ApprovalState(TypedDict):
plan: str
# We'll use the resume value to implicitly know if approved
feedback: Optional[str] # Store feedback/approval status
```
**Step 2: Define Nodes (including interrupt)**
```python
from langgraph.types import interrupt, Command # Import interrupt and Command
# Node that creates a plan
def create_plan(state: ApprovalState) -> dict:
print("--- Creating Plan ---")
plan = "Plan: Execute risky action X."
return {"plan": plan}
# Node that requests human approval using interrupt
def request_approval(state: ApprovalState) -> dict:
print("--- Requesting Human Approval ---")
plan = state['plan']
print(f"Proposed Plan: {plan}")
# Call interrupt, passing the plan to the client
# Execution STOPS here on the first run.
feedback_or_approval = interrupt(plan)
# --- Execution RESUMES here on the second run ---
print(f"--- Resumed with feedback: {feedback_or_approval} ---")
# Store the feedback received from the resume command
return {"feedback": str(feedback_or_approval)} # Ensure it's a string
# Node that executes the plan (only if approved implicitly by resuming)
def execute_plan(state: ApprovalState) -> dict:
print("--- Executing Plan ---")
if state.get("feedback"): # Check if we got feedback (meaning we resumed)
print(f"Executing '{state['plan']}' based on feedback: {state['feedback']}")
return {} # No state change needed
else:
# This path shouldn't be hit if interrupt works correctly
print("Execution skipped (no feedback received).")
        return {}
```
**Step 3: Build the Graph (with Checkpointer!)**
```python
from langgraph.graph import StateGraph, END, START
# Need a checkpointer for interrupts!
from langgraph.checkpoint.memory import MemorySaver
workflow = StateGraph(ApprovalState)
workflow.add_node("planner", create_plan)
workflow.add_node("approval_gate", request_approval)
workflow.add_node("executor", execute_plan)
workflow.set_entry_point("planner")
workflow.add_edge("planner", "approval_gate")
workflow.add_edge("approval_gate", "executor") # Runs after interrupt is resolved
workflow.add_edge("executor", END)
# Create checkpointer and compile
memory_saver = MemorySaver()
app = workflow.compile(checkpointer=memory_saver)
```
**Step 4: Run and Resume**
```python
import uuid
# Unique ID for this conversation thread is needed for the checkpointer
config = {"configurable": {"thread_id": str(uuid.uuid4())}}
print("--- Initial Invocation ---")
# Start the graph. It should interrupt at the approval node.
interrupt_info = None
for chunk in app.stream({"plan": ""}, config=config):
print(chunk)
# Check if the chunk contains interrupt information
if "__interrupt__" in chunk:
interrupt_info = chunk["__interrupt__"]
print("\n!! Graph Interrupted !!")
break # Stop processing stream after interrupt
# The client code inspects the interrupt value (the plan)
if interrupt_info:
print(f"Interrupt Value (Plan): {interrupt_info[0].value}") # interrupt_info is a tuple
# --- Simulate human interaction ---
human_decision = "Approved, proceed with caution."
print(f"\n--- Resuming with Decision: '{human_decision}' ---")
# Resume execution with the human's feedback/approval
# We pass the decision using Command(resume=...)
for chunk in app.stream(Command(resume=human_decision), config=config):
print(chunk)
else:
print("Graph finished without interruption.")
```
**Expected Output:**
```text
--- Initial Invocation ---
{'planner': {'plan': 'Plan: Execute risky action X.'}}
{'approval_gate': None} # Node starts execution
--- Requesting Human Approval ---
Proposed Plan: Plan: Execute risky action X.
{'__interrupt__': (Interrupt(value='Plan: Execute risky action X.', resumable=True, ns=..., when='during'),)} # Interrupt occurs
!! Graph Interrupted !!
Interrupt Value (Plan): Plan: Execute risky action X.
--- Resuming with Decision: 'Approved, proceed with caution.' ---
{'approval_gate': {'feedback': 'Approved, proceed with caution.'}} # Node resumes and finishes
--- Resumed with feedback: Approved, proceed with caution. ---
{'executor': {}} # Executor node runs
--- Executing Plan ---
Executing 'Plan: Execute risky action X.' based on feedback: Approved, proceed with caution.
{'__end__': {'plan': 'Plan: Execute risky action X.', 'feedback': 'Approved, proceed with caution.'}} # Graph finishes
```
The graph paused at `request_approval` after printing the plan. We then resumed it by sending `Command(resume="Approved, proceed with caution.")`. The `request_approval` node restarted, the `interrupt()` call returned our resume value, which was stored in the state, and finally, the `executor` node ran using that feedback.
**Internals (`types.py`, `errors.py`, Checkpointer)**
* The `interrupt(value)` function (in `langgraph/types.py`) checks if a resume value is available for the current step within the node.
* If no resume value exists (first run), it raises a `GraphInterrupt` exception (`langgraph/errors.py`) containing an `Interrupt` object (`langgraph/types.py`) which holds the `value`.
* The [Pregel Execution Engine](05_pregel_execution_engine.md) catches `GraphInterrupt`.
* If a [Checkpointer](06_checkpointer___basecheckpointsaver__.md) is present, the engine saves the current state (including which node was interrupted) and passes the `Interrupt` object back to the caller.
* When you resume with `Command(resume=resume_value)`, the engine loads the checkpoint.
* It knows which node was interrupted and provides the `resume_value` to it (often via a special `RESUME` entry written to the state channels, managed internally via `PregelScratchpad` in `pregel/algo.py`).
* The node restarts. When `interrupt()` is called again, it finds the `resume_value` (provided via the scratchpad or internal state) and returns it instead of raising an exception.
```python
# types.py (Simplified view)
def interrupt(value: Any) -> Any:
# ... access internal config/scratchpad ...
scratchpad = conf[CONFIG_KEY_SCRATCHPAD]
idx = scratchpad.interrupt_counter()
# Check if resume value already exists for this interrupt index
if scratchpad.resume and idx < len(scratchpad.resume):
return scratchpad.resume[idx] # Return existing resume value
# Check if a new global resume value was provided
v = scratchpad.get_null_resume(consume=True)
if v is not None:
# Store and return the new resume value
scratchpad.resume.append(v)
conf[CONFIG_KEY_SEND]([(RESUME, scratchpad.resume)]) # Update state internally
return v
# No resume value - raise the interrupt exception
raise GraphInterrupt(
(Interrupt(value=value, resumable=True, ns=...),)
)
# types.py (Simplified view)
@dataclasses.dataclass
class Interrupt:
value: Any # The value passed to interrupt()
resumable: bool = True
# ... other fields ...
# types.py (Simplified view)
@dataclasses.dataclass
class Command:
# ... other fields like update, goto ...
resume: Optional[Any] = None # Value to provide to a pending interrupt
# errors.py (Simplified view)
class GraphInterrupt(Exception): # Base class for interrupts
pass
```
## Conclusion
You've learned about the essential tools for controlling the flow of execution in your LangGraph applications:
* **`Branch`** (`add_conditional_edges`): Used to create conditional paths, like `if/else` statements, directing the flow based on the current state. Requires a routing function and often a path map.
* **`Send`**: Used to directly trigger a specific node with specific data, bypassing normal edges. Essential for patterns like map-reduce where you want to invoke the same worker node multiple times with different inputs.
* **`Interrupt`** (`langgraph.types.interrupt`): Used to pause graph execution, typically for human-in-the-loop scenarios. Requires a checkpointer and is resumed using `Command(resume=...)`.
These primitives transform your graph from a simple linear sequence into a dynamic, decision-making process capable of handling complex, real-world workflows.
Now that we understand how nodes execute, how state is managed via channels, and how control flow directs traffic, let's look at the engine that orchestrates all of this behind the scenes.
Next up: [Chapter 5: Pregel Execution Engine](05_pregel_execution_engine.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,223 @@
# Chapter 5: Pregel Execution Engine - The Engine Room
In the previous chapters, we learned how to build the blueprint of our application using [`StateGraph`](01_graph___stategraph.md), define the workers with [`Nodes`](02_nodes___pregelnode__.md), manage the shared state with [`Channels`](03_channels.md), and direct the traffic using [Control Flow Primitives](04_control_flow_primitives___branch____send____interrupt__.md).
But what actually takes all these pieces (the blueprint, the workers, the communication rules, the traffic signals) and makes them *run*? What ensures Node A runs, its output updates the state correctly via channels, and then Node B (or maybe Node C based on a Branch) runs with that updated state?
Meet the **Pregel Execution Engine**. This is the heart of LangGraph, the engine room that drives your graph forward.
## What Problem Does Pregel Solve?
Imagine you've designed a complex assembly line (your `StateGraph`). You have different stations (Nodes) where specific tasks are done, conveyor belts (Channels) moving parts between stations, and switches (Branches) directing parts down different paths.
How do you ensure the line runs smoothly? You need a manager! Someone who:
1. Knows the overall plan (the graph structure).
2. Knows which station should work next based on what just finished.
3. Delivers the right parts (state) to the right station.
4. Collects the finished work from a station.
5. Updates the central inventory (the shared state via Channels).
6. Deals with decisions (Branches) and special instructions (Sends, Interrupts).
7. Handles multiple stations working at the same time if possible (parallelism).
8. Keeps track of progress and can save the state (Checkpointing).
The **Pregel Execution Engine** is this assembly line manager for your LangGraph application. It takes your compiled graph definition and orchestrates its execution step-by-step.
## Key Concepts: How Pregel Manages the Flow
Pregel is inspired by a system developed at Google for processing large graphs. LangGraph adapts these ideas for executing AI agents and multi-step workflows. Here's how it works conceptually:
1. **Step-by-Step Execution ("Supersteps"):** Pregel runs the graph in discrete steps, often called "supersteps." Think of it like turns in a board game.
2. **Scheduling Nodes:** In each step, Pregel looks at the current state and the graph structure (edges, branches) to figure out which [Nodes (`PregelNode`)](02_nodes___pregelnode__.md) should run *in this turn*. This could be the entry point node at the start, nodes triggered by the previous step's output, or nodes activated by a `Send` command.
3. **Executing Nodes:** It runs the scheduled nodes. If multiple nodes are scheduled for the same step and they don't directly depend on each other *within that step*, Pregel might run them in parallel using background threads or asyncio tasks.
4. **Gathering Updates:** As each node finishes, it returns a dictionary of updates (like `{"value": 6}`). Pregel collects all these updates from all the nodes that ran in the current step.
5. **Updating State via Channels:** Pregel takes the collected updates and applies them to the shared state using the appropriate [`Channels`](03_channels.md). For example, it sends `6` to the `"value"` channel, which might overwrite the old value (if it's `LastValue`) or add to it (if it's `BinaryOperatorAggregate`).
6. **Looping:** After updating the state, Pregel checks if there are more nodes to run (e.g., nodes connected by edges from the ones that just finished) or if the graph has reached the `END`. If there's more work, it starts the next step (superstep).
7. **Handling Control Flow:** It seamlessly integrates [Control Flow Primitives](04_control_flow_primitives___branch____send____interrupt__.md). When a `Branch` needs to run, Pregel executes the routing function and schedules the next node accordingly. When `Send` is used, Pregel schedules the target node with the specific data. When `Interrupt` occurs, Pregel pauses execution (and relies on a [Checkpointer](06_checkpointer___basecheckpointsaver__.md) to save state).
8. **Checkpointing:** At configurable points (often after each step), Pregel interacts with the [Checkpointer (`BaseCheckpointSaver`)](06_checkpointer___basecheckpointsaver__.md) to save the current state of all channels. This allows the graph to be paused and resumed later.
Essentially, Pregel is the **orchestrator** that manages the entire lifecycle of a graph's execution.
## How Pregel Executes Our Simple Graph
Let's revisit the simple `adder -> multiplier` graph from [Chapter 1: Graph / StateGraph](01_graph___stategraph.md) and see how Pregel runs it when you call `app.invoke({"value": 5})`.
**Graph:**
* State: `{'value': int}` (uses `LastValue` channel by default)
* Nodes: `adder` (value+1), `multiplier` (value*2)
* Edges: `START -> adder`, `adder -> multiplier`, `multiplier -> END`
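If you want the concrete code in front of you, a minimal version of that graph (mirroring the Chapter 1 definition, without the print statements) looks like this:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class MyState(TypedDict):
    value: int

def add_one(state: MyState) -> dict:
    return {"value": state["value"] + 1}

def multiply_by_two(state: MyState) -> dict:
    return {"value": state["value"] * 2}

workflow = StateGraph(MyState)
workflow.add_node("adder", add_one)
workflow.add_node("multiplier", multiply_by_two)
workflow.set_entry_point("adder")
workflow.add_edge("adder", "multiplier")
workflow.add_edge("multiplier", END)
app = workflow.compile()
```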
**Execution Flow:**
1. **Start:** `app.invoke({"value": 5})` is called. The Pregel engine inside the compiled `app` takes over.
2. **Initialization:** Pregel sets the initial state in the `"value"` [Channel](03_channels.md) to `5`. `step = 0`.
3. **Step 1 Begins:**
* **Scheduling:** Pregel sees the edge from `START` to `adder`. It schedules the `adder` node to run in this step.
* **Execution:** Pregel retrieves the current state (`{'value': 5}`) from the [Channel](03_channels.md) and runs the `add_one` function associated with the `adder` node.
* **Gathering Updates:** The `add_one` function returns `{"value": 6}`. Pregel gathers this write.
* **Applying Updates:** Pregel sends the update `6` to the `"value"` [Channel](03_channels.md). Since it's a `LastValue` channel, its state becomes `6`.
* **(Checkpointing):** If a checkpointer is configured (and enabled for this step), Pregel saves the state (`{'value': 6}`).
* `step` increments to `1`.
4. **Step 2 Begins:**
* **Scheduling:** Pregel looks at edges originating from nodes that completed in Step 1 (`adder`). It finds the edge `adder -> multiplier`. It schedules the `multiplier` node.
* **Execution:** Pregel retrieves the current state (`{'value': 6}`) from the `"value"` [Channel](03_channels.md) and runs the `multiply_by_two` function.
* **Gathering Updates:** The `multiply_by_two` function returns `{"value": 12}`. Pregel gathers this write.
* **Applying Updates:** Pregel sends the update `12` to the `"value"` [Channel](03_channels.md). The channel's state becomes `12`.
* **(Checkpointing):** Pregel saves the state (`{'value': 12}`).
* `step` increments to `2`.
5. **Step 3 Begins:**
* **Scheduling:** Pregel looks at edges from `multiplier`. It finds the edge `multiplier -> END`. Reaching `END` means no more application nodes are scheduled.
* **(Execution, Gathering, Applying):** No application nodes run.
* **(Checkpointing):** Pregel saves the final state (`{'value': 12}`).
6. **Finish:** Pregel detects the `END` state. Execution halts.
7. **Return:** The final state (`{'value': 12}`) is read from the channels and returned by `app.invoke()`.
**Visualizing the Flow:**
```mermaid
sequenceDiagram
participant User
participant App as CompiledGraph
participant PregelEngine as Pregel Engine
participant StateChannels as Channels
participant AdderNode as adder
participant MultiplierNode as multiplier
User->>App: invoke({"value": 5})
App->>PregelEngine: Start Execution
PregelEngine->>StateChannels: Initialize state {"value": 5}
Note over PregelEngine: Step 1
PregelEngine->>PregelEngine: Schedule 'adder' (from START)
PregelEngine->>StateChannels: Read state ({'value': 5})
PregelEngine->>AdderNode: Run add_one({'value': 5})
AdderNode-->>PregelEngine: Return {"value": 6}
PregelEngine->>StateChannels: Apply update {"value": 6}
StateChannels-->>PregelEngine: State is now {'value': 6}
Note over PregelEngine: Step 2
PregelEngine->>PregelEngine: Schedule 'multiplier' (from 'adder')
PregelEngine->>StateChannels: Read state ({'value': 6})
PregelEngine->>MultiplierNode: Run multiply_by_two({'value': 6})
MultiplierNode-->>PregelEngine: Return {"value": 12}
PregelEngine->>StateChannels: Apply update {"value": 12}
StateChannels-->>PregelEngine: State is now {'value': 12}
Note over PregelEngine: Step 3
PregelEngine->>PregelEngine: Check edges from 'multiplier' (sees END)
PregelEngine->>PregelEngine: No more nodes to schedule. Finish.
PregelEngine->>StateChannels: Read final state ({'value': 12})
PregelEngine->>App: Return final state {'value': 12}
App->>User: Return {'value': 12}
```
Pregel acts as the hidden conductor ensuring each part plays at the right time with the right information.
## Internal Implementation: A Glimpse Under the Hood
You don't typically interact with the Pregel engine directly; it's encapsulated within the compiled graph object you get from `graph.compile()`. However, understanding its core components helps clarify how LangGraph works. The main logic resides in the `langgraph/pregel/` directory.
1. **Compilation:** When you call `graph.compile()`, LangGraph analyzes your nodes, edges, branches, and state schema. It translates your high-level graph definition into an internal representation suitable for the Pregel engine. This includes creating the actual [`PregelNode`](02_nodes___pregelnode__.md) objects which contain information about which channels to read, which function to run, and how to write outputs back.
2. **The Loop (`pregel/loop.py`):** The core execution happens within a loop (managed by classes like `SyncPregelLoop` or `AsyncPregelLoop`). Each iteration of this loop represents one "superstep".
3. **Task Preparation (`pregel/algo.py::prepare_next_tasks`):** At the start of each step, this function determines which tasks (nodes) are ready to run. It checks:
* Which [Channels](03_channels.md) were updated in the previous step.
* Which nodes are triggered by those updated channels (based on edges and branches).
* Are there any pending `Send` messages ([Control Flow Primitives](04_control_flow_primitives___branch____send____interrupt__.md)) targeting specific nodes?
* It uses internal versioning on channels to avoid re-running nodes unnecessarily if their inputs haven't changed.
4. **Task Execution (`pregel/runner.py::PregelRunner`):** This component takes the list of tasks scheduled for the current step and executes them.
* It uses an executor (like Python's `concurrent.futures.ThreadPoolExecutor` for sync code or `asyncio` for async code) to potentially run independent tasks in parallel.
* For each task, it reads the required state from the [Channels](03_channels.md), executes the node's function/Runnable, and collects the returned writes (the update dictionary).
* It handles retries if configured for a node.
5. **Applying Writes (`pregel/algo.py::apply_writes`):** After tasks in a step complete (or fail), this function gathers all the writes returned by those tasks.
* It groups writes by channel name.
* It calls the `.update()` method on each corresponding [Channel](03_channels.md) object, passing the collected updates for that channel. The channel itself enforces its update logic (e.g., `LastValue` overwrites, `Topic` appends).
* It updates the internal checkpoint state with new channel versions.
6. **Checkpointing (`pregel/loop.py`, `checkpoint/base.py`):** The loop interacts with the configured [Checkpointer (`BaseCheckpointSaver`)](06_checkpointer___basecheckpointsaver__.md) to save the graph's state (the values and versions of all channels) at appropriate times (e.g., after each step).
7. **Interrupt Handling (`pregel/loop.py`, `types.py::interrupt`):** If a node calls `interrupt()`, the `PregelRunner` catches the `GraphInterrupt` exception. The `PregelLoop` then coordinates with the [Checkpointer](06_checkpointer___basecheckpointsaver__.md) to save state and pause execution, returning control to the user. Resuming involves loading the checkpoint and providing the resume value back to the waiting `interrupt()` call.
**Simplified Code Snippets:**
* **Task Preparation (Conceptual):**
```python
# pregel/algo.py (Simplified Concept)
def prepare_next_tasks(checkpoint, processes, channels, config, step, ...):
tasks = {}
# Check PUSH tasks (from Send)
for packet in checkpoint["pending_sends"]:
# ... create task if node exists ...
task = create_task_for_send(packet, ...)
tasks[task.id] = task
# Check PULL tasks (from edges/triggers)
for name, proc in processes.items():
# Check if any trigger channel for 'proc' was updated since last seen
if _triggers(channels, checkpoint["channel_versions"], proc):
# ... read input for the node ...
task = create_task_for_pull(name, proc, ...)
tasks[task.id] = task
return tasks
```
This function checks both explicit `Send` commands and regular node triggers based on updated channels to build the list of tasks for the next step.
* **Applying Writes (Conceptual):**
```python
# pregel/algo.py (Simplified Concept)
def apply_writes(checkpoint, channels, tasks: list[PregelExecutableTask], get_next_version):
# ... (sort tasks for determinism, update seen versions) ...
pending_writes_by_channel = defaultdict(list)
for task in tasks:
for chan, val in task.writes: # task.writes is the dict returned by the node
if chan in channels:
pending_writes_by_channel[chan].append(val)
# ... (handle TASKS, PUSH, managed values etc.) ...
updated_channels = set()
# Apply writes to channels
for chan_name, values_to_update in pending_writes_by_channel.items():
channel_obj = channels[chan_name]
if channel_obj.update(values_to_update): # Channel applies its logic here!
# If updated, bump the version in the checkpoint
checkpoint["channel_versions"][chan_name] = get_next_version(...)
updated_channels.add(chan_name)
# ... (handle channels that weren't written to but need bumping) ...
return updated_channels
```
This function takes the results from all nodes in a step and uses the `channel.update()` method to modify the state according to each channel's rules.
* **The Main Loop (Conceptual):**
```python
# pregel/loop.py (Simplified Concept - SyncPregelLoop/AsyncPregelLoop)
class PregelLoop:
def run(self): # Simplified invoke/stream logic
with self: # Enters context (loads checkpoint, sets up channels)
while self.tick(): # tick executes one step
# Start tasks for the current step using PregelRunner
runner = PregelRunner(...)
for _ in runner.tick(self.tasks):
# Yield control back, allowing writes/outputs to be streamed
pass # (actual stream logic happens via callbacks)
return self.output # Return final result
```
The loop repeatedly calls `tick()`. Inside `tick()`, it prepares tasks, runs them using `PregelRunner`, applies the resulting writes, handles checkpoints/interrupts, and determines if another step is needed.
You don't need to know the deep implementation details, but understanding this step-by-step process managed by Pregel helps visualize how your graph comes alive.
## Conclusion
The **Pregel Execution Engine** is the powerful, yet hidden, coordinator that runs your LangGraph graphs.
* It executes the graph **step-by-step** (supersteps).
* In each step, it **schedules** which nodes run based on the graph structure and current state.
* It **runs** the nodes (potentially in parallel).
* It **gathers** node outputs and **updates** the shared state using [`Channels`](03_channels.md).
* It seamlessly integrates [`Control Flow Primitives`](04_control_flow_primitives___branch____send____interrupt__.md) like `Branch`, `Send`, and `Interrupt`.
* It works with a [`Checkpointer`](06_checkpointer___basecheckpointsaver__.md) to save and resume state.
Think of it as the engine ensuring your application's logic flows correctly, state is managed reliably, and complex operations are orchestrated smoothly.
We've mentioned checkpointing several times: the ability to save and load the graph's state. This is crucial for long-running processes, human-in-the-loop workflows, and resilience. How does that work?
Let's dive into [Chapter 6: Checkpointer (`BaseCheckpointSaver`)](06_checkpointer___basecheckpointsaver__.md) to understand how LangGraph persists and resumes state.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,391 @@
# Chapter 6: Checkpointer (`BaseCheckpointSaver`) - Saving Your Progress
In [Chapter 5: Pregel Execution Engine](05_pregel_execution_engine.md), we saw how the engine runs our graph step-by-step. But what happens if a graph takes hours to run, or if it needs to pause and wait for a human? If the program crashes or we need to stop it, do we lose all the progress?
That's where **Checkpointers** come to the rescue!
## What Problem Do Checkpointers Solve?
Imagine you're playing a long video game. You wouldn't want to start from the very beginning every time you stop playing, right? Games have save points or checkpoints that record your progress.
LangGraph's **Checkpointer** does the same thing for your graph execution. It automatically saves the graph's state at certain points, usually after each step completed by the [Pregel Execution Engine](05_pregel_execution_engine.md).
This is incredibly useful for:
1. **Long-Running Processes:** If your graph involves many steps or calls to slow tools/LLMs, you can stop it and resume later without losing work.
2. **Resilience:** If your program crashes unexpectedly, you can restart it from the last saved checkpoint.
3. **Human-in-the-Loop (HITL):** As we saw with `Interrupt` in [Chapter 4: Control Flow Primitives](04_control_flow_primitives___branch____send____interrupt__.md), pausing the graph requires saving its state so it can be perfectly restored when the human provides input. Checkpointers are essential for this.
**Analogy:** Think of a checkpointer as an automatic "Save" button for your graph's progress. It takes snapshots of the shared "whiteboard" ([Channels](03_channels.md)) so you can always pick up where you left off.
## Key Concepts
1. **What is Saved?** The checkpointer saves the current value and version of every [Channel](03_channels.md) in your graph's state. It also keeps track of which step the graph was on and any pending tasks (like those created by `Send`).
2. **When is it Saved?** The [Pregel Execution Engine](05_pregel_execution_engine.md) typically triggers the checkpointer to save after each "superstep" (a round of node executions and state updates).
3. **Where is it Saved?** This depends on the specific checkpointer implementation you choose. LangGraph provides several:
* `MemorySaver`: Stores checkpoints in your computer's RAM. Simple for testing, but **lost when your script ends**.
* `SqliteSaver`: Stores checkpoints in a local SQLite database file, making them persistent across script runs.
* Other savers might store checkpoints in cloud databases or other persistent storage.
4. **`thread_id` (The Save Slot Name):** To save and load progress correctly, you need a way to identify *which* specific run of the graph you want to work with. Think of this like naming your save file in a game. In LangGraph, this identifier is called the `thread_id`. You provide it in the `config` when you run the graph. Each unique `thread_id` represents an independent "conversation" or execution history.
## How to Use a Checkpointer
Using a checkpointer is straightforward. You just need to tell LangGraph *which* checkpointer to use when you compile your graph.
**Step 1: Import a Checkpointer**
Let's start with the simplest one, `MemorySaver`.
```python
# Import the simplest checkpointer
from langgraph.checkpoint.memory import MemorySaver
```
**Step 2: Instantiate the Checkpointer**
```python
# Create an instance of the memory checkpointer
memory_saver = MemorySaver()
```
**Step 3: Compile Your Graph with the Checkpointer**
Let's reuse our simple `adder -> multiplier` graph. The graph definition itself doesn't change.
```python
# --- Define State and Nodes (same as Chapter 1) ---
from typing import TypedDict
from langgraph.graph import StateGraph, END, START
class MyState(TypedDict):
value: int
def add_one(state: MyState) -> dict:
print(f"Adder: Adding 1 to {state['value']}")
return {"value": state['value'] + 1}
def multiply_by_two(state: MyState) -> dict:
print(f"Multiplier: Doubling {state['value']}")
return {"value": state['value'] * 2}
# --- Build the Graph (same as Chapter 1) ---
workflow = StateGraph(MyState)
workflow.add_node("adder", add_one)
workflow.add_node("multiplier", multiply_by_two)
workflow.set_entry_point("adder")
workflow.add_edge("adder", "multiplier")
workflow.add_edge("multiplier", END)
# --- Compile WITH the checkpointer ---
# Pass the checkpointer instance to the compile method
app = workflow.compile(checkpointer=memory_saver)
```
That's it! By passing `checkpointer=memory_saver` to `compile()`, you've enabled automatic checkpointing for this graph.
**Step 4: Run with a `thread_id`**
To use the checkpointer, you need to provide a configuration dictionary (`config`) containing a unique identifier for this specific execution thread.
```python
import uuid
# Create a unique ID for this run
thread_id = str(uuid.uuid4())
config = {"configurable": {"thread_id": thread_id}}
# Define the initial state
initial_state = {"value": 5}
print("--- Running Graph (First Time) ---")
# Run the graph with the config
final_state = app.invoke(initial_state, config=config)
print("\n--- Final State (First Run) ---")
print(final_state)
```
**Expected Output (First Run):**
```text
--- Running Graph (First Time) ---
Adder: Adding 1 to 5
Multiplier: Doubling 6
--- Final State (First Run) ---
{'value': 12}
```
Behind the scenes, `MemorySaver` saved the state after the `adder` step and after the `multiplier` step, associating it with the `thread_id` you provided.
**Step 5: Resume the Graph**
Now, let's imagine we stopped the process. If we run the *same graph* with the *same `thread_id`*, the checkpointer allows the [Pregel Execution Engine](05_pregel_execution_engine.md) to load the last saved state and continue. Since the first run finished completely, running `invoke` again will just load the final state.
```python
print("\n--- Running Graph Again with SAME thread_id ---")
# Use the SAME config (containing the same thread_id)
# Provide NO initial state, as it will be loaded from the checkpoint
resumed_state = app.invoke(None, config=config)
print("\n--- Final State (Resumed Run) ---")
print(resumed_state)
# Let's check the saved states using the checkpointer directly
print("\n--- Checkpoints Saved ---")
for checkpoint in memory_saver.list(config):
print(checkpoint)
```
**Expected Output (Second Run):**
```text
--- Running Graph Again with SAME thread_id ---
# Notice: No node printouts because the graph already finished!
# It just loads the final saved state.
--- Final State (Resumed Run) ---
{'value': 12}
--- Checkpoints Saved ---
# You'll see checkpoint objects representing saved states
CheckpointTuple(config={'configurable': {'thread_id': '...'}}, checkpoint={'v': 1, 'ts': '...', 'id': '...', 'channel_values': {'value': 6}, 'channel_versions': {'adder': 1}, 'versions_seen': {'adder': {}}}, metadata={'source': 'loop', 'step': 1, ...}, ...)
CheckpointTuple(config={'configurable': {'thread_id': '...'}}, checkpoint={'v': 1, 'ts': '...', 'id': '...', 'channel_values': {'value': 12}, 'channel_versions': {'adder': 1, 'multiplier': 2}, 'versions_seen': {'adder': {}, 'multiplier': {'adder': 1}}}, metadata={'source': 'loop', 'step': 2, ...}, ...)
CheckpointTuple(config={'configurable': {'thread_id': '...'}}, checkpoint={'v': 1, 'ts': '...', 'id': '...', 'channel_values': {'value': 12}, 'channel_versions': {'adder': 1, 'multiplier': 2}, 'versions_seen': {'adder': {}, 'multiplier': {'adder': 1}}}, metadata={'source': 'loop', 'step': 3, ...}, ...)
```
The checkpointer successfully loaded the final state (`{'value': 12}`) associated with that `thread_id`.
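Each `thread_id` is its own save slot with an independent history. Running the same compiled `app` with a brand-new `thread_id` starts from scratch (a quick illustration reusing the objects defined above):

```python
# A fresh thread_id means a fresh history: the nodes run again from the start
other_config = {"configurable": {"thread_id": str(uuid.uuid4())}}
fresh_state = app.invoke({"value": 100}, config=other_config)
print(fresh_state)  # {'value': 202} -- unaffected by the first thread's checkpoints
```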
**Checkpointers and `Interrupt` (Human-in-the-Loop)**
Remember the `Interrupt` example from [Chapter 4](04_control_flow_primitives___branch____send____interrupt__.md)?
```python
# (Simplified HITL example from Chapter 4)
from langgraph.types import interrupt, Command
# ... (State, Nodes: create_plan, request_approval, execute_plan) ...
# Compile WITH checkpointer (REQUIRED for interrupt)
memory_saver_hitl = MemorySaver()
app_hitl = workflow.compile(checkpointer=memory_saver_hitl)
# Run, get interrupted
config_hitl = {"configurable": {"thread_id": str(uuid.uuid4())}}
for chunk in app_hitl.stream({"plan": ""}, config=config_hitl):
# ... (detect interrupt) ...
print("Graph interrupted!")
break
# Resume after human decision
human_decision = "Approved"
for chunk in app_hitl.stream(Command(resume=human_decision), config=config_hitl):
# ... (process remaining steps) ...
print("Graph resumed and finished!")
```
When `interrupt()` was called inside the `request_approval` node, the [Pregel Execution Engine](05_pregel_execution_engine.md) automatically used the `memory_saver_hitl` checkpointer to save the *exact state* of the graph at that moment (including the plan). When we called `stream` again with `Command(resume=...)` and the *same* `config_hitl`, the engine loaded that saved state using the checkpointer, allowing the graph to continue exactly where it left off, now with the human's feedback.
**Without a checkpointer, `Interrupt` cannot work.**
## How Checkpointing Works Internally
What happens behind the scenes when a checkpointer is configured?
**Saving:**
1. **Step Complete:** The [Pregel Execution Engine](05_pregel_execution_engine.md) finishes a step (e.g., after running the `adder` node and updating the state).
2. **Signal Checkpointer:** The engine tells the configured checkpointer (`MemorySaver` in our example) that it's time to save.
3. **Gather State:** The checkpointer (or the engine on its behalf) accesses all the active [Channels](03_channels.md).
4. **Serialize State:** For each channel, it calls the channel's internal `checkpoint()` method to get a serializable representation of its current value (e.g., the number `6` for the `"value"` channel).
5. **Store Checkpoint:** The checkpointer bundles the serialized channel values, their versions, the current step number, and other metadata into a `Checkpoint` object. It then stores this `Checkpoint` associated with the current `thread_id` provided in the `config`. `MemorySaver` stores it in a dictionary in RAM; `SqliteSaver` writes it to a database table.
**Loading (Resuming):**
1. **Invoke with `thread_id`:** You call `app.invoke(None, config=config)` where `config` contains a `thread_id` that has been previously saved.
2. **Request Checkpoint:** The [Pregel Execution Engine](05_pregel_execution_engine.md) asks the checkpointer to load the latest checkpoint for the given `thread_id`.
3. **Retrieve Checkpoint:** The checkpointer retrieves the saved `Checkpoint` object (e.g., from its memory dictionary or the database).
4. **Restore State:** The engine takes the saved channel values from the checkpoint. For each channel, it calls the channel's `from_checkpoint()` method (or similar internal logic) to restore its state. The "whiteboard" ([Channels](03_channels.md)) is now exactly as it was when the checkpoint was saved.
5. **Continue Execution:** The engine looks at the saved step number and metadata to figure out where to resume execution, typically by preparing the tasks for the *next* step.
Here's a simplified view of the interaction:
```mermaid
sequenceDiagram
participant User
participant App as CompiledGraph
participant Engine as Pregel Engine
participant Saver as Checkpointer (e.g., MemorySaver)
participant Storage as Underlying Storage (RAM, DB)
%% Saving %%
Engine->>Engine: Finishes Step N
Engine->>Saver: Save checkpoint for config (thread_id)
Saver->>Engine: Request current channel states & versions
Engine-->>Saver: Provides states & versions
Saver->>Storage: Store Checkpoint(Step N, states, versions) linked to thread_id
Storage-->>Saver: Acknowledge Save
Saver-->>Engine: Save Complete
%% Loading %%
User->>App: invoke(None, config with thread_id)
App->>Engine: Start/Resume Execution
Engine->>Saver: Get latest checkpoint for config (thread_id)
Saver->>Storage: Retrieve Checkpoint linked to thread_id
Storage-->>Saver: Returns Checkpoint(Step N, states, versions)
Saver-->>Engine: Provides Checkpoint
Engine->>Engine: Restore channel states from checkpoint
Engine->>Engine: Prepare tasks for Step N+1
Engine->>App: Continue execution...
```
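You can also ask the checkpointer for the latest snapshot directly, which is handy for debugging. Here is a small illustration using the `memory_saver` and `config` from the adder/multiplier example above:

```python
# Load the most recent checkpoint tuple saved for this thread_id
latest = memory_saver.get_tuple(config)
if latest is not None:
    print("Channel values:", latest.checkpoint["channel_values"])  # e.g. {'value': 12}
    print("Saved at step:", latest.metadata.get("step"))
```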
## A Peek at the Code (`checkpoint/base.py`, `checkpoint/memory.py`, `pregel/loop.py`)
Let's look at the core components:
* **`BaseCheckpointSaver` (`checkpoint/base.py`)**: This is the abstract base class (like a template) that all checkpointers must implement. It defines the essential methods the engine needs.
```python
# checkpoint/base.py (Highly Simplified)
from abc import ABC, abstractmethod
from typing import Any, Mapping, NamedTuple, Optional, Sequence, Tuple, TypedDict
# Represents a saved checkpoint
class Checkpoint(TypedDict):
channel_values: Mapping[str, Any] # Saved state of channels
channel_versions: Mapping[str, int] # Internal versions
versions_seen: Mapping[str, Mapping[str, int]] # Tracking for node execution
# ... other metadata like v, ts, id, pending_sends ...
# Represents the checkpoint tuple retrieved from storage
class CheckpointTuple(NamedTuple):
config: dict # The config used (includes thread_id)
checkpoint: Checkpoint
metadata: dict
# ... other fields like parent_config, pending_writes ...
class BaseCheckpointSaver(ABC):
# --- Sync Methods ---
@abstractmethod
def get_tuple(self, config: dict) -> Optional[CheckpointTuple]:
"""Load the checkpoint tuple for the given config."""
...
@abstractmethod
def put(self, config: dict, checkpoint: Checkpoint, metadata: dict) -> dict:
"""Save a checkpoint."""
...
# --- Async Methods (similar structure) ---
@abstractmethod
async def aget_tuple(self, config: dict) -> Optional[CheckpointTuple]:
"""Async load the checkpoint tuple."""
...
@abstractmethod
async def aput(self, config: dict, checkpoint: Checkpoint, metadata: dict) -> dict:
"""Async save a checkpoint."""
...
# --- Other methods (list, put_writes) omitted for brevity ---
```
The key methods are `get_tuple` (to load) and `put` (to save), along with their async counterparts (`aget_tuple`, `aput`). Any specific checkpointer (like `MemorySaver`, `SqliteSaver`) must provide concrete implementations for these methods.
* **`MemorySaver` (`checkpoint/memory.py`)**: A simple implementation that uses an in-memory dictionary.
```python
# checkpoint/memory.py (Highly Simplified)
import threading
from collections import defaultdict
from typing import Optional
# BaseCheckpointSaver, Checkpoint and CheckpointTuple are defined in checkpoint/base.py (see above)
class MemorySaver(BaseCheckpointSaver):
def __init__(self):
# Use a dictionary to store checkpoints in RAM
# Key: thread_id, Value: List of CheckpointTuples
self._checkpoints: defaultdict[str, list[CheckpointTuple]] = defaultdict(list)
self._lock = threading.RLock() # To handle multiple threads safely
def get_tuple(self, config: dict) -> Optional[CheckpointTuple]:
thread_id = config["configurable"]["thread_id"]
with self._lock:
if checkpoints := self._checkpoints.get(thread_id):
# Return the latest checkpoint for this thread_id
return checkpoints[-1]
return None
def put(self, config: dict, checkpoint: Checkpoint, metadata: dict) -> dict:
thread_id = config["configurable"]["thread_id"]
with self._lock:
# Append the new checkpoint to the list for this thread_id
self._checkpoints[thread_id].append(
CheckpointTuple(config, checkpoint, metadata)
)
return {"configurable": {"thread_id": thread_id}}
# ... async methods (aget_tuple, aput) are similar using the same dict ...
# ... list method iterates through the dictionary ...
```
As you can see, `MemorySaver` just uses a standard Python dictionary (`self._checkpoints`) to store the `CheckpointTuple` for each `thread_id`. This is simple but not persistent.
* **Integration (`pregel/loop.py`)**: The [Pregel Execution Engine](05_pregel_execution_engine.md) (`PregelLoop` classes) interacts with the checkpointer during its execution cycle.
```python
# pregel/loop.py (Conceptual Snippets)
class PregelLoop: # Base class for Sync/Async loops
def __init__(self, ..., checkpointer: Optional[BaseCheckpointSaver], ...):
self.checkpointer = checkpointer
# ... other init ...
def _put_checkpoint(self, metadata: CheckpointMetadata) -> None:
# Called by the loop after a step or input processing
if self.checkpointer:
# 1. Create the Checkpoint object from current channels/state
checkpoint_data = create_checkpoint(self.checkpoint, self.channels, ...)
# 2. Call the checkpointer's put method (sync or async)
# (Uses self.submit to potentially run in background)
self.submit(self.checkpointer.put, self.checkpoint_config, checkpoint_data, metadata)
# 3. Update internal config with the new checkpoint ID
self.checkpoint_config = {"configurable": {"thread_id": ..., "checkpoint_id": checkpoint_data["id"]}}
def __enter__(self): # Or __aenter__ for async
# Called when the loop starts
if self.checkpointer:
# 1. Try to load an existing checkpoint tuple
saved = self.checkpointer.get_tuple(self.checkpoint_config)
else:
saved = None
if saved:
# 2. Restore state from the loaded checkpoint
self.checkpoint = saved.checkpoint
self.checkpoint_config = saved.config
# ... restore channels from saved.checkpoint['channel_values'] ...
else:
# Initialize with an empty checkpoint
self.checkpoint = empty_checkpoint()
# ... setup channels based on restored or empty checkpoint ...
return self
```
The `PregelLoop` uses the checkpointer's `get_tuple` method when it starts (in `__enter__` or `__aenter__`) to load any existing state. It uses the `put` method (inside `_put_checkpoint`) during execution to save progress.
## Conclusion
You've learned about **Checkpointers (`BaseCheckpointSaver`)**, the mechanism that gives your LangGraph applications memory and resilience.
* Checkpointers **save** the state of your graph's [Channels](03_channels.md) periodically.
* They **load** saved states to resume execution.
* This is crucial for **long-running graphs**, **human-in-the-loop** workflows (using `Interrupt`), and **recovering from failures**.
* You enable checkpointing by passing a `checkpointer` instance (like `MemorySaver` or `SqliteSaver`) to `graph.compile()`.
* You manage different execution histories using a unique `thread_id` in the `config`.
* `MemorySaver` is fine for quick tests, but its contents are lost when the script ends; use a database-backed saver (like `SqliteSaver`) for true persistence. A minimal usage sketch follows below.
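Here is a minimal, end-to-end sketch of the pattern, assuming the `StateGraph` API from Chapter 1; the state schema, the `add_one` node, and the `"user-42"` thread id are illustrative, not part of the library:
```python
# Minimal sketch: compile a graph with a checkpointer and reuse a thread_id.
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    value: int

def add_one(state: State) -> dict:
    # Return a partial update; the "value" channel keeps the latest value
    return {"value": state["value"] + 1}

builder = StateGraph(State)
builder.add_node("add_one", add_one)
builder.set_entry_point("add_one")
builder.add_edge("add_one", END)

# Enable checkpointing by passing the saver at compile time
app = builder.compile(checkpointer=MemorySaver())

# All invocations with the same thread_id share one saved history
config = {"configurable": {"thread_id": "user-42"}}
print(app.invoke({"value": 1}, config))  # {'value': 2}, checkpoint saved
print(app.get_state(config).values)      # inspect the persisted state
```
Because `MemorySaver` keeps everything in RAM, restarting the script starts from a blank history; swapping in a database-backed saver (with the same `thread_id`) is what lets the history survive restarts.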
This chapter concludes our tour of the core concepts in LangGraph! You now understand the fundamental building blocks: the blueprint ([`StateGraph`](01_graph___stategraph.md)), the workers ([`Nodes`](02_nodes___pregelnode__.md)), the communication system ([`Channels`](03_channels.md)), the traffic signals ([Control Flow Primitives](04_control_flow_primitives___branch____send____interrupt__.md)), the engine room ([Pregel Execution Engine](05_pregel_execution_engine.md)), and the save system ([Checkpointer](06_checkpointer___basecheckpointsaver__.md)).
With these concepts, you're well-equipped to start building your own sophisticated, stateful applications with LangGraph! Explore the documentation for more examples, advanced patterns, and different checkpointer implementations. Happy building!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

40
output/LangGraph/index.md Normal file
View File

@@ -0,0 +1,40 @@
# Tutorial: LangGraph
LangGraph helps you build complex **stateful applications**, like chatbots or agents, using a *graph-based approach*.
You define your application's logic as a series of steps (**Nodes**) connected by transitions (**Edges**) in a **Graph**.
The system manages the application's *shared state* using **Channels** and executes the graph step-by-step with its **Pregel engine**, handling things like branching, interruptions, and saving progress (**Checkpointing**).
**Source Repository:** [https://github.com/langchain-ai/langgraph/tree/55f922cf2f3e63600ed8f0d0cd1262a75a991fdc/libs/langgraph/langgraph](https://github.com/langchain-ai/langgraph/tree/55f922cf2f3e63600ed8f0d0cd1262a75a991fdc/libs/langgraph/langgraph)
```mermaid
flowchart TD
A0["Pregel Execution Engine"]
A1["Graph / StateGraph"]
A2["Channels"]
A3["Nodes (PregelNode)"]
A4["Checkpointer (BaseCheckpointSaver)"]
A5["Control Flow Primitives (Branch, Send, Interrupt)"]
A0 -- "Executes" --> A1
A1 -- "Contains" --> A3
A3 -- "Updates State Via" --> A2
A0 -- "Manages State Via" --> A2
A0 -- "Uses Checkpointer" --> A4
A1 -- "Defines Control Flow With" --> A5
A5 -- "Directs Execution Of" --> A0
A4 -- "Saves State Of" --> A2
```
## Chapters
1. [Graph / StateGraph](01_graph___stategraph.md)
2. [Nodes (`PregelNode`)](02_nodes___pregelnode__.md)
3. [Channels](03_channels.md)
4. [Control Flow Primitives (`Branch`, `Send`, `Interrupt`)](04_control_flow_primitives___branch____send____interrupt__.md)
5. [Pregel Execution Engine](05_pregel_execution_engine.md)
6. [Checkpointer (`BaseCheckpointSaver`)](06_checkpointer___basecheckpointsaver__.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,252 @@
# Chapter 1: Table / SSTable & TableCache
Welcome to your LevelDB journey! This is the first chapter where we'll start exploring the fundamental building blocks of LevelDB.
Imagine you're building a system to store a massive amount of data, like user profiles or product information. You need a way to save this data permanently (so it doesn't disappear when the computer turns off) and retrieve it quickly. How does LevelDB handle this?
The core idea we'll explore in this chapter is how LevelDB stores the bulk of its data on disk in special files and how it accesses them efficiently.
## What's the Problem? Storing Lots of Data Permanently
Databases need to store key-value pairs (like `user_id` -> `user_data`) persistently. This means writing the data to disk. However, disks are much slower than computer memory (RAM). If we just wrote every tiny change directly to a file, it would be very slow. Also, how do we organize the data on disk so we can find a specific key quickly without reading *everything*?
LevelDB's solution involves files called **SSTables** (Sorted String Tables), often just called **Tables** in the code.
## SSTable: The Sorted, Immutable Book on the Shelf
Think of an SSTable as a **permanently bound book** in a library.
1. **Stores Key-Value Pairs:** Just like a dictionary or an encyclopedia volume, an SSTable contains data entries, specifically key-value pairs.
2. **Sorted:** The keys inside an SSTable file are always stored in sorted order (like words in a dictionary). This is crucial for finding data quickly later on. If you're looking for the key "zebra", you know you don't need to look in the "A" section.
3. **Immutable:** Once an SSTable file is written to disk, LevelDB **never changes it**. It's like a printed book: you can't erase or rewrite a page. If you need to update or delete data, LevelDB writes *new* information in *newer* SSTables. (We'll see how this works in later chapters like [Compaction](08_compaction.md)). This immutability makes many things simpler and safer.
4. **It's a File:** At the end of the day, an SSTable is just a file on your computer's disk. LevelDB gives these files names like `000005.ldb` or `000010.sst`.
Here's how LevelDB determines the filename for an SSTable:
```c++
// --- File: filename.cc ---
// Creates a filename like "dbname/000005.ldb"
std::string TableFileName(const std::string& dbname, uint64_t number) {
assert(number > 0);
// Uses a helper to format the number with leading zeros
// and adds the '.ldb' or '.sst' suffix.
return MakeFileName(dbname, number, "ldb"); // or "sst"
}
```
This simple function takes the database name (e.g., `/path/to/my/db`) and a unique number and creates the actual filename used on disk. The `.ldb` or `.sst` extension helps identify it as a LevelDB table file.
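For example (a quick sketch, with an illustrative database path), calling the helper produces:
```c++
// Hypothetical call, just to show the naming scheme:
std::string name = TableFileName("/data/mydb", 5);
// name is now "/data/mydb/000005.ldb" (the file number is zero-padded to six digits)
```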
## Creating SSTables: `BuildTable`
How do these sorted, immutable files get created? This happens during processes like "flushing" data from memory or during "compaction" (which we'll cover in later chapters: [MemTable](02_memtable.md) and [Compaction](08_compaction.md)).
The function responsible for writing a new SSTable file is `BuildTable`. Think of `BuildTable` as the **printing press and binding machine** for our book analogy. It takes data (often from memory, represented by an `Iterator`) and writes it out to a new, sorted SSTable file on disk.
Let's look at a simplified view of `BuildTable`:
```c++
// --- File: builder.cc ---
// Builds an SSTable file from the key/value pairs provided by 'iter'.
Status BuildTable(const std::string& dbname, Env* env, const Options& options,
TableCache* table_cache, Iterator* iter, FileMetaData* meta) {
Status s;
// ... setup: determine filename, open the file for writing ...
std::string fname = TableFileName(dbname, meta->number);
WritableFile* file;
s = env->NewWritableFile(fname, &file);
// ... handle potential errors ...
// TableBuilder does the heavy lifting of formatting the file
TableBuilder* builder = new TableBuilder(options, file);
// Find the first key to store as the smallest key in metadata
iter->SeekToFirst();
meta->smallest.DecodeFrom(iter->key());
// Loop through all key-value pairs from the input iterator
Slice key;
for (; iter->Valid(); iter->Next()) {
key = iter->key();
// Add the key and value to the table being built
builder->Add(key, iter->value());
}
// Store the last key as the largest key in metadata
if (!key.empty()) {
meta->largest.DecodeFrom(key);
}
// Finish writing the file (adds index blocks, etc.)
s = builder->Finish();
// ... more steps: update metadata, sync file to disk, close file ...
if (s.ok()) {
meta->file_size = builder->FileSize();
s = file->Sync(); // Ensure data is physically written
}
if (s.ok()) {
s = file->Close();
}
// ... cleanup: delete builder, file; handle errors ...
return s;
}
```
**Explanation:**
1. **Input:** `BuildTable` receives data via an `Iterator`. An iterator is like a cursor that lets you go through key-value pairs one by one, already in sorted order. It also gets other necessary info like the database name (`dbname`), environment (`env`), options, the `TableCache` (we'll see this next!), and a `FileMetaData` object to store information *about* the new file (like its number, size, smallest key, and largest key).
2. **File Creation:** It creates a new, empty file using `env->NewWritableFile`.
3. **TableBuilder:** It uses a helper object called `TableBuilder` to handle the complex details of formatting the SSTable file structure (data blocks, index blocks, etc.).
4. **Iteration & Adding:** It loops through the `Iterator`. For each key-value pair, it calls `builder->Add()`. Because the input `Iterator` provides keys in sorted order, the `TableBuilder` can write them sequentially to the file.
5. **Metadata:** It records the very first key (`meta->smallest`) and the very last key (`meta->largest`) it processes. This is useful later for quickly knowing the range of keys stored in this file without opening it.
6. **Finishing Up:** It calls `builder->Finish()` to write out the final pieces of the SSTable (like the index). Then it `Sync`s the file to ensure the data is safely on disk and `Close`s it.
7. **Output:** If successful, a new `.ldb` file exists on disk containing the sorted key-value pairs, and the `meta` object is filled with details about this file.
## Accessing SSTables Efficiently: `TableCache`
Okay, so we have these SSTable files on disk. But reading from disk is slow. If we need to read from the same SSTable file multiple times (which is common), opening and closing it repeatedly, or re-reading its internal index structure, would be inefficient.
This is where the `TableCache` comes in. Think of the `TableCache` as a **smart librarian**.
1. **Keeps Files Open:** The librarian might keep the most popular books near the front desk instead of running to the far shelves every time someone asks for them. Similarly, the `TableCache` keeps recently used SSTable files open.
2. **Caches Structures:** Just opening the file isn't enough. LevelDB needs to read some index information *within* the SSTable file to find keys quickly. The `TableCache` also keeps this parsed information in memory (RAM). It uses a specific caching strategy called LRU (Least Recently Used) to decide which table information to keep in memory if the cache gets full.
3. **Provides Access:** When LevelDB needs to read data from a specific SSTable (identified by its file number), it asks the `TableCache`. The cache checks if it already has that table open and ready in memory. If yes (a "cache hit"), it returns access quickly. If no (a "cache miss"), it opens the actual file from disk, reads the necessary index info, stores it in the cache for next time, and then returns access.
Let's see how the `TableCache` finds a table:
```c++
// --- File: table_cache.cc ---
// Tries to find the Table structure for a given file number.
// If not in cache, opens the file and loads it.
Status TableCache::FindTable(uint64_t file_number, uint64_t file_size,
Cache::Handle** handle) {
Status s;
// Create a key for the cache lookup (based on file number)
char buf[sizeof(file_number)];
EncodeFixed64(buf, file_number);
Slice key(buf, sizeof(buf));
// 1. Try looking up the table in the cache
*handle = cache_->Lookup(key);
if (*handle == nullptr) { // Cache Miss!
// 2. If not found, open the actual file from disk
std::string fname = TableFileName(dbname_, file_number);
RandomAccessFile* file = nullptr;
Table* table = nullptr;
s = env_->NewRandomAccessFile(fname, &file); // Open the file
// ... handle errors, potentially check for old .sst filename ...
if (s.ok()) {
// 3. Parse the Table structure (index etc.) from the file
s = Table::Open(options_, file, file_size, &table);
}
if (s.ok()) {
// 4. Store the opened file and parsed Table in the cache
TableAndFile* tf = new TableAndFile;
tf->file = file;
tf->table = table;
*handle = cache_->Insert(key, tf, 1 /*charge*/, &DeleteEntry);
} else {
// Error occurred, cleanup
delete file;
// Note: Errors are NOT cached. We'll retry opening next time.
}
} // else: Cache Hit! *handle is already valid.
return s;
}
```
**Explanation:**
1. **Lookup:** It first tries `cache_->Lookup` using the `file_number`.
2. **Cache Miss:** If `Lookup` returns `nullptr`, it means the table isn't in the cache. It then proceeds to open the file (`env_->NewRandomAccessFile`).
3. **Table::Open:** It calls `Table::Open`, which reads the file's footer, parses the index block, and sets up a `Table` object ready for lookups.
4. **Insert:** If opening and parsing succeed, it creates a `TableAndFile` struct (holding both the file handle and the `Table` object) and inserts it into the cache using `cache_->Insert`. Now, the next time `FindTable` is called for this `file_number`, it will be a cache hit.
5. **Cache Hit:** If `Lookup` initially returned a valid handle, `FindTable` simply returns `Status::OK()`, and the caller can use the handle to get the `Table` object.
When LevelDB needs to read data, it often gets an `Iterator` for a specific SSTable via the `TableCache`:
```c++
// --- File: table_cache.cc ---
// Returns an iterator for reading the specified SSTable file.
Iterator* TableCache::NewIterator(const ReadOptions& options,
uint64_t file_number, uint64_t file_size,
Table** tableptr) {
// ... setup ...
Cache::Handle* handle = nullptr;
// Use FindTable to get the Table object (from cache or by opening file)
Status s = FindTable(file_number, file_size, &handle);
if (!s.ok()) {
return NewErrorIterator(s); // Return an iterator that yields the error
}
// Get the Table object from the cache handle
Table* table = reinterpret_cast<TableAndFile*>(cache_->Value(handle))->table;
// Ask the Table object to create a new iterator for its data
Iterator* result = table->NewIterator(options);
// Important: Register cleanup to release the cache handle when iterator is done
result->RegisterCleanup(&UnrefEntry, cache_, handle);
// Optionally return the Table object itself
if (tableptr != nullptr) {
*tableptr = table;
}
return result;
}
```
This function uses `FindTable` to get the `Table` object (either from the cache or by loading it from disk) and then asks that `Table` object to provide an `Iterator` to step through its key-value pairs. It also cleverly registers a cleanup function (`UnrefEntry`) so that when the iterator is no longer needed, the cache handle is released, allowing the cache to potentially evict the table later if needed.
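As a rough usage sketch (the `table_cache` and `file_size` variables, and the explicit `nullptr` for `tableptr`, are illustrative), a caller could scan one table through the cache like this:
```c++
// Scan every key/value pair of SSTable file number 5 via the cache.
ReadOptions options;
Iterator* it = table_cache->NewIterator(options, /*file_number=*/5,
                                        /*file_size=*/file_size, /*tableptr=*/nullptr);
for (it->SeekToFirst(); it->Valid(); it->Next()) {
  // it->key() and it->value() yield the entries in sorted order.
}
delete it;  // releases the cache handle via the registered cleanup
```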
Here's a diagram showing how a read might use the `TableCache`:
```mermaid
sequenceDiagram
participant Client as Read Operation
participant TableCache
participant Cache as LRUCache
participant OS/FileSystem as FS
participant TableObject as In-Memory Table Rep
Client->>TableCache: Get("some_key", file_num=5, size=1MB)
TableCache->>Cache: Lookup(file_num=5)?
alt Cache Hit
Cache-->>TableCache: Return handle for Table 5
TableCache->>TableObject: Find "some_key" within Table 5 data
TableObject-->>TableCache: Return value / not found
TableCache-->>Client: Return value / not found
else Cache Miss
Cache-->>TableCache: Not found (nullptr)
TableCache->>FS: Open file "000005.ldb"
FS-->>TableCache: Return file handle
TableCache->>TableObject: Create Table 5 representation from file handle + size
TableObject-->>TableCache: Return Table 5 object
TableCache->>Cache: Insert(file_num=5, Table 5 object)
Note right of Cache: Table 5 now cached
TableCache->>TableObject: Find "some_key" within Table 5 data
TableObject-->>TableCache: Return value / not found
TableCache-->>Client: Return value / not found
end
```
## Conclusion
In this chapter, we learned about two fundamental concepts in LevelDB:
1. **SSTable (Table):** These are the immutable, sorted files on disk where LevelDB stores the bulk of its key-value data. Think of them as sorted, bound books. They are created using `BuildTable`.
2. **TableCache:** This acts like an efficient librarian for SSTables. It keeps recently used tables open and their index structures cached in memory (RAM) to speed up access, avoiding slow disk reads whenever possible. It provides access to table data, often via iterators.
These two components work together to provide persistent storage and relatively fast access to the data within those files.
But where does the data *come from* before it gets written into an SSTable? Often, it lives in memory first. In the next chapter, we'll look at the in-memory structure where recent writes are held before being flushed to an SSTable.
Next up: [Chapter 2: MemTable](02_memtable.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,281 @@
# Chapter 2: MemTable
In [Chapter 1: Table / SSTable & TableCache](01_table___sstable___tablecache.md), we learned how LevelDB stores the bulk of its data permanently on disk in sorted, immutable files called SSTables. We also saw how the `TableCache` helps access these files efficiently.
But imagine you're updating your data frequently: adding new users, changing scores, deleting temporary items. Writing every tiny change directly to a new SSTable file on disk would be incredibly slow, like carving every single note onto a stone tablet! We need a faster way to handle recent changes.
## What's the Problem? Slow Disk Writes for Every Change
Disk drives (even fast SSDs) are much slower than your computer's main memory (RAM). If LevelDB wrote every `Put` or `Delete` operation straight to an SSTable file, your application would constantly be waiting for the disk, making it feel sluggish.
How can we accept new writes quickly but still eventually store them permanently on disk?
## MemTable: The Fast In-Memory Notepad
LevelDB's solution is the **MemTable**. Think of it as a **temporary notepad** or a **scratchpad** that lives entirely in your computer's fast RAM.
1. **In-Memory:** It's stored in RAM, making reads and writes extremely fast.
2. **Holds Recent Writes:** When you `Put` a new key-value pair or `Delete` a key, the change goes into the MemTable first.
3. **Sorted:** Just like SSTables, the data inside the MemTable is kept sorted by key. This is important for efficiency later.
4. **Temporary:** It's only a temporary holding area. Eventually, its contents get written out to a permanent SSTable file on disk.
So, when you write data:
*Your Application* -> `Put("user123", "data")` -> **MemTable** (Fast RAM write!)
This makes write operations feel almost instantaneous to your application.
## How Reads Use the MemTable
When you try to read data using `Get(key)`, LevelDB is smart. It knows the most recent data might still be on the "notepad" (MemTable). So, it checks there *first*:
1. **Check MemTable:** Look for the key in the current MemTable.
* If the key is found, return the value immediately (super fast!).
* If a "deletion marker" for the key is found, stop and report "Not Found" (the key was recently deleted).
2. **Check Older MemTable (Immutable):** If there's an older MemTable being flushed (we'll cover this next), check that too.
3. **Check SSTables:** If the key wasn't found in memory (or wasn't deleted there), *then* LevelDB looks for it in the SSTable files on disk, using the [Table / SSTable & TableCache](01_table___sstable___tablecache.md) we learned about in Chapter 1.
This "check memory first" strategy ensures that you always read the most up-to-date value, even if it hasn't hit the disk yet.
```mermaid
sequenceDiagram
participant Client as App Read (Get)
participant LevelDB
participant MemTable as Active MemTable (RAM)
participant ImMemTable as Immutable MemTable (RAM, if exists)
participant TableCache as SSTable Cache (Disk/RAM)
Client->>LevelDB: Get("some_key")
LevelDB->>MemTable: Have "some_key"?
alt Key found in Active MemTable
MemTable-->>LevelDB: Yes, value is "xyz"
LevelDB-->>Client: Return "xyz"
else Key Deleted in Active MemTable
MemTable-->>LevelDB: Yes, it's deleted
LevelDB-->>Client: Return NotFound
else Not in Active MemTable
MemTable-->>LevelDB: No
LevelDB->>ImMemTable: Have "some_key"?
alt Key found in Immutable MemTable
ImMemTable-->>LevelDB: Yes, value is "abc"
LevelDB-->>Client: Return "abc"
else Key Deleted in Immutable MemTable
ImMemTable-->>LevelDB: Yes, it's deleted
LevelDB-->>Client: Return NotFound
else Not in Immutable MemTable
ImMemTable-->>LevelDB: No
LevelDB->>TableCache: Get("some_key") from SSTables
TableCache-->>LevelDB: Found "old_value" / NotFound
LevelDB-->>Client: Return "old_value" / NotFound
end
end
```
## What Happens When the Notepad Fills Up?
The MemTable lives in RAM, which is limited. We can't just keep adding data to it forever. LevelDB has a configured size limit for the MemTable (`options.write_buffer_size`, often a few megabytes).
When the MemTable gets close to this size:
1. **Freeze!** LevelDB declares the current MemTable "immutable" (meaning read-only). No new writes go into this specific MemTable anymore. Let's call it `imm_` (Immutable MemTable).
2. **New Notepad:** LevelDB immediately creates a *new*, empty MemTable (`mem_`) to accept incoming writes. Your application doesn't pause; new writes just start going to the fresh MemTable.
3. **Flush to Disk:** A background task starts working on the frozen `imm_`. It reads all the sorted key-value pairs from `imm_` and uses the `BuildTable` process (from [Chapter 1](01_table___sstable___tablecache.md)) to write them into a brand new SSTable file on disk. This new file becomes part of "Level-0" (we'll learn more about levels in [Chapter 8: Compaction](08_compaction.md)).
4. **Discard:** Once the `imm_` is successfully written to the SSTable file, the in-memory `imm_` is discarded, freeing up RAM.
This process ensures that writes are always fast (going to the *new* `mem_`) while the *old* data is efficiently flushed to disk in the background.
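The threshold itself is just an option you pass when opening the database. A small sketch (the path and size are illustrative):
```c++
#include "leveldb/db.h"

leveldb::Options options;
options.create_if_missing = true;
options.write_buffer_size = 8 * 1024 * 1024;  // switch & flush the MemTable around 8 MB

leveldb::DB* db = nullptr;
leveldb::Status status = leveldb::DB::Open(options, "/tmp/demo_db", &db);
```
The diagram below summarizes the switch-and-flush cycle.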
```mermaid
graph TD
subgraph Writes
A[Incoming Writes: Put/Delete] --> B(Active MemTable mem_);
end
subgraph MemTable Full
B -- Reaches Size Limit --> C{Freeze mem_ -> becomes imm_};
C --> D(Create New Empty mem_);
A --> D;
C --> E{Background Flush};
end
subgraph Background Flush
E -- Reads Data --> F(Immutable MemTable imm_);
F -- Uses BuildTable --> G([Level-0 SSTable on Disk]);
G -- Flush Complete --> H{Discard imm_};
end
style G fill:#f9f,stroke:#333,stroke-width:2px
```
## Under the Hood: Keeping it Sorted with a SkipList
We mentioned that the MemTable keeps keys sorted. Why?
1. **Efficient Flushing:** When flushing the MemTable to an SSTable, the data needs to be written in sorted order. If the MemTable is already sorted, this is very efficient: we just read through it sequentially.
2. **Efficient Reads:** Keeping it sorted allows for faster lookups within the MemTable itself.
How does LevelDB keep the MemTable sorted while allowing fast inserts? It uses a clever data structure called a **SkipList**.
Imagine a sorted linked list. To find an element, you might have to traverse many nodes. Now, imagine adding some "express lanes" (higher-level links) that skip over several nodes at a time. You can use these express lanes to quickly get close to your target, then drop down to the detailed level (the base list) to find the exact spot. This is the core idea of a SkipList!
* **Fast Inserts:** Adding a new item is generally fast.
* **Fast Lookups:** Finding an item is much faster than a simple linked list, often close to the speed of more complex balanced trees.
* **Efficient Iteration:** Reading all items in sorted order (needed for flushing) is straightforward.
The MemTable essentially wraps a SkipList provided by `skiplist.h`.
```c++
// --- File: db/memtable.h ---
#include "db/skiplist.h" // The SkipList data structure
#include "util/arena.h" // Memory allocator
class MemTable {
private:
// The core data structure: a SkipList.
// The Key is 'const char*' pointing into the Arena.
// KeyComparator helps compare keys correctly (we'll see this later).
typedef SkipList<const char*, KeyComparator> Table;
Arena arena_; // Allocates memory for nodes efficiently
Table table_; // The actual SkipList instance
int refs_; // Reference count for managing lifetime
// ... other members like KeyComparator ...
public:
// Add an entry (Put or Delete marker)
void Add(SequenceNumber seq, ValueType type, const Slice& key,
const Slice& value);
// Look up a key
bool Get(const LookupKey& key, std::string* value, Status* s);
// Create an iterator to scan the MemTable's contents
Iterator* NewIterator();
// Estimate memory usage
size_t ApproximateMemoryUsage();
// Constructor, Ref/Unref omitted for brevity...
};
```
This header shows the `MemTable` class uses an `Arena` for memory management and a `Table` (which is a `SkipList`) to store the data.
## Adding and Getting Data (Code View)
Let's look at simplified versions of `Add` and `Get`.
**Adding an Entry:**
When you call `db->Put(key, value)` or `db->Delete(key)`, it eventually calls `MemTable::Add`.
```c++
// --- File: db/memtable.cc ---
void MemTable::Add(SequenceNumber s, ValueType type, const Slice& key,
const Slice& value) {
// Calculate size needed for the entry in the skiplist.
// Format includes key size, key, sequence number + type tag, value size, value.
size_t key_size = key.size();
size_t val_size = value.size();
size_t internal_key_size = key_size + 8; // 8 bytes for seq + type
const size_t encoded_len = VarintLength(internal_key_size) +
internal_key_size + VarintLength(val_size) +
val_size;
// Allocate memory from the Arena
char* buf = arena_.Allocate(encoded_len);
// Encode the entry into the buffer 'buf' (details omitted)
// Format: [key_len][key_bytes][seq_num|type][value_len][value_bytes]
// ... encoding logic ...
// Insert the buffer pointer into the SkipList. The SkipList uses the
// KeyComparator to know how to sort based on the encoded format.
table_.Insert(buf);
}
```
**Explanation:**
1. **Calculate Size:** Determines how much memory is needed to store the key, value, sequence number, and type. (We'll cover sequence numbers and internal keys in [Chapter 9](09_internalkey___dbformat.md)).
2. **Allocate:** Gets a chunk of memory from the `Arena`. Arenas are efficient allocators for many small objects with similar lifetimes.
3. **Encode:** Copies the key, value, and metadata into the allocated buffer (`buf`).
4. **Insert:** Calls `table_.Insert(buf)`, where `table_` is the SkipList. The SkipList takes care of finding the correct sorted position and linking the new entry.
**Getting an Entry:**
When you call `db->Get(key)`, it checks the MemTable first using `MemTable::Get`.
```c++
// --- File: db/memtable.cc ---
bool MemTable::Get(const LookupKey& lkey, std::string* value, Status* s) {
// Get the specially formatted key to search for in the MemTable.
Slice memkey = lkey.memtable_key();
// Create an iterator for the SkipList.
Table::Iterator iter(&table_);
// Seek to the first entry >= the key we are looking for.
iter.Seek(memkey.data());
if (iter.Valid()) { // Did we find something at or after our key?
// Decode the key found in the SkipList
const char* entry = iter.key();
// ... decode logic to get user_key, sequence, type ...
Slice found_user_key = /* decoded user key */;
ValueType found_type = /* decoded type */;
// Check if the user key matches exactly
if (comparator_.comparator.user_comparator()->Compare(
found_user_key, lkey.user_key()) == 0) {
// It's the right key! Check the type.
if (found_type == kTypeValue) { // Is it a Put record?
// Decode the value and return it
Slice v = /* decoded value */;
value->assign(v.data(), v.size());
return true; // Found the value!
} else { // Must be kTypeDeletion
// Found a deletion marker for this key. Report "NotFound".
*s = Status::NotFound(Slice());
return true; // Found a deletion!
}
}
}
// Key not found in this MemTable
return false;
}
```
**Explanation:**
1. **Get Search Key:** Prepares the key in the format used internally by the MemTable (`LookupKey`).
2. **Create Iterator:** Gets a `SkipList::Iterator`.
3. **Seek:** Uses the iterator's `Seek` method to efficiently find the first entry in the SkipList whose key is greater than or equal to the search key.
4. **Check Found Entry:** If `Seek` finds an entry (`iter.Valid()`):
* It decodes the entry found in the SkipList.
* It compares the *user* part of the key to ensure it's an exact match (not just the next key in sorted order).
* If the keys match, it checks the `type`:
* If it's `kTypeValue`, it decodes the value and returns `true`.
* If it's `kTypeDeletion`, it sets the status to `NotFound` and returns `true` (indicating we found definitive information about the key it's deleted).
5. **Not Found:** If no matching key is found, it returns `false`.
## Conclusion
The **MemTable** is LevelDB's crucial in-memory cache for recent writes. It acts like a fast notepad:
* Accepts new `Put` and `Delete` operations quickly in RAM.
* Keeps entries sorted using an efficient **SkipList**.
* Allows recent data to be read quickly without touching the disk.
* When full, it's frozen, flushed to a new Level-0 **SSTable** file on disk in the background, and then discarded.
This design allows LevelDB to provide very fast write performance while still ensuring data is eventually persisted safely to disk.
However, what happens if the power goes out *after* data is written to the MemTable but *before* it's flushed to an SSTable? Isn't the data in RAM lost? To solve this, LevelDB uses another component alongside the MemTable: the Write-Ahead Log (WAL).
Next up: [Chapter 3: Write-Ahead Log (WAL) & LogWriter/LogReader](03_write_ahead_log__wal____logwriter_logreader.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

View File

@@ -0,0 +1,352 @@
# Chapter 3: Write-Ahead Log (WAL) & LogWriter/LogReader
In [Chapter 2: MemTable](02_memtable.md), we saw how LevelDB uses an in-memory `MemTable` (like a fast notepad) to quickly accept new writes (`Put` or `Delete`) before they are eventually flushed to an [SSTable](01_table___sstable___tablecache.md) file on disk.
This is great for speed! But what if the unthinkable happens? Imagine you've just written some important data. It's sitting safely in the `MemTable` in RAM, but *before* LevelDB gets a chance to write it to a permanent SSTable file, the power cord gets kicked out, or the server crashes!
Uh oh. Since RAM is volatile, anything in the `MemTable` that hadn't been saved to disk is **gone** forever when the power goes out. That's not very reliable for a database!
## What's the Problem? Losing Data on Crashes
How can LevelDB make sure that once your write operation *returns successfully*, the data is safe, even if the system crashes immediately afterwards? Relying only on the `MemTable` isn't enough because it lives in volatile RAM. We need a way to make writes durable (permanent) much sooner.
## Write-Ahead Log (WAL): The Database's Safety Journal
LevelDB's solution is the **Write-Ahead Log (WAL)**, often just called the **log**.
Think of the WAL as a **ship's logbook** or a **court reporter's transcript**.
1. **Write First:** Before the captain takes any significant action (like changing course), they write it down in the logbook *first*. Similarly, before LevelDB modifies the `MemTable` (which is in RAM), it **first appends** a description of the change (e.g., "Put key 'user1' with value 'dataA'") to a special file on disk: the WAL file.
2. **Append-Only:** Like a logbook, entries are just added sequentially to the end. LevelDB doesn't go back and modify old entries in the current WAL file. This makes writing very fast: it's just adding to the end of a file.
3. **On Disk:** Crucially, this WAL file lives on the persistent disk (HDD or SSD), not just in volatile RAM.
4. **Durability:** By writing to the WAL *before* acknowledging a write to the user, LevelDB ensures that even if the server crashes immediately after, the record of the operation is safely stored on disk in the log.
So, the write process looks like this:
*Your Application* -> `Put("user123", "data")` -> **1. Append to WAL file (Disk)** -> **2. Add to MemTable (RAM)** -> *Return Success*
```mermaid
sequenceDiagram
participant App as Application
participant LevelDB
participant WAL as WAL File (Disk)
participant MemTable as MemTable (RAM)
App->>LevelDB: Put("key", "value")
LevelDB->>WAL: Append Put("key", "value") Record
Note right of WAL: Physical disk write
WAL-->>LevelDB: Append successful
LevelDB->>MemTable: Add("key", "value")
MemTable-->>LevelDB: Add successful
LevelDB-->>App: Write successful
```
This "write-ahead" step ensures durability.
## What Happens During Recovery? Replaying the Logbook
Now, let's say the server crashes and restarts. LevelDB needs to recover its state. How does the WAL help?
1. **Check for Log:** When LevelDB starts up, it looks for a WAL file.
2. **Read the Log:** If a WAL file exists, it means the database might not have shut down cleanly, and the last `MemTable`'s contents (which were only in RAM) were lost. LevelDB creates a `LogReader` to read through the WAL file from beginning to end.
3. **Rebuild MemTable:** For each operation record found in the WAL (like "Put key 'user1' value 'dataA'", "Delete key 'user2'"), LevelDB re-applies that operation to a *new*, empty `MemTable` in memory. It's like rereading the ship's logbook to reconstruct what happened right before the incident.
4. **Recovery Complete:** Once the entire WAL is replayed, the `MemTable` is back to the state it was in right before the crash. LevelDB can now continue operating normally, accepting new reads and writes. The data from the WAL is now safely in the new `MemTable`, ready to be flushed to an SSTable later.
The WAL file essentially acts as a temporary backup for the `MemTable` until the `MemTable`'s contents are permanently stored in an SSTable. Once a `MemTable` is successfully flushed to an SSTable, the corresponding WAL file is no longer needed and can be deleted.
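Like SSTables, each log is just a numbered file in the database directory, named by a helper analogous to `TableFileName` from Chapter 1 (a tiny sketch with an illustrative path):
```c++
std::string log_name = LogFileName("/data/mydb", 3);
// log_name is now "/data/mydb/000003.log"
```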
## LogWriter: Appending to the Log
The component responsible for writing records to the WAL file is `log::Writer`. Think of it as the dedicated writer making entries in our ship's logbook.
When LevelDB processes a write operation (often coming from a [WriteBatch](05_writebatch.md), which we'll see later), it serializes the batch of changes into a single chunk of data (a `Slice`) and asks the `log::Writer` to add it to the current log file.
```c++
// --- Simplified from db/db_impl.cc ---
// Inside DBImpl::Write(...) after preparing the batch:
Status status = log_->AddRecord(WriteBatchInternal::Contents(write_batch));
// ... check status ...
if (status.ok() && options.sync) {
// Optionally ensure the data hits the physical disk
status = logfile_->Sync();
}
if (status.ok()) {
// Only if WAL write succeeded, apply to MemTable
status = WriteBatchInternal::InsertInto(write_batch, mem_);
}
// ... handle status ...
```
**Explanation:**
1. `WriteBatchInternal::Contents(write_batch)`: Gets the serialized representation of the write operations (like one or more Puts/Deletes).
2. `log_->AddRecord(...)`: Calls the `log::Writer` instance (`log_`) to append this serialized data as a single record to the current WAL file (`logfile_`).
3. `logfile_->Sync()`: If the `sync` option is set (which is the default for ensuring durability), this command tells the operating system to *really* make sure the data written to the log file has reached the physical disk platters/flash, not just sitting in some OS buffer. This is crucial for surviving power loss.
4. `WriteBatchInternal::InsertInto(write_batch, mem_)`: Only *after* the log write is confirmed (and synced, if requested) does LevelDB apply the changes to the in-memory `MemTable`.
The `log::Writer` itself handles the details of how records are actually formatted within the log file. Log files are composed of fixed-size blocks (e.g., 32KB). A single record from `AddRecord` might be small enough to fit entirely within the remaining space in the current block, or it might be large and need to be split (fragmented) across multiple physical records spanning block boundaries.
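Concretely, every physical record starts with a small fixed-size header. The sketch below spells out the framing constants the writer and reader snippets in this chapter assume (a 7-byte header: 4-byte checksum, 2-byte length, 1-byte type):
```c++
// Log files are a sequence of fixed-size blocks; each physical record
// inside a block is a 7-byte header followed by its payload fragment.
static const int kBlockSize = 32768;       // 32KB blocks
static const int kHeaderSize = 4 + 2 + 1;  // checksum (4B) + length (2B) + type (1B)

enum RecordType {
  kFullType = 1,    // whole logical record in one piece
  kFirstType = 2,   // first fragment of a larger record
  kMiddleType = 3,  // middle fragment
  kLastType = 4     // final fragment
};
```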
```c++
// --- Simplified from db/log_writer.cc ---
Status Writer::AddRecord(const Slice& slice) {
const char* ptr = slice.data();
size_t left = slice.size(); // How much data is left to write?
Status s;
bool begin = true; // Is this the first fragment of this record?
do {
const int leftover = kBlockSize - block_offset_; // Space left in current block
// ... if leftover < kHeaderSize, fill trailer and start new block ...
// Calculate how much of the data can fit in this block
const size_t avail = kBlockSize - block_offset_ - kHeaderSize;
const size_t fragment_length = (left < avail) ? left : avail;
// Determine the type of this physical record (fragment)
RecordType type;
const bool end = (left == fragment_length); // Is this the last fragment?
if (begin && end) {
type = kFullType; // Fits entirely in one piece
} else if (begin) {
type = kFirstType; // First piece of a multi-piece record
} else if (end) {
type = kLastType; // Last piece of a multi-piece record
} else {
type = kMiddleType; // Middle piece of a multi-piece record
}
// Write this physical record (header + data fragment) to the file
s = EmitPhysicalRecord(type, ptr, fragment_length);
// Advance pointers and update remaining size
ptr += fragment_length;
left -= fragment_length;
begin = false; // Subsequent fragments are not the 'begin' fragment
} while (s.ok() && left > 0); // Loop until all data is written or error
return s;
}
// Simplified - Writes header (checksum, length, type) and payload
Status Writer::EmitPhysicalRecord(RecordType t, const char* ptr, size_t length) {
// ... format header (buf) with checksum, length, type ...
// ... compute checksum ...
// ... Encode checksum into header ...
// Write header and payload fragment
Status s = dest_->Append(Slice(buf, kHeaderSize));
if (s.ok()) {
s = dest_->Append(Slice(ptr, length));
// LevelDB might Flush() here or let the caller Sync() later
}
block_offset_ += kHeaderSize + length; // Update position in current block
return s;
}
```
**Explanation:**
* The `AddRecord` method takes the user's data (`slice`) and potentially breaks it into smaller `fragment_length` chunks.
* Each chunk is written as a "physical record" using `EmitPhysicalRecord`.
* `EmitPhysicalRecord` prepends a small header (`kHeaderSize`, 7 bytes) containing a checksum (for detecting corruption), the length of this fragment, and the `RecordType` (`kFullType`, `kFirstType`, `kMiddleType`, or `kLastType`).
* The `RecordType` tells the `LogReader` later how to reassemble these fragments back into the original complete record.
## LogReader: Reading the Log for Recovery
The counterpart to `LogWriter` is `log::Reader`. This is the component used during database startup (recovery) to read the records back from a WAL file. Think of it as the person carefully reading the ship's logbook after an incident.
The `log::Reader` reads the log file sequentially, block by block. It parses the physical record headers, verifies checksums, and pieces together the fragments (`kFirstType`, `kMiddleType`, `kLastType`) to reconstruct the original data records that were passed to `AddRecord`.
```c++
// --- Simplified from db/db_impl.cc ---
// Inside DBImpl::RecoverLogFile(...)
// Create the log reader for the specific log file number
std::string fname = LogFileName(dbname_, log_number);
SequentialFile* file;
Status status = env_->NewSequentialFile(fname, &file);
// ... check status ...
// Set up reporter for corruption errors
log::Reader::Reporter reporter;
// ... initialize reporter ...
log::Reader reader(file, &reporter, true /*checksum*/, 0 /*initial_offset*/);
// Read records one by one and apply them to a temporary MemTable
std::string scratch;
Slice record;
WriteBatch batch;
MemTable* mem = new MemTable(internal_comparator_);
mem->Ref();
while (reader.ReadRecord(&record, &scratch) && status.ok()) {
// record now holds a complete record originally passed to AddRecord
// Parse the record back into a WriteBatch
WriteBatchInternal::SetContents(&batch, record);
// Apply the operations from the batch to the MemTable
status = WriteBatchInternal::InsertInto(&batch, mem);
// ... check status ...
// Update the max sequence number seen
const SequenceNumber last_seq = /* ... get from batch ... */;
if (last_seq > *max_sequence) {
*max_sequence = last_seq;
}
// Optional: If MemTable gets too big during recovery, flush it
if (mem->ApproximateMemoryUsage() > options_.write_buffer_size) {
status = WriteLevel0Table(mem, edit, nullptr); // Flush to SSTable
mem->Unref();
mem = new MemTable(internal_comparator_);
mem->Ref();
// ... check status ...
}
}
delete file; // Close the log file
// ... handle final MemTable (mem) if not null ...
```
**Explanation:**
1. A `log::Reader` is created, pointing to the WAL file (`.log`) that needs recovery.
2. The code loops using `reader.ReadRecord(&record, &scratch)`.
* `record`: This `Slice` will point to the reassembled data of the next complete logical record found in the log.
* `scratch`: A temporary string buffer the reader might use if a record spans multiple blocks.
3. Inside the loop:
* The `record` (which contains a serialized `WriteBatch`) is parsed back into a `WriteBatch` object.
* `WriteBatchInternal::InsertInto(&batch, mem)` applies the operations (Puts/Deletes) from the recovered batch to the in-memory `MemTable` (`mem`).
* The code keeps track of the latest sequence number encountered.
* Optionally, if the `MemTable` fills up *during* recovery, it can be flushed to an SSTable just like during normal operation.
4. This continues until `ReadRecord` returns `false` (end of log file) or an error occurs.
The `log::Reader::ReadRecord` implementation handles the details of reading blocks, finding headers, checking checksums, and combining `kFirstType`, `kMiddleType`, `kLastType` fragments.
```c++
// --- Simplified from db/log_reader.cc ---
// Reads the next complete logical record. Returns true if successful.
bool Reader::ReadRecord(Slice* record, std::string* scratch) {
// ... skip records before initial_offset if necessary ...
scratch->clear();
record->clear();
bool in_fragmented_record = false;
Slice fragment; // To hold data from one physical record
while (true) {
// Reads the next physical record (header + data fragment) from the file blocks.
// Handles reading across block boundaries internally.
const unsigned int record_type = ReadPhysicalRecord(&fragment);
// ... handle resyncing logic after seeking ...
switch (record_type) {
case kFullType:
// ... sanity check for unexpected fragments ...
*record = fragment; // Got a complete record in one piece
return true;
case kFirstType:
// ... sanity check for unexpected fragments ...
scratch->assign(fragment.data(), fragment.size()); // Start of a new fragmented record
in_fragmented_record = true;
break;
case kMiddleType:
if (!in_fragmented_record) { /* Report corruption */ }
else { scratch->append(fragment.data(), fragment.size()); } // Append middle piece
break;
case kLastType:
if (!in_fragmented_record) { /* Report corruption */ }
else {
scratch->append(fragment.data(), fragment.size()); // Append final piece
*record = Slice(*scratch); // Reassembled record is complete
return true;
}
break;
case kEof:
return false; // End of log file
case kBadRecord:
// ... report corruption, clear state ...
in_fragmented_record = false;
scratch->clear();
break; // Try to find the next valid record
default:
// ... report corruption ...
in_fragmented_record = false;
scratch->clear();
break; // Try to find the next valid record
}
}
}
```
**Explanation:**
* `ReadRecord` calls `ReadPhysicalRecord` repeatedly in a loop.
* `ReadPhysicalRecord` (internal helper, not shown in full) reads from the file, parses the 7-byte header, checks the CRC, and returns the type and the data fragment (`result`). It handles skipping block trailers and reading new blocks as needed.
* Based on the `record_type`, `ReadRecord` either returns the complete record (`kFullType`), starts assembling fragments (`kFirstType`), appends fragments (`kMiddleType`), or finishes assembling and returns the record (`kLastType`).
* It manages the `scratch` buffer to hold the fragments being assembled.
## Recovery Process Diagram
Here's how the WAL is used during database startup if a crash occurred:
```mermaid
sequenceDiagram
participant App as Application Startup
participant LevelDB as DB::Open()
participant Env as Environment (OS/FS)
participant LogReader as log::Reader
participant MemTable as New MemTable (RAM)
App->>LevelDB: Open Database
LevelDB->>Env: Check for CURRENT file, MANIFEST, etc.
LevelDB->>Env: Look for .log files >= Manifest LogNumber
alt Log file(s) found
LevelDB->>LogReader : Create Reader for log file
loop Read Log Records
LogReader ->> Env: Read next block(s) from log file
Env-->>LogReader: Return data
LogReader ->> LogReader : Parse physical records, reassemble logical record
alt Record Found
LogReader -->> LevelDB: Return next record (WriteBatch data)
LevelDB ->> MemTable: Apply WriteBatch to MemTable
else End of Log or Error
LogReader -->> LevelDB: Indicate EOF / Error
Note right of LevelDB: Loop will exit
end
end
LevelDB ->> LogReader : Destroy Reader
Note right of LevelDB: MemTable now holds recovered state.
else No relevant log files
Note right of LevelDB: Clean shutdown or new DB. No log replay needed.
end
LevelDB-->>App: Database Opened Successfully
```
## Conclusion
The **Write-Ahead Log (WAL)** is a critical component for ensuring **durability** in LevelDB. By writing every operation to an append-only log file on disk *before* applying it to the in-memory `MemTable` and acknowledging the write, LevelDB guarantees that no acknowledged data is lost even if the server crashes.
* The `log::Writer` handles appending records to the current WAL file, dealing with block formatting and fragmentation.
* The `log::Reader` handles reading records back from the WAL file during recovery, verifying checksums and reassembling fragmented records.
* This recovery process replays the logged operations to rebuild the `MemTable` state that was lost in the crash.
The WAL, MemTable, and SSTables work together: WAL provides fast durability for recent writes, MemTable provides fast access to those recent writes in memory, and SSTables provide persistent, sorted storage for the bulk of the data.
Now that we understand the core storage structures (SSTables, MemTable, WAL), we can start looking at how they are managed and coordinated.
Next up: [Chapter 4: DBImpl](04_dbimpl.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

439
output/LevelDB/04_dbimpl.md Normal file
View File

@@ -0,0 +1,439 @@
# Chapter 4: DBImpl - The Database General Manager
In the previous chapters, we've explored some key ingredients of LevelDB:
* [SSTables](01_table___sstable___tablecache.md) for storing data permanently on disk.
* The [MemTable](02_memtable.md) for quickly handling recent writes in memory.
* The [Write-Ahead Log (WAL)](03_write_ahead_log__wal____logwriter_logreader.md) for ensuring durability even if the system crashes.
But how do all these pieces work together? Who tells LevelDB to write to the WAL first, *then* the MemTable? Who decides when the MemTable is full and needs to be flushed to an SSTable? Who coordinates reading data from both memory *and* disk files?
## What's the Problem? Orchestrating Everything
Imagine a large library. You have librarians putting books on shelves (SSTables), a front desk clerk taking newly returned books (MemTable), and a security guard logging everyone who enters (WAL). But someone needs to be in charge of the whole operation: the **General Manager**.
This manager doesn't shelve every book themselves, but they direct the staff, manage the budget, decide when to rearrange sections (compaction), and handle emergencies (recovery). Without a manager, it would be chaos!
LevelDB needs a similar central coordinator to manage all its different parts and ensure they work together smoothly and correctly.
## DBImpl: The General Manager of LevelDB
The `DBImpl` class is the heart of LevelDB's implementation. It's the **General Manager** of our database library. It doesn't *contain* the data itself (that's in MemTables and SSTables), but it **orchestrates** almost every operation.
* It takes requests from your application (like `Put`, `Get`, `Delete`).
* It directs these requests to the right components (WAL, MemTable, TableCache).
* It manages the state of the database (like which MemTable is active, which files exist).
* It initiates and manages background tasks like flushing the MemTable and running compactions.
* It handles the recovery process when the database starts up.
Almost every interaction you have with a LevelDB database object ultimately goes through `DBImpl`.
## Key Responsibilities of DBImpl
Think of the `DBImpl` general manager juggling several key tasks:
1. **Handling Writes (`Put`, `Delete`, `Write`):** Ensuring data is safely written to the WAL and then the MemTable. Managing the process when the MemTable fills up.
2. **Handling Reads (`Get`, `NewIterator`):** Figuring out where to find the requested data: checking the active MemTable, the soon-to-be-flushed immutable MemTable, and finally the various SSTable files on disk (using helpers like [Version & VersionSet](06_version___versionset.md) and [Table / SSTable & TableCache](01_table___sstable___tablecache.md)).
3. **Background Maintenance ([Compaction](08_compaction.md)):** Deciding when and how to run compactions to clean up old data, merge SSTables, and keep reads efficient. It schedules and oversees this background work.
4. **Startup and Recovery:** When the database opens, `DBImpl` manages locking the database directory, reading the manifest file ([Version & VersionSet](06_version___versionset.md)), and replaying the [WAL](03_write_ahead_log__wal____logwriter_logreader.md) to recover any data that wasn't flushed before the last shutdown or crash.
5. **Snapshot Management:** Handling requests to create and release snapshots, which provide a consistent view of the database at a specific point in time.
`DBImpl` uses other components extensively to perform these tasks. It holds references to the active MemTable (`mem_`), the immutable MemTable (`imm_`), the WAL (`log_`), the `TableCache`, and the `VersionSet` (which tracks all the SSTable files).
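To give a feel for what the general manager holds onto, here is a heavily trimmed sketch of those members (the names match the snippets used in this chapter; the real class declares many more fields):
```c++
// --- Sketch of db/db_impl.h (heavily trimmed) ---
class DBImpl : public DB {
 private:
  port::Mutex mutex_;              // protects the mutable state below
  MemTable* mem_;                  // active MemTable (accepts new writes)
  MemTable* imm_;                  // immutable MemTable being flushed (or nullptr)
  WritableFile* logfile_;          // current WAL file
  log::Writer* log_;               // appends records to logfile_
  TableCache* const table_cache_;  // keeps SSTables open and their indexes cached
  VersionSet* const versions_;     // tracks which SSTable files exist at each level
};
```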
## How DBImpl Handles Writes
Let's trace a simple `Put` operation:
1. **Request:** Your application calls `db->Put("mykey", "myvalue")`.
2. **DBImpl Entry:** This call enters the `DBImpl::Put` method (which typically wraps the operation in a [WriteBatch](05_writebatch.md) and calls `DBImpl::Write`).
3. **Queueing (Optional):** `DBImpl` manages a queue of writers to ensure writes happen in order. It might group multiple concurrent writes together for efficiency (`BuildBatchGroup`).
4. **Making Room:** Before writing, `DBImpl` checks if there's space in the current `MemTable` (`mem_`). If not (`MakeRoomForWrite`), it might:
* Pause briefly if Level-0 SSTable count is high (slowdown trigger).
* Wait if the *immutable* MemTable (`imm_`) is still being flushed.
* Wait if Level-0 SSTable count is too high (stop trigger).
* **Trigger a MemTable switch:**
* Mark the current `mem_` as read-only (`imm_`).
* Create a new empty `mem_`.
* Create a new WAL file (`logfile_`).
* Schedule a background task (`MaybeScheduleCompaction`) to flush the old `imm_` to an SSTable.
5. **Write to WAL:** `DBImpl` writes the operation(s) to the current WAL file (`log_->AddRecord(...)`). If requested (`options.sync`), it ensures the WAL data is physically on disk (`logfile_->Sync()`).
6. **Write to MemTable:** Only after the WAL write succeeds, `DBImpl` inserts the data into the active `MemTable` (`mem_->Add(...)` via `WriteBatchInternal::InsertInto`).
7. **Return:** Control returns to your application.
Here's a highly simplified view of the `Write` method:
```c++
// --- Simplified from db/db_impl.cc ---
Status DBImpl::Write(const WriteOptions& options, WriteBatch* updates) {
// ... acquire mutex, manage writer queue (omitted) ...
// Step 4: Make sure there's space. This might trigger a MemTable switch
// and schedule background work. May wait if MemTable is full or
// too many L0 files exist.
Status status = MakeRoomForWrite(updates == nullptr /* force compact? */);
if (status.ok() && updates != nullptr) {
// ... potentially group multiple concurrent writes (BuildBatchGroup) ...
// Step 5: Add the batch to the Write-Ahead Log
status = log_->AddRecord(WriteBatchInternal::Contents(updates));
if (status.ok() && options.sync) {
// Ensure log entry is on disk if requested
status = logfile_->Sync();
// ... handle sync error by recording background error ...
}
// Step 6: Insert the batch into the active MemTable (only if WAL ok)
if (status.ok()) {
status = WriteBatchInternal::InsertInto(updates, mem_);
}
}
// ... update sequence number, manage writer queue, release mutex ...
return status; // Step 7: Return status to caller
}
```
**Explanation:** This code shows the core sequence: check/make room (`MakeRoomForWrite`), write to the log (`log_->AddRecord`), potentially sync the log (`logfile_->Sync`), and finally insert into the MemTable (`InsertInto(..., mem_)`). Error handling and writer coordination are omitted for clarity.
```mermaid
sequenceDiagram
participant App as Application
participant DBImpl
participant WriterQueue as Writer Queue
participant LogWriter as log::Writer (WAL)
participant MemTable as Active MemTable (RAM)
App->>DBImpl: Put("key", "value") / Write(batch)
DBImpl->>WriterQueue: Add writer to queue
Note over DBImpl: Waits if not front of queue
DBImpl->>DBImpl: MakeRoomForWrite()?
alt MemTable Full / L0 Trigger
DBImpl->>DBImpl: Switch MemTable, Schedule Flush
end
DBImpl->>LogWriter: AddRecord(batch_data)
opt Sync Option Enabled
DBImpl->>LogWriter: Sync() Log File
end
LogWriter-->>DBImpl: Log Write Status
alt Log Write OK
DBImpl->>MemTable: InsertInto(batch_data)
MemTable-->>DBImpl: Insert Status
DBImpl->>WriterQueue: Remove writer, Signal next
DBImpl-->>App: Return OK
else Log Write Failed
DBImpl->>WriterQueue: Remove writer, Signal next
DBImpl-->>App: Return Error Status
end
```
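From the application's side, the `options.sync` flag and the batching described above look like this (a small usage sketch; `db` is assumed to be an already-open `leveldb::DB*`):

```c++
// --- Hypothetical usage sketch: synchronous and batched writes ---
#include "leveldb/db.h"
#include "leveldb/write_batch.h"

void WriteExamples(leveldb::DB* db) {
  // A synchronous Put: DBImpl will call logfile_->Sync() before returning.
  leveldb::WriteOptions sync_opts;
  sync_opts.sync = true;
  leveldb::Status s = db->Put(sync_opts, "mykey", "myvalue");

  // Several operations in one WriteBatch go through DBImpl::Write as a unit:
  // one WAL record, one MemTable insertion pass, applied atomically.
  leveldb::WriteBatch batch;
  batch.Put("key1", "value1");
  batch.Delete("key2");
  if (s.ok()) s = db->Write(leveldb::WriteOptions(), &batch);
}
```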
## How DBImpl Handles Reads
Reading data involves checking different places in a specific order to ensure the most recent value is found:
1. **Request:** Your application calls `db->Get("mykey")`.
2. **DBImpl Entry:** The call enters `DBImpl::Get`.
3. **Snapshot:** `DBImpl` determines the sequence number to read up to (either from the provided `ReadOptions::snapshot` or the current latest sequence number).
4. **Check MemTable:** It first checks the active `MemTable` (`mem_`). If the key is found (either a value or a deletion marker), the search stops, and the result is returned.
5. **Check Immutable MemTable:** If not found in `mem_`, and if an immutable MemTable (`imm_`) exists (one that's waiting to be flushed), it checks `imm_`. If found, the search stops.
6. **Check SSTables:** If the key wasn't found in memory, `DBImpl` asks the current `Version` (managed by `VersionSet`) to find the key in the SSTable files (`current->Get(...)`). The `Version` object knows which files might contain the key and uses the `TableCache` to access them efficiently.
7. **Update Stats (Optional):** If the read had to consult SSTables, `DBImpl` may update internal statistics about file access (`current->UpdateStats`). If a particular file is consulted too many times, this can trigger a future compaction of that file (`MaybeScheduleCompaction`).
8. **Return:** The value found (or a "Not Found" status) is returned to the application.
A simplified view of `Get`:
```c++
// --- Simplified from db/db_impl.cc ---
Status DBImpl::Get(const ReadOptions& options, const Slice& key,
std::string* value) {
Status s;
SequenceNumber snapshot;
// ... (Step 3) Determine snapshot sequence number ...
mutex_.Lock(); // Need lock to access mem_, imm_, current version
MemTable* mem = mem_;
MemTable* imm = imm_;
Version* current = versions_->current();
mem->Ref(); // Increase reference counts
if (imm != nullptr) imm->Ref();
current->Ref();
mutex_.Unlock(); // Unlock for potentially slow lookups
LookupKey lkey(key, snapshot); // Internal key format for lookup
// Step 4: Check active MemTable
if (mem->Get(lkey, value, &s)) {
// Found in mem_ (value or deletion marker)
}
// Step 5: Check immutable MemTable (if it exists)
else if (imm != nullptr && imm->Get(lkey, value, &s)) {
// Found in imm_
}
// Step 6: Check SSTables via current Version
else {
Version::GetStats stats; // To record file access stats
s = current->Get(options, lkey, value, &stats);
// Step 7: Maybe update stats and schedule compaction
if (current->UpdateStats(stats)) {
mutex_.Lock();
MaybeScheduleCompaction(); // Needs lock
mutex_.Unlock();
}
}
// Decrease reference counts
mutex_.Lock();
mem->Unref();
if (imm != nullptr) imm->Unref();
current->Unref();
mutex_.Unlock();
return s; // Step 8: Return status
}
```
**Explanation:** This shows the order of checking: `mem->Get`, `imm->Get`, and finally `current->Get` (which searches SSTables). It also highlights the reference counting (`Ref`/`Unref`) needed because these components might be changed or deleted by background threads while the read is in progress. The lock is held only when accessing shared pointers, not during the actual data lookup.
```mermaid
sequenceDiagram
participant App as Application
participant DBImpl
participant MemTable as Active MemTable (RAM)
participant ImmMemTable as Immutable MemTable (RAM)
participant Version as Current Version
participant TableCache as TableCache (SSTables)
App->>DBImpl: Get("key")
DBImpl->>MemTable: Get(lkey)?
alt Key Found in MemTable
MemTable-->>DBImpl: Return value / deletion
DBImpl-->>App: Return value / NotFound
else Key Not Found in MemTable
MemTable-->>DBImpl: Not Found
DBImpl->>ImmMemTable: Get(lkey)?
alt Key Found in ImmMemTable
ImmMemTable-->>DBImpl: Return value / deletion
DBImpl-->>App: Return value / NotFound
else Key Not Found in ImmMemTable
ImmMemTable-->>DBImpl: Not Found
DBImpl->>Version: Get(lkey) from SSTables?
Version->>TableCache: Find key in relevant SSTables
TableCache-->>Version: Return value / deletion / NotFound
Version-->>DBImpl: Return value / deletion / NotFound
DBImpl-->>App: Return value / NotFound
end
end
```
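The snapshot chosen in step 3 is something the application can control explicitly. A small usage sketch (again, `db` is an open `leveldb::DB*`):

```c++
// --- Hypothetical usage sketch: reading at a fixed snapshot ---
#include <string>
#include "leveldb/db.h"

void SnapshotRead(leveldb::DB* db) {
  // Pin the current sequence number; later writes won't be visible here.
  const leveldb::Snapshot* snap = db->GetSnapshot();

  leveldb::ReadOptions ropts;
  ropts.snapshot = snap;  // DBImpl::Get will read "as of" this point in time

  std::string value;
  leveldb::Status s = db->Get(ropts, "mykey", &value);

  db->ReleaseSnapshot(snap);  // Let compactions reclaim old versions again
}
```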
## Managing Background Work (Compaction)
`DBImpl` is responsible for kicking off background work. It doesn't *do* the compaction itself (that logic is largely within [Compaction](08_compaction.md) and [VersionSet](06_version___versionset.md)), but it manages the *triggering* and the background thread.
* **When is work needed?** `DBImpl` checks if work is needed in a few places:
* After a MemTable switch (`MakeRoomForWrite` schedules flush of `imm_`).
* After a read operation updates file stats (`Get` might call `MaybeScheduleCompaction`).
* After a background compaction finishes (it checks if *more* compaction is needed).
* When explicitly requested (`CompactRange`).
* **Scheduling:** If work is needed and a background task isn't already running, `DBImpl::MaybeScheduleCompaction` sets a flag (`background_compaction_scheduled_`) and asks the `Env` (Environment object, handles OS interactions) to schedule a function (`DBImpl::BGWork`) to run on a background thread.
* **Performing Work:** The background thread eventually calls `DBImpl::BackgroundCall`, which locks the mutex and calls `DBImpl::BackgroundCompaction`. This method decides *what* work to do:
* If `imm_` exists, it calls `CompactMemTable` (which uses `WriteLevel0Table` -> `BuildTable`) to flush it.
* Otherwise, it asks the `VersionSet` to pick an appropriate SSTable compaction (`versions_->PickCompaction()`).
* It then calls `DoCompactionWork` to perform the actual SSTable compaction (releasing the main lock during the heavy lifting).
* **Signaling:** Once background work finishes, it signals (`background_work_finished_signal_.SignalAll()`) any foreground threads that might be waiting (e.g., a write operation waiting for `imm_` to be flushed).
Here's the simplified scheduling logic:
```c++
// --- Simplified from db/db_impl.cc ---
void DBImpl::MaybeScheduleCompaction() {
mutex_.AssertHeld(); // Must hold lock to check/change state
if (background_compaction_scheduled_) {
// Already scheduled
} else if (shutting_down_.load(std::memory_order_acquire)) {
// DB is closing
} else if (!bg_error_.ok()) {
// Background error stopped activity
} else if (imm_ == nullptr && // No MemTable flush needed AND
manual_compaction_ == nullptr && // No manual request AND
!versions_->NeedsCompaction()) { // VersionSet says no work needed
// No work to be done
} else {
// Work needs to be done! Schedule it.
background_compaction_scheduled_ = true;
env_->Schedule(&DBImpl::BGWork, this); // Ask Env to run BGWork later
}
}
```
**Explanation:** This function checks several conditions under a lock. If there's an immutable MemTable to flush (`imm_ != nullptr`) or the `VersionSet` indicates compaction is needed (`versions_->NeedsCompaction()`) and no background task is already scheduled, it marks one as scheduled and tells the environment (`env_`) to run the `BGWork` function in the background.
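Once the `Env` runs that scheduled task, control arrives in `BGWork` and then `BackgroundCall`. Here's a simplified view (condensed from `db/db_impl.cc`; assertions and error handling are trimmed):

```c++
// --- Simplified from db/db_impl.cc ---
void DBImpl::BGWork(void* db) {
  reinterpret_cast<DBImpl*>(db)->BackgroundCall();
}

void DBImpl::BackgroundCall() {
  MutexLock l(&mutex_);  // Hold the main lock while touching shared state
  if (shutting_down_.load(std::memory_order_acquire)) {
    // DB is closing: do nothing
  } else if (!bg_error_.ok()) {
    // A previous background error stopped all background activity
  } else {
    BackgroundCompaction();  // Flush imm_ or run one SSTable compaction
  }
  background_compaction_scheduled_ = false;
  // The compaction may have created enough files to warrant another round.
  MaybeScheduleCompaction();
  background_work_finished_signal_.SignalAll();  // Wake waiting writers
}
```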
```mermaid
flowchart TD
A["Write/Read/Compact finishes"] --> B{"Need Compaction?"}
B -->|Yes| C{"BG Task Scheduled?"}
B -->|No| Z["Idle"]
C -->|Yes| Z
C -->|No| D["Mark BG Scheduled = true"]
D --> E["Schedule BGWork"]
E --> F["Background Thread Pool"]
F -->|Runs| G["DBImpl::BGWork"]
G --> H["DBImpl::BackgroundCall"]
H --> I{"Compact imm_ OR Pick/Run SSTable Compaction?"}
I --> J["Perform Compaction Work"]
J --> K["Mark BG Scheduled = false"]
K --> L["Signal Waiting Threads"]
L --> B
```
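The "explicitly requested" trigger mentioned above corresponds to the public `CompactRange` call. A small usage sketch (`db` is an open `leveldb::DB*`; passing null bounds means "the whole key space"):

```c++
// --- Hypothetical usage sketch: manually requesting compaction ---
#include "leveldb/db.h"

void CompactEverything(leveldb::DB* db) {
  // nullptr begin/end keys ask DBImpl to compact the entire key range:
  // it flushes the MemTable and then compacts level by level.
  db->CompactRange(nullptr, nullptr);
}

void CompactSubrange(leveldb::DB* db) {
  // Compact only the keys between "a" and "m".
  leveldb::Slice begin("a"), end("m");
  db->CompactRange(&begin, &end);
}
```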
## Recovery on Startup
When you open a database, `DBImpl::Open` orchestrates the recovery process:
1. **Lock:** It locks the database directory (`env_->LockFile`) to prevent other processes from using it.
2. **Recover VersionSet:** It calls `versions_->Recover()`, which reads the `MANIFEST` file to understand the state of SSTables from the last clean run.
3. **Find Logs:** It scans the database directory for any `.log` files (WAL files) that are newer than the ones recorded in the `MANIFEST`. These logs represent writes that might not have been flushed to SSTables before the last shutdown/crash.
4. **Replay Logs:** For each relevant log file found, it calls `DBImpl::RecoverLogFile`.
* Inside `RecoverLogFile`, it creates a `log::Reader`.
* It reads records (which are serialized `WriteBatch`es) from the log file one by one.
* For each record, it applies the operations (`WriteBatchInternal::InsertInto`) to a temporary in-memory `MemTable`.
* This effectively rebuilds the state of the MemTable(s) as they were just before the crash/shutdown.
5. **Finalize State:** Once all logs are replayed, the recovered MemTable becomes the active `mem_`. If the recovery process itself filled the MemTable, `RecoverLogFile` might even flush it to a Level-0 SSTable (`WriteLevel0Table`). `DBImpl` updates the `VersionSet` with the recovered sequence number and potentially writes a new `MANIFEST`.
6. **Ready:** The database is now recovered and ready for new operations.
Here's a conceptual snippet from the recovery logic:
```c++
// --- Conceptual, simplified from DBImpl::RecoverLogFile ---
// Inside loop processing a single log file during recovery:
while (reader.ReadRecord(&record, &scratch) && status.ok()) {
// Check if record looks like a valid WriteBatch
if (record.size() < 12) { /* report corruption */ continue; }
// Parse the raw log record back into a WriteBatch object
WriteBatchInternal::SetContents(&batch, record);
// Create a MemTable if we don't have one yet for this log
if (mem == nullptr) {
mem = new MemTable(internal_comparator_);
mem->Ref();
}
// Apply the operations from the batch TO THE MEMTABLE
status = WriteBatchInternal::InsertInto(&batch, mem);
// ... handle error ...
// Keep track of the latest sequence number seen
const SequenceNumber last_seq = /* ... get sequence from batch ... */;
if (last_seq > *max_sequence) {
*max_sequence = last_seq;
}
// If the MemTable gets full *during recovery*, flush it!
if (mem->ApproximateMemoryUsage() > options_.write_buffer_size) {
status = WriteLevel0Table(mem, edit, nullptr); // Flush to L0 SSTable
mem->Unref();
mem = nullptr; // Will create a new one if needed
// ... handle error ...
}
}
// After loop, handle the final state of 'mem'
```
**Explanation:** This loop reads each record (a `WriteBatch`) from the log file using `reader.ReadRecord`. It then applies the batch's changes directly to an in-memory `MemTable` (`InsertInto(&batch, mem)`), effectively replaying the lost writes. It even handles flushing this MemTable if it fills up during the recovery process.
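A few fields in `leveldb::Options` directly shape this startup path. The sketch below shows the ones most relevant to recovery (values are illustrative; the fields shown are part of the public `Options` struct):

```c++
// --- Hypothetical usage sketch: options that shape open/recovery ---
#include "leveldb/db.h"

leveldb::Status OpenWithRecoveryOptions(leveldb::DB** db) {
  leveldb::Options options;
  options.create_if_missing = true;   // Create the DB if no MANIFEST exists yet
  options.error_if_exists = false;    // Don't fail when the DB already exists
  options.paranoid_checks = true;     // Treat log/SSTable corruption as errors
  options.write_buffer_size = 4 * 1024 * 1024;  // Also the flush threshold
                                                // used while replaying logs
  return leveldb::DB::Open(options, "/tmp/testdb", db);
}
```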
## The DBImpl Class (Code Glimpse)
The definition of `DBImpl` in `db_impl.h` shows the key components it manages:
```c++
// --- Simplified from db/db_impl.h ---
class DBImpl : public DB {
public:
DBImpl(const Options& options, const std::string& dbname);
~DBImpl() override;
// Public API methods (implementing DB interface)
Status Put(...) override;
Status Delete(...) override;
Status Write(...) override;
Status Get(...) override;
Iterator* NewIterator(...) override;
const Snapshot* GetSnapshot() override;
void ReleaseSnapshot(...) override;
// ... other public methods ...
private:
// Friend classes allow access to private members
friend class DB;
struct CompactionState; // Helper struct for compactions
struct Writer; // Helper struct for writer queue
// Core methods for internal operations
Status Recover(VersionEdit* edit, bool* save_manifest);
void CompactMemTable();
Status RecoverLogFile(...);
Status WriteLevel0Table(...);
Status MakeRoomForWrite(...);
void MaybeScheduleCompaction();
static void BGWork(void* db); // Background task entry point
void BackgroundCall();
void BackgroundCompaction();
Status DoCompactionWork(...);
// ... other private helpers ...
// == Key Member Variables ==
Env* const env_; // OS interaction layer
const InternalKeyComparator internal_comparator_; // For sorting keys
const Options options_; // Database configuration options
const std::string dbname_; // Database directory path
TableCache* const table_cache_; // Cache for open SSTable files
FileLock* db_lock_; // Lock file handle for DB directory
port::Mutex mutex_; // Main mutex protecting shared state
std::atomic<bool> shutting_down_; // Flag indicating DB closure
port::CondVar background_work_finished_signal_ GUARDED_BY(mutex_); // For waiting
MemTable* mem_ GUARDED_BY(mutex_); // Active memtable (accepts writes)
MemTable* imm_ GUARDED_BY(mutex_); // Immutable memtable (being flushed)
std::atomic<bool> has_imm_; // Fast check if imm_ is non-null
WritableFile* logfile_; // Current WAL file handle
uint64_t logfile_number_ GUARDED_BY(mutex_); // Current WAL file number
log::Writer* log_; // WAL writer object
VersionSet* const versions_ GUARDED_BY(mutex_); // Manages SSTables/Versions
// Queue of writers waiting for their turn
std::deque<Writer*> writers_ GUARDED_BY(mutex_);
// List of active snapshots
SnapshotList snapshots_ GUARDED_BY(mutex_);
// Files being generated by compactions
std::set<uint64_t> pending_outputs_ GUARDED_BY(mutex_);
// Is a background compaction scheduled/running?
bool background_compaction_scheduled_ GUARDED_BY(mutex_);
// Error status from background threads
Status bg_error_ GUARDED_BY(mutex_);
// Compaction statistics
CompactionStats stats_[config::kNumLevels] GUARDED_BY(mutex_);
};
```
**Explanation:** This header shows `DBImpl` inheriting from the public `DB` interface. It contains references to essential components like the `Env`, `Options`, `TableCache`, `MemTable` (`mem_` and `imm_`), WAL (`log_`, `logfile_`), and `VersionSet`. Crucially, it also has a `mutex_` to protect shared state accessed by multiple threads (foreground application threads and background compaction threads) and condition variables (`background_work_finished_signal_`) to allow threads to wait for background work.
## Conclusion
`DBImpl` is the central nervous system of LevelDB. It doesn't store the data itself, but it acts as the **General Manager**, receiving requests and coordinating the actions of all the other specialized components like the MemTable, WAL, VersionSet, and TableCache. It handles the intricate dance between fast in-memory writes, durable logging, persistent disk storage, background maintenance, and safe recovery. Understanding `DBImpl`'s role is key to seeing how all the pieces of LevelDB fit together to create a functional database.
One tool `DBImpl` uses to make writes efficient and atomic is the `WriteBatch`. Let's see how that works next.
Next up: [Chapter 5: WriteBatch](05_writebatch.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)