diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..db84ce2 --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 Zachary Huang + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/README.md b/README.md index 02af832..ff472c8 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,23 @@ -

Agentic Coding - Project Template

+

Turns Codebase into Easy Tutorial

+ +![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg) + +

+Ever stared at a new codebase written by others feeling completely lost? This tutorial shows you how to build an AI agent that analyzes GitHub repositories and creates beginner-friendly tutorials explaining exactly how the code works. +

- -This is a project template for Agentic Coding with [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework, and Cursor. +This project crawls GitHub repositories and builds a knowledge base from the code: -- We have included the [.cursorrules](.cursorrules) file to let Cursor AI help you build LLM projects. - -- Want to learn how to build LLM projects with Agentic Coding? - - - Check out the [Agentic Coding Guidance](https://the-pocket.github.io/PocketFlow/guide.html) - - - Check out the [YouTube Tutorial](https://www.youtube.com/@ZacharyLLM?sub_confirmation=1) +- **Analyze entire codebases** to identify core abstractions and how they interact +- **Transform complex code** into beginner-friendly tutorials with clear visualizations +- **Build understanding systematically** from fundamentals to advanced concepts in logical steps + +Built with [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework. \ No newline at end of file diff --git a/assets/banner.png b/assets/banner.png index 09f0f04..3baec68 100644 Binary files a/assets/banner.png and b/assets/banner.png differ diff --git a/output/AutoGen Core/01_agent.md b/output/AutoGen Core/01_agent.md new file mode 100644 index 0000000..b4c730f --- /dev/null +++ b/output/AutoGen Core/01_agent.md @@ -0,0 +1,281 @@ +# Chapter 1: Agent - The Workers of AutoGen + +Welcome to the AutoGen Core tutorial! We're excited to guide you through building powerful applications with autonomous agents. + +## Motivation: Why Do We Need Agents? + +Imagine you want to build an automated system to write blog posts. You might need one part of the system to research a topic and another part to write the actual post based on the research. How do you represent these different "workers" and make them talk to each other? + +This is where the concept of an **Agent** comes in. In AutoGen Core, an `Agent` is the fundamental building block representing an actor or worker in your system. 
Think of it like an employee in an office. + +## Key Concepts: Understanding Agents + +Let's break down what makes an Agent: + +1. **It's a Worker:** An Agent is designed to *do* things. This could be running calculations, calling a Large Language Model (LLM) like ChatGPT, using a tool (like a search engine), or managing a piece of data. +2. **It Has an Identity (`AgentId`):** Just like every employee has a name and a job title, every Agent needs a unique identity. This identity, called `AgentId`, has two parts: + * `type`: What kind of role does the agent have? (e.g., "researcher", "writer", "coder"). This helps organize agents. + * `key`: A unique name for this specific agent instance (e.g., "researcher-01", "amy-the-writer"). + + ```python + # From: _agent_id.py + class AgentId: + def __init__(self, type: str, key: str) -> None: + # ... (validation checks omitted for brevity) + self._type = type + self._key = key + + @property + def type(self) -> str: + return self._type + + @property + def key(self) -> str: + return self._key + + def __str__(self) -> str: + # Creates an id like "researcher/amy-the-writer" + return f"{self._type}/{self._key}" + ``` + This `AgentId` acts like the agent's address, allowing other agents (or the system) to send messages specifically to it. + +3. **It Has Metadata (`AgentMetadata`):** Besides its core identity, an agent often has descriptive information. + * `type`: Same as in `AgentId`. + * `key`: Same as in `AgentId`. + * `description`: A human-readable explanation of what the agent does (e.g., "Researches topics using web search"). + + ```python + # From: _agent_metadata.py + from typing import TypedDict + + class AgentMetadata(TypedDict): + type: str + key: str + description: str + ``` + This metadata helps understand the agent's purpose within the system. + +4. **It Communicates via Messages:** Agents don't work in isolation. They collaborate by sending and receiving messages. 
The primary way an agent receives work is through its `on_message` method. Think of this like the agent's inbox. + + ```python + # From: _agent.py (Simplified Agent Protocol) + from typing import Any, Mapping, Protocol + # ... other imports + + class Agent(Protocol): + @property + def id(self) -> AgentId: ... # The agent's unique ID + + async def on_message(self, message: Any, ctx: MessageContext) -> Any: + """Handles an incoming message.""" + # Agent's logic to process the message goes here + ... + ``` + When an agent receives a message, `on_message` is called. The `message` contains the data or task, and `ctx` (MessageContext) provides extra information about the message (like who sent it). We'll cover `MessageContext` more later. + +5. **It Can Remember Things (State):** Sometimes, an agent needs to remember information between tasks, like keeping notes on research progress. Agents can optionally implement `save_state` and `load_state` methods to store and retrieve their internal memory. + + ```python + # From: _agent.py (Simplified Agent Protocol) + class Agent(Protocol): + # ... other methods + + async def save_state(self) -> Mapping[str, Any]: + """Save the agent's internal memory.""" + # Return a dictionary representing the state + ... + + async def load_state(self, state: Mapping[str, Any]) -> None: + """Load the agent's internal memory.""" + # Restore state from the dictionary + ... + ``` + We'll explore state and memory in more detail in [Chapter 7: Memory](07_memory.md). + +6. **Different Agent Types:** AutoGen Core provides base classes to make creating agents easier: + * `BaseAgent`: The fundamental class most agents inherit from. It provides common setup. + * `ClosureAgent`: A very quick way to create simple agents using just a function (like hiring a temp worker for a specific task defined on the spot). 
+ * `RoutedAgent`: An agent that can automatically direct different types of messages to different internal handler methods (like a smart receptionist). + +## Use Case Example: Researcher and Writer + +Let's revisit our blog post example. We want a `Researcher` agent and a `Writer` agent. + +**Goal:** +1. Tell the `Researcher` a topic (e.g., "AutoGen Agents"). +2. The `Researcher` finds some facts (we'll keep it simple and just make them up for now). +3. The `Researcher` sends these facts to the `Writer`. +4. The `Writer` receives the facts and drafts a short post. + +**Simplified Implementation Idea (using `ClosureAgent` for brevity):** + +First, let's define the messages they might exchange: + +```python +from dataclasses import dataclass + +@dataclass +class ResearchTopic: + topic: str + +@dataclass +class ResearchFacts: + topic: str + facts: list[str] + +@dataclass +class DraftPost: + topic: str + draft: str +``` +These are simple Python classes to hold the data being passed around. + +Now, let's imagine defining the `Researcher` using a `ClosureAgent`. This agent will listen for `ResearchTopic` messages. + +```python +# Simplified concept - requires AgentRuntime (Chapter 3) to actually run + +async def researcher_logic(agent_context, message: ResearchTopic, msg_context): + print(f"Researcher received topic: {message.topic}") + # In a real scenario, this would involve searching, calling an LLM, etc. + # For now, we just make up facts. 
+ facts = [f"Fact 1 about {message.topic}", f"Fact 2 about {message.topic}"] + print(f"Researcher found facts: {facts}") + + # Find the Writer agent's ID (we assume we know it) + writer_id = AgentId(type="writer", key="blog_writer_1") + + # Send the facts to the Writer + await agent_context.send_message( + message=ResearchFacts(topic=message.topic, facts=facts), + recipient=writer_id, + ) + print("Researcher sent facts to Writer.") + # This agent doesn't return a direct reply + return None +``` +This `researcher_logic` function defines *what* the researcher does when it gets a `ResearchTopic` message. It processes the topic, creates `ResearchFacts`, and uses `agent_context.send_message` to send them to the `writer` agent. + +Similarly, the `Writer` agent would have its own logic: + +```python +# Simplified concept - requires AgentRuntime (Chapter 3) to actually run + +async def writer_logic(agent_context, message: ResearchFacts, msg_context): + print(f"Writer received facts for topic: {message.topic}") + # In a real scenario, this would involve LLM prompting + draft = f"Blog Post about {message.topic}:\n" + for fact in message.facts: + draft += f"- {fact}\n" + print(f"Writer drafted post:\n{draft}") + + # Perhaps save the draft or send it somewhere else + # For now, we just print it. We don't send another message. + return None # Or maybe return a confirmation/result +``` +This `writer_logic` function defines how the writer reacts to receiving `ResearchFacts`. + +**Important:** To actually *run* these agents and make them communicate, we need the `AgentRuntime` (covered in [Chapter 3: AgentRuntime](03_agentruntime.md)) and the `Messaging System` (covered in [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md)). For now, focus on the *idea* that Agents are distinct workers defined by their logic (`on_message`) and identified by their `AgentId`. 
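
As a rough, self-contained sketch of the idea above, the following toy example wires a researcher and a writer together without AutoGen Core. `ToyRuntime`, `register`, the tuple-based agent ids, and the handler signatures are hypothetical stand-ins invented for illustration; they are not the library's API. The sketch only shows two workers, each addressed by a `(type, key)` pair, passing messages through a manager.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# In this toy, an agent id is modeled as a plain (type, key) tuple.
ToyAgentId = Tuple[str, str]
Handler = Callable[["ToyRuntime", Any], Awaitable[Any]]


@dataclass
class ResearchTopic:
    topic: str


@dataclass
class ResearchFacts:
    topic: str
    facts: List[str]


class ToyRuntime:
    """Hypothetical stand-in for a runtime: maps agent ids to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[ToyAgentId, Handler] = {}
        self.drafts: List[str] = []  # collected output, kept for inspection

    def register(self, agent_id: ToyAgentId, handler: Handler) -> None:
        self._handlers[agent_id] = handler

    async def send_message(self, message: Any, recipient: ToyAgentId) -> Any:
        # Look up the recipient and call its handler (its "inbox").
        return await self._handlers[recipient](self, message)


async def researcher_logic(runtime: ToyRuntime, message: ResearchTopic) -> None:
    # Made-up facts, as in the tutorial text.
    facts = [f"Fact 1 about {message.topic}", f"Fact 2 about {message.topic}"]
    await runtime.send_message(
        ResearchFacts(topic=message.topic, facts=facts),
        recipient=("writer", "blog_writer_1"),
    )


async def writer_logic(runtime: ToyRuntime, message: ResearchFacts) -> None:
    lines = [f"Blog Post about {message.topic}:"]
    lines += [f"- {fact}" for fact in message.facts]
    runtime.drafts.append("\n".join(lines))


async def run_demo() -> List[str]:
    runtime = ToyRuntime()
    runtime.register(("researcher", "researcher-01"), researcher_logic)
    runtime.register(("writer", "blog_writer_1"), writer_logic)
    await runtime.send_message(
        ResearchTopic(topic="AutoGen Agents"),
        recipient=("researcher", "researcher-01"),
    )
    return runtime.drafts


if __name__ == "__main__":
    print(asyncio.run(run_demo())[0])
```

The researcher never touches the writer object directly; it only knows the writer's id and asks the manager to deliver, which is the same separation the real runtime provides.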
+ +## Under the Hood: How an Agent Gets a Message + +While the full message delivery involves the `Messaging System` and `AgentRuntime`, let's look at the agent's role when it receives a message. + +**Conceptual Flow:** + +```mermaid +sequenceDiagram + participant Sender as Sender Agent + participant Runtime as AgentRuntime + participant Recipient as Recipient Agent + + Sender->>+Runtime: send_message(message, recipient_id) + Runtime->>+Recipient: Locate agent by recipient_id + Runtime->>+Recipient: on_message(message, context) + Recipient->>Recipient: Process message using internal logic + alt Response Needed + Recipient->>-Runtime: Return response value + Runtime->>-Sender: Deliver response value + else No Response + Recipient->>-Runtime: Return None (or no return) + end +``` + +1. Some other agent (Sender) or the system decides to send a message to our agent (Recipient). +2. It tells the `AgentRuntime` (the manager): "Deliver this `message` to the agent with `recipient_id`". +3. The `AgentRuntime` finds the correct `Recipient` agent instance. +4. The `AgentRuntime` calls the `Recipient.on_message(message, context)` method. +5. The agent's internal logic inside `on_message` (or methods called by it, like in `RoutedAgent`) runs to process the message. +6. If the message requires a direct response (like an RPC call), the agent returns a value from `on_message`. If not (like a general notification or event), it might return `None`. + +**Code Glimpse:** + +The core definition is the `Agent` Protocol (`_agent.py`). It's like an interface or a contract – any class wanting to be an Agent *must* provide these methods. + +```python +# From: _agent.py - The Agent blueprint (Protocol) + +@runtime_checkable +class Agent(Protocol): + @property + def metadata(self) -> AgentMetadata: ... + + @property + def id(self) -> AgentId: ... + + async def on_message(self, message: Any, ctx: MessageContext) -> Any: ... + + async def save_state(self) -> Mapping[str, Any]: ... 
+ + async def load_state(self, state: Mapping[str, Any]) -> None: ... + + async def close(self) -> None: ... +``` + +Most agents you create will inherit from `BaseAgent` (`_base_agent.py`). It provides some standard setup: + +```python +# From: _base_agent.py (Simplified) +class BaseAgent(ABC, Agent): + def __init__(self, description: str) -> None: + # Gets runtime & id from a special context when created by the runtime + # Raises error if you try to create it directly! + self._runtime: AgentRuntime = AgentInstantiationContext.current_runtime() + self._id: AgentId = AgentInstantiationContext.current_agent_id() + self._description = description + # ... + + # This is the final version called by the runtime + @final + async def on_message(self, message: Any, ctx: MessageContext) -> Any: + # It calls the implementation method you need to write + return await self.on_message_impl(message, ctx) + + # You MUST implement this in your subclass + @abstractmethod + async def on_message_impl(self, message: Any, ctx: MessageContext) -> Any: ... + + # Helper to send messages easily + async def send_message(self, message: Any, recipient: AgentId, ...) -> Any: + # It just asks the runtime to do the actual sending + return await self._runtime.send_message( + message, sender=self.id, recipient=recipient, ... + ) + # ... other methods like publish_message, save_state, load_state +``` +Notice how `BaseAgent` handles getting its `id` and `runtime` during creation and provides a convenient `send_message` method that uses the runtime. When inheriting from `BaseAgent`, you primarily focus on implementing the `on_message_impl` method to define your agent's unique behavior. + +## Next Steps + +You now understand the core concept of an `Agent` in AutoGen Core! It's the fundamental worker unit with an identity, the ability to process messages, and optionally maintain state. 
+ +In the next chapters, we'll explore: + +* [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md): How messages actually travel between agents. +* [Chapter 3: AgentRuntime](03_agentruntime.md): The manager responsible for creating, running, and connecting agents. + +Let's continue building your understanding! + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) diff --git a/output/AutoGen Core/02_messaging_system__topic___subscription_.md b/output/AutoGen Core/02_messaging_system__topic___subscription_.md new file mode 100644 index 0000000..24e8e5d --- /dev/null +++ b/output/AutoGen Core/02_messaging_system__topic___subscription_.md @@ -0,0 +1,267 @@ +# Chapter 2: Messaging System (Topic & Subscription) + +In [Chapter 1: Agent](01_agent.md), we learned about Agents as individual workers. But how do they coordinate when one agent doesn't know exactly *who* needs the information it produces? Imagine our Researcher finds some facts. Maybe the Writer needs them, but maybe a Fact-Checker agent or a Summary agent also needs them later. How can the Researcher just announce "Here are the facts!" without needing a specific mailing list? + +This is where the **Messaging System**, specifically **Topics** and **Subscriptions**, comes in. It allows agents to broadcast messages to anyone interested, like posting on a company announcement board. + +## Motivation: Broadcasting Information + +Let's refine our blog post example: + +1. The `Researcher` agent finds facts about "AutoGen Agents". +2. Instead of sending *directly* to the `Writer`, the `Researcher` **publishes** these facts to a general "research-results" **Topic**. +3. The `Writer` agent has previously told the system it's **subscribed** to the "research-results" Topic. +4. The system sees the new message on the Topic and delivers it to the `Writer` (and any other subscribers). 
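
The four steps above can be sketched with a toy publish/subscribe registry. This is not AutoGen Core code: `ToyTopicId`, `ToyBus`, `subscribe`, and `publish` are hypothetical names chosen for illustration, and matching on the topic `type` alone is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass(frozen=True)
class ToyTopicId:
    type: str    # what kind of event this is
    source: str  # which task or context produced it


TopicHandler = Callable[[ToyTopicId, Any], None]


class ToyBus:
    """Hypothetical message bus: delivers by topic type to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[TopicHandler]] = defaultdict(list)

    def subscribe(self, topic_type: str, handler: TopicHandler) -> None:
        # Step 3: a listener declares interest ahead of time.
        self._subscribers[topic_type].append(handler)

    def publish(self, message: Any, topic_id: ToyTopicId) -> int:
        # Steps 2 and 4: the publisher names only the topic; the bus
        # looks up the subscribers and delivers to each of them.
        handlers = self._subscribers.get(topic_id.type, [])
        for handler in handlers:
            handler(topic_id, message)
        return len(handlers)  # number of deliveries made


bus = ToyBus()
writer_inbox: List[Any] = []
checker_inbox: List[Any] = []
bus.subscribe("research-results", lambda topic, msg: writer_inbox.append(msg))
bus.subscribe("research-results", lambda topic, msg: checker_inbox.append(msg))

delivered = bus.publish(
    ["Fact A about AutoGen Agents", "Fact B about AutoGen Agents"],
    ToyTopicId(type="research-results", source="autogen-agents-blog-post"),
)
print(delivered, len(writer_inbox), len(checker_inbox))
```

Adding the second subscriber required no change to the publishing call, which is the decoupling this chapter is about.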
+ +This way, the `Researcher` doesn't need to know who the `Writer` is, or even if a `Writer` exists! It just broadcasts the results. If we later add a `FactChecker` agent that also needs the results, it simply subscribes to the same Topic. + +## Key Concepts: Topics and Subscriptions + +Let's break down the components of this broadcasting system: + +1. **Topic (`TopicId`): The Announcement Board** + * A `TopicId` represents a specific channel or category for messages. Think of it like the name of an announcement board (e.g., "Project Updates", "General Announcements"). + * It has two main parts: + * `type`: What *kind* of event or information is this? (e.g., "research.completed", "user.request"). This helps categorize messages. + * `source`: *Where* or *why* did this event originate? Often, this relates to the specific task or context (e.g., the specific blog post being researched like "autogen-agents-blog-post", or the team generating the event like "research-team"). + + ```python + # From: _topic.py (Simplified) + from dataclasses import dataclass + + @dataclass(frozen=True) # Immutable: can't change after creation + class TopicId: + type: str + source: str + + def __str__(self) -> str: + # Creates an id like "research.completed/autogen-agents-blog-post" + return f"{self.type}/{self.source}" + ``` + This structure allows for flexible filtering. Agents might subscribe to all topics of a certain `type`, regardless of the `source`, or only to topics with a specific `source`. + +2. **Publishing: Posting the Announcement** + * When an agent has information to share broadly, it *publishes* a message to a specific `TopicId`. + * This is like pinning a note to the designated announcement board. The agent doesn't need to know who will read it. + +3. **Subscription (`Subscription`): Signing Up for Updates** + * A `Subscription` is how an agent declares its interest in certain `TopicId`s. 
+ * It acts like a rule: "If a message is published to a Topic that matches *this pattern*, please deliver it to *this kind of agent*". + * The `Subscription` links a `TopicId` pattern (e.g., "all topics with type `research.completed`") to an `AgentId` (or a way to determine the `AgentId`). + +4. **Routing: Delivering the Mail** + * The `AgentRuntime` (the system manager we'll meet in [Chapter 3: AgentRuntime](03_agentruntime.md)) keeps track of all active `Subscription`s. + * When a message is published to a `TopicId`, the `AgentRuntime` checks which `Subscription`s match that `TopicId`. + * For each match, it uses the `Subscription`'s rule to figure out which specific `AgentId` should receive the message and delivers it. + +## Use Case Example: Researcher Publishes, Writer Subscribes + +Let's see how our Researcher and Writer can use this system. + +**Goal:** Researcher publishes facts to a topic, Writer receives them via subscription. + +**1. Define the Topic:** +We need a `TopicId` for research results. Let's say the `type` is "research.facts.available" and the `source` identifies the specific research task (e.g., "blog-post-autogen"). + +```python +# From: _topic.py +from autogen_core import TopicId + +# Define the topic for this specific research task +research_topic_id = TopicId(type="research.facts.available", source="blog-post-autogen") + +print(f"Topic ID: {research_topic_id}") +# Output: Topic ID: research.facts.available/blog-post-autogen +``` +This defines the "announcement board" we'll use. + +**2. Researcher Publishes:** +The `Researcher` agent, after finding facts, will use its `agent_context` (provided by the runtime) to publish the `ResearchFacts` message to this topic. 
+ +```python +# Simplified concept - Researcher agent logic +# Assume 'agent_context' and 'message' (ResearchTopic) are provided + +# Define the facts message (from Chapter 1) +@dataclass +class ResearchFacts: + topic: str + facts: list[str] + +async def researcher_publish_logic(agent_context, message: ResearchTopic, msg_context): + print(f"Researcher working on: {message.topic}") + facts_data = ResearchFacts( + topic=message.topic, + facts=[f"Fact A about {message.topic}", f"Fact B about {message.topic}"] + ) + + # Define the specific topic for this task's results + results_topic = TopicId(type="research.facts.available", source=message.topic) # Use message topic as source + + # Publish the facts to the topic + await agent_context.publish_message(message=facts_data, topic_id=results_topic) + print(f"Researcher published facts to topic: {results_topic}") + # No direct reply needed + return None +``` +Notice the `agent_context.publish_message` call. The Researcher doesn't specify a recipient, only the topic. + +**3. Writer Subscribes:** +The `Writer` agent needs to tell the system it's interested in messages on topics like "research.facts.available". We can use a predefined `Subscription` type called `TypeSubscription`. This subscription typically means: "I am interested in all topics with this *exact type*. When a message arrives, create/use an agent of *my type* whose `key` matches the topic's `source`." + +```python +# From: _type_subscription.py (Simplified Concept) +from autogen_core import TypeSubscription, BaseAgent + +class WriterAgent(BaseAgent): + # ... agent implementation ... + async def on_message_impl(self, message: ResearchFacts, ctx): + # This method gets called when a subscribed message arrives + print(f"Writer ({self.id}) received facts via subscription: {message.facts}") + # ... process facts and write draft ... 
+ +# How the Writer subscribes (usually done during runtime setup - Chapter 3) +# This tells the runtime: "Messages on topics with type 'research.facts.available' +# should go to a 'writer' agent whose key matches the topic source." +writer_subscription = TypeSubscription( + topic_type="research.facts.available", + agent_type="writer" # The type of agent that should handle this +) + +print(f"Writer subscription created for topic type: {writer_subscription.topic_type}") +# Output: Writer subscription created for topic type: research.facts.available +``` +When the `Researcher` publishes to `TopicId(type="research.facts.available", source="blog-post-autogen")`, the `AgentRuntime` will see that `writer_subscription` matches the `topic_type`. It will then use the rule: "Find (or create) an agent with `AgentId(type='writer', key='blog-post-autogen')` and deliver the message." + +**Benefit:** Decoupling! The Researcher just broadcasts. The Writer just listens for relevant broadcasts. We can add more listeners (like a `FactChecker` subscribing to the same `topic_type`) without changing the `Researcher` at all. + +## Under the Hood: How Publishing Works + +Let's trace the journey of a published message. 
+ +**Conceptual Flow:** + +```mermaid +sequenceDiagram + participant Publisher as Publisher Agent + participant Runtime as AgentRuntime + participant SubRegistry as Subscription Registry + participant Subscriber as Subscriber Agent + + Publisher->>+Runtime: publish_message(message, topic_id) + Runtime->>+SubRegistry: Find subscriptions matching topic_id + SubRegistry-->>-Runtime: Return list of matching Subscriptions + loop For each matching Subscription + Runtime->>Subscription: map_to_agent(topic_id) + Subscription-->>Runtime: Return target AgentId + Runtime->>+Subscriber: Locate/Create Agent instance by AgentId + Runtime->>Subscriber: on_message(message, context) + Subscriber-->>-Runtime: Process message (optional return) + end + Runtime-->>-Publisher: Return (usually None for publish) +``` + +1. **Publish:** An agent calls `agent_context.publish_message(message, topic_id)`. This internally calls the `AgentRuntime`'s publish method. +2. **Lookup:** The `AgentRuntime` takes the `topic_id` and consults its internal `Subscription Registry`. +3. **Match:** The Registry checks all registered `Subscription` objects. Each `Subscription` has an `is_match(topic_id)` method. The registry finds all subscriptions where `is_match` returns `True`. +4. **Map:** For each matching `Subscription`, the Runtime calls its `map_to_agent(topic_id)` method. This method returns the specific `AgentId` that should handle this message based on the subscription rule and the topic details. +5. **Deliver:** The `AgentRuntime` finds the agent instance corresponding to the returned `AgentId` (potentially creating it if it doesn't exist yet, especially with `TypeSubscription`). It then calls that agent's `on_message` method, delivering the original published `message`. + +**Code Glimpse:** + +* **`TopicId` (`_topic.py`):** As shown before, a simple dataclass holding `type` and `source`. It includes validation to ensure the `type` follows certain naming conventions. 
+ + ```python + # From: _topic.py + @dataclass(eq=True, frozen=True) + class TopicId: + type: str + source: str + # ... validation and __str__ ... + + @classmethod + def from_str(cls, topic_id: str) -> Self: + # Helper to parse "type/source" string + # ... implementation ... + ``` + +* **`Subscription` Protocol (`_subscription.py`):** This defines the *contract* for any subscription rule. + + ```python + # From: _subscription.py (Simplified Protocol) + from typing import Protocol + # ... other imports + + class Subscription(Protocol): + @property + def id(self) -> str: ... # Unique ID for this subscription instance + + def is_match(self, topic_id: TopicId) -> bool: + """Check if a topic matches this subscription's rule.""" + ... + + def map_to_agent(self, topic_id: TopicId) -> AgentId: + """Determine the target AgentId if is_match was True.""" + ... + ``` + Any class implementing these methods can act as a subscription rule. + +* **`TypeSubscription` (`_type_subscription.py`):** A common implementation of the `Subscription` protocol. + + ```python + # From: _type_subscription.py (Simplified) + class TypeSubscription(Subscription): + def __init__(self, topic_type: str, agent_type: str, ...): + self._topic_type = topic_type + self._agent_type = agent_type + # ... generates a unique self._id ... + + def is_match(self, topic_id: TopicId) -> bool: + # Matches if the topic's type is exactly the one we want + return topic_id.type == self._topic_type + + def map_to_agent(self, topic_id: TopicId) -> AgentId: + # Maps to an agent of the specified type, using the + # topic's source as the agent's unique key. + if not self.is_match(topic_id): + raise CantHandleException(...) # Should not happen if used correctly + return AgentId(type=self._agent_type, key=topic_id.source) + # ... id property ... + ``` + This implementation provides the "one agent instance per source" behavior for a specific topic type. 
+ +* **`DefaultSubscription` (`_default_subscription.py`):** This is often used via a decorator (`@default_subscription`) and provides a convenient way to create a `TypeSubscription` where the `agent_type` is automatically inferred from the agent class being defined, and the `topic_type` defaults to "default" (but can be overridden). It simplifies common use cases. + + ```python + # From: _default_subscription.py (Conceptual Usage) + from autogen_core import BaseAgent, default_subscription, ResearchFacts + + @default_subscription # Uses 'default' topic type, infers agent type 'writer' + class WriterAgent(BaseAgent): + # Agent logic here... + async def on_message_impl(self, message: ResearchFacts, ctx): ... + + # Or specify the topic type + @default_subscription(topic_type="research.facts.available") + class SpecificWriterAgent(BaseAgent): + # Agent logic here... + async def on_message_impl(self, message: ResearchFacts, ctx): ... + ``` + +The actual sending (`publish_message`) and routing logic reside within the `AgentRuntime`, which we'll explore next. + +## Next Steps + +You've learned how AutoGen Core uses a publish/subscribe system (`TopicId`, `Subscription`) to allow agents to communicate without direct coupling. This is crucial for building flexible and scalable multi-agent applications. + +* **Topic (`TopicId`):** Named channels (`type`/`source`) for broadcasting messages. +* **Publish:** Sending a message to a Topic. +* **Subscription:** An agent's declared interest in messages on certain Topics, defining a routing rule. + +Now, let's dive into the orchestrator that manages agents and makes this messaging system work: + +* [Chapter 3: AgentRuntime](03_agentruntime.md): The manager responsible for creating, running, and connecting agents, including handling message publishing and subscription routing. 
+ +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/AutoGen Core/03_agentruntime.md b/output/AutoGen Core/03_agentruntime.md new file mode 100644 index 0000000..09aec0e --- /dev/null +++ b/output/AutoGen Core/03_agentruntime.md @@ -0,0 +1,349 @@ +# Chapter 3: AgentRuntime - The Office Manager + +In [Chapter 1: Agent](01_agent.md), we met the workers (`Agent`) of our system. In [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md), we saw how they can communicate broadly using topics and subscriptions. But who hires these agents? Who actually delivers the messages, whether direct or published? And who keeps the whole system running smoothly? + +This is where the **`AgentRuntime`** comes in. It's the central nervous system, the operating system, or perhaps the most fitting analogy: **the office manager** for all your agents. + +## Motivation: Why Do We Need an Office Manager? + +Imagine an office full of employees (Agents). You have researchers, writers, maybe coders. +* How does a new employee get hired and set up? +* When one employee wants to send a memo directly to another, who makes sure it gets to the right desk? +* When someone posts an announcement on the company bulletin board (publishes to a topic), who ensures everyone who signed up for that type of announcement sees it? +* Who starts the workday and ensures everything keeps running? + +Without an office manager, it would be chaos! The `AgentRuntime` serves this crucial role in AutoGen Core. It handles: + +1. **Agent Creation:** "Onboarding" new agents when they are needed. +2. **Message Routing:** Delivering direct messages (`send_message`) and published messages (`publish_message`). +3. **Lifecycle Management:** Starting, running, and stopping the whole system. +4. **State Management:** Keeping track of the overall system state (optional). 
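
A toy model of the first two responsibilities in the list above (agent creation and message routing) might look like the following. It is a sketch, not the real `AgentRuntime`: `ToyOfficeManager`, its method signatures, and `EchoWorker` are hypothetical, and delivery is synchronous here purely to keep the example short.

```python
from typing import Any, Callable, Dict, Tuple


class EchoWorker:
    """Hypothetical agent: counts the messages it has handled."""

    def __init__(self, agent_type: str, key: str) -> None:
        self.id = f"{agent_type}/{key}"
        self.handled = 0

    def on_message(self, message: Any) -> str:
        self.handled += 1
        return f"{self.id} handled {message!r}"


Factory = Callable[[str, str], EchoWorker]


class ToyOfficeManager:
    """Hypothetical manager: stores factories and creates agents on demand."""

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {}
        self._agents: Dict[Tuple[str, str], EchoWorker] = {}

    def register_factory(self, agent_type: str, factory: Factory) -> None:
        # "Here is the recipe for hiring an agent of this type."
        self._factories[agent_type] = factory

    def send_message(self, message: Any, agent_type: str, key: str) -> str:
        agent_id = (agent_type, key)
        if agent_id not in self._agents:
            # First delivery to this id: create the agent from its factory.
            self._agents[agent_id] = self._factories[agent_type](agent_type, key)
        return self._agents[agent_id].on_message(message)

    def agent_count(self) -> int:
        return len(self._agents)


manager = ToyOfficeManager()
manager.register_factory("researcher", EchoWorker)
first = manager.send_message("topic A", "researcher", "researcher-01")
second = manager.send_message("topic B", "researcher", "researcher-01")
print(first)
print(manager.agent_count())
```

Both messages reach the same instance because the manager caches agents by id; a message addressed to a new key would trigger the factory again.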
+ +## Key Concepts: Understanding the Manager's Job + +Let's break down the main responsibilities of the `AgentRuntime`: + +1. **Agent Instantiation (Hiring):** + * You don't usually create agent objects directly (like `my_agent = ResearcherAgent()`). Why? Because the agent needs to know *about* the runtime (the office it works in) to send messages, publish announcements, etc. + * Instead, you tell the `AgentRuntime`: "I need an agent of type 'researcher'. Here's a recipe (a **factory function**) for how to create one." This is done using `runtime.register_factory(...)`. + * When a message needs to go to a 'researcher' agent with a specific key (e.g., 'researcher-01'), the runtime checks if it already exists. If not, it uses the registered factory function to create (instantiate) the agent. + * **Crucially**, while creating the agent, the runtime provides special context (`AgentInstantiationContext`) so the new agent automatically gets its unique `AgentId` and a reference to the `AgentRuntime` itself. This is like giving a new employee their ID badge and telling them who the office manager is. + + ```python + # Simplified Concept - How a BaseAgent gets its ID and runtime access + # From: _agent_instantiation.py and _base_agent.py + + # Inside the agent's __init__ method (when inheriting from BaseAgent): + class MyAgent(BaseAgent): + def __init__(self, description: str): + # This magic happens *because* the AgentRuntime is creating the agent + # inside a special context. + self._runtime = AgentInstantiationContext.current_runtime() # Gets the manager + self._id = AgentInstantiationContext.current_agent_id() # Gets its own ID + self._description = description + # ... rest of initialization ... + ``` + This ensures agents are properly integrated into the system from the moment they are created. + +2. 
**Message Delivery (Mail Room):** + * **Direct Send (`send_message`):** When an agent calls `await agent_context.send_message(message, recipient_id)`, it's actually telling the `AgentRuntime`, "Please deliver this `message` directly to the agent identified by `recipient_id`." The runtime finds the recipient agent (creating it if necessary) and calls its `on_message` method. It's like putting a specific name on an envelope and handing it to the mail room. + * **Publish (`publish_message`):** When an agent calls `await agent_context.publish_message(message, topic_id)`, it tells the runtime, "Post this `message` to the announcement board named `topic_id`." The runtime then checks its list of **subscriptions** (who signed up for which boards). For every matching subscription, it figures out the correct recipient agent(s) (based on the subscription rule) and delivers the message to their `on_message` method. + +3. **Lifecycle Management (Opening/Closing the Office):** + * The runtime needs to be started to begin processing messages. Typically, you call `runtime.start()`. This usually kicks off a background process or loop that watches for incoming messages. + * When work is done, you need to stop the runtime gracefully. `runtime.stop_when_idle()` is common: it waits until all messages currently in the queue have been processed, then stops. `runtime.stop()` stops more abruptly. + +4. **State Management (Office Records):** + * The runtime can save the state of *all* the agents it manages (`runtime.save_state()`) and load it back later (`runtime.load_state()`). This is useful for pausing and resuming complex multi-agent interactions. It can also save/load state for individual agents (`runtime.agent_save_state()` / `runtime.agent_load_state()`). We'll touch more on state in [Chapter 7: Memory](07_memory.md). + +## Use Case Example: Running Our Researcher and Writer + +Let's finally run the Researcher/Writer scenario from Chapters 1 and 2. 
We need the `AgentRuntime` to make it happen. + +**Goal:** +1. Create a runtime. +2. Register factories for a 'researcher' and a 'writer' agent. +3. Tell the runtime that 'writer' agents are interested in "research.facts.available" topics (add subscription). +4. Start the runtime. +5. Send an initial `ResearchTopic` message to a 'researcher' agent. +6. Let the system run (Researcher publishes facts, Runtime delivers to Writer via subscription, Writer processes). +7. Stop the runtime when idle. + +**Code Snippets (Simplified):** + +```python +# 0. Imports and Message Definitions (from previous chapters) +import asyncio +from dataclasses import dataclass +from autogen_core import ( + AgentId, BaseAgent, SingleThreadedAgentRuntime, TopicId, + MessageContext, TypeSubscription, AgentInstantiationContext +) + +@dataclass +class ResearchTopic: topic: str +@dataclass +class ResearchFacts: topic: str; facts: list[str] +``` +These are the messages our agents will exchange. + +```python +# 1. Define Agent Logic (using BaseAgent) + +class ResearcherAgent(BaseAgent): + async def on_message_impl(self, message: ResearchTopic, ctx: MessageContext): + print(f"Researcher ({self.id}) got topic: {message.topic}") + facts = [f"Fact 1 about {message.topic}", f"Fact 2"] + results_topic = TopicId("research.facts.available", message.topic) + # Use the runtime (via self.publish_message helper) to publish + await self.publish_message( + ResearchFacts(topic=message.topic, facts=facts), results_topic + ) + print(f"Researcher ({self.id}) published facts to {results_topic}") + +class WriterAgent(BaseAgent): + async def on_message_impl(self, message: ResearchFacts, ctx: MessageContext): + print(f"Writer ({self.id}) received facts via topic '{ctx.topic_id}': {message.facts}") + draft = f"Draft for {message.topic}: {'; '.join(message.facts)}" + print(f"Writer ({self.id}) created draft: '{draft}'") + # This agent doesn't send further messages in this example +``` +Here we define the behavior of our 
two agent types, inheriting from `BaseAgent` which gives us `self.id`, `self.publish_message`, etc. + +```python +# 2. Define Agent Factories + +def researcher_factory(): + # Gets runtime/id via AgentInstantiationContext inside BaseAgent.__init__ + print("Runtime is creating a ResearcherAgent...") + return ResearcherAgent(description="I research topics.") + +def writer_factory(): + print("Runtime is creating a WriterAgent...") + return WriterAgent(description="I write drafts from facts.") +``` +These simple functions tell the runtime *how* to create instances of our agents when needed. + +```python +# 3. Setup and Run the Runtime + +async def main(): + # Create the runtime (the office manager) + runtime = SingleThreadedAgentRuntime() + + # Register the factories (tell the manager how to hire) + await runtime.register_factory("researcher", researcher_factory) + await runtime.register_factory("writer", writer_factory) + print("Registered agent factories.") + + # Add the subscription (tell manager who listens to which announcements) + # Rule: Messages to topics of type "research.facts.available" + # should go to a "writer" agent whose key matches the topic source. 
+ writer_sub = TypeSubscription(topic_type="research.facts.available", agent_type="writer") + await runtime.add_subscription(writer_sub) + print(f"Added subscription: {writer_sub.id}") + + # Start the runtime (open the office) + runtime.start() + print("Runtime started.") + + # Send the initial message to kick things off + research_task_topic = "AutoGen Agents" + researcher_instance_id = AgentId(type="researcher", key=research_task_topic) + print(f"Sending initial topic '{research_task_topic}' to {researcher_instance_id}") + await runtime.send_message( + message=ResearchTopic(topic=research_task_topic), + recipient=researcher_instance_id, + ) + + # Wait until all messages are processed (wait for work day to end) + print("Waiting for runtime to become idle...") + await runtime.stop_when_idle() + print("Runtime stopped.") + +# Run the main function +asyncio.run(main()) +``` +This script sets up the `SingleThreadedAgentRuntime`, registers the blueprints (factories) and communication rules (subscription), starts the process, and then shuts down cleanly. + +**Expected Output (Conceptual Order):** + +``` +Registered agent factories. +Added subscription: type=research.facts.available=>agent=writer +Runtime started. +Sending initial topic 'AutoGen Agents' to researcher/AutoGen Agents +Waiting for runtime to become idle... +Runtime is creating a ResearcherAgent... # First time researcher/AutoGen Agents is needed +Researcher (researcher/AutoGen Agents) got topic: AutoGen Agents +Researcher (researcher/AutoGen Agents) published facts to research.facts.available/AutoGen Agents +Runtime is creating a WriterAgent... # First time writer/AutoGen Agents is needed (due to subscription) +Writer (writer/AutoGen Agents) received facts via topic 'research.facts.available/AutoGen Agents': ['Fact 1 about AutoGen Agents', 'Fact 2'] +Writer (writer/AutoGen Agents) created draft: 'Draft for AutoGen Agents: Fact 1 about AutoGen Agents; Fact 2' +Runtime stopped. 
+``` +You can see the runtime orchestrating the creation of agents and the flow of messages based on the initial request and the subscription rule. + +## Under the Hood: How the Manager Works + +Let's peek inside the `SingleThreadedAgentRuntime` (a common implementation provided by AutoGen Core) to understand the flow. + +**Core Idea:** It uses an internal queue (`_message_queue`) to hold incoming requests (`send_message`, `publish_message`). A background task continuously takes items from the queue and processes them one by one (though the *handling* of a message might involve `await` and allow other tasks to run). + +**1. Agent Creation (`_get_agent`, `_invoke_agent_factory`)** + +When the runtime needs an agent instance (e.g., to deliver a message) that hasn't been created yet: + +```mermaid +sequenceDiagram + participant Runtime as AgentRuntime + participant Factory as Agent Factory Func + participant AgentCtx as AgentInstantiationContext + participant Agent as New Agent Instance + + Runtime->>Runtime: Check if agent instance exists (e.g., in `_instantiated_agents` dict) + alt Agent Not Found + Runtime->>Runtime: Find registered factory for agent type + Runtime->>AgentCtx: Set current runtime & agent_id + activate AgentCtx + Runtime->>Factory: Call factory function() + activate Factory + Factory->>AgentCtx: (Inside Agent.__init__) Get current runtime + AgentCtx-->>Factory: Return runtime + Factory->>AgentCtx: (Inside Agent.__init__) Get current agent_id + AgentCtx-->>Factory: Return agent_id + Factory-->>Runtime: Return new Agent instance + deactivate Factory + Runtime->>AgentCtx: Clear context + deactivate AgentCtx + Runtime->>Runtime: Store new agent instance + end + Runtime->>Runtime: Return agent instance +``` + +* The runtime looks up the factory function registered for the required `AgentId.type`. +* It uses `AgentInstantiationContext.populate_context` to temporarily store its own reference and the target `AgentId`. +* It calls the factory function. 
+* Inside the agent's `__init__` (usually via `BaseAgent`), `AgentInstantiationContext.current_runtime()` and `AgentInstantiationContext.current_agent_id()` are called to retrieve the context set by the runtime. +* The factory returns the fully initialized agent instance. +* The runtime stores this instance for future use. + +```python +# From: _agent_instantiation.py (Simplified) +class AgentInstantiationContext: + _CONTEXT_VAR = ContextVar("agent_context") # Stores (runtime, agent_id) + + @classmethod + @contextmanager + def populate_context(cls, ctx: tuple[AgentRuntime, AgentId]): + token = cls._CONTEXT_VAR.set(ctx) # Store context for this block + try: + yield # Code inside the 'with' block runs here + finally: + cls._CONTEXT_VAR.reset(token) # Clean up context + + @classmethod + def current_runtime(cls) -> AgentRuntime: + return cls._CONTEXT_VAR.get()[0] # Retrieve runtime from context + + @classmethod + def current_agent_id(cls) -> AgentId: + return cls._CONTEXT_VAR.get()[1] # Retrieve agent_id from context +``` +This context manager pattern ensures the correct runtime and ID are available *only* during the agent's creation by the runtime. + +**2. Direct Messaging (`send_message` -> `_process_send`)** + +```mermaid +sequenceDiagram + participant Sender as Sending Agent/Code + participant Runtime as AgentRuntime + participant Queue as Internal Queue + participant Recipient as Recipient Agent + + Sender->>+Runtime: send_message(msg, recipient_id, ...) + Runtime->>Runtime: Create Future (for response) + Runtime->>+Queue: Put SendMessageEnvelope(msg, recipient_id, future) + Runtime-->>-Sender: Return awaitable Future + Note over Queue, Runtime: Background task picks up envelope + Runtime->>Runtime: _process_send(envelope) + Runtime->>+Recipient: _get_agent(recipient_id) (creates if needed) + Recipient-->>-Runtime: Return Agent instance + Runtime->>+Recipient: on_message(msg, context) + Recipient->>Recipient: Process message... 
+ Recipient-->>-Runtime: Return response value + Runtime->>Runtime: Set Future result with response value +``` + +* `send_message` creates a `Future` object (a placeholder for the eventual result) and wraps the message details in a `SendMessageEnvelope`. +* This envelope is put onto the internal `_message_queue`. +* The background task picks up the envelope. +* `_process_send` gets the recipient agent instance (using `_get_agent`). +* It calls the recipient's `on_message` method. +* When `on_message` returns a result, `_process_send` sets the result on the `Future` object, which makes the original `await runtime.send_message(...)` call return the value. + +**3. Publish/Subscribe (`publish_message` -> `_process_publish`)** + +```mermaid +sequenceDiagram + participant Publisher as Publishing Agent/Code + participant Runtime as AgentRuntime + participant Queue as Internal Queue + participant SubManager as SubscriptionManager + participant Subscriber as Subscribed Agent + + Publisher->>+Runtime: publish_message(msg, topic_id, ...) + Runtime->>+Queue: Put PublishMessageEnvelope(msg, topic_id) + Runtime-->>-Publisher: Return (None for publish) + Note over Queue, Runtime: Background task picks up envelope + Runtime->>Runtime: _process_publish(envelope) + Runtime->>+SubManager: get_subscribed_recipients(topic_id) + SubManager->>SubManager: Find matching subscriptions + SubManager->>SubManager: Map subscriptions to AgentIds + SubManager-->>-Runtime: Return list of recipient AgentIds + loop For each recipient AgentId + Runtime->>+Subscriber: _get_agent(recipient_id) (creates if needed) + Subscriber-->>-Runtime: Return Agent instance + Runtime->>+Subscriber: on_message(msg, context with topic_id) + Subscriber->>Subscriber: Process message... + Subscriber-->>-Runtime: Return (usually None for publish) + end +``` + +* `publish_message` wraps the message in a `PublishMessageEnvelope` and puts it on the queue. +* The background task picks it up. 
+* `_process_publish` asks the `SubscriptionManager` (`_subscription_manager`) for all `AgentId`s that are subscribed to the given `topic_id`. +* The `SubscriptionManager` checks its registered `Subscription` objects (`_subscriptions` list, added via `add_subscription`). For each `Subscription` where `is_match(topic_id)` is true, it calls `map_to_agent(topic_id)` to get the target `AgentId`. +* For each resulting `AgentId`, the runtime gets the agent instance and calls its `on_message` method, providing the `topic_id` in the `MessageContext`. + +```python +# From: _runtime_impl_helpers.py (SubscriptionManager simplified) +class SubscriptionManager: + def __init__(self): + self._subscriptions: List[Subscription] = [] + # Optimization cache can be added here + + async def add_subscription(self, subscription: Subscription): + self._subscriptions.append(subscription) + # Clear cache if any + + async def get_subscribed_recipients(self, topic: TopicId) -> List[AgentId]: + recipients = [] + for sub in self._subscriptions: + if sub.is_match(topic): + recipients.append(sub.map_to_agent(topic)) + return recipients +``` +The `SubscriptionManager` simply iterates through registered subscriptions to find matches when a message is published. + +## Next Steps + +You now understand the `AgentRuntime` - the essential coordinator that brings Agents to life, manages their communication, and runs the entire show. It handles agent creation via factories, routes direct and published messages, and manages the system's lifecycle. + +With the core concepts of `Agent`, `Messaging`, and `AgentRuntime` covered, we can start looking at more specialized building blocks. Next, we'll explore how agents can use external capabilities: + +* [Chapter 4: Tool](04_tool.md): How to give agents tools (like functions or APIs) to perform specific actions beyond just processing messages. 
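Before moving on, the queue-and-dispatch idea from "Under the Hood" can be condensed into a toy model. The sketch below uses only plain `asyncio` and is not the `autogen_core` implementation: `ToyRuntime`, `run_until_idle`, and the tuple-based subscriptions are hypothetical names chosen for illustration. It shows the three behaviors described in this chapter: published messages are queued, agents are created lazily from a factory the first time they are addressed, and a subscription maps a topic type to an agent type.

```python
import asyncio


class ToyRuntime:
    """Toy model of a queue-driven runtime (an illustration, not autogen_core)."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self._factories = {}      # agent type -> factory function
        self._agents = {}         # (agent type, key) -> agent handler
        self._subscriptions = []  # (topic type, agent type) pairs
        self.log = []             # records what happened, in order

    def register_factory(self, agent_type, factory):
        self._factories[agent_type] = factory

    def add_subscription(self, topic_type, agent_type):
        self._subscriptions.append((topic_type, agent_type))

    def _get_agent(self, agent_type, key):
        # Create the agent lazily the first time it is addressed.
        agent_id = (agent_type, key)
        if agent_id not in self._agents:
            self.log.append(f"created {agent_type}/{key}")
            self._agents[agent_id] = self._factories[agent_type]()
        return self._agents[agent_id]

    async def publish(self, message, topic_type, source):
        # Publishing only enqueues; delivery happens in run_until_idle.
        await self._queue.put((message, topic_type, source))

    async def run_until_idle(self):
        # Take envelopes off the queue one at a time until it is empty.
        while not self._queue.empty():
            message, topic_type, source = await self._queue.get()
            for sub_topic, agent_type in self._subscriptions:
                if sub_topic == topic_type:
                    # The agent key mirrors the topic source.
                    handler = self._get_agent(agent_type, source)
                    self.log.append(await handler(message))


def writer_factory():
    # A toy "agent" is just an async function that handles one message.
    async def handle(message):
        return f"writer drafted: {message}"
    return handle


async def demo():
    runtime = ToyRuntime()
    runtime.register_factory("writer", writer_factory)
    runtime.add_subscription("research.facts.available", "writer")
    await runtime.publish("Fact 1", "research.facts.available", "AutoGen Agents")
    await runtime.publish("Fact 2", "research.facts.available", "AutoGen Agents")
    await runtime.run_until_idle()
    return runtime.log


log = asyncio.run(demo())
for entry in log:
    print(entry)
```

Note that the writer is created only once even though two messages arrive: the second delivery reuses the stored instance, which is the same caching behavior the chapter describes for `_get_agent`.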
+ +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) diff --git a/output/AutoGen Core/04_tool.md b/output/AutoGen Core/04_tool.md new file mode 100644 index 0000000..e526be1 --- /dev/null +++ b/output/AutoGen Core/04_tool.md @@ -0,0 +1,272 @@ +# Chapter 4: Tool - Giving Agents Specific Capabilities + +In the previous chapters, we learned about Agents as workers ([Chapter 1](01_agent.md)), how they can communicate directly or using announcements ([Chapter 2](02_messaging_system__topic___subscription_.md)), and the `AgentRuntime` that manages them ([Chapter 3](03_agentruntime.md)). + +Agents can process messages and coordinate, but what if an agent needs to perform a very specific action, like looking up information online, running a piece of code, accessing a database, or even just finding out the current date? They need specialized *capabilities*. + +This is where the concept of a **Tool** comes in. + +## Motivation: Agents Need Skills! + +Imagine our `Writer` agent from before. It receives facts and writes a draft. Now, let's say we want the `Writer` (or perhaps a smarter `Assistant` agent helping it) to always include the current date in the blog post title. + +How does the agent get the current date? It doesn't inherently know it. It needs a specific *skill* or *tool* for that. + +A `Tool` in AutoGen Core represents exactly this: a specific, well-defined capability that an Agent can use. Think of it like giving an employee (Agent) a specialized piece of equipment (Tool), like a calculator, a web browser, or a calendar lookup program. + +## Key Concepts: Understanding Tools + +Let's break down what defines a Tool: + +1. **It's a Specific Capability:** A Tool performs one well-defined task. Examples: + * `search_web(query: str)` + * `run_python_code(code: str)` + * `get_stock_price(ticker: str)` + * `get_current_date()` + +2. **It Has a Schema (The Manual):** This is crucial! 
For an Agent (especially one powered by a Large Language Model - LLM) to know *when* and *how* to use a tool, the tool needs a clear description or "manual". This is called the `ToolSchema`. It typically includes: + * **`name`**: A unique identifier for the tool (e.g., `get_current_date`). + * **`description`**: A clear explanation of what the tool does, which helps the LLM decide if this tool is appropriate for the current task (e.g., "Fetches the current date in YYYY-MM-DD format"). + * **`parameters`**: Defines what inputs the tool needs. This is itself a schema (`ParametersSchema`) describing the input fields, their types, and which ones are required. For our `get_current_date` example, it might need no parameters. For `get_stock_price`, it would need a `ticker` parameter of type string. + + ```python + # From: tools/_base.py (Simplified Concept) + from typing import TypedDict, Dict, Any, Sequence, NotRequired + + class ParametersSchema(TypedDict): + type: str # Usually "object" + properties: Dict[str, Any] # Defines input fields and their types + required: NotRequired[Sequence[str]] # List of required field names + + class ToolSchema(TypedDict): + name: str + description: NotRequired[str] + parameters: NotRequired[ParametersSchema] + # 'strict' flag also possible (Chapter 5 related) + ``` + This schema allows an LLM to understand: "Ah, there's a tool called `get_current_date` that takes no inputs and gives me the current date. I should use that now!" + +3. **It Can Be Executed:** Once an agent decides to use a tool (often based on the schema), there needs to be a mechanism to actually *run* the tool's underlying function and get the result. + +## Use Case Example: Adding a `get_current_date` Tool + +Let's equip an agent with the ability to find the current date. + +**Goal:** Define a tool that gets the current date and show how it could be executed by a specialized agent. 
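Before writing the function, it may help to picture the kind of schema we are aiming for. The sketch below writes two schemas as plain dictionaries that follow the `name` / `description` / `parameters` layout described above; the exact dictionaries and the `missing_required` helper are illustrative assumptions, not output produced by AutoGen Core. The helper checks which required parameters are absent from a JSON argument string, which is roughly the question a schema lets a caller answer before running a tool.

```python
import json

# Hypothetical schema for a tool that takes no inputs.
date_tool_schema = {
    "name": "get_current_date",
    "description": "Fetches the current date in YYYY-MM-DD format.",
    "parameters": {"type": "object", "properties": {}, "required": []},
}

# Hypothetical schema for a tool with one required string input.
stock_tool_schema = {
    "name": "get_stock_price",
    "description": "Looks up the latest price for a stock ticker.",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}


def missing_required(schema: dict, arguments_json: str) -> list[str]:
    """Return the required parameter names absent from a JSON argument string."""
    arguments = json.loads(arguments_json)
    required = schema.get("parameters", {}).get("required", [])
    return [name for name in required if name not in arguments]


print(missing_required(date_tool_schema, "{}"))                   # []
print(missing_required(stock_tool_schema, "{}"))                  # ['ticker']
print(missing_required(stock_tool_schema, '{"ticker": "MSFT"}'))  # []
```

The date tool accepts an empty argument object, while the stock tool reports `ticker` as missing until it is supplied.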
+ +**Step 1: Define the Python Function** + +First, we need the actual Python code that performs the action. + +```python +# File: get_date_function.py +import datetime + +def get_current_date() -> str: + """Fetches the current date as a string.""" + today = datetime.date.today() + return today.isoformat() # Returns date like "2023-10-27" + +# Test the function +print(f"Function output: {get_current_date()}") +``` +This is a standard Python function. It takes no arguments and returns the date as a string. + +**Step 2: Wrap it as a `FunctionTool`** + +AutoGen Core provides a convenient way to turn a Python function like this into a `Tool` object using `FunctionTool`. It automatically inspects the function's signature (arguments and return type) and docstring to help build the `ToolSchema`. + +```python +# File: create_date_tool.py +from autogen_core.tools import FunctionTool +from get_date_function import get_current_date # Import our function + +# Create the Tool instance +# We provide the function and a clear description for the LLM +date_tool = FunctionTool( + func=get_current_date, + description="Use this tool to get the current date in YYYY-MM-DD format." + # Name defaults to function name 'get_current_date' +) + +# Let's see what FunctionTool generated +print(f"Tool Name: {date_tool.name}") +print(f"Tool Description: {date_tool.description}") + +# The schema defines inputs (none in this case) +# print(f"Tool Schema Parameters: {date_tool.schema['parameters']}") +# Output (simplified): {'type': 'object', 'properties': {}, 'required': []} +``` +`FunctionTool` wraps our `get_current_date` function. It uses the function name as the tool name and the description we provided. It also correctly determines from the function signature that there are no input parameters (`properties: {}`). + +**Step 3: How an Agent Might Request Tool Use** + +Now we have a `date_tool`. How is it used? 
Typically, an LLM-powered agent (which we'll see more of in [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md)) analyzes a request and decides a tool is needed. It then generates a request to *call* that tool, often using a specific message type like `FunctionCall`. + +```python +# File: tool_call_request.py +from autogen_core import FunctionCall # Represents a request to call a tool + +# Imagine an LLM agent decided to use the date tool. +# It constructs this message, providing the tool name and arguments (as JSON string). +date_call_request = FunctionCall( + id="call_date_001", # A unique ID for this specific call attempt + name="get_current_date", # Matches the Tool's name + arguments="{}" # An empty JSON object because no arguments are needed +) + +print("FunctionCall message:", date_call_request) +# Output: FunctionCall(id='call_date_001', name='get_current_date', arguments='{}') +``` +This `FunctionCall` message is like a work order: "Please execute the tool named `get_current_date` with these arguments." + +**Step 4: The `ToolAgent` Executes the Tool** + +Who receives this `FunctionCall` message? Usually, a specialized agent called `ToolAgent`. You create a `ToolAgent` and give it the list of tools it knows how to execute. When it receives a `FunctionCall`, it finds the matching tool and runs it. 
+ +```python +# File: tool_agent_example.py +import asyncio +from autogen_core.tool_agent import ToolAgent +from autogen_core.models import FunctionExecutionResult +from create_date_tool import date_tool # Import the tool we created +from tool_call_request import date_call_request # Import the request message + +# Create an agent specifically designed to execute tools +tool_executor = ToolAgent( + description="I can execute tools like getting the date.", + tools=[date_tool] # Give it the list of tools it manages +) + +# --- Simulation of Runtime delivering the message --- +# In a real app, the AgentRuntime (Chapter 3) would route the +# date_call_request message to this tool_executor agent. +# We simulate the call to its message handler here: + +async def simulate_execution(): + # Fake context (normally provided by runtime) + class MockContext: cancellation_token = None + ctx = MockContext() + + print(f"ToolAgent received request: {date_call_request.name}") + result: FunctionExecutionResult = await tool_executor.handle_function_call( + message=date_call_request, + ctx=ctx + ) + print(f"ToolAgent produced result: {result}") + +asyncio.run(simulate_execution()) +``` + +**Expected Output:** + +``` +ToolAgent received request: get_current_date +ToolAgent produced result: FunctionExecutionResult(content='2023-10-27', call_id='call_date_001', is_error=False, name='get_current_date') # Date will be current date +``` +The `ToolAgent` received the `FunctionCall`, found the `date_tool` in its list, executed the underlying `get_current_date` function, and packaged the result (the date string) into a `FunctionExecutionResult` message. This result message can then be sent back to the agent that originally requested the tool use. + +## Under the Hood: How Tool Execution Works + +Let's visualize the typical flow when an LLM agent decides to use a tool managed by a `ToolAgent`. 
+ +**Conceptual Flow:** + +```mermaid +sequenceDiagram + participant LLMA as LLM Agent (Decides) + participant Caller as Caller Agent (Orchestrates) + participant ToolA as ToolAgent (Executes) + participant ToolFunc as Tool Function (e.g., get_current_date) + + Note over LLMA: Analyzes conversation, decides tool needed. + LLMA->>Caller: Sends AssistantMessage containing FunctionCall(name='get_current_date', args='{}') + Note over Caller: Receives LLM response, sees FunctionCall. + Caller->>+ToolA: Uses runtime.send_message(message=FunctionCall, recipient=ToolAgent_ID) + Note over ToolA: Receives FunctionCall via on_message. + ToolA->>ToolA: Looks up 'get_current_date' in its internal list of Tools. + ToolA->>+ToolFunc: Calls tool.run_json(args={}) -> triggers get_current_date() + ToolFunc-->>-ToolA: Returns the result (e.g., "2023-10-27") + ToolA->>ToolA: Creates FunctionExecutionResult message with the content. + ToolA-->>-Caller: Returns FunctionExecutionResult via runtime messaging. + Note over Caller: Receives the tool result. + Caller->>LLMA: Sends FunctionExecutionResultMessage to LLM for next step. + Note over LLMA: Now knows the current date. +``` + +1. **Decision:** An LLM-powered agent decides a tool is needed based on the conversation and the available tools' descriptions. It generates a `FunctionCall`. +2. **Request:** A "Caller" agent (often the same LLM agent or a managing agent) sends this `FunctionCall` message to the dedicated `ToolAgent` using the `AgentRuntime`. +3. **Lookup:** The `ToolAgent` receives the message, extracts the tool `name` (`get_current_date`), and finds the corresponding `Tool` object (our `date_tool`) in the list it was configured with. +4. **Execution:** The `ToolAgent` calls the `run_json` method on the `Tool` object, passing the arguments from the `FunctionCall`. For a `FunctionTool`, `run_json` validates the arguments against the generated schema and then executes the original Python function (`get_current_date`). +5. 
**Result:** The Python function returns its result (the date string). +6. **Response:** The `ToolAgent` wraps this result string in a `FunctionExecutionResult` message, including the original `call_id`, and sends it back to the Caller agent. +7. **Continuation:** The Caller agent typically sends this result back to the LLM agent, allowing the conversation or task to continue with the new information. + +**Code Glimpse:** + +* **`Tool` Protocol (`tools/_base.py`):** Defines the basic contract any tool must fulfill. Key methods are `schema` (property returning the `ToolSchema`) and `run_json` (method to execute the tool with JSON-like arguments). +* **`BaseTool` (`tools/_base.py`):** An abstract class that helps implement the `Tool` protocol, especially using Pydantic models for defining arguments (`args_type`) and return values (`return_type`). It automatically generates the `parameters` part of the schema from the `args_type` model. +* **`FunctionTool` (`tools/_function_tool.py`):** Inherits from `BaseTool`. Its magic lies in automatically creating the `args_type` Pydantic model by inspecting the wrapped Python function's signature (`args_base_model_from_signature`). Its `run` method handles calling the original sync or async Python function. + ```python + # Inside FunctionTool (Simplified Concept) + class FunctionTool(BaseTool[BaseModel, BaseModel]): + def __init__(self, func, description, ...): + self._func = func + self._signature = get_typed_signature(func) + # Automatically create Pydantic model for arguments + args_model = args_base_model_from_signature(...) + # Get return type from signature + return_type = self._signature.return_annotation + super().__init__(args_model, return_type, ...) 
+ + async def run(self, args: BaseModel, ...): + # Extract arguments from the 'args' model + kwargs = args.model_dump() + # Call the original Python function (sync or async) + result = await self._call_underlying_func(**kwargs) + return result # Must match the expected return_type + ``` +* **`ToolAgent` (`tool_agent/_tool_agent.py`):** A specialized `RoutedAgent`. It registers a handler specifically for `FunctionCall` messages. + ```python + # Inside ToolAgent (Simplified Concept) + class ToolAgent(RoutedAgent): + def __init__(self, ..., tools: List[Tool]): + super().__init__(...) + self._tools = {tool.name: tool for tool in tools} # Store tools by name + + @message_handler # Registers this for FunctionCall messages + async def handle_function_call(self, message: FunctionCall, ctx: MessageContext): + # Find the tool by name + tool = self._tools.get(message.name) + if tool is None: + # Handle error: Tool not found + raise ToolNotFoundException(...) + try: + # Parse arguments string into a dictionary + arguments = json.loads(message.arguments) + # Execute the tool's run_json method + result_obj = await tool.run_json(args=arguments, ...) + # Convert result object back to string if needed + result_str = tool.return_value_as_string(result_obj) + # Create the success result message + return FunctionExecutionResult(content=result_str, ...) + except Exception as e: + # Handle execution errors + return FunctionExecutionResult(content=f"Error: {e}", is_error=True, ...) + ``` + Its core logic is: find tool -> parse args -> run tool -> return result/error. + +## Next Steps + +You've learned how **Tools** provide specific capabilities to Agents, defined by a **Schema** that LLMs can understand. We saw how `FunctionTool` makes it easy to wrap existing Python functions and how `ToolAgent` acts as the executor for these tools. 
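As a recap, the executor logic (find tool, parse arguments, run tool, return a result or an error) can be sketched without any framework. This is a minimal illustration under stated assumptions: `ToyResult`, `TOOLS`, and `execute_call` are hypothetical names, and unlike the `ToolAgent` glimpse above, an unknown tool is reported as an error result here rather than raised as an exception.

```python
import datetime
import json
from dataclasses import dataclass


@dataclass
class ToyResult:
    """Hypothetical stand-in for a tool execution result message."""
    call_id: str
    content: str
    is_error: bool = False


def get_current_date() -> str:
    """Return today's date as an ISO string (YYYY-MM-DD)."""
    return datetime.date.today().isoformat()


# Tools are stored by name, so lookup is a dictionary access.
TOOLS = {"get_current_date": get_current_date}


def execute_call(call_id: str, name: str, arguments_json: str) -> ToyResult:
    # Step 1: find the tool by name.
    tool = TOOLS.get(name)
    if tool is None:
        return ToyResult(call_id, f"Error: unknown tool '{name}'", is_error=True)
    try:
        # Step 2: parse the JSON argument string into keyword arguments.
        kwargs = json.loads(arguments_json)
        # Step 3: run the tool and wrap its return value as a string.
        return ToyResult(call_id, str(tool(**kwargs)))
    except Exception as exc:
        # Step 4: any failure becomes an error result that keeps the call id.
        return ToyResult(call_id, f"Error: {exc}", is_error=True)


ok = execute_call("call_date_001", "get_current_date", "{}")
missing = execute_call("call_002", "search_web", "{}")
bad_args = execute_call("call_003", "get_current_date", '{"tz": "UTC"}')
print(ok, missing, bad_args, sep="\n")
```

Keeping the `call_id` on every result, including errors, is what lets the caller match a result to the request that produced it.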
+ +This ability for agents to use tools is fundamental to building powerful and versatile AI systems that can interact with the real world or perform complex calculations. + +Now that agents can use tools, we need to understand more about the agents that *decide* which tools to use, which often involves interacting with Large Language Models: + +* [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md): How agents interact with LLMs like GPT to generate responses or decide on actions (like calling a tool). +* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): How the history of the conversation, including tool calls and results, is managed when talking to an LLM. + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/AutoGen Core/05_chatcompletionclient.md b/output/AutoGen Core/05_chatcompletionclient.md new file mode 100644 index 0000000..fb8d5e0 --- /dev/null +++ b/output/AutoGen Core/05_chatcompletionclient.md @@ -0,0 +1,296 @@ +# Chapter 5: ChatCompletionClient - Talking to the Brains + +So far, we've learned about: +* [Agents](01_agent.md): The workers in our system. +* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly. +* [AgentRuntime](03_agentruntime.md): The manager that runs the show. +* [Tools](04_tool.md): How agents get specific skills. + +But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs), such as GPT-4, Claude, or Gemini, as their "brains". How does an agent in AutoGen Core communicate with these external LLM services? + +This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs. + +## Motivation: Bridging the Gap to LLMs + +Imagine you want to build an agent that can summarize long articles. +1. You give the agent an article (as a message). +2. 
The agent needs to send this article to an LLM (like GPT-4). +3. It also needs to tell the LLM: "Please summarize this." +4. The LLM processes the request and generates a summary. +5. The agent needs to receive this summary back from the LLM. + +How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response? + +The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format. + +## Key Concepts: Understanding the LLM Communicator + +Let's break down the `ChatCompletionClient`: + +1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls. + +2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it: + * `messages`: A list of messages representing the conversation history so far. + * Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use. + * Other parameters (like `json_output` hints, `cancellation_token`). + +3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`: + * `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant."). + * `UserMessage`: Input from the user or another agent (e.g., the article text). + * `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools). 
+ * `FunctionExecutionResultMessage`: The results of executing a tool/function call. + +4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (`FunctionCall` inside an `AssistantMessage`). + +5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing: + * `content`: The LLM's generated text or a list of `FunctionCall` requests. + * `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls"). + * `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used. + * `cached`: Whether the response came from a cache. + +6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total usage via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge based on tokens. + +## Use Case Example: Summarizing Text with an LLM + +Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text. + +**Goal:** Send text to an LLM via a client and get a summary back. + +**Step 1: Prepare the Input Messages** + +We need to structure our request as a list of `LLMMessage` objects. + +```python +# File: prepare_messages.py +from autogen_core.models import SystemMessage, UserMessage + +# Instructions for the LLM +system_prompt = SystemMessage( + content="You are a helpful assistant designed to summarize text concisely." +) + +# The text we want to summarize +article_text = """ +AutoGen is a framework that enables the development of LLM applications using multiple agents +that can converse with each other to solve tasks. AutoGen agents are customizable, +conversable, and can seamlessly allow human participation. 
They can operate in various modes +that employ combinations of LLMs, human inputs, and tools. +""" +user_request = UserMessage( + content=f"Please summarize the following text in one sentence:\n\n{article_text}", + source="User" # Indicate who provided this input +) + +# Combine into a list for the client +messages_to_send = [system_prompt, user_request] + +print("Messages prepared:") +for msg in messages_to_send: + print(f"- {msg.type}: {msg.content[:50]}...") # Print first 50 chars +``` +This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent. + +**Step 2: Use the ChatCompletionClient (Conceptual)** + +Now, we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`. + +```python +# File: call_llm_client.py +import asyncio +from autogen_core.models import CreateResult, RequestUsage +# Assume 'messages_to_send' is from the previous step +# Assume 'llm_client' is a pre-configured ChatCompletionClient instance +# (e.g., llm_client = OpenAIChatCompletionClient(config=...)) + +async def get_summary(client, messages): + print("\nSending messages to LLM via ChatCompletionClient...") + try: + # The core call: send messages, get structured result + response: CreateResult = await client.create( + messages=messages, + # We aren't providing tools in this simple example + tools=[] + ) + print("Received response:") + print(f"- Finish Reason: {response.finish_reason}") + print(f"- Content: {response.content}") # This should be the summary + print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}") + print(f"- Cached: {response.cached}") + + # Also, check total usage tracked by the client + total_usage = client.total_usage() + print(f"\nClient Total Usage: 
Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}") + + except Exception as e: + print(f"An error occurred: {e}") + +# --- Placeholder for actual client --- +class MockChatCompletionClient: # Simulate a real client + _total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0) + async def create(self, messages, tools=[], **kwargs) -> CreateResult: + # Simulate API call and response + prompt_len = sum(len(str(m.content)) for m in messages) // 4 # Rough token estimate + summary = "AutoGen is a multi-agent framework for developing LLM applications." + completion_len = len(summary) // 4 # Rough token estimate + usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len) + self._total_usage.prompt_tokens += usage.prompt_tokens + self._total_usage.completion_tokens += usage.completion_tokens + return CreateResult( + finish_reason="stop", content=summary, usage=usage, cached=False + ) + def total_usage(self) -> RequestUsage: return self._total_usage + # Other required methods (count_tokens, model_info etc.) omitted for brevity + +async def main(): + from prepare_messages import messages_to_send # Get messages from previous step + mock_client = MockChatCompletionClient() + await get_summary(mock_client, messages_to_send) + +# asyncio.run(main()) # If you run this, it uses the mock client +``` +This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`) and the token usage reported for that specific call (`response.usage`) and the total tracked by the client (`client.total_usage()`). + +**How an Agent Uses It:** +Typically, an agent's logic (e.g., inside its `on_message` handler) would: +1. Receive an incoming message (like the article to summarize). +2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request). +3. 
Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context). +4. Call `await client.create(...)`. +5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided). +6. Potentially send the result as a new message to another agent or return it. + +## Under the Hood: How the Client Talks to the LLM + +What happens when you call `await client.create(...)`? + +**Conceptual Flow:** + +```mermaid +sequenceDiagram + participant Agent as Agent Logic + participant Client as ChatCompletionClient + participant Formatter as API Formatter + participant HTTP as HTTP Client + participant LLM_API as External LLM API + + Agent->>+Client: create(messages, tools) + Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format) + Formatter-->>-Client: Return formatted request body + Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API Key + HTTP->>+LLM_API: Transmit request over network + LLM_API->>LLM_API: Process request, generate completion/function call + LLM_API-->>-HTTP: Return API response (e.g., JSON) + HTTP-->>-Client: Receive HTTP response + Client->>+Formatter: Parse API response (extract content, usage, finish_reason) + Formatter-->>-Client: Return parsed data + Client->>Client: Create standard CreateResult object + Client-->>-Agent: Return CreateResult +``` + +1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list. +2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc. +3. 
**Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key). +4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network. +5. **Parse:** It receives the raw HTTP response (usually JSON) from the API. +6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc. +7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code. + +**Code Glimpse:** + +* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow. + + ```python + # From: models/_model_client.py (Simplified ABC) + from abc import ABC, abstractmethod + from typing import Sequence, Optional, Mapping, Any, AsyncGenerator, Union + from ._types import LLMMessage, CreateResult, RequestUsage + from ..tools import Tool, ToolSchema + from .. import CancellationToken + + class ChatCompletionClient(ABC): + @abstractmethod + async def create( + self, messages: Sequence[LLMMessage], *, + tools: Sequence[Tool | ToolSchema] = [], + json_output: Optional[bool] = None, # Hint for JSON mode + extra_create_args: Mapping[str, Any] = {}, # API-specific args + cancellation_token: Optional[CancellationToken] = None, + ) -> CreateResult: ... # The core method + + @abstractmethod + def create_stream( + self, # Similar to create, but yields results incrementally + # ... parameters ... + ) -> AsyncGenerator[Union[str, CreateResult], None]: ... + + @abstractmethod + def total_usage(self) -> RequestUsage: ... # Get total tracked usage + + @abstractmethod + def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ... 
# Estimate token count + + # Other methods like close(), actual_usage(), remaining_tokens(), model_info... + ``` + Concrete classes such as `OpenAIChatCompletionClient` and `AnthropicChatCompletionClient` implement these methods using the specific libraries and API calls for each service. + +* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client. + + ```python + # From: models/_types.py (Simplified) + from pydantic import BaseModel + from typing import List, Union, Literal + from .. import FunctionCall, Image # FunctionCall: see Chapter 4; Image: for multimodal input + + class SystemMessage(BaseModel): + content: str + type: Literal["SystemMessage"] = "SystemMessage" + + class UserMessage(BaseModel): + content: Union[str, List[Union[str, Image]]] # Can include images! + source: str + type: Literal["UserMessage"] = "UserMessage" + + class AssistantMessage(BaseModel): + content: Union[str, List[FunctionCall]] # Can be text or function calls + source: str + type: Literal["AssistantMessage"] = "AssistantMessage" + + # FunctionExecutionResultMessage also exists here... + ``` + +* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client. + + ```python + # From: models/_types.py (Simplified) + from pydantic import BaseModel + from dataclasses import dataclass + from typing import Union, List, Literal, Optional + from .. import FunctionCall + + @dataclass + class RequestUsage: + prompt_tokens: int + completion_tokens: int + + FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"] + + class CreateResult(BaseModel): + finish_reason: FinishReasons + content: Union[str, List[FunctionCall]] # LLM output + usage: RequestUsage # Token usage for this call + cached: bool + # Optional fields like logprobs, thought... 
+ ``` + Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation. + +## Next Steps + +You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage. + +Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened? + +* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`. + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/AutoGen Core/06_chatcompletioncontext.md b/output/AutoGen Core/06_chatcompletioncontext.md new file mode 100644 index 0000000..3b04b3d --- /dev/null +++ b/output/AutoGen Core/06_chatcompletioncontext.md @@ -0,0 +1,330 @@ +# Chapter 6: ChatCompletionContext - Remembering the Conversation + +In [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md), we learned how agents talk to Large Language Models (LLMs) using a `ChatCompletionClient`. We saw that we need to send a list of `messages` (the conversation history) to the LLM so it knows the context. + +But conversations can get very long! Imagine talking on the phone for an hour. Can you remember *every single word* that was said? Probably not. You remember the main points, the beginning, and what was said most recently. 
LLMs have a similar limitation: they can only pay attention to a certain amount of text at once (called the "context window"). + +If we send the *entire* history of a very long chat, it might be too much for the LLM, lead to errors, be slow, or cost more money (since many LLMs charge based on the amount of text). + +So, how do we smartly choose *which* parts of the conversation history to send? This is the problem that **`ChatCompletionContext`** solves. + +## Motivation: Keeping LLM Conversations Focused + +Let's say we have a helpful assistant agent chatting with a user: + +1. **User:** "Hi! Can you tell me about AutoGen?" +2. **Assistant:** "Sure! AutoGen is a framework..." (provides details) +3. **User:** "Thanks! Now, can you draft an email to my team about our upcoming meeting?" +4. **Assistant:** "Okay, what's the meeting about?" +5. **User:** "It's about the project planning for Q3." +6. **Assistant:** (Needs to draft the email) + +When the Assistant needs to draft the email (step 6), does it need the *exact* text from step 2 about what AutoGen is? Probably not. It definitely needs the instructions from step 3 and the topic from step 5. Maybe the initial greeting isn't super important either. + +`ChatCompletionContext` acts like a **smart transcript editor**. Before sending the history to the LLM via the `ChatCompletionClient`, it reviews the full conversation log and prepares a shorter, focused version containing only the messages it thinks are most relevant for the LLM's next response. + +## Key Concepts: Managing the Chat History + +1. **The Full Transcript Holder:** A `ChatCompletionContext` object holds the *complete* list of messages (`LLMMessage` objects like `SystemMessage`, `UserMessage`, `AssistantMessage` from Chapter 5) that have occurred in a specific conversation thread. You add new messages using its `add_message` method. + +2. 
**The Smart View Generator (`get_messages`):** The core job of `ChatCompletionContext` is done by its `get_messages` method. When called, it looks at the *full* transcript it holds, but returns only a *subset* of those messages based on its specific strategy. This subset is what you'll actually send to the `ChatCompletionClient`. + +3. **Different Strategies for Remembering:** Because different situations require different focus, AutoGen Core provides several `ChatCompletionContext` implementations (strategies): + * **`UnboundedChatCompletionContext`:** The simplest (and sometimes riskiest!). It doesn't edit anything; `get_messages` just returns the *entire* history. Good for short chats, but can break with long ones. + * **`BufferedChatCompletionContext`:** Like remembering only the last few things someone said. It keeps the most recent `N` messages (where `N` is the `buffer_size` you set). Good for focusing on recent interactions. + * **`HeadAndTailChatCompletionContext`:** Tries to get the best of both worlds. It keeps the first few messages (the "head", maybe containing initial instructions) and the last few messages (the "tail", the recent context). It skips the messages in the middle. + +## Use Case Example: Chatting with Different Memory Strategies + +Let's simulate adding messages to different context managers and see what `get_messages` returns. 
+ +**Step 1: Define some messages** + +```python +# File: define_chat_messages.py +from autogen_core.models import ( + SystemMessage, UserMessage, AssistantMessage, LLMMessage +) +from typing import List + +# The initial instruction for the assistant +system_msg = SystemMessage(content="You are a helpful assistant.") + +# A sequence of user/assistant turns +chat_sequence: List[LLMMessage] = [ + UserMessage(content="What is AutoGen?", source="User"), + AssistantMessage(content="AutoGen is a multi-agent framework...", source="Agent"), + UserMessage(content="What can it do?", source="User"), + AssistantMessage(content="It can build complex LLM apps.", source="Agent"), + UserMessage(content="Thanks!", source="User") +] + +# Combine system message and the chat sequence +full_history: List[LLMMessage] = [system_msg] + chat_sequence + +print(f"Total messages in full history: {len(full_history)}") +# Output: Total messages in full history: 6 +``` +We have a full history of 6 messages (1 system message + 5 chat messages). + +**Step 2: Use `UnboundedChatCompletionContext`** + +This context keeps everything. + +```python +# File: use_unbounded_context.py +import asyncio +from define_chat_messages import full_history +from autogen_core.model_context import UnboundedChatCompletionContext + +async def main(): + # Create context and add all messages + context = UnboundedChatCompletionContext() + for msg in full_history: + await context.add_message(msg) + + # Get the messages to send to the LLM + messages_for_llm = await context.get_messages() + + print(f"--- Unbounded Context ({len(messages_for_llm)} messages) ---") + for i, msg in enumerate(messages_for_llm): + print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...") + +# asyncio.run(main()) # Uncomment to run +``` + +**Expected Output (Unbounded):** +``` +--- Unbounded Context (6 messages) --- +1. [SystemMessage]: You are a helpful assistant.... +2. [UserMessage]: What is AutoGen?... +3. [AssistantMessage]: AutoGen is a multi-agent frame... +4. 
[UserMessage]: What can it do?... +5. [AssistantMessage]: It can build complex LLM apps.... +6. [UserMessage]: Thanks!... +``` +It returns all 6 messages, exactly as added. + +**Step 3: Use `BufferedChatCompletionContext`** + +Let's keep only the last 3 messages. + +```python +# File: use_buffered_context.py +import asyncio +from define_chat_messages import full_history +from autogen_core.model_context import BufferedChatCompletionContext + +async def main(): + # Keep only the last 3 messages + context = BufferedChatCompletionContext(buffer_size=3) + for msg in full_history: + await context.add_message(msg) + + messages_for_llm = await context.get_messages() + + print(f"--- Buffered Context (buffer=3, {len(messages_for_llm)} messages) ---") + for i, msg in enumerate(messages_for_llm): + print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...") + +# asyncio.run(main()) # Uncomment to run +``` + +**Expected Output (Buffered):** +``` +--- Buffered Context (buffer=3, 3 messages) --- +1. [UserMessage]: What can it do?... +2. [AssistantMessage]: It can build complex LLM apps.... +3. [UserMessage]: Thanks!... +``` +It returns only the last 3 messages from the full history. The system message and the first user/assistant exchange are omitted. + +**Step 4: Use `HeadAndTailChatCompletionContext`** + +Let's keep the first message (head=1) and the last two messages (tail=2). + +```python +# File: use_head_tail_context.py +import asyncio +from define_chat_messages import full_history +from autogen_core.model_context import HeadAndTailChatCompletionContext + +async def main(): + # Keep first 1 and last 2 messages + context = HeadAndTailChatCompletionContext(head_size=1, tail_size=2) + for msg in full_history: + await context.add_message(msg) + + messages_for_llm = await context.get_messages() + + print(f"--- Head & Tail Context (h=1, t=2, {len(messages_for_llm)} messages) ---") + for i, msg in enumerate(messages_for_llm): + print(f"{i+1}. 
[{msg.type}]: {msg.content[:30]}...") + +# asyncio.run(main()) # Uncomment to run +``` + +**Expected Output (Head & Tail):** +``` +--- Head & Tail Context (h=1, t=2, 4 messages) --- +1. [SystemMessage]: You are a helpful assistant.... +2. [UserMessage]: Skipped 3 messages.... +3. [AssistantMessage]: It can build complex LLM apps.... +4. [UserMessage]: Thanks!... +``` +It keeps the very first message (`SystemMessage`), then inserts a placeholder telling the LLM that some messages were skipped, and finally includes the last two messages. This preserves the initial instruction and the most recent context. + +**Which one to choose?** It depends on your agent's task! +* Simple Q&A? `Buffered` might be fine. +* Following complex initial instructions? `HeadAndTail` or even `Unbounded` (if short) might be better. + +## Under the Hood: How Context is Managed + +The core idea is defined by the `ChatCompletionContext` abstract base class. + +**Conceptual Flow:** + +```mermaid +sequenceDiagram + participant Agent as Agent Logic + participant Context as ChatCompletionContext + participant FullHistory as Internal Message List + + Agent->>+Context: add_message(newMessage) + Context->>+FullHistory: Append newMessage to list + FullHistory-->>-Context: List updated + Context-->>-Agent: Done + + Agent->>+Context: get_messages() + Context->>+FullHistory: Read the full list + FullHistory-->>-Context: Return full list + Context->>Context: Apply Strategy (e.g., slice list for Buffered/HeadTail) + Context-->>-Agent: Return selected list of messages +``` + +1. **Adding:** When `add_message(message)` is called, the context simply appends the `message` to its internal list (`self._messages`). +2. **Getting:** When `get_messages()` is called: + * The context accesses its internal `self._messages` list. + * The specific implementation (`Unbounded`, `Buffered`, `HeadAndTail`) applies its logic to select which messages to return. + * It returns the selected list. 
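The add/get split described in the two steps above can be tried without installing AutoGen. The following is only a minimal sketch: `SketchContext` and `LastNContext` are hypothetical names invented for this illustration, messages are plain strings rather than `LLMMessage` objects, and the real classes carry more behavior than shown here.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class SketchContext(ABC):
    """Hypothetical stand-in for ChatCompletionContext (messages are plain strings)."""

    def __init__(self) -> None:
        self._messages: List[str] = []  # always holds the complete history

    async def add_message(self, message: str) -> None:
        # Adding never trims anything; it only appends.
        self._messages.append(message)

    @abstractmethod
    async def get_messages(self) -> List[str]:
        """Each strategy decides which part of the history to return."""


class LastNContext(SketchContext):
    """Illustrative strategy: return only the most recent `n` messages."""

    def __init__(self, n: int) -> None:
        super().__init__()
        if n < 1:
            raise ValueError("n must be at least 1")
        self._n = n

    async def get_messages(self) -> List[str]:
        return self._messages[-self._n:]


async def demo() -> List[str]:
    context = LastNContext(n=2)
    for text in ["hello", "what is AutoGen?", "a framework", "thanks"]:
        await context.add_message(text)
    return await context.get_messages()


selected = asyncio.run(demo())
print(selected)
```

The point of the design is that storage and selection are separate: the full history is never lost, so a different strategy could be applied to the same stored messages later.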
+ +**Code Glimpse:** + +* **Base Class (`_chat_completion_context.py`):** Defines the structure and common methods. + + ```python + # From: model_context/_chat_completion_context.py (Simplified) + from abc import ABC, abstractmethod + from typing import List + from ..models import LLMMessage + + class ChatCompletionContext(ABC): + component_type = "chat_completion_context" # Identifies this as a component type + + def __init__(self, initial_messages: List[LLMMessage] | None = None) -> None: + # Holds the COMPLETE history + self._messages: List[LLMMessage] = initial_messages or [] + + async def add_message(self, message: LLMMessage) -> None: + """Add a message to the full context.""" + self._messages.append(message) + + @abstractmethod + async def get_messages(self) -> List[LLMMessage]: + """Get the subset of messages based on the strategy.""" + # Each subclass MUST implement this logic + ... + + # Other methods like clear(), save_state(), load_state() exist too + ``` + The base class handles storing messages; subclasses define *how* to retrieve them. + +* **Unbounded (`_unbounded_chat_completion_context.py`):** The simplest implementation. + + ```python + # From: model_context/_unbounded_chat_completion_context.py (Simplified) + from typing import List + from ._chat_completion_context import ChatCompletionContext + from ..models import LLMMessage + + class UnboundedChatCompletionContext(ChatCompletionContext): + async def get_messages(self) -> List[LLMMessage]: + """Returns all messages.""" + return self._messages # Just return the whole internal list + ``` + +* **Buffered (`_buffered_chat_completion_context.py`):** Uses slicing to get the end of the list. 
+ + ```python + # From: model_context/_buffered_chat_completion_context.py (Simplified) + from typing import List + from ._chat_completion_context import ChatCompletionContext + from ..models import LLMMessage, FunctionExecutionResultMessage + + class BufferedChatCompletionContext(ChatCompletionContext): + def __init__(self, buffer_size: int, ...): + super().__init__(...) + self._buffer_size = buffer_size + + async def get_messages(self) -> List[LLMMessage]: + """Get at most `buffer_size` recent messages.""" + # Slice the list to get the last 'buffer_size' items + messages = self._messages[-self._buffer_size :] + # Special case: Avoid starting with a function result message + if messages and isinstance(messages[0], FunctionExecutionResultMessage): + messages = messages[1:] + return messages + ``` + +* **Head and Tail (`_head_and_tail_chat_completion_context.py`):** Combines slices from the beginning and end. + + ```python + # From: model_context/_head_and_tail_chat_completion_context.py (Simplified) + from typing import List + from ._chat_completion_context import ChatCompletionContext + from ..models import LLMMessage, UserMessage + + class HeadAndTailChatCompletionContext(ChatCompletionContext): + def __init__(self, head_size: int, tail_size: int, ...): + super().__init__(...) + self._head_size = head_size + self._tail_size = tail_size + + async def get_messages(self) -> List[LLMMessage]: + head = self._messages[: self._head_size] # First 'head_size' items + tail = self._messages[-self._tail_size :] # Last 'tail_size' items + num_skipped = len(self._messages) - len(head) - len(tail) + + if num_skipped <= 0: # Head and tail already cover the whole history + return self._messages + else: # If messages were skipped + placeholder = [UserMessage(content=f"Skipped {num_skipped} messages.", source="System")] + # Combine head + placeholder + tail + return head + placeholder + tail + ``` + These implementations provide different ways to manage the context window effectively. 
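The slicing used by the buffered and head-and-tail strategies can be checked on plain lists. This is a sketch under assumptions, not the library's code: `last_n` and `head_and_tail` are hypothetical helpers, and strings stand in for messages. One Python detail is worth knowing when adapting the pattern: `items[-0:]` is the same as `items[0:]`, so a size of zero returns the whole list unless it is guarded. Whether the real classes validate their sizes is not shown in the simplified snippets above, so the guard here is a precaution.

```python
from typing import List


def last_n(messages: List[str], n: int) -> List[str]:
    """Return the last `n` items; guard n == 0 because messages[-0:] is the whole list."""
    return messages[-n:] if n > 0 else []


def head_and_tail(messages: List[str], head_size: int, tail_size: int) -> List[str]:
    """Keep the first `head_size` and last `tail_size` items, marking any gap."""
    head = messages[:head_size]
    tail = last_n(messages, tail_size)
    num_skipped = len(messages) - len(head) - len(tail)
    if num_skipped <= 0:
        # Head and tail touch or overlap, so nothing is skipped.
        return list(messages)
    return head + [f"Skipped {num_skipped} messages."] + tail


history = ["sys", "u1", "a1", "u2", "a2", "u3"]

print(head_and_tail(history, 1, 2))  # gap in the middle gets a placeholder
print(head_and_tail(history, 4, 4))  # overlapping head and tail: full history
print(history[-0:] == history)       # the unguarded zero-size slice keeps everything
print(last_n(history, 0))            # the guarded version returns an empty list
```

Running the sketch shows the three cases side by side: a real gap, an overlap, and the zero-size edge case.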
+ +## Putting it Together with ChatCompletionClient + +How does an agent use `ChatCompletionContext` with the `ChatCompletionClient` from Chapter 5? + +1. An agent has an instance of a `ChatCompletionContext` (e.g., `BufferedChatCompletionContext`) to store its conversation history. +2. When the agent receives a new message (e.g., a `UserMessage`), it calls `await context.add_message(new_user_message)`. +3. To prepare for calling the LLM, the agent calls `messages_to_send = await context.get_messages()`. This gets the strategically selected subset of the history. +4. The agent then passes this list to the `ChatCompletionClient`: `response = await llm_client.create(messages=messages_to_send, ...)`. +5. When the LLM replies (e.g., with an `AssistantMessage`), the agent adds it back to the context: `await context.add_message(llm_response_message)`. + +This loop ensures that the history is continuously updated and intelligently trimmed before each call to the LLM. + +## Next Steps + +You've learned how `ChatCompletionContext` helps manage the conversation history sent to LLMs, preventing context window overflows and keeping the interaction focused using different strategies (`Unbounded`, `Buffered`, `HeadAndTail`). + +This context management is a specific form of **memory**. Agents might need to remember things beyond just the chat history. How do they store general information, state, or knowledge over time? + +* [Chapter 7: Memory](07_memory.md): Explore the broader concept of Memory in AutoGen Core, which provides more general ways for agents to store and retrieve information. +* [Chapter 8: Component](08_component.md): Understand how `ChatCompletionContext` fits into the general `Component` model, allowing configuration and integration within the AutoGen system. 
+ +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/AutoGen Core/07_memory.md b/output/AutoGen Core/07_memory.md new file mode 100644 index 0000000..8e648b5 --- /dev/null +++ b/output/AutoGen Core/07_memory.md @@ -0,0 +1,323 @@ +# Chapter 7: Memory - The Agent's Notebook + +In [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md), we saw how agents manage the *short-term* history of a single conversation before talking to an LLM. It's like remembering what was just said in the last few minutes. + +But what if an agent needs to remember things for much longer, across *multiple* conversations or tasks? For example, imagine an assistant agent that learns your preferences: +* You tell it: "Please always write emails in a formal style for me." +* Weeks later, you ask it to draft a new email. + +How does it remember that preference? The short-term `ChatCompletionContext` might have forgotten the earlier instruction, especially if using a strategy like `BufferedChatCompletionContext`. The agent needs a **long-term memory**. + +This is where the **`Memory`** abstraction comes in. Think of it as the agent's **long-term notebook or database**. While `ChatCompletionContext` is the scratchpad for the current chat, `Memory` holds persistent information the agent can add to or look up later. + +## Motivation: Remembering Across Conversations + +Our goal is to give an agent the ability to store a piece of information (like a user preference) and retrieve it later to influence its behavior, even in a completely new conversation. `Memory` provides the mechanism for this long-term storage and retrieval. + +## Key Concepts: How the Notebook Works + +1. **What it Stores (`MemoryContent`):** Agents can store various types of information in their memory. 
This could be: + * Plain text notes (`text/plain`) + * Structured data like JSON (`application/json`) + * Even images (`image/*`) + Each piece of information is wrapped in a `MemoryContent` object, which includes the data itself, its type (`mime_type`), and optional descriptive `metadata`. + + ```python + # From: memory/_base_memory.py (Simplified Concept) + from pydantic import BaseModel + from typing import Any, Dict, Union + + # Represents one entry in the memory notebook + class MemoryContent(BaseModel): + content: Union[str, bytes, Dict[str, Any]] # The actual data + mime_type: str # What kind of data (e.g., "text/plain") + metadata: Dict[str, Any] | None = None # Extra info (optional) + ``` + This standard format helps manage different kinds of memories. + +2. **Adding to Memory (`add`):** When an agent learns something important it wants to remember long-term (like the user's preferred style), it uses the `memory.add(content)` method. This is like writing a new entry in the notebook. + +3. **Querying Memory (`query`):** When an agent needs to recall information, it can use `memory.query(query_text)`. This is like searching the notebook for relevant entries. How the search works depends on the specific memory implementation (it could be a simple text match, or a sophisticated vector search in more advanced memories). + +4. **Updating Chat Context (`update_context`):** This is a crucial link! Before an agent talks to the LLM (using the `ChatCompletionClient` from [Chapter 5](05_chatcompletionclient.md)), it can call the `memory.update_context(chat_context)` method. This method: + * Looks at the current conversation (`chat_context`). + * Queries the long-term memory (`Memory`) for relevant information. + * Injects the retrieved memories *into* the `chat_context`, often as a `SystemMessage`. + This way, the LLM gets the benefit of the long-term memory *in addition* to the short-term conversation history, right before generating its response. + +5. 
**Different Memory Implementations:** Just like there are different `ChatCompletionContext` strategies, there can be different `Memory` implementations: + * `ListMemory`: A very simple memory that stores everything in a Python list (like a simple chronological notebook). + * *Future Possibilities*: More advanced implementations could use databases or vector stores for more efficient storage and retrieval of vast amounts of information. + +## Use Case Example: Remembering User Preferences with `ListMemory` + +Let's implement our user preference use case using the simple `ListMemory`. + +**Goal:** +1. Create a `ListMemory`. +2. Add a user preference ("formal style") to it. +3. Start a *new* chat context. +4. Use `update_context` to inject the preference into the new chat context. +5. Show how the chat context looks *before* being sent to the LLM. + +**Step 1: Create the Memory** + +We'll use `ListMemory`, the simplest implementation provided by AutoGen Core. + +```python +# File: create_list_memory.py +from autogen_core.memory import ListMemory + +# Create a simple list-based memory instance +user_prefs_memory = ListMemory(name="user_preferences") + +print(f"Created memory: {user_prefs_memory.name}") +print(f"Initial content: {user_prefs_memory.content}") +# Output: +# Created memory: user_preferences +# Initial content: [] +``` +We have an empty memory notebook named "user_preferences". + +**Step 2: Add the Preference** + +Let's add the user's preference as a piece of text memory. 
+ +```python +# File: add_preference.py +import asyncio +from autogen_core.memory import MemoryContent +# Assume user_prefs_memory exists from the previous step + +# Define the preference as MemoryContent +preference = MemoryContent( + content="User prefers all communication to be written in a formal style.", + mime_type="text/plain", # It's just text + metadata={"source": "user_instruction_conversation_1"} # Optional info +) + +async def add_to_memory(): + # Add the content to our memory instance + await user_prefs_memory.add(preference) + print(f"Memory content after adding: {user_prefs_memory.content}") + +asyncio.run(add_to_memory()) +# Output (will show the MemoryContent object): +# Memory content after adding: [MemoryContent(content='User prefers...', mime_type='text/plain', metadata={'source': '...'})] +``` +We've successfully written the preference into our `ListMemory` notebook. + +**Step 3: Start a New Chat Context** + +Imagine time passes, and the user starts a new conversation asking for an email draft. We create a fresh `ChatCompletionContext`. + +```python +# File: start_new_chat.py +from autogen_core.model_context import UnboundedChatCompletionContext +from autogen_core.models import UserMessage + +# Start a new, empty chat context for a new task +new_chat_context = UnboundedChatCompletionContext() + +# Add the user's new request +new_request = UserMessage(content="Draft an email to the team about the Q3 results.", source="User") +# await new_chat_context.add_message(new_request) # In a real app, add the request + +print("Created a new, empty chat context.") +# Output: Created a new, empty chat context. +``` +This context currently *doesn't* know about the "formal style" preference stored in our long-term memory. + +**Step 4: Inject Memory into Chat Context** + +Before sending the `new_chat_context` to the LLM, we use `update_context` to bring in relevant long-term memories. 
+ +```python +# File: update_chat_with_memory.py +import asyncio +# Assume user_prefs_memory exists (with the preference added) +# Assume new_chat_context exists (empty or with just the new request) +# Assume new_request exists + +async def main(): + # --- This is where Memory connects to Chat Context --- + print("Updating chat context with memory...") + update_result = await user_prefs_memory.update_context(new_chat_context) + print(f"Memories injected: {len(update_result.memories.results)}") + + # Now let's add the actual user request for this task + await new_chat_context.add_message(new_request) + + # See what messages are now in the context + messages_for_llm = await new_chat_context.get_messages() + print("\nMessages to be sent to LLM:") + for msg in messages_for_llm: + print(f"- [{msg.type}]: {msg.content}") + +asyncio.run(main()) +``` + +**Expected Output:** +``` +Updating chat context with memory... +Memories injected: 1 + +Messages to be sent to LLM: +- [SystemMessage]: +Relevant memory content (in chronological order): +1. User prefers all communication to be written in a formal style. + +- [UserMessage]: Draft an email to the team about the Q3 results. +``` +Look! The `ListMemory.update_context` method automatically queried the memory (in this simple case, it just takes *all* entries) and added a `SystemMessage` to the `new_chat_context`. This message explicitly tells the LLM about the stored preference *before* it sees the user's request to draft the email. + +**Step 5: (Conceptual) Sending to LLM** + +Now, if we were to send `messages_for_llm` to the `ChatCompletionClient` (Chapter 5): + +```python +# Conceptual code - Requires a configured client +# response = await llm_client.create(messages=messages_for_llm) +``` +The LLM would receive both the instruction about the formal style preference (from Memory) and the request to draft the email. It's much more likely to follow the preference now! 
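The injection described in Steps 4 and 5 can also be tried without AutoGen installed. The following is a minimal sketch using only the standard library; `ToyMessage`, `ToyChatContext`, and `ToyListMemory` are hypothetical stand-ins invented for illustration, not AutoGen Core classes, and they only mimic the behavior this chapter describes for `ListMemory`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMessage:
    """Hypothetical stand-in for SystemMessage / UserMessage."""
    role: str      # "system" or "user"
    content: str


@dataclass
class ToyChatContext:
    """Hypothetical stand-in for a chat context: an ordered message list."""
    messages: List[ToyMessage] = field(default_factory=list)

    async def add_message(self, message: ToyMessage) -> None:
        self.messages.append(message)


@dataclass
class ToyListMemory:
    """Hypothetical stand-in for ListMemory: keeps every entry in a list."""
    entries: List[str] = field(default_factory=list)

    async def add(self, text: str) -> None:
        self.entries.append(text)

    async def update_context(self, context: ToyChatContext) -> int:
        """Inject all entries as one system message; return how many were injected."""
        if not self.entries:
            return 0  # Nothing stored, so the context is left untouched
        numbered = [f"{i}. {text}" for i, text in enumerate(self.entries, 1)]
        body = "Relevant memory content (in chronological order):\n" + "\n".join(numbered)
        await context.add_message(ToyMessage(role="system", content=body))
        return len(self.entries)


async def build_prompt() -> List[ToyMessage]:
    memory = ToyListMemory()
    await memory.add("User prefers all communication to be written in a formal style.")

    context = ToyChatContext()
    await memory.update_context(context)  # Long-term memory goes in first
    await context.add_message(            # The new request follows it
        ToyMessage(role="user", content="Draft an email to the team about the Q3 results.")
    )
    return context.messages


messages = asyncio.run(build_prompt())
for message in messages:
    print(f"[{message.role}] {message.content}")
```

The ordering is the point of the sketch: because the memory is injected before the request is added, the stored preference reaches the model ahead of the task it should influence.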
+ +**Step 6: Direct Query (Optional)** + +We can also directly query the memory if needed, without involving a chat context. + +```python +# File: query_memory.py +import asyncio +# Assume user_prefs_memory exists + +async def main(): + # Query the memory (ListMemory returns all items regardless of query text) + query_result = await user_prefs_memory.query("style preference") + print("\nDirect query result:") + for item in query_result.results: + print(f"- Content: {item.content}, Type: {item.mime_type}") + +asyncio.run(main()) +# Output: +# Direct query result: +# - Content: User prefers all communication to be written in a formal style., Type: text/plain +``` +This shows how an agent could specifically look things up in its notebook. + +## Under the Hood: How `ListMemory` Injects Context + +Let's trace the `update_context` call for `ListMemory`. + +**Conceptual Flow:** + +```mermaid +sequenceDiagram + participant AgentLogic as Agent Logic + participant ListMem as ListMemory + participant InternalList as Memory's Internal List + participant ChatCtx as ChatCompletionContext + + AgentLogic->>+ListMem: update_context(chat_context) + ListMem->>+InternalList: Get all stored MemoryContent items + InternalList-->>-ListMem: Return list of [pref_content] + alt Memory list is NOT empty + ListMem->>ListMem: Format memories into a single string (e.g., "1. pref_content") + ListMem->>ListMem: Create SystemMessage with formatted string + ListMem->>+ChatCtx: add_message(SystemMessage) + ChatCtx-->>-ListMem: Context updated + end + ListMem->>ListMem: Create UpdateContextResult(memories=[pref_content]) + ListMem-->>-AgentLogic: Return UpdateContextResult +``` + +1. The agent calls `user_prefs_memory.update_context(new_chat_context)`. +2. The `ListMemory` instance accesses its internal `_contents` list. +3. It checks if the list is empty. If not: +4. It iterates through the `MemoryContent` items in the list. +5. 
It formats them into a numbered string (like "Relevant memory content...\n1. Item 1\n2. Item 2..."). +6. It creates a single `SystemMessage` containing this formatted string. +7. It calls `new_chat_context.add_message()` to add this `SystemMessage` to the chat history that will be sent to the LLM. +8. It returns an `UpdateContextResult` containing the list of memories it just processed. + +**Code Glimpse:** + +* **`Memory` Protocol (`memory/_base_memory.py`):** Defines the required methods for any memory implementation. + + ```python + # From: memory/_base_memory.py (Simplified ABC) + from abc import ABC, abstractmethod + # ... other imports: MemoryContent, MemoryQueryResult, UpdateContextResult, ChatCompletionContext + + class Memory(ABC): + component_type = "memory" + + @abstractmethod + async def update_context(self, model_context: ChatCompletionContext) -> UpdateContextResult: ... + + @abstractmethod + async def query(self, query: str | MemoryContent, ...) -> MemoryQueryResult: ... + + @abstractmethod + async def add(self, content: MemoryContent, ...) -> None: ... + + @abstractmethod + async def clear(self) -> None: ... + + @abstractmethod + async def close(self) -> None: ... + ``` + Any class wanting to act as Memory must provide these methods. + +* **`ListMemory` Implementation (`memory/_list_memory.py`):** + + ```python + # From: memory/_list_memory.py (Simplified) + from typing import List + # ... other imports: Memory, MemoryContent, ..., SystemMessage, ChatCompletionContext + + class ListMemory(Memory): + def __init__(self, ..., memory_contents: List[MemoryContent] | None = None): + # Stores memory items in a simple list + self._contents: List[MemoryContent] = memory_contents or [] + + async def add(self, content: MemoryContent, ...) -> None: + """Add new content to the internal list.""" + self._contents.append(content) + + async def query(self, query: str | MemoryContent = "", ...) 
-> MemoryQueryResult: + """Return all memories, ignoring the query.""" + # Simple implementation: just return everything + return MemoryQueryResult(results=self._contents) + + async def update_context(self, model_context: ChatCompletionContext) -> UpdateContextResult: + """Add all memories as a SystemMessage to the chat context.""" + if not self._contents: # Do nothing if memory is empty + return UpdateContextResult(memories=MemoryQueryResult(results=[])) + + # Format all memories into a numbered list string + memory_strings = [f"{i}. {str(mem.content)}" for i, mem in enumerate(self._contents, 1)] + memory_context_str = "Relevant memory content...\n" + "\n".join(memory_strings) + "\n" + + # Add this string as a SystemMessage to the provided chat context + await model_context.add_message(SystemMessage(content=memory_context_str)) + + # Return info about which memories were added + return UpdateContextResult(memories=MemoryQueryResult(results=self._contents)) + + # ... clear(), close(), config methods ... + ``` + This shows the straightforward logic of `ListMemory`: store in a list, retrieve the whole list, and inject the whole list as a single system message into the chat context. More complex memories might use smarter retrieval (e.g., based on the `query` in `query()` or the last message in `update_context`) and inject memories differently. + +## Next Steps + +You've learned about `Memory`, AutoGen Core's mechanism for giving agents long-term recall beyond the immediate conversation (`ChatCompletionContext`). We saw how `MemoryContent` holds information, `add` stores it, `query` retrieves it, and `update_context` injects relevant memories into the LLM's working context. We explored the simple `ListMemory` as a basic example. + +Memory systems are crucial for agents that learn, adapt, or need to maintain state across interactions. + +This concludes our deep dive into the core abstractions of AutoGen Core! 
We've covered Agents, Messaging, Runtime, Tools, LLM Clients, Chat Context, and now Memory. There's one final concept that ties many of these together from a configuration perspective: + +* [Chapter 8: Component](08_component.md): Understand the general `Component` model in AutoGen Core, how it allows pieces like `Memory`, `ChatCompletionContext`, and `ChatCompletionClient` to be configured and managed consistently. + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/AutoGen Core/08_component.md b/output/AutoGen Core/08_component.md new file mode 100644 index 0000000..65d019a --- /dev/null +++ b/output/AutoGen Core/08_component.md @@ -0,0 +1,359 @@ +# Chapter 8: Component - The Standardized Building Blocks + +Welcome to Chapter 8! In our journey so far, we've met several key players in AutoGen Core: +* [Agents](01_agent.md): The workers. +* [Messaging System](02_messaging_system__topic___subscription_.md): How they communicate. +* [AgentRuntime](03_agentruntime.md): The manager. +* [Tools](04_tool.md): Their special skills. +* [ChatCompletionClient](05_chatcompletionclient.md): How they talk to LLMs. +* [ChatCompletionContext](06_chatcompletioncontext.md): How they remember recent chat history. +* [Memory](07_memory.md): How they remember things long-term. + +Now, imagine you've built a fantastic agent system using these parts. You've configured a specific `ChatCompletionClient` to use OpenAI's `gpt-4o` model, and you've set up a `ListMemory` (from Chapter 7) to store user preferences. How do you save this exact setup so you can easily recreate it later, or share it with a friend? And what if you later want to swap out the `gpt-4o` client for a different one, like Anthropic's Claude, without rewriting your agent's core logic? + +This is where the **`Component`** concept comes in. 
It provides a standard way to define, configure, save, and load these reusable building blocks. + +## Motivation: Making Setups Portable and Swappable + +Think of the parts we've used so far (`ChatCompletionClient`, `Memory`, `Tool`) like specialized **Lego bricks**. Each brick has a specific function (connecting to an LLM, remembering things, performing an action). + +Wouldn't it be great if: +1. Each Lego brick had a standard way to describe its properties (like "Red 2x4 Brick")? +2. You could easily save the description of all the bricks used in your creation (your agent system)? +3. Someone else could take that description and automatically rebuild your exact creation? +4. You could easily swap a "Red 2x4 Brick" for a "Blue 2x4 Brick" without having to rebuild everything around it? + +The `Component` abstraction in AutoGen Core provides exactly this! It makes your building blocks **configurable**, **savable**, **loadable**, and **swappable**. + +## Key Concepts: Understanding Components + +Let's break down what makes the Component system work: + +1. **Component:** A class (like `ListMemory` or `OpenAIChatCompletionClient`) that is designed to be a standard, reusable building block. It performs a specific role within the AutoGen ecosystem. Many core classes inherit from `Component` or related base classes. + +2. **Configuration (`Config`):** Every Component has specific settings. For example, an `OpenAIChatCompletionClient` needs an API key and a model name. A `ListMemory` might have a name. These settings are defined in a standard way, usually using a Pydantic `BaseModel` specific to that component type. This `Config` acts like the "specification sheet" for the component instance. + +3. **Saving Settings (`_to_config` method):** A Component instance knows how to generate its *current* configuration. It has an internal method, `_to_config()`, that returns a `Config` object representing its settings. 
This is like asking a configured Lego brick, "What color and size are you?" + +4. **Loading Settings (`_from_config` class method):** A Component *class* knows how to create a *new* instance of itself from a given configuration. It has a class method, `_from_config(config)`, that takes a `Config` object and builds a new, configured component instance. This is like having instructions: "Build a brick with this color and size." + +5. **`ComponentModel` (The Box):** This is the standard package format used to save and load components. It's like the label and instructions on the Lego box. A `ComponentModel` contains: + * `provider`: A string telling AutoGen *which* Python class to use (e.g., `"autogen_core.memory.ListMemory"`). + * `config`: A dictionary holding the specific settings for this instance (the output of `_to_config()`). + * `component_type`: The general role of the component (e.g., `"memory"`, `"model"`, `"tool"`). + * Other metadata like `version`, `description`, `label`. + + ```python + # From: _component_config.py (Conceptual Structure) + from pydantic import BaseModel + from typing import Dict, Any + + class ComponentModel(BaseModel): + provider: str # Path to the class (e.g., "autogen_core.memory.ListMemory") + config: Dict[str, Any] # The specific settings for this instance + component_type: str | None = None # Role (e.g., "memory") + # ... other fields like version, description, label ... + ``` + This `ComponentModel` is what you typically save to a file (often as JSON or YAML). + +## Use Case Example: Saving and Loading `ListMemory` + +Let's see how this works with the `ListMemory` we used in [Chapter 7: Memory](07_memory.md). + +**Goal:** +1. Create a `ListMemory` instance. +2. Save its configuration using the Component system (`dump_component`). +3. Load that configuration to create a *new*, identical `ListMemory` instance (`load_component`). + +**Step 1: Create and Configure a `ListMemory`** + +First, let's make a memory component. 
`ListMemory` is already designed as a Component. + +```python +# File: create_memory_component.py +import asyncio +from autogen_core.memory import ListMemory, MemoryContent + +# Create an instance of ListMemory +my_memory = ListMemory(name="user_prefs_v1") + +# Add some content (from Chapter 7 example) +async def add_content(): + pref = MemoryContent(content="Use formal style", mime_type="text/plain") + await my_memory.add(pref) + print(f"Created memory '{my_memory.name}' with content: {my_memory.content}") + +asyncio.run(add_content()) +# Output: Created memory 'user_prefs_v1' with content: [MemoryContent(content='Use formal style', mime_type='text/plain', metadata=None)] +``` +We have our configured `my_memory` instance. + +**Step 2: Save the Configuration (`dump_component`)** + +Now, let's ask this component instance to describe itself by creating a `ComponentModel`. + +```python +# File: save_memory_config.py +# Assume 'my_memory' exists from the previous step + +# Dump the component's configuration into a ComponentModel +memory_model = my_memory.dump_component() + +# Let's print it (converting to dict for readability) +print("Saved ComponentModel:") +print(memory_model.model_dump_json(indent=2)) +``` + +**Expected Output:** +```json +Saved ComponentModel: +{ + "provider": "autogen_core.memory.ListMemory", + "component_type": "memory", + "version": 1, + "component_version": 1, + "description": "ListMemory stores memory content in a simple list.", + "label": "ListMemory", + "config": { + "name": "user_prefs_v1", + "memory_contents": [ + { + "content": "Use formal style", + "mime_type": "text/plain", + "metadata": null + } + ] + } +} +``` +Look at the output! `dump_component` created a `ComponentModel` that contains: +* `provider`: Exactly which class to use (`autogen_core.memory.ListMemory`). +* `config`: The specific settings, including the `name` and even the `memory_contents` we added! +* `component_type`: Its role is `"memory"`. 
+* Other useful info like description and version. + +You could save this JSON structure to a file (`my_memory_config.json`). + +**Step 3: Load the Configuration (`load_component`)** + +Now, imagine you're starting a new script or sharing the config file. You can load this `ComponentModel` to recreate the memory instance. + +```python +# File: load_memory_config.py +from autogen_core import ComponentModel +from autogen_core.memory import ListMemory # Need the class for type hint/loading + +# Assume 'memory_model' is the ComponentModel we just created +# (or loaded from a file) + +print(f"Loading component from ComponentModel (Provider: {memory_model.provider})...") + +# Use the ComponentLoader mechanism (available on Component classes) +# to load the model. We specify the expected type (ListMemory). +loaded_memory: ListMemory = ListMemory.load_component(memory_model) + +print(f"Successfully loaded memory!") +print(f"- Name: {loaded_memory.name}") +print(f"- Content: {loaded_memory.content}") +``` + +**Expected Output:** +``` +Loading component from ComponentModel (Provider: autogen_core.memory.ListMemory)... +Successfully loaded memory! +- Name: user_prefs_v1 +- Content: [MemoryContent(content='Use formal style', mime_type='text/plain', metadata=None)] +``` +Success! `load_component` read the `ComponentModel`, found the right class (`ListMemory`), used its `_from_config` method with the saved `config` data, and created a brand new `loaded_memory` instance that is identical to our original `my_memory`. + +**Benefits Shown:** +* **Reproducibility:** We saved the exact state (including content!) and loaded it perfectly. +* **Configuration:** We could easily save this to a JSON/YAML file and manage it outside our Python code. 
+* **Modularity (Conceptual):** If `ListMemory` and `VectorDBMemory` were both Components of type "memory", we could potentially load either one from a configuration file just by changing the `provider` and `config` in the file, without altering the agent code that *uses* the memory component (assuming the agent interacts via the standard `Memory` interface from Chapter 7). + +## Under the Hood: How Saving and Loading Work + +Let's peek behind the curtain. + +**Saving (`dump_component`) Flow:** + +```mermaid +sequenceDiagram + participant User + participant MyMemory as my_memory (ListMemory instance) + participant ListMemConfig as ListMemoryConfig (Pydantic Model) + participant CompModel as ComponentModel + + User->>+MyMemory: dump_component() + MyMemory->>MyMemory: Calls internal self._to_config() + MyMemory->>+ListMemConfig: Creates Config object (name="...", contents=[...]) + ListMemConfig-->>-MyMemory: Returns Config object + MyMemory->>MyMemory: Gets provider string ("autogen_core.memory.ListMemory") + MyMemory->>MyMemory: Gets component_type ("memory"), version, etc. + MyMemory->>+CompModel: Creates ComponentModel(provider=..., config=config_dict, ...) + CompModel-->>-MyMemory: Returns ComponentModel instance + MyMemory-->>-User: Returns ComponentModel instance +``` + +1. You call `my_memory.dump_component()`. +2. It calls its own `_to_config()` method. For `ListMemory`, this gathers the `name` and current `_contents`. +3. `_to_config()` returns a `ListMemoryConfig` object (a Pydantic model) holding these values. +4. `dump_component()` takes this `ListMemoryConfig` object, converts its data into a dictionary (`config` field). +5. It figures out its own class path (`provider`) and other metadata (`component_type`, `version`, etc.). +6. It packages all this into a `ComponentModel` object and returns it. 
+ +**Loading (`load_component`) Flow:** + +```mermaid +sequenceDiagram + participant User + participant Loader as ComponentLoader (e.g., ListMemory.load_component) + participant Importer as Python Import System + participant ListMemClass as ListMemory (Class definition) + participant ListMemConfig as ListMemoryConfig (Pydantic Model) + participant NewMemory as New ListMemory Instance + + User->>+Loader: load_component(component_model) + Loader->>Loader: Reads provider ("autogen_core.memory.ListMemory") from model + Loader->>+Importer: Imports the class `autogen_core.memory.ListMemory` + Importer-->>-Loader: Returns ListMemory class object + Loader->>+ListMemClass: Checks if it's a valid Component class + Loader->>ListMemClass: Gets expected config schema (ListMemoryConfig) + Loader->>+ListMemConfig: Validates `config` dict from model against schema + ListMemConfig-->>-Loader: Returns validated ListMemoryConfig object + Loader->>+ListMemClass: Calls _from_config(validated_config) + ListMemClass->>+NewMemory: Creates new ListMemory instance using config + NewMemory-->>-ListMemClass: Returns new instance + ListMemClass-->>-Loader: Returns new instance + Loader-->>-User: Returns the new ListMemory instance +``` + +1. You call `ListMemory.load_component(memory_model)`. +2. The loader reads the `provider` string from `memory_model`. +3. It dynamically imports the class specified by `provider`. +4. It verifies this class is a proper `Component` subclass. +5. It finds the configuration schema defined by the class (e.g., `ListMemoryConfig`). +6. It validates the `config` dictionary from `memory_model` using this schema. +7. It calls the class's `_from_config()` method, passing the validated configuration object. +8. `_from_config()` uses the configuration data to initialize and return a new instance of the class (e.g., a new `ListMemory` with the loaded name and content). +9. The loader returns this newly created instance. 
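The saving and loading flows above can be sketched as one round trip using only the standard library. This is an illustration under stated assumptions, not AutoGen Core's implementation: `ToyMemory`, `ToyMemoryConfig`, `REGISTRY`, and the `"toy.ToyMemory"` provider string are hypothetical names, a dataclass stands in for the Pydantic config schema, and a dictionary lookup stands in for the dynamic import of the provider class.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyMemoryConfig:
    """Hypothetical config schema (the real system uses Pydantic models)."""
    name: str
    memory_contents: List[str] = field(default_factory=list)


class ToyMemory:
    """Hypothetical component: a named list of strings."""
    component_type = "memory"

    def __init__(self, name: str, memory_contents: Optional[List[str]] = None) -> None:
        self.name = name
        self.contents: List[str] = memory_contents or []

    def _to_config(self) -> ToyMemoryConfig:
        # Describe the current settings of this instance
        return ToyMemoryConfig(name=self.name, memory_contents=list(self.contents))

    @classmethod
    def _from_config(cls, config: ToyMemoryConfig) -> "ToyMemory":
        # Build a fresh instance from validated settings
        return cls(name=config.name, memory_contents=list(config.memory_contents))


# Stand-in for the dynamic import step: provider string -> (class, config schema)
REGISTRY = {"toy.ToyMemory": (ToyMemory, ToyMemoryConfig)}


def dump_component(component: ToyMemory) -> Dict[str, Any]:
    """Package provider, type, and config into a plain dictionary."""
    return {
        "provider": "toy.ToyMemory",
        "component_type": component.component_type,
        "config": asdict(component._to_config()),
    }


def load_component(model: Dict[str, Any]) -> ToyMemory:
    """Resolve the provider, validate the config, and rebuild the instance."""
    component_class, schema = REGISTRY[model["provider"]]  # KeyError if unknown
    validated = schema(**model["config"])  # TypeError on missing or unexpected keys
    return component_class._from_config(validated)


original = ToyMemory(name="user_prefs_v1", memory_contents=["Use formal style"])
saved_json = json.dumps(dump_component(original))   # What would be written to a file
restored = load_component(json.loads(saved_json))   # What a new script would do
print(restored.name, restored.contents)
```

Swapping implementations then amounts to registering another class under a different provider string and changing only the saved `provider` and `config` values; the code that calls `load_component` stays the same.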
+ +**Code Glimpse:** + +The core logic lives in `_component_config.py`. + +* **`Component` Base Class:** Classes like `ListMemory` inherit from `Component`. This requires them to define `component_type`, `component_config_schema`, and implement `_to_config()` and `_from_config()`. + + ```python + # From: _component_config.py (Simplified Concept) + from pydantic import BaseModel + from typing import Type, TypeVar, Generic, ClassVar + # ... other imports + + ConfigT = TypeVar("ConfigT", bound=BaseModel) + + class Component(Generic[ConfigT]): # Generic over its config type + # Required Class Variables for Concrete Components + component_type: ClassVar[str] + component_config_schema: Type[ConfigT] + + # Required Instance Method for Saving + def _to_config(self) -> ConfigT: + raise NotImplementedError + + # Required Class Method for Loading + @classmethod + def _from_config(cls, config: ConfigT) -> Self: + raise NotImplementedError + + # dump_component and load_component are also part of the system + # (often inherited from base classes like ComponentBase) + def dump_component(self) -> ComponentModel: ... + @classmethod + def load_component(cls, model: ComponentModel | Dict[str, Any]) -> Self: ... + ``` + +* **`ComponentModel`:** As shown before, a Pydantic model to hold the `provider`, `config`, `type`, etc. + +* **`dump_component` Implementation (Conceptual):** + ```python + # Inside ComponentBase or similar + def dump_component(self) -> ComponentModel: + # 1. Get the specific config from the instance + obj_config: BaseModel = self._to_config() + config_dict = obj_config.model_dump() # Convert to dictionary + + # 2. Determine the provider string (class path) + provider_str = _type_to_provider_str(self.__class__) + # (Handle overrides like self.component_provider_override) + + # 3. Get other metadata + comp_type = self.component_type + comp_version = self.component_version + # ... description, label ... + + # 4. 
Create and return the ComponentModel + model = ComponentModel( + provider=provider_str, + config=config_dict, + component_type=comp_type, + version=comp_version, + # ... other metadata ... + ) + return model + ``` + +* **`load_component` Implementation (Conceptual):** + ```python + # Inside ComponentLoader or similar + @classmethod + def load_component(cls, model: ComponentModel | Dict[str, Any]) -> Self: + # 1. Ensure we have a ComponentModel object + if isinstance(model, dict): + loaded_model = ComponentModel(**model) + else: + loaded_model = model + + # 2. Import the class based on the provider string + provider_str = loaded_model.provider + # ... (handle WELL_KNOWN_PROVIDERS mapping) ... + module_path, class_name = provider_str.rsplit(".", 1) + module = importlib.import_module(module_path) + component_class = getattr(module, class_name) + + # 3. Validate the class and config + if not is_component_class(component_class): # Check it's a valid Component + raise TypeError(...) + schema = component_class.component_config_schema + validated_config = schema.model_validate(loaded_model.config) + + # 4. Call the class's factory method to create instance + instance = component_class._from_config(validated_config) + + # 5. Return the instance (after type checks) + return instance + ``` + +This system provides a powerful and consistent way to manage the building blocks of your AutoGen applications. + +## Wrapping Up + +Congratulations! You've reached the end of our core concepts tour. You now understand the `Component` model: AutoGen Core's standard way to define configurable, savable, and loadable building blocks like `Memory`, `ChatCompletionClient`, `Tool`, and even aspects of `Agents` themselves. + +* **Components** are like standardized Lego bricks. +* They use **`_to_config`** to describe their settings. +* They use **`_from_config`** to be built from settings. 
+* **`ComponentModel`** is the standard "box" storing the provider and config, enabling saving/loading (often via JSON/YAML). + +This promotes: +* **Modularity:** Easily swap implementations (e.g., different LLM clients). +* **Reproducibility:** Save and load exact agent system configurations. +* **Configuration:** Manage settings in external files. + +With these eight core concepts (`Agent`, `Messaging`, `AgentRuntime`, `Tool`, `ChatCompletionClient`, `ChatCompletionContext`, `Memory`, and `Component`), you have a solid foundation for understanding and building powerful multi-agent applications with AutoGen Core! + +Happy building! + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/AutoGen Core/index.md b/output/AutoGen Core/index.md new file mode 100644 index 0000000..17fee53 --- /dev/null +++ b/output/AutoGen Core/index.md @@ -0,0 +1,47 @@ +# Tutorial: AutoGen Core + +AutoGen Core helps you build applications with multiple **_Agents_** that can work together. +Think of it like creating a team of specialized workers (*Agents*) who can communicate and use tools to solve problems. +The **_AgentRuntime_** acts as the manager, handling messages and agent lifecycles. +Agents communicate using a **_Messaging System_** (Topics and Subscriptions), can use **_Tools_** for specific tasks, interact with language models via a **_ChatCompletionClient_** while managing conversation history with **_ChatCompletionContext_**, and remember information using **_Memory_**. +**_Components_** provide a standard way to define and configure these building blocks. 
+ + +**Source Repository:** [https://github.com/microsoft/autogen/tree/e45a15766746d95f8cfaaa705b0371267bec812e/python/packages/autogen-core/src/autogen_core](https://github.com/microsoft/autogen/tree/e45a15766746d95f8cfaaa705b0371267bec812e/python/packages/autogen-core/src/autogen_core) + +```mermaid +flowchart TD + A0["0: Agent"] + A1["1: AgentRuntime"] + A2["2: Messaging System (Topic & Subscription)"] + A3["3: Component"] + A4["4: Tool"] + A5["5: ChatCompletionClient"] + A6["6: ChatCompletionContext"] + A7["7: Memory"] + A1 -- "Manages lifecycle" --> A0 + A1 -- "Uses for message routing" --> A2 + A0 -- "Uses LLM client" --> A5 + A0 -- "Executes tools" --> A4 + A0 -- "Accesses memory" --> A7 + A5 -- "Gets history from" --> A6 + A5 -- "Uses tool schema" --> A4 + A7 -- "Updates LLM context" --> A6 + A4 -- "Implemented as" --> A3 +``` + +## Chapters + +1. [Agent](01_agent.md) +2. [Messaging System (Topic & Subscription)](02_messaging_system__topic___subscription_.md) +3. [AgentRuntime](03_agentruntime.md) +4. [Tool](04_tool.md) +5. [ChatCompletionClient](05_chatcompletionclient.md) +6. [ChatCompletionContext](06_chatcompletioncontext.md) +7. [Memory](07_memory.md) +8. [Component](08_component.md) + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/01_agent.md b/output/Browser Use/01_agent.md new file mode 100644 index 0000000..e2b5e02 --- /dev/null +++ b/output/Browser Use/01_agent.md @@ -0,0 +1,259 @@ +# Chapter 1: The Agent - Your Browser Assistant's Brain + +Welcome to the `Browser Use` tutorial! We're excited to help you learn how to automate web tasks using the power of Large Language Models (LLMs). + +Imagine you want to perform a simple task, like searching Google for "cute cat pictures" and clicking on the very first image result. For a human, this is easy! You open your browser, type in the search, look at the results, and click. 
+ +But how do you tell a computer program to do this? It needs to understand the goal, look at the webpage like a human does, decide what to click or type next, and then actually perform those actions. This is where the **Agent** comes in. + +## What Problem Does the Agent Solve? + +The Agent is the core orchestrator, the "brain" or "project manager" of your browser automation task. It connects all the different pieces needed to achieve your goal. Without the Agent, you'd have a bunch of tools (like a browser controller and an LLM) but no central coordinator telling them what to do and when. + +The Agent solves the problem of turning a high-level goal (like "find cat pictures") into concrete actions on a webpage, using intelligence to adapt to what it "sees" in the browser. + +## Meet the Agent: Your Project Manager + +Think of the `Agent` like a project manager overseeing a complex task. It doesn't do *all* the work itself, but it coordinates specialists: + +1. **Receives the Task:** You give the Agent the overall goal (e.g., "Search Google for 'cute cat pictures' and click the first image result."). +2. **Consults the Planner (LLM):** The Agent shows the current state of the webpage (using the [BrowserContext](03_browsercontext.md)) to a Large Language Model (LLM). It asks, "Here's the goal, and here's what the webpage looks like right now. What should be the very next step?" The LLM acts as a smart planner, suggesting actions like "type 'cute cat pictures' into the search bar" or "click the element with index 5". We'll learn more about how we instruct the LLM in the [System Prompt](02_system_prompt.md) chapter. +3. **Manages History:** The Agent keeps track of everything that has happened so far: the actions taken, the results, and the state of the browser at each step. This "memory" is managed by the [Message Manager](06_message_manager.md) and helps the LLM make better decisions. +4. 
**Instructs the Doer (Controller):** Once the LLM suggests an action (like "click element 5"), the Agent tells the [Action Controller & Registry](05_action_controller___registry.md) to actually perform that specific action within the browser. +5. **Observes the Results (BrowserContext):** After the Controller acts, the Agent uses the [BrowserContext](03_browsercontext.md) again to see the new state of the webpage (e.g., the Google search results page). +6. **Repeats:** The Agent repeats steps 2-5, continuously consulting the LLM, instructing the Controller, and observing the results, until the original task is complete or it reaches a stopping point. + +## Using the Agent: A Simple Example + +Let's see how you might use the Agent in Python code. Don't worry about understanding every detail yet; focus on the main idea. We're setting up the Agent with our task and the necessary components. + +```python +# --- Simplified Example --- +# We need to import the necessary parts from the browser_use library +from browser_use import Agent, Browser, Controller, BrowserConfig, BrowserContextConfig +# BrowserContext is used below to open the isolated session (defined in browser/context.py) +from browser_use.browser.context import BrowserContext +# Assume 'my_llm' is your configured Large Language Model (e.g., from OpenAI, Anthropic) +from my_llm_setup import my_llm # Placeholder for your specific LLM setup + +# 1. Define the task for the Agent +my_task = "Go to google.com, search for 'cute cat pictures', and click the first image result." + +# 2. Basic browser configuration (we'll learn more later) +browser_config = BrowserConfig() # Default settings +context_config = BrowserContextConfig() # Default settings + +# 3. 
Initialize the components the Agent needs +# The Browser manages the underlying browser application +browser = Browser(config=browser_config) +# The Controller knows *how* to perform actions like 'click' or 'type' +controller = Controller() + +async def main(): + # The BrowserContext represents a single browser tab/window environment + # It uses the Browser and its configuration + async with BrowserContext(browser=browser, config=context_config) as browser_context: + + # 4. Create the Agent instance! + agent = Agent( + task=my_task, + llm=my_llm, # The "brain" - the Language Model + browser_context=browser_context, # The "eyes" - interacts with the browser tab + controller=controller # The "hands" - executes actions + # Many other settings can be configured here! + ) + + print(f"Agent created. Starting task: {my_task}") + + # 5. Run the Agent! This starts the loop. + # It will keep taking steps until the task is done or it hits the limit. + history = await agent.run(max_steps=15) # Limit steps for safety + + # 6. Check the result + if history.is_done() and history.is_successful(): + print("✅ Agent finished the task successfully!") + print(f"Final message from agent: {history.final_result()}") + else: + print("⚠️ Agent stopped. Maybe max_steps reached or task wasn't completed successfully.") + + # The 'async with' block automatically cleans up the browser_context + await browser.close() # Close the browser application + +# Run the asynchronous function +import asyncio +asyncio.run(main()) +``` + +**What happens when you run this?** + +1. An `Agent` object is created with your task, the LLM, the browser context, and the controller. +2. Calling `agent.run(max_steps=15)` starts the main loop. +3. The Agent gets the initial state of the browser (likely a blank page). +4. It asks the LLM what to do. The LLM might say "Go to google.com". +5. The Agent tells the Controller to execute the "go to URL" action. +6. The browser navigates to Google. +7. 
The Agent gets the new state (Google's homepage). +8. It asks the LLM again. The LLM says "Type 'cute cat pictures' into the search bar". +9. The Agent tells the Controller to type the text. +10. This continues step-by-step: pressing Enter, seeing results, asking the LLM, clicking the image. +11. Eventually, the LLM will hopefully tell the Agent the task is "done". +12. `agent.run()` finishes and returns the `history` object containing details of what happened. + +## How it Works Under the Hood: The Agent Loop + +Let's visualize the process with a simple diagram: + +```mermaid +sequenceDiagram + participant User + participant Agent + participant LLM + participant Controller + participant BC as BrowserContext + + User->>Agent: Start task("Search Google for cats...") + Note over Agent: Agent Loop Starts + Agent->>BC: Get current state (e.g., blank page) + BC-->>Agent: Current Page State + Agent->>LLM: What's next? (Task + State + History) + LLM-->>Agent: Plan: [Action: Type 'cute cat pictures', Action: Press Enter] + Agent->>Controller: Execute: type_text(...) + Controller->>BC: Perform type action + Agent->>Controller: Execute: press_keys('Enter') + Controller->>BC: Perform press action + Agent->>BC: Get new state (search results page) + BC-->>Agent: New Page State + Agent->>LLM: What's next? (Task + New State + History) + LLM-->>Agent: Plan: [Action: click_element(index=5)] + Agent->>Controller: Execute: click_element(index=5) + Controller->>BC: Perform click action + Note over Agent: Loop continues until done... + LLM-->>Agent: Plan: [Action: done(success=True, text='Found cat picture!')] + Agent->>Controller: Execute: done(...) + Controller-->>Agent: ActionResult (is_done=True) + Note over Agent: Agent Loop Ends + Agent->>User: Return History (Task Complete) + +``` + +The core of the `Agent` lives in the `agent/service.py` file. The `Agent` class manages the overall process. + +1. 
**Initialization (`__init__`)**: When you create an `Agent`, it sets up its internal state, stores the task, the LLM, the controller, and prepares the [Message Manager](06_message_manager.md) to keep track of the conversation history. It also figures out the best way to talk to the specific LLM you provided. + + ```python + # --- File: agent/service.py (Simplified __init__) --- + class Agent: + def __init__( + self, + task: str, + llm: BaseChatModel, + browser_context: BrowserContext, + controller: Controller, + # ... other settings like use_vision, max_failures, etc. + **kwargs + ): + self.task = task + self.llm = llm + self.browser_context = browser_context + self.controller = controller + self.settings = AgentSettings(**kwargs) # Store various settings + self.state = AgentState() # Internal state (step count, failures, etc.) + + # Setup message manager for history, using the task and system prompt + self._message_manager = MessageManager( + task=self.task, + system_message=self.settings.system_prompt_class(...).get_system_message(), + settings=MessageManagerSettings(...) + # ... more setup ... + ) + # ... other initializations ... + logger.info("Agent initialized.") + ``` + +2. **Running the Task (`run`)**: The `run` method orchestrates the main loop. It calls the `step` method repeatedly until the task is marked as done, an error occurs, or `max_steps` is reached. + + ```python + # --- File: agent/service.py (Simplified run method) --- + class Agent: + # ... (init) ... 
+ async def run(self, max_steps: int = 100) -> AgentHistoryList: + self._log_agent_run() # Log start event + try: + for step_num in range(max_steps): + if self.state.stopped or self.state.consecutive_failures >= self.settings.max_failures: + break # Stop conditions + + # Wait if paused + while self.state.paused: await asyncio.sleep(0.2) + + step_info = AgentStepInfo(step_number=step_num, max_steps=max_steps) + await self.step(step_info) # <<< Execute one step of the loop + + if self.state.history.is_done(): + await self.log_completion() # Log success/failure + break # Exit loop if agent signaled 'done' + else: + logger.info("Max steps reached.") # Ran out of steps + + finally: + # ... (cleanup, telemetry, potentially save history/gif) ... + pass + return self.state.history # Return the recorded history + ``` + +3. **Taking a Step (`step`)**: This is the heart of the loop. In each step, the Agent: + * Gets the current browser state (`browser_context.get_state()`). + * Adds this state to the history via the `_message_manager`. + * Asks the LLM for the next action (`get_next_action()`). + * Tells the `Controller` to execute the action(s) (`multi_act()`). + * Records the outcome in the history. + * Handles any errors that might occur. + + ```python + # --- File: agent/service.py (Simplified step method) --- + class Agent: + # ... (init, run) ... + async def step(self, step_info: Optional[AgentStepInfo] = None) -> None: + logger.info(f"📍 Step {self.state.n_steps}") + state = None + model_output = None + result: list[ActionResult] = [] + + try: + # 1. Get current state from the browser + state = await self.browser_context.get_state() # Uses BrowserContext + + # 2. Add state (+ previous result) to message history for LLM context + self._message_manager.add_state_message(state, self.state.last_result, ...) + + # 3. 
Get LLM's decision on the next action(s) + input_messages = self._message_manager.get_messages() + model_output = await self.get_next_action(input_messages) # Calls the LLM + + self.state.n_steps += 1 # Increment step counter + + # 4. Execute the action(s) using the Controller + result = await self.multi_act(model_output.action) # Uses Controller + self.state.last_result = result # Store result for next step's context + + # 5. Record step details (actions, results, state snapshot) + self._make_history_item(model_output, state, result, ...) + + self.state.consecutive_failures = 0 # Reset failure count on success + + except Exception as e: + # Handle errors, increment failure count, maybe retry later + result = await self._handle_step_error(e) + self.state.last_result = result + # ... (finally block for logging/telemetry) ... + ``` + +## Conclusion + +You've now met the `Agent`, the central coordinator in `Browser Use`. You learned that it acts like a project manager, taking your high-level task, consulting an LLM for step-by-step planning, managing the history, and instructing a `Controller` to perform actions within a `BrowserContext`. + +The Agent's effectiveness heavily relies on how well we instruct the LLM planner. In the next chapter, we'll dive into exactly that: crafting the **System Prompt** to guide the LLM's behavior. + +[Next Chapter: System Prompt](02_system_prompt.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/02_system_prompt.md b/output/Browser Use/02_system_prompt.md new file mode 100644 index 0000000..84328f1 --- /dev/null +++ b/output/Browser Use/02_system_prompt.md @@ -0,0 +1,235 @@ +# Chapter 2: The System Prompt - Setting the Rules for Your AI Assistant + +In [Chapter 1: The Agent](01_agent.md), we met the `Agent`, our project manager for automating browser tasks. 
We saw it consults a Large Language Model (LLM), the "planner", to decide the next steps based on the current state of the webpage. But how does the Agent tell the LLM *how* it should think, behave, and respond? Just giving it the task isn't enough! + +Imagine hiring a new assistant. You wouldn't just say, "Organize my files!" You'd give them specific instructions: "Please sort the files alphabetically by client name, put them in the blue folders, and give me a summary list when you're done." Without these rules, the assistant might do something completely different! + +The **System Prompt** solves this exact problem for our LLM. It's the set of core instructions and rules we give the LLM at the very beginning, telling it exactly how to act as a browser automation assistant and, crucially, how to format its responses so the `Agent` can understand them. + +## What is the System Prompt? The AI's Rulebook + +Think of the System Prompt like the AI assistant's fundamental operating manual, its "Prime Directive," or the rules of a board game. It defines: + +1. **Persona:** "You are an AI agent designed to automate browser tasks." +2. **Goal:** "Your goal is to accomplish the ultimate task..." +3. **Input:** How to understand the information it receives about the webpage ([DOM Representation](04_dom_representation.md)). +4. **Capabilities:** What actions it can take ([Action Controller & Registry](05_action_controller___registry.md)). +5. **Limitations:** What it *shouldn't* do (e.g., hallucinate actions). +6. **Response Format:** The *exact* structure (JSON format) its thoughts and planned actions must follow. + +Without this rulebook, the LLM might just chat casually, give vague suggestions, or produce output in a format the `Agent` code can't parse. The System Prompt ensures the LLM behaves like the specialized tool we need. + +## Why is the Response Format So Important? + +This is a critical point. The `Agent` code isn't a human reading the LLM's response. 
It's a program expecting data in a very specific structure. The System Prompt tells the LLM to *always* respond in a JSON format that looks something like this (simplified): + +```json +{ + "current_state": { + "evaluation_previous_goal": "Success - Found the search bar.", + "memory": "On google.com main page. Need to search for cats.", + "next_goal": "Type 'cute cat pictures' into the search bar." + }, + "action": [ + { + "input_text": { + "index": 5, // The index of the search bar element + "text": "cute cat pictures" + } + }, + { + "press_keys": { + "keys": "Enter" // Press the Enter key + } + } + ] +} +``` + +The `Agent` can easily read this JSON: +* It understands the LLM's thoughts (`current_state`). +* It sees the exact `action` list the LLM wants to perform. +* It passes these actions (like `input_text` or `press_keys`) to the [Action Controller & Registry](05_action_controller___registry.md) to execute them in the browser. + +If the LLM responded with just "Okay, I'll type 'cute cat pictures' into the search bar and press Enter," the `Agent` wouldn't know *which* element index corresponds to the search bar or exactly which actions to call. The strict JSON format is essential for automation. + +## A Peek Inside the Rulebook (`system_prompt.md`) + +The actual instructions live in a text file within the `Browser Use` library: `browser_use/agent/system_prompt.md`. It's quite detailed, but here's a tiny snippet focusing on the response format rule: + +```markdown +# Response Rules +1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format: +{{"current_state": {{"evaluation_previous_goal": "...", +"memory": "...", +"next_goal": "..."}}, +"action":[{{"one_action_name": {{...}}}}, ...]}} + +2. ACTIONS: You can specify multiple actions in the list... Use maximum {{max_actions}} actions... +``` +*(This is heavily simplified! 
The real file has many more rules about element interaction, error handling, task completion, etc.)* + +This file clearly defines the JSON structure (`current_state` and `action`) and other crucial behaviors required from the LLM. + +## How the Agent Uses the System Prompt + +The `Agent` uses a helper class called `SystemPrompt` (found in `agent/prompts.py`) to manage these rules. Here's the flow: + +1. **Loading:** When you create an `Agent`, it internally creates a `SystemPrompt` object. This object reads the rules from the `system_prompt.md` file. +2. **Formatting:** The `SystemPrompt` object formats these rules into a special `SystemMessage` object that LLMs understand as foundational instructions. +3. **Conversation Start:** This `SystemMessage` is given to the [Message Manager](06_message_manager.md), which keeps track of the conversation history with the LLM. The `SystemMessage` becomes the *very first message*, setting the context for all future interactions in that session. + +Think of it like starting a meeting: the first thing you do is state the agenda and rules (System Prompt), and then the discussion (LLM interaction) follows based on that foundation. 
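
To make the three steps above concrete before looking at the library's own class, here is a minimal, library-free sketch. Everything in it (`RULES_TEMPLATE`, `Message`, `ConversationSketch`, `build_system_message`) is a hypothetical stand-in invented for illustration, not part of `browser_use`. It assumes the rules text is filled in with Python's `str.format`, where doubled braces stay literal and single-brace names are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the rules file. With str.format(), doubled braces
# come out as literal braces, while {max_actions} is a real placeholder.
RULES_TEMPLATE = (
    "You are an AI agent designed to automate browser tasks.\n"
    'Respond with JSON: {{"current_state": {{...}}, "action": [...]}}\n'
    "Use at most {max_actions} actions per step."
)


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


@dataclass
class ConversationSketch:
    """Toy history that always returns the system message first."""
    system_message: Message
    turns: list[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append(Message(role, content))

    def get_messages(self) -> list[Message]:
        return [self.system_message, *self.turns]


def build_system_message(template: str, max_actions: int) -> Message:
    # Steps 1 and 2: take the loaded rules text and fill in its placeholders.
    return Message("system", template.format(max_actions=max_actions))


# Step 3: the rules become message #1, before any page state is added.
history = ConversationSketch(build_system_message(RULES_TEMPLATE, max_actions=10))
history.add("user", "Current URL: https://example.com")
messages = history.get_messages()
print([m.role for m in messages])
```

However many turns are added later, `get_messages()` keeps the rules at index 0, which is the property the real Message Manager is described as guaranteeing.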
+ +Let's look at a simplified view of the `SystemPrompt` class loading the rules: + +```python +# --- File: agent/prompts.py (Simplified) --- +import importlib.resources # Helps find files within the installed library +from langchain_core.messages import SystemMessage # Special message type for LLMs + +class SystemPrompt: + def __init__(self, action_description: str, max_actions_per_step: int = 10): + # We ignore these details for now + self.default_action_description = action_description + self.max_actions_per_step = max_actions_per_step + self._load_prompt_template() # <--- Loads the rules file + + def _load_prompt_template(self) -> None: + """Load the prompt rules from the system_prompt.md file.""" + try: + # Finds the 'system_prompt.md' file inside the browser_use package + filepath = importlib.resources.files('browser_use.agent').joinpath('system_prompt.md') + with filepath.open('r') as f: + self.prompt_template = f.read() # Read the text content + print("System Prompt template loaded successfully!") + except Exception as e: + print(f"Error loading system prompt: {e}") + self.prompt_template = "Error: Could not load prompt." # Fallback + + def get_system_message(self) -> SystemMessage: + """Format the loaded rules into a message for the LLM.""" + # str.format() fills single-brace placeholders such as {max_actions}; + # doubled braces ({{ and }}) in the template come out as literal braces + prompt = self.prompt_template.format(max_actions=self.max_actions_per_step) + # Wrap the final rules text in a SystemMessage object + return SystemMessage(content=prompt) + +# --- How it plugs into Agent creation (Conceptual) --- +# from browser_use import Agent, SystemPrompt +# from my_llm_setup import my_llm # Your LLM +# ... other setup ... 
+ +# When you create an Agent: +# agent = Agent( +# task="Find cat pictures", +# llm=my_llm, +# browser_context=..., +# controller=..., +# # The Agent's __init__ method does something like this internally: +# # system_prompt_obj = SystemPrompt(action_description="...", max_actions_per_step=10) +# # system_message_for_llm = system_prompt_obj.get_system_message() +# # This system_message_for_llm is then passed to the Message Manager. +# ) +``` + +This code shows how the `SystemPrompt` class finds and reads the `system_prompt.md` file and prepares the instructions as a `SystemMessage` ready for the LLM conversation. + +## Under the Hood: Initialization and Conversation Flow + +Let's visualize how the System Prompt fits into the Agent's setup and interaction loop: + +```mermaid +sequenceDiagram + participant User + participant Agent_Init as Agent Initialization + participant SP as SystemPrompt Class + participant MM as Message Manager + participant Agent_Run as Agent Run Loop + participant LLM + + User->>Agent_Init: Create Agent(task, llm, ...) + Note over Agent_Init: Agent needs the rules! + Agent_Init->>SP: Create SystemPrompt(...) + SP->>SP: _load_prompt_template() reads system_prompt.md + SP-->>Agent_Init: SystemPrompt instance + Agent_Init->>SP: get_system_message() + SP-->>Agent_Init: system_message (The Formatted Rules) + Note over Agent_Init: Pass rules to conversation manager + Agent_Init->>MM: Initialize MessageManager(task, system_message) + MM->>MM: Store system_message as message #1 + MM-->>Agent_Init: MessageManager instance ready + Agent_Init-->>User: Agent created and ready + + User->>Agent_Run: agent.run() starts the task + Note over Agent_Run: Agent needs context for LLM + Agent_Run->>MM: get_messages() + MM-->>Agent_Run: [system_message, user_message(state), ...] + Note over Agent_Run: Send rules + current state to LLM + Agent_Run->>LLM: Ask for next action (Input includes rules) + LLM-->>Agent_Run: JSON response (LLM followed rules!) 
+ Agent_Run->>MM: add_model_output(...) + Note over Agent_Run: Loop continues... +``` + +Internally, the `Agent`'s initialization code (`__init__` in `agent/service.py`) explicitly creates the `SystemPrompt` and passes its output to the `MessageManager`: + +```python +# --- File: agent/service.py (Simplified Agent __init__) --- +# ... other imports ... +from browser_use.agent.prompts import SystemPrompt # Import the class +from browser_use.agent.message_manager.service import MessageManager, MessageManagerSettings + +class Agent: + def __init__( + self, + task: str, + llm: BaseChatModel, + browser_context: BrowserContext, + controller: Controller, + system_prompt_class: Type[SystemPrompt] = SystemPrompt, # Allows customizing the prompt class + max_actions_per_step: int = 10, + # ... other parameters ... + **kwargs + ): + self.task = task + self.llm = llm + # ... store other components ... + + # Get the list of available actions from the controller + self.available_actions = controller.registry.get_prompt_description() + + # 1. Create the SystemPrompt instance using the provided class + system_prompt_instance = system_prompt_class( + action_description=self.available_actions, + max_actions_per_step=max_actions_per_step, + ) + + # 2. Get the formatted SystemMessage (the rules) + system_message = system_prompt_instance.get_system_message() + + # 3. Initialize the Message Manager with the task and the rules + self._message_manager = MessageManager( + task=self.task, + system_message=system_message, # <--- Pass the rules here! + settings=MessageManagerSettings(...) + # ... other message manager setup ... + ) + # ... rest of initialization ... + logger.info("Agent initialized with System Prompt.") +``` + +When the `Agent` runs its loop (`agent.run()` calls `agent.step()`), it asks the `MessageManager` for the current conversation history (`self._message_manager.get_messages()`). 
The `MessageManager` always ensures that the `SystemMessage` (containing the rules) is the very first item in that history list sent to the LLM. + +## Conclusion + +The System Prompt is the essential rulebook that governs the LLM's behavior within the `Browser Use` framework. It tells the LLM how to interpret the browser state, what actions it can take, and most importantly, dictates the exact JSON format for its responses. This structured communication is key to enabling the `Agent` to reliably understand the LLM's plan and execute browser automation tasks. + +Without a clear System Prompt, the LLM would be like an untrained assistant: potentially intelligent, but unable to follow the specific procedures needed for the job. + +Now that we understand how the `Agent` gets its fundamental instructions, how does it actually perceive the webpage it's supposed to interact with? In the next chapter, we'll explore the component responsible for representing the browser's state: the [BrowserContext](03_browsercontext.md). + +[Next Chapter: BrowserContext](03_browsercontext.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/03_browsercontext.md b/output/Browser Use/03_browsercontext.md new file mode 100644 index 0000000..e642caa --- /dev/null +++ b/output/Browser Use/03_browsercontext.md @@ -0,0 +1,295 @@ +# Chapter 3: BrowserContext - The Agent's Isolated Workspace + +In the [previous chapter](02_system_prompt.md), we learned how the `System Prompt` acts as the rulebook for the AI assistant (LLM) that guides our `Agent`. We know the Agent uses the LLM to decide *what* to do next based on the current situation in the browser. + +But *where* does the Agent actually "see" the webpage and perform its actions? 
How does it keep track of the current website address (URL), the page content, and things like cookies, all while staying focused on its specific task without getting mixed up with your other browsing? + +This is where the **BrowserContext** comes in. + +## What Problem Does BrowserContext Solve? + +Imagine you ask your `Agent` to log into a specific online shopping website and check your order status. You might already be logged into that same website in your regular browser window with your personal account. + +If the Agent just used your main browser window, it might: +1. Get confused by your existing login. +2. Accidentally use your personal cookies or saved passwords. +3. Interfere with other tabs you have open. + +We need a way to give the Agent its *own*, clean, separate browsing environment for each task. It needs an isolated "workspace" where it can open websites, log in, click buttons, and manage its own cookies without affecting anything else. + +The `BrowserContext` solves this by representing a single, isolated browser session. + +## Meet the BrowserContext: Your Agent's Private Browser Window + +Think of a `BrowserContext` like opening a brand new **Incognito Window** or creating a **separate User Profile** in your web browser (like Chrome or Firefox). + +* **It's Isolated:** What happens in one `BrowserContext` doesn't affect others or your main browser session. It has its own cookies, its own history (for that session), and its own set of tabs. +* **It Manages State:** It keeps track of everything important about the current web session the Agent is working on: + * The current URL. + * Which tabs are open within its "window". + * Cookies specific to that session. + * The structure and content of the current webpage (the DOM - Document Object Model, which we'll explore in the [next chapter](04_dom_representation.md)). +* **It's the Agent's Viewport:** The `Agent` looks through the `BrowserContext` to "see" the current state of the webpage. 
When the Agent decides to perform an action (like clicking a button), it tells the [Action Controller](05_action_controller___registry.md) to perform it *within* that specific `BrowserContext`. + +Essentially, the `BrowserContext` is like a dedicated, clean desk or workspace given to the Agent for its specific job. + +## Using the BrowserContext + +Before we can have an isolated session (`BrowserContext`), we first need the main browser application itself. This is handled by the `Browser` class. Think of `Browser` as the entire Chrome or Firefox application installed on your computer, while `BrowserContext` is just one window or profile within that application. + +Here's a simplified example of how you might set up a `Browser` and then create a `BrowserContext` to navigate to a page: + +```python +import asyncio +# Import necessary classes +from browser_use import Browser, BrowserConfig, BrowserContext, BrowserContextConfig + +async def main(): + # 1. Configure the main browser application (optional, defaults are usually fine) + browser_config = BrowserConfig(headless=False) # Show the browser window + + # 2. Create the main Browser instance + # This might launch a browser application in the background (or connect to one) + browser = Browser(config=browser_config) + print("Browser application instance created.") + + # 3. Configure the specific session/window (optional) + context_config = BrowserContextConfig( + user_agent="MyCoolAgent/1.0", # Example: Set a custom user agent + cookies_file="my_session_cookies.json" # Example: Save/load cookies + ) + + # 4. Create the isolated BrowserContext (like opening an incognito window) + # We use 'async with' to ensure it cleans up automatically afterwards. + # Assumption: new_context() is a coroutine in the library version used here, so it is awaited first. + async with await browser.new_context(config=context_config) as browser_context: + print(f"BrowserContext created (ID: {browser_context.context_id}).") + + # 5. 
Use the context to interact with the browser session + start_url = "https://example.com" + print(f"Navigating to: {start_url}") + await browser_context.navigate_to(start_url) + + # 6. Get information *from* the context + current_state = await browser_context.get_state() # Get current page info + print(f"Current page title: {current_state.title}") + print(f"Current page URL: {current_state.url}") + + # The Agent would use this 'browser_context' object to see the page + # and tell the Controller to perform actions within it. + + print("BrowserContext closed automatically.") + + # 7. Close the main browser application when done + await browser.close() + print("Browser application closed.") + +# Run the asynchronous code +asyncio.run(main()) +``` + +**What happens here?** + +1. We set up a `BrowserConfig` (telling it *not* to run headless so we can see the window). +2. We create a `Browser` instance, which represents the overall browser program. +3. We create a `BrowserContextConfig` to specify settings for our isolated session (like a custom name or where to save cookies). +4. Crucially, `browser.new_context(...)` creates our isolated session. The `async with` block ensures this session is properly closed later. +5. We use methods *on the `browser_context` object* like `navigate_to()` to control *this specific session*. +6. We use `browser_context.get_state()` to get information about the current page within *this session*. The `Agent` heavily relies on this method. +7. After the `async with` block finishes, the `browser_context` is closed (like closing the incognito window), and finally, we close the main `browser` application. + +## How it Works Under the Hood + +When the `Agent` needs to understand the current situation to decide the next step, it asks the `BrowserContext` for the latest state using the `get_state()` method. What happens then? + +1. 
**Wait for Stability:** The `BrowserContext` first waits for the webpage to finish loading and for network activity to settle down (`_wait_for_page_and_frames_load`). This prevents the Agent from acting on an incomplete page. +2. **Analyze the Page:** It then uses the [DOM Representation](04_dom_representation.md) service (`DomService`) to analyze the current HTML structure of the page. This service figures out which elements are visible, interactive (buttons, links, input fields), and where they are. +3. **Capture Visuals:** It often takes a screenshot of the current view (`take_screenshot`). This can be helpful for advanced agents or debugging. +4. **Gather Metadata:** It gets the current URL, page title, and information about any other tabs open *within this context*. +5. **Package the State:** All this information (DOM structure, URL, title, screenshot, etc.) is bundled into a `BrowserState` object. +6. **Return to Agent:** The `BrowserContext` returns this `BrowserState` object to the `Agent`. The Agent then uses this information (often sending it to the LLM) to plan its next action. + +Here's a simplified diagram of the `get_state()` process: + +```mermaid +sequenceDiagram + participant Agent + participant BC as BrowserContext + participant PlaywrightPage as Underlying Browser Page + participant DomService as DOM Service + + Agent->>BC: get_state() + Note over BC: Wait for page to be ready... + BC->>PlaywrightPage: Ensure page/network is stable + PlaywrightPage-->>BC: Page is ready + Note over BC: Analyze the page content... + BC->>DomService: Get simplified DOM structure + interactive elements + DomService-->>BC: DOMState (element tree, etc.) + Note over BC: Get visuals and metadata... + BC->>PlaywrightPage: Take screenshot() + PlaywrightPage-->>BC: Screenshot data + BC->>PlaywrightPage: Get URL, Title + PlaywrightPage-->>BC: URL, Title data + Note over BC: Combine everything... 
+ BC->>BC: Create BrowserState object + BC-->>Agent: Return BrowserState +``` + +Let's look at some simplified code snippets from the library. + +The `BrowserContext` is initialized (`__init__` in `browser/context.py`) with its configuration and a reference to the main `Browser` instance that created it. + +```python +# --- File: browser/context.py (Simplified __init__) --- +import uuid +# ... other imports ... +if TYPE_CHECKING: + from browser_use.browser.browser import Browser # Link to the Browser class + +@dataclass +class BrowserContextConfig: # Configuration settings + # ... various settings like user_agent, cookies_file, window_size ... + pass + +@dataclass +class BrowserSession: # Holds the actual Playwright context + context: PlaywrightBrowserContext # The underlying Playwright object + cached_state: Optional[BrowserState] = None # Stores the last known state + +class BrowserContext: + def __init__( + self, + browser: 'Browser', # Reference to the main Browser instance + config: BrowserContextConfig = BrowserContextConfig(), + # ... other optional state ... + ): + self.context_id = str(uuid.uuid4()) # Unique ID for this session + self.config = config # Store the configuration + self.browser = browser # Store the reference to the parent Browser + + # The actual Playwright session is created later, when needed + self.session: BrowserSession | None = None + logger.debug(f"BrowserContext object created (ID: {self.context_id}). Session not yet initialized.") + + # The 'async with' statement calls __aenter__ which initializes the session + async def __aenter__(self): + await self._initialize_session() # Creates the actual browser window/tab + return self + + async def _initialize_session(self): + # ... (complex setup code happens here) ... 
+ # Gets the main Playwright browser from self.browser + playwright_browser = await self.browser.get_playwright_browser() + # Creates the isolated Playwright context (like the incognito window) + context = await self._create_context(playwright_browser) + # Creates the BrowserSession to hold the context and state + self.session = BrowserSession(context=context, cached_state=None) + logger.debug(f"BrowserContext session initialized (ID: {self.context_id}).") + # ... (sets up the initial page) ... + return self.session + + # ... other methods like navigate_to, close, etc. ... +``` + +The `get_state` method orchestrates fetching the current information from the browser session. + +```python +# --- File: browser/context.py (Simplified get_state and helpers) --- +# ... other imports ... +from browser_use.dom.service import DomService # Imports the DOM analyzer +from browser_use.browser.views import BrowserState # Imports the state structure + +class BrowserContext: + # ... (init, aenter, etc.) ... + + async def get_state(self) -> BrowserState: + """Get the current state of the browser session.""" + logger.debug(f"Getting state for context {self.context_id}...") + # 1. Make sure the page is loaded and stable + await self._wait_for_page_and_frames_load() + + # 2. Get the actual Playwright session object + session = await self.get_session() + + # 3. Update the state (this does the heavy lifting) + session.cached_state = await self._update_state() + logger.debug(f"State update complete for {self.context_id}.") + + # 4. Optionally save cookies if configured + if self.config.cookies_file: + asyncio.create_task(self.save_cookies()) + + return session.cached_state + + async def _wait_for_page_and_frames_load(self, timeout_overwrite: float | None = None): + """Ensures page is fully loaded before continuing.""" + # ... (complex logic to wait for network idle, minimum times) ... 
+ page = await self.get_current_page() + await page.wait_for_load_state('load', timeout=5000) # Simplified wait + logger.debug("Page load/network stability checks passed.") + await asyncio.sleep(self.config.minimum_wait_page_load_time) # Ensure minimum wait + + async def _update_state(self) -> BrowserState: + """Fetches all info and builds the BrowserState.""" + session = await self.get_session() + page = await self.get_current_page() # Get the active Playwright page object + + try: + # Use DomService to analyze the page content + dom_service = DomService(page) + # Get the simplified DOM tree and interactive elements map + content_info = await dom_service.get_clickable_elements( + highlight_elements=self.config.highlight_elements, + # ... other DOM options ... + ) + + # Take a screenshot + screenshot_b64 = await self.take_screenshot() + + # Get URL, Title, Tabs, Scroll info etc. + url = page.url + title = await page.title() + tabs = await self.get_tabs_info() + pixels_above, pixels_below = await self.get_scroll_info(page) + + # Create the BrowserState object + browser_state = BrowserState( + element_tree=content_info.element_tree, + selector_map=content_info.selector_map, + url=url, + title=title, + tabs=tabs, + screenshot=screenshot_b64, + pixels_above=pixels_above, + pixels_below=pixels_below, + ) + return browser_state + + except Exception as e: + logger.error(f'Failed to update state: {str(e)}') + # Maybe return old state or raise error + raise BrowserError("Failed to get browser state") from e + + async def take_screenshot(self, full_page: bool = False) -> str: + """Takes a screenshot and returns base64 encoded string.""" + page = await self.get_current_page() + screenshot_bytes = await page.screenshot(full_page=full_page, animations='disabled') + return base64.b64encode(screenshot_bytes).decode('utf-8') + + # ... many other helper methods (_get_current_page, get_tabs_info, etc.) ... 
+ +``` +This shows how `BrowserContext` acts as a manager for a specific browser session, using underlying tools (like Playwright and `DomService`) to gather the necessary information (`BrowserState`) that the `Agent` needs to operate. + +## Conclusion + +The `BrowserContext` is a fundamental concept in `Browser Use`. It provides the necessary **isolated environment** for the `Agent` to perform its tasks, much like an incognito window or a separate browser profile. It manages the session's state (URL, cookies, tabs, page content) and provides the `Agent` with a snapshot of the current situation via the `get_state()` method. + +Understanding the `BrowserContext` helps clarify *where* the Agent works. Now, how does the Agent actually understand the *content* of the webpage within that context? How is the complex structure of a webpage represented in a way the Agent (and the LLM) can understand? + +In the next chapter, we'll dive into exactly that: the [DOM Representation](04_dom_representation.md). + +[Next Chapter: DOM Representation](04_dom_representation.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/04_dom_representation.md b/output/Browser Use/04_dom_representation.md new file mode 100644 index 0000000..e83263c --- /dev/null +++ b/output/Browser Use/04_dom_representation.md @@ -0,0 +1,316 @@ +# Chapter 4: DOM Representation - Mapping the Webpage + +In the [previous chapter](03_browsercontext.md), we learned about the `BrowserContext`, the Agent's private workspace for browsing. We saw that the Agent uses `browser_context.get_state()` to get a snapshot of the current webpage. But how does the Agent actually *understand* the content of that snapshot? + +Imagine you're looking at the Google homepage. You instantly recognize the logo, the search bar, and the buttons. But a computer program just sees a wall of code (HTML). 
How can our `Agent` figure out: "This rectangular box is the search bar I need to type into," or "This specific image link is the first result I should click"? + +This is the problem solved by **DOM Representation**. + +## What Problem Does DOM Representation Solve? + +Webpages are built using HTML (HyperText Markup Language), which describes the structure and content. Your browser reads this HTML and creates an internal, structured representation called the **Document Object Model (DOM)**. It's like the browser builds a detailed blueprint or an outline from the HTML instructions. + +However, this raw DOM blueprint is incredibly complex and contains lots of information irrelevant to our Agent's task. The Agent doesn't need to know about every single tiny visual detail; it needs a *simplified map* focused on what's important for interaction: + +1. **What elements are on the page?** (buttons, links, input fields, text) +2. **Are they visible to a user?** (Hidden elements shouldn't be interacted with) +3. **Are they interactive?** (Can you click it? Can you type in it?) +4. **How can the Agent refer to them?** (We need a simple way to say "click *this* button") + +DOM Representation solves the problem of translating the complex, raw DOM blueprint into a simplified, structured map that highlights the interactive "landmarks" and pathways the Agent can use. + +## Meet `DomService`: The Map Maker + +The component responsible for creating this map is the `DomService`. Think of it as a cartographer specializing in webpages. + +When the `Agent` (via the `BrowserContext`) asks for the current state of the page, the `BrowserContext` employs the `DomService` to analyze the page's live DOM. + +Here's what the `DomService` does: + +1. **Examines the Live Page:** It looks at the current structure rendered in the browser tab, not just the initial HTML source code (because JavaScript can change the page after it loads). +2. 
**Identifies Elements:** It finds all the meaningful elements like buttons, links, input fields, and text blocks. +3. **Checks Properties:** For each element, it determines crucial properties: + * **Visibility:** Is it actually displayed on the screen? + * **Interactivity:** Is it something a user can click, type into, or otherwise interact with? + * **Position:** Where is it located (roughly)? +4. **Assigns Interaction Indices:** This is key! For elements deemed interactive and visible, `DomService` assigns a unique number, called a `highlight_index` (like `[5]`, `[12]`, etc.). This gives the Agent and the LLM a simple, unambiguous way to refer to specific elements. +5. **Builds a Structured Tree:** It organizes this information into a simplified tree structure (`element_tree`) that reflects the page layout but is much easier to process than the full DOM. +6. **Creates an Index Map:** It generates a `selector_map`, which is like an index in a book, mapping each `highlight_index` directly to its corresponding element node in the tree. + +The final output is a `DOMState` object containing the simplified `element_tree` and the handy `selector_map`. This `DOMState` is then included in the `BrowserState` that `BrowserContext.get_state()` returns to the Agent. + +## The Output: `DOMState` - The Agent's Map + +The `DOMState` object produced by `DomService` has two main parts: + +1. **`element_tree`:** This is the root of our simplified map, represented as a `DOMElementNode` object (defined in `dom/views.py`). Each node in the tree can be either an element (`DOMElementNode`) or a piece of text (`DOMTextNode`). `DOMElementNode`s contain information like the tag name (`\n[7]Images" + +// And respond with: +{ + "current_state": { + "evaluation_previous_goal": "...", + "memory": "On Google homepage, need to search for cats.", + "next_goal": "Type 'cute cats' into the search bar [5]." 
+ }, + "action": [ + { + "input_text": { + "index": 5, // <-- Uses the highlight_index from the DOM map! + "text": "cute cats" + } + } + // ... maybe press Enter action ... + ] +} +``` + +## Code Example: Seeing the Map + +We don't usually interact with `DomService` directly. Instead, we get its output via the `BrowserContext`. Let's revisit the example from Chapter 3 and see where the DOM representation fits: + +```python +import asyncio +from browser_use import Browser, BrowserConfig, BrowserContext, BrowserContextConfig + +async def main(): + browser_config = BrowserConfig(headless=False) + browser = Browser(config=browser_config) + context_config = BrowserContextConfig() + + async with browser.new_context(config=context_config) as browser_context: + # Navigate to a page (e.g., Google) + await browser_context.navigate_to("https://www.google.com") + + print("Getting current page state...") + # This call uses DomService internally to generate the DOM representation + current_state = await browser_context.get_state() + + print(f"\nCurrent Page URL: {current_state.url}") + print(f"Current Page Title: {current_state.title}") + + # Accessing the DOM Representation parts within the BrowserState + print("\n--- DOM Representation Details ---") + # The element_tree is the root node of our simplified DOM map + if current_state.element_tree: + print(f"Root element tag of simplified tree: <{current_state.element_tree.tag_name}>") + else: + print("Element tree is empty.") + + # The selector_map provides direct access to interactive elements by index + if current_state.selector_map: + print(f"Number of interactive elements found: {len(current_state.selector_map)}") + + # Let's try to find the element the LLM might call [5] (often the search bar) + example_index = 5 # Note: Indices can change depending on the page! 
+ if example_index in current_state.selector_map: + element_node = current_state.selector_map[example_index] + print(f"Element [{example_index}]: Tag=<{element_node.tag_name}>, Attributes={element_node.attributes}") + # The Agent uses this node reference to perform actions + else: + print(f"Element [{example_index}] not found in the selector map for this page state.") + else: + print("No interactive elements found (selector map is empty).") + + # The Agent would typically convert element_tree into a compact text format + # (using methods like element_tree.clickable_elements_to_string()) + # to send to the LLM along with the task instructions. + + print("\nBrowserContext closed.") + await browser.close() + print("Browser closed.") + +# Run the asynchronous code +asyncio.run(main()) +``` + +**What happens here?** + +1. We set up the `Browser` and `BrowserContext`. +2. We navigate to Google. +3. `browser_context.get_state()` is called. **Internally**, this triggers the `DomService`. +4. `DomService` analyzes the Google page, finds interactive elements (like the search bar, buttons), assigns them `highlight_index` numbers, and builds the `element_tree` and `selector_map`. +5. This `DOMState` (containing the tree and map) is packaged into the `BrowserState` object returned by `get_state()`. +6. Our code then accesses `current_state.element_tree` and `current_state.selector_map` to peek at the map created by `DomService`. +7. We demonstrate looking up an element using its potential index (`selector_map[5]`). + +## How It Works Under the Hood: `DomService` in Action + +Let's trace the flow when `BrowserContext.get_state()` is called: + +```mermaid +sequenceDiagram + participant Agent + participant BC as BrowserContext + participant DomService + participant PlaywrightPage as Browser Page (JS Env) + participant buildDomTree_js as buildDomTree.js + + Agent->>BC: get_state() + Note over BC: Needs to analyze the page content + BC->>DomService: get_clickable_elements(...) 
+ Note over DomService: Needs to run analysis script in browser + DomService->>PlaywrightPage: evaluate(js_code='buildDomTree.js', args={...}) + Note over PlaywrightPage: Execute JavaScript code + PlaywrightPage->>buildDomTree_js: Run analysis function + Note over buildDomTree_js: Analyzes live DOM, finds visible & interactive elements, assigns highlight_index + buildDomTree_js-->>PlaywrightPage: Return structured data (nodes, indices, map) + PlaywrightPage-->>DomService: Return JS execution result (JSON-like data) + Note over DomService: Process the raw data from JS + DomService->>DomService: _construct_dom_tree(result) + Note over DomService: Builds Python DOMElementNode tree and selector_map + DomService-->>BC: Return DOMState (element_tree, selector_map) + Note over BC: Combine DOMState with URL, title, screenshot etc. + BC->>BC: Create BrowserState object + BC-->>Agent: Return BrowserState (containing DOM map) +``` + +**Key Code Points:** + +1. **`BrowserContext` calls `DomService`:** Inside `browser/context.py`, the `_update_state` method (called by `get_state`) initializes and uses the `DomService`: + + ```python + # --- File: browser/context.py (Simplified _update_state) --- + from browser_use.dom.service import DomService # Import the service + from browser_use.browser.views import BrowserState + + class BrowserContext: + # ... other methods ... + async def _update_state(self) -> BrowserState: + page = await self.get_current_page() # Get the active Playwright page object + # ... error handling ... + try: + # 1. Create DomService instance for the current page + dom_service = DomService(page) + + # 2. Call DomService to get the DOM map (DOMState) + content_info = await dom_service.get_clickable_elements( + highlight_elements=self.config.highlight_elements, + viewport_expansion=self.config.viewport_expansion, + # ... other options ... + ) + + # 3. Get other info (screenshot, URL, title etc.) 
+ screenshot_b64 = await self.take_screenshot() + url = page.url + title = await page.title() + # ... gather more state ... + + # 4. Package everything into BrowserState + browser_state = BrowserState( + element_tree=content_info.element_tree, # <--- From DomService + selector_map=content_info.selector_map, # <--- From DomService + url=url, + title=title, + screenshot=screenshot_b64, + # ... other state info ... + ) + return browser_state + except Exception as e: + logger.error(f'Failed to update state: {str(e)}') + raise # Or handle error + ``` + +2. **`DomService` runs JavaScript:** Inside `dom/service.py`, the `_build_dom_tree` method executes the JavaScript code stored in `buildDomTree.js` within the browser page's context. + + ```python + # --- File: dom/service.py (Simplified _build_dom_tree) --- + import logging + from importlib import resources + # ... other imports ... + + logger = logging.getLogger(__name__) + + class DomService: + def __init__(self, page: 'Page'): + self.page = page + # Load the JavaScript code from the file when DomService is created + self.js_code = resources.read_text('browser_use.dom', 'buildDomTree.js') + # ... + + async def _build_dom_tree( + self, highlight_elements: bool, focus_element: int, viewport_expansion: int + ) -> tuple[DOMElementNode, SelectorMap]: + + # Prepare arguments for the JavaScript function + args = { + 'doHighlightElements': highlight_elements, + 'focusHighlightIndex': focus_element, + 'viewportExpansion': viewport_expansion, + 'debugMode': logger.getEffectiveLevel() == logging.DEBUG, + } + + try: + # Execute the JavaScript code in the browser page! + # The JS code analyzes the live DOM and returns a structured result. + eval_page = await self.page.evaluate(self.js_code, args) + except Exception as e: + logger.error('Error evaluating JavaScript: %s', e) + raise + + # ... (optional debug logging) ... 
+ + # Parse the result from JavaScript into Python objects + return await self._construct_dom_tree(eval_page) + + async def _construct_dom_tree(self, eval_page: dict) -> tuple[DOMElementNode, SelectorMap]: + # ... (logic to parse js_node_map from eval_page) ... + # ... (loops through nodes, creates DOMElementNode/DOMTextNode objects) ... + # ... (builds the tree structure by linking parents/children) ... + # ... (populates the selector_map dictionary) ... + # This uses the structures defined in dom/views.py + # ... + root_node = ... # Parsed root DOMElementNode + selector_map = ... # Populated dictionary {index: DOMElementNode} + return root_node, selector_map + # ... other methods like get_clickable_elements ... + ``` + +3. **`buildDomTree.js` (Conceptual):** This JavaScript file (located at `dom/buildDomTree.js` in the library) is the core map-making logic that runs *inside the browser*. It traverses the live DOM, checks element visibility and interactivity using browser APIs (like `element.getBoundingClientRect()`, `window.getComputedStyle()`, `document.elementFromPoint()`), assigns the `highlight_index`, and packages the results into a structured format that the Python `DomService` can understand. *We don't need to understand the JS code itself, just its purpose.* + +4. **Python Data Structures (`DOMElementNode`, `DOMTextNode`):** The results from the JavaScript are parsed into Python objects defined in `dom/views.py`. These dataclasses (`DOMElementNode`, `DOMTextNode`) hold the information about each mapped element or text segment. + +## Conclusion + +DOM Representation, primarily handled by the `DomService`, is crucial for bridging the gap between the complex reality of a webpage (the DOM) and the Agent/LLM's need for a simplified, actionable understanding. By creating a structured `element_tree` and an indexed `selector_map`, it provides a clear map of interactive landmarks on the page, identified by simple `highlight_index` numbers. 
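To make the "indexed map" idea concrete, here is a toy sketch of how a `selector_map` could be derived from a tree whose interactive nodes carry a `highlight_index`. It is an illustration under stated assumptions, not the library's implementation: `ToyNode` and `build_selector_map` are hypothetical names, and the real `DomService` builds its map while parsing the result returned by `buildDomTree.js`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyNode:
    # Hypothetical stand-in for DOMElementNode; only the fields the sketch needs.
    tag_name: str
    highlight_index: Optional[int] = None  # set only for visible, interactive elements
    children: list["ToyNode"] = field(default_factory=list)


def build_selector_map(root: ToyNode) -> dict[int, ToyNode]:
    """Walk the tree and index every node that carries a highlight_index."""
    selector_map: dict[int, ToyNode] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.highlight_index is not None:
            selector_map[node.highlight_index] = node
        stack.extend(node.children)
    return selector_map


# A tiny page: only the input and the button are interactive.
page = ToyNode("body", children=[
    ToyNode("div", children=[
        ToyNode("input", highlight_index=5),
        ToyNode("button", highlight_index=6),
    ]),
    ToyNode("p"),
])

selector_map = build_selector_map(page)
print(sorted(selector_map))        # indices of the interactive elements
print(selector_map[5].tag_name)    # direct lookup, no tree search needed
```

The point of the map is the last line: once the LLM says "element 5", the Agent can jump straight to the node without walking the tree again.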
+ +This map allows the LLM to make specific plans like "type into element [5]" or "click element [12]", which the Agent can then reliably translate into concrete actions. + +Now that we understand how the Agent sees the page, how does it actually *perform* those actions like clicking or typing? In the next chapter, we'll explore the component responsible for executing the LLM's plan: the [Action Controller & Registry](05_action_controller___registry.md). + +[Next Chapter: Action Controller & Registry](05_action_controller___registry.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/05_action_controller___registry.md b/output/Browser Use/05_action_controller___registry.md new file mode 100644 index 0000000..091f0b2 --- /dev/null +++ b/output/Browser Use/05_action_controller___registry.md @@ -0,0 +1,340 @@ +# Chapter 5: Action Controller & Registry - The Agent's Hands and Toolbox + +In the [previous chapter](04_dom_representation.md), we saw how the `DomService` creates a simplified map (`DOMState`) of the webpage, allowing the Agent and its LLM planner to identify interactive elements like buttons and input fields using unique numbers (`highlight_index`). The LLM uses this map to decide *what* specific action to take next, like "click element [5]" or "type 'hello world' into element [12]". + +But how does the program actually *do* that? How does the abstract idea "click element [5]" turn into a real click inside the browser window managed by the [BrowserContext](03_browsercontext.md)? + +This is where the **Action Controller** and **Action Registry** come into play. They are the "hands" and "toolbox" that execute the Agent's decisions. + +## What Problem Do They Solve? + +Imagine you have a detailed instruction manual (the LLM's plan) for building a model car. 
The manual tells you exactly which piece to pick up (`index=5`) and what to do with it ("click" or "attach"). However, you still need: + +1. **A Toolbox:** A collection of all the tools you might need (screwdriver, glue, pliers). You need to know what tools are available. +2. **A Mechanic:** Someone (or you!) who can read the instruction ("Use the screwdriver on screw #5"), select the correct tool from the toolbox, and skillfully use it on the specified part. + +Without the toolbox and the mechanic, the instruction manual is useless. + +Similarly, the `Browser Use` Agent needs: +1. **Action Registry (The Toolbox):** A defined list of all possible actions the Agent can perform (e.g., `click_element`, `input_text`, `scroll_down`, `go_to_url`, `done`). This registry also holds details about each action, like what parameters it needs (e.g., `click_element` needs an `index`). +2. **Action Controller (The Mechanic):** A component that takes the specific action requested by the LLM (e.g., "execute `click_element` with `index=5`"), finds the corresponding function (the "tool") in the Registry, ensures the request is valid, and then executes that function using the [BrowserContext](03_browsercontext.md) (the "car"). + +The Controller and Registry solve the problem of translating the LLM's high-level plan into concrete, executable browser operations in a structured and reliable way. + +## Meet the Toolbox and the Mechanic + +Let's break down these two closely related concepts: + +### 1. Action Registry: The Toolbox (`controller/registry/service.py`) + +Think of the `Registry` as a carefully organized toolbox. Each drawer is labeled with the name of a tool (an action like `click_element`), and inside, you find the tool itself (the actual code function) along with its instructions (description and required parameters). 
+ +* **Catalog of Actions:** It holds a dictionary where keys are action names (strings like `"click_element"`) and values are `RegisteredAction` objects containing: + * The action's `name`. + * A `description` (for humans and the LLM). + * The actual Python `function` to call. + * A `param_model` (a Pydantic model defining required parameters like `index` or `text`). +* **Informs the LLM:** The `Registry` can generate a description of all available actions and their parameters. This description is given to the LLM (as part of the [System Prompt](02_system_prompt.md)) so it knows exactly what "tools" it's allowed to ask the Agent to use. + +### 2. Action Controller: The Mechanic (`controller/service.py`) + +The `Controller` is the skilled mechanic who uses the tools from the Registry. + +* **Receives Instructions:** It gets the action request from the Agent. This request typically comes in the form of an `ActionModel` object, which represents the LLM's JSON output (e.g., `{"click_element": {"index": 5}}`). +* **Selects the Tool:** It looks at the `ActionModel`, identifies the action name (`"click_element"`), and retrieves the corresponding `RegisteredAction` from the `Registry`. +* **Validates Parameters:** It uses the action's `param_model` (e.g., `ClickElementAction`) to check if the provided parameters (`{"index": 5}`) are correct. +* **Executes the Action:** It calls the actual Python function associated with the action (e.g., the `click_element` function), passing it the validated parameters and the necessary `BrowserContext` (so the function knows *which* browser tab to act upon). +* **Reports the Result:** The action function performs the task (e.g., clicking the element) and returns an `ActionResult` object, indicating whether it succeeded, failed, or produced some output. The Controller passes this result back to the Agent. 
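The division of labor described above can be sketched in a few lines of plain Python. This is a toy sketch, not the library's code: `ToyRegistry`, `ToyController`, and `ClickParams` are hypothetical names, and a standard-library dataclass stands in for the Pydantic parameter model, so it only catches missing or unexpected fields rather than wrong types.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ClickParams:
    # Hypothetical stand-in for a Pydantic param model such as ClickElementAction
    index: int


@dataclass
class RegisteredAction:
    name: str
    description: str
    function: Callable[..., Awaitable[str]]
    param_model: type


class ToyRegistry:
    """The toolbox: maps action names to their function and parameter model."""

    def __init__(self) -> None:
        self.actions: dict[str, RegisteredAction] = {}

    def action(self, description: str, param_model: type):
        def decorator(func):
            self.actions[func.__name__] = RegisteredAction(
                func.__name__, description, func, param_model
            )
            return func
        return decorator


class ToyController:
    """The mechanic: looks up the tool, checks the parameters, runs it."""

    def __init__(self, registry: ToyRegistry) -> None:
        self.registry = registry

    async def act(self, request: dict[str, dict[str, Any]]) -> str:
        # A request mirrors the LLM's JSON, e.g. {"click_element": {"index": 5}}
        for name, raw_params in request.items():
            if name not in self.registry.actions:
                return f"error: unknown action {name!r}"
            entry = self.registry.actions[name]
            try:
                params = entry.param_model(**raw_params)  # rejects missing/unexpected fields
            except TypeError as exc:
                return f"error: invalid parameters for {name!r}: {exc}"
            return await entry.function(params)
        return "error: no action specified"


registry = ToyRegistry()


@registry.action("Click element", param_model=ClickParams)
async def click_element(params: ClickParams) -> str:
    # A real action would drive the browser here; the toy only reports.
    return f"clicked element {params.index}"


controller = ToyController(registry)
ok = asyncio.run(controller.act({"click_element": {"index": 5}}))
unknown = asyncio.run(controller.act({"fly_away": {}}))
print(ok)
print(unknown)
```

Notice that the Controller never hard-codes an action name. Adding a new tool only means registering another decorated function, which is the main reason for keeping the toolbox and the mechanic separate.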
+ +## Using the Controller: Executing an Action + +In the Agent's main loop ([Chapter 1: Agent](01_agent.md)), after the LLM provides its plan as an `ActionModel`, the Agent simply hands this model over to the `Controller` to execute it. + +```python +# --- Simplified Agent step calling the Controller --- +# Assume 'llm_response_model' is the ActionModel object parsed from LLM's JSON +# Assume 'self.controller' is the Controller instance +# Assume 'self.browser_context' is the current BrowserContext + +# ... inside the Agent's step method ... + +try: + # Agent tells the Controller: "Execute this action!" + action_result: ActionResult = await self.controller.act( + action=llm_response_model, # The LLM's chosen action and parameters + browser_context=self.browser_context # The browser tab to act within + # Other context like LLMs for extraction might be passed too + ) + + # Agent receives the result from the Controller + print(f"Action executed. Result: {action_result.extracted_content}") + if action_result.is_done: + print("Task marked as done by the action!") + if action_result.error: + print(f"Action encountered an error: {action_result.error}") + + # Agent records this result in the history ([Message Manager](06_message_manager.md)) + # ... + +except Exception as e: + print(f"Failed to execute action: {e}") + # Handle the error +``` + +**What happens here?** + +1. The Agent has received `llm_response_model` (e.g., representing `{"click_element": {"index": 5}}`). +2. It calls `self.controller.act()`, passing the action model and the active `browser_context`. +3. The `controller.act()` method handles looking up the `"click_element"` function in the `Registry`, validating the `index` parameter, and calling the function to perform the click within the `browser_context`. +4. The `click_element` function executes (interacting with the browser via `BrowserContext` methods). +5. 
It returns an `ActionResult` (e.g., `ActionResult(extracted_content="Clicked button with index 5")`). +6. The Agent receives this `action_result` and proceeds. + +## How it Works Under the Hood: The Execution Flow + +Let's trace the journey of an action request from the Agent to the browser click: + +```mermaid +sequenceDiagram + participant Agent + participant Controller + participant Registry + participant ClickFunc as click_element Function + participant BC as BrowserContext + + Note over Agent: LLM decided: click_element(index=5) + Agent->>Controller: act(action={"click_element": {"index": 5}}, browser_context=BC) + Note over Controller: Identify action and params + Controller->>Controller: action_name = "click_element", params = {"index": 5} + Note over Controller: Ask Registry for the tool + Controller->>Registry: Get action definition for "click_element" + Registry-->>Controller: Return RegisteredAction(name="click_element", function=ClickFunc, param_model=ClickElementAction, ...) + Note over Controller: Validate params using param_model + Controller->>Controller: ClickElementAction(index=5) # Validation OK + Note over Controller: Execute the function + Controller->>ClickFunc: ClickFunc(params=ClickElementAction(index=5), browser=BC) + Note over ClickFunc: Perform the click via BrowserContext + ClickFunc->>BC: Find element with index 5 + BC-->>ClickFunc: Element reference + ClickFunc->>BC: Execute click on element + BC-->>ClickFunc: Click successful + ClickFunc-->>Controller: Return ActionResult(extracted_content="Clicked button...") + Controller-->>Agent: Return ActionResult +``` + +This diagram shows the Controller orchestrating the process: receiving the request, consulting the Registry, validating, calling the specific action function, and returning the result. + +## Diving Deeper into the Code + +Let's peek at simplified versions of the key files. + +### 1. 
Registering Actions (`controller/registry/service.py`) + +Actions are typically registered using the `@registry.action` decorator. + +```python +# --- File: controller/registry/service.py (Simplified Registry) --- +from typing import Any, Callable, Type +from pydantic import BaseModel +# Assume ActionModel, RegisteredAction are defined in views.py + +class Registry: + def __init__(self, exclude_actions: list[str] = []): + self.registry: dict[str, RegisteredAction] = {} + self.exclude_actions = exclude_actions + # ... other initializations ... + + def _create_param_model(self, function: Callable) -> Type[BaseModel]: + """Creates a Pydantic model from function signature (simplified)""" + # ... (Inspects function signature to build a model) ... + # Example: for func(index: int, text: str), creates a model + # class func_parameters(ActionModel): + # index: int + # text: str + # return func_parameters + pass # Placeholder for complex logic + + def action( + self, + description: str, + param_model: Type[BaseModel] | None = None, + ): + """Decorator for registering actions""" + def decorator(func: Callable): + if func.__name__ in self.exclude_actions: return func # Skip excluded + + # If no specific param_model provided, try to generate one + actual_param_model = param_model or self._create_param_model(func) + + # Ensure function is awaitable (async) + wrapped_func = func # Assume func is already async for simplicity + + action = RegisteredAction( + name=func.__name__, + description=description, + function=wrapped_func, + param_model=actual_param_model, + ) + self.registry[func.__name__] = action # Add to the toolbox!
+ print(f"Action '{func.__name__}' registered.") + return func + return decorator + + def get_prompt_description(self) -> str: + """Get a description of all actions for the prompt (simplified)""" + descriptions = [] + for action in self.registry.values(): + # Format description for LLM (e.g., "click_element: Click element {index: {'type': 'integer'}}") + descriptions.append(f"{action.name}: {action.description} {action.param_model.schema()}") + return "\n".join(descriptions) + + async def execute_action(self, action_name: str, params: dict, browser, **kwargs) -> Any: + """Execute a registered action (simplified)""" + if action_name not in self.registry: + raise ValueError(f"Action {action_name} not found") + + action = self.registry[action_name] + try: + # Validate params using the registered Pydantic model + validated_params = action.param_model(**params) + + # Call the actual action function with validated params and browser context + # Assumes function takes validated_params model and browser + result = await action.function(validated_params, browser=browser, **kwargs) + return result + except Exception as e: + raise RuntimeError(f"Error executing {action_name}: {e}") from e + +``` + +This shows how the `@registry.action` decorator takes a function, its description, and parameter model, and stores them in the `registry` dictionary. `execute_action` is the core method used by the `Controller` to run a specific action. + +### 2. Defining Action Parameters (`controller/views.py`) + +Each action often has its own Pydantic model to define its expected parameters. 
+ +```python +# --- File: controller/views.py (Simplified Action Parameter Models) --- +from pydantic import BaseModel +from typing import Optional + +# Example parameter model for the 'click_element' action +class ClickElementAction(BaseModel): + index: int # The highlight_index of the element to click + xpath: Optional[str] = None # Optional hint (usually index is enough) + +# Example parameter model for the 'input_text' action +class InputTextAction(BaseModel): + index: int # The highlight_index of the input field + text: str # The text to type + xpath: Optional[str] = None # Optional hint + +# Example parameter model for the 'done' action (task completion) +class DoneAction(BaseModel): + text: str # A final message or result + success: bool # Was the overall task successful? + +# ... other action models like GoToUrlAction, ScrollAction etc. ... +``` + +These models ensure that when the Controller receives parameters like `{"index": 5}`, it can validate that `index` is indeed an integer as required by `ClickElementAction`. + +### 3. The Controller Service (`controller/service.py`) + +The `Controller` class ties everything together. It initializes the `Registry` and registers the default browser actions. Its main job is the `act` method. 
+ +```python +# --- File: controller/service.py (Simplified Controller) --- +import logging +from browser_use.agent.views import ActionModel, ActionResult # Input/Output types +from browser_use.browser.context import BrowserContext # Needed by actions +from browser_use.controller.registry.service import Registry # The toolbox +from browser_use.controller.views import ClickElementAction, InputTextAction, DoneAction # Param models + +logger = logging.getLogger(__name__) + +class Controller: + def __init__(self, exclude_actions: list[str] = []): + self.registry = Registry(exclude_actions=exclude_actions) # Initialize the toolbox + + # --- Register Default Actions --- + # (Registration happens when Controller is created) + + @self.registry.action("Click element", param_model=ClickElementAction) + async def click_element(params: ClickElementAction, browser: BrowserContext): + logger.info(f"Attempting to click element index {params.index}") + # --- Actual click logic using browser object --- + element_node = await browser.get_dom_element_by_index(params.index) + await browser._click_element_node(element_node) # Internal browser method + # --- + msg = f"🖱️ Clicked element with index {params.index}" + return ActionResult(extracted_content=msg, include_in_memory=True) + + @self.registry.action("Input text into an element", param_model=InputTextAction) + async def input_text(params: InputTextAction, browser: BrowserContext): + logger.info(f"Attempting to type into element index {params.index}") + # --- Actual typing logic using browser object --- + element_node = await browser.get_dom_element_by_index(params.index) + await browser._input_text_element_node(element_node, params.text) # Internal method + # --- + msg = f"⌨️ Input text into index {params.index}" + return ActionResult(extracted_content=msg, include_in_memory=True) + + @self.registry.action("Complete task", param_model=DoneAction) + async def done(params: DoneAction): + logger.info(f"Task completion 
requested. Success: {params.success}") + return ActionResult(is_done=True, success=params.success, extracted_content=params.text) + + # ... registration for scroll_down, go_to_url, etc. ... + + async def act( + self, + action: ActionModel, # The ActionModel from the LLM + browser_context: BrowserContext, # The context to act within + **kwargs # Other potential context (LLMs, etc.) + ) -> ActionResult: + """Execute an action defined in the ActionModel""" + try: + # ActionModel might look like: ActionModel(click_element=ClickElementAction(index=5)) + # model_dump gets {'click_element': {'index': 5}} + action_data = action.model_dump(exclude_unset=True) + + for action_name, params in action_data.items(): + if params is not None: + logger.debug(f"Executing action: {action_name} with params: {params}") + # Call the registry's execute method + result = await self.registry.execute_action( + action_name=action_name, + params=params, + browser=browser_context, # Pass the essential context + **kwargs # Pass any other context needed by actions + ) + + # Ensure result is ActionResult or convert it + if isinstance(result, ActionResult): return result + if isinstance(result, str): return ActionResult(extracted_content=result) + return ActionResult() # Default empty result if action returned None + + logger.warning("ActionModel had no action to execute.") + return ActionResult(error="No action specified in the model") + + except Exception as e: + logger.error(f"Error during controller.act: {e}", exc_info=True) + return ActionResult(error=str(e)) # Return error in ActionResult +``` + +The `Controller` registers all the standard browser actions during initialization. The `act` method then dynamically finds and executes the requested action using the `Registry`. + +## Conclusion + +The **Action Registry** acts as the definitive catalog or "toolbox" of all operations the `Browser Use` Agent can perform. 
The **Action Controller** is the "mechanic" that interprets the LLM's plan, selects the appropriate tool from the Registry, and executes it within the specified [BrowserContext](03_browsercontext.md). + +Together, they provide a robust and extensible way to translate high-level instructions into low-level browser interactions, forming the crucial link between the Agent's "brain" (LLM planner) and its "hands" (browser manipulation). + +Now that we know how actions are chosen and executed, how does the Agent keep track of the conversation with the LLM, including the history of states observed and actions taken? We'll explore this in the next chapter on the [Message Manager](06_message_manager.md). + +[Next Chapter: Message Manager](06_message_manager.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/06_message_manager.md b/output/Browser Use/06_message_manager.md new file mode 100644 index 0000000..8c405c4 --- /dev/null +++ b/output/Browser Use/06_message_manager.md @@ -0,0 +1,386 @@ +# Chapter 6: Message Manager - Keeping the Conversation Straight + +In the [previous chapter](05_action_controller___registry.md), we learned how the `Action Controller` and `Registry` act as the Agent's "hands" and "toolbox", executing the specific actions decided by the LLM planner. But how does the LLM get all the information it needs to make those decisions in the first place? How does the Agent keep track of the ongoing conversation, including what it "saw" on the page and what happened after each action? + +Imagine you're having a long, multi-step discussion with an assistant about a complex task. If the assistant has a poor memory, they might forget earlier instructions, the current status, or previous results, making it impossible to proceed correctly. 
LLMs face a similar challenge: they need the conversation history for context, but they have a limited memory (called the "context window"). + +This is the problem the **Message Manager** solves. + +## What Problem Does the Message Manager Solve? + +The `Agent` needs to have a conversation with the LLM. This conversation isn't just chat; it includes: + +1. **Initial Instructions:** The core rules from the [System Prompt](02_system_prompt.md). +2. **The Task:** The overall goal the Agent needs to achieve. +3. **Observations:** What the Agent currently "sees" in the browser ([BrowserContext](03_browsercontext.md) state, including the [DOM Representation](04_dom_representation.md)). +4. **Action Results:** What happened after the last action was performed ([Action Controller & Registry](05_action_controller___registry.md)). +5. **LLM's Plan:** The sequence of actions the LLM decided on. + +The Message Manager solves several key problems: + +* **Organizes History:** It structures the conversation chronologically, keeping track of who said what (System, User/Agent State, AI/LLM Plan). +* **Formats Messages:** It ensures the browser state, action results, and even images are formatted correctly so the LLM can understand them. +* **Tracks Size:** It keeps count of the "tokens" (roughly, words or parts of words) used in the conversation history. +* **Manages Limits:** It helps prevent the conversation history from exceeding the LLM's context window limit, potentially by removing older parts of the conversation if it gets too long. + +Think of the `MessageManager` as a meticulous secretary for the Agent-LLM conversation. It takes clear, concise notes, presents the current situation accurately, and ensures the conversation doesn't ramble on for too long, keeping everything within the LLM's "attention span". 
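The "Tracks Size" and "Manages Limits" ideas above can be illustrated with a small, self-contained sketch. This is not the `MessageManager` implementation: `Note`, `TinyHistory`, the `pinned` flag, and the three-characters-per-token estimate are all hypothetical names and assumptions chosen for illustration. It shows one possible policy, dropping the oldest non-essential entries once an estimated token budget is exceeded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """One entry in the conversation (hypothetical stand-in for a message)."""
    role: str             # "system", "human", or "ai"
    text: str
    pinned: bool = False  # pinned notes (rules, task) are never dropped


@dataclass
class TinyHistory:
    """Keeps a list of notes under an estimated token budget."""
    max_tokens: int
    chars_per_token: int = 3  # crude estimate (assumption), not a real tokenizer
    notes: List[Note] = field(default_factory=list)

    def estimate(self, note: Note) -> int:
        # Approximate token count from character length
        return len(note.text) // self.chars_per_token

    @property
    def total(self) -> int:
        return sum(self.estimate(n) for n in self.notes)

    def add(self, note: Note) -> None:
        self.notes.append(note)
        # Drop the oldest unpinned note until the history fits the budget
        while self.total > self.max_tokens:
            for i, existing in enumerate(self.notes):
                if not existing.pinned:
                    del self.notes[i]
                    break
            else:
                # Nothing left to drop: only pinned notes remain
                raise ValueError("Pinned messages alone exceed the token budget.")


# Usage: the large state note is dropped once the plan pushes us over budget
history = TinyHistory(max_tokens=20)
history.add(Note("system", "Follow the rules.", pinned=True))
history.add(Note("human", "State 1: " + "x" * 30))
history.add(Note("ai", "Plan 1: click"))
print([n.role for n in history.notes])
```

The real `MessageManager` uses a different trimming strategy, shown later in this chapter; the sketch only demonstrates why counting tokens on every addition makes it possible to react before the context window overflows.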
+ +## Meet the Message Manager: The Conversation Secretary + +The `MessageManager` (found in `agent/message_manager/service.py`) is responsible for managing the list of messages that are sent to the LLM in each step. + +Here are its main jobs: + +1. **Initialization:** When the `Agent` starts, the `MessageManager` is created. It immediately adds the foundational messages: + * The `SystemMessage` containing the rules from the [System Prompt](02_system_prompt.md). + * A `HumanMessage` stating the overall `task`. + * Other initial setup messages (like examples or sensitive data placeholders). +2. **Adding Browser State:** Before asking the LLM what to do next, the `Agent` gets the current `BrowserState`. It then tells the `MessageManager` to add this information as a `HumanMessage`. This message includes the simplified DOM map, the current URL, and potentially a screenshot (if `use_vision` is enabled). It also includes the results (`ActionResult`) from the *previous* step, so the LLM knows what happened last. +3. **Adding LLM Output:** After the LLM responds with its plan (`AgentOutput`), the `Agent` tells the `MessageManager` to add this plan as an `AIMessage`. This typically includes the LLM's reasoning and the list of actions to perform. +4. **Adding Action Results (Indirectly):** The results from the `Controller.act` call (`ActionResult`) aren't added as separate messages *after* the action. Instead, they are included in the *next* `HumanMessage` that contains the browser state (see step 2). This keeps the context tight: "Here's the current page, and here's what happened right before we got here." +5. **Providing Messages to LLM:** When the `Agent` is ready to call the LLM, it asks the `MessageManager` for the current conversation history (`get_messages()`). +6. **Token Management:** Every time a message is added, the `MessageManager` calculates how many tokens it adds (`_count_tokens`) and updates the total. 
If the total exceeds the limit (`max_input_tokens`), a truncation strategy (`cut_messages`) shortens the history, first by removing the image from the most recent state message and then, if needed, by trimming that message's text. + +## How the Agent Uses the Message Manager + +Let's revisit the simplified `Agent.step` method from [Chapter 1](01_agent.md) and highlight the `MessageManager` interactions (using `self._message_manager`): + +```python +# --- File: agent/service.py (Simplified step method - Highlighting MessageManager) --- +class Agent: + # ... (init, run) ... + async def step(self, step_info: Optional[AgentStepInfo] = None) -> None: + logger.info(f"📍 Step {self.state.n_steps}") + state = None + model_output = None + result: list[ActionResult] = [] + + try: + # 1. Get current state from the browser + state = await self.browser_context.get_state() # Uses BrowserContext + + # 2. Add state + PREVIOUS result to message history via MessageManager + # 'self.state.last_result' holds the outcome of the *previous* step's action + self._message_manager.add_state_message( + state, + self.state.last_result, # Result from previous action + step_info, + self.settings.use_vision # Tell it whether to include image + ) + + # 3. Get the complete, formatted message history for the LLM + input_messages = self._message_manager.get_messages() + + # 4. Get LLM's decision on the next action(s) + model_output = await self.get_next_action(input_messages) # Calls the LLM + + # --- Agent increments step counter --- + self.state.n_steps += 1 + + # 5. Remove the potentially large state message before adding the compact AI response + # (This is an optimization mentioned in the provided code) + self._message_manager._remove_last_state_message() + + # 6. Add the LLM's response (the plan) to the history + self._message_manager.add_model_output(model_output) + + # 7. Execute the action(s) using the Controller + result = await self.multi_act(model_output.action) # Uses Controller + + # 8. 
Store the result of THIS action. It will be used in the *next* step's + # call to self._message_manager.add_state_message() + self.state.last_result = result + + # ... (Record step details, handle success/failure) ... + + except Exception as e: + # Handle errors... + result = await self._handle_step_error(e) + self.state.last_result = result + # ... (finally block) ... +``` + +This flow shows the cycle: add state/previous result -> get messages -> call LLM -> add LLM response -> execute action -> store result for *next* state message. + +## How it Works Under the Hood: Managing the Flow + +Let's visualize the key interactions during one step of the Agent loop involving the `MessageManager`: + +```mermaid +sequenceDiagram + participant Agent + participant BC as BrowserContext + participant MM as MessageManager + participant LLM + participant Controller + + Note over Agent: Start of step + Agent->>BC: get_state() + BC-->>Agent: Current BrowserState (DOM map, URL, screenshot?) + Note over Agent: Have BrowserState and `last_result` from previous step + Agent->>MM: add_state_message(BrowserState, last_result) + MM->>MM: Format state/result into HumanMessage (with text/image) + MM->>MM: Calculate tokens for new message + MM->>MM: Add HumanMessage to internal history list + MM->>MM: Update total token count + MM->>MM: Check token limit, potentially call cut_messages() + Note over Agent: Ready to ask LLM + Agent->>MM: get_messages() + MM-->>Agent: Return List[BaseMessage] (System, Task, State1, Plan1, State2...) 
+ Agent->>LLM: Invoke LLM with message list + LLM-->>Agent: LLM Response (AgentOutput containing plan) + Note over Agent: Got LLM's plan + Agent->>MM: _remove_last_state_message() # Optimization + MM->>MM: Remove last (large) HumanMessage from list + Agent->>MM: add_model_output(AgentOutput) + MM->>MM: Format plan into AIMessage (with tool calls) + MM->>MM: Calculate tokens for AIMessage + MM->>MM: Add AIMessage to internal history list + MM->>MM: Update total token count + Note over Agent: Ready to execute plan + Agent->>Controller: multi_act(AgentOutput.action) + Controller-->>Agent: List[ActionResult] (Result of this step's actions) + Agent->>Agent: Store ActionResult in `self.state.last_result` (for next step) + Note over Agent: End of step +``` + +This shows how `MessageManager` sits between the Agent, the Browser State, and the LLM, managing the history list and token counts. + +## Diving Deeper into the Code (`agent/message_manager/service.py`) + +Let's look at simplified versions of key methods in `MessageManager`. + +**1. Initialization (`__init__` and `_init_messages`)** + +When the `Agent` creates the `MessageManager`, it passes the task and the already-formatted `SystemMessage`. + +```python +# --- File: agent/message_manager/service.py (Simplified __init__) --- +from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage +# ... other imports ... +from browser_use.agent.views import MessageManagerState # Internal state storage +from browser_use.agent.message_manager.views import MessageMetadata, ManagedMessage # Message wrapper + +class MessageManager: + def __init__( + self, + task: str, + system_message: SystemMessage, # Received from Agent + settings: MessageManagerSettings = MessageManagerSettings(), + state: MessageManagerState = MessageManagerState(), # Stores history + ): + self.task = task + self.settings = settings # Max tokens, image settings, etc. 
+ self.state = state # Holds the 'history' object + self.system_prompt = system_message + + # Only initialize if history is empty (e.g., not resuming from saved state) + if len(self.state.history.messages) == 0: + self._init_messages() + + def _init_messages(self) -> None: + """Add the initial fixed messages to the history.""" + # Add the main system prompt (rules) + self._add_message_with_tokens(self.system_prompt) + + # Add the user's task + task_message = HumanMessage( + content=f'Your ultimate task is: """{self.task}"""...' + ) + self._add_message_with_tokens(task_message) + + # Add other setup messages (context, sensitive data info, examples) + # ... (simplified - see full code for details) ... + + # Example: Add a placeholder for where the main history begins + placeholder_message = HumanMessage(content='[Your task history memory starts here]') + self._add_message_with_tokens(placeholder_message) +``` + +This sets up the foundational context for the LLM. + +**2. Adding Browser State (`add_state_message`)** + +This method takes the current `BrowserState` and the previous `ActionResult`, formats them into a `HumanMessage` (potentially multi-modal with image and text parts), and adds it to the history. + +```python +# --- File: agent/message_manager/service.py (Simplified add_state_message) --- +# ... imports ... +from browser_use.browser.views import BrowserState +from browser_use.agent.views import ActionResult, AgentStepInfo +from browser_use.agent.prompts import AgentMessagePrompt # Helper to format state + +class MessageManager: + # ... (init) ... 
+ + def add_state_message( + self, + state: BrowserState, # The current view of the browser + result: Optional[List[ActionResult]] = None, # Result from *previous* action + step_info: Optional[AgentStepInfo] = None, + use_vision=True, # Flag to include screenshot + ) -> None: + """Add browser state and previous result as a human message.""" + + # Add any 'memory' messages from the previous result first (if any) + if result: + for r in result: + if r.include_in_memory and (r.extracted_content or r.error): + content = f"Action result: {r.extracted_content}" if r.extracted_content else f"Action error: {r.error}" + msg = HumanMessage(content=content) + self._add_message_with_tokens(msg) + result = None # Don't include again in the main state message + + # Use a helper class to format the BrowserState (+ optional remaining result) + # into the correct message structure (text + optional image) + state_prompt = AgentMessagePrompt( + state, + result, # Pass any remaining result info + include_attributes=self.settings.include_attributes, + step_info=step_info, + ) + # Get the formatted message (could be complex list for vision) + state_message = state_prompt.get_user_message(use_vision) + + # Add the formatted message (with token calculation) to history + self._add_message_with_tokens(state_message) + +``` + +**3. Adding Model Output (`add_model_output`)** + +This takes the LLM's plan (`AgentOutput`) and formats it as an `AIMessage` with specific "tool calls" structure that many models expect. + +```python +# --- File: agent/message_manager/service.py (Simplified add_model_output) --- +# ... imports ... +from browser_use.agent.views import AgentOutput + +class MessageManager: + # ... (init, add_state_message) ... 
+ + def add_model_output(self, model_output: AgentOutput) -> None: + """Add model output (the plan) as an AI message with tool calls.""" + # Format the output according to OpenAI's tool calling standard + tool_calls = [ + { + 'name': 'AgentOutput', # The 'tool' name + 'args': model_output.model_dump(mode='json', exclude_unset=True), # The LLM's JSON output + 'id': str(self.state.tool_id), # Unique ID for the call + 'type': 'tool_call', + } + ] + + # Create the AIMessage containing the tool calls + msg = AIMessage( + content='', # Content is often empty when using tool calls + tool_calls=tool_calls, + ) + + # Add it to history + self._add_message_with_tokens(msg) + + # Add a corresponding empty ToolMessage (required by some models) + self.add_tool_message(content='') # Content depends on tool execution result + + def add_tool_message(self, content: str) -> None: + """Add tool message to history (often confirms tool call receipt/result)""" + # ToolMessage links back to the AIMessage's tool_call_id + msg = ToolMessage(content=content, tool_call_id=str(self.state.tool_id)) + self.state.tool_id += 1 # Increment for next potential tool call + self._add_message_with_tokens(msg) +``` + +**4. Adding Messages and Counting Tokens (`_add_message_with_tokens`, `_count_tokens`)** + +This is the core function called by others to add any message to the history, ensuring token counts are tracked. + +```python +# --- File: agent/message_manager/service.py (Simplified _add_message_with_tokens) --- +# ... imports ... +from langchain_core.messages import BaseMessage +from browser_use.agent.message_manager.views import MessageMetadata, ManagedMessage + +class MessageManager: + # ... (other methods) ... + + def _add_message_with_tokens(self, message: BaseMessage, position: int | None = None) -> None: + """Internal helper to add any message with its token count metadata.""" + + # 1. 
Optionally filter sensitive data (replace actual data with placeholders) + # if self.settings.sensitive_data: + # message = self._filter_sensitive_data(message) # Simplified + + # 2. Count the tokens in the message + token_count = self._count_tokens(message) + + # 3. Create metadata object + metadata = MessageMetadata(tokens=token_count) + + # 4. Add the message and its metadata to the history list + # (self.state.history is a MessageHistory object) + self.state.history.add_message(message, metadata, position) + # Note: self.state.history.add_message also updates the total token count + + # 5. Check if history exceeds token limit and truncate if needed + self.cut_messages() # Check and potentially trim history + + def _count_tokens(self, message: BaseMessage) -> int: + """Estimate tokens in a message.""" + tokens = 0 + if isinstance(message.content, list): # Multi-modal (text + image) + for item in message.content: + if isinstance(item, dict) and 'image_url' in item: + # Add fixed cost for images + tokens += self.settings.image_tokens + elif isinstance(item, dict) and 'text' in item: + # Estimate tokens based on text length + tokens += len(item['text']) // self.settings.estimated_characters_per_token + elif isinstance(message.content, str): # Text message + text = message.content + if hasattr(message, 'tool_calls'): # Add tokens for tool call structure + text += str(getattr(message, 'tool_calls', '')) + tokens += len(text) // self.settings.estimated_characters_per_token + + return tokens + + def cut_messages(self): + """Trim messages if total tokens exceed the limit.""" + # Calculate how many tokens we are over the limit + diff = self.state.history.current_tokens - self.settings.max_input_tokens + if diff <= 0: + return # We are within limits + + logger.debug(f"Token limit exceeded by {diff}. Trimming history.") + + # Strategy: + # 1. Try removing the image from the *last* (most recent) state message if present. 
+ # (Code logic finds the last message, checks content list, removes image item, updates counts) + # ... (Simplified - see full code for image removal logic) ... + + # 2. If still over limit after image removal (or no image was present), + # trim text content from the *end* of the last state message. + # Calculate proportion to remove, shorten string, create new message. + # ... (Simplified - see full code for text trimming logic) ... + + # Ensure we don't get stuck if trimming isn't enough (raise error) + if self.state.history.current_tokens > self.settings.max_input_tokens: + raise ValueError("Max token limit reached even after trimming.") + +``` + +This shows the basic mechanics of adding messages, calculating their approximate size, and applying strategies to keep the history within the LLM's context window limit. + +## Conclusion + +The `MessageManager` is the Agent's conversation secretary. It meticulously records the dialogue between the Agent (reporting browser state and action results) and the LLM (providing analysis and action plans), starting from the initial `System Prompt` and task definition. + +Crucially, it formats these messages correctly, tracks the conversation's size using token counts, and implements strategies to keep the history concise enough for the LLM's limited context window. Without the `MessageManager`, the Agent would quickly lose track of the conversation, and the LLM wouldn't have the necessary context to guide the browser effectively. + +Many of the objects managed and passed around by the `MessageManager`, like `BrowserState`, `ActionResult`, and `AgentOutput`, are defined as specific data structures. In the next chapter, we'll take a closer look at these important **Data Structures (Views)**. 
+ +[Next Chapter: Data Structures (Views)](07_data_structures__views_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/07_data_structures__views_.md b/output/Browser Use/07_data_structures__views_.md new file mode 100644 index 0000000..9d11c09 --- /dev/null +++ b/output/Browser Use/07_data_structures__views_.md @@ -0,0 +1,235 @@ +# Chapter 7: Data Structures (Views) - The Project's Blueprints + +In the [previous chapter](06_message_manager.md), we saw how the `MessageManager` acts like a secretary, carefully organizing the conversation between the [Agent](01_agent.md) and the LLM. It manages different pieces of information – the browser's current state, the LLM's plan, the results of actions, and more. + +But how do all these different components – the Agent, the LLM parser, the [BrowserContext](03_browsercontext.md), the [Action Controller & Registry](05_action_controller___registry.md), and the [Message Manager](06_message_manager.md) – ensure they understand each other perfectly? If the LLM gives a plan in one format, and the Controller expects it in another, things will break! + +Imagine trying to build furniture using instructions written in a language you don't fully understand, or trying to fill out a form where every section uses a different layout. It would be confusing and error-prone. We need a shared, consistent language and format. + +This is where **Data Structures (Views)** come in. They act as the official blueprints or standardized forms for all the important information passed around within the `Browser Use` project. + +## What Problem Do Data Structures Solve? + +In a complex system like `Browser Use`, many components need to exchange data: + +* The [BrowserContext](03_browsercontext.md) needs to package up the current state of the webpage. +* The [Agent](01_agent.md) needs to understand the LLM's multi-step plan. 
+* The [Action Controller & Registry](05_action_controller___registry.md) needs to know exactly which action to perform and with what specific parameters (like which element index to click). +* The Controller needs to report back the result of an action in a predictable way. + +Without a standard format for each piece of data, you might encounter problems like: + +* Misinterpreting data (e.g., is `5` an element index or a quantity?). +* Missing required information. +* Inconsistent naming (`element_id` vs `index` vs `element_number`). +* Difficulty debugging when data looks different every time. + +Data Structures (Views) solve this by defining **strict, consistent blueprints** for the data. Everyone agrees to use these blueprints, ensuring smooth communication and preventing errors. + +## Meet Pydantic: The Blueprint Maker and Checker + +In `Browser Use`, these blueprints are primarily defined using a popular Python library called **Pydantic**. + +Think of Pydantic like a combination of: + +1. **A Blueprint Designer:** It provides an easy way to define the structure of your data using standard Python type hints (like `str` for text, `int` for whole numbers, `bool` for True/False, `list` for lists). +2. **A Quality Inspector:** When data comes in (e.g., from the LLM or from an action's result), Pydantic automatically checks if it matches the blueprint. Does it have all the required fields? Are the data types correct? If not, Pydantic raises an error, stopping bad data before it causes problems later. + +These Pydantic models (our blueprints) are often stored in files named `views.py` within different component directories (like `agent/views.py`, `browser/views.py`), which is why we sometimes call them "Views". + +## Key Blueprints in `Browser Use` + +Let's look at some of the most important data structures used in the project. Don't worry about memorizing every detail; focus on *what kind* of information each blueprint holds and *who* uses it. 
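Before going through the individual blueprints, here is a minimal sketch of the two Pydantic roles described above, designer and inspector. The `ClickRequest` model is a hypothetical example invented for this illustration, not a class from `Browser Use`; it assumes only that the `pydantic` package is installed.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class ClickRequest(BaseModel):
    """Hypothetical blueprint: which element to click."""
    index: int                   # required field
    xpath: Optional[str] = None  # optional field with a default


# Blueprint designer: valid data produces a typed object
good = ClickRequest(index=5)
print(good.index, good.xpath)

# Quality inspector: data that breaks the blueprint is rejected
bad_fields: List[str] = []
try:
    ClickRequest(xpath="//button")  # 'index' is missing
except ValidationError as exc:
    # Each error records the location of the offending field
    bad_fields = [str(err["loc"][0]) for err in exc.errors()]
    print("Rejected, problem fields:", bad_fields)
```

The point of the sketch is that the check happens at construction time, so a component receiving a `ClickRequest` can rely on `index` being present and being an integer.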
+ +*(Note: These are simplified representations. The actual models might have more fields or features.)* + +### 1. `BrowserState` (from `browser/views.py`) + +* **Purpose:** Represents a complete snapshot of the browser's state at a specific moment. +* **Blueprint Contents (Simplified):** + * `url`: The current web address (string). + * `title`: The title of the webpage (string). + * `element_tree`: The simplified map of the webpage content (from [DOM Representation](04_dom_representation.md)). + * `selector_map`: The lookup map for interactive elements (from [DOM Representation](04_dom_representation.md)). + * `screenshot`: An optional image of the page (string, base64 encoded). + * `tabs`: Information about other open tabs in this context (list). +* **Who Uses It:** + * Created by: [BrowserContext](03_browsercontext.md) (`get_state()` method). + * Used by: [Agent](01_agent.md) (to see the current situation), [Message Manager](06_message_manager.md) (to store in history). + +```python +# --- Conceptual Pydantic Model --- +# File: browser/views.py (Simplified Example) +from pydantic import BaseModel +from typing import Optional, List, Dict # For type hints +# Assume DOMElementNode and TabInfo are defined elsewhere + +class BrowserState(BaseModel): + url: str + title: str + element_tree: Optional[object] # Simplified: Actual type is DOMElementNode + selector_map: Optional[Dict[int, object]] # Simplified: Actual type is SelectorMap + screenshot: Optional[str] = None # Optional field + tabs: List[object] = [] # Simplified: Actual type is TabInfo + +# Pydantic ensures that when a BrowserState is created, +# 'url' and 'title' MUST be provided as strings. +``` + +### 2. `ActionModel` (from `controller/registry/views.py`) + +* **Purpose:** Represents a *single* specific action the LLM wants to perform, including its parameters. 
This model is often created *dynamically* based on the actions available in the [Action Controller & Registry](05_action_controller___registry.md). +* **Blueprint Contents (Example for `click_element`):** + * `index`: The `highlight_index` of the element to click (integer). + * `xpath`: An optional hint about the element's location (string). +* **Blueprint Contents (Example for `input_text`):** + * `index`: The `highlight_index` of the input field (integer). + * `text`: The text to type (string). +* **Who Uses It:** + * Defined by/Registered in: [Action Controller & Registry](05_action_controller___registry.md). + * Created based on: LLM output (often part of `AgentOutput`). + * Used by: [Action Controller & Registry](05_action_controller___registry.md) (to validate parameters and know what function to call). + +```python +# --- Conceptual Pydantic Models --- +# File: controller/views.py (Simplified Examples) +from pydantic import BaseModel +from typing import Optional + +class ClickElementAction(BaseModel): + index: int + xpath: Optional[str] = None # Optional hint + +class InputTextAction(BaseModel): + index: int + text: str + xpath: Optional[str] = None # Optional hint + +# Base model that dynamically holds ONE of the above actions +class ActionModel(BaseModel): + # Pydantic allows models like this where only one field is expected + # e.g., ActionModel(click_element=ClickElementAction(index=5)) + # or ActionModel(input_text=InputTextAction(index=12, text="hello")) + click_element: Optional[ClickElementAction] = None + input_text: Optional[InputTextAction] = None + # ... fields for other possible actions (scroll, done, etc.) ... + pass # More complex logic handles ensuring only one action is present +``` + +### 3. `AgentOutput` (from `agent/views.py`) + +* **Purpose:** Represents the complete plan received from the LLM after it analyzes the current state. This is the structure the [System Prompt](02_system_prompt.md) tells the LLM to follow. 
+* **Blueprint Contents (Simplified):** + * `current_state`: The LLM's thoughts/reasoning (a nested structure, often called `AgentBrain`). + * `action`: A *list* of one or more `ActionModel` objects representing the steps the LLM wants to take. +* **Who Uses It:** + * Created by: The [Agent](01_agent.md) parses the LLM's raw JSON output into this structure. + * Used by: [Agent](01_agent.md) (to understand the plan), [Message Manager](06_message_manager.md) (to store the plan in history), [Action Controller & Registry](05_action_controller___registry.md) (reads the `action` list). + +```python +# --- Conceptual Pydantic Model --- +# File: agent/views.py (Simplified Example) +from pydantic import BaseModel +from typing import List +# Assume ActionModel and AgentBrain are defined elsewhere + +class AgentOutput(BaseModel): + current_state: object # Simplified: Actual type is AgentBrain + action: List[ActionModel] # A list of actions to execute + +# Pydantic ensures the LLM output MUST have 'current_state' and 'action', +# and that 'action' MUST be a list containing valid ActionModel objects. +``` + +### 4. `ActionResult` (from `agent/views.py`) + +* **Purpose:** Represents the outcome after the [Action Controller & Registry](05_action_controller___registry.md) attempts to execute a single action. +* **Blueprint Contents (Simplified):** + * `is_done`: Did this action signal the end of the overall task? (boolean, optional). + * `success`: If done, was the task successful overall? (boolean, optional). + * `extracted_content`: Any text result from the action (e.g., "Clicked button X") (string, optional). + * `error`: Any error message if the action failed (string, optional). + * `include_in_memory`: Should this result be explicitly shown to the LLM next time? (boolean). +* **Who Uses It:** + * Created by: Functions within the [Action Controller & Registry](05_action_controller___registry.md) (like `click_element`). 
+ * Used by: [Agent](01_agent.md) (to check status, record results), [Message Manager](06_message_manager.md) (includes info in the next state message sent to LLM). + +```python +# --- Conceptual Pydantic Model --- +# File: agent/views.py (Simplified Example) +from pydantic import BaseModel +from typing import Optional + +class ActionResult(BaseModel): + is_done: Optional[bool] = False + success: Optional[bool] = None + extracted_content: Optional[str] = None + error: Optional[str] = None + include_in_memory: bool = False # Default to False + +# Pydantic helps ensure results are consistently structured. +# For example, 'is_done' must be True or False if provided. +``` + +## The Power of Blueprints: Ensuring Consistency + +Using Pydantic models for these data structures provides a huge benefit: **automatic validation**. + +Imagine the LLM sends back a plan, but it forgets to include the `index` for a `click_element` action. + +```json +// Bad LLM Response (Missing 'index') +{ + "current_state": { ... }, + "action": [ + { + "click_element": { + "xpath": "//button[@id='submit']" // 'index' is missing! + } + } + ] +} +``` + +When the [Agent](01_agent.md) tries to parse this JSON into the `AgentOutput` Pydantic model, Pydantic will immediately notice that the `index` field (which is required by the `ClickElementAction` blueprint) is missing. It will raise a `ValidationError`. + +```python +# --- Conceptual Agent Code --- +import pydantic +# Assume AgentOutput is the Pydantic model defined earlier +# Assume 'llm_json_response' contains the bad JSON from above + +try: + # Try to create the AgentOutput object from the LLM's response + llm_plan = AgentOutput.model_validate_json(llm_json_response) + # If validation succeeds, proceed... + print("LLM Plan Validated:", llm_plan) +except pydantic.ValidationError as e: + # Pydantic catches the error! 
+ print(f"Validation Error: The LLM response didn't match the blueprint!") + print(e) + # The Agent can now handle this error gracefully, + # maybe asking the LLM to try again, instead of crashing later. +``` + +This automatic checking catches errors early, preventing the [Action Controller & Registry](05_action_controller___registry.md) from receiving incomplete instructions and making the whole system much more robust and easier to debug. It enforces the "contract" between different components. + +## Under the Hood: Simple Classes + +These data structures are simply Python classes, mostly inheriting from `pydantic.BaseModel` or defined using Python's built-in `dataclass`. They don't contain complex logic themselves; their main job is to define the *shape* and *type* of the data. You'll find their definitions scattered across the various `views.py` files within the project's component directories (like `agent/`, `browser/`, `controller/`, `dom/`). + +Think of them as the official vocabulary and grammar rules that all the components agree to use when communicating. + +## Conclusion + +Data Structures (Views), primarily defined using Pydantic models, are the essential blueprints that ensure consistent and reliable communication within the `Browser Use` project. They act like standardized forms for `BrowserState`, `AgentOutput`, `ActionModel`, and `ActionResult`, making sure every component knows exactly what kind of data to expect and how to interpret it. + +By defining these clear structures and leveraging Pydantic's automatic validation, `Browser Use` prevents misunderstandings between components, catches errors early, and makes the overall system more robust and maintainable. These standardized structures also make it easier to log and understand what's happening in the system. + +Speaking of logging and understanding the system's behavior, how can we monitor the Agent's performance and gather data for improvement? 
In the next and final chapter, we'll explore the [Telemetry Service](08_telemetry_service.md). + +[Next Chapter: Telemetry Service](08_telemetry_service.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/08_telemetry_service.md b/output/Browser Use/08_telemetry_service.md new file mode 100644 index 0000000..6f9b7df --- /dev/null +++ b/output/Browser Use/08_telemetry_service.md @@ -0,0 +1,290 @@ +# Chapter 8: Telemetry Service - Helping Improve the Project (Optional) + +In the [previous chapter](07_data_structures__views_.md), we explored the essential blueprints (`Data Structures (Views)`) that keep communication clear and consistent between all the parts of `Browser Use`. We saw how components like the [Agent](01_agent.md) and the [Action Controller & Registry](05_action_controller___registry.md) use these blueprints to exchange information reliably. + +Now, let's think about the project itself. How do the developers who build `Browser Use` know if it's working well for users? How do they find out about common errors or which features are most popular, so they can make the tool better? + +## What Problem Does the Telemetry Service Solve? + +Imagine you released a new tool, like `Browser Use`. You want it to be helpful, but you don't know how people are actually using it. Are they running into unexpected errors? Are certain actions (like clicking vs. scrolling) causing problems? Is the performance okay? Without some feedback, it's hard to know where to focus improvements. + +One way to get feedback is through bug reports or feature requests, but that only captures a small fraction of user experiences. We need a way to get a broader, anonymous picture of how the tool is performing "in the wild." 
+ +The **Telemetry Service** solves this by providing an *optional* and *anonymous* way to send basic usage statistics back to the project developers. Think of it like an anonymous suggestion box or an automatic crash report that doesn't include any personal information. + +**Crucially:** This service is designed to protect user privacy. It doesn't collect website content, personal data, or anything sensitive. It only sends anonymous statistics about the tool's operation, and **it can be completely disabled**. + +## Meet `ProductTelemetry`: The Anonymous Reporter + +The component responsible for this is the `ProductTelemetry` service, found in `telemetry/service.py`. + +* **Collects Usage Data:** It gathers anonymized information about events like: + * When an [Agent](01_agent.md) starts or finishes a run. + * Details about each step the Agent takes (like which actions were used). + * Errors encountered during agent runs. + * Which actions are defined in the [Action Controller & Registry](05_action_controller___registry.md). +* **Anonymizes Data:** It uses a randomly generated user ID (stored locally, not linked to you) to group events from the same installation without knowing *who* the user is. +* **Sends Data:** It sends this anonymous data to a secure third-party service (PostHog) used by the developers to analyze trends and identify potential issues. +* **Optional:** You can easily turn it off. + +## How is Telemetry Used? (Mostly Automatic) + +You usually don't interact with the `ProductTelemetry` service directly. Instead, other components like the `Agent` and `Controller` automatically call it at key moments. + +**Example: Agent Run Start/End** + +When you create an `Agent` and call `agent.run()`, the Agent automatically notifies the Telemetry Service. + +```python +# --- File: agent/service.py (Simplified Agent run method) --- +class Agent: + # ... (other methods) ... 
+ + # Agent has a telemetry object initialized in __init__ + # self.telemetry = ProductTelemetry() + + async def run(self, max_steps: int = 100) -> AgentHistoryList: + # ---> Tell Telemetry: Agent run is starting <--- + self._log_agent_run() # This includes a telemetry.capture() call + + try: + # ... (main agent loop runs here) ... + for step_num in range(max_steps): + # ... (agent takes steps) ... + if self.state.history.is_done(): + break + # ... + finally: + # ---> Tell Telemetry: Agent run is ending <--- + self.telemetry.capture( + AgentEndTelemetryEvent( # Uses a specific data structure + agent_id=self.state.agent_id, + is_done=self.state.history.is_done(), + success=self.state.history.is_successful(), + # ... other anonymous stats ... + ) + ) + # ... (cleanup browser etc.) ... + + return self.state.history +``` + +**Explanation:** + +1. When the `Agent` is created, it gets an instance of `ProductTelemetry`. +2. Inside the `run` method, before the main loop starts, `_log_agent_run()` is called, which internally uses `self.telemetry.capture()` to send an `AgentRunTelemetryEvent`. +3. After the loop finishes (or an error occurs), the `finally` block ensures that another `self.telemetry.capture()` call is made, this time sending an `AgentEndTelemetryEvent` with summary statistics about the run. + +Similarly, the `Agent.step` method captures an `AgentStepTelemetryEvent`, and the `Controller`'s `Registry` captures a `ControllerRegisteredFunctionsTelemetryEvent` when it's initialized. This happens automatically in the background if telemetry is enabled. + +## How to Disable Telemetry + +If you prefer not to send any anonymous usage data, you can easily disable the Telemetry Service. + +Set the environment variable `ANONYMIZED_TELEMETRY` to `False`. 
+ +How you set environment variables depends on your operating system: + +* **Linux/macOS (in terminal):** + ```bash + export ANONYMIZED_TELEMETRY=False + # Now run your Python script in the same terminal + python your_agent_script.py + ``` +* **Windows (Command Prompt):** + ```cmd + set ANONYMIZED_TELEMETRY=False + python your_agent_script.py + ``` +* **Windows (PowerShell):** + ```powershell + $env:ANONYMIZED_TELEMETRY="False" + python your_agent_script.py + ``` +* **In Python Code (using `os` module, *before* importing `browser_use`):** + ```python + import os + os.environ['ANONYMIZED_TELEMETRY'] = 'False' + + # Now import and use browser_use + from browser_use import Agent # ... other imports + # ... rest of your script ... + ``` + +If this environment variable is set to `False`, the `ProductTelemetry` service will be initialized in a disabled state, and no data will be collected or sent. + +## How It Works Under the Hood: Sending Anonymous Data + +When telemetry is enabled and an event occurs (like `agent.run()` starting): + +1. **Component Calls Capture:** The `Agent` (or `Controller`) calls `telemetry.capture(event_data)`. +2. **Telemetry Service Checks:** The `ProductTelemetry` service checks if it's enabled. If not, it does nothing. +3. **Get User ID:** It retrieves or generates a unique, anonymous user ID. This is typically a random UUID (like `a1b2c3d4-e5f6-7890-abcd-ef1234567890`) stored in a file under a hidden cache directory on your computer (`~/.cache/browser_use/telemetry_user_id`). This ID helps group events from the same installation without identifying the actual user. +4. **Send to PostHog:** It sends the event data (structured as dataclasses like `AgentRunTelemetryEvent`) along with the anonymous user ID to PostHog, a third-party service specialized in product analytics. +5.
**Analysis:** Developers can then look at aggregated, anonymous trends in PostHog (e.g., "What percentage of agent runs finish successfully?", "What are the most common errors?") to understand usage patterns and prioritize improvements. + +Here's a simplified diagram: + +```mermaid +sequenceDiagram + participant Agent + participant TelemetrySvc as ProductTelemetry + participant LocalFile as ~/.cache/.../user_id + participant PostHog + + Agent->>TelemetrySvc: capture(AgentRunEvent) + Note over TelemetrySvc: Telemetry Enabled? Yes. + TelemetrySvc->>LocalFile: Read existing User ID (or create new) + LocalFile-->>TelemetrySvc: Anonymous User ID (UUID) + Note over TelemetrySvc: Package Event + User ID + TelemetrySvc->>PostHog: Send(EventData, UserID) + PostHog-->>TelemetrySvc: Acknowledgment (Optional) +``` + +Let's look at the simplified code involved. + +**1. Initializing Telemetry (`telemetry/service.py`)** + +The service checks the environment variable during initialization. + +```python +# --- File: telemetry/service.py (Simplified __init__) --- +import os +import uuid +import logging +from pathlib import Path +from posthog import Posthog # The library for the external service +from browser_use.utils import singleton + +logger = logging.getLogger(__name__) + +@singleton # Ensures only one instance exists +class ProductTelemetry: + USER_ID_PATH = str(Path.home() / '.cache' / 'browser_use' / 'telemetry_user_id') + # ... (API key constants) ... + _curr_user_id = None + + def __init__(self) -> None: + # Check the environment variable + telemetry_disabled = os.getenv('ANONYMIZED_TELEMETRY', 'true').lower() == 'false' + + if telemetry_disabled: + self._posthog_client = None # Telemetry is off + logger.debug('Telemetry disabled by environment variable.') + else: + # Initialize the PostHog client if enabled + self._posthog_client = Posthog(...) + logger.info( + 'Anonymized telemetry enabled.' # Inform the user + ) + # Optionally silence PostHog's own logs + # ... + + # ... 
(other methods) ... +``` + +**2. Capturing an Event (`telemetry/service.py`)** + +The `capture` method sends the data if the client is active. + +```python +# --- File: telemetry/service.py (Simplified capture) --- +# Assume BaseTelemetryEvent is the base dataclass for events +from browser_use.telemetry.views import BaseTelemetryEvent + +class ProductTelemetry: + # ... (init) ... + + def capture(self, event: BaseTelemetryEvent) -> None: + # Do nothing if telemetry is disabled + if self._posthog_client is None: + return + + try: + # Get the anonymous user ID (lazy loaded) + anon_user_id = self.user_id + + # Send the event name and its properties (as a dictionary) + self._posthog_client.capture( + distinct_id=anon_user_id, + event=event.name, # e.g., "agent_run" + properties=event.properties # Data from the event model + ) + logger.debug(f'Telemetry event captured: {event.name}') + except Exception as e: + # Don't crash the main application if telemetry fails + logger.error(f'Failed to send telemetry event {event.name}: {e}') + + @property + def user_id(self) -> str: + """Gets or creates the anonymous user ID.""" + if self._curr_user_id: + return self._curr_user_id + + try: + # Check if the ID file exists + id_file = Path(self.USER_ID_PATH) + if not id_file.exists(): + # Create directory and generate a new UUID if it doesn't exist + id_file.parent.mkdir(parents=True, exist_ok=True) + new_user_id = str(uuid.uuid4()) + id_file.write_text(new_user_id) + self._curr_user_id = new_user_id + else: + # Read the existing UUID from the file + self._curr_user_id = id_file.read_text().strip() + except Exception: + # Fallback if file access fails + self._curr_user_id = 'UNKNOWN_USER_ID' + return self._curr_user_id + +``` + +**3. Event Data Structures (`telemetry/views.py`)** + +Like other components, Telemetry uses typed models to define the structure of the data being sent. In this simplified example the events are plain Python dataclasses rather than Pydantic models.
+ +```python +# --- File: telemetry/views.py (Simplified Event Example) --- +from dataclasses import dataclass, asdict +from typing import Any, Dict, Sequence + +# Base class for all telemetry events (conceptual) +@dataclass +class BaseTelemetryEvent: + @property + def name(self) -> str: + raise NotImplementedError + @property + def properties(self) -> Dict[str, Any]: + # Helper to convert the dataclass fields to a dictionary + return {k: v for k, v in asdict(self).items() if k != 'name'} + +# Specific event for when an agent run starts +@dataclass +class AgentRunTelemetryEvent(BaseTelemetryEvent): + agent_id: str # Anonymous ID for the specific agent instance + use_vision: bool # Was vision enabled? + task: str # The task description (anonymized/hashed in practice) + model_name: str # Name of the LLM used + chat_model_library: str # Library used for the LLM (e.g., ChatOpenAI) + version: str # browser-use version + source: str # How browser-use was installed (e.g., pip, git) + name: str = 'agent_run' # The event name sent to PostHog + +# ... other event models like AgentEndTelemetryEvent, AgentStepTelemetryEvent ... +``` + +These structures ensure the data sent to PostHog is consistent and well-defined. + +## Conclusion + +The **Telemetry Service** (`ProductTelemetry`) provides an optional and privacy-conscious way for the `Browser Use` project to gather anonymous feedback about how the tool is being used. It automatically captures events like agent runs, steps, and errors, sending anonymized statistics to developers via PostHog. + +This feedback loop is vital for identifying common issues, understanding feature usage, and ultimately improving the `Browser Use` library for everyone. Remember, you have full control and can easily disable this service by setting the `ANONYMIZED_TELEMETRY=False` environment variable. + +This chapter concludes our tour of the core components within the `Browser Use` project. 
You've learned about the [Agent](01_agent.md), the guiding [System Prompt](02_system_prompt.md), the isolated [BrowserContext](03_browsercontext.md), the webpage map ([DOM Representation](04_dom_representation.md)), the action execution engine ([Action Controller & Registry](05_action_controller___registry.md)), the conversation tracker ([Message Manager](06_message_manager.md)), the data blueprints ([Data Structures (Views)](07_data_structures__views_.md)), and now the optional feedback mechanism ([Telemetry Service](08_telemetry_service.md)). We hope this gives you a solid foundation for understanding and using `Browser Use`! + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Browser Use/index.md b/output/Browser Use/index.md new file mode 100644 index 0000000..8865237 --- /dev/null +++ b/output/Browser Use/index.md @@ -0,0 +1,53 @@ +# Tutorial: Browser Use + +**Browser Use** is a project that allows an *AI agent* to control a web browser and perform tasks automatically. +Think of it like an AI assistant that can browse websites, fill forms, click buttons, and extract information based on your instructions. It uses a Large Language Model (LLM) as its "brain" to decide what actions to take on a webpage to complete a given *task*. The project manages the browser session, understands the page structure (DOM), and communicates back and forth with the LLM. 
+ + +**Source Repository:** [https://github.com/browser-use/browser-use/tree/3076ba0e83f30b45971af58fe2aeff64472da812/browser_use](https://github.com/browser-use/browser-use/tree/3076ba0e83f30b45971af58fe2aeff64472da812/browser_use) + +```mermaid +flowchart TD + A0["Agent"] + A1["BrowserContext"] + A2["Action Controller & Registry"] + A3["DOM Representation"] + A4["Message Manager"] + A5["System Prompt"] + A6["Data Structures (Views)"] + A7["Telemetry Service"] + A0 -- "Gets state from" --> A1 + A0 -- "Uses to execute actions" --> A2 + A0 -- "Uses for LLM communication" --> A4 + A0 -- "Gets instructions from" --> A5 + A0 -- "Uses/Produces data formats" --> A6 + A0 -- "Logs events to" --> A7 + A1 -- "Gets DOM structure via" --> A3 + A1 -- "Provides BrowserState" --> A6 + A2 -- "Executes actions on" --> A1 + A2 -- "Defines/Uses ActionModel/Ac..." --> A6 + A2 -- "Logs registered functions to" --> A7 + A3 -- "Provides structure to" --> A1 + A3 -- "Uses DOM structures" --> A6 + A4 -- "Provides messages to" --> A0 + A4 -- "Initializes with" --> A5 + A4 -- "Formats data using" --> A6 + A5 -- "Defines structure for Agent..." --> A6 + A7 -- "Receives events from" --> A0 +``` + +## Chapters + +1. [Agent](01_agent.md) +2. [System Prompt](02_system_prompt.md) +3. [BrowserContext](03_browsercontext.md) +4. [DOM Representation](04_dom_representation.md) +5. [Action Controller & Registry](05_action_controller___registry.md) +6. [Message Manager](06_message_manager.md) +7. [Data Structures (Views)](07_data_structures__views_.md) +8. 
[Telemetry Service](08_telemetry_service.md) + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/01_celery_app.md b/output/Celery/01_celery_app.md new file mode 100644 index 0000000..4124b90 --- /dev/null +++ b/output/Celery/01_celery_app.md @@ -0,0 +1,293 @@ +# Chapter 1: The Celery App - Your Task Headquarters + +Welcome to the world of Celery! If you've ever thought, "I wish this slow part of my web request could run somewhere else later," or "How can I process this huge amount of data without freezing my main application?", then Celery is here to help. + +Celery allows you to run code (we call these "tasks") separately from your main application, either in the background on the same machine or distributed across many different machines. + +But how do you tell Celery *what* tasks to run and *how* to run them? That's where the **Celery App** comes in. + +## What Problem Does the Celery App Solve? + +Imagine you're building a website. When a user uploads a profile picture, you need to resize it into different formats (thumbnail, medium, large). Doing this immediately when the user clicks "upload" can make the request slow and keep the user waiting. + +Ideally, you want to: +1. Quickly save the original image. +2. Tell the user "Okay, got it!" +3. *Later*, in the background, resize the image. + +Celery helps with step 3. But you need a central place to define the "resize image" task and configure *how* it should be run (e.g., where to send the request to resize, where to store the result). The **Celery App** is that central place. + +Think of it like the main application object in web frameworks like Flask or Django. It's the starting point, the brain, the headquarters for everything Celery-related in your project. + +## Creating Your First Celery App + +Getting started is simple. You just need to create an instance of the `Celery` class. 
+ +Let's create a file named `celery_app.py`: + +```python +# celery_app.py +from celery import Celery + +# Create a Celery app instance +# 'tasks' is just a name for this app instance, often the module name. +# 'broker' tells Celery where to send task messages. +# We'll use Redis here for simplicity (you need Redis running). +app = Celery('tasks', + broker='redis://localhost:6379/0', + backend='redis://localhost:6379/0') # Added backend for results + +print(f"Celery app created: {app}") +``` + +**Explanation:** + +* `from celery import Celery`: We import the main `Celery` class. +* `app = Celery(...)`: We create an instance. + * `'tasks'`: This is the *name* of our Celery application. It's often good practice to use the name of the module where your app is defined. Celery uses this name to automatically name tasks if you don't provide one explicitly. + * `broker='redis://localhost:6379/0'`: This is crucial! It tells Celery where to send the task messages. A "broker" is like a post office for tasks. We're using Redis here, but Celery supports others like RabbitMQ. We'll learn more about the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) in Chapter 4. (Note: AMQP is the protocol often used with brokers like RabbitMQ, but the concept applies even when using Redis). + * `backend='redis://localhost:6379/0'`: This tells Celery where to store the results of your tasks. If your task returns a value (like `2+2` returns `4`), Celery can store this `4` in the backend. We'll cover the [Result Backend](06_result_backend.md) in Chapter 6. + +That's it! You now have a `Celery` application instance named `app`. This `app` object is your main tool for working with Celery. + +## Defining a Task with the App + +Now that we have our `app`, how do we define a task? We use the `@app.task` decorator. 
+ +Let's modify `celery_app.py`: + +```python +# celery_app.py +from celery import Celery +import time + +# Create a Celery app instance +app = Celery('tasks', + broker='redis://localhost:6379/0', + backend='redis://localhost:6379/0') + +# Define a simple task using the app's decorator +@app.task +def add(x, y): + print(f"Task 'add' started with args: ({x}, {y})") + time.sleep(2) # Simulate some work + result = x + y + print(f"Task 'add' finished with result: {result}") + return result + +print(f"Task 'add' is registered: {app.tasks.get('celery_app.add')}") +``` + +**Explanation:** + +* `@app.task`: This is the magic decorator. It takes our regular Python function `add(x, y)` and registers it as a Celery task within our `app`. +* Now, `app` knows about a task called `celery_app.add` (Celery automatically generates the name based on the module `celery_app` and function `add`). +* We'll learn all about [Task](03_task.md)s in Chapter 3. + +## Sending a Task (Conceptual) + +How do we actually *run* this `add` task in the background? We use methods like `.delay()` or `.apply_async()` on the task object itself. + +```python +# In a separate Python script or interpreter, after importing 'add' from celery_app.py +from celery_app import add + +# Send the task to the broker configured in our 'app' +result_promise = add.delay(4, 5) + +print(f"Task sent! It will run in the background.") +print(f"We got back a promise object: {result_promise}") +# We can later check the result using result_promise.get() +# (Requires a result backend and a worker running the task) +``` + +**Explanation:** + +* `add.delay(4, 5)`: This doesn't run the `add` function *right now*. Instead, it: + 1. Packages the task name (`celery_app.add`) and its arguments (`4`, `5`) into a message. + 2. Sends this message to the **broker** (Redis, in our case) that was configured in our `Celery` app instance (`app`). 
+* It returns an `AsyncResult` object (our `result_promise`), which is like an IOU or a placeholder for the actual result. We can use this later to check if the task finished and what its result was (if we configured a [Result Backend](06_result_backend.md)). + +A separate program, called a Celery [Worker](05_worker.md), needs to be running. This worker watches the broker for new task messages, executes the corresponding task function, and (optionally) stores the result in the backend. We'll learn how to run a worker in Chapter 5. + +The key takeaway here is that the **Celery App** holds the configuration needed (`broker` and `backend` URLs) for `add.delay()` to know *where* to send the task message and potentially where the result will be stored. + +## How It Works Internally (High-Level) + +Let's visualize the process of creating the app and sending a task: + +1. **Initialization (`Celery(...)`)**: When you create `app = Celery(...)`, the app instance stores the `broker` and `backend` URLs and sets up internal components like the task registry. +2. **Task Definition (`@app.task`)**: The decorator tells the `app` instance: "Hey, remember this function `add`? It's a task." The app stores this information in its internal task registry (`app.tasks`). +3. **Sending a Task (`add.delay(4, 5)`)**: + * `add.delay()` looks up the `app` it belongs to. + * It asks the `app` for the `broker` URL. + * It creates a message containing the task name (`celery_app.add`), arguments (`4, 5`), and other details. + * It uses the `broker` URL to connect to the broker (Redis) and sends the message. + +```mermaid +sequenceDiagram + participant Client as Your Python Code + participant CeleryApp as app = Celery(...) 
+ participant AddTask as @app.task add() + participant Broker as Redis/RabbitMQ + + Client->>CeleryApp: Create instance (broker='redis://...') + Client->>AddTask: Define add() function with @app.task + Note over AddTask,CeleryApp: Decorator registers 'add' with 'app' + + Client->>AddTask: Call add.delay(4, 5) + AddTask->>CeleryApp: Get broker configuration + CeleryApp-->>AddTask: 'redis://...' + AddTask->>Broker: Send task message ('add', 4, 5) + Broker-->>AddTask: Acknowledgment (message sent) + AddTask-->>Client: Return AsyncResult (promise) +``` + +This diagram shows how the `Celery App` acts as the central coordinator, holding configuration and enabling the task (`add`) to send its execution request to the Broker. + +## Code Dive: Inside the `Celery` Class + +Let's peek at some relevant code snippets (simplified for clarity). + +**Initialization (`app/base.py`)** + +When you call `Celery(...)`, the `__init__` method runs: + +```python +# Simplified from celery/app/base.py +from .registry import TaskRegistry +from .utils import Settings + +class Celery: + def __init__(self, main=None, broker=None, backend=None, + include=None, config_source=None, task_cls=None, + autofinalize=True, **kwargs): + + self.main = main # Store the app name ('tasks' in our example) + self._tasks = TaskRegistry({}) # Create an empty dictionary for tasks + + # Store broker/backend/include settings temporarily + self._preconf = {} + self.__autoset('broker_url', broker) + self.__autoset('result_backend', backend) + self.__autoset('include', include) + # ... other kwargs ... + + # Configuration object - initially pending, loaded later + self._conf = Settings(...) + + # ... other setup ... 
+ + _register_app(self) # Register this app instance globally (sometimes useful) + + # Helper to store initial settings before full configuration load + def __autoset(self, key, value): + if value is not None: + self._preconf[key] = value +``` + +This shows how the `Celery` object is initialized, storing the name, setting up a task registry, and holding onto initial configuration like the `broker` URL. The full configuration is often loaded later (see [Configuration](02_configuration.md)). + +**Task Decorator (`app/base.py`)** + +The `@app.task` decorator ultimately calls `_task_from_fun`: + +```python +# Simplified from celery/app/base.py + + def task(self, *args, **opts): + # ... logic to handle decorator arguments ... + def _create_task_cls(fun): + # If app isn't finalized, might return a proxy object first + # Eventually calls _task_from_fun to create/register the task + ret = self._task_from_fun(fun, **opts) + return ret + return _create_task_cls + + def _task_from_fun(self, fun, name=None, base=None, bind=False, **options): + # Generate task name if not provided (e.g., 'celery_app.add') + name = name or self.gen_task_name(fun.__name__, fun.__module__) + base = base or self.Task # Default base Task class + + # Check if task already registered + if name not in self._tasks: + # Create a Task class dynamically based on the function + task = type(fun.__name__, (base,), { + 'app': self, # Link task back to this app instance! + 'name': name, + 'run': staticmethod(fun), # The actual function to run + # ... other attributes and options ... 
+ })() # Instantiate the new task class + self._tasks[task.name] = task # Add to app's task registry + task.bind(self) # Perform any binding steps + else: + task = self._tasks[name] # Task already exists + return task +``` + +This shows how the decorator uses the `app` instance (`self`) to generate a name, create a `Task` object wrapping your function, associate the task with the app (`'app': self`), and store it in the `app._tasks` registry. + +**Sending Tasks (`app/base.py`)** + +Calling `.delay()` or `.apply_async()` eventually uses `app.send_task`: + +```python +# Simplified from celery/app/base.py + + def send_task(self, name, args=None, kwargs=None, task_id=None, + producer=None, connection=None, router=None, **options): + # ... lots of logic to prepare options, task_id, routing ... + + # Get the routing info (exchange, routing_key, queue) + # Uses app.conf for defaults if not specified + options = self.amqp.router.route(options, name, args, kwargs) + + # Create the message body + message = self.amqp.create_task_message( + task_id or uuid(), # Generate task ID if needed + name, args, kwargs, # Task details + # ... other arguments like countdown, eta, expires ... + ) + + # Get a producer (handles connection/channel to broker) + # Uses the app's producer pool (app.producer_pool) + with self.producer_or_acquire(producer) as P: + # Tell the backend we're about to send (if tracking results) + if not options.get('ignore_result', False): + self.backend.on_task_call(P, task_id) + + # Actually send the message via the producer + self.amqp.send_task_message(P, name, message, **options) + + # Create the AsyncResult object to return to the caller + result = self.AsyncResult(task_id) + # ... set result properties ... + return result +``` + +This highlights how `send_task` relies on the `app` (via `self`) to: +* Access configuration (`self.conf`). +* Use the AMQP utilities (`self.amqp`) for routing and message creation. +* Access the result backend (`self.backend`). 
+* Get a connection/producer from the pool (`self.producer_or_acquire`). +* Create the `AsyncResult` using the app's result class (`self.AsyncResult`). + +## Conclusion + +You've learned that the `Celery App` is the essential starting point for any Celery project. + +* It acts as the central **headquarters** or **brain**. +* You create it using `app = Celery(...)`, providing at least a name and a `broker` URL. +* It holds **configuration** (like broker/backend URLs). +* It **registers tasks** defined using the `@app.task` decorator. +* It enables tasks to be **sent** to the broker using methods like `.delay()`. + +The app ties everything together. But how do you manage all the different settings Celery offers, beyond just the `broker` and `backend`? + +In the next chapter, we'll dive deeper into how to configure your Celery app effectively. + +**Next:** [Chapter 2: Configuration](02_configuration.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/02_configuration.md b/output/Celery/02_configuration.md new file mode 100644 index 0000000..8134279 --- /dev/null +++ b/output/Celery/02_configuration.md @@ -0,0 +1,252 @@ +# Chapter 2: Configuration - Telling Celery How to Work + +In [Chapter 1: The Celery App](01_celery_app.md), we created our first `Celery` app instance. We gave it a name and told it where our message broker and result backend were located using the `broker` and `backend` arguments: + +```python +# From Chapter 1 +from celery import Celery + +app = Celery('tasks', + broker='redis://localhost:6379/0', + backend='redis://localhost:6379/0') +``` + +This worked, but what if we want to change settings later, or manage many different settings? Passing everything directly when creating the `app` can become messy. + +## What Problem Does Configuration Solve? 
+ +Think of Celery as a busy workshop with different stations (workers, schedulers) and tools (message brokers, result storage). **Configuration** is the central instruction manual or settings panel for this entire workshop. + +It tells Celery things like: + +* **Where is the message broker?** (The post office for tasks) +* **Where should results be stored?** (The filing cabinet for completed work) +* **How should tasks be handled?** (e.g., What format should the messages use? Are there any speed limits for certain tasks?) +* **How should the workers behave?** (e.g., How many tasks can they work on at once?) +* **How should scheduled tasks run?** (e.g., What timezone should be used?) + +Without configuration, Celery wouldn't know how to connect to your broker, where to put results, or how to manage the workflow. Configuration allows you to customize Celery to fit your specific needs. + +## Key Configuration Concepts + +While Celery has many settings, here are some fundamental ones you'll encounter often: + +1. **`broker_url`**: The address of your message broker (like Redis or RabbitMQ). This is essential for sending and receiving task messages. We'll learn more about brokers in [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md). +2. **`result_backend`**: The address of your result store. This is needed if you want to keep track of task status or retrieve return values. We cover this in [Chapter 6: Result Backend](06_result_backend.md). +3. **`include`**: A list of module names that the Celery worker should import when it starts. This is often where your task definitions live (like the `add` task from Chapter 1). +4. **`task_serializer`**: Defines the format used to package task messages before sending them to the broker (e.g., 'json', 'pickle'). 'json' is a safe and common default. +5. **`timezone`**: Sets the timezone Celery uses, which is important for scheduled tasks managed by [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md). 
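To make the `broker_url` and `result_backend` values less opaque, here is a minimal sketch that pulls such a URL apart using only the standard library. The helper name `describe_broker_url` is hypothetical and is not part of Celery; Celery and Kombu do their own parsing internally.

```python
from urllib.parse import urlsplit


def describe_broker_url(url):
    """Split a broker-style URL into labelled parts (illustrative helper, not a Celery API)."""
    parts = urlsplit(url)
    return {
        "transport": parts.scheme,  # e.g. 'redis' or 'amqp'
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,         # for the Redis URLs used here, this carries the database number
    }


info = describe_broker_url("redis://localhost:6379/0")
print(info)
```

Reading the URL this way shows why `redis://localhost:6379/0` and `redis://localhost:6379/1` can point at the same Redis server while keeping task messages and results in separate databases.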
+ +## How to Configure Your Celery App + +Celery is flexible and offers several ways to set its configuration. + +**Method 1: Directly on the App Object (After Creation)** + +You can update the configuration *after* creating the `Celery` app instance using the `app.conf.update()` method. This is handy for simple adjustments or quick tests. + +```python +# celery_app.py +from celery import Celery + +# Create the app (maybe with initial settings) +app = Celery('tasks', broker='redis://localhost:6379/0') + +# Update configuration afterwards +app.conf.update( + result_backend='redis://localhost:6379/1', # Use database 1 for results + task_serializer='json', + result_serializer='json', + accept_content=['json'], # Only accept json formatted tasks + timezone='Europe/Oslo', + enable_utc=True, # Use UTC timezone internally + # Add task modules to import when worker starts + include=['my_tasks'] # Assumes you have a file my_tasks.py with tasks +) + +print(f"Broker URL set to: {app.conf.broker_url}") +print(f"Result backend set to: {app.conf.result_backend}") +print(f"Timezone set to: {app.conf.timezone}") +``` + +**Explanation:** + +* We create the `app` like before, potentially setting some initial config like the `broker`. +* `app.conf.update(...)`: We pass a Python dictionary to this method. The keys are Celery setting names (like `result_backend`, `timezone`), and the values are what we want to set them to. +* `app.conf` is the central configuration object attached to your `app` instance. + +**Method 2: Dedicated Configuration Module (Recommended)** + +For most projects, especially larger ones, it's cleaner to keep your Celery settings in a separate Python file (e.g., `celeryconfig.py`). + +1. 
**Create `celeryconfig.py`:** + + ```python + # celeryconfig.py + + # Broker settings + broker_url = 'redis://localhost:6379/0' + + # Result backend settings + result_backend = 'redis://localhost:6379/1' + + # Task settings + task_serializer = 'json' + result_serializer = 'json' + accept_content = ['json'] + + # Timezone settings + timezone = 'America/New_York' + enable_utc = True # Recommended + + # List of modules to import when the Celery worker starts. + imports = ('proj.tasks',) # Example: Assuming tasks are in proj/tasks.py + ``` + + **Explanation:** + * This is just a standard Python file. + * We define variables whose names match the Celery configuration settings (e.g., `broker_url`, `timezone`). Celery expects these specific names. + +2. **Load the configuration in your app file (`celery_app.py`):** + + ```python + # celery_app.py + from celery import Celery + + # Create the app instance (no need to pass broker/backend here now) + app = Celery('tasks') + + # Load configuration from the 'celeryconfig' module + # Assumes celeryconfig.py is in the same directory or Python path + app.config_from_object('celeryconfig') + + print(f"Loaded Broker URL from config file: {app.conf.broker_url}") + print(f"Loaded Timezone from config file: {app.conf.timezone}") + + # You might still define tasks in this file or in the modules listed + # in celeryconfig.imports + @app.task + def multiply(x, y): + return x * y + ``` + + **Explanation:** + * `app = Celery('tasks')`: We create the app instance, but we don't need to specify the broker or backend here because they will be loaded from the file. + * `app.config_from_object('celeryconfig')`: This is the key line. It tells Celery to: + * Find a module named `celeryconfig`. + * Read the setting variables defined in that module (the lowercase names such as `broker_url` and `timezone` used in `celeryconfig.py`). + * Use those variables to configure the `app`. + +This approach keeps your settings organized and separate from your application logic. 
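The idea of loading settings from a module can be sketched without Celery installed. The toy loader below is an assumption-laden illustration, not Celery's actual `config_from_object`: the function `settings_from_module` and the in-memory `fake_config` module are hypothetical stand-ins for the real loader and for an imported `celeryconfig.py`.

```python
import types


def settings_from_module(module):
    """Collect plain setting-like attributes from a module object (toy loader)."""
    return {
        name: value
        for name, value in vars(module).items()
        # Skip dunder/private names and any modules the config file imported.
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    }


# Stand-in for an imported celeryconfig.py, built in memory for the demo.
fake_config = types.ModuleType("celeryconfig")
fake_config.broker_url = "redis://localhost:6379/0"
fake_config.timezone = "America/New_York"
fake_config.enable_utc = True

conf = settings_from_module(fake_config)
print(conf["broker_url"], conf["timezone"])
```

The point of the sketch is only that a configuration module is ordinary Python: its top-level names become keys, and its values become settings.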
+ +**Method 3: Environment Variables** + +Some Celery settings can also be controlled via environment variables. This is very useful for deployments (e.g., using Docker) where you might want to change the broker address without changing code. + +Environment variable names follow the pattern `CELERY_<SETTING_NAME>`, with the setting name in uppercase. + +For example, you could set the broker URL in your terminal before running your app or worker: + +```bash +# In your terminal (Linux/macOS) +export CELERY_BROKER_URL='amqp://guest:guest@localhost:5672//' +export CELERY_RESULT_BACKEND='redis://localhost:6379/2' + +# Now run your Python script or Celery worker +python your_script.py +# or +# celery -A your_app_module worker --loglevel=info +``` + +Celery picks up `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` automatically. For other settings, read the variable yourself in your configuration module (for example with `os.environ.get`). Environment values often take precedence over settings defined in a configuration file or directly on the app, making them ideal for overriding settings in different environments (development, staging, production). + +*Note: The exact precedence order can sometimes depend on how and when configuration is loaded, but environment variables are generally a high-priority source.* + +## How It Works Internally (Simplified View) + +1. **Loading:** When you create a `Celery` app or call `app.config_from_object()`, Celery reads the settings from the specified source (arguments, object/module, environment variables). +2. **Storing:** These settings are stored in a dictionary-like object accessible via `app.conf`. Celery uses a default set of values initially, which are then updated or overridden by your configuration. +3. **Accessing:** When a Celery component needs a setting (e.g., the worker needs the `broker_url` to connect, or a task needs the `task_serializer`), it simply looks up the required key in the `app.conf` object. + +```mermaid +sequenceDiagram + participant ClientCode as Your App Setup (e.g., celery_app.py) + participant CeleryApp as app = Celery(...) 
+ participant ConfigSource as celeryconfig.py / Env Vars + participant Worker as Celery Worker Process + participant Broker as Message Broker (e.g., Redis) + + ClientCode->>CeleryApp: Create instance + ClientCode->>CeleryApp: app.config_from_object('celeryconfig') + CeleryApp->>ConfigSource: Read settings (broker_url, etc.) + ConfigSource-->>CeleryApp: Return settings values + Note over CeleryApp: Stores settings in app.conf + + Worker->>CeleryApp: Start worker for 'app' + Worker->>CeleryApp: Access app.conf.broker_url + CeleryApp-->>Worker: Return 'redis://localhost:6379/0' + Worker->>Broker: Connect using 'redis://localhost:6379/0' +``` + +This diagram shows the app loading configuration first, and then the worker using that stored configuration (`app.conf`) to perform its duties, like connecting to the broker. + +## Code Dive: Where Configuration Lives + +* **`app.conf`:** This is the primary interface you interact with. It's an instance of a special dictionary-like class (`celery.app.utils.Settings`) that handles loading defaults, converting keys (Celery has changed setting names over time), and providing convenient access. You saw this in the direct update example: `app.conf.update(...)`. +* **Loading Logic (`config_from_object`)**: Methods like `app.config_from_object` typically delegate to the app's "loader" (`app.loader`). The loader (e.g., `celery.loaders.base.BaseLoader` or `celery.loaders.app.AppLoader`) handles the actual importing of the configuration module and extracting the settings. See `loaders/base.py` for the `config_from_object` method definition. +* **Default Settings**: Celery has a built-in set of default values for all its settings. These are defined in `celery.app.defaults`. Your configuration overrides these defaults. See `app/defaults.py`. +* **Accessing Settings**: Throughout the Celery codebase, different components access the configuration via `app.conf`. 
For instance, when sending a task (`app/base.py:send_task`), the code looks up `app.conf.broker_url` (or related settings) to know where and how to send the message. + +```python +# Simplified concept from loaders/base.py +class BaseLoader: + # ... + def config_from_object(self, obj, silent=False): + if isinstance(obj, str): + # Import the module (e.g., 'celeryconfig') + obj = self._smart_import(obj, imp=self.import_from_cwd) + # ... error handling ... + # Store the configuration (simplified - actual process merges) + self._conf = force_mapping(obj) # Treat obj like a dictionary + # ... + return True + +# Simplified concept from app/base.py (where settings are used) +class Celery: + # ... + def send_task(self, name, args=None, kwargs=None, producer=None, **options): + # ... other setup ... + + # Access configuration to know where the broker is + broker_connection_url = self.conf.broker_url # Reads from app.conf + + # Use the broker URL to get a connection/producer + with self.producer_or_acquire(producer) as P: + # ... create message ... + # Send message using the connection derived from broker_url + self.amqp.send_task_message(P, name, message, **options) + + # ... return result object ... +``` + +This illustrates the core idea: load configuration into `app.conf`, then components read from `app.conf` when they need instructions. + +## Conclusion + +Configuration is the backbone of Celery's flexibility. You've learned: + +* **Why it's needed:** To tell Celery *how* to operate (broker, backend, task settings). +* **What can be configured:** Broker/backend URLs, serializers, timezones, task imports, and much more. +* **How to configure:** + * Directly via `app.conf.update()`. + * Using a dedicated module (`celeryconfig.py`) with `app.config_from_object()`. (Recommended) + * Using environment variables (great for deployment). +* **How it works:** Settings are loaded into `app.conf` and accessed by Celery components as needed. 
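The "defaults first, then your overrides" behaviour summarized above can be pictured with a small layering sketch. This is not how `app.conf` is implemented; it uses the standard library's `ChainMap`, and the three dictionaries and their values are hypothetical examples rather than Celery's real defaults.

```python
from collections import ChainMap

# Hypothetical layers, from lowest to highest priority.
defaults = {"task_serializer": "json", "timezone": "UTC", "broker_url": None}
from_config_module = {
    "broker_url": "redis://localhost:6379/0",
    "timezone": "America/New_York",
}
overrides = {"timezone": "Europe/Oslo"}  # e.g. something set later in code

# ChainMap searches its mappings left to right, so earlier layers win.
conf = ChainMap(overrides, from_config_module, defaults)

print(conf["timezone"])         # comes from overrides
print(conf["broker_url"])       # comes from the config module layer
print(conf["task_serializer"])  # falls through to defaults
```

A component that needs a setting only asks `conf` for the key; which layer supplied the value is invisible to it, which mirrors how Celery components simply read from `app.conf`.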
+ +With your Celery app configured, you're ready to define the actual work you want Celery to do. That's where Tasks come in! + +**Next:** [Chapter 3: Task](03_task.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/03_task.md b/output/Celery/03_task.md new file mode 100644 index 0000000..563258a --- /dev/null +++ b/output/Celery/03_task.md @@ -0,0 +1,245 @@ +# Chapter 3: Task - The Job Description + +In [Chapter 1: The Celery App](01_celery_app.md), we set up our Celery headquarters, and in [Chapter 2: Configuration](02_configuration.md), we learned how to give it instructions. Now, we need to define the *actual work* we want Celery to do. This is where **Tasks** come in. + +## What Problem Does a Task Solve? + +Imagine you have a specific job that needs doing, like "Resize this image to thumbnail size" or "Send a welcome email to this new user." In Celery, each of these specific jobs is represented by a **Task**. + +A Task is like a **job description** or a **recipe**. It contains the exact steps (the code) needed to complete a specific piece of work. You write this recipe once as a Python function, and then you can tell Celery to follow that recipe whenever you need that job done, potentially many times with different inputs (like resizing different images or sending emails to different users). + +The key benefit is that you don't run the recipe immediately yourself. You hand the recipe (the Task) and the ingredients (the arguments, like the image file or the user's email) over to Celery. Celery then finds an available helper (a [Worker](05_worker.md)) who knows how to follow that specific recipe and lets them do the work in the background. This keeps your main application free to do other things. + +## Defining Your First Task + +Defining a task in Celery is surprisingly simple. 
You just take a regular Python function and "decorate" it using `@app.task`. Remember our `app` object from [Chapter 1](01_celery_app.md)? We use its `task` decorator. + +Let's create a file, perhaps named `tasks.py`, to hold our task definitions: + +```python +# tasks.py +import time +from celery_app import app # Import the app instance we created + +@app.task +def add(x, y): + """A simple task that adds two numbers.""" + print(f"Task 'add' starting with ({x}, {y})") + # Simulate some work taking time + time.sleep(5) + result = x + y + print(f"Task 'add' finished with result: {result}") + return result + +@app.task +def send_welcome_email(user_id): + """A task simulating sending a welcome email.""" + print(f"Task 'send_welcome_email' starting for user {user_id}") + # Simulate email sending process + time.sleep(3) + print(f"Welcome email supposedly sent to user {user_id}") + return f"Email sent to {user_id}" + +# You can have many tasks in one file! +``` + +**Explanation:** + +1. **`from celery_app import app`**: We import the `Celery` app instance we configured earlier. This instance holds the knowledge about our broker and backend. +2. **`@app.task`**: This is the magic decorator! When Celery sees this above a function (`add` or `send_welcome_email`), it says, "Ah! This isn't just a regular function; it's a job description that my workers need to know about." +3. **The Function (`add`, `send_welcome_email`)**: This is the actual Python code that performs the work. It's the core of the task: the steps in the recipe. It can take arguments (like `x`, `y`, or `user_id`) and can return a value. +4. **Registration**: The `@app.task` decorator automatically *registers* this function with our Celery `app`. Now, `app` knows about a task named `tasks.add` and another named `tasks.send_welcome_email` (Celery creates the name from `module_name.function_name`). Workers connected to this `app` will be able to find and execute this code when requested. 
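The registration step described in point 4 can be imitated with a few lines of plain Python. The `ToyApp` class below is a hypothetical stand-in, not Celery's `Celery` class: it only shows the idea of a decorator that files a function in a registry under a `module_name.function_name` key.

```python
class ToyApp:
    """Hypothetical miniature of an app with a task registry (not Celery's API)."""

    def __init__(self, main):
        self.main = main
        self.tasks = {}  # maps task name -> function

    def task(self, fun):
        # Build a name from the defining module and the function name.
        name = f"{fun.__module__}.{fun.__name__}"
        self.tasks[name] = fun
        return fun


app = ToyApp("tasks")


@app.task
def add(x, y):
    return x + y


registered_name = f"{add.__module__}.add"
print(registered_name in app.tasks)
print(app.tasks[registered_name](5, 7))
```

A worker-like process holding the same registry can then go from a name in a message back to the function to call, which is the lookup the real worker performs.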
+ +*Self-Host Note:* If you are running this code, make sure you have a `celery_app.py` file containing your Celery app instance as shown in previous chapters, and that the `tasks.py` file can import `app` from it. + +## Sending a Task for Execution + +Okay, we've written our recipes (`add` and `send_welcome_email`). How do we tell Celery, "Please run the `add` recipe with the numbers 5 and 7"? + +We **don't call the function directly** like `add(5, 7)`. If we did that, it would just run immediately in our current program, which defeats the purpose of using Celery! + +Instead, we use special methods on the task object itself, most commonly `.delay()` or `.apply_async()`. + +Let's try this in a separate Python script or an interactive Python session: + +```python +# run_tasks.py +from tasks import add, send_welcome_email + +print("Let's send some tasks!") + +# --- Using .delay() --- +# Tell Celery to run add(5, 7) in the background +result_promise_add = add.delay(5, 7) +print(f"Sent task add(5, 7). Task ID: {result_promise_add.id}") + +# Tell Celery to run send_welcome_email(123) in the background +result_promise_email = send_welcome_email.delay(123) +print(f"Sent task send_welcome_email(123). Task ID: {result_promise_email.id}") + + +# --- Using .apply_async() --- +# Does the same thing as .delay() but allows more options +result_promise_add_later = add.apply_async(args=(10, 20), countdown=10) # Run after 10s +print(f"Sent task add(10, 20) to run in 10s. Task ID: {result_promise_add_later.id}") + +print("Tasks have been sent to the broker!") +print("A Celery worker needs to be running to pick them up.") +``` + +**Explanation:** + +1. **`from tasks import add, send_welcome_email`**: We import our *task functions*. Because they were decorated with `@app.task`, they are now special Celery Task objects. +2. **`add.delay(5, 7)`**: This is the simplest way to send a task. + * It *doesn't* run `add(5, 7)` right now. + * It takes the arguments `(5, 7)`. 
+ * It packages them up into a **message** along with the task's name (`tasks.add`). + * It sends this message to the **message broker** (like Redis or RabbitMQ) that we configured in our `celery_app.py`. Think of it like dropping a request slip into a mailbox. +3. **`send_welcome_email.delay(123)`**: Same idea, but for our email task. A message with `tasks.send_welcome_email` and the argument `123` is sent to the broker. +4. **`add.apply_async(args=(10, 20), countdown=10)`**: This is a more powerful way to send tasks. + * It does the same fundamental thing: sends a message to the broker. + * It allows for more options, like `args` (positional arguments as a tuple), `kwargs` (keyword arguments as a dict), `countdown` (delay execution by seconds), `eta` (run at a specific future time), and many others. + * `.delay(*args, **kwargs)` is just a convenient shortcut for `.apply_async(args=args, kwargs=kwargs)`. +5. **`result_promise_... = ...`**: Both `.delay()` and `apply_async()` return an `AsyncResult` object immediately. This is *not* the actual result of the task (like `12` for `add(5, 7)`). It's more like a receipt or a tracking number (notice the `.id` attribute). You can use this object later to check if the task finished and what its result was, but only if you've set up a [Result Backend](06_result_backend.md) (Chapter 6). +6. **The Worker**: Sending the task only puts the message on the queue. A separate process, the Celery [Worker](05_worker.md) (Chapter 5), needs to be running. The worker constantly watches the queue, picks up messages, finds the corresponding task function (using the name like `tasks.add`), and executes it with the provided arguments. + +## How It Works Internally (Simplified) + +Let's trace the journey of defining and sending our `add` task: + +1. **Definition (`@app.task` in `tasks.py`)**: + * Python defines the `add` function. + * The `@app.task` decorator sees this function. 
+ * It tells the `Celery` instance (`app`) about this function, registering it under the name `tasks.add` in an internal dictionary (`app.tasks`). The `app` instance knows the broker/backend settings. +2. **Sending (`add.delay(5, 7)` in `run_tasks.py`)**: + * You call `.delay()` on the `add` task object. + * `.delay()` (or `.apply_async()`) internally uses the `app` the task is bound to. + * It asks the `app` for the configured broker URL. + * It creates a message containing: + * Task Name: `tasks.add` + * Arguments: `(5, 7)` + * Other options (like a unique Task ID). + * It connects to the **Broker** (e.g., Redis) using the broker URL. + * It sends the message to a specific queue (usually named 'celery' by default) on the broker. + * It returns an `AsyncResult` object referencing the Task ID. +3. **Waiting**: The message sits in the queue on the broker, waiting. +4. **Execution (by a [Worker](05_worker.md))**: + * A separate Celery Worker process is running, connected to the same broker and `app`. + * The Worker fetches the message from the queue. + * It reads the task name: `tasks.add`. + * It looks up `tasks.add` in its copy of the `app.tasks` registry to find the actual `add` function code. + * It calls the `add` function with the arguments from the message: `add(5, 7)`. + * The function runs (prints logs, sleeps, calculates `12`). + * If a [Result Backend](06_result_backend.md) is configured, the Worker takes the return value (`12`) and stores it in the backend, associated with the Task ID. + * The Worker acknowledges the message to the broker, removing it from the queue. + +```mermaid +sequenceDiagram + participant Client as Your Code (run_tasks.py) + participant TaskDef as @app.task def add() + participant App as Celery App Instance + participant Broker as Message Broker (e.g., Redis) + participant Worker as Celery Worker (separate process) + + Note over TaskDef, App: 1. @app.task registers 'add' function with App's task registry + + Client->>TaskDef: 2. 
Call add.delay(5, 7) + TaskDef->>App: 3. Get broker config + App-->>TaskDef: Broker URL + TaskDef->>Broker: 4. Send message ('tasks.add', (5, 7), task_id, ...) + Broker-->>TaskDef: Ack (Message Queued) + TaskDef-->>Client: 5. Return AsyncResult(task_id) + + Worker->>Broker: 6. Fetch next message + Broker-->>Worker: Message ('tasks.add', (5, 7), task_id) + Worker->>App: 7. Lookup 'tasks.add' in registry + App-->>Worker: add function code + Worker->>Worker: 8. Execute add(5, 7) -> returns 12 + Note over Worker: (Optionally store result in Backend) + Worker->>Broker: 9. Acknowledge message completion +``` + +## Code Dive: Task Creation and Sending + +* **Task Definition (`@app.task`)**: This decorator is defined in `celery/app/base.py` within the `Celery` class method `task`. It ultimately calls `_task_from_fun`. + + ```python + # Simplified from celery/app/base.py + class Celery: + # ... + def task(self, *args, **opts): + # ... handles decorator arguments ... + def _create_task_cls(fun): + # Returns a Task instance or a Proxy that creates one later + ret = self._task_from_fun(fun, **opts) + return ret + return _create_task_cls + + def _task_from_fun(self, fun, name=None, base=None, bind=False, **options): + # Generate name like 'tasks.add' if not given + name = name or self.gen_task_name(fun.__name__, fun.__module__) + base = base or self.Task # The base Task class (from celery.app.task) + + if name not in self._tasks: # If not already registered... + # Dynamically create a Task class wrapping the function + task = type(fun.__name__, (base,), { + 'app': self, # Link task back to this app instance! + 'name': name, + 'run': staticmethod(fun), # The actual function to run + '__doc__': fun.__doc__, + '__module__': fun.__module__, + # ... other options ... + })() # Instantiate the new Task class + self._tasks[task.name] = task # Add to app's registry! 
+ task.bind(self) # Perform binding steps + else: + task = self._tasks[name] # Task already exists + return task + ``` + This shows how the decorator essentially creates a specialized object (an instance of a class derived from `celery.app.task.Task`) that wraps your original function and registers it with the `app` under a specific name. + +* **Task Sending (`.delay`)**: The `.delay()` method is defined on the `Task` class itself in `celery/app/task.py`. It's a simple shortcut. + + ```python + # Simplified from celery/app/task.py + class Task: + # ... + def delay(self, *args, **kwargs): + """Shortcut for apply_async(args, kwargs)""" + return self.apply_async(args, kwargs) + + def apply_async(self, args=None, kwargs=None, ..., **options): + # ... argument checking, option processing ... + + # Get the app associated with this task instance + app = self._get_app() + + # If always_eager is set, run locally instead of sending + if app.conf.task_always_eager: + return self.apply(args, kwargs, ...) # Runs inline + + # The main path: tell the app to send the task message + return app.send_task( + self.name, args, kwargs, task_type=self, + **options # Includes things like countdown, eta, queue etc. + ) + ``` + You can see how `.delay` just calls `.apply_async`, which then (usually) delegates the actual message sending to the `app.send_task` method we saw briefly in [Chapter 1](01_celery_app.md). The `app` uses its configuration to know *how* and *where* to send the message. + +## Conclusion + +You've learned the core concept of a Celery **Task**: + +* It represents a single, well-defined **unit of work** or **job description**. +* You define a task by decorating a normal Python function with `@app.task`. This **registers** the task with your Celery application. +* You **send** a task request (not run it directly) using `.delay()` or `.apply_async()`. +* Sending a task puts a **message** onto a queue managed by a **message broker**. 
+* A separate **Worker** process picks up the message and executes the corresponding task function. + +Tasks are the fundamental building blocks of work in Celery. Now that you know how to define a task and request its execution, let's look more closely at the crucial component that handles passing these requests around: the message broker. + +**Next:** [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/04_broker_connection__amqp_.md b/output/Celery/04_broker_connection__amqp_.md new file mode 100644 index 0000000..bc0278e --- /dev/null +++ b/output/Celery/04_broker_connection__amqp_.md @@ -0,0 +1,167 @@ +# Chapter 4: Broker Connection (AMQP) - Celery's Postal Service + +In [Chapter 3: Task](03_task.md), we learned how to define "job descriptions" (Tasks) like `add(x, y)` and how to request them using `.delay()`. But when you call `add.delay(2, 2)`, how does that request actually *get* to a worker process that can perform the addition? It doesn't just magically appear! + +This is where the **Broker Connection** comes in. Think of it as Celery's built-in postal service. + +## What Problem Does the Broker Connection Solve? + +Imagine you want to send a letter (a task request) to a friend (a worker) who lives in another city. You can't just shout the message out your window and hope they hear it. You need: + +1. A **Post Office** (the Message Broker, like RabbitMQ or Redis) that handles mail. +2. A way to **talk to the Post Office** (the Broker Connection) to drop off your letter or pick up mail addressed to you. + +The Broker Connection is that crucial link between your application (where you call `.delay()`) or your Celery worker and the message broker system. It manages sending messages *to* the broker and receiving messages *from* the broker reliably. 
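As a rough picture of that link, the sketch below replaces the real broker with an in-process `queue.Queue` and sends one JSON message through it. Everything here is an assumption made for illustration: the `send` and `receive` helpers, the message layout, and the `registry` dict are hypothetical and do not reflect Celery's or Kombu's actual message format.

```python
import json
import queue

broker = queue.Queue()  # in-process stand-in for a broker queue


def send(name, args):
    """Producer side: serialize the request and drop it on the queue."""
    broker.put(json.dumps({"task": name, "args": list(args)}))


def receive(registry):
    """Consumer side: take one message, look up the task by name, run it."""
    message = json.loads(broker.get())
    return registry[message["task"]](*message["args"])


registry = {"tasks.add": lambda x, y: x + y}

send("tasks.add", (2, 2))
result = receive(registry)
print(result)
```

A real broker adds what this toy lacks: it runs as a separate service, so the sender and the worker can live in different processes or on different machines.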
+ +Without this connection, your task requests would never leave your application, and your workers would never know there's work waiting for them. + +## Key Concepts: Post Office & Rules + +Let's break down the pieces: + +1. **The Message Broker (The Post Office):** This is a separate piece of software that acts as a central hub for messages. Common choices are RabbitMQ and Redis. You tell Celery its address using the `broker_url` setting in your [Configuration](02_configuration.md). + ```python + # From Chapter 2 - celeryconfig.py + broker_url = 'amqp://guest:guest@localhost:5672//' # Example for RabbitMQ + # Or maybe: broker_url = 'redis://localhost:6379/0' # Example for Redis + ``` + +2. **The Connection (Talking to the Staff):** This is the active communication channel established between your Python code (either your main app or a worker) and the broker. It's like having an open phone line to the post office. Celery, using a library called `kombu`, handles creating and managing these connections based on the `broker_url`. + +3. **AMQP (The Postal Rules):** AMQP stands for **Advanced Message Queuing Protocol**. Think of it as a specific set of rules and procedures for how post offices should operate: how letters should be addressed, sorted, delivered, and confirmed. + * RabbitMQ is a broker that speaks AMQP natively. + * Other brokers, like Redis, use different protocols (their own set of rules). + * **Why mention AMQP?** It's a very common and powerful protocol for message queuing, and the principles behind it (exchanges, queues, routing) are fundamental to how Celery routes tasks, even when using other brokers. Celery's internal component for handling this communication is often referred to as `app.amqp` (found in `app/amqp.py`), even though the underlying library (`kombu`) supports multiple protocols. So, we focus on the *concept* of managing the broker connection, often using AMQP terminology as a reference point. + +4. 
**Producer (Sending Mail):** When your application calls `add.delay(2, 2)`, it acts as a *producer*. It uses its broker connection to send a message ("Please run 'add' with arguments (2, 2)") to the broker. + +5. **Consumer (Receiving Mail):** A Celery [Worker](05_worker.md) acts as a *consumer*. It uses its *own* broker connection to constantly check a specific mailbox (queue) at the broker for new messages. When it finds one, it takes it, performs the task, and tells the broker it's done. + +## How Sending a Task Uses the Connection + +Let's revisit sending a task from [Chapter 3: Task](03_task.md): + +```python +# run_tasks.py (simplified) +from tasks import add +from celery_app import app # Assume app is configured with a broker_url + +# 1. You call .delay() +print("Sending task...") +result_promise = add.delay(2, 2) +# Behind the scenes: +# a. Celery looks at the 'add' task, finds its associated 'app'. +# b. It asks 'app' for the broker_url from its configuration. +# c. It uses the app.amqp component (powered by Kombu) to get a connection +# to the broker specified by the URL (e.g., 'amqp://localhost...'). +# d. It packages the task name 'tasks.add' and args (2, 2) into a message. +# e. It uses the connection to 'publish' (send) the message to the broker. + +print(f"Task sent! ID: {result_promise.id}") +``` + +The `add.delay(2, 2)` call triggers this whole process. It needs the configured `broker_url` to know *which* post office to connect to, and the broker connection handles the actual sending of the "letter" (task message). + +Similarly, a running Celery [Worker](05_worker.md) establishes its own connection to the *same* broker. It uses this connection to *listen* for incoming messages on the queues it's assigned to. + +## How It Works Internally (Simplified) + +Celery uses a powerful library called **Kombu** to handle the low-level details of connecting and talking to different types of brokers (RabbitMQ, Redis, etc.). 
The `app.amqp` object in Celery acts as a high-level interface to Kombu's features. + +1. **Configuration:** The `broker_url` tells Kombu where and how to connect. +2. **Connection Pool:** To be efficient, Celery (via Kombu) often maintains a *pool* of connections. When you send a task, it might grab an existing, idle connection from the pool instead of creating a new one every time. This is faster. You can see this managed by `app.producer_pool` in `app/base.py`. +3. **Producer:** When `task.delay()` is called, it ultimately uses a `kombu.Producer` object. This object represents the ability to *send* messages. It's tied to a specific connection and channel. +4. **Publishing:** The producer's `publish()` method is called. This takes the task message (already serialized into a format like JSON), specifies the destination (exchange and routing key - think of these like the address and sorting code on an envelope), and sends it over the connection to the broker. +5. **Consumer:** A Worker uses a `kombu.Consumer` object. This object is set up to listen on specific queues via its connection. When a message arrives in one of those queues, the broker pushes it to the consumer over the connection, and the consumer triggers the appropriate Celery task execution logic. + +```mermaid +sequenceDiagram + participant Client as Your App Code + participant Task as add.delay() + participant App as Celery App + participant AppAMQP as app.amqp (Kombu Interface) + participant Broker as RabbitMQ / Redis + + Client->>Task: Call add.delay(2, 2) + Task->>App: Get broker config (broker_url) + App-->>Task: broker_url + Task->>App: Ask to send task 'tasks.add' + App->>AppAMQP: Send task message('tasks.add', (2, 2), ...) 
+ Note over AppAMQP: Gets connection/producer (maybe from pool) + AppAMQP->>Broker: publish(message, routing_info) via Connection + Broker-->>AppAMQP: Acknowledge message received + AppAMQP-->>App: Message sent successfully + App-->>Task: Return AsyncResult + Task-->>Client: Return AsyncResult +``` + +This shows the flow: your code calls `.delay()`, Celery uses its configured connection details (`app.amqp` layer) to get a connection and producer, and then publishes the message to the broker. + +## Code Dive: Sending a Message + +Let's peek inside `app/amqp.py` where the `AMQP` class orchestrates sending. The `send_task_message` method (simplified below) is key. + +```python +# Simplified from app/amqp.py within the AMQP class + +# This function is configured internally and gets called by app.send_task +def _create_task_sender(self): + # ... (lots of setup: getting defaults from config, signals) ... + default_serializer = self.app.conf.task_serializer + default_compressor = self.app.conf.task_compression + + def send_task_message(producer, name, message, + exchange=None, routing_key=None, queue=None, + serializer=None, compression=None, declare=None, + retry=None, retry_policy=None, + **properties): + # ... (Determine exchange, routing_key, queue based on config/options) ... + # ... (Prepare headers, properties, handle retries) ... + + headers, properties, body, sent_event = message # Unpack the prepared message tuple + + # The core action: Use the producer to publish the message! + ret = producer.publish( + body, # The actual task payload (args, kwargs, etc.) + exchange=exchange, + routing_key=routing_key, + serializer=serializer or default_serializer, # e.g., 'json' + compression=compression or default_compressor, + retry=retry, + retry_policy=retry_policy, + declare=declare, # Maybe declare queues/exchanges if needed + headers=headers, + **properties # Other message properties (correlation_id, etc.) + ) + + # ... 
(Send signals like task_sent, publish events if configured) ... + return ret + return send_task_message +``` + +**Explanation:** + +* This function takes a `producer` object (which is linked to a broker connection via Kombu). +* It figures out the final destination details (exchange, routing key). +* It calls `producer.publish()`, passing the task body and all the necessary options (like serializer). This is the function that actually sends the data over the network connection to the broker. + +The `Connection` objects themselves are managed by Kombu (see `kombu/connection.py`). Celery uses these objects via its `app.connection_for_write()` or `app.connection_for_read()` methods, which often pull from the connection pool (`kombu.pools`). + +## Conclusion + +The Broker Connection is Celery's vital communication link, its "postal service." + +* It connects your application and workers to the **Message Broker** (like RabbitMQ or Redis). +* It uses the `broker_url` from your [Configuration](02_configuration.md) to know where to connect. +* Protocols like **AMQP** define the "rules" for communication, although Celery's underlying library (Kombu) handles various protocols. +* Your app **produces** task messages and sends them over the connection. +* Workers **consume** task messages received over their connection. +* Celery manages connections efficiently, often using **pools**. + +Understanding the broker connection helps clarify how tasks move from where they're requested to where they run. Now that we know how tasks are defined and sent across the wire, let's look at the entity that actually picks them up and does the work. 
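The producer and consumer roles described in this chapter can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyBroker`, `send_task`, and `receive_task` are hypothetical names invented here, and the code uses nothing from Celery or Kombu. It shows the shape of the flow (serialize, publish with a routing key, fetch from a queue), not how the real libraries implement it.

```python
import json
import queue


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: one FIFO mailbox per queue name."""

    def __init__(self):
        self._queues = {}

    def publish(self, body, routing_key):
        # Deliver the serialized message to the mailbox named by the routing key.
        self._queues.setdefault(routing_key, queue.Queue()).put(body)

    def get(self, queue_name):
        # Return the next message body, or None if the mailbox is empty or unknown.
        try:
            return self._queues[queue_name].get_nowait()
        except (KeyError, queue.Empty):
            return None


def send_task(broker, name, args, routing_key="celery"):
    """Producer side: serialize the task call to JSON and publish it."""
    body = json.dumps({"task": name, "args": list(args)})
    broker.publish(body, routing_key=routing_key)


def receive_task(broker, queue_name="celery"):
    """Consumer side: fetch one message and decode it, or return None."""
    body = broker.get(queue_name)
    return None if body is None else json.loads(body)


broker = ToyBroker()
send_task(broker, "tasks.add", (2, 2))
message = receive_task(broker)
print(message)
```

A real broker adds network connections, exchanges, acknowledgements, and persistence on top of this basic idea; the sketch keeps only the "drop a letter in a named mailbox, pick it up later" part.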
+ +**Next:** [Chapter 5: Worker](05_worker.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/05_worker.md b/output/Celery/05_worker.md new file mode 100644 index 0000000..7faf3c5 --- /dev/null +++ b/output/Celery/05_worker.md @@ -0,0 +1,223 @@ +# Chapter 5: Worker - The Task Doer + +In [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md), we learned how Celery uses a message broker, like a postal service, to send task messages. When you call `add.delay(2, 2)`, a message asking to run the `add` task with arguments `(2, 2)` gets dropped into a mailbox (the broker queue). + +But who actually checks that mailbox, picks up the message, and performs the addition? That's the job of the **Celery Worker**. + +## What Problem Does the Worker Solve? + +Imagine our workshop analogy again. You've defined the blueprint for a job ([Task](03_task.md)) and you've dropped the work order into the central inbox ([Broker Connection (AMQP)](04_broker_connection__amqp_.md)). Now you need an actual employee or a machine to: + +1. Look in the inbox for new work orders. +2. Pick up an order. +3. Follow the instructions (run the task code). +4. Maybe put the finished product (the result) somewhere specific. +5. Mark the order as complete. + +The **Celery Worker** is that employee or machine. It's a separate program (process) that you run, whose sole purpose is to execute the tasks you send to the broker. Without a worker running, your task messages would just sit in the queue forever, waiting for someone to process them. + +## Starting Your First Worker + +Running a worker is typically done from your command line or terminal. You need to tell the worker where to find your [Celery App](01_celery_app.md) instance (which holds the configuration, including the broker address and the list of known tasks). 
+ +Assuming you have: +* A file `celery_app.py` containing your `app = Celery(...)` instance. +* A file `tasks.py` containing your task definitions (like `add` and `send_welcome_email`) decorated with `@app.task`. +* Your message broker (e.g., Redis or RabbitMQ) running. + +You can start a worker like this: + +```bash +# In your terminal, in the same directory as celery_app.py and tasks.py +# Make sure your Python environment has celery and the broker driver installed +# (e.g., pip install celery redis) + +celery -A celery_app worker --loglevel=info +``` + +**Explanation:** + +* `celery`: This is the main Celery command-line program. +* `-A celery_app`: The `-A` flag (or `--app`) tells Celery where to find your `Celery` app instance. `celery_app` refers to the `celery_app.py` file (or module) and implies Celery should look for an instance named `app` inside it. +* `worker`: This specifies that you want to run the worker component. +* `--loglevel=info`: This sets the logging level. `info` is a good starting point, showing you when the worker connects, finds tasks, and executes them. Other levels include `debug` (more verbose), `warning`, `error`, and `critical`. + +**What You'll See:** + +When the worker starts successfully, you'll see a banner like this (details may vary): + +```text + -------------- celery@yourhostname v5.x.x (stars) +--- ***** ----- +-- ******* ---- Linux-5.15.0...-generic-x86_64-with-... 2023-10-27 10:00:00 +- *** --- * --- +- ** ---------- [config] +- ** ---------- .> app: tasks:0x7f... +- ** ---------- .> transport: redis://localhost:6379/0 +- ** ---------- .> results: redis://localhost:6379/0 +- *** --- * --- .> concurrency: 8 (prefork) +-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker) +--- ***** ----- + -------------- [queues] + .> celery exchange=celery(direct) key=celery + + +[tasks] + . tasks.add + . 
tasks.send_welcome_email + +[2023-10-27 10:00:01,000: INFO/MainProcess] Connected to redis://localhost:6379/0 +[2023-10-27 10:00:01,050: INFO/MainProcess] mingle: searching for neighbors +[2023-10-27 10:00:02,100: INFO/MainProcess] mingle: all alone +[2023-10-27 10:00:02,150: INFO/MainProcess] celery@yourhostname ready. +``` + +**Key Parts of the Banner:** + +* `celery@yourhostname`: The unique name of this worker instance. +* `transport`: The broker URL it connected to (from your app config). +* `results`: The result backend URL (if configured). +* `concurrency`: How many tasks this worker can potentially run at once (defaults to the number of CPU cores) and the execution pool type (`prefork` is common). We'll touch on this later. +* `queues`: The specific "mailboxes" (queues) the worker is listening to. `celery` is the default queue name. +* `[tasks]`: A list of all the tasks the worker discovered (like our `tasks.add` and `tasks.send_welcome_email`). If your tasks don't show up here, the worker won't be able to run them! + +The final `celery@yourhostname ready.` message means the worker is connected and waiting for jobs! + +## What the Worker Does + +Now that the worker is running, let's trace what happens when you send a task (e.g., from `run_tasks.py` in [Chapter 3: Task](03_task.md)): + +1. **Waiting:** The worker is connected to the broker, listening on the `celery` queue. +2. **Message Arrival:** Your `add.delay(5, 7)` call sends a message to the `celery` queue on the broker. The broker notifies the worker. +3. **Receive & Decode:** The worker receives the raw message. It decodes it to find the task name (`tasks.add`), the arguments (`(5, 7)`), and other info (like a unique task ID). +4. **Find Task Code:** The worker looks up the name `tasks.add` in its internal registry (populated when it started) to find the actual Python function `add` defined in `tasks.py`. +5. **Execute:** The worker executes the function: `add(5, 7)`. 
+ * You will see the `print` statements from your task function appear in the *worker's* terminal output: + ```text + [2023-10-27 10:05:00,100: INFO/ForkPoolWorker-1] Task tasks.add[some-task-id] received + Task 'add' starting with (5, 7) + Task 'add' finished with result: 12 + [2023-10-27 10:05:05,150: INFO/ForkPoolWorker-1] Task tasks.add[some-task-id] succeeded in 5.05s: 12 + ``` +6. **Store Result (Optional):** If a [Result Backend](06_result_backend.md) is configured, the worker takes the return value (`12`) and sends it to the backend, associating it with the task's unique ID. +7. **Acknowledge:** The worker sends an "acknowledgement" (ack) back to the broker. This tells the broker, "I have taken responsibility for this message, you can delete it from the queue." Note that by default Celery sends the ack just before the task starts executing. If you enable late acknowledgement (`acks_late=True` on the task, or the `task_acks_late` setting), the ack is sent only after the task finishes, so a message whose worker crashes mid-execution can be redelivered to another worker. For simplicity, this chapter describes the ack as the final step. +8. **Wait Again:** The worker goes back to waiting for the next message. + +## Running Multiple Workers and Concurrency + +* **Multiple Workers:** You can start multiple worker processes by running the `celery worker` command again, perhaps on different machines or in different terminals on the same machine. They will all connect to the same broker and pull tasks from the queue, allowing you to process tasks in parallel and scale your application. +* **Concurrency within a Worker:** A single worker process can often handle more than one task concurrently. Celery achieves this using *execution pools*. + * **Prefork (Default):** The worker starts several child *processes*. Each child process handles one task at a time. The `-c` (or `--concurrency`) flag controls the number of child processes (default is the number of CPU cores). This is good for CPU-bound tasks. + * **Eventlet/Gevent:** Uses *green threads* (lightweight concurrency managed by libraries like eventlet or gevent). 
A single worker process can handle potentially hundreds or thousands of tasks concurrently, especially if the tasks are I/O-bound (e.g., waiting for network requests). You select these using the `-P` flag: `celery -A celery_app worker -P eventlet -c 1000`. Requires installing the respective library (`pip install eventlet` or `pip install gevent`). + * **Solo:** Executes tasks one after another in the main worker process. Useful for debugging. `-P solo`. + * **Threads:** Uses regular OS threads. `-P threads`. Less common for Celery tasks due to Python's Global Interpreter Lock (GIL) limitations for CPU-bound tasks, but can be useful for I/O-bound tasks. + +For beginners, sticking with the default **prefork** pool is usually fine. Just know that the worker can likely handle multiple tasks simultaneously. + +## How It Works Internally (Simplified) + +Let's visualize the worker's main job: processing a single task. + +1. **Startup:** The `celery worker` command starts the main worker process. It loads the `Celery App`, reads the configuration (`broker_url`, tasks to import, etc.). +2. **Connect & Listen:** The worker establishes a connection to the message broker and tells it, "I'm ready to consume messages from the 'celery' queue." +3. **Message Delivery:** The broker sees a message for the 'celery' queue (sent by `add.delay(5, 7)`) and delivers it to the connected worker. +4. **Consumer Receives:** The worker's internal "Consumer" component receives the message. +5. **Task Dispatch:** The Consumer decodes the message, identifies the task (`tasks.add`), and finds the arguments (`(5, 7)`). It then hands this off to the configured execution pool (e.g., prefork). +6. **Pool Execution:** The pool (e.g., a child process in the prefork pool) gets the task function and arguments and executes `add(5, 7)`. +7. **Result Return:** The pool process finishes execution and returns the result (`12`) back to the main worker process. +8. 
**Result Handling (Optional):** The main worker process, if a [Result Backend](06_result_backend.md) is configured, sends the result (`12`) and task ID to the backend store. +9. **Acknowledgement:** The main worker process sends an "ack" message back to the broker, confirming the task message was successfully processed. The broker then deletes the message. + +```mermaid +sequenceDiagram + participant CLI as Terminal (celery worker) + participant WorkerMain as Worker Main Process + participant App as Celery App Instance + participant Broker as Message Broker + participant Pool as Execution Pool (e.g., Prefork Child) + participant TaskCode as Your Task Function (add) + + CLI->>WorkerMain: Start celery -A celery_app worker + WorkerMain->>App: Load App & Config (broker_url, tasks) + WorkerMain->>Broker: Connect & Listen on 'celery' queue + + Broker-->>WorkerMain: Deliver Message ('tasks.add', (5, 7), task_id) + WorkerMain->>WorkerMain: Decode Message + WorkerMain->>Pool: Request Execute add(5, 7) with task_id + Pool->>TaskCode: Run add(5, 7) + TaskCode-->>Pool: Return 12 + Pool-->>WorkerMain: Result=12 for task_id + Note over WorkerMain: (Optionally) Store 12 in Result Backend + WorkerMain->>Broker: Acknowledge task_id is complete +``` + +## Code Dive: Where Worker Logic Lives + +* **Command Line Entry Point (`celery/bin/worker.py`):** This script handles parsing the command-line arguments (`-A`, `-l`, `-c`, `-P`, etc.) when you run `celery worker ...`. It ultimately creates and starts a `WorkController` instance. (See `worker()` function in the file). +* **Main Worker Class (`celery/worker/worker.py`):** The `WorkController` class is the heart of the worker. It manages all the different components (like the pool, consumer, timer, etc.) using a system called "bootsteps". It handles the overall startup, shutdown, and coordination. (See `WorkController` class). 
+* **Message Handling (`celery/worker/consumer/consumer.py`):** The `Consumer` class (specifically its `Blueprint` and steps like `Tasks` and `Evloop`) is responsible for the core loop of fetching messages from the broker via the connection, decoding them, and dispatching them to the execution pool using task strategies. (See `Consumer.create_task_handler`). +* **Execution Pools (`celery/concurrency/`):** Modules like `prefork.py`, `solo.py`, `eventlet.py`, `gevent.py` implement the different concurrency models (`-P` flag). The `WorkController` selects and manages one of these pools. + +A highly simplified conceptual view of the core message processing logic within the `Consumer`: + +```python +# Conceptual loop inside the Consumer (highly simplified) + +def message_handler(message): + try: + # 1. Decode message (task name, args, kwargs, id, etc.) + task_name, args, kwargs, task_id = decode_message(message.body, message.headers) + + # 2. Find the registered task function + task_func = app.tasks[task_name] + + # 3. Prepare execution request for the pool + request = TaskRequest(task_id, task_name, task_func, args, kwargs) + + # 4. Send request to the pool for execution + # (Pool runs request.execute() which calls task_func(*args, **kwargs)) + pool.apply_async(request.execute, accept_callback=task_succeeded, ...) + + except Exception as e: + # Handle errors (e.g., unknown task, decoding error) + log_error(e) + message.reject() # Tell broker it failed + +def task_succeeded(task_id, retval): + # Called by the pool when task finishes successfully + # 5. Store result (optional) + if app.backend: + app.backend.store_result(task_id, retval, status='SUCCESS') + + # 6. 
Acknowledge message to broker + message.ack() + +# --- Setup --- +# Worker connects to broker and registers message_handler +# for incoming messages on the subscribed queue(s) +connection.consume(queue_name, callback=message_handler) + +# Start the event loop to wait for messages +connection.drain_events() +``` + +This illustrates the fundamental cycle: receive -> decode -> find task -> execute via pool -> handle result -> acknowledge. The actual code involves much more detail regarding error handling, state management, different protocols, rate limiting, etc., managed through the bootstep system. + +## Conclusion + +You've now met the **Celery Worker**, the essential component that actually *runs* your tasks. + +* It's a **separate process** you start from the command line (`celery worker`). +* It connects to the **broker** using the configuration from your **Celery App**. +* It **listens** for task messages on queues. +* It **executes** the corresponding task code when a message arrives. +* It handles **concurrency** using execution pools (like prefork, eventlet, gevent). +* It **acknowledges** messages to the broker upon successful completion. + +Without workers, Celery tasks would never get done. But what happens when a task finishes? What if it returns a value, like our `add` task returning `12`? How can your original application find out the result? That's where the Result Backend comes in. 
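The receive, decode, find task, execute, store, acknowledge cycle summarized in this chapter can be made runnable as a small standard-library sketch. Everything here is an assumption made for illustration: `registry`, `task`, and `run_worker` are hypothetical names, a `queue.Queue` stands in for the broker queue, a plain dict stands in for the result backend, and `task_done()` plays the role of the ack. It is not Celery's `Consumer` or `WorkController` code.

```python
import json
import queue

registry = {}  # task name -> function, similar in spirit to the worker's [tasks] list


def task(func):
    """Hypothetical decorator: register a function under the name 'tasks.<name>'."""
    registry[f"tasks.{func.__name__}"] = func
    return func


@task
def add(x, y):
    return x + y


def run_worker(inbox, results):
    """Process every message currently in the inbox, then return."""
    while True:
        try:
            raw = inbox.get_nowait()          # 1. receive
        except queue.Empty:
            break
        msg = json.loads(raw)                 # 2. decode
        try:
            func = registry[msg["task"]]      # 3. find the task code
            retval = func(*msg["args"])       # 4. execute
            results[msg["id"]] = {"status": "SUCCESS", "result": retval}
        except Exception as exc:              # unknown task or task error
            results[msg["id"]] = {"status": "FAILURE", "result": repr(exc)}
        inbox.task_done()                     # 5. toy "ack" after processing


inbox = queue.Queue()
inbox.put(json.dumps({"id": "t1", "task": "tasks.add", "args": [5, 7]}))
inbox.put(json.dumps({"id": "t2", "task": "tasks.missing", "args": []}))

results = {}
run_worker(inbox, results)
print(results["t1"])
print(results["t2"]["status"])
```

The second message names a task that was never registered, so the lookup fails and the toy worker records a `FAILURE` instead of crashing. A real worker hands execution to a pool and keeps listening rather than stopping when the queue is empty.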
+ +**Next:** [Chapter 6: Result Backend](06_result_backend.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/06_result_backend.md b/output/Celery/06_result_backend.md new file mode 100644 index 0000000..fe5d120 --- /dev/null +++ b/output/Celery/06_result_backend.md @@ -0,0 +1,318 @@ +# Chapter 6: Result Backend - Checking Your Task's Homework + +In [Chapter 5: Worker](05_worker.md), we met the Celery Worker, the diligent entity that picks up task messages from the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) and executes the code defined in our [Task](03_task.md). + +But what happens after the worker finishes a task? What if the task was supposed to calculate something, like `add(2, 2)`? How do we, back in our main application, find out the answer (`4`)? Or even just know if the task finished successfully or failed? + +This is where the **Result Backend** comes in. It's like a dedicated place to check the status and results of the homework assigned to the workers. + +## What Problem Does the Result Backend Solve? + +Imagine you give your Celery worker a math problem: "What is 123 + 456?". The worker goes away, calculates the answer (579), and... then what? + +If you don't tell the worker *where* to put the answer, it just disappears! You, back in your main program, have no idea if the worker finished, if it got the right answer, or if it encountered an error. + +The **Result Backend** solves this by providing a storage location (like a database, a cache like Redis, or even via the message broker itself) where the worker can: + +1. Record the final **state** of the task (e.g., `SUCCESS`, `FAILURE`). +2. Store the task's **return value** (e.g., `579`) if it succeeded. +3. Store the **error** information (e.g., `TypeError: unsupported operand type(s)...`) if it failed. 
+ +Later, your main application can query this Result Backend using the task's unique ID to retrieve this information. + +Think of it as a shared filing cabinet: +* The **Worker** puts the completed homework (result and status) into a specific folder (identified by the task ID). +* Your **Application** can later look inside that folder (using the task ID) to see the results. + +## Key Concepts + +1. **Storage:** It's a place to store task results and states. This could be Redis, a relational database (like PostgreSQL or MySQL), MongoDB, RabbitMQ (using RPC), and others. +2. **Task ID:** Each task execution gets a unique ID (remember the `result_promise_add.id` from Chapter 3?). This ID is the key used to store and retrieve the result from the backend. +3. **State:** Besides the return value, the backend stores the task's current state (e.g., `PENDING`, `STARTED`, `SUCCESS`, `FAILURE`, `RETRY`, `REVOKED`). +4. **Return Value / Exception:** If the task finishes successfully (`SUCCESS`), the backend stores the value the task function returned. If it fails (`FAILURE`), it stores details about the exception that occurred. +5. **`AsyncResult` Object:** When you call `task.delay()` or `task.apply_async()`, Celery gives you back an `AsyncResult` object. This object holds the task's ID and provides methods to interact with the result backend (check status, get the result, etc.). + +## How to Use a Result Backend + +**1. Configure It!** + +First, you need to tell your Celery app *where* the result backend is located. You do this using the `result_backend` configuration setting, just like you set the `broker_url` in [Chapter 2: Configuration](02_configuration.md). + +Let's configure our app to use Redis (make sure you have Redis running!) as the result backend. We'll use database number `1` for results to keep it separate from the broker which might be using database `0`. 
+ +```python +# celery_app.py +from celery import Celery + +# Configure BOTH broker and result backend +app = Celery('tasks', + broker='redis://localhost:6379/0', + backend='redis://localhost:6379/1') # <-- Result Backend URL + +# You could also use app.config_from_object('celeryconfig') +# if result_backend = 'redis://localhost:6379/1' is in celeryconfig.py + +# ... your task definitions (@app.task) would go here or be imported ... +@app.task +def add(x, y): + import time + time.sleep(3) # Simulate work + return x + y + +@app.task +def fail_sometimes(x): + import random + if random.random() < 0.5: + raise ValueError("Something went wrong!") + return f"Processed {x}" +``` + +**Explanation:** + +* `backend='redis://localhost:6379/1'`: We provide a URL telling Celery to use the Redis server running on `localhost`, port `6379`, and specifically database `1` for storing results. (The `backend` argument is an alias for `result_backend`). + +**2. Send a Task and Get the `AsyncResult`** + +When you send a task, the returned object is your key to the result. + +```python +# run_tasks.py +from celery_app import add, fail_sometimes + +# Send the add task +result_add = add.delay(10, 20) +print(f"Sent task add(10, 20). Task ID: {result_add.id}") + +# Send the task that might fail +result_fail = fail_sometimes.delay("my data") +print(f"Sent task fail_sometimes('my data'). Task ID: {result_fail.id}") +``` + +**Explanation:** + +* `result_add` and `result_fail` are `AsyncResult` objects. They contain the `.id` attribute, which is the unique identifier for *this specific execution* of the task. + +**3. Check the Status and Get the Result** + +Now, you can use the `AsyncResult` object to interact with the result backend. 
+ +**(Run a worker in another terminal first: `celery -A celery_app worker --loglevel=info`)** + +```python +# continue in run_tasks.py or a new Python session +from celery_app import app # Need app for AsyncResult if creating from ID + +# Use the AsyncResult objects we got earlier +# Or, if you only have the ID, you can recreate the AsyncResult: +# result_add = app.AsyncResult('the-task-id-you-saved-earlier') + +print(f"\nChecking results for add task ({result_add.id})...") + +# Check if the task is finished (returns True/False immediately) +print(f"Is add ready? {result_add.ready()}") + +# Check the state (returns 'PENDING', 'STARTED', 'SUCCESS', 'FAILURE', etc.) +print(f"State of add: {result_add.state}") + +# Get the result. IMPORTANT: This call will BLOCK until the task is finished! +# If the task failed, this will raise the exception that occurred in the worker. +try: + # Set a timeout (in seconds) to avoid waiting forever + final_result = result_add.get(timeout=10) + print(f"Result of add: {final_result}") + print(f"Did add succeed? {result_add.successful()}") + print(f"Final state of add: {result_add.state}") +except Exception as e: + print(f"Could not get result for add: {type(e).__name__} - {e}") + print(f"Final state of add: {result_add.state}") + print(f"Did add fail? {result_add.failed()}") + # Get the traceback if it failed + print(f"Traceback: {result_add.traceback}") + + +print(f"\nChecking results for fail_sometimes task ({result_fail.id})...") +try: + # Wait up to 10 seconds for this task + fail_result = result_fail.get(timeout=10) + print(f"Result of fail_sometimes: {fail_result}") + print(f"Did fail_sometimes succeed? {result_fail.successful()}") + print(f"Final state of fail_sometimes: {result_fail.state}") +except Exception as e: + print(f"Could not get result for fail_sometimes: {type(e).__name__} - {e}") + print(f"Final state of fail_sometimes: {result_fail.state}") + print(f"Did fail_sometimes fail? 
{result_fail.failed()}") + print(f"Traceback:\n{result_fail.traceback}") + +``` + +**Explanation & Potential Output:** + +* `result.ready()`: Checks if the task has finished (reached a `SUCCESS`, `FAILURE`, or other final state). Non-blocking. +* `result.state`: Gets the current state string. Non-blocking. +* `result.successful()`: Returns `True` if the state is `SUCCESS`. Non-blocking. +* `result.failed()`: Returns `True` if the state is `FAILURE` or another exception state. Non-blocking. +* `result.get(timeout=...)`: This is the most common way to get the actual return value. + * **It blocks** (waits) until the task completes *or* the timeout expires. + * If the task state becomes `SUCCESS`, it returns the value the task function returned (e.g., `30`). + * If the task state becomes `FAILURE`, it **raises** the exception that occurred in the worker (e.g., `ValueError: Something went wrong!`). + * If the timeout is reached before the task finishes, it raises a `celery.exceptions.TimeoutError`. +* `result.traceback`: If the task failed, this contains the error traceback string from the worker. + +**(Example Output - might vary for `fail_sometimes` due to randomness)** + +```text +Sent task add(10, 20). Task ID: f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a +Sent task fail_sometimes('my data'). Task ID: 9b1d8c7e-a6f5-4b3a-9c8d-7e6f5a4b3c2d + +Checking results for add task (f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a)... +Is add ready? False +State of add: PENDING # Or STARTED if checked quickly after worker picks it up +Result of add: 30 +Did add succeed? True +Final state of add: SUCCESS + +Checking results for fail_sometimes task (9b1d8c7e-a6f5-4b3a-9c8d-7e6f5a4b3c2d)... +Could not get result for fail_sometimes: ValueError - Something went wrong! +Final state of fail_sometimes: FAILURE +Did fail_sometimes fail? 
True +Traceback: +Traceback (most recent call last): + File "/path/to/celery/app/trace.py", line ..., in trace_task + R = retval = fun(*args, **kwargs) + File "/path/to/celery/app/trace.py", line ..., in __protected_call__ + return self.run(*args, **kwargs) + File "/path/to/your/project/celery_app.py", line ..., in fail_sometimes + raise ValueError("Something went wrong!") +ValueError: Something went wrong! +``` + +## How It Works Internally + +1. **Task Sent:** Your application calls `add.delay(10, 20)`. It sends a message to the **Broker** and gets back an `AsyncResult` object containing the unique `task_id`. +2. **Worker Executes:** A **Worker** picks up the task message from the Broker. It finds the `add` function and executes `add(10, 20)`. The function returns `30`. +3. **Worker Stores Result:** Because a `result_backend` is configured (`redis://.../1`), the Worker: + * Connects to the Result Backend (Redis DB 1). + * Prepares the result data (e.g., `{'status': 'SUCCESS', 'result': 30, 'task_id': 'f5e8...', ...}`). + * Stores this data in the backend, using the `task_id` as the key (e.g., in Redis, it might set a key like `celery-task-meta-f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a` to the JSON representation of the result data). + * It might also set an expiry time on the result if configured (`result_expires`). +4. **Client Checks Result:** Your application calls `result_add.get(timeout=10)` on the `AsyncResult` object. +5. **Client Queries Backend:** The `AsyncResult` object uses the `task_id` (`f5e8...`) and the configured `result_backend` URL: + * It connects to the Result Backend (Redis DB 1). + * It repeatedly fetches the data associated with the `task_id` key (e.g., `GET celery-task-meta-f5e8...` in Redis). + * It checks the `status` field in the retrieved data. + * If the status is `PENDING` or `STARTED`, it waits a short interval and tries again, until the timeout is reached. 
+ * If the status is `SUCCESS`, it extracts the `result` field (`30`) and returns it. + * If the status is `FAILURE`, it extracts the `result` field (which contains exception info), reconstructs the exception, and raises it. + +```mermaid +sequenceDiagram + participant Client as Your Application + participant Task as add.delay(10, 20) + participant Broker as Message Broker (Redis DB 0) + participant Worker as Celery Worker + participant ResultBackend as Result Backend (Redis DB 1) + participant AsyncResult as result_add = AsyncResult(...) + + Client->>Task: Call add.delay(10, 20) + Task->>Broker: Send task message (task_id: 't1') + Task-->>Client: Return AsyncResult (id='t1') + + Worker->>Broker: Fetch message (task_id: 't1') + Worker->>Worker: Execute add(10, 20) -> returns 30 + Worker->>ResultBackend: Store result (key='t1', value={'status': 'SUCCESS', 'result': 30, ...}) + ResultBackend-->>Worker: Ack (Result stored) + Worker->>Broker: Ack message complete + + Client->>AsyncResult: Call result_add.get(timeout=10) + loop Check Backend Until Ready or Timeout + AsyncResult->>ResultBackend: Get result for key='t1' + ResultBackend-->>AsyncResult: Return {'status': 'SUCCESS', 'result': 30, ...} + end + AsyncResult-->>Client: Return 30 +``` + +## Code Dive: Storing and Retrieving Results + +* **Backend Loading (`celery/app/backends.py`):** When Celery starts, it uses the `result_backend` URL to look up the correct backend class (e.g., `RedisBackend`, `DatabaseBackend`, `RPCBackend`) using functions like `by_url` and `by_name`. These map URL schemes (`redis://`, `db+postgresql://`, `rpc://`) or aliases ('redis', 'db', 'rpc') to the actual Python classes. The mapping is defined in `BACKEND_ALIASES`. +* **Base Classes (`celery/backends/base.py`):** All result backends inherit from `BaseBackend`. Many common backends (like Redis, Memcached) inherit from `BaseKeyValueStoreBackend`, which provides common logic for storing results using keys. 
+* **Storing Result (`BaseKeyValueStoreBackend._store_result` in `celery/backends/base.py`):** This method (called by the worker) is responsible for actually saving the result. + + ```python + # Simplified from backends/base.py (inside BaseKeyValueStoreBackend) + def _store_result(self, task_id, result, state, + traceback=None, request=None, **kwargs): + # 1. Prepare the metadata dictionary + meta = self._get_result_meta(result=result, state=state, + traceback=traceback, request=request) + meta['task_id'] = bytes_to_str(task_id) # Ensure task_id is str + + # (Check if already successfully stored to prevent overwrites - omitted for brevity) + + # 2. Encode the metadata (e.g., to JSON or pickle) + encoded_meta = self.encode(meta) + + # 3. Get the specific key for this task + key = self.get_key_for_task(task_id) # e.g., b'celery-task-meta-' + + # 4. Call the specific backend's 'set' method (implemented by RedisBackend etc.) + # It might also set an expiry time (self.expires) + try: + self._set_with_state(key, encoded_meta, state) # Calls self.set(key, encoded_meta) + except Exception as exc: + # Handle potential storage errors, maybe retry + raise BackendStoreError(...) from exc + + return result # Returns the original (unencoded) result + ``` + The `self.set()` method is implemented by the concrete backend (e.g., `RedisBackend.set` uses `redis-py` client's `setex` or `set` command). + +* **Retrieving Result (`BaseBackend.wait_for` or `BaseKeyValueStoreBackend.get_many` in `celery/backends/base.py`):** When you call `AsyncResult.get()`, it often ends up calling `wait_for` or similar methods that poll the backend. + + ```python + # Simplified from backends/base.py (inside SyncBackendMixin) + def wait_for(self, task_id, + timeout=None, interval=0.5, no_ack=True, on_interval=None): + """Wait for task and return its result meta.""" + self._ensure_not_eager() # Check if running in eager mode + + time_elapsed = 0.0 + + while True: + # 1. 
Get metadata from backend (calls self._get_task_meta_for) + meta = self.get_task_meta(task_id) + + # 2. Check if the task is in a final state + if meta['status'] in states.READY_STATES: + return meta # Return the full metadata dict + + # 3. Call interval callback if provided + if on_interval: + on_interval() + + # 4. Sleep to avoid busy-waiting + time.sleep(interval) + time_elapsed += interval + + # 5. Check for timeout + if timeout and time_elapsed >= timeout: + raise TimeoutError('The operation timed out.') + ``` + The `self.get_task_meta(task_id)` eventually calls `self._get_task_meta_for(task_id)`, which in `BaseKeyValueStoreBackend` uses `self.get(key)` (e.g., `RedisBackend.get` uses `redis-py` client's `GET` command) and then decodes the result using `self.decode_result`. + +## Conclusion + +You've learned about the crucial **Result Backend**: + +* It acts as a **storage place** (like a filing cabinet or database) for task results and states. +* It's configured using the `result_backend` setting in your [Celery App](01_celery_app.md). +* The [Worker](05_worker.md) stores the outcome (success value or failure exception) in the backend after executing a [Task](03_task.md). +* You use the `AsyncResult` object (returned by `.delay()` or `.apply_async()`) and its methods (`.get()`, `.state`, `.ready()`) to query the backend using the task's unique ID. +* Various backend types exist (Redis, Database, RPC, etc.), each with different characteristics. + +Result backends allow your application to track the progress and outcome of background work. But what if you want tasks to run automatically at specific times or on a regular schedule, like sending a report every morning? That's where Celery's scheduler comes in. 
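Before moving on, the store-then-poll cycle from this chapter can be sketched with a plain dictionary in place of Redis. This is only an illustration under stated assumptions: `InMemoryBackend` and its method names are hypothetical stand-ins, not Celery's classes. Only the `celery-task-meta-` key prefix, the state names, and the polling idea come from the text above.

```python
import json
import threading
import time

READY_STATES = {"SUCCESS", "FAILURE"}  # final states, as described above


class InMemoryBackend:
    """Hypothetical stand-in for a key-value result backend (not Celery code)."""

    def __init__(self):
        self._data = {}  # plays the role of Redis DB 1

    def key_for(self, task_id):
        return f"celery-task-meta-{task_id}"

    def store_result(self, task_id, result, state):
        # Worker side: encode the metadata and store it under the task's key.
        meta = {"task_id": task_id, "status": state, "result": result}
        self._data[self.key_for(task_id)] = json.dumps(meta)

    def get_task_meta(self, task_id):
        # Client side: a missing key is reported as PENDING.
        raw = self._data.get(self.key_for(task_id))
        if raw is None:
            return {"task_id": task_id, "status": "PENDING", "result": None}
        return json.loads(raw)

    def wait_for(self, task_id, timeout=None, interval=0.05):
        # Poll until the task reaches a final state or the timeout expires.
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            meta = self.get_task_meta(task_id)
            if meta["status"] in READY_STATES:
                return meta
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("The operation timed out.")
            time.sleep(interval)


backend = InMemoryBackend()


def fake_worker():
    """Pretend to run add(10, 20) and store the outcome a moment later."""
    time.sleep(0.2)
    backend.store_result("t1", 10 + 20, "SUCCESS")


worker = threading.Thread(target=fake_worker)
worker.start()

meta = backend.wait_for("t1", timeout=5)  # blocks until the "worker" is done
worker.join()
print(meta["status"], meta["result"])
```

The client never talks to the worker directly. Both sides only agree on the key derived from the task ID, which is why the result can be fetched from any process that knows that ID and the backend URL.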
+ +**Next:** [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/07_beat__scheduler_.md b/output/Celery/07_beat__scheduler_.md new file mode 100644 index 0000000..adbd8a2 --- /dev/null +++ b/output/Celery/07_beat__scheduler_.md @@ -0,0 +1,329 @@ +# Chapter 7: Beat (Scheduler) - Celery's Alarm Clock + +In the last chapter, [Chapter 6: Result Backend](06_result_backend.md), we learned how to track the status and retrieve the results of our background tasks. This is great when we manually trigger tasks from our application. But what if we want tasks to run automatically, without us needing to press a button every time? + +Maybe you need to: +* Send out a newsletter email every Friday morning. +* Clean up temporary files in your system every night. +* Check the health of your external services every 5 minutes. + +How can you make Celery do these things on a regular schedule? Meet **Celery Beat**. + +## What Problem Does Beat Solve? + +Imagine you have a task, say `send_daily_report()`, that needs to run every morning at 8:00 AM. How would you achieve this? You could try setting up a system `cron` job to call a Python script that sends the Celery task, but that adds another layer of complexity. + +Celery provides its own built-in solution: **Beat**. + +**Beat is Celery's periodic task scheduler.** Think of it like a dedicated alarm clock or a `cron` job system built specifically for triggering Celery tasks. It's a separate program that you run alongside your workers. Its job is simple: + +1. Read a list of scheduled tasks (e.g., "run `send_daily_report` every day at 8:00 AM"). +2. Keep track of the time. +3. When the time comes for a scheduled task, Beat sends the task message to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), just as if you had called `.delay()` yourself. +4. 
A regular Celery [Worker](05_worker.md) then picks up the task from the broker and executes it. + +Beat doesn't run the tasks itself; it just *schedules* them by sending the messages at the right time. + +## Key Concepts + +1. **Beat Process:** A separate Celery program you run (like `celery -A your_app beat`). It needs access to your Celery app's configuration. +2. **Schedule:** A configuration setting (usually `beat_schedule` in your Celery config) that defines which tasks should run and when. This schedule can use simple intervals (like every 30 seconds) or cron-like patterns (like "every Monday at 9 AM"). +3. **Schedule Storage:** Beat needs to remember when each task was last run so it knows when it's due again. By default, it saves this information to a local file named `celerybeat-schedule` (using Python's `shelve` module). +4. **Ticker:** The heart of Beat. It's an internal loop that wakes up periodically, checks the schedule against the current time, and sends messages for any due tasks. + +## How to Use Beat + +Let's schedule two tasks: +* Our `add` task from [Chapter 3: Task](03_task.md) to run every 15 seconds. +* A new (dummy) task `send_report` to run every minute. + +**1. Define the Schedule in Configuration** + +The best place to define your schedule is in your configuration, either directly on the `app` object or in a separate `celeryconfig.py` file (see [Chapter 2: Configuration](02_configuration.md)). We'll use a separate file. 
+ +First, create the new task in your `tasks.py`: + +```python +# tasks.py (add this new task) +from celery_app import app +import time + +@app.task +def add(x, y): + """A simple task that adds two numbers.""" + print(f"Task 'add' starting with ({x}, {y})") + time.sleep(2) # Simulate short work + result = x + y + print(f"Task 'add' finished with result: {result}") + return result + +@app.task +def send_report(name): + """A task simulating sending a report.""" + print(f"Task 'send_report' starting for report: {name}") + time.sleep(5) # Simulate longer work + print(f"Report '{name}' supposedly sent.") + return f"Report {name} sent." +``` + +Now, update or create `celeryconfig.py`: + +```python +# celeryconfig.py +from datetime import timedelta +from celery.schedules import crontab + +# Basic Broker/Backend settings (replace with your actual URLs) +broker_url = 'redis://localhost:6379/0' +result_backend = 'redis://localhost:6379/1' +timezone = 'UTC' # Or your preferred timezone, e.g., 'America/New_York' +enable_utc = True + +# List of modules to import when the Celery worker starts. +# Make sure tasks.py is discoverable in your Python path +imports = ('tasks',) + +# Define the Beat schedule +beat_schedule = { + # Executes tasks.add every 15 seconds with arguments (16, 16) + 'add-every-15-seconds': { + 'task': 'tasks.add', # The task name + 'schedule': 15.0, # Run every 15 seconds (float or timedelta) + 'args': (16, 16), # Positional arguments for the task + }, + # Executes tasks.send_report every minute + 'send-report-every-minute': { + 'task': 'tasks.send_report', + 'schedule': crontab(), # Use crontab() for "every minute" + 'args': ('daily-summary',), # Argument for the report name + # Example using crontab for more specific timing: + # 'schedule': crontab(hour=8, minute=0, day_of_week='fri'), # Every Friday at 8:00 AM + }, +} +``` + +**Explanation:** + +* `from datetime import timedelta`: Used for simple interval schedules. 
+* `from celery.schedules import crontab`: Used for cron-like scheduling. +* `imports = ('tasks',)`: Ensures the worker and beat know about the tasks defined in `tasks.py`. +* `beat_schedule = {...}`: This dictionary holds all your scheduled tasks. + * Each key (`'add-every-15-seconds'`, `'send-report-every-minute'`) is a unique name for the schedule entry. + * Each value is another dictionary describing the schedule: + * `'task'`: The full name of the task to run (e.g., `'module_name.task_name'`). + * `'schedule'`: Defines *when* to run. + * A `float` or `int`: number of seconds between runs. + * A `timedelta` object: the time interval between runs. + * A `crontab` object: for complex schedules (minute, hour, day_of_week, etc.). `crontab()` with no arguments means "every minute". + * `'args'`: A tuple of positional arguments to pass to the task. + * `'kwargs'`: (Optional) A dictionary of keyword arguments to pass to the task. + * `'options'`: (Optional) A dictionary of execution options like `queue`, `priority`. + +**2. Load the Configuration in Your App** + +Make sure your `celery_app.py` loads this configuration: + +```python +# celery_app.py +from celery import Celery + +# Create the app instance +app = Celery('tasks') + +# Load configuration from the 'celeryconfig' module +app.config_from_object('celeryconfig') + +# Tasks might be defined here, but we put them in tasks.py +# which is loaded via the 'imports' setting in celeryconfig.py +``` + +**3. Run Celery Beat** + +Now, open a terminal and run the Beat process. You need to tell it where your app is (`-A celery_app`): + +```bash +# In your terminal +celery -A celery_app beat --loglevel=info +``` + +**Explanation:** + +* `celery`: The Celery command-line tool. +* `-A celery_app`: Points to your app instance (in `celery_app.py`). +* `beat`: Tells Celery to start the scheduler process. +* `--loglevel=info`: Shows informational messages about what Beat is doing. 
+ +You'll see output similar to this: + +```text +celery beat v5.x.x is starting. +__ - ... __ - _ +LocalTime -> 2023-10-27 11:00:00 +Configuration -> + . broker -> redis://localhost:6379/0 + . loader -> celery.loaders.app.AppLoader + . scheduler -> celery.beat.PersistentScheduler + . db -> celerybeat-schedule + . logfile -> [stderr]@INFO + . maxinterval -> 300.0s (5m0s) +celery beat v5.x.x has started. +``` + +Beat is now running! It will check the schedule and: +* Every 15 seconds, it will send a message to run `tasks.add(16, 16)`. +* Every minute, it will send a message to run `tasks.send_report('daily-summary')`. + +**4. Run a Worker (Crucial!)** + +Beat only *sends* the task messages. You still need a [Worker](05_worker.md) running to actually *execute* the tasks. Open **another terminal** and start a worker: + +```bash +# In a SECOND terminal +celery -A celery_app worker --loglevel=info +``` + +Now, watch the output in the **worker's terminal**. You should see logs appearing periodically as the worker receives and executes the tasks sent by Beat: + +```text +# Output in the WORKER terminal (example) +[2023-10-27 11:00:15,000: INFO/MainProcess] Task tasks.add[task-id-1] received +Task 'add' starting with (16, 16) +Task 'add' finished with result: 32 +[2023-10-27 11:00:17,050: INFO/MainProcess] Task tasks.add[task-id-1] succeeded in 2.05s: 32 + +[2023-10-27 11:01:00,000: INFO/MainProcess] Task tasks.send_report[task-id-2] received +Task 'send_report' starting for report: daily-summary +[2023-10-27 11:01:00,000: INFO/MainProcess] Task tasks.add[task-id-3] received # Another 'add' task might arrive while 'send_report' runs +Task 'add' starting with (16, 16) +Task 'add' finished with result: 32 +[2023-10-27 11:01:02,050: INFO/MainProcess] Task tasks.add[task-id-3] succeeded in 2.05s: 32 +Report 'daily-summary' supposedly sent. +[2023-10-27 11:01:05,100: INFO/MainProcess] Task tasks.send_report[task-id-2] succeeded in 5.10s: "Report daily-summary sent." +... 
and so on ... +``` + +You have successfully set up scheduled tasks! + +## How It Works Internally (Simplified) + +1. **Startup:** You run `celery -A celery_app beat`. The Beat process starts. +2. **Load Config:** It loads the Celery app (`celery_app`) and reads its configuration, paying special attention to `beat_schedule`. +3. **Load State:** It opens the schedule file (e.g., `celerybeat-schedule`) to see when each task was last run. If the file doesn't exist, it creates it. +4. **Main Loop (Tick):** Beat enters its main loop (the "ticker"). +5. **Calculate Due Tasks:** In each tick, Beat looks at every entry in `beat_schedule`. For each entry, it compares the current time with the task's `schedule` definition and its `last_run_at` time (from the schedule file). It calculates which tasks are due to run *right now*. +6. **Send Task Message:** If a task (e.g., `add-every-15-seconds`) is due, Beat constructs a task message (containing `'tasks.add'`, `args=(16, 16)`, etc.) just like `.delay()` would. It sends this message to the configured **Broker**. +7. **Update State:** Beat updates the `last_run_at` time for the task it just sent in its internal state and saves this back to the schedule file. +8. **Sleep:** Beat calculates the time until the *next* scheduled task is due and sleeps for that duration (or up to a maximum interval, `beat_max_loop_interval`, usually 5 minutes, whichever is shorter). +9. **Repeat:** Go back to step 5. + +Meanwhile, a **Worker** process is connected to the same **Broker**, picks up the task messages sent by Beat, and executes them. 
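The loop in steps 4 to 9 can be sketched in plain Python. This is a simplified model, not Celery's implementation: the `Entry` class and the `tick` function below are hypothetical, the broker is a list, both entries use fixed intervals (the real `crontab()` fires on minute boundaries), and a fake clock replaces real sleeping so the example finishes instantly.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical schedule entry: a task name, args, and a fixed interval."""
    name: str
    task: str
    args: tuple
    interval: float     # seconds between runs
    last_run_at: float  # time of the previous run

    def is_due(self, now):
        """Return (is_due, seconds_until_next_check)."""
        remaining = self.last_run_at + self.interval - now
        if remaining <= 0:
            return True, self.interval
        return False, remaining


def tick(entries, now, broker, max_interval=300.0):
    """One pass of the loop: send due tasks, return how long to sleep."""
    sleep_for = max_interval
    for entry in entries:
        due, next_check = entry.is_due(now)
        if due:
            broker.append((entry.task, entry.args))  # step 6: send the message
            entry.last_run_at = now                  # step 7: update state
        sleep_for = min(sleep_for, next_check)       # step 8: shortest wait wins
    return sleep_for


entries = [
    Entry("add-every-15-seconds", "tasks.add", (16, 16), 15.0, last_run_at=0.0),
    Entry("send-report-every-minute", "tasks.send_report",
          ("daily-summary",), 60.0, last_run_at=0.0),
]
broker = []  # stands in for the message broker queue

# Simulate the first 60 seconds by advancing a fake clock.
now = 0.0
while now <= 60.0:
    now += tick(entries, now, broker)

sent = [task for task, _ in broker]
print(sent.count("tasks.add"), sent.count("tasks.send_report"))  # prints: 4 1
```

Note that `tick` never executes a task. It only appends messages to the queue, which mirrors the division of labour between Beat and the workers.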
+ +```mermaid +sequenceDiagram + participant Beat as Celery Beat Process + participant ScheduleCfg as beat_schedule Config + participant ScheduleDB as celerybeat-schedule File + participant Broker as Message Broker + participant Worker as Celery Worker + + Beat->>ScheduleCfg: Load schedule definitions on startup + Beat->>ScheduleDB: Load last run times on startup + + loop Tick Loop (e.g., every second or more) + Beat->>Beat: Check current time + Beat->>ScheduleCfg: Get definition for 'add-every-15' + Beat->>ScheduleDB: Get last run time for 'add-every-15' + Beat->>Beat: Calculate if 'add-every-15' is due now + alt Task 'add-every-15' is due + Beat->>Broker: Send task message('tasks.add', (16, 16)) + Broker-->>Beat: Ack (Message Queued) + Beat->>ScheduleDB: Update last run time for 'add-every-15' + ScheduleDB-->>Beat: Ack (Saved) + end + Beat->>Beat: Calculate time until next task is due + Beat->>Beat: Sleep until next check + end + + Worker->>Broker: Fetch task message ('tasks.add', ...) + Broker-->>Worker: Deliver message + Worker->>Worker: Execute task add(16, 16) + Worker->>Broker: Ack message complete +``` + +## Code Dive: Where Beat Lives + +* **Command Line (`celery/bin/beat.py`):** Handles the `celery beat` command, parses arguments (`-A`, `-s`, `-S`, `--loglevel`), and creates/runs the `Beat` service object. +* **Beat Service Runner (`celery/apps/beat.py`):** The `Beat` class sets up the environment, loads the app, initializes logging, creates the actual scheduler service (`celery.beat.Service`), installs signal handlers, and starts the service. +* **Beat Service (`celery/beat.py:Service`):** This class manages the lifecycle of the scheduler. Its `start()` method contains the main loop that repeatedly calls `scheduler.tick()`. It loads the scheduler class specified in the configuration (defaulting to `PersistentScheduler`). +* **Scheduler (`celery/beat.py:Scheduler` / `PersistentScheduler`):** This is the core logic. + * `Scheduler` is the base class. 
Its `tick()` method calculates the time until the next event, finds due tasks, calls `apply_entry` for due tasks, and returns the sleep interval. + * `PersistentScheduler` inherits from `Scheduler` and adds the logic to load/save the schedule state (last run times) using `shelve` (the `celerybeat-schedule` file). It overrides methods like `setup_schedule`, `sync`, `close`, and `schedule` property to interact with the `shelve` store (`self._store`). +* **Schedule Types (`celery/schedules.py`):** Defines classes like `schedule` (for `timedelta` intervals) and `crontab`. These classes implement the `is_due(last_run_at)` method, which the `Scheduler.tick()` method uses to determine if a task entry should run. + +A simplified conceptual look at the `beat_schedule` config structure: + +```python +# Example structure from celeryconfig.py + +beat_schedule = { + 'schedule-name-1': { # Unique name for this entry + 'task': 'my_app.tasks.task1', # Task to run (module.task_name) + 'schedule': 30.0, # When to run (e.g., seconds, timedelta, crontab) + 'args': (arg1, arg2), # Optional: Positional arguments + 'kwargs': {'key': 'value'}, # Optional: Keyword arguments + 'options': {'queue': 'hipri'},# Optional: Execution options + }, + 'schedule-name-2': { + 'task': 'my_app.tasks.task2', + 'schedule': crontab(minute=0, hour=0), # e.g., Run at midnight + # ... other options ... + }, +} +``` + +And a very simplified concept of the `Scheduler.tick()` method: + +```python +# Simplified conceptual logic of Scheduler.tick() + +def tick(self): + remaining_times = [] + due_tasks = [] + + # 1. Iterate through schedule entries + for entry in self.schedule.values(): # self.schedule reads from PersistentScheduler._store['entries'] + # 2. 
Check if entry is due using its schedule object (e.g., crontab) + is_due, next_time_to_run = entry.is_due() # Calls schedule.is_due(entry.last_run_at) + + if is_due: + due_tasks.append(entry) + else: + remaining_times.append(next_time_to_run) # Store time until next check + + # 3. Apply due tasks (send message to broker) + for entry in due_tasks: + self.apply_entry(entry) # Sends task message and updates entry's last_run_at in schedule store + + # 4. Calculate minimum sleep time until next event + return min(remaining_times + [self.max_interval]) +``` + +## Conclusion + +Celery Beat is your tool for automating task execution within the Celery ecosystem. + +* It acts as a **scheduler**, like an alarm clock or `cron` for Celery tasks. +* It runs as a **separate process** (`celery beat`). +* You define the schedule using the `beat_schedule` setting in your configuration, specifying **what** tasks run, **when** (using intervals or crontabs), and with what **arguments**. +* Beat **sends task messages** to the broker at the scheduled times. +* Running **Workers** are still required to pick up and execute these tasks. + +Beat allows you to reliably automate recurring background jobs, from simple periodic checks to complex, time-specific operations. + +Now that we know how to run individual tasks, get their results, and schedule them automatically, what if we want to create more complex workflows involving multiple tasks that depend on each other? That's where Celery's Canvas comes in. 
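Before moving on, here is a small sketch of why Beat keeps a schedule file at all. It uses Python's `shelve` module, as the default scheduler does, but everything else is an assumption for illustration: the helper names are hypothetical, the file lives in a temporary directory, and the store holds plain timestamps rather than Celery's real entry objects.

```python
import os
import shelve
import tempfile

# Use a temporary directory so the sketch never touches a real schedule file.
workdir = tempfile.mkdtemp()
schedule_path = os.path.join(workdir, "beat-schedule-demo")


def load_last_runs(path):
    """Return the saved {entry_name: last_run_at} mapping (empty on first start)."""
    with shelve.open(path) as store:
        return dict(store.get("entries", {}))


def save_last_runs(path, last_runs):
    """Persist the mapping so a restarted process can pick up where it left off."""
    with shelve.open(path) as store:
        store["entries"] = last_runs


# First "start": the file is created and nothing is stored yet.
first_start = load_last_runs(schedule_path)

# The scheduler sends a task and records when it did so (arbitrary timestamp).
save_last_runs(schedule_path, {"add-every-15-seconds": 1000.0})

# Second "start", for example after a restart: the last run time survived.
second_start = load_last_runs(schedule_path)
print(first_start, second_start)
```

Without this persisted state, a restarted scheduler would have no record of previous runs and could not tell whether an entry had just fired or was long overdue.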
+ +**Next:** [Chapter 8: Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/08_canvas__signatures___primitives_.md b/output/Celery/08_canvas__signatures___primitives_.md new file mode 100644 index 0000000..033435d --- /dev/null +++ b/output/Celery/08_canvas__signatures___primitives_.md @@ -0,0 +1,343 @@ +# Chapter 8: Canvas (Signatures & Primitives) - Building Task Workflows + +In the previous chapter, [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md), we learned how to schedule tasks to run automatically at specific times using Celery Beat. This is great for recurring jobs. But what if you need to run a sequence of tasks, where one task depends on the result of another? Or run multiple tasks in parallel and then collect their results? + +Imagine you're building a feature where a user uploads an article, and you need to: +1. Fetch the article content from a URL. +2. Process the text to extract keywords. +3. Process the text to detect the language. +4. Once *both* processing steps are done, save the article and the extracted metadata to your database. + +Simply running these tasks independently won't work. Keyword extraction and language detection can happen at the same time, but only *after* the content is fetched. Saving can only happen *after* both processing steps are complete. How do you orchestrate this multi-step workflow? + +This is where **Celery Canvas** comes in. It provides the building blocks to design complex task workflows. + +## What Problem Does Canvas Solve? + +Canvas helps you connect individual [Task](03_task.md)s together to form more sophisticated processes. It solves the problem of defining dependencies and flow control between tasks. 
Instead of just firing off tasks one by one and hoping they complete in the right order or manually checking results, Canvas lets you declare the desired workflow structure directly. + +Think of it like having different types of Lego bricks: +* Some bricks represent a single task. +* Other bricks let you connect tasks end-to-end (run in sequence). +* Some let you stack bricks side-by-side (run in parallel). +* Others let you build a structure where several parallel steps must finish before the next piece is added. + +Canvas gives you these connecting bricks for your Celery tasks. + +## Key Concepts: Signatures and Primitives + +The core ideas in Canvas are **Signatures** and **Workflow Primitives**. + +1. **Signature (`signature` or `.s()`): The Basic Building Block** + * A `Signature` wraps up everything needed to call a single task: the task's name, the arguments (`args`), the keyword arguments (`kwargs`), and any execution options (like `countdown`, `eta`, queue name). + * Think of it as a **pre-filled request form** or a **recipe card** for a specific task execution. It doesn't *run* the task immediately; it just holds the plan for running it. + * The easiest way to create a signature is using the `.s()` shortcut on a task function. + + ```python + # tasks.py + from celery_app import app # Assuming app is defined in celery_app.py + + @app.task + def add(x, y): + return x + y + + # Create a signature for add(2, 3) + add_sig = add.s(2, 3) + + # add_sig now holds the 'plan' to run add(2, 3) + print(f"Signature: {add_sig}") + print(f"Task name: {add_sig.task}") + print(f"Arguments: {add_sig.args}") + + # To actually run it, you call .delay() or .apply_async() ON the signature + # result_promise = add_sig.delay() + ``` + + **Output:** + ```text + Signature: tasks.add(2, 3) + Task name: tasks.add + Arguments: (2, 3) + ``` + +2. 
**Primitives: Connecting the Blocks** + Canvas provides several functions (primitives) to combine signatures into workflows: + + * **`chain`:** Links tasks sequentially. The result of the first task is passed as the first argument to the second task, and so on. + * Analogy: An assembly line where each station passes its output to the next. + * Syntax: `(sig1 | sig2 | sig3)` or `chain(sig1, sig2, sig3)` + + * **`group`:** Runs a list of tasks in parallel. It returns a special result object that helps track the group. + * Analogy: Hiring several workers to do similar jobs independently at the same time. + * Syntax: `group(sig1, sig2, sig3)` + + * **`chord`:** Runs a group of tasks in parallel (the "header"), and *then*, once *all* tasks in the group have finished successfully, it runs a single callback task (the "body") with the results of the header tasks. + * Analogy: A team of researchers works on different parts of a project in parallel. Once everyone is done, a lead researcher collects all the findings to write the final report. + * Syntax: `chord(group(header_sigs), body_sig)` + +There are other primitives like `chunks`, `xmap`, and `starmap`, but `chain`, `group`, and `chord` are the most fundamental ones for building workflows. + +## How to Use Canvas: Building the Article Processing Workflow + +Let's build the workflow we described earlier: Fetch -> (Process Keywords & Detect Language in parallel) -> Save. + +**1. Define the Tasks** + +First, we need our basic tasks. 
Let's create dummy versions in `tasks.py`: + +```python +# tasks.py +from celery_app import app +import time +import random + +@app.task +def fetch_data(url): + print(f"Fetching data from {url}...") + time.sleep(1) + # Simulate fetching some data + data = f"Content from {url} - {random.randint(1, 100)}" + print(f"Fetched: {data}") + return data + +@app.task +def process_part_a(data): + print(f"Processing Part A for: {data}") + time.sleep(2) + result_a = f"Keywords for '{data}'" + print("Part A finished.") + return result_a + +@app.task +def process_part_b(data): + print(f"Processing Part B for: {data}") + time.sleep(3) # Simulate slightly longer processing + result_b = f"Language for '{data}'" + print("Part B finished.") + return result_b + +@app.task +def combine_results(results): + # 'results' will be a list containing the return values + # of process_part_a and process_part_b + print(f"Combining results: {results}") + time.sleep(1) + final_output = f"Combined: {results[0]} | {results[1]}" + print(f"Final Output: {final_output}") + return final_output +``` + +**2. Define the Workflow Using Canvas** + +Now, in a separate script or Python shell, let's define the workflow using signatures and primitives. + +```python +# run_workflow.py +from celery import chain, group, chord +from tasks import fetch_data, process_part_a, process_part_b, combine_results + +# The URL we want to process +article_url = "http://example.com/article1" + +# Create the workflow structure +# 1. Fetch data. The result (data) is passed to the next step. +# 2. The next step is a chord: +# - Header: A group running process_part_a and process_part_b in parallel. +# Both tasks receive the 'data' from fetch_data. +# - Body: combine_results receives a list of results from the group. 
+ +workflow = chain( + fetch_data.s(article_url), # Step 1: Fetch + chord( # Step 2: Chord + group(process_part_a.s(), process_part_b.s()), # Header: Parallel processing + combine_results.s() # Body: Combine results + ) +) + +print(f"Workflow definition:\n{workflow}") + +# Start the workflow +print("\nSending workflow to Celery...") +result_promise = workflow.apply_async() + +print(f"Workflow sent! Final result ID: {result_promise.id}") +print("Run a Celery worker to execute the tasks.") +# You can optionally wait for the final result: +# final_result = result_promise.get() +# print(f"\nWorkflow finished! Final result: {final_result}") +``` + +**Explanation:** + +* We import `chain`, `group`, `chord` from `celery`. +* We import our task functions. +* `fetch_data.s(article_url)`: Creates a signature for the first step. +* `process_part_a.s()` and `process_part_b.s()`: Create signatures for the parallel tasks. Note that we *don't* provide the `data` argument here. `chain` automatically passes the result of `fetch_data` to the *next* task in the sequence. Since the next task is a `chord` containing a `group`, Celery cleverly passes the `data` to *each* task within that group. +* `combine_results.s()`: Creates the signature for the final step (the chord's body). It doesn't need arguments initially because the `chord` will automatically pass the list of results from the header group to it. +* `chain(...)`: Connects `fetch_data` to the `chord`. +* `chord(group(...), ...)`: Defines that the group must finish before `combine_results` is called. +* `group(...)`: Defines that `process_part_a` and `process_part_b` run in parallel. +* `workflow.apply_async()`: This sends the *first* task (`fetch_data`) to the broker. The rest of the workflow is encoded in the task's options (like `link` or `chord` information) so that Celery knows what to do next after each step completes. 
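To see the data flow without a broker, the same structure can be mimicked with ordinary functions. This is a sketch under stated assumptions, not how Celery executes canvases: `sig`, `call`, `run_group`, and `run_chord` are hypothetical helpers, a thread pool stands in for parallel workers, and the task functions are simplified copies without sleeping or randomness.

```python
from concurrent.futures import ThreadPoolExecutor


def sig(func, *args):
    """Hypothetical mini-signature: a function plus pre-filled arguments."""
    return (func, args)


def call(signature, *prepended):
    """Run a signature, prepending values handed over by the previous step."""
    func, args = signature
    return func(*prepended, *args)


def run_group(signatures, *prepended):
    """Run every signature in parallel; results keep the header order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(call, s, *prepended) for s in signatures]
        return [f.result() for f in futures]


def run_chord(header, body, *prepended):
    """Run the header group, then pass the list of results to the body."""
    return call(body, run_group(header, *prepended))


# Plain-function versions of the tutorial tasks.
def fetch_data(url):
    return f"Content from {url}"


def process_part_a(data):
    return f"Keywords for '{data}'"


def process_part_b(data):
    return f"Language for '{data}'"


def combine_results(results):
    return f"Combined: {results[0]} | {results[1]}"


# Step 1 of the chain: fetch. Step 2: the chord receives the fetched data,
# hands it to each header task, and gives the list of results to the body.
data = call(sig(fetch_data, "http://example.com/article1"))
final = run_chord(
    [sig(process_part_a), sig(process_part_b)],
    sig(combine_results),
    data,
)
print(final)
```

The sketch only mirrors how arguments travel between steps. The real `run_workflow.py` above still needs a broker and a worker, and Celery carries the remaining steps inside the task messages instead of calling them from one process.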
+ +If you run this script (and have a [Worker](05_worker.md) running), you'll see the tasks execute in the worker logs, respecting the defined dependencies and parallelism. `fetch_data` runs first, then `process_part_a` and `process_part_b` run concurrently, and finally `combine_results` runs after both A and B are done. + +## How It Works Internally (Simplified Walkthrough) + +Let's trace a simpler workflow: `my_chain = (add.s(2, 2) | add.s(4))` + +1. **Workflow Definition:** When you create `my_chain`, Celery creates a `chain` object containing the signatures `add.s(2, 2)` and `add.s(4)`. +2. **Sending (`my_chain.apply_async()`):** + * Celery looks at the first task in the chain: `add.s(2, 2)`. + * It prepares to send this task message to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md). + * Crucially, it adds a special option to the message, often called `link` (or uses the `chain` field in newer protocols). This option contains the *signature* of the next task in the chain: `add.s(4)`. + * The message for `add(2, 2)` (with the link to `add(4)`) is sent to the broker. +3. **Worker 1 Executes First Task:** + * A [Worker](05_worker.md) picks up the message for `add(2, 2)`. + * It runs the `add` function with arguments `(2, 2)`. The result is `4`. + * The worker stores the result `4` in the [Result Backend](06_result_backend.md) (if configured). + * The worker notices the `link` option in the original message, pointing to `add.s(4)`. +4. **Worker 1 Sends Second Task:** + * The worker takes the result of the first task (`4`). + * It uses the linked signature `add.s(4)`. + * It *prepends* the result (`4`) to the arguments of the linked signature, making it effectively `add.s(4, 4)`. *(Note: The original `4` in `add.s(4)` came from the chain definition, the first `4` is the result)*. + * It sends a *new* message to the broker for `add(4, 4)`. +5. **Worker 2 Executes Second Task:** + * Another (or the same) worker picks up the message for `add(4, 4)`. 
+ * It runs `add(4, 4)`. The result is `8`. + * It stores the result `8` in the backend. + * There are no more links, so the chain is complete. + +`group` works by sending all task messages in the group concurrently. `chord` is more complex; it involves the workers coordinating via the [Result Backend](06_result_backend.md) to count completed tasks in the header before the callback task is finally sent. + +```mermaid +sequenceDiagram + participant Client as Your Code + participant Canvas as workflow = chain(...) + participant Broker as Message Broker + participant Worker as Celery Worker + + Client->>Canvas: workflow.apply_async() + Note over Canvas: Prepare msg for add(2, 2) with link=add.s(4) + Canvas->>Broker: Send Task 1 msg ('add', (2, 2), link=add.s(4), id=T1) + Broker-->>Canvas: Ack + Canvas-->>Client: Return AsyncResult(id=T2) # ID of the *last* task in chain + + Worker->>Broker: Fetch msg (T1) + Broker-->>Worker: Deliver Task 1 msg + Worker->>Worker: Execute add(2, 2) -> returns 4 + Note over Worker: Store result 4 for T1 in Backend + Worker->>Worker: Check 'link' option -> add.s(4) + Note over Worker: Prepare msg for add(4, 4) using result 4 + linked args + Worker->>Broker: Send Task 2 msg ('add', (4, 4), id=T2) + Broker-->>Worker: Ack + Worker->>Broker: Ack Task 1 msg complete + + Worker->>Broker: Fetch msg (T2) + Broker-->>Worker: Deliver Task 2 msg + Worker->>Worker: Execute add(4, 4) -> returns 8 + Note over Worker: Store result 8 for T2 in Backend + Worker->>Broker: Ack Task 2 msg complete +``` + +## Code Dive: Canvas Implementation + +The logic for signatures and primitives resides primarily in `celery/canvas.py`. + +* **`Signature` Class:** + * Defined in `celery/canvas.py`. It's essentially a dictionary subclass holding `task`, `args`, `kwargs`, `options`, etc. + * The `.s()` method on a `Task` instance (in `celery/app/task.py`) is a shortcut to create a `Signature`. 
+ * `apply_async`: Prepares arguments/options by calling `_merge` and then delegates to `self.type.apply_async` (the task's method) or `app.send_task`. + * `link`, `link_error`: Methods that modify the `options` dictionary to add callbacks. + * `__or__`: The pipe operator (`|`) overload. It checks the type of the right-hand operand (`other`) and constructs a `_chain` object accordingly. + + ```python + # Simplified from celery/canvas.py + class Signature(dict): + # ... methods like __init__, clone, set, apply_async ... + + def link(self, callback): + # Appends callback signature to the 'link' list in options + return self.append_to_list_option('link', callback) + + def link_error(self, errback): + # Appends errback signature to the 'link_error' list in options + return self.append_to_list_option('link_error', errback) + + def __or__(self, other): + # Called when you use the pipe '|' operator + if isinstance(other, Signature): + # task | task -> chain + return _chain(self, other, app=self._app) + # ... other cases for group, chain ... + return NotImplemented + ``` + +* **`_chain` Class:** + * Also in `celery/canvas.py`, inherits from `Signature`. Its `task` name is hardcoded to `'celery.chain'`. The actual task signatures are stored in `kwargs['tasks']`. + * `apply_async` / `run`: Contains the logic to handle sending the first task with the rest of the chain embedded in the options (either via `link` for protocol 1 or the `chain` message property for protocol 2). + * `prepare_steps`: This complex method recursively unwraps nested primitives (like a chain within a chain, or a group that needs to become a chord) and sets up the linking between steps. + + ```python + # Simplified concept from celery/canvas.py (chain execution) + class _chain(Signature): + # ... __init__, __or__ ... + + def apply_async(self, args=None, kwargs=None, **options): + # ... handle always_eager ... 
+ return self.run(args, kwargs, app=self.app, **options) + + def run(self, args=None, kwargs=None, app=None, **options): + # ... setup ... + tasks, results = self.prepare_steps(...) # Unroll and freeze tasks + + if results: # If there are tasks to run + first_task = tasks.pop() # Get the first task (list is reversed) + remaining_chain = tasks if tasks else None + + # Determine how to pass the chain info (link vs. message field) + use_link = self._use_link # ... logic to decide ... + + if use_link: + # Protocol 1: Link first task to the second task + if remaining_chain: + first_task.link(remaining_chain.pop()) + # (Worker handles subsequent links) + options_to_apply = options # Pass original options + else: + # Protocol 2: Embed the rest of the reversed chain in options + options_to_apply = ChainMap({'chain': remaining_chain}, options) + + # Send the *first* task only + result_from_apply = first_task.apply_async(**options_to_apply) + # Return AsyncResult of the *last* task in the original chain + return results[0] + ``` + +* **`group` Class:** + * In `celery/canvas.py`. Its `task` name is `'celery.group'`. + * `apply_async`: Iterates through its `tasks`, freezes each one (assigning a common `group_id`), sends their messages, and collects the `AsyncResult` objects into a `GroupResult`. It uses a `barrier` (from the `vine` library) to track completion. +* **`chord` Class:** + * In `celery/canvas.py`. Its `task` name is `'celery.chord'`. + * `apply_async` / `run`: Coordinates with the result backend (`backend.apply_chord`). It typically runs the header `group` first, configuring it to notify the backend upon completion. The backend then triggers the `body` task once the count is reached. + +## Conclusion + +Celery Canvas transforms simple tasks into powerful workflow components. + +* A **Signature** (`task.s()`) captures the details for a single task call without running it. 
+* Primitives like **`chain`** (`|`), **`group`**, and **`chord`** combine signatures to define complex execution flows: + * `chain`: Sequence (output of one to input of next). + * `group`: Parallel execution. + * `chord`: Parallel execution followed by a callback with all results. +* You compose these primitives like building with Lego bricks to model your application's logic. +* Calling `.apply_async()` on a workflow primitive starts the process by sending the first task(s), embedding the rest of the workflow logic in the task options or using backend coordination. + +Canvas allows you to move complex orchestration logic out of your application code and into Celery, making your tasks more modular and your overall system more robust. + +Now that you can build and run complex workflows, how do you monitor what's happening inside Celery? How do you know when tasks start, finish, or fail in real-time? + +**Next:** [Chapter 9: Events](09_events.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/09_events.md b/output/Celery/09_events.md new file mode 100644 index 0000000..df1c6fd --- /dev/null +++ b/output/Celery/09_events.md @@ -0,0 +1,310 @@ +# Chapter 9: Events - Listening to Celery's Heartbeat + +In [Chapter 8: Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md), we saw how to build complex workflows by chaining tasks together or running them in parallel. But as your Celery system gets busier, you might wonder: "What are my workers doing *right now*? Which tasks have started? Which ones finished successfully or failed?" + +Imagine you're running an important data processing job involving many tasks. Wouldn't it be great to have a live dashboard showing the progress, or get immediate notifications if something goes wrong? This is where **Celery Events** come in. + +## What Problem Do Events Solve? 
+ +Celery Events provide a **real-time monitoring system** for your tasks and workers. Think of it like a live activity log or a notification system built into Celery. + +Without events, finding out what happened requires checking logs or querying the [Result Backend](06_result_backend.md) for each task individually. This isn't ideal for getting a live overview of the entire cluster. + +Events solve this by having workers broadcast messages (events) about important actions they take, such as: +* A worker coming online or going offline. +* A worker receiving a task. +* A worker starting to execute a task. +* A task succeeding or failing. +* A worker sending out a heartbeat signal. + +Other programs can then listen to this stream of event messages to monitor the health and activity of the Celery cluster in real-time, build dashboards (like the popular tool Flower), or trigger custom alerts. + +## Key Concepts + +1. **Events:** Special messages sent by workers (and sometimes clients) describing an action. Each event has a `type` (e.g., `task-received`, `worker-online`) and contains details relevant to that action (like the task ID, worker hostname, timestamp). +2. **Event Exchange:** Events aren't sent to the regular task queues. They are published to a dedicated, named exchange on the [Broker Connection (AMQP)](04_broker_connection__amqp_.md). Think of it as a separate broadcast channel just for monitoring messages. +3. **Event Sender (`EventDispatcher`):** A component within the [Worker](05_worker.md) responsible for creating and sending event messages to the broker's event exchange. This is usually disabled by default for performance reasons. +4. **Event Listener (`EventReceiver`):** Any program that connects to the event exchange on the broker and consumes the stream of event messages. This could be the `celery events` command-line tool, Flower, or your own custom monitoring script. +5. **Event Types:** Celery defines many event types. 
Some common ones include: + * `worker-online`, `worker-offline`, `worker-heartbeat`: Worker status updates. + * `task-sent`: Client sent a task request (requires the `task_send_sent_event` setting). + * `task-received`: Worker received the task message. + * `task-started`: Worker started executing the task code. + * `task-succeeded`: Task finished successfully. + * `task-failed`: Task failed with an error. + * `task-retried`: Task is being retried. + * `task-revoked`: Task was cancelled/revoked. + +## How to Use Events: Simple Monitoring + +Let's see how to enable events and watch the live stream using Celery's built-in tool. + +**1. Enable Events in the Worker** + +By default, workers don't send task events, which saves resources. You need to explicitly tell them to start sending. You can do this in two main ways: + +* **Command-line flag (`-E`):** When starting your worker, add the `-E` flag. + + ```bash + # Start a worker AND enable sending events + celery -A celery_app worker --loglevel=info -E + ``` + +* **Configuration Setting:** Set `worker_send_task_events = True` in your Celery configuration ([Chapter 2: Configuration](02_configuration.md)). This is useful if you always want events enabled for workers using that configuration. + + ```python + # celeryconfig.py (example) + broker_url = 'redis://localhost:6379/0' + result_backend = 'redis://localhost:6379/1' + imports = ('tasks',) + + # Enable sending task-related events + task_send_sent_event = False # Optional: set to True if you also want task-sent events + worker_send_task_events = True + ``` + +Now, any worker started with this configuration (or the `-E` flag) will publish events to the broker. + +**2. 
Watch the Event Stream** + +Celery provides a command-line tool called `celery events`. Run with the `--dump` option, it acts as a simple event listener and prints each event it receives to your console (without `--dump`, it opens a curses-based monitor instead). + +Open **another terminal** (while your worker with events enabled is running) and run: + +```bash +# Dump events associated with your app to the console +celery -A celery_app events --dump +``` + +If your workers are already running without `-E`, you can also turn events on at runtime with `celery -A celery_app control enable_events`, and off again with `celery -A celery_app control disable_events`. + +**What You'll See:** + +Initially, `celery events` might show nothing. Now, try sending a task from another script or shell (like the `run_tasks.py` from [Chapter 3: Task](03_task.md)): + +```python +# In a third terminal/shell +from tasks import add +result = add.delay(5, 10) +print(f"Sent task {result.id}") +``` + +Switch back to the terminal running `celery events`. You should see output similar to this (details and timestamps will vary): + +```text +-> celery events v5.x.x +-> connected to redis://localhost:6379/0 + +-------------- task-received celery@myhostname [2022-10-27 12:00:01.100] + uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef + name:tasks.add + args:[5, 10] + kwargs:{} + retries:0 + eta:null + hostname:celery@myhostname + timestamp:1666872001.1 + pid:12345 + ... + +-------------- task-started celery@myhostname [2022-10-27 12:00:01.150] + uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef + hostname:celery@myhostname + timestamp:1666872001.15 + pid:12345 + ... + +-------------- task-succeeded celery@myhostname [2022-10-27 12:00:04.200] + uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef + result:'15' + runtime:3.05 + hostname:celery@myhostname + timestamp:1666872004.2 + pid:12345 + ... +``` + +**Explanation:** + +* `celery events` connects to the broker defined in `celery_app`. +* It listens for messages on the event exchange. 
+* As the worker processes the `add(5, 10)` task, it sends `task-received`, `task-started`, and `task-succeeded` events. +* `celery events` receives these messages and prints their details. + +This gives you a raw, real-time feed of what's happening in your Celery cluster! + +**Flower: A Visual Monitor** + +While `celery events` is useful, it's quite basic. A very popular tool called **Flower** uses the same event stream to provide a web-based dashboard for monitoring your Celery cluster. It shows running tasks, completed tasks, worker status, task details, and more, all updated in real-time thanks to Celery Events. You can typically install it (`pip install flower`) and run it (`celery -A celery_app flower`). + +## How It Works Internally (Simplified) + +1. **Worker Action:** A worker performs an action (e.g., starts executing task `T1`). +2. **Event Dispatch:** If events are enabled, the worker's internal `EventDispatcher` component is notified. +3. **Create Event Message:** The `EventDispatcher` creates a dictionary representing the event (e.g., `{'type': 'task-started', 'uuid': 'T1', 'hostname': 'worker1', ...}`). +4. **Publish to Broker:** The `EventDispatcher` uses its connection to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) to publish this event message to a specific **event exchange** (usually named `celeryev`). It uses a routing key based on the event type (e.g., `task.started`). +5. **Listener Connects:** A monitoring tool (like `celery events` or Flower) starts up. It creates an `EventReceiver`. +6. **Declare Queue:** The `EventReceiver` connects to the same broker and declares a temporary, unique queue bound to the event exchange (`celeryev`), often configured to receive all event types (`#` routing key). +7. **Consume Events:** The `EventReceiver` starts consuming messages from its dedicated queue. +8. 
**Process Event:** When an event message (like the `task-started` message for `T1`) arrives from the broker, the `EventReceiver` decodes it and passes it to a handler (e.g., `celery events` prints it, Flower updates its web UI). + +```mermaid +sequenceDiagram + participant Worker + participant Dispatcher as EventDispatcher (in Worker) + participant Broker as Message Broker + participant Receiver as EventReceiver (e.g., celery events tool) + participant Display as Console/UI + + Worker->>Worker: Starts executing Task T1 + Worker->>Dispatcher: Notify: Task T1 started + Dispatcher->>Dispatcher: Create event message {'type': 'task-started', ...} + Dispatcher->>Broker: Publish event msg to 'celeryev' exchange (routing_key='task.started') + Broker-->>Dispatcher: Ack (Message Sent) + + Receiver->>Broker: Connect and declare unique queue bound to 'celeryev' exchange + Broker-->>Receiver: Queue ready + + Broker->>Receiver: Deliver event message {'type': 'task-started', ...} + Receiver->>Receiver: Decode message + Receiver->>Display: Process event (e.g., print to console) +``` + +## Code Dive: Sending and Receiving Events + +* **Enabling Events (`celery/worker/consumer/events.py`):** The `Events` bootstep in the worker process is responsible for initializing the `EventDispatcher`. The `-E` flag or configuration settings control whether this bootstep actually enables the dispatcher. + + ```python + # Simplified from worker/consumer/events.py + class Events(bootsteps.StartStopStep): + requires = (Connection,) + + def __init__(self, c, task_events=True, # Controlled by config/flags + # ... other flags ... + **kwargs): + self.send_events = task_events # or other flags + self.enabled = self.send_events + # ... + super().__init__(c, **kwargs) + + def start(self, c): + # ... gets connection ... 
+ # Creates the actual dispatcher instance + dis = c.event_dispatcher = c.app.events.Dispatcher( + c.connection_for_write(), + hostname=c.hostname, + enabled=self.send_events, # Only sends if enabled + # ... other options ... + ) + # ... flush buffer ... + ``` + +* **Sending Events (`celery/events/dispatcher.py`):** The `EventDispatcher` class has the `send` method, which creates the event dictionary and calls `publish`. + + ```python + # Simplified from events/dispatcher.py + class EventDispatcher: + # ... __init__ setup ... + + def send(self, type, blind=False, ..., **fields): + if self.enabled: + groups, group = self.groups, group_from(type) + if groups and group not in groups: + return # Don't send if this group isn't enabled + + # ... potential buffering logic (omitted) ... + + # Call publish to actually send + return self.publish(type, fields, self.producer, blind=blind, + Event=Event, ...) + + def publish(self, type, fields, producer, blind=False, Event=Event, **kwargs): + # Create the event dictionary + clock = None if blind else self.clock.forward() + event = Event(type, hostname=self.hostname, utcoffset=utcoffset(), + pid=self.pid, clock=clock, **fields) + + # Publish using the underlying Kombu producer + with self.mutex: + return self._publish(event, producer, + routing_key=type.replace('-', '.'), **kwargs) + + def _publish(self, event, producer, routing_key, **kwargs): + exchange = self.exchange # The dedicated event exchange + try: + # Kombu's publish method sends the message + producer.publish( + event, # The dictionary payload + routing_key=routing_key, + exchange=exchange.name, + declare=[exchange], # Ensure exchange exists + serializer=self.serializer, # e.g., 'json' + headers=self.headers, + delivery_mode=self.delivery_mode, # e.g., transient + **kwargs + ) + except Exception as exc: + # ... error handling / buffering ... 
+ raise + ``` + +* **Receiving Events (`celery/events/receiver.py`):** The `EventReceiver` class (used by tools like `celery events`) sets up a consumer to listen for messages on the event exchange. + + ```python + # Simplified from events/receiver.py + class EventReceiver(ConsumerMixin): # Uses Kombu's ConsumerMixin + + def __init__(self, channel, handlers=None, routing_key='#', ...): + # ... setup app, channel, handlers ... + self.exchange = get_exchange(..., name=self.app.conf.event_exchange) + self.queue = Queue( # Create a unique, auto-deleting queue + '.'.join([self.queue_prefix, self.node_id]), + exchange=self.exchange, + routing_key=routing_key, # Often '#' to get all events + auto_delete=True, durable=False, + # ... other queue options ... + ) + # ... + + def get_consumers(self, Consumer, channel): + # Tell ConsumerMixin to consume from our event queue + return [Consumer(queues=[self.queue], + callbacks=[self._receive], # Method to call on message + no_ack=True, # Events usually don't need explicit ack + accept=self.accept)] + + # This method is registered as the callback for new messages + def _receive(self, body, message): + # Decode message body (can be single event or list in newer Celery) + if isinstance(body, list): + process, from_message = self.process, self.event_from_message + [process(*from_message(event)) for event in body] + else: + self.process(*self.event_from_message(body)) + + # process() calls the appropriate handler from self.handlers + def process(self, type, event): + """Process event by dispatching to configured handler.""" + handler = self.handlers.get(type) or self.handlers.get('*') + handler and handler(event) # Call the handler function + ``` + +## Conclusion + +Celery Events provide a powerful mechanism for **real-time monitoring** of your distributed task system. + +* Workers (when enabled via `-E` or configuration) send **event messages** describing their actions (like task start/finish, worker online). 
+* These messages go to a dedicated **event exchange** on the broker. +* Tools like `celery events` or Flower act as **listeners** (`EventReceiver`), consuming this stream to provide insights into the cluster's activity. +* Events are the foundation for building dashboards, custom monitoring, and diagnostic tools. + +Understanding events helps you observe and manage your Celery application more effectively. + +So far, we've explored the major components and concepts of Celery. But how does a worker actually start up? How does it initialize all these different parts like the connection, the consumer, the event dispatcher, and the execution pool in the right order? That's orchestrated by a system called Bootsteps. + +**Next:** [Chapter 10: Bootsteps](10_bootsteps.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/10_bootsteps.md b/output/Celery/10_bootsteps.md new file mode 100644 index 0000000..b67fea1 --- /dev/null +++ b/output/Celery/10_bootsteps.md @@ -0,0 +1,227 @@ +# Chapter 10: Bootsteps - How Celery Workers Start Up + +In [Chapter 9: Events](09_events.md), we learned how to monitor the real-time activity within our Celery system. We've now covered most of the key parts of Celery: the [Celery App](01_celery_app.md), [Task](03_task.md)s, the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), the [Worker](05_worker.md), the [Result Backend](06_result_backend.md), [Beat (Scheduler)](07_beat__scheduler_.md), [Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md), and [Events](09_events.md). + +But have you ever wondered how the Celery worker manages to get all these different parts working together when you start it? 
When you run `celery worker`, it needs to connect to the broker, set up the execution pool, start listening for tasks, maybe start the event dispatcher, and possibly even start an embedded Beat scheduler. How does it ensure all these things happen in the correct order? That's where **Bootsteps** come in. + +## What Problem Do Bootsteps Solve? + +Imagine you're assembling a complex piece of furniture. You have many parts and screws, and the instructions list a specific sequence of steps. You can't attach the tabletop before you've built the legs! Similarly, a Celery worker has many internal components that need to be initialized and started in a precise order. + +For example, the worker needs to: +1. Establish a connection to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md). +2. *Then*, start the consumer logic that uses this connection to fetch tasks. +3. Set up the execution pool (like prefork or eventlet) that will actually run the tasks. +4. Start optional components like the [Events](09_events.md) dispatcher or the embedded [Beat (Scheduler)](07_beat__scheduler_.md). + +If these steps happen out of order (e.g., trying to fetch tasks before connecting to the broker), the worker will fail. + +**Bootsteps** provide a framework within Celery to define this startup (and shutdown) sequence. It's like the assembly instructions or a detailed checklist for the worker. Each major component or initialization phase is defined as a "step," and steps can declare dependencies on each other (e.g., "Step B requires Step A to be finished"). Celery uses this information to automatically figure out the correct order to start everything up and, just as importantly, the correct reverse order to shut everything down cleanly. + +This makes the worker's internal structure more organized, modular, and easier for Celery developers to extend with new features. 
As a user, you generally don't write bootsteps yourself, but understanding the concept helps demystify the worker's startup process. + +## Key Concepts + +1. **Step (`Step`):** A single, distinct part of the worker's startup or shutdown logic. Think of it as one instruction in the assembly manual. Examples include initializing the broker connection, starting the execution pool, or starting the component that listens for task messages (the consumer). +2. **Blueprint (`Blueprint`):** A collection of related steps that manage a larger component. For instance, the main "Consumer" component within the worker has its own blueprint defining steps for connection, event handling, task fetching, etc. +3. **Dependencies (`requires`):** A step can declare that it needs other steps to be completed first. For example, the step that starts fetching tasks (`Tasks`) *requires* the step that establishes the broker connection (`Connection`). +4. **Order:** Celery analyzes the `requires` declarations of all steps within a blueprint (and potentially across blueprints) to build a dependency graph. It then sorts this graph to determine the exact order in which steps must be started. Shutdown usually happens in the reverse order. + +## How It Works: The Worker Startup Sequence + +You don't typically interact with bootsteps directly, but you see their effect every time you start a worker. + +When you run: +`celery -A your_app worker --loglevel=info` + +Celery initiates the **Worker Controller** (`WorkController`). This controller uses the Bootstep framework, specifically a main **Blueprint**, to manage its initialization. + +Here's a simplified idea of what happens under the hood, orchestrated by Bootsteps: + +1. **Load Blueprint:** The `WorkController` loads its main blueprint, which includes steps for core functionalities. +2. 
**Build Graph:** Celery looks at all the steps defined in the blueprint (e.g., `Connection`, `Pool`, `Consumer`, `Timer`, `Events`, potentially `Beat`) and their `requires` attributes. It builds a dependency graph. +3. **Determine Order:** It calculates the correct startup order from the graph (a "topological sort"). For example, it determines that `Connection` must start before `Consumer`, and `Pool` must start before `Consumer` can start dispatching tasks to it. +4. **Execute Steps:** The `WorkController` iterates through the steps in the determined order and calls each step's `start` method. + * The `Connection` step establishes the link to the broker. + * The `Timer` step sets up internal timers. + * The `Pool` step initializes the execution pool (e.g., starts prefork child processes). + * The `Events` step starts the event dispatcher (if `-E` was used). + * The `Consumer` step (usually last) starts the main loop that fetches tasks from the broker and dispatches them to the pool. +5. **Worker Ready:** Once all essential bootsteps have successfully started, the worker prints the "ready" message and begins processing tasks. + +When you stop the worker (e.g., with Ctrl+C), a similar process happens in reverse using the steps' `stop` or `terminate` methods, ensuring connections are closed, pools are shut down, etc., in the correct order. 
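
The "build graph, sort, start, then stop in reverse" idea can be sketched with nothing but Python's standard library. This is a minimal illustration, not Celery's implementation: it uses `graphlib.TopologicalSorter` (Python 3.9+) rather than Celery's own `DependencyGraph`. The step names mirror the walkthrough, but the dependency edges in `REQUIRES` are assumptions chosen for the example, not the real `requires` declarations.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: step name -> names of steps it requires.
# The names follow the walkthrough; the exact edges are illustrative only.
REQUIRES = {
    "Connection": set(),
    "Timer": set(),
    "Pool": {"Timer"},
    "Events": {"Connection"},
    "Consumer": {"Connection", "Pool", "Events"},
}


def startup_order(requires):
    """Return step names so that each step comes after everything it requires."""
    return list(TopologicalSorter(requires).static_order())


def shutdown_order(requires):
    """Shutdown walks the startup order backwards."""
    return list(reversed(startup_order(requires)))


if __name__ == "__main__":
    for name in startup_order(REQUIRES):
        print(f"starting {name}")
    for name in shutdown_order(REQUIRES):
        print(f"stopping {name}")
```

Steps with no dependency between them (here `Connection` and `Timer`) may come out in either order. The sort only guarantees that a step never starts before the steps it requires, which is the same guarantee the worker relies on.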
+ +## Internal Implementation Walkthrough + +Let's visualize the simplified startup flow managed by bootsteps: + +```mermaid +sequenceDiagram + participant CLI as `celery worker ...` + participant WorkerMain as Worker Main Process + participant Blueprint as Main Worker Blueprint + participant DepGraph as Dependency Graph Builder + participant Step1 as Connection Step + participant Step2 as Pool Step + participant Step3 as Consumer Step + + CLI->>WorkerMain: Start worker command + WorkerMain->>Blueprint: Load blueprint definition (steps & requires) + Blueprint->>DepGraph: Define steps and dependencies + DepGraph->>Blueprint: Return sorted startup order [Step1, Step2, Step3] + WorkerMain->>Blueprint: Iterate through sorted steps + Blueprint->>Step1: Call start() + Step1-->>Blueprint: Connection established + Blueprint->>Step2: Call start() + Step2-->>Blueprint: Pool initialized + Blueprint->>Step3: Call start() + Step3-->>Blueprint: Consumer loop started + Blueprint-->>WorkerMain: Startup complete + WorkerMain->>WorkerMain: Worker is Ready +``` + +The Bootstep framework relies on classes defined mainly in `celery/bootsteps.py`. + +## Code Dive: Anatomy of a Bootstep + +Bootsteps are defined as classes inheriting from `Step` or `StartStopStep`. + +* **Defining a Step:** A step class defines its logic and dependencies. + + ```python + # Simplified concept from celery/bootsteps.py + + # Base class for all steps + class Step: + # List of other Step classes needed before this one runs + requires = () + + def __init__(self, parent, **kwargs): + # Called when the blueprint is applied to the parent (e.g., Worker) + # Can be used to set initial attributes on the parent. + pass + + def create(self, parent): + # Create the service/component managed by this step. + # Often returns an object to be stored. + pass + + def include(self, parent): + # Logic to add this step to the parent's step list. + # Called after __init__. 
+ if self.should_include(parent): + self.obj = self.create(parent) # Store created object if needed + parent.steps.append(self) + return True + return False + + # A common step type with start/stop/terminate methods + class StartStopStep(Step): + obj = None # Holds the object created by self.create + + def start(self, parent): + # Logic to start the component/service + if self.obj and hasattr(self.obj, 'start'): + self.obj.start() + + def stop(self, parent): + # Logic to stop the component/service gracefully + if self.obj and hasattr(self.obj, 'stop'): + self.obj.stop() + + def terminate(self, parent): + # Logic to force shutdown (if different from stop) + if self.obj: + term_func = getattr(self.obj, 'terminate', None) or getattr(self.obj, 'stop', None) + if term_func: + term_func() + + # include() method adds self to parent.steps if created + ``` + **Explanation:** + * `requires`: A tuple of other Step classes that must be fully started *before* this step's `start` method is called. This defines the dependencies. + * `__init__`, `create`, `include`: Methods involved in setting up the step and potentially creating the component it manages. + * `start`, `stop`, `terminate`: Methods called during the worker's lifecycle (startup, graceful shutdown, forced shutdown). + +* **Blueprint:** Manages a collection of steps. + + ```python + # Simplified concept from celery/bootsteps.py + from celery.utils.graph import DependencyGraph + + class Blueprint: + # Set of default step classes (or string names) included in this blueprint + default_steps = set() + + def __init__(self, steps=None, name=None, **kwargs): + self.name = name or self.__class__.__name__ + # Combine default steps with any provided steps + self.types = set(steps or []) | set(self.default_steps) + self.steps = {} # Will hold step instances + self.order = [] # Will hold sorted step instances + # ... other callbacks ... + + def apply(self, parent, **kwargs): + # 1. 
Load step classes from self.types + step_classes = self.claim_steps() # {name: StepClass, ...} + + # 2. Build the dependency graph + self.graph = DependencyGraph( + ((Cls, Cls.requires) for Cls in step_classes.values()), + # ... formatter options ... + ) + + # 3. Get the topologically sorted order + sorted_classes = self.graph.topsort() + + # 4. Instantiate and include each step + self.order = [] + for S in sorted_classes: + step = S(parent, **kwargs) # Call Step.__init__ + self.steps[step.name] = step + self.order.append(step) + for step in self.order: + step.include(parent) # Call Step.include -> Step.create + + return self + + def start(self, parent): + # Called by the parent (e.g., Worker) to start all steps + for step in self.order: # Use the sorted order + if hasattr(step, 'start'): + step.start(parent) + + def stop(self, parent): + # Called by the parent to stop all steps (in reverse order) + for step in reversed(self.order): + if hasattr(step, 'stop'): + step.stop(parent) + # ... other methods like close, terminate, restart ... + ``` + **Explanation:** + * `default_steps`: Defines the standard components managed by this blueprint. + * `apply`: The core method that takes the step definitions, builds the `DependencyGraph` based on `requires`, gets the sorted execution `order`, and then instantiates and includes each step. + * `start`/`stop`: Iterate through the calculated `order` (or its reverse) to start/stop the components managed by each step. + +* **Example Usage (Worker Components):** The worker's main components are defined as bootsteps in `celery/worker/components.py`. You can see classes like `Pool`, `Consumer`, `Timer`, `Beat`, each inheriting from `bootsteps.Step` or `bootsteps.StartStopStep` and potentially defining `requires`. The `Consumer` blueprint in `celery/worker/consumer/consumer.py` then lists many of these (`Connection`, `Events`, `Tasks`, etc.) in its `default_steps`. 
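
To see the pieces work together outside of Celery, here is a small self-contained toy. It is only a sketch under stated assumptions: `ToyStep` and `ToyBlueprint` are hypothetical stand-ins written for this example, not Celery classes, and the sort uses the standard library's `graphlib` instead of `celery.utils.graph`. The `requires` edges (both `Events` and `Tasks` need `Connection`) follow what this chapter and the previous one describe.

```python
from graphlib import TopologicalSorter


class ToyStep:
    """Toy stand-in for a bootstep (hypothetical, not celery.bootsteps.Step)."""
    requires = ()  # classes that must be started before this one

    def __init__(self, log):
        self.log = log

    def start(self):
        self.log.append(f"start {type(self).__name__}")

    def stop(self):
        self.log.append(f"stop {type(self).__name__}")


class Connection(ToyStep):
    pass


class Events(ToyStep):
    requires = (Connection,)


class Tasks(ToyStep):
    requires = (Connection,)


class ToyBlueprint:
    """Sorts step classes by their requires, then starts/stops the instances."""

    def __init__(self, step_classes):
        self.step_classes = list(step_classes)
        self.order = []

    def apply(self, log):
        # Map each class to the classes it depends on, then sort.
        graph = {cls: set(cls.requires) for cls in self.step_classes}
        sorted_classes = TopologicalSorter(graph).static_order()
        self.order = [cls(log) for cls in sorted_classes]
        return self

    def start(self):
        for step in self.order:
            step.start()

    def stop(self):
        for step in reversed(self.order):
            step.stop()


def run_lifecycle():
    """Start and stop every step once, returning the recorded log."""
    log = []
    blueprint = ToyBlueprint([Tasks, Events, Connection]).apply(log)
    blueprint.start()
    blueprint.stop()
    return log


if __name__ == "__main__":
    print("\n".join(run_lifecycle()))
```

Even though the classes are handed to the blueprint in the "wrong" order, `Connection` is always started first and stopped last, because the order comes from `requires` rather than from the list.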
+ +## Conclusion + +You've learned about Bootsteps, the underlying framework that brings order to the Celery worker's startup and shutdown procedures. + +* They act as an **assembly guide** or **checklist** for the worker. +* Each core function (connecting, starting pool, consuming tasks) is a **Step**. +* Steps declare **Dependencies** (`requires`) on each other. +* A **Blueprint** groups related steps. +* Celery uses a **Dependency Graph** to determine the correct **order** to start and stop steps. +* This ensures components like the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), [Worker](05_worker.md) pool, and task consumer initialize and terminate predictably. + +While you typically don't write bootsteps as an end-user, understanding their role clarifies how the complex machinery of a Celery worker reliably comes to life and shuts down. + +--- + +This concludes our introductory tour of Celery's core concepts! We hope these chapters have given you a solid foundation for understanding how Celery works and how you can use it to build robust and scalable distributed applications. Happy tasking! + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Celery/index.md b/output/Celery/index.md new file mode 100644 index 0000000..44a0458 --- /dev/null +++ b/output/Celery/index.md @@ -0,0 +1,50 @@ +# Tutorial: Celery + +Celery is a system for running **distributed tasks** *asynchronously*. You define *units of work* (Tasks) in your Python code. When you want a task to run, you send a message using a **message broker** (like RabbitMQ or Redis). One or more **Worker** processes are running in the background, listening for these messages. When a worker receives a message, it executes the corresponding task. 
Optionally, the task's result (or any error) can be stored in a **Result Backend** (like Redis or a database) so you can check its status or retrieve the output later. Celery helps manage this whole process, making it easier to handle background jobs, scheduled tasks, and complex workflows. + + +**Source Repository:** [https://github.com/celery/celery/tree/d1c35bbdf014f13f4ab698d75e3ea381a017b090/celery](https://github.com/celery/celery/tree/d1c35bbdf014f13f4ab698d75e3ea381a017b090/celery) + +```mermaid +flowchart TD + A0["Celery App"] + A1["Task"] + A2["Worker"] + A3["Broker Connection (AMQP)"] + A4["Result Backend"] + A5["Canvas (Signatures & Primitives)"] + A6["Beat (Scheduler)"] + A7["Configuration"] + A8["Events"] + A9["Bootsteps"] + A0 -- "Defines and sends" --> A1 + A0 -- "Uses for messaging" --> A3 + A0 -- "Uses for results" --> A4 + A0 -- "Loads and uses" --> A7 + A1 -- "Updates state in" --> A4 + A2 -- "Executes" --> A1 + A2 -- "Fetches tasks from" --> A3 + A2 -- "Uses for lifecycle" --> A9 + A5 -- "Represents task invocation" --> A1 + A6 -- "Sends scheduled tasks via" --> A3 + A8 -- "Sends events via" --> A3 + A9 -- "Manages connection via" --> A3 +``` + +## Chapters + +1. [Celery App](01_celery_app.md) +2. [Configuration](02_configuration.md) +3. [Task](03_task.md) +4. [Broker Connection (AMQP)](04_broker_connection__amqp_.md) +5. [Worker](05_worker.md) +6. [Result Backend](06_result_backend.md) +7. [Beat (Scheduler)](07_beat__scheduler_.md) +8. [Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md) +9. [Events](09_events.md) +10. 
[Bootsteps](10_bootsteps.md) + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/01_command___group.md b/output/Click/01_command___group.md new file mode 100644 index 0000000..9f7529a --- /dev/null +++ b/output/Click/01_command___group.md @@ -0,0 +1,197 @@ +# Chapter 1: Commands and Groups: The Building Blocks + +Welcome to your first step in learning Click! Imagine you want to create your own command-line tool, maybe something like `git` or `docker`. How do you tell your program what to do when someone types `git commit` or `docker build`? That's where **Commands** and **Groups** come in. They are the fundamental building blocks for any Click application. + +Think about a simple tool. Maybe you want a program that can greet someone. You'd type `greet Alice` in your terminal, and it would print "Hello Alice!". In Click, this single action, "greet", would be represented by a `Command`. + +Now, what if your tool needed to do *more* than one thing? Maybe besides greeting, it could also say goodbye. You might want to type `mytool greet Alice` or `mytool goodbye Bob`. The main `mytool` part acts like a container or a menu, holding the different actions (`greet`, `goodbye`). This container is what Click calls a `Group`. + +So: + +* `Command`: Represents a single action your tool can perform. +* `Group`: Represents a collection of related actions (Commands or other Groups). + +Let's dive in and see how to create them! + +## Your First Command + +Creating a command in Click is surprisingly simple. You basically write a normal Python function and then "decorate" it to tell Click it's a command-line command. + +Let's make a command that just prints "Hello World!". 
+ +```python +# hello_app.py +import click + +@click.command() +def hello(): + """A simple command that says Hello World""" + print("Hello World!") + +if __name__ == '__main__': + hello() +``` + +Let's break this down: + +1. `import click`: We need to import the Click library first. +2. `@click.command()`: This is the magic part! It's called a decorator. It transforms the Python function `hello()` right below it into a Click `Command` object. We'll learn more about [Decorators](02_decorators.md) in the next chapter, but for now, just know this line turns `hello` into something Click understands as a command. +3. `def hello(): ...`: This is a standard Python function. The code inside this function is what will run when you execute the command from your terminal. +4. `"""A simple command that says Hello World"""`: This is a docstring. Click cleverly uses the function's docstring as the help text for the command! +5. `if __name__ == '__main__': hello()`: This standard Python construct checks if the script is being run directly. If it is, it calls our `hello` command function (which is now actually a Click `Command` object). + +**Try running it!** Save the code above as `hello_app.py`. Open your terminal in the same directory and run: + +```bash +$ python hello_app.py +Hello World! +``` + +It works! You just created your first command-line command with Click. + +**Bonus: Automatic Help!** + +Click automatically generates help screens for you. Try running your command with `--help`: + +```bash +$ python hello_app.py --help +Usage: hello_app.py [OPTIONS] + + A simple command that says Hello World + +Options: + --help Show this message and exit. +``` + +See? Click used the docstring we wrote (`A simple command that says Hello World`) and added a standard `--help` option for free! + +## Grouping Commands + +Okay, one command is nice, but real tools often have multiple commands. Like `git` has `commit`, `pull`, `push`, etc. 
Let's say we want our tool to have two commands: `hello` and `goodbye`. + +We need a way to group these commands together. That's what `click.group()` is for. A `Group` acts as the main entry point and can have other commands attached to it. + +```python +# multi_app.py +import click + +# 1. Create the main group +@click.group() +def cli(): + """A simple tool with multiple commands.""" + pass # The group function itself doesn't need to do anything + +# 2. Define the 'hello' command +@click.command() +def hello(): + """Says Hello World""" + print("Hello World!") + +# 3. Define the 'goodbye' command +@click.command() +def goodbye(): + """Says Goodbye World""" + print("Goodbye World!") + +# 4. Attach the commands to the group +cli.add_command(hello) +cli.add_command(goodbye) + +if __name__ == '__main__': + cli() # Run the main group +``` + +What's changed? + +1. We created a function `cli` and decorated it with `@click.group()`. This makes `cli` our main entry point, a container for other commands. Notice the function body is just `pass`: often, the group function itself doesn't need logic; its job is to hold other commands. +2. We defined `hello` and `goodbye` just like before, using `@click.command()`. +3. Crucially, we *attached* our commands to the group: `cli.add_command(hello)` and `cli.add_command(goodbye)`. This tells Click that `hello` and `goodbye` are subcommands of `cli`. +4. Finally, in the `if __name__ == '__main__':` block, we run `cli()`, our main group. + +**Let's run this!** Save it as `multi_app.py`. + +First, check the main help screen: + +```bash +$ python multi_app.py --help +Usage: multi_app.py [OPTIONS] COMMAND [ARGS]... + + A simple tool with multiple commands. + +Options: + --help Show this message and exit. + +Commands: + goodbye Says Goodbye World + hello Says Hello World +``` + +Look! Click now lists `goodbye` and `hello` under "Commands". 
It automatically figured out their names from the function names (`goodbye`, `hello`) and their help text from their docstrings. + +Now, run the specific commands: + +```bash +$ python multi_app.py hello +Hello World! + +$ python multi_app.py goodbye +Goodbye World! +``` + +You've successfully created a multi-command CLI tool! + +*(Self-promotion: There's an even shorter way to attach commands using decorators directly on the group, which we'll see in [Decorators](02_decorators.md)!)* + +## How It Works Under the Hood + +What's really happening when you use `@click.command()` or `@click.group()`? + +1. **Decoration:** The decorator (`@click.command` or `@click.group`) takes your Python function (`hello`, `goodbye`, `cli`). It wraps this function inside a Click object: either a `Command` instance or a `Group` instance (which is actually a special type of `Command`). These objects store your original function as the `callback` to be executed later. They also store metadata like the command name (derived from the function name) and the help text (from the docstring). You can find the code for these decorators in `decorators.py` and the `Command`/`Group` classes in `core.py`. + +2. **Execution:** When you run `python multi_app.py hello`, Python executes the `cli()` call at the bottom. Since `cli` is a `Group` object created by Click, it knows how to parse the command-line arguments (`hello` in this case). + +3. **Parsing & Dispatch:** The `cli` group looks at the first argument (`hello`). It checks its list of registered subcommands (which we added using `cli.add_command`). It finds a match with the `hello` command object. + +4. **Callback:** The `cli` group then invokes the `hello` command object. The `hello` command object, in turn, calls the original Python function (`hello()`) that it stored earlier as its `callback`. 
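The four steps above can be mimicked with a few lines of plain Python. This is only a rough sketch of the idea, not Click's code: `MiniCommand`, `MiniGroup`, and the `command`/`group` decorators are hypothetical names made up for this illustration, and the callbacks return strings instead of printing so the result is easy to inspect.

```python
class MiniCommand:
    """Toy stand-in for a command object: stores a callback plus metadata."""

    def __init__(self, callback):
        self.callback = callback           # the original function
        self.name = callback.__name__      # command name from the function name
        self.help = callback.__doc__       # help text from the docstring

    def __call__(self, args):
        return self.callback()


class MiniGroup(MiniCommand):
    """Toy stand-in for a group: a command that also holds subcommands."""

    def __init__(self, callback):
        super().__init__(callback)
        self.commands = {}

    def add_command(self, cmd):
        self.commands[cmd.name] = cmd

    def __call__(self, args):
        self.callback()                    # run the group's own function first
        name, rest = args[0], args[1:]     # first argument picks the subcommand
        return self.commands[name](rest)   # dispatch to the matching command


def command(func):
    """Decorator: replace the function with a MiniCommand wrapping it."""
    return MiniCommand(func)


def group(func):
    """Decorator: replace the function with a MiniGroup wrapping it."""
    return MiniGroup(func)


@group
def cli():
    """A simple tool with multiple commands."""


@command
def hello():
    """Says Hello World"""
    return "Hello World!"


cli.add_command(hello)
result = cli(["hello"])   # plays the role of: python multi_app.py hello
print(result)
```

After decoration, the name `hello` no longer refers to a plain function but to the wrapper object, which is why the group can look it up by name and call its stored callback.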
+ +Here's a simplified view of what happens when you run `python multi_app.py hello`: + +```mermaid +sequenceDiagram + participant User + participant Terminal + participant PythonScript (multi_app.py) + participant ClickRuntime + participant cli_Group as cli (Group Object) + participant hello_Command as hello (Command Object) + + User->>Terminal: python multi_app.py hello + Terminal->>PythonScript: Executes script with args ["hello"] + PythonScript->>ClickRuntime: Calls cli() entry point + ClickRuntime->>cli_Group: Asks to handle args ["hello"] + cli_Group->>cli_Group: Parses args, identifies "hello" as subcommand + cli_Group->>hello_Command: Invokes the 'hello' command + hello_Command->>hello_Command: Executes its callback (the original hello() function) + hello_Command-->>PythonScript: Prints "Hello World!" + PythonScript-->>Terminal: Shows output + Terminal-->>User: Displays "Hello World!" +``` + +This process of parsing arguments and calling the right function based on the command structure is the core job of Click, making it easy for *you* to just focus on writing the functions for each command. + +## Conclusion + +You've learned about the two most fundamental concepts in Click: + +* `Command`: Represents a single action, created by decorating a function with `@click.command()`. +* `Group`: Acts as a container for multiple commands (or other groups), created with `@click.group()`. Groups allow you to structure your CLI application logically. + +We saw how Click uses decorators to transform simple Python functions into powerful command-line interface components, automatically handling things like help text generation and command dispatching. + +Commands and Groups form the basic structure, but how do we pass information *into* our commands (like `git commit -m "My message"`)? And what other cool things can decorators do? We'll explore that starting with a deeper look at decorators in the next chapter! 
+ +Next up: [Chapter 2: Decorators](02_decorators.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/02_decorators.md b/output/Click/02_decorators.md new file mode 100644 index 0000000..5b2c349 --- /dev/null +++ b/output/Click/02_decorators.md @@ -0,0 +1,278 @@ +# Chapter 2: Decorators: Magic Wands for Your Functions + +In [Chapter 1: Commands and Groups](01_command___group.md), we learned how to create basic command-line actions (`Command`) and group them together (`Group`). You might have noticed those strange `@click.command()` and `@click.group()` lines above our functions. What are they, and why do we use them? + +Those are **Decorators**, and they are the heart of how you build Click applications! Think of them as special annotations or modifiers you place *on top* of your Python functions to give them command-line superpowers. + +## Why Decorators? Making Life Easier + +Imagine you didn't have decorators. To create a simple command like `hello` from Chapter 1, you might have to write something like this (this is *not* real Click code, just an illustration): + +```python +# NOT how Click works, but imagine... +import click + +def hello_logic(): + """My command's help text""" + print("Hello World!") + +# Manually create a Command object +hello_command = click.Command( + name='hello', # Give it a name + callback=hello_logic, # Tell it which function to run + help=hello_logic.__doc__ # Copy the help text +) + +if __name__ == '__main__': + # Manually parse arguments and run + # (This part would be complex!) + pass +``` + +That looks like a lot more work! You have to: + +1. Write the function (`hello_logic`). +2. Manually create a `Command` object. +3. Explicitly tell the `Command` object its name, which function to run (`callback`), and its help text. 
+ +Now, let's remember the Click way from Chapter 1: + +```python +# The actual Click way +import click + +@click.command() # <-- The Decorator! +def hello(): + """A simple command that says Hello World""" + print("Hello World!") + +if __name__ == '__main__': + hello() +``` + +Much cleaner, right? The `@click.command()` decorator handles creating the `Command` object, figuring out the name (`hello`), and grabbing the help text from the docstring (`"""..."""`) all automatically! + +Decorators let you *declare* what you want ("this function is a command") right next to the function's code, making your CLI definition much more readable and concise. + +## What is a Decorator in Python? (A Quick Peek) + +Before diving deeper into Click's decorators, let's understand what a decorator *is* in Python itself. + +In Python, a decorator is essentially a function that takes another function as input and returns a *modified* version of that function. It's like wrapping a gift: you still have the original gift inside, but the wrapping adds something extra. + +The `@` symbol is just syntactic sugar (a shortcut) for applying a decorator. + +Here's a super simple example (not using Click): + +```python +# A simple Python decorator +def simple_decorator(func): + def wrapper(): + print("Something is happening before the function is called.") + func() # Call the original function + print("Something is happening after the function is called.") + return wrapper # Return the modified function + +@simple_decorator # Apply the decorator +def say_whee(): + print("Whee!") + +# Now, when we call say_whee... +say_whee() +``` + +Running this would print: + +``` +Something is happening before the function is called. +Whee! +Something is happening after the function is called. +``` + +See? `simple_decorator` took our `say_whee` function and wrapped it with extra print statements. 
The `@simple_decorator` line is equivalent to writing `say_whee = simple_decorator(say_whee)` after defining `say_whee`. + +Click's decorators (`@click.command`, `@click.group`, etc.) do something similar, but instead of just printing, they wrap your function inside Click's `Command` or `Group` objects and configure them. + +## Click's Main Decorators + +Click provides several decorators. The most common ones you'll use are: + +* `@click.command()`: Turns a function into a single CLI command. +* `@click.group()`: Turns a function into a container for other commands. +* `@click.option()`: Adds an *option* (like `--name` or `-v`) to your command. Options are typically optional parameters. +* `@click.argument()`: Adds an *argument* (like a required filename) to your command. Arguments are typically required and positional. + +We already saw `@click.command` and `@click.group` in Chapter 1. Let's focus on how decorators streamline adding commands to groups and introduce options. + +## Decorators in Action: Simplifying Groups and Adding Options + +Remember the `multi_app.py` example from Chapter 1? We had to define the group `cli` and the commands `hello` and `goodbye` separately, then manually attach them using `cli.add_command()`. + +```python +# multi_app_v1.py (from Chapter 1) +import click + +@click.group() +def cli(): + """A simple tool with multiple commands.""" + pass + +@click.command() +def hello(): + """Says Hello World""" + print("Hello World!") + +@click.command() +def goodbye(): + """Says Goodbye World""" + print("Goodbye World!") + +# Manual attachment +cli.add_command(hello) +cli.add_command(goodbye) + +if __name__ == '__main__': + cli() +``` + +Decorators provide a more elegant way! If you have a `@click.group()`, you can use *its* `.command()` method as a decorator to automatically attach the command. 
+ +Let's rewrite `multi_app.py` using this decorator pattern and also add a simple name option to the `hello` command using `@click.option`: + +```python +# multi_app_v2.py (using decorators more effectively) +import click + +# 1. Create the main group +@click.group() +def cli(): + """A simple tool with multiple commands.""" + pass # Group function still doesn't need to do much + +# 2. Define 'hello' and attach it to 'cli' using a decorator +@cli.command() # <-- Decorator from the 'cli' group object! +@click.option('--name', default='World', help='Who to greet.') +def hello(name): # The 'name' parameter matches the option + """Says Hello""" + print(f"Hello {name}!") + +# 3. Define 'goodbye' and attach it to 'cli' using a decorator +@cli.command() # <-- Decorator from the 'cli' group object! +def goodbye(): + """Says Goodbye""" + print("Goodbye World!") + +# No need for cli.add_command() anymore! + +if __name__ == '__main__': + cli() +``` + +What changed? + +1. Instead of `@click.command()`, we used `@cli.command()` above `hello` and `goodbye`. This tells Click, "This function is a command, *and* it belongs to the `cli` group." No more manual `cli.add_command()` needed! +2. We added `@click.option('--name', default='World', help='Who to greet.')` right below `@cli.command()` for the `hello` function. This adds a command-line option named `--name`. +3. The `hello` function now accepts an argument `name`. Click automatically passes the value provided via the `--name` option to this function parameter. If the user doesn't provide `--name`, it uses the `default='World'`. + +**Let's run this new version:** + +Check the help for the main command: + +```bash +$ python multi_app_v2.py --help +Usage: multi_app_v2.py [OPTIONS] COMMAND [ARGS]... + + A simple tool with multiple commands. + +Options: + --help Show this message and exit. 
+ +Commands: + goodbye Says Goodbye + hello Says Hello +``` + +Now check the help for the `hello` subcommand: + +```bash +$ python multi_app_v2.py hello --help +Usage: multi_app_v2.py hello [OPTIONS] + + Says Hello + +Options: + --name TEXT Who to greet. [default: World] + --help Show this message and exit. +``` + +See? The `--name` option is listed, along with its help text and default value! + +Finally, run `hello` with and without the option: + +```bash +$ python multi_app_v2.py hello +Hello World! + +$ python multi_app_v2.py hello --name Alice +Hello Alice! +``` + +It works! Decorators made adding the command to the group cleaner, and adding the option was as simple as adding another decorator line and a function parameter. We'll learn much more about configuring options and arguments in the next chapter, [Parameter (Option / Argument)](03_parameter__option___argument_.md). + +## How Click Decorators Work (Under the Hood) + +So what's the "magic" behind these `@` symbols in Click? + +1. **Decorator Functions:** When you write `@click.command()` or `@click.option()`, you're calling functions defined in Click (specifically in `decorators.py`). These functions are designed to *return another function* (the actual decorator). +2. **Wrapping the User Function:** Python takes the function you defined (e.g., `hello`) and passes it to the decorator function returned in step 1. +3. **Attaching Information:** + * `@click.option` / `@click.argument`: These decorators typically don't create the final `Command` object immediately. Instead, they attach the parameter information (like the option name `--name`, type, default value) to your function object itself, often using a special temporary attribute (like `__click_params__`). They then return the *original function*, but now with this extra metadata attached. + * `@click.command` / `@click.group`: This decorator usually runs *last* (decorators are applied bottom-up). 
It looks for any parameter information attached by previous `@option` or `@argument` decorators (like `__click_params__`). It then creates the actual `Command` or `Group` object (defined in `core.py`), configures it with the command name, help text (from the docstring), the attached parameters, and stores your original function as the `callback` to be executed. It returns this newly created `Command` or `Group` object, effectively replacing your original function definition with the Click object. +4. **Group Attachment:** When you use `@cli.command()`, the `@cli.command()` decorator not only creates the `Command` object but also automatically calls `cli.add_command()` to register the new command with the `cli` group object. + +Here's a simplified sequence diagram showing what happens when you define the `hello` command in `multi_app_v2.py`: + +```mermaid +sequenceDiagram + participant PythonInterpreter + participant click_option as @click.option('--name') + participant hello_func as hello(name) + participant cli_command as @cli.command() + participant cli_Group as cli (Group Object) + participant hello_Command as hello (New Command Object) + + Note over PythonInterpreter, hello_func: Python processes decorators bottom-up + PythonInterpreter->>click_option: Processes @click.option('--name', ...) 
decorator + click_option->>hello_func: Attaches Option info (like in __click_params__) + click_option-->>PythonInterpreter: Returns original hello_func (with attached info) + + PythonInterpreter->>cli_command: Processes @cli.command() decorator + cli_command->>hello_func: Reads function name, docstring, attached params (__click_params__) + cli_command->>hello_Command: Creates new Command object for 'hello' + cli_command->>cli_Group: Calls cli.add_command(hello_Command) + cli_command-->>PythonInterpreter: Returns the new hello_Command object + + Note over PythonInterpreter: 'hello' in the code now refers to the Command object +``` + +The key takeaway is that decorators allow Click to gather all the necessary information (function logic, command name, help text, options, arguments) right where you define the function, and build the corresponding Click objects behind the scenes. You can find the implementation details in `click/decorators.py` and `click/core.py`. The `_param_memo` helper function in `decorators.py` is often used internally by `@option` and `@argument` to attach parameter info to the function before `@command` processes it. + +## Conclusion + +Decorators are fundamental to Click's design philosophy. They provide a clean, readable, and *declarative* way to turn your Python functions into powerful command-line interface components. + +You've learned: + +* Decorators are Python features (`@`) that modify functions. +* Click uses decorators like `@click.command`, `@click.group`, `@click.option`, and `@click.argument` extensively. +* Decorators handle the creation and configuration of `Command`, `Group`, `Option`, and `Argument` objects for you. +* Using decorators like `@group.command()` automatically attaches commands to groups. +* They make defining your CLI structure intuitive and keep related code together. + +We've only scratched the surface of `@click.option` and `@click.argument`. How do you make options required? 
How do you handle different data types (numbers, files)? How do you define arguments that take multiple values? We'll explore all of this in the next chapter! + +Next up: [Chapter 3: Parameter (Option / Argument)](03_parameter__option___argument_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/03_parameter__option___argument_.md b/output/Click/03_parameter__option___argument_.md new file mode 100644 index 0000000..420f39e --- /dev/null +++ b/output/Click/03_parameter__option___argument_.md @@ -0,0 +1,249 @@ +# Chapter 3: Parameter (Option / Argument) - Giving Your Commands Input + +In the last chapter, [Decorators](02_decorators.md), we saw how decorators like `@click.command()` and `@click.option()` act like magic wands, transforming our Python functions into CLI commands and adding features like command-line options. + +But how do our commands actually *receive* information from the user? If we have a command `greet`, how do we tell it *who* to greet, like `greet --name Alice`? Or if we have a `copy` command, how do we specify the source and destination files, like `copy report.txt backup.txt`? + +This is where **Parameters** come in. Parameters define the inputs your commands can accept, just like arguments define the inputs for a regular Python function. Click handles parsing these inputs from the command line, validating them, and making them available to your command function. + +There are two main types of parameters in Click: + +1. **Options:** These are usually preceded by flags like `--verbose` or `-f`. They are often optional and can either take a value (like `--name Alice`) or act as simple on/off switches (like `--verbose`). You define them using the `@click.option()` decorator. +2. **Arguments:** These are typically positional values that come *after* any options. 
They often represent required inputs, like a filename (`report.txt`). You define them using the `@click.argument()` decorator. + +Let's see how to use them! + +## Options: The Named Inputs (`@click.option`) + +Think of options like keyword arguments in Python functions. In `def greet(name="World"):`, `name` is a keyword argument with a default value. Options serve a similar purpose for your CLI. + +Let's modify our `hello` command from the previous chapter to accept a `--name` option. + +```python +# greet_app.py +import click + +@click.group() +def cli(): + """A simple tool with a greeting command.""" + pass + +@cli.command() +@click.option('--name', default='World', help='Who to greet.') +def hello(name): # <-- The 'name' parameter matches the option + """Greets the person specified by the --name option.""" + print(f"Hello {name}!") + +if __name__ == '__main__': + cli() +``` + +Let's break down the new parts: + +1. `@click.option('--name', default='World', help='Who to greet.')`: This decorator defines an option. + * `'--name'`: This is the primary name of the option on the command line. + * `default='World'`: If the user doesn't provide the `--name` option, the value `World` will be used. + * `help='Who to greet.'`: This text will appear in the help message for the `hello` command. +2. `def hello(name):`: Notice how the `hello` function now accepts an argument named `name`. Click cleverly matches the option name (`name`) to the function parameter name and passes the value automatically! + +**Try running it!** + +First, check the help message for the `hello` command: + +```bash +$ python greet_app.py hello --help +Usage: greet_app.py hello [OPTIONS] + + Greets the person specified by the --name option. + +Options: + --name TEXT Who to greet. [default: World] + --help Show this message and exit. +``` + +See? Click added our `--name` option to the help screen, including the help text and default value we provided. 
The `TEXT` part indicates the type of value expected (we'll cover types in [ParamType](04_paramtype.md)). + +Now, run it with and without the option: + +```bash +$ python greet_app.py hello +Hello World! + +$ python greet_app.py hello --name Alice +Hello Alice! +``` + +It works perfectly! Click parsed the `--name Alice` option and passed `"Alice"` to our `hello` function's `name` parameter. When we didn't provide the option, it used the default value `"World"`. + +### Option Flavors: Short Names and Flags + +Options can have variations: + +* **Short Names:** You can provide shorter aliases, like `-n` for `--name`. +* **Flags:** Options that don't take a value but act as switches (e.g., `--verbose`). + +Let's add a short name `-n` to our `--name` option and a `--shout` flag to make the greeting uppercase. + +```python +# greet_app_v2.py +import click + +@click.group() +def cli(): + """A simple tool with a greeting command.""" + pass + +@cli.command() +@click.option('--name', '-n', default='World', help='Who to greet.') # Added '-n' +@click.option('--shout', is_flag=True, help='Greet loudly.') # Added '--shout' flag +def hello(name, shout): # <-- Function now accepts 'shout' too + """Greets the person, optionally shouting.""" + greeting = f"Hello {name}!" + if shout: + greeting = greeting.upper() + print(greeting) + +if __name__ == '__main__': + cli() +``` + +Changes: + +1. `@click.option('--name', '-n', ...)`: We added `'-n'` as the second argument to the decorator. Now, both `--name` and `-n` work. +2. `@click.option('--shout', is_flag=True, ...)`: This defines a flag. `is_flag=True` tells Click this option doesn't take a value; its presence makes the corresponding parameter `True`, otherwise it's `False`. +3. `def hello(name, shout):`: The function signature is updated to accept the `shout` parameter. + +**Run it again!** + +```bash +$ python greet_app_v2.py hello -n Bob +Hello Bob! + +$ python greet_app_v2.py hello --name Carol --shout +HELLO CAROL! 
+ +$ python greet_app_v2.py hello --shout +HELLO WORLD! +``` + +Flags and short names make your CLI more flexible and conventional! + +## Arguments: The Positional Inputs (`@click.argument`) + +Arguments are like positional arguments in Python functions. In `def copy(src, dst):`, `src` and `dst` are required positional arguments. Click arguments usually represent mandatory inputs that follow the command and any options. + +Let's create a simple command that takes two arguments, `SRC` and `DST`, representing source and destination files (though we'll just print them for now). + +```python +# copy_app.py +import click + +@click.command() +@click.argument('src') # Defines the first argument +@click.argument('dst') # Defines the second argument +def copy(src, dst): # Function parameters match argument names + """Copies SRC file to DST.""" + print(f"Pretending to copy '{src}' to '{dst}'") + +if __name__ == '__main__': + copy() +``` + +What's happening here? + +1. `@click.argument('src')`: Defines a positional argument named `src`. By default, arguments are required. The name `'src'` is used both internally and often capitalized (`SRC`) in help messages by convention. +2. `@click.argument('dst')`: Defines the second required positional argument. +3. `def copy(src, dst):`: The function parameters `src` and `dst` receive the values provided on the command line in the order they appear. + +**Let's try it!** + +First, see what happens if we forget the arguments: + +```bash +$ python copy_app.py +Usage: copy_app.py [OPTIONS] SRC DST +Try 'copy_app.py --help' for help. + +Error: Missing argument 'SRC'. +``` + +Click automatically detects the missing argument and gives a helpful error message! + +Now, provide the arguments: + +```bash +$ python copy_app.py report.txt backup/report.txt +Pretending to copy 'report.txt' to 'backup/report.txt' +``` + +Click correctly captured the positional arguments and passed them to our `copy` function. 
+ +Arguments are essential for inputs that are fundamental to the command's operation, like the files to operate on. Options are better suited for modifying the command's behavior. + +*(Note: Arguments can also be made optional or accept variable numbers of inputs, often involving the `required` and `nargs` settings, which tie into concepts we'll explore more in [ParamType](04_paramtype.md).)* + +## How Parameters Work Together + +When you run a command like `python greet_app_v2.py hello --shout -n Alice`, Click performs a sequence of steps: + +1. **Parsing:** Click looks at the command-line arguments (`sys.argv`) provided by the operating system: `['greet_app_v2.py', 'hello', '--shout', '-n', 'Alice']`. +2. **Command Identification:** It identifies `hello` as the command to execute. +3. **Parameter Matching:** It scans the remaining arguments (`['--shout', '-n', 'Alice']`). + * It sees `--shout`. It looks up the parameters defined for the `hello` command (using the `@click.option` and `@click.argument` decorators). It finds the `shout` option definition (which has `is_flag=True`). It marks the value for `shout` as `True`. + * It sees `-n`. It finds the `name` option definition (which includes `-n` as an alias and expects a value). + * It sees `Alice`. Since the previous token (`-n`) expected a value, Click associates `"Alice"` with the `-n` (and thus `--name`) option. It marks the value for `name` as `"Alice"`. +4. **Validation & Conversion:** Click checks if all required parameters are present (they are). It also performs type conversion (though in this case, the default is string, which matches "Alice"). We'll see more complex conversions in the next chapter. +5. **Function Call:** Finally, Click calls the command's underlying Python function (`hello`) with the collected values as keyword arguments: `hello(name='Alice', shout=True)`. 
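The matching steps above can be sketched as a small hand-written loop. This is a simplified illustration under stated assumptions, not how Click's parser is written: the `parse_params` function and the `HELLO_SPECS` dictionary (with its `flags`, `is_flag`, and `default` keys) are hypothetical names invented here, and the sketch handles only options, not positional arguments or type conversion.

```python
def parse_params(tokens, specs):
    """Match command-line tokens against option specs; return keyword args."""
    # Start from the defaults so that omitted options still get a value.
    values = {name: spec["default"] for name, spec in specs.items()}
    # Map every spelling ('--name', '-n', ...) back to its parameter name.
    lookup = {flag: name for name, spec in specs.items() for flag in spec["flags"]}

    remaining = list(tokens)
    while remaining:
        token = remaining.pop(0)
        if token not in lookup:
            raise ValueError(f"No such option: {token}")
        name = lookup[token]
        if specs[name]["is_flag"]:
            values[name] = True                 # presence alone means True
        else:
            if not remaining:
                raise ValueError(f"Option {token} requires a value")
            values[name] = remaining.pop(0)     # consume the next token as the value
    return values


# Hypothetical description of the options declared on `hello`.
HELLO_SPECS = {
    "name": {"flags": ("--name", "-n"), "is_flag": False, "default": "World"},
    "shout": {"flags": ("--shout",), "is_flag": True, "default": False},
}


def hello(name, shout):
    greeting = f"Hello {name}!"
    return greeting.upper() if shout else greeting


# Tokens left over after the 'hello' command has been identified.
kwargs = parse_params(["--shout", "-n", "Alice"], HELLO_SPECS)
print(kwargs)
print(hello(**kwargs))
```

The final call passes the collected values as keyword arguments, which is why the function's parameter names have to line up with the option names.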
+ +Here's a simplified view of the process: + +```mermaid +sequenceDiagram + participant User + participant Terminal + participant PythonScript as python greet_app_v2.py + participant ClickRuntime + participant hello_func as hello(name, shout) + + User->>Terminal: python greet_app_v2.py hello --shout -n Alice + Terminal->>PythonScript: Executes script with args ["hello", "--shout", "-n", "Alice"] + PythonScript->>ClickRuntime: Calls cli() entry point + ClickRuntime->>ClickRuntime: Parses args, finds 'hello' command + ClickRuntime->>ClickRuntime: Identifies '--shout' as flag for 'shout' parameter (value=True) + ClickRuntime->>ClickRuntime: Identifies '-n' as option for 'name' parameter + ClickRuntime->>ClickRuntime: Consumes 'Alice' as value for '-n'/'name' parameter (value="Alice") + ClickRuntime->>ClickRuntime: Validates parameters, performs type conversion + ClickRuntime->>hello_func: Calls callback: hello(name="Alice", shout=True) + hello_func-->>PythonScript: Prints "HELLO ALICE!" + PythonScript-->>Terminal: Shows output + Terminal-->>User: Displays "HELLO ALICE!" +``` + +## Under the Hood: Decorators and Parameter Objects + +How do `@click.option` and `@click.argument` actually work with `@click.command`? + +1. **Parameter Definition (`decorators.py`, `core.py`):** When you use `@click.option(...)` or `@click.argument(...)`, these functions (defined in `click/decorators.py`) create instances of the `Option` or `Argument` classes (defined in `click/core.py`). These objects store all the configuration you provided (like `--name`, `-n`, `default='World'`, `is_flag=True`, etc.). +2. **Attaching to Function (`decorators.py`):** Crucially, these decorators don't immediately add the parameters to a command. Instead, they attach the created `Option` or `Argument` object to the function they are decorating. 
Click uses a helper mechanism (like the internal `_param_memo` function which adds to a `__click_params__` list) to store these parameter objects *on* the function object temporarily. +3. **Command Creation (`decorators.py`, `core.py`):** The `@click.command()` decorator (or `@group.command()`) runs *after* all the `@option` and `@argument` decorators for that function. It looks for the attached parameter objects (the `__click_params__` list). It gathers these objects and passes them to the constructor of the `Command` (or `Group`) object it creates. The `Command` object stores these parameters in its `params` attribute. +4. **Parsing (`parser.py`, `core.py`):** When the command is invoked, the `Command` object uses its `params` list to configure an internal parser (historically based on Python's `optparse`, see `click/parser.py`). This parser processes the command-line string (`sys.argv`) according to the rules defined by the `Option` and `Argument` objects in the `params` list. +5. **Callback Invocation (`core.py`):** After parsing and validation, Click takes the resulting values and calls the original Python function (stored as the `Command.callback`), passing the values as arguments. + +So, the decorators work together: `@option`/`@argument` define the parameters and temporarily attach them to the function, while `@command` collects these definitions and builds the final `Command` object, ready for parsing. + +## Conclusion + +You've learned how to make your Click commands interactive by defining inputs using **Parameters**: + +* **Options (`@click.option`):** Named inputs, often optional, specified with flags (`--name`, `-n`). Great for controlling behavior (like `--verbose`, `--shout`) or providing specific pieces of data (`--output file.txt`). +* **Arguments (`@click.argument`):** Positional inputs, often required, that follow options (`input.csv`). Ideal for core data the command operates on (like source/destination files). 
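
The `Option` and `Argument` objects behind these two decorators can be inspected on any finished command through its `params` attribute. The following is a small sketch with a hypothetical `hello` command; the `target` argument is added only for illustration.

```python
import click

@click.command()
@click.option('--name', '-n', default='World')
@click.option('--shout', is_flag=True)
@click.argument('target')
def hello(name, shout, target):
    click.echo(f"{name} {shout} {target}")

# After decoration, `hello` is a click.Command rather than a plain function.
print(type(hello).__name__)

# Each entry in `params` is the Option or Argument object a decorator created.
for param in hello.params:
    print(type(param).__name__, param.name, param.opts)
```

The parameters appear in the order the decorators are written, top to bottom, which is also the order Click uses in generated help text.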
+ +You saw how Click uses decorators to define these parameters and automatically handles parsing the command line, providing default values, generating help messages, and passing the final values to your Python function. + +But what if you want an option to accept only numbers? Or a choice from a predefined list? Or maybe an argument that represents a file path that must exist? Click handles this through **Parameter Types**. Let's explore those next! + +Next up: [Chapter 4: ParamType](04_paramtype.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/04_paramtype.md b/output/Click/04_paramtype.md new file mode 100644 index 0000000..7b8ee5e --- /dev/null +++ b/output/Click/04_paramtype.md @@ -0,0 +1,257 @@ +# Chapter 4: ParamType - Checking and Converting Inputs + +In [Chapter 3: Parameter (Option / Argument)](03_parameter__option___argument_.md), we learned how to define inputs for our commands using `@click.option` and `@click.argument`. Our `greet` command could take a `--name` option, and our `copy` command took `SRC` and `DST` arguments. + +But what if we need more control? What if our command needs a *number* as input, like `--count 3`? Or what if an option should only accept specific words, like `--level easy` or `--level hard`? Right now, Click treats most inputs as simple text strings. + +This is where **ParamType** comes in! Think of `ParamType`s as the **gatekeepers** and **translators** for your command-line inputs. They: + +1. **Validate:** Check if the user's input looks correct (e.g., "Is this actually a number?"). +2. **Convert:** Change the input text (which is always initially a string) into the Python type you need (e.g., the string `"3"` becomes the integer `3`). + +`ParamType`s make your commands more robust by catching errors early and giving your Python code the data types it expects. + +## Why Do We Need ParamTypes? 
+ +Imagine you're writing a command to repeat a message multiple times: + +```bash +repeat --times 5 "Hello!" +``` + +Inside your Python function, you want the `times` variable to be an integer so you can use it in a loop. If the user types `repeat --times five "Hello!"`, your code might crash if it tries to use the string `"five"` like a number. + +`ParamType` solves this. By telling Click that the `--times` option expects an integer, Click will automatically: + +* Check if the input (`"5"`) can be turned into an integer. +* If yes, convert it to the integer `5` and pass it to your function. +* If no (like `"five"`), stop immediately and show the user a helpful error message *before* your function even runs! + +## Using Built-in ParamTypes + +Click provides several ready-to-use `ParamType`s. You specify which one to use with the `type` argument in `@click.option` or `@click.argument`. + +Let's modify an example to use `click.INT`. + +```python +# count_app.py +import click + +@click.command() +@click.option('--count', default=1, type=click.INT, help='Number of times to print.') +@click.argument('message') +def repeat(count, message): + """Prints MESSAGE the specified number of times.""" + # 'count' is now guaranteed to be an integer! + for _ in range(count): + click.echo(message) + +if __name__ == '__main__': + repeat() +``` + +Breakdown: + +1. `import click`: As always. +2. `@click.option('--count', ..., type=click.INT, ...)`: This is the key change! We added `type=click.INT`. This tells Click that the value provided for `--count` must be convertible to an integer. `click.INT` is one of Click's built-in `ParamType` instances. +3. `def repeat(count, message):`: The `count` parameter in our function will receive the *converted* integer value. + +**Let's run it!** + +```bash +$ python count_app.py --count 3 "Woohoo!" +Woohoo! +Woohoo! +Woohoo! +``` + +It works! Click converted the input string `"3"` into the Python integer `3` before calling our `repeat` function. 
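
To confirm that the function really receives an integer rather than the text the user typed, the sketch below uses a hypothetical `show_type` command that prints the type of its argument. It also calls `click.INT.convert` directly; passing `None` for the parameter and context is an assumption that is sufficient for this standalone check, though Click normally supplies both.

```python
import click
from click.testing import CliRunner

@click.command()
@click.option('--count', default=1, type=click.INT)
def show_type(count):
    # Report the Python type that Click handed to the function.
    click.echo(f"{type(count).__name__}: {count}")

result = CliRunner().invoke(show_type, ['--count', '3'])
print(result.output, end='')

# The same conversion, called directly on the type object.
direct = click.INT.convert('3', None, None)
print(type(direct).__name__, direct)
```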
+ +Now, see what happens with invalid input: + +```bash +$ python count_app.py --count five "Oh no" +Usage: count_app.py [OPTIONS] MESSAGE +Try 'count_app.py --help' for help. + +Error: Invalid value for '--count': 'five' is not a valid integer. +``` + +Perfect! Click caught the error because `"five"` couldn't be converted by `click.INT`. It printed a helpful message and prevented our `repeat` function from running with bad data. + +## Common Built-in Types + +Click offers several useful built-in types: + +* `click.STRING`: The default type. Converts the input to a string (usually doesn't change much unless the input was bytes). +* `click.INT`: Converts to an integer. Fails if the input isn't a valid whole number. +* `click.FLOAT`: Converts to a floating-point number. Fails if the input isn't a valid number (e.g., `3.14`, `-0.5`). +* `click.BOOL`: Converts to a boolean (`True`/`False`). It's clever and understands inputs like `'1'`, `'true'`, `'t'`, `'yes'`, `'y'`, `'on'` as `True`, and `'0'`, `'false'`, `'f'`, `'no'`, `'n'`, `'off'` as `False`. Usually used for options that aren't flags. +* `click.Choice`: Checks if the value is one of a predefined list of choices. + + ```python + # choice_example.py + import click + + @click.command() + @click.option('--difficulty', type=click.Choice(['easy', 'medium', 'hard'], case_sensitive=False), default='easy') + def setup(difficulty): + click.echo(f"Setting up game with difficulty: {difficulty}") + + if __name__ == '__main__': + setup() + ``` + + Running `python choice_example.py --difficulty MeDiUm` works (because `case_sensitive=False`), but `python choice_example.py --difficulty expert` would fail. + +* `click.Path`: Represents a filesystem path. It can check if the path exists, if it's a file or directory, and if it has certain permissions (read/write/execute). It returns the path as a string (or `pathlib.Path` if configured). 
+ + ```python + # path_example.py + import click + + @click.command() + @click.argument('output_dir', type=click.Path(exists=True, file_okay=False, dir_okay=True, writable=True)) + def process(output_dir): + click.echo(f"Processing data into directory: {output_dir}") + # We know output_dir exists, is a directory, and is writable! + + if __name__ == '__main__': + process() + ``` + +* `click.File`: Similar to `Path`, but it *automatically opens* the file and passes the open file object to your function. It also handles closing the file automatically. You can specify the mode (`'r'`, `'w'`, `'rb'`, `'wb'`). + + ```python + # file_example.py + import click + + @click.command() + @click.argument('input_file', type=click.File('r')) # Open for reading text + def cat(input_file): + # input_file is an open file handle! + click.echo(input_file.read()) + # Click will close the file automatically after this function returns + + if __name__ == '__main__': + cat() + ``` + +These built-in types cover most common use cases for validating and converting command-line inputs. + +## How ParamTypes Work Under the Hood + +What happens when you specify `type=click.INT`? + +1. **Parsing:** As described in [Chapter 3](03_parameter__option___argument_.md), Click's parser identifies the command-line arguments and matches them to your defined `Option`s and `Argument`s. It finds the raw string value provided by the user (e.g., `"3"` for `--count`). +2. **Type Retrieval:** The parser looks at the `Parameter` object (the `Option` or `Argument`) and finds the `type` you assigned to it (e.g., the `click.INT` instance). +3. **Conversion Attempt:** The parser calls the `convert()` method of the `ParamType` instance, passing the raw string value (`"3"`), the parameter object itself, and the current [Context](05_context.md). +4. **Validation & Conversion Logic (Inside `ParamType.convert`)**: + * The `click.INT.convert()` method tries to call Python's built-in `int("3")`. 
+ * If this succeeds, it returns the result (the integer `3`). + * If it fails (e.g., `int("five")` would raise a `ValueError`), the `convert()` method catches this error. +5. **Success or Failure**: + * **Success:** The parser receives the converted value (`3`) and stores it. Later, it passes this value to your command function. + * **Failure:** The `convert()` method calls its `fail()` helper method. The `fail()` method raises a `click.BadParameter` exception with a helpful error message (e.g., "'five' is not a valid integer."). Click catches this exception, stops further processing, and displays the error message to the user along with usage instructions. + +Here's a simplified view of the successful conversion process: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant ClickParser as Click Parser + participant IntType as click.INT + participant CommandFunc as Command Function + + User->>CLI: python count_app.py --count 3 ... + CLI->>ClickParser: Parse args, find '--count' option with value '3' + ClickParser->>IntType: Call convert(value='3', param=..., ctx=...) + IntType->>IntType: Attempt int('3') -> Success! returns 3 + IntType-->>ClickParser: Return converted value: 3 + ClickParser->>CommandFunc: Call repeat(count=3, ...) + CommandFunc-->>CLI: Executes logic (prints message 3 times) +``` + +And here's the failure process: + +```mermaid +sequenceDiagram + participant User + participant CLI + participant ClickParser as Click Parser + participant IntType as click.INT + participant ClickException as Click Exception Handling + + User->>CLI: python count_app.py --count five ... + CLI->>ClickParser: Parse args, find '--count' option with value 'five' + ClickParser->>IntType: Call convert(value='five', param=..., ctx=...) + IntType->>IntType: Attempt int('five') -> Fails! 
(ValueError) + IntType->>ClickException: Catch error, call fail("'five' is not...") -> raises BadParameter + ClickException-->>ClickParser: BadParameter exception raised + ClickParser-->>CLI: Catch exception, stop processing + CLI-->>User: Display "Error: Invalid value for '--count': 'five' is not a valid integer." +``` + +The core logic for built-in types resides in `click/types.py`. Each type (like `IntParamType`, `Choice`, `Path`) inherits from the base `ParamType` class and implements its own `convert` method containing the specific validation and conversion rules. + +```python +# Simplified structure from click/types.py + +class ParamType: + name: str # Human-readable name like "integer" or "filename" + + def convert(self, value, param, ctx): + # Must be implemented by subclasses + # Should return the converted value or call self.fail() + raise NotImplementedError + + def fail(self, message, param, ctx): + # Raises a BadParameter exception + raise BadParameter(message, ctx=ctx, param=param) + +class IntParamType(ParamType): + name = "integer" + + def convert(self, value, param, ctx): + try: + # The core conversion logic! + return int(value) + except ValueError: + # If conversion fails, raise the standard error + self.fail(f"{value!r} is not a valid integer.", param, ctx) + +# click.INT is just an instance of this class +INT = IntParamType() +``` + +## Custom Types + +What if none of the built-in types do exactly what you need? Click allows you to create your own custom `ParamType`s! You can do this by subclassing `click.ParamType` and implementing the `name` attribute and the `convert` method. This is an advanced topic, but it provides great flexibility. + +## Shell Completion Hints + +An added benefit of using specific `ParamType`s is that they can provide hints for shell completion (when the user presses Tab). For example: +* `click.Choice(['easy', 'medium', 'hard'])` can suggest `easy`, `medium`, or `hard`. 
+* `click.Path` can suggest file and directory names from the current location. + +This makes your CLI even more user-friendly. + +## Conclusion + +`ParamType`s are a fundamental part of Click, acting as the bridge between raw command-line text input and the well-typed data your Python functions need. They handle the crucial tasks of: + +* **Validating** user input against expected formats or rules. +* **Converting** input strings to appropriate Python types (integers, booleans, files, etc.). +* **Generating** user-friendly error messages for invalid input. +* Providing hints for **shell completion**. + +By using built-in types like `click.INT`, `click.Choice`, `click.Path`, and `click.File`, you make your commands more robust, reliable, and easier to use. + +So far, we've seen how commands are structured, how parameters get their values, and how those values are validated and converted. But how does Click manage the state *during* the execution of a command? How does it know which command is running or what the parent commands were? That's the job of the `Context`. Let's explore that next! + +Next up: [Chapter 5: Context](05_context.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/05_context.md b/output/Click/05_context.md new file mode 100644 index 0000000..5b482c2 --- /dev/null +++ b/output/Click/05_context.md @@ -0,0 +1,271 @@ +# Chapter 5: Context - The Command's Nervous System + +In the last chapter, [ParamType](04_paramtype.md), we saw how Click helps validate and convert user input into the right Python types, making our commands more robust. We used types like `click.INT` and `click.Path` to ensure data correctness. + +But what happens *while* a command is running? 
How does Click keep track of which command is being executed, what parameters were passed, or even shared information between different commands in a nested structure (like `git remote add ...`)? + +This is where the **Context** object, often referred to as `ctx`, comes into play. Think of the Context as the central nervous system for a single command invocation. It carries all the vital information about the current state of execution. + +## Why Do We Need a Context? + +Imagine you have a command that needs to behave differently based on a global configuration, maybe a `--verbose` flag set on the main application group. Or perhaps one command needs to call another command within the same application. How do they communicate? + +The Context object solves these problems by providing a central place to: + +* Access parameters passed to the *current* command. +* Access parameters or settings from *parent* commands. +* Share application-level objects (like configuration settings or database connections) between commands. +* Manage resources that need cleanup (like automatically closing files opened with `click.File`). +* Invoke other commands programmatically. + +Let's explore how to access and use this powerful object. + +## Getting the Context: `@pass_context` + +Click doesn't automatically pass the Context object to your command function. You need to explicitly ask for it using a special decorator: `@click.pass_context`. + +When you add `@click.pass_context` *above* your function definition (but typically *below* the `@click.command` or `@click.option` decorators), Click will automatically **inject** the `Context` object as the **very first argument** to your function. 
+ +Let's see a simple example: + +```python +# context_basics.py +import click + +@click.group() +@click.pass_context # Request the context for the group function +def cli(ctx): + """A simple CLI with context.""" + # We can store arbitrary data on the context's 'obj' attribute + ctx.obj = {'verbose': False} # Initialize a shared dictionary + +@cli.command() +@click.option('--verbose', is_flag=True, help='Enable verbose mode.') +@click.pass_context # Request the context for the command function +def info(ctx, verbose): + """Prints info, possibly verbosely.""" + # Access the command name from the context + click.echo(f"Executing command: {ctx.command.name}") + + # Access parameters passed to *this* command + click.echo(f"Verbose flag (local): {verbose}") + + # We can modify the shared object from the parent context + if verbose: + ctx.obj['verbose'] = True + + # Access the shared object from the parent context + click.echo(f"Verbose setting (shared): {ctx.obj['verbose']}") + +if __name__ == '__main__': + cli() +``` + +Let's break it down: + +1. `@click.pass_context`: We apply this decorator to both the `cli` group function and the `info` command function. +2. `def cli(ctx): ...`: Because of `@pass_context`, the `cli` function now receives the `Context` object as its first argument, which we've named `ctx`. +3. `ctx.obj = {'verbose': False}`: The `ctx.obj` attribute is a special place designed for you to store and share *your own* application data. Here, the main `cli` group initializes it as a dictionary. This object will be automatically inherited by child command contexts. +4. `def info(ctx, verbose): ...`: The `info` command function also receives the `Context` (`ctx`) as its first argument, followed by its own parameters (`verbose`). +5. `ctx.command.name`: We access the `Command` object associated with the current context via `ctx.command` and get its name. +6. `ctx.obj['verbose'] = True`: We can *modify* the shared `ctx.obj` from within the subcommand. +7. 
`click.echo(f"Verbose setting (shared): {ctx.obj['verbose']}")`: We access the potentially modified shared state. + +**Run it!** + +```bash +$ python context_basics.py info +Executing command: info +Verbose flag (local): False +Verbose setting (shared): False + +$ python context_basics.py info --verbose +Executing command: info +Verbose flag (local): True +Verbose setting (shared): True +``` + +You can see how `@pass_context` gives us access to the runtime environment (`ctx.command.name`) and allows us to use `ctx.obj` to share state between the parent group (`cli`) and the subcommand (`info`). + +## Key Context Attributes + +The `Context` object has several useful attributes: + +* `ctx.command`: The [Command](01_command___group.md) object that this context belongs to. You can get its name (`ctx.command.name`), parameters, etc. +* `ctx.parent`: The context of the invoking command. If this is the top-level command, `ctx.parent` will be `None`. This forms a linked list or chain back to the root context. +* `ctx.params`: A dictionary mapping parameter names to the *final* values passed to the command, after parsing, type conversion, and defaults have been applied. + ```python + # access_params.py + import click + + @click.command() + @click.option('--name', default='Guest') + @click.pass_context + def hello(ctx, name): + click.echo(f"Hello, {name}!") + # Access the parameter value directly via ctx.params + click.echo(f"(Value from ctx.params: {ctx.params['name']})") + + if __name__ == '__main__': + hello() + ``` + Running `python access_params.py --name Alice` would show `Hello, Alice!` and `(Value from ctx.params: Alice)`. +* `ctx.obj`: As seen before, this is an arbitrary object that gets passed down the context chain. It's commonly used for shared configuration, database connections, or other application-level state. You can also use `@click.pass_obj` as a shortcut if you *only* need `ctx.obj`. 
+* `ctx.info_name`: The name that was used on the command line to invoke this command or group (e.g., `info` in `python context_basics.py info`). +* `ctx.invoked_subcommand`: For groups, this holds the name of the subcommand that was invoked (or `None` if no subcommand was called). + +## Calling Other Commands + +Sometimes, you want one command to trigger another. The Context provides methods for this: + +* `ctx.invoke(other_command, **params)`: Calls another Click command (`other_command`) in a new child context whose parent is the current context (`ctx`). It uses the provided `params` for the call. +* `ctx.forward(other_command)`: Similar to `invoke`, but it automatically passes all parameters from the *current* context (`ctx.params`) to the `other_command`. This is useful for creating alias commands. + +```python +# invoke_example.py +import click + +@click.group() +def cli(): + pass + +@cli.command() +@click.argument('text') +def print_it(text): + """Prints the given text.""" + click.echo(f"Printing: {text}") + +@cli.command() +@click.argument('message') +@click.pass_context # Need context to call invoke +def shout(ctx, message): + """Shouts the message by calling print_it.""" + click.echo("About to invoke print_it...") + # Call the 'print_it' command, passing the uppercased message + ctx.invoke(print_it, text=message.upper()) + click.echo("Finished invoking print_it.") + +if __name__ == '__main__': + cli() +``` + +Running `python invoke_example.py shout "hello world"` will output: + +``` +About to invoke print_it... +Printing: HELLO WORLD +Finished invoking print_it. +``` + +The `shout` command successfully called the `print_it` command programmatically using `ctx.invoke()`. + +## Resource Management (`ctx.call_on_close`) + +Click uses the context internally to manage resources. For instance, when you use `type=click.File('w')`, Click opens the file and registers a cleanup function using `ctx.call_on_close(file.close)`. 
This ensures the file is closed when the context is finished, even if errors occur. + +You can use this mechanism yourself if you need custom resource cleanup tied to the command's lifecycle. + +```python +# resource_management.py +import click + +class MockResource: + def __init__(self, name): + self.name = name + click.echo(f"Resource '{self.name}' opened.") + def close(self): + click.echo(f"Resource '{self.name}' closed.") + +@click.command() +@click.pass_context +def process(ctx): + """Opens and closes a mock resource.""" + res = MockResource("DataFile") + # Register the close method to be called when the context ends + ctx.call_on_close(res.close) + click.echo("Processing with resource...") + # Function ends, context tears down, call_on_close triggers + +if __name__ == '__main__': + process() +``` + +Running this script will show: + +``` +Resource 'DataFile' opened. +Processing with resource... +Resource 'DataFile' closed. +``` + +The resource was automatically closed because we registered its `close` method with `ctx.call_on_close`. + +## How Context Works Under the Hood + +1. **Initial Context:** When you run your Click application (e.g., by calling `cli()`), Click creates the first `Context` object associated with the top-level command or group (`cli` in our examples). +2. **Parsing and Subcommand:** Click parses the command-line arguments. If a subcommand is identified (like `info` in `python context_basics.py info`), Click finds the corresponding `Command` object. +3. **Child Context Creation:** Before executing the subcommand's callback function, Click creates a *new* `Context` object for the subcommand. Crucially, it sets the `parent` attribute of this new context to the context of the invoking command (the `cli` context in our example). +4. **Object Inheritance:** The `ctx.obj` attribute is automatically passed down from the parent context to the child context *by reference* (unless the child explicitly sets its own `ctx.obj`). +5. 
**`@pass_context` Decorator:** This decorator (defined in `decorators.py`) wraps your callback function. When the wrapped function is called, the decorator uses `click.globals.get_current_context()` (which accesses a thread-local stack of contexts) to fetch the *currently active* context and inserts it as the first argument before calling your original function. +6. **`ctx.invoke`:** When you call `ctx.invoke(other_cmd, ...)`, Click takes the `other_cmd` object, creates a *new* child context for it (with the current `ctx` as its parent), populates its `params` from the arguments you provided, and then executes `other_cmd`'s callback within that new context. +7. **Cleanup:** Once a command function finishes (or raises an exception that Click handles), its corresponding context is "torn down". This is when any functions registered with `ctx.call_on_close` are executed. + +Here's a simplified diagram showing context creation and `ctx.obj` flow for `python context_basics.py info --verbose`: + +```mermaid +sequenceDiagram + participant User + participant CLI as python context_basics.py + participant ClickRuntime + participant cli_ctx as cli Context + participant info_ctx as info Context + participant cli_func as cli(ctx) + participant info_func as info(ctx, verbose) + + User->>CLI: info --verbose + CLI->>ClickRuntime: Calls cli() entry point + ClickRuntime->>cli_ctx: Creates root context for 'cli' group + Note over ClickRuntime, cli_func: ClickRuntime calls cli's callback (due to @click.group) + ClickRuntime->>cli_func: cli(ctx=cli_ctx) + cli_func->>cli_ctx: Sets ctx.obj = {'verbose': False} + cli_func-->>ClickRuntime: Returns + ClickRuntime->>ClickRuntime: Parses args, finds 'info' subcommand, '--verbose' option + ClickRuntime->>info_ctx: Creates child context for 'info' command + info_ctx->>cli_ctx: Sets info_ctx.parent = cli_ctx + info_ctx->>info_ctx: Inherits ctx.obj from parent (value = {'verbose': False}) + Note over ClickRuntime, info_func: ClickRuntime prepares to call 
info's callback + ClickRuntime->>ClickRuntime: Uses @pass_context to get info_ctx + ClickRuntime->>info_func: info(ctx=info_ctx, verbose=True) + info_func->>info_ctx: Accesses ctx.command.name + info_func->>info_ctx: Accesses ctx.params['verbose'] (or local 'verbose') + info_func->>info_ctx: Modifies ctx.obj['verbose'] = True + info_func->>info_ctx: Accesses ctx.obj['verbose'] (now True) + info_func-->>ClickRuntime: Returns + ClickRuntime->>info_ctx: Tears down info_ctx (runs call_on_close) + ClickRuntime->>cli_ctx: Tears down cli_ctx (runs call_on_close) + ClickRuntime-->>CLI: Exits +``` + +The core `Context` class is defined in `click/core.py`. The decorators `pass_context` and `pass_obj` are in `click/decorators.py`, and the mechanism for tracking the current context is in `click/globals.py`. + +## Conclusion + +The `Context` (`ctx`) is a cornerstone concept in Click, acting as the runtime carrier of information for a command invocation. + +You've learned: + +* The Context holds data like the current command, parameters, parent context, and shared application objects (`ctx.obj`). +* The `@click.pass_context` decorator injects the current Context into your command function. +* `ctx.obj` is essential for sharing state between nested commands. +* `ctx.invoke()` and `ctx.forward()` allow commands to call each other programmatically. +* Click uses the context for resource management (`ctx.call_on_close`), ensuring cleanup. + +Understanding the Context is key to building more complex Click applications where commands need to interact with each other or with shared application state. It provides the structure and communication channels necessary for sophisticated CLI tools. + +So far, we've focused on the logic and structure of commands. But how can we make the interaction in the terminal itself more engaging? How do we prompt users for input, show progress bars, or display colored output? Let's explore Click's terminal UI capabilities next! 
+ +Next up: [Chapter 6: Term UI (Terminal User Interface)](06_term_ui__terminal_user_interface_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/06_term_ui__terminal_user_interface_.md b/output/Click/06_term_ui__terminal_user_interface_.md new file mode 100644 index 0000000..f008da7 --- /dev/null +++ b/output/Click/06_term_ui__terminal_user_interface_.md @@ -0,0 +1,290 @@ +# Chapter 6: Term UI (Terminal User Interface) + +Welcome back! In [Chapter 5: Context](05_context.md), we learned how Click uses the `Context` object (`ctx`) to manage the state of a command while it's running, allowing us to share information and call other commands. + +So far, our commands have mostly just printed simple text. But what if we want to make our command-line tools more interactive and user-friendly? How can we: + +* Ask the user for input (like their name or a filename)? +* Ask simple yes/no questions? +* Show a progress bar for long-running tasks? +* Make our output more visually appealing with colors or styles (like making errors red)? + +This is where Click's **Terminal User Interface (Term UI)** functions come in handy. They are Click's toolkit for talking *back and forth* with the user through the terminal. + +## Making Our Tools Talk: The Need for Term UI + +Imagine you're building a tool that processes a large data file. A purely silent tool isn't very helpful. A better tool might: + +1. Ask the user which file to process. +2. Ask for confirmation before starting a potentially long operation. +3. Show a progress bar while processing the data. +4. Print a nice, colored "Success!" message at the end, or a red "Error!" message if something went wrong. + +Doing all this reliably across different operating systems (like Linux, macOS, and Windows) can be tricky. For example, getting colored text to work correctly on Windows requires special handling. 
+ +Click's Term UI functions wrap up these common interactive tasks into easy-to-use functions that work consistently everywhere. Let's explore some of the most useful ones! + +## Printing with `click.echo()` + +We've seen `print()` in Python, but Click provides its own version: `click.echo()`. Why use it? + +* **Smarter:** It works better with different kinds of data (like Unicode text and raw bytes). +* **Cross-Platform:** It handles subtle differences between operating systems for you. +* **Color Aware:** It automatically strips out color codes if the output isn't going to a terminal (like if you redirect output to a file), preventing garbled text. +* **Integrated:** It works seamlessly with Click's other features, like redirecting output or testing. + +Using it is just like `print()`: + +```python +# echo_example.py +import click + +@click.command() +def cli(): + """Demonstrates click.echo""" + click.echo("Hello from Click!") + # You can print errors to stderr easily + click.echo("Oops, something went wrong!", err=True) + +if __name__ == '__main__': + cli() +``` + +Running this: + +```bash +$ python echo_example.py +Hello from Click! +Oops, something went wrong! # (This line goes to stderr) +``` + +Simple! For most printing in Click apps, `click.echo()` is preferred over `print()`. + +## Adding Style: `click.style()` and `click.secho()` + +Want to make your output stand out? Click makes it easy to add colors and styles (like bold or underline) to your text. + +* `click.style(text, fg='color', bg='color', bold=True, ...)`: Takes your text and wraps it with special codes that terminals understand to change its appearance. It returns the modified string. +* `click.secho(text, fg='color', ...)`: A shortcut that combines `style` and `echo`. It styles the text *and* prints it in one go. 
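To make the two bullets above more concrete, here is a minimal, standard-library-only sketch of the idea behind styling and color stripping. It is not Click's implementation: the names `ansi_style`, `strip_ansi_codes`, and `styled_echo` are hypothetical, the color table is deliberately tiny, and the exact escape sequences Click emits for combined styles may differ.

```python
import re
import sys

# Hypothetical, illustrative names; Click's real code lives in click/termui.py.
ANSI_COLORS = {"red": 31, "green": 32, "yellow": 33, "blue": 34}
ANSI_RESET = "\033[0m"
ANSI_PATTERN = re.compile(r"\x1b\[[0-9;]*m")


def ansi_style(text, fg=None, bold=False, underline=False):
    """Wrap text in ANSI escape codes, similar in spirit to click.style()."""
    codes = []
    if fg is not None:
        codes.append(str(ANSI_COLORS[fg]))
    if bold:
        codes.append("1")
    if underline:
        codes.append("4")
    if not codes:
        return text
    return f"\033[{';'.join(codes)}m{text}{ANSI_RESET}"


def strip_ansi_codes(text):
    """Remove the escape codes again, leaving only the plain text."""
    return ANSI_PATTERN.sub("", text)


def styled_echo(text, stream=None, **styles):
    """Style and print in one step, dropping colors when not writing to a terminal."""
    if stream is None:
        stream = sys.stdout
    message = ansi_style(text, **styles)
    if not stream.isatty():
        message = strip_ansi_codes(message)
    stream.write(message + "\n")
    return message
```

Calling `styled_echo("Critical error!", fg="red", underline=True)` in a terminal would write the escape codes, while the same call with output redirected to a file would write plain text. That split is the reason to prefer `click.secho()` over hand-written escape codes.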
+ +Let's make our success and error messages more obvious: + +```python +# style_example.py +import click + +@click.command() +def cli(): + """Demonstrates styled output""" + # Style the text first, then echo it + success_message = click.style("Operation successful!", fg='green', bold=True) + click.echo(success_message) + + # Or use secho for style + echo in one step + click.secho("Critical error!", fg='red', underline=True, err=True) + +if __name__ == '__main__': + cli() +``` + +Running this (your terminal must support color): + +```bash +$ python style_example.py +# Output will look something like: +# Operation successful! (in bold green) +# Critical error! (in underlined red, sent to stderr) +``` + +Click supports various colors (`'red'`, `'green'`, `'blue'`, etc.) and styles (`bold`, `underline`, `blink`, `reverse`). This makes your CLI output much more informative at a glance! + +## Getting User Input: `click.prompt()` + +Sometimes you need to ask the user for information. `click.prompt()` is designed for this. It shows a message and waits for the user to type something and press Enter. + +```python +# prompt_example.py +import click + +@click.command() +def cli(): + """Asks for user input""" + name = click.prompt("Please enter your name") + click.echo(f"Hello, {name}!") + + # You can specify a default value + location = click.prompt("Enter location", default="Earth") + click.echo(f"Location: {location}") + + # You can also require a specific type (like an integer) + age = click.prompt("Enter your age", type=int) + click.echo(f"You are {age} years old.") + +if __name__ == '__main__': + cli() +``` + +Running this interactively: + +```bash +$ python prompt_example.py +Please enter your name: Alice +Hello, Alice! +Enter location [Earth]: # Just press Enter here +Location: Earth +Enter your age: 30 +You are 30 years old. 
+``` + +If you enter something that can't be converted to the `type` (like "abc" for age), `click.prompt` will automatically show an error and ask again! It can also hide input for passwords (`hide_input=True`). + +## Asking Yes/No: `click.confirm()` + +A common need is asking for confirmation before doing something potentially destructive or time-consuming. `click.confirm()` handles this nicely. + +```python +# confirm_example.py +import click +import time + +@click.command() +@click.option('--yes', is_flag=True, help='Assume Yes to confirmation.') +def cli(yes): + """Asks for confirmation.""" + click.echo("This might take a while or change things.") + + # If --yes flag is given, `yes` is True, otherwise ask. + # abort=True means if user says No, stop the program. + if not yes: + click.confirm("Do you want to continue?", abort=True) + + click.echo("Starting operation...") + time.sleep(2) # Simulate work + click.echo("Done!") + +if __name__ == '__main__': + cli() +``` + +Running interactively: + +```bash +$ python confirm_example.py +This might take a while or change things. +Do you want to continue? [y/N]: y # User types 'y' +Starting operation... +Done! +``` + +If the user types 'n' (or just presses Enter, since the default is No - indicated by `[y/N]`), the program will stop immediately because of `abort=True`. If you run `python confirm_example.py --yes`, it skips the question entirely. + +## Showing Progress: `click.progressbar()` + +For tasks that take a while, it's good practice to show the user that something is happening. `click.progressbar()` creates a visual progress bar. You typically use it with a Python `with` statement around a loop. 
+ +Let's simulate processing a list of items: + +```python +# progress_example.py +import click +import time + +items_to_process = range(100) # Simulate 100 items + +@click.command() +def cli(): + """Shows a progress bar.""" + # 'items_to_process' is the iterable + # 'label' is the text shown before the bar + with click.progressbar(items_to_process, label="Processing items") as bar: + for item in bar: + # Simulate work for each item + time.sleep(0.05) + # The 'bar' automatically updates with each iteration + + click.echo("Finished processing!") + +if __name__ == '__main__': + cli() +``` + +When you run this, you'll see a progress bar update in your terminal: + +```bash +$ python progress_example.py +Processing items [####################################] 100% 00:00:05 +Finished processing! +# (The bar animates in place while running) +``` + +The progress bar automatically figures out the percentage and estimated time remaining (ETA). It makes long tasks much less mysterious for the user. You can also use it without an iterable by manually calling the `bar.update(increment)` method inside the `with` block. + +## How Term UI Works Under the Hood + +These functions seem simple, but they handle quite a bit behind the scenes: + +1. **Abstraction:** They provide a high-level API for common terminal tasks, hiding the low-level details. +2. **Input Handling:** Functions like `prompt` and `confirm` use Python's built-in `input()` or `getpass.getpass()` (for hidden input). They add loops for retries, default value handling, and type conversion/validation (using [ParamType](04_paramtype.md) concepts internally). +3. **Output Handling (`echo`, `secho`):** + * They check if the output stream (`stdout` or `stderr`) is connected to a terminal (`isatty`). + * If not a terminal, or if color is disabled, `style` codes are automatically removed (`strip_ansi`). 
+ * On Windows, if `colorama` is installed, Click wraps the output streams to translate ANSI color codes into Windows API calls, making colors work automatically. +4. **Progress Bar (`progressbar`):** + * It calculates the percentage complete based on the iterable's length (or the provided `length`). + * It estimates the remaining time (ETA) by timing recent iterations. + * It formats the bar (`#` and `-` characters) and info text. + * Crucially, it uses special terminal control characters (like `\r` - carriage return) to move the cursor back to the beginning of the line before printing the updated bar. This makes the bar *appear* to update in place rather than printing many lines. It also hides/shows the cursor during updates (`\033[?25l`, `\033[?25h`) on non-Windows systems for a smoother look. +5. **Cross-Platform Compatibility:** A major goal is to make these interactions work consistently across different operating systems and terminal types, handling quirks like Windows console limitations (`_winconsole.py`, `_compat.py`). + +Let's visualize what might happen when you call `click.secho("Error!", fg='red', err=True)`: + +```mermaid +sequenceDiagram + participant UserCode as Your Code + participant ClickSecho as click.secho() + participant ClickStyle as click.style() + participant ClickEcho as click.echo() + participant CompatLayer as Click Compatibility Layer + participant Terminal + + UserCode->>ClickSecho: secho("Error!", fg='red', err=True) + ClickSecho->>ClickStyle: style("Error!", fg='red', ...) 
+ ClickStyle-->>ClickSecho: Returns "\033[31mError!\033[0m" (styled text) + ClickSecho->>ClickEcho: echo("\033[31mError!\033[0m", err=True) + ClickEcho->>CompatLayer: Check if output (stderr) is a TTY + CompatLayer-->>ClickEcho: Yes, it's a TTY + ClickEcho->>CompatLayer: Check if color is enabled + CompatLayer-->>ClickEcho: Yes, color is enabled + Note over ClickEcho, Terminal: On Windows, may wrap stream with Colorama here + ClickEcho->>CompatLayer: Write styled text to stderr + CompatLayer->>Terminal: Writes "\033[31mError!\033[0m\n" + Terminal-->>Terminal: Displays "Error!" in red +``` + +The key is that Click adds layers of checks and formatting (`style`, color stripping, platform adaptation) around the basic act of printing (`echo`) or getting input (`prompt`). + +You can find the implementation details in: +* `click/termui.py`: Defines the main functions like `prompt`, `confirm`, `style`, `secho`, `progressbar`, `echo_via_pager`. +* `click/_termui_impl.py`: Contains the implementations for more complex features like `ProgressBar`, `Editor`, `pager`, and `getchar`. +* `click/utils.py`: Contains `echo` and helpers like `open_stream`. +* `click/_compat.py` & `click/_winconsole.py`: Handle differences between Python versions and operating systems, especially for terminal I/O and color support on Windows. + +## Conclusion + +Click's **Term UI** functions are essential for creating command-line applications that are interactive, informative, and pleasant to use. You've learned how to: + +* Print output reliably with `click.echo`. +* Add visual flair with colors and styles using `click.style` and `click.secho`. +* Ask the user for input with `click.prompt`. +* Get yes/no confirmation using `click.confirm`. +* Show progress for long tasks with `click.progressbar`. + +These tools handle many cross-platform complexities, letting you focus on building the core logic of your interactive CLI. + +But what happens when things go wrong? 
How does Click handle errors, like invalid user input or missing files? That's where Click's exception handling comes in. Let's dive into that next! + +Next up: [Chapter 7: Click Exceptions](07_click_exceptions.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/07_click_exceptions.md b/output/Click/07_click_exceptions.md new file mode 100644 index 0000000..41b227a --- /dev/null +++ b/output/Click/07_click_exceptions.md @@ -0,0 +1,251 @@ +# Chapter 7: Click Exceptions - Handling Errors Gracefully + +In the last chapter, [Chapter 6: Term UI (Terminal User Interface)](06_term_ui__terminal_user_interface_.md), we explored how to make our command-line tools interactive and visually appealing using functions like `click.prompt`, `click.confirm`, and `click.secho`. We learned how to communicate effectively *with* the user. + +But what happens when the user doesn't communicate effectively with *us*? What if they type the wrong command, forget a required argument, or enter text when a number was expected? Our programs need a way to handle these errors without just crashing. + +This is where **Click Exceptions** come in. They are Click's way of signaling that something went wrong, usually because of a problem with the user's input or how they tried to run the command. + +## Why Special Exceptions? The Problem with Crashes + +Imagine you have a command that needs a number, like `--count 5`. You used `type=click.INT` like we learned in [Chapter 4: ParamType](04_paramtype.md). What happens if the user types `--count five`? + +If Click didn't handle this specially, the `int("five")` conversion inside Click would fail, raising a standard Python `ValueError`. This might cause your program to stop with a long, confusing Python traceback message that isn't very helpful for the end-user. They might not understand what went wrong or how to fix it. 
+ +Click wants to provide a better experience. When something like this happens, Click catches the internal error and raises one of its own **custom exception types**. These special exceptions tell Click exactly what kind of problem occurred (e.g., bad input, missing argument). + +## Meet the Click Exceptions + +Click has a family of exception classes designed specifically for handling command-line errors. The most important ones inherit from the base class `click.ClickException`. Here are some common ones you'll encounter (or use): + +* `ClickException`: The base for all Click-handled errors. +* `UsageError`: A general error indicating the command was used incorrectly (e.g., wrong number of arguments). It usually prints the command's usage instructions. +* `BadParameter`: Raised when the value provided for an option or argument is invalid (e.g., "five" for an integer type, or a value not in a `click.Choice`). +* `MissingParameter`: Raised when a required option or argument is not provided. +* `NoSuchOption`: Raised when the user tries to use an option that doesn't exist (e.g., `--verrbose` instead of `--verbose`). +* `FileError`: Raised by `click.File` or `click.Path` if a file can't be opened or accessed correctly. +* `Abort`: A special exception you can raise to stop execution immediately (like after a failed `click.confirm`). + +**The Magic:** The really neat part is that Click's main command processing logic is designed to *catch* these specific exceptions. When it catches one, it doesn't just crash. Instead, it: + +1. **Formats a helpful error message:** Often using information from the exception itself (like which parameter was bad). +2. **Prints the message** (usually prefixed with "Error:") to the standard error stream (`stderr`). +3. **Often shows relevant help text** (like the command's usage synopsis). +4. **Exits the application cleanly** with a non-zero exit code (signaling to the system that an error occurred). 
+ +This gives the user clear feedback about what they did wrong and how to potentially fix it, without seeing scary Python tracebacks. + +## Seeing Exceptions in Action (Automatically) + +You've already seen Click exceptions working! Remember our `count_app.py` from [Chapter 4: ParamType](04_paramtype.md)? + +```python +# count_app.py (from Chapter 4) +import click + +@click.command() +@click.option('--count', default=1, type=click.INT, help='Number of times to print.') +@click.argument('message') +def repeat(count, message): + """Prints MESSAGE the specified number of times.""" + for _ in range(count): + click.echo(message) + +if __name__ == '__main__': + repeat() +``` + +If you run this with invalid input for `--count`: + +```bash +$ python count_app.py --count five "Oh no" +Usage: count_app.py [OPTIONS] MESSAGE +Try 'count_app.py --help' for help. + +Error: Invalid value for '--count': 'five' is not a valid integer. +``` + +That clear "Error: Invalid value for '--count': 'five' is not a valid integer." message? That's Click catching a `BadParameter` exception (raised internally by `click.INT.convert`) and showing it nicely! + +What if you forget the required `MESSAGE` argument? + +```bash +$ python count_app.py --count 3 +Usage: count_app.py [OPTIONS] MESSAGE +Try 'count_app.py --help' for help. + +Error: Missing argument 'MESSAGE'. +``` + +Again, a clear error message! This time, Click caught a `MissingParameter` exception. + +## Raising Exceptions Yourself: Custom Validation + +Click raises exceptions automatically for many common errors. But sometimes, you have validation logic that's specific to your application. For example, maybe an `--age` option must be positive. + +The standard way to report these custom validation errors is to **raise a `click.BadParameter` exception** yourself, usually from within a callback function. + +Let's add a callback to our `count_app.py` to ensure `count` is positive. 
+ +```python +# count_app_validate.py +import click + +# 1. Define a validation callback function +def validate_count(ctx, param, value): + """Callback to ensure count is positive.""" + if value <= 0: + # 2. Raise BadParameter if validation fails + raise click.BadParameter("Count must be a positive number.") + # 3. Return the value if it's valid + return value + +@click.command() +# 4. Attach the callback to the --count option +@click.option('--count', default=1, type=click.INT, help='Number of times to print.', + callback=validate_count) # <-- Added callback +@click.argument('message') +def repeat(count, message): + """Prints MESSAGE the specified number of times (must be positive).""" + for _ in range(count): + click.echo(message) + +if __name__ == '__main__': + repeat() +``` + +Let's break down the changes: + +1. `def validate_count(ctx, param, value):`: We defined a function that takes the [Context](05_context.md), the [Parameter](03_parameter__option___argument_.md) object, and the *already type-converted* value. +2. `raise click.BadParameter(...)`: If the `value` (which we know is an `int` thanks to `type=click.INT`) is not positive, we raise `click.BadParameter` with our custom error message. +3. `return value`: If the value is valid, the callback **must** return it. +4. `callback=validate_count`: We told the `--count` option to use our `validate_count` function after type conversion. + +**Run it with invalid input:** + +```bash +$ python count_app_validate.py --count 0 "Zero?" +Usage: count_app_validate.py [OPTIONS] MESSAGE +Try 'count_app_validate.py --help' for help. + +Error: Invalid value for '--count': Count must be a positive number. + +$ python count_app_validate.py --count -5 "Negative?" +Usage: count_app_validate.py [OPTIONS] MESSAGE +Try 'count_app_validate.py --help' for help. + +Error: Invalid value for '--count': Count must be a positive number. +``` + +It works! 
Our custom validation logic triggered, we raised `click.BadParameter`, and Click caught it, displaying our specific error message cleanly. This is the standard way to integrate your own validation rules into Click's error handling. + +## How Click Handles Exceptions (Under the Hood) + +What exactly happens when a Click exception is raised, either by Click itself or by your code? + +1. **Raise:** An operation fails (like type conversion, parsing finding a missing argument, or your custom callback). A specific `ClickException` subclass (e.g., `BadParameter`, `MissingParameter`) is instantiated and raised. +2. **Catch:** Click's main application runner (usually triggered when you call your top-level `cli()` function) has a `try...except ClickException` block around the command execution logic. +3. **Show:** When a `ClickException` is caught, the runner calls the exception object's `show()` method. +4. **Format & Print:** The `show()` method (defined in `exceptions.py` for each exception type) formats the error message. + * `UsageError` (and its subclasses like `BadParameter`, `MissingParameter`, `NoSuchOption`) typically includes the command's usage string (`ctx.get_usage()`) and a hint to try the `--help` option. + * `BadParameter` adds context like "Invalid value for 'PARAMETER_NAME':". + * `MissingParameter` formats "Missing argument/option 'PARAMETER_NAME'.". + * The formatted message is printed to `stderr` using `click.echo()`, respecting color settings from the context. +5. **Exit:** After showing the message, Click calls `sys.exit()` with the exception's `exit_code` (usually `1` for general errors, `2` for usage errors). This terminates the program and signals the error status to the calling shell or script. 
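The five steps above can be sketched without Click at all. The following is a simplified illustration under stated assumptions, not Click's actual runner: the class names `SketchCliError` and `SketchUsageError` and the functions `parse_count` and `run` are hypothetical, and `run` returns the exit code instead of calling `sys.exit()` so that it is easy to test.

```python
import sys


class SketchCliError(Exception):
    """Hypothetical stand-in for a ClickException-style base class."""

    exit_code = 1

    def __init__(self, message):
        super().__init__(message)
        self.message = message

    def format_message(self):
        return self.message

    def show(self, file=None):
        # Print a short, user-facing message instead of a traceback.
        if file is None:
            file = sys.stderr
        print(f"Error: {self.format_message()}", file=file)


class SketchUsageError(SketchCliError):
    """Hypothetical stand-in for a usage error with its own exit code."""

    exit_code = 2


def parse_count(raw):
    # Step 1 (raise): turn a low-level ValueError into a CLI-level error.
    try:
        return int(raw)
    except ValueError:
        raise SketchUsageError(f"{raw!r} is not a valid integer.") from None


def run(argv, file=None):
    # Steps 2-5 (catch, show, format and print, exit code).
    try:
        count = parse_count(argv[0])
    except SketchCliError as exc:
        exc.show(file=file)
        return exc.exit_code
    print(f"count = {count}")
    return 0
```

A real entry point would end with something like `sys.exit(run(sys.argv[1:]))`, so the shell sees `2` for bad input and `0` on success.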
+ +Here’s a simplified sequence diagram for the `BadParameter` case when a user provides invalid input that fails type conversion: + +```mermaid +sequenceDiagram + participant User + participant CLI as YourApp.py + participant ClickRuntime + participant ParamType as ParamType (e.g., click.INT) + participant ClickExceptionHandling + + User->>CLI: python YourApp.py --count five + CLI->>ClickRuntime: Starts command execution + ClickRuntime->>ParamType: Calls convert(value='five', ...) for '--count' + ParamType->>ParamType: Tries int('five'), raises ValueError + ParamType->>ClickExceptionHandling: Catches ValueError, calls self.fail(...) + ClickExceptionHandling->>ClickExceptionHandling: Raises BadParameter("...'five' is not...") + ClickExceptionHandling-->>ClickRuntime: BadParameter propagates up + ClickRuntime->>ClickExceptionHandling: Catches BadParameter exception + ClickExceptionHandling->>ClickExceptionHandling: Calls exception.show() + ClickExceptionHandling->>CLI: Prints formatted "Error: Invalid value..." to stderr + ClickExceptionHandling->>CLI: Calls sys.exit(exception.exit_code) + CLI-->>User: Shows error message and exits +``` + +The core exception classes are defined in `click/exceptions.py`. You can see how `ClickException` defines the basic `show` method and `exit_code`, and how subclasses like `UsageError` and `BadParameter` override `show` and `format_message`, respectively, to provide more specific output based on the context (`ctx`) and parameter (`param`) they might hold. + +```python +# Simplified structure from click/exceptions.py + +class ClickException(Exception): + exit_code = 1 + + def __init__(self, message: str) -> None: + # ... (stores message, gets color settings) ... + self.message = message + + def format_message(self) -> str: + return self.message + + def show(self, file=None) -> None: + # ... (gets stderr if file is None) ... 
+ echo(f"Error: {self.format_message()}", file=file, color=self.show_color) + +class UsageError(ClickException): + exit_code = 2 + + def __init__(self, message: str, ctx=None) -> None: + super().__init__(message) + self.ctx = ctx + # ... + + def show(self, file=None) -> None: + # ... (gets stderr, color) ... + hint = "" + if self.ctx is not None and self.ctx.command.get_help_option(self.ctx): + hint = f"Try '{self.ctx.command_path} {self.ctx.help_option_names[0]}' for help.\n" + if self.ctx is not None: + echo(f"{self.ctx.get_usage()}\n{hint}", file=file, color=color) + # Call the base class's logic to print "Error: ..." + echo(f"Error: {self.format_message()}", file=file, color=color) + +class BadParameter(UsageError): + def __init__(self, message: str, ctx=None, param=None, param_hint=None) -> None: + super().__init__(message, ctx) + self.param = param + self.param_hint = param_hint + + def format_message(self) -> str: + # ... (logic to get parameter name/hint) ... + param_hint = self.param.get_error_hint(self.ctx) if self.param else self.param_hint + # ... + return f"Invalid value for {param_hint}: {self.message}" + +# Other exceptions like MissingParameter, NoSuchOption follow similar patterns +``` + +By using this structured exception system, Click ensures that user errors are reported consistently and helpfully across any Click application. + +## Conclusion + +Click Exceptions are the standard mechanism for reporting errors related to command usage and user input within Click applications. + +You've learned: + +* Click uses custom exceptions like `UsageError`, `BadParameter`, and `MissingParameter` to signal specific problems. +* Click catches these exceptions automatically to display user-friendly error messages, usage hints, and exit cleanly. +* You can (and should) raise exceptions like `click.BadParameter` in your own validation callbacks to report custom errors in a standard way. 
+* This system prevents confusing Python tracebacks and provides helpful feedback to the user. + +Understanding and using Click's exception hierarchy is key to building robust and user-friendly command-line interfaces that handle problems gracefully. + +This concludes our journey through the core concepts of Click! We've covered everything from basic [Commands and Groups](01_command___group.md), [Decorators](02_decorators.md), [Parameters](03_parameter__option___argument_.md), and [Types](04_paramtype.md), to managing runtime state with the [Context](05_context.md), creating interactive [Terminal UIs](06_term_ui__terminal_user_interface_.md), and handling errors with [Click Exceptions](07_click_exceptions.md). Armed with this knowledge, you're well-equipped to start building your own powerful and elegant command-line tools with Click! + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Click/index.md b/output/Click/index.md new file mode 100644 index 0000000..1d5a82f --- /dev/null +++ b/output/Click/index.md @@ -0,0 +1,44 @@ +# Tutorial: Click + +Click is a Python library that makes creating **command-line interfaces (CLIs)** *easy and fun*. +It uses simple Python **decorators** (`@click.command`, `@click.option`, etc.) to turn your functions into CLI commands with options and arguments. +Click handles parsing user input, generating help messages, validating data types, and managing the flow between commands, letting you focus on your application's logic. +It also provides tools for *terminal interactions* like prompting users and showing progress bars. 
+ + +**Source Repository:** [https://github.com/pallets/click/tree/main/src/click](https://github.com/pallets/click/tree/main/src/click) + +```mermaid +flowchart TD + A0["Context"] + A1["Command / Group"] + A2["Parameter (Option / Argument)"] + A3["ParamType"] + A4["Decorators"] + A5["Term UI (Terminal User Interface)"] + A6["Click Exceptions"] + A4 -- "Creates/Configures" --> A1 + A4 -- "Creates/Configures" --> A2 + A0 -- "Manages execution of" --> A1 + A0 -- "Holds parsed values for" --> A2 + A2 -- "Uses for validation/conversion" --> A3 + A3 -- "Raises on conversion error" --> A6 + A1 -- "Uses for user interaction" --> A5 + A0 -- "Handles/Raises" --> A6 + A4 -- "Injects via @pass_context" --> A0 +``` + +## Chapters + +1. [Command / Group](01_command___group.md) +2. [Decorators](02_decorators.md) +3. [Parameter (Option / Argument)](03_parameter__option___argument_.md) +4. [ParamType](04_paramtype.md) +5. [Context](05_context.md) +6. [Term UI (Terminal User Interface)](06_term_ui__terminal_user_interface_.md) +7. [Click Exceptions](07_click_exceptions.md) + + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Crawl4AI/01_asynccrawlerstrategy.md b/output/Crawl4AI/01_asynccrawlerstrategy.md new file mode 100644 index 0000000..c927656 --- /dev/null +++ b/output/Crawl4AI/01_asynccrawlerstrategy.md @@ -0,0 +1,242 @@ +# Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy + +Welcome to the Crawl4AI tutorial series! Our goal is to build intelligent agents that can understand and extract information from the web. The very first step in this process is actually *getting* the content from a webpage. This chapter explains how Crawl4AI handles that fundamental task. + +Imagine you need to pick up a package from a specific address. How do you get there and retrieve it? 
+* You could send a **simple, fast drone** that just grabs the package off the porch (if it's easily accessible). This is quick but might fail if the package is inside or requires a signature. +* Or, you could send a **full delivery truck with a driver**. The driver can ring the bell, wait, sign for the package, and even handle complex instructions. This is more versatile but takes more time and resources. + +In Crawl4AI, the `AsyncCrawlerStrategy` is like choosing your delivery vehicle. It defines *how* the crawler fetches the raw content (like the HTML, CSS, and maybe JavaScript results) of a webpage. + +## What Exactly is AsyncCrawlerStrategy? + +`AsyncCrawlerStrategy` is a core concept in Crawl4AI that represents the **method** or **technique** used to download the content of a given URL. Think of it as a blueprint: it specifies *that* we need a way to fetch content, but the specific *details* of how it's done can vary. + +This "blueprint" approach is powerful because it allows us to swap out the fetching mechanism depending on our needs, without changing the rest of our crawling logic. + +## The Default: AsyncPlaywrightCrawlerStrategy (The Delivery Truck) + +By default, Crawl4AI uses `AsyncPlaywrightCrawlerStrategy`. This strategy uses a real, automated web browser engine (like Chrome, Firefox, or WebKit) behind the scenes. + +**Why use a full browser?** + +* **Handles JavaScript:** Modern websites rely heavily on JavaScript to load content, change the layout, or fetch data after the initial page load. `AsyncPlaywrightCrawlerStrategy` runs this JavaScript, just like your normal browser does. +* **Simulates User Interaction:** It can wait for elements to appear, handle dynamic content, and see the page *after* scripts have run. +* **Gets the "Final" View:** It fetches the content as a user would see it in their browser. + +This is our "delivery truck" – powerful and capable of handling complex websites. 
However, like a real truck, it's slower and uses more memory and CPU compared to simpler methods. + +You generally don't need to *do* anything to use it, as it's the default! When you start Crawl4AI, it picks this strategy automatically. + +## Another Option: AsyncHTTPCrawlerStrategy (The Delivery Drone) + +Crawl4AI also offers `AsyncHTTPCrawlerStrategy`. This strategy is much simpler. It directly requests the URL and downloads the *initial* HTML source code that the web server sends back. + +**Why use this simpler strategy?** + +* **Speed:** It's significantly faster because it doesn't need to start a browser, render the page, or execute JavaScript. +* **Efficiency:** It uses much less memory and CPU. + +This is our "delivery drone" – super fast and efficient for simple tasks. + +**What's the catch?** + +* **No JavaScript:** It won't run any JavaScript on the page. If content is loaded dynamically by scripts, this strategy will likely miss it. +* **Basic HTML Only:** You get the raw HTML source, not necessarily what a user *sees* after the browser processes everything. + +This strategy is great for websites with simple, static HTML content or when you only need the basic structure and metadata very quickly. + +## Why Have Different Strategies? (The Power of Abstraction) + +Having `AsyncCrawlerStrategy` as a distinct concept offers several advantages: + +1. **Flexibility:** You can choose the best tool for the job. Need to crawl complex, dynamic sites? Use the default `AsyncPlaywrightCrawlerStrategy`. Need to quickly fetch basic HTML from thousands of simple pages? Switch to `AsyncHTTPCrawlerStrategy`. +2. **Maintainability:** The logic for *fetching* content is kept separate from the logic for *processing* it. +3. **Extensibility:** Advanced users could even create their *own* custom strategies for specialized fetching needs (though that's beyond this beginner tutorial). 
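The three advantages above follow from a plain strategy pattern, which can be sketched with the standard library alone. Everything in this block is an assumption made for illustration: `FetchStrategy`, `FakeBrowserStrategy`, `FakeHttpStrategy`, and `SketchCrawler` are hypothetical names, no network access happens, and none of this is Crawl4AI's real API.

```python
import asyncio
from abc import ABC, abstractmethod


class FetchStrategy(ABC):
    """Hypothetical blueprint: says *that* we can fetch, not *how*."""

    @abstractmethod
    async def fetch(self, url: str) -> str:
        """Return the page content for the given URL."""


class FakeBrowserStrategy(FetchStrategy):
    """Stands in for a slow but capable browser-based fetch."""

    async def fetch(self, url: str) -> str:
        await asyncio.sleep(0)  # placeholder for launching and driving a browser
        return f"<html><!-- rendered {url} --></html>"


class FakeHttpStrategy(FetchStrategy):
    """Stands in for a fast, plain HTTP request."""

    async def fetch(self, url: str) -> str:
        return f"<html><!-- raw {url} --></html>"


class SketchCrawler:
    """Delegates fetching to whichever strategy it was given."""

    def __init__(self, strategy=None):
        # Fall back to the "delivery truck" when nothing is specified.
        self.strategy = strategy if strategy is not None else FakeBrowserStrategy()

    async def run(self, url: str) -> str:
        return await self.strategy.fetch(url)
```

Swapping the fetching mechanism is then a one-argument change, for example `asyncio.run(SketchCrawler(FakeHttpStrategy()).run("https://example.com"))`, while the crawler class itself stays untouched.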
+ +## How It Works Conceptually + +When you ask Crawl4AI to crawl a URL, the main `AsyncWebCrawler` doesn't fetch the content itself. Instead, it delegates the task to the currently selected `AsyncCrawlerStrategy`. + +Here's a simplified flow: + +```mermaid +sequenceDiagram + participant C as AsyncWebCrawler + participant S as AsyncCrawlerStrategy + participant W as Website + + C->>S: Please crawl("https://example.com") + Note over S: I'm using my method (e.g., Browser or HTTP) + S->>W: Request Page Content + W-->>S: Return Raw Content (HTML, etc.) + S-->>C: Here's the result (AsyncCrawlResponse) +``` + +The `AsyncWebCrawler` only needs to know how to talk to *any* strategy through a common interface (the `crawl` method). The strategy handles the specific details of the fetching process. + +## Using the Default Strategy (You're Already Doing It!) + +Let's see how you use the default `AsyncPlaywrightCrawlerStrategy` without even needing to specify it. + +```python +# main_example.py +import asyncio +from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode + +async def main(): + # When you create AsyncWebCrawler without specifying a strategy, + # it automatically uses AsyncPlaywrightCrawlerStrategy! + async with AsyncWebCrawler() as crawler: + print("Crawler is ready using the default strategy (Playwright).") + + # Let's crawl a simple page that just returns HTML + # We use CacheMode.BYPASS to ensure we fetch it fresh each time for this demo. + config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS) + result = await crawler.arun( + url="https://httpbin.org/html", + config=config + ) + + if result.success: + print("\nSuccessfully fetched content!") + # The strategy fetched the raw HTML. + # AsyncWebCrawler then processes it (more on that later). 
+ print(f"First 100 chars of fetched HTML: {result.html[:100]}...") + else: + print(f"\nFailed to fetch content: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**Explanation:** + +1. We import `AsyncWebCrawler` and supporting classes. +2. We create an instance of `AsyncWebCrawler()` inside an `async with` block (this handles setup and cleanup). Since we didn't tell it *which* strategy to use, it defaults to `AsyncPlaywrightCrawlerStrategy`. +3. We call `crawler.arun()` to crawl the URL. Under the hood, the `AsyncPlaywrightCrawlerStrategy` starts a browser, navigates to the page, gets the content, and returns it. +4. We print the first part of the fetched HTML from the `result`. + +## Explicitly Choosing the HTTP Strategy + +What if you know the page is simple and want the speed of the "delivery drone"? You can explicitly tell `AsyncWebCrawler` to use `AsyncHTTPCrawlerStrategy`. + +```python +# http_strategy_example.py +import asyncio +from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode +# Import the specific strategies we want to use +from crawl4ai.async_crawler_strategy import AsyncHTTPCrawlerStrategy + +async def main(): + # 1. Create an instance of the strategy you want + http_strategy = AsyncHTTPCrawlerStrategy() + + # 2. Pass the strategy instance when creating the AsyncWebCrawler + async with AsyncWebCrawler(crawler_strategy=http_strategy) as crawler: + print("Crawler is ready using the explicit HTTP strategy.") + + # Crawl the same simple page + config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS) + result = await crawler.arun( + url="https://httpbin.org/html", + config=config + ) + + if result.success: + print("\nSuccessfully fetched content using HTTP strategy!") + print(f"First 100 chars of fetched HTML: {result.html[:100]}...") + else: + print(f"\nFailed to fetch content: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**Explanation:** + +1. 
We now also import `AsyncHTTPCrawlerStrategy`. +2. We create an instance: `http_strategy = AsyncHTTPCrawlerStrategy()`. +3. We pass this instance to the `AsyncWebCrawler` constructor: `AsyncWebCrawler(crawler_strategy=http_strategy)`. +4. The rest of the code is the same, but now `crawler.arun()` will use the faster, simpler HTTP GET request method defined by `AsyncHTTPCrawlerStrategy`. + +For a simple page like `httpbin.org/html`, both strategies will likely return the same HTML content, but the HTTP strategy would generally be faster and use fewer resources. On a complex JavaScript-heavy site, the HTTP strategy might fail to get the full content, while the Playwright strategy would handle it correctly. + +## A Glimpse Under the Hood + +You don't *need* to know the deep internals to use the strategies, but it helps to understand the structure. Inside the `crawl4ai` library, you'd find a file like `async_crawler_strategy.py`. + +It defines the "blueprint" (an Abstract Base Class): + +```python +# Simplified from async_crawler_strategy.py +from abc import ABC, abstractmethod +from .models import AsyncCrawlResponse # Defines the structure of the result + +class AsyncCrawlerStrategy(ABC): + """ + Abstract base class for crawler strategies. + """ + @abstractmethod + async def crawl(self, url: str, **kwargs) -> AsyncCrawlResponse: + """Fetch content from the URL.""" + pass # Each specific strategy must implement this +``` + +And then the specific implementations: + +```python +# Simplified from async_crawler_strategy.py +from playwright.async_api import Page # Playwright library for browser automation +# ... other imports + +class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy): + # ... (Initialization code to manage browsers) + + async def crawl(self, url: str, config: CrawlerRunConfig, **kwargs) -> AsyncCrawlResponse: + # Uses Playwright to: + # 1. Get a browser page + # 2. Navigate to the url (page.goto(url)) + # 3. Wait for content, run JS, etc. + # 4. 
Get the final HTML (page.content()) + # 5. Optionally take screenshots, etc. + # 6. Return an AsyncCrawlResponse + # ... implementation details ... + pass +``` + +```python +# Simplified from async_crawler_strategy.py +import aiohttp # Library for making HTTP requests asynchronously +# ... other imports + +class AsyncHTTPCrawlerStrategy(AsyncCrawlerStrategy): + # ... (Initialization code to manage HTTP sessions) + + async def crawl(self, url: str, config: CrawlerRunConfig, **kwargs) -> AsyncCrawlResponse: + # Uses aiohttp to: + # 1. Make an HTTP GET (or other method) request to the url + # 2. Read the response body (HTML) + # 3. Get response headers and status code + # 4. Return an AsyncCrawlResponse + # ... implementation details ... + pass +``` + +The key takeaway is that both strategies implement the same `crawl` method, allowing `AsyncWebCrawler` to use them interchangeably. + +## Conclusion + +You've learned about `AsyncCrawlerStrategy`, the core concept defining *how* Crawl4AI fetches webpage content. + +* It's like choosing a vehicle: a powerful browser (`AsyncPlaywrightCrawlerStrategy`, the default) or a fast, simple HTTP request (`AsyncHTTPCrawlerStrategy`). +* This abstraction gives you flexibility to choose the right fetching method for your task. +* You usually don't need to worry about it, as the default handles most modern websites well. + +Now that we understand how the raw content is fetched, the next step is to look at the main class that orchestrates the entire crawling process. + +**Next:** Let's dive into the [AsyncWebCrawler](02_asyncwebcrawler.md) itself! 
+ +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Crawl4AI/02_asyncwebcrawler.md b/output/Crawl4AI/02_asyncwebcrawler.md new file mode 100644 index 0000000..404e481 --- /dev/null +++ b/output/Crawl4AI/02_asyncwebcrawler.md @@ -0,0 +1,339 @@ +# Chapter 2: Meet the General Manager - AsyncWebCrawler + +In [Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy](01_asynccrawlerstrategy.md), we learned about the different ways Crawl4AI can fetch the raw content of a webpage, like choosing between a fast drone (`AsyncHTTPCrawlerStrategy`) or a versatile delivery truck (`AsyncPlaywrightCrawlerStrategy`). + +But who decides *which* delivery vehicle to use? Who tells it *which* address (URL) to go to? And who takes the delivered package (the raw HTML) and turns it into something useful? + +That's where the `AsyncWebCrawler` comes in. Think of it as the **General Manager** of the entire crawling operation. + +## What Problem Does `AsyncWebCrawler` Solve? + +Imagine you want to get information from a website. You need to: + +1. Decide *how* to fetch the page (like choosing the drone or truck from Chapter 1). +2. Actually *fetch* the page content. +3. Maybe *clean up* the messy HTML. +4. Perhaps *extract* specific pieces of information (like product prices or article titles). +5. Maybe *save* the results so you don't have to fetch them again immediately (caching). +6. Finally, give you the *final, processed result*. + +Doing all these steps manually for every URL would be tedious and complex. `AsyncWebCrawler` acts as the central coordinator, managing all these steps for you. You just tell it what URL to crawl and maybe some preferences, and it handles the rest. + +## What is `AsyncWebCrawler`? + +`AsyncWebCrawler` is the main class you'll interact with when using Crawl4AI. It's the primary entry point for starting any crawling task. 
+ +**Key Responsibilities:** + +* **Initialization:** Sets up the necessary components, like the browser (if needed). +* **Coordination:** Takes your request (a URL and configuration) and orchestrates the different parts: + * Delegates fetching to an [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md). + * Manages caching using [CacheContext / CacheMode](09_cachecontext___cachemode.md). + * Uses a [ContentScrapingStrategy](04_contentscrapingstrategy.md) to clean and parse HTML. + * Applies a [RelevantContentFilter](05_relevantcontentfilter.md) if configured. + * Uses an [ExtractionStrategy](06_extractionstrategy.md) to pull out specific data if needed. +* **Result Packaging:** Bundles everything up into a neat [CrawlResult](07_crawlresult.md) object. +* **Resource Management:** Handles starting and stopping resources (like browsers) cleanly. + +It's the "conductor" making sure all the different instruments play together harmoniously. + +## Your First Crawl: Using `arun` + +Let's see the `AsyncWebCrawler` in action. The most common way to use it is with an `async with` block, which automatically handles setup and cleanup. The main method to crawl a single URL is `arun`. + +```python +# chapter2_example_1.py +import asyncio +from crawl4ai import AsyncWebCrawler # Import the General Manager + +async def main(): + # Create the General Manager instance using 'async with' + # This handles setup (like starting a browser if needed) + # and cleanup (closing the browser). + async with AsyncWebCrawler() as crawler: + print("Crawler is ready!") + + # Tell the manager to crawl a specific URL + url_to_crawl = "https://httpbin.org/html" # A simple example page + print(f"Asking the crawler to fetch: {url_to_crawl}") + + result = await crawler.arun(url=url_to_crawl) + + # Check if the crawl was successful + if result.success: + print("\nSuccess! 
Crawler got the content.") + # The result object contains the processed data + # We'll learn more about CrawlResult in Chapter 7 + print(f"Page Title: {result.metadata.get('title', 'N/A')}") + print(f"First 100 chars of Markdown: {result.markdown.raw_markdown[:100]}...") + else: + print(f"\nFailed to crawl: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**Explanation:** + +1. **`import AsyncWebCrawler`**: We import the main class. +2. **`async def main():`**: Crawl4AI uses Python's `asyncio` for efficiency, so our code needs to be in an `async` function. +3. **`async with AsyncWebCrawler() as crawler:`**: This is the standard way to create and manage the crawler. The `async with` statement ensures that resources (like the underlying browser used by the default `AsyncPlaywrightCrawlerStrategy`) are properly started and stopped, even if errors occur. +4. **`crawler.arun(url=url_to_crawl)`**: This is the core command. We tell our `crawler` instance (the General Manager) to run (`arun`) the crawling process for the specified `url`. `await` is used because fetching webpages takes time, and `asyncio` allows other tasks to run while waiting. +5. **`result`**: The `arun` method returns a `CrawlResult` object. This object contains all the information gathered during the crawl (HTML, cleaned text, metadata, etc.). We'll explore this object in detail in [Chapter 7: Understanding the Results - CrawlResult](07_crawlresult.md). +6. **`result.success`**: We check this boolean flag to see if the crawl completed without critical errors. +7. **Accessing Data:** If successful, we can access processed information like the page title (`result.metadata['title']`) or the content formatted as Markdown (`result.markdown.raw_markdown`). + +## Configuring the Crawl + +Sometimes, the default behavior isn't quite what you need. 
Maybe you want to use the faster "drone" strategy from Chapter 1, or perhaps you want to ensure you *always* fetch a fresh copy of the page, ignoring any saved cache. + +You can customize the behavior of a specific `arun` call by passing a `CrawlerRunConfig` object. Think of this as giving specific instructions to the General Manager for *this particular job*. + +```python +# chapter2_example_2.py +import asyncio +from crawl4ai import AsyncWebCrawler +from crawl4ai import CrawlerRunConfig # Import configuration class +from crawl4ai import CacheMode # Import cache options + +async def main(): + async with AsyncWebCrawler() as crawler: + print("Crawler is ready!") + url_to_crawl = "https://httpbin.org/html" + + # Create a specific configuration for this run + # Tell the crawler to BYPASS the cache (fetch fresh) + run_config = CrawlerRunConfig( + cache_mode=CacheMode.BYPASS + ) + print("Configuration: Bypass cache for this run.") + + # Pass the config object to the arun method + result = await crawler.arun( + url=url_to_crawl, + config=run_config # Pass the specific instructions + ) + + if result.success: + print("\nSuccess! Crawler got fresh content (cache bypassed).") + print(f"Page Title: {result.metadata.get('title', 'N/A')}") + else: + print(f"\nFailed to crawl: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**Explanation:** + +1. **`from crawl4ai import CrawlerRunConfig, CacheMode`**: We import the necessary classes for configuration. +2. **`run_config = CrawlerRunConfig(...)`**: We create an instance of `CrawlerRunConfig`. This object holds various settings for a specific crawl job. +3. **`cache_mode=CacheMode.BYPASS`**: We set the `cache_mode`. `CacheMode.BYPASS` tells the crawler to ignore any previously saved results for this URL and fetch it directly from the web server. We'll learn all about caching options in [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md). +4. 
**`crawler.arun(..., config=run_config)`**: We pass our custom `run_config` object to the `arun` method using the `config` parameter. + +The `CrawlerRunConfig` is very powerful and lets you control many aspects of the crawl, including which scraping or extraction methods to use. We'll dive deep into it in the next chapter: [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md). + +## What Happens When You Call `arun`? (The Flow) + +When you call `crawler.arun(url="...")`, the `AsyncWebCrawler` (our General Manager) springs into action and coordinates several steps behind the scenes: + +```mermaid +sequenceDiagram + participant U as User + participant AWC as AsyncWebCrawler (Manager) + participant CC as Cache Check + participant CS as AsyncCrawlerStrategy (Fetcher) + participant SP as Scraping/Processing + participant CR as CrawlResult (Final Report) + + U->>AWC: arun("https://example.com", config) + AWC->>CC: Need content for "https://example.com"? (Respect CacheMode in config) + alt Cache Hit & Cache Mode allows reading + CC-->>AWC: Yes, here's the cached result. + AWC-->>CR: Package cached result. + AWC-->>U: Here is the CrawlResult + else Cache Miss or Cache Mode prevents reading + CC-->>AWC: No cached result / Cannot read cache. + AWC->>CS: Please fetch "https://example.com" (using configured strategy) + CS-->>AWC: Here's the raw response (HTML, etc.) + AWC->>SP: Process this raw content (Scrape, Filter, Extract based on config) + SP-->>AWC: Here's the processed data (Markdown, Metadata, etc.) + AWC->>CC: Cache this result? (Respect CacheMode in config) + CC-->>AWC: OK, cached. + AWC-->>CR: Package new result. + AWC-->>U: Here is the CrawlResult + end + +``` + +**Simplified Steps:** + +1. **Receive Request:** The `AsyncWebCrawler` gets the URL and configuration from your `arun` call. +2. **Check Cache:** It checks if a valid result for this URL is already saved (cached) and if the `CacheMode` allows using it. 
(See [Chapter 9](09_cachecontext___cachemode.md)). +3. **Fetch (if needed):** If no valid cached result exists or caching is bypassed, it asks the configured [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) (e.g., Playwright or HTTP) to fetch the raw page content. +4. **Process Content:** It takes the raw HTML and passes it through various processing steps based on the configuration: + * **Scraping:** Cleaning up HTML, extracting basic structure using a [ContentScrapingStrategy](04_contentscrapingstrategy.md). + * **Filtering:** Optionally filtering content for relevance using a [RelevantContentFilter](05_relevantcontentfilter.md). + * **Extraction:** Optionally extracting specific structured data using an [ExtractionStrategy](06_extractionstrategy.md). +5. **Cache Result (if needed):** If caching is enabled for writing, it saves the final processed result. +6. **Return Result:** It bundles everything into a [CrawlResult](07_crawlresult.md) object and returns it to you. + +## Crawling Many Pages: `arun_many` + +What if you have a whole list of URLs to crawl? Calling `arun` in a loop works, but it might not be the most efficient way. `AsyncWebCrawler` provides the `arun_many` method designed for this. + +```python +# chapter2_example_3.py +import asyncio +from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode + +async def main(): + async with AsyncWebCrawler() as crawler: + urls_to_crawl = [ + "https://httpbin.org/html", + "https://httpbin.org/links/10/0", + "https://httpbin.org/robots.txt" + ] + print(f"Asking crawler to fetch {len(urls_to_crawl)} URLs.") + + # Use arun_many for multiple URLs + # We can still pass a config that applies to all URLs in the batch + config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS) + results = await crawler.arun_many(urls=urls_to_crawl, config=config) + + print(f"\nFinished crawling! 
Got {len(results)} results.") + for result in results: + status = "Success" if result.success else "Failed" + url_short = result.url.split('/')[-1] # Get last part of URL + print(f"- URL: {url_short:<10} | Status: {status:<7} | Title: {result.metadata.get('title', 'N/A')}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**Explanation:** + +1. **`urls_to_crawl = [...]`**: We define a list of URLs. +2. **`await crawler.arun_many(urls=urls_to_crawl, config=config)`**: We call `arun_many`, passing the list of URLs. It handles crawling them concurrently (like dispatching multiple delivery trucks or drones efficiently). +3. **`results`**: `arun_many` returns a list where each item is a `CrawlResult` object corresponding to one of the input URLs. + +`arun_many` is much more efficient for batch processing as it leverages `asyncio` to handle multiple fetches and processing tasks concurrently. It uses a [BaseDispatcher](10_basedispatcher.md) internally to manage this concurrency. + +## Under the Hood (A Peek at the Code) + +You don't need to know the internal details to use `AsyncWebCrawler`, but seeing the structure can help. Inside the `crawl4ai` library, the file `async_webcrawler.py` defines this class. + +```python +# Simplified from async_webcrawler.py + +# ... imports ... +from .async_crawler_strategy import AsyncCrawlerStrategy, AsyncPlaywrightCrawlerStrategy +from .async_configs import BrowserConfig, CrawlerRunConfig +from .models import CrawlResult +from .cache_context import CacheContext, CacheMode +# ... other strategy imports ... + +class AsyncWebCrawler: + def __init__( + self, + crawler_strategy: AsyncCrawlerStrategy = None, # You can provide a strategy... + config: BrowserConfig = None, # Configuration for the browser + # ... other parameters like logger, base_directory ... + ): + # If no strategy is given, it defaults to Playwright (the 'truck') + self.crawler_strategy = crawler_strategy or AsyncPlaywrightCrawlerStrategy(...) 
+ self.browser_config = config or BrowserConfig() + # ... setup logger, directories, etc. ... + self.ready = False # Flag to track if setup is complete + + async def __aenter__(self): + # This is called when you use 'async with'. It starts the strategy. + await self.crawler_strategy.__aenter__() + await self.awarmup() # Perform internal setup + self.ready = True + return self + + async def __aexit__(self, exc_type, exc_val, exc_tb): + # This is called when exiting 'async with'. It cleans up. + await self.crawler_strategy.__aexit__(exc_type, exc_val, exc_tb) + self.ready = False + + async def arun(self, url: str, config: CrawlerRunConfig = None) -> CrawlResult: + # 1. Ensure config exists, set defaults (like CacheMode.ENABLED) + crawler_config = config or CrawlerRunConfig() + if crawler_config.cache_mode is None: + crawler_config.cache_mode = CacheMode.ENABLED + + # 2. Create CacheContext to manage caching logic + cache_context = CacheContext(url, crawler_config.cache_mode) + + # 3. Try reading from cache if allowed + cached_result = None + if cache_context.should_read(): + cached_result = await async_db_manager.aget_cached_url(url) + + # 4. If cache hit and valid, return cached result + if cached_result and self._is_cache_valid(cached_result, crawler_config): + # ... log cache hit ... + return cached_result + + # 5. If no cache hit or cache invalid/bypassed: Fetch fresh content + # Delegate to the configured AsyncCrawlerStrategy + async_response = await self.crawler_strategy.crawl(url, config=crawler_config) + + # 6. Process the HTML (scrape, filter, extract) + # This involves calling other strategies based on config + crawl_result = await self.aprocess_html( + url=url, + html=async_response.html, + config=crawler_config, + # ... other details from async_response ... + ) + + # 7. Write to cache if allowed + if cache_context.should_write(): + await async_db_manager.acache_url(crawl_result) + + # 8. 
Return the final CrawlResult + return crawl_result + + async def aprocess_html(self, url: str, html: str, config: CrawlerRunConfig, ...) -> CrawlResult: + # This internal method handles: + # - Getting the configured ContentScrapingStrategy + # - Calling its 'scrap' method + # - Getting the configured MarkdownGenerationStrategy + # - Calling its 'generate_markdown' method + # - Getting the configured ExtractionStrategy (if any) + # - Calling its 'run' method + # - Packaging everything into a CrawlResult + # ... implementation details ... + pass # Simplified + + async def arun_many(self, urls: List[str], config: Optional[CrawlerRunConfig] = None, ...) -> List[CrawlResult]: + # Uses a Dispatcher (like MemoryAdaptiveDispatcher) + # to run self.arun for each URL concurrently. + # ... implementation details using a dispatcher ... + pass # Simplified + + # ... other methods like awarmup, close, caching helpers ... +``` + +The key takeaway is that `AsyncWebCrawler` doesn't do the fetching or detailed processing *itself*. It acts as the central hub, coordinating calls to the various specialized `Strategy` classes based on the provided configuration. + +## Conclusion + +You've met the General Manager: `AsyncWebCrawler`! + +* It's the **main entry point** for using Crawl4AI. +* It **coordinates** all the steps: fetching, caching, scraping, extracting. +* You primarily interact with it using `async with` and the `arun()` (single URL) or `arun_many()` (multiple URLs) methods. +* It takes a URL and an optional `CrawlerRunConfig` object to customize the crawl. +* It returns a comprehensive `CrawlResult` object. + +Now that you understand the central role of `AsyncWebCrawler`, let's explore how to give it detailed instructions for each crawling job. + +**Next:** Let's dive into the specifics of configuration with [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md). 
+ +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Crawl4AI/03_crawlerrunconfig.md b/output/Crawl4AI/03_crawlerrunconfig.md new file mode 100644 index 0000000..73bcd0c --- /dev/null +++ b/output/Crawl4AI/03_crawlerrunconfig.md @@ -0,0 +1,277 @@ +# Chapter 3: Giving Instructions - CrawlerRunConfig + +In [Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md), we met the `AsyncWebCrawler`, the central coordinator for our web crawling tasks. We saw how to tell it *what* URL to crawl using the `arun` method. + +But what if we want to tell the crawler *how* to crawl that URL? Maybe we want it to take a picture (screenshot) of the page? Or perhaps we only care about a specific section of the page? Or maybe we want to ignore the cache and get the very latest version? + +Passing all these different instructions individually every time we call `arun` could get complicated and messy. + +```python +# Imagine doing this every time - it gets long! +# result = await crawler.arun( +# url="https://example.com", +# take_screenshot=True, +# ignore_cache=True, +# only_look_at_this_part="#main-content", +# wait_for_this_element="#data-table", +# # ... maybe many more settings ... +# ) +``` + +That's where `CrawlerRunConfig` comes in! + +## What Problem Does `CrawlerRunConfig` Solve? + +Think of `CrawlerRunConfig` as the **Instruction Manual** for a *specific* crawl job. Instead of giving the `AsyncWebCrawler` manager lots of separate instructions each time, you bundle them all neatly into a single `CrawlerRunConfig` object. + +This object tells the `AsyncWebCrawler` exactly *how* to handle a particular URL or set of URLs for that specific run. It makes your code cleaner and easier to manage. + +## What is `CrawlerRunConfig`? 
+ +`CrawlerRunConfig` is a configuration class that holds all the settings for a single crawl operation initiated by `AsyncWebCrawler.arun()` or `arun_many()`. + +It allows you to customize various aspects of the crawl, such as: + +* **Taking Screenshots:** Should the crawler capture an image of the page? (`screenshot`) +* **Waiting:** How long should the crawler wait for the page or specific elements to load? (`page_timeout`, `wait_for`) +* **Focusing Content:** Should the crawler only process a specific part of the page? (`css_selector`) +* **Extracting Data:** Should the crawler use a specific method to pull out structured data? ([ExtractionStrategy](06_extractionstrategy.md)) +* **Caching:** How should the crawler interact with previously saved results? ([CacheMode](09_cachecontext___cachemode.md)) +* **And much more!** (like handling JavaScript, filtering links, etc.) + +## Using `CrawlerRunConfig` + +Let's see how to use it. Remember our basic crawl from Chapter 2? + +```python +# chapter3_example_1.py +import asyncio +from crawl4ai import AsyncWebCrawler + +async def main(): + async with AsyncWebCrawler() as crawler: + url_to_crawl = "https://httpbin.org/html" + print(f"Crawling {url_to_crawl} with default settings...") + + # This uses the default behavior (no specific config) + result = await crawler.arun(url=url_to_crawl) + + if result.success: + print("Success! Got the content.") + print(f"Screenshot taken? {'Yes' if result.screenshot else 'No'}") # Likely No + # We'll learn about CacheMode later, but it defaults to using the cache + else: + print(f"Failed: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +Now, let's say for this *specific* crawl, we want to bypass the cache (fetch fresh) and also take a screenshot. + +We create a `CrawlerRunConfig` instance and pass it to `arun`: + +```python +# chapter3_example_2.py +import asyncio +from crawl4ai import AsyncWebCrawler +from crawl4ai import CrawlerRunConfig # 1. 
Import the config class +from crawl4ai import CacheMode # Import cache options + +async def main(): + async with AsyncWebCrawler() as crawler: + url_to_crawl = "https://httpbin.org/html" + print(f"Crawling {url_to_crawl} with custom settings...") + + # 2. Create an instance of CrawlerRunConfig with our desired settings + my_instructions = CrawlerRunConfig( + cache_mode=CacheMode.BYPASS, # Don't use the cache, fetch fresh + screenshot=True # Take a screenshot + ) + print("Instructions: Bypass cache, take screenshot.") + + # 3. Pass the config object to arun() + result = await crawler.arun( + url=url_to_crawl, + config=my_instructions # Pass our instruction manual + ) + + if result.success: + print("\nSuccess! Got the content with custom config.") + print(f"Screenshot taken? {'Yes' if result.screenshot else 'No'}") # Should be Yes + # result.screenshot holds the image data itself (a base64-encoded string in current Crawl4AI releases), not a file path + if result.screenshot: + print(f"Screenshot captured: {len(result.screenshot)} base64 characters") + else: + print(f"\nFailed: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**Explanation:** + +1. **Import:** We import `CrawlerRunConfig` and `CacheMode`. +2. **Create Config:** We create an instance: `my_instructions = CrawlerRunConfig(...)`. We set `cache_mode` to `CacheMode.BYPASS` and `screenshot` to `True`. All other settings remain at their defaults. +3. **Pass Config:** We pass this `my_instructions` object to `crawler.arun` using the `config=` parameter. + +Now, when `AsyncWebCrawler` runs this job, it will look inside `my_instructions` and follow those specific settings for *this run only*. + +## Some Common `CrawlerRunConfig` Parameters + +`CrawlerRunConfig` has many options, but here are a few common ones you might use: + +* **`cache_mode`**: Controls caching behavior. + * `CacheMode.ENABLED` (Default): Use the cache if available, otherwise fetch and save. 
+ * `CacheMode.BYPASS`: Always fetch fresh, ignoring any cached version (but still save the new result). + * `CacheMode.DISABLED`: Never read from or write to the cache. + * *(More details in [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md))* +* **`screenshot` (bool)**: If `True`, takes a screenshot of the fully rendered page. The captured image is returned in `CrawlResult.screenshot` (as base64-encoded data in current Crawl4AI releases, not as a file path). Default: `False`. +* **`pdf` (bool)**: If `True`, generates a PDF of the page. The generated PDF data (not a file path) is returned in `CrawlResult.pdf`. Default: `False`. +* **`css_selector` (str)**: If provided (e.g., `"#main-content"` or `".article-body"`), the crawler will try to extract *only* the HTML content within the element(s) matching this CSS selector. This is great for focusing on the important part of a page. Default: `None` (process the whole page). +* **`wait_for` (str)**: A CSS selector (e.g., `"#data-loaded-indicator"`). The crawler will wait until an element matching this selector appears on the page before proceeding. Useful for pages that load content dynamically with JavaScript. Default: `None`. +* **`page_timeout` (int)**: Maximum time in milliseconds to wait for page navigation or certain operations. Default: `60000` (60 seconds). +* **`extraction_strategy`**: An object that defines how to extract specific, structured data (like product names and prices) from the page. Default: `None`. *(See [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md))* +* **`scraping_strategy`**: An object defining how the raw HTML is cleaned and basic content (like text and links) is extracted. Default: `WebScrapingStrategy()`. *(See [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md))* + +Let's try combining a few: focus on a specific part of the page and limit how long the crawler waits for it to load. 
+ +```python +# chapter3_example_3.py +import asyncio +from crawl4ai import AsyncWebCrawler, CrawlerRunConfig + +async def main(): + # This example site has a heading 'H1' inside a 'body' tag. + url_to_crawl = "https://httpbin.org/html" + async with AsyncWebCrawler() as crawler: + print(f"Crawling {url_to_crawl}, focusing on the H1 tag...") + + # Instructions: Only get the H1 tag, wait max 10s for it + specific_config = CrawlerRunConfig( + css_selector="h1", # Only grab content inside <h1>
 tags + page_timeout=10000 # Set page timeout to 10 seconds + # We could also add wait_for="h1" if needed for dynamic loading + ) + + result = await crawler.arun(url=url_to_crawl, config=specific_config) + + if result.success: + print("\nSuccess! Focused crawl completed.") + # The markdown should now ONLY contain the H1 content + print(f"Markdown content:\n---\n{result.markdown.raw_markdown.strip()}\n---") + else: + print(f"\nFailed: {result.error_message}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +This time, the `result.markdown` should only contain the text from the `<h1>
` tag on that page, because we used `css_selector="h1"` in our `CrawlerRunConfig`. + +## How `AsyncWebCrawler` Uses the Config (Under the Hood) + +You don't need to know the exact internal code, but it helps to understand the flow. When you call `crawler.arun(url, config=my_config)`, the `AsyncWebCrawler` essentially does this: + +1. Receives the `url` and the `my_config` object. +2. Before fetching, it checks `my_config.cache_mode` to see if it should look in the cache first. +3. If fetching is needed, it passes `my_config` to the underlying [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md). +4. The strategy uses settings from `my_config` like `page_timeout`, `wait_for`, and whether to take a `screenshot`. +5. After getting the raw HTML, `AsyncWebCrawler` uses the `my_config.scraping_strategy` and `my_config.css_selector` to process the content. +6. If `my_config.extraction_strategy` is set, it uses that to extract structured data. +7. Finally, it bundles everything into a `CrawlResult` and returns it. + +Here's a simplified view: + +```mermaid +sequenceDiagram + participant User + participant AWC as AsyncWebCrawler + participant Config as CrawlerRunConfig + participant Fetcher as AsyncCrawlerStrategy + participant Processor as Scraping/Extraction + + User->>AWC: arun(url, config=my_config) + AWC->>Config: Check my_config.cache_mode + alt Need to Fetch + AWC->>Fetcher: crawl(url, config=my_config) + Note over Fetcher: Uses my_config settings (timeout, wait_for, screenshot...) + Fetcher-->>AWC: Raw Response (HTML, screenshot?) + AWC->>Processor: Process HTML (using my_config.css_selector, my_config.extraction_strategy...) + Processor-->>AWC: Processed Data + else Use Cache + AWC->>AWC: Retrieve from Cache + end + AWC-->>User: Return CrawlResult +``` + +The `CrawlerRunConfig` acts as a messenger carrying your specific instructions throughout the crawling process. 
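
The "messenger" idea can be sketched with a toy model of the flow above: one config object is read at the cache check, handed to the fetch step, and read again during processing. Everything here is an assumption made for illustration: `ToyCacheMode`, `ToyRunConfig`, and `ToyCrawler` are hypothetical names, the fetch is faked, and the cache rules are deliberately simplified (only reads are controlled by the mode), so it should not be read as Crawl4AI's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ToyCacheMode(Enum):
    """Hypothetical two-value cache mode, simplified for this sketch."""
    ENABLED = "enabled"  # reuse a cached page when one exists
    BYPASS = "bypass"    # skip the cache read and fetch fresh


@dataclass
class ToyRunConfig:
    """Hypothetical bundle of per-run instructions."""
    cache_mode: ToyCacheMode = ToyCacheMode.ENABLED
    css_selector: Optional[str] = None


class ToyCrawler:
    """Hypothetical coordinator that threads one config through every step."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}  # url -> raw HTML
        self.fetch_log: List[str] = []   # records every simulated fetch

    def _fetch(self, url: str, config: ToyRunConfig) -> str:
        # A real strategy would read timeouts, wait conditions, etc. from config.
        self.fetch_log.append(url)
        return f"<h1>Title of {url}</h1><p>Body</p>"

    def _process(self, html: str, config: ToyRunConfig) -> str:
        # Only the "h1" selector is understood by this toy processor.
        if config.css_selector == "h1":
            start = html.index("<h1>") + len("<h1>")
            return html[start:html.index("</h1>")]
        return html

    def run(self, url: str, config: Optional[ToyRunConfig] = None) -> str:
        config = config or ToyRunConfig()
        # Step 1: the cache check consults config.cache_mode.
        if config.cache_mode is ToyCacheMode.ENABLED and url in self.cache:
            html = self.cache[url]
        else:
            # Step 2: fetching receives the same config object.
            html = self._fetch(url, config)
            self.cache[url] = html
        # Step 3: processing reads config.css_selector.
        return self._process(html, config)


if __name__ == "__main__":
    crawler = ToyCrawler()
    url = "https://example.com"
    crawler.run(url)  # first call fetches
    crawler.run(url)  # second call is served from the cache
    focused = crawler.run(
        url, ToyRunConfig(cache_mode=ToyCacheMode.BYPASS, css_selector="h1")
    )
    print(focused)
    print(len(crawler.fetch_log), "fetches")
```

Because the settings travel together in one object, adding another option to the toy means adding one dataclass field rather than changing the signature of `run`, `_fetch`, and `_process`.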
+ +Inside the `crawl4ai` library, in the file `async_configs.py`, you'll find the definition of the `CrawlerRunConfig` class. It looks something like this (simplified): + +```python +# Simplified from crawl4ai/async_configs.py + +from .cache_context import CacheMode +from .extraction_strategy import ExtractionStrategy +from .content_scraping_strategy import ContentScrapingStrategy, WebScrapingStrategy +# ... other imports ... + +class CrawlerRunConfig(): + """ + Configuration class for controlling how the crawler runs each crawl operation. + """ + def __init__( + self, + # Caching + cache_mode: CacheMode = CacheMode.BYPASS, # Default behavior if not specified + + # Content Selection / Waiting + css_selector: str = None, + wait_for: str = None, + page_timeout: int = 60000, # 60 seconds + + # Media + screenshot: bool = False, + pdf: bool = False, + + # Processing Strategies + scraping_strategy: ContentScrapingStrategy = None, # Defaults internally if None + extraction_strategy: ExtractionStrategy = None, + + # ... many other parameters omitted for clarity ... + **kwargs # Allows for flexibility + ): + self.cache_mode = cache_mode + self.css_selector = css_selector + self.wait_for = wait_for + self.page_timeout = page_timeout + self.screenshot = screenshot + self.pdf = pdf + # Assign scraping strategy, ensuring a default if None is provided + self.scraping_strategy = scraping_strategy or WebScrapingStrategy() + self.extraction_strategy = extraction_strategy + # ... initialize other attributes ... + + # Helper methods like 'clone', 'to_dict', 'from_kwargs' might exist too + # ... +``` + +The key idea is that it's a class designed to hold various settings together. When you create an instance `CrawlerRunConfig(...)`, you're essentially creating an object that stores your choices for these parameters. + +## Conclusion + +You've learned about `CrawlerRunConfig`, the "Instruction Manual" for individual crawl jobs in Crawl4AI! 
+ +* It solves the problem of passing many settings individually to `AsyncWebCrawler`. +* You create an instance of `CrawlerRunConfig` and set the parameters you want to customize (like `cache_mode`, `screenshot`, `css_selector`, `wait_for`). +* You pass this config object to `crawler.arun(url, config=your_config)`. +* This makes your code cleaner and gives you fine-grained control over *how* each crawl is performed. + +Now that we know how to fetch content ([AsyncCrawlerStrategy](01_asynccrawlerstrategy.md)), manage the overall process ([AsyncWebCrawler](02_asyncwebcrawler.md)), and give specific instructions ([CrawlerRunConfig](03_crawlerrunconfig.md)), let's look at how the raw, messy HTML fetched from the web is initially cleaned up and processed. + +**Next:** Let's explore [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md). + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/output/Crawl4AI/04_contentscrapingstrategy.md b/output/Crawl4AI/04_contentscrapingstrategy.md new file mode 100644 index 0000000..bcf6a20 --- /dev/null +++ b/output/Crawl4AI/04_contentscrapingstrategy.md @@ -0,0 +1,321 @@ +# Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy + +In [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md), we learned how to give specific instructions to our `AsyncWebCrawler` using `CrawlerRunConfig`. This included telling it *how* to fetch the page and potentially take screenshots or PDFs. + +Now, imagine the crawler has successfully fetched the raw HTML content of a webpage. What's next? Raw HTML is often messy! It contains not just the main article or product description you might care about, but also: + +* Navigation menus +* Advertisements +* Headers and footers +* Hidden code like JavaScript (`