init push

This commit is contained in:
zachary62
2025-04-04 13:03:54 -04:00
parent e62ee2cb13
commit 2ebad5e5f2
160 changed files with 2 additions and 0 deletions

# Chapter 1: The Celery App - Your Task Headquarters
Welcome to the world of Celery! If you've ever thought, "I wish this slow part of my web request could run somewhere else later," or "How can I process this huge amount of data without freezing my main application?", then Celery is here to help.
Celery allows you to run code (we call these "tasks") separately from your main application, either in the background on the same machine or distributed across many different machines.
But how do you tell Celery *what* tasks to run and *how* to run them? That's where the **Celery App** comes in.
## What Problem Does the Celery App Solve?
Imagine you're building a website. When a user uploads a profile picture, you need to resize it into different formats (thumbnail, medium, large). Doing this immediately when the user clicks "upload" can make the request slow and keep the user waiting.
Ideally, you want to:
1. Quickly save the original image.
2. Tell the user "Okay, got it!"
3. *Later*, in the background, resize the image.
Celery helps with step 3. But you need a central place to define the "resize image" task and configure *how* it should be run (e.g., where to send the request to resize, where to store the result). The **Celery App** is that central place.
Think of it like the main application object in web frameworks like Flask or Django. It's the starting point, the brain, the headquarters for everything Celery-related in your project.
## Creating Your First Celery App
Getting started is simple. You just need to create an instance of the `Celery` class.
Let's create a file named `celery_app.py`:
```python
# celery_app.py
from celery import Celery
# Create a Celery app instance
# 'tasks' is just a name for this app instance, often the module name.
# 'broker' tells Celery where to send task messages.
# We'll use Redis here for simplicity (you need Redis running).
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')  # Added backend for results
print(f"Celery app created: {app}")
```
**Explanation:**
* `from celery import Celery`: We import the main `Celery` class.
* `app = Celery(...)`: We create an instance.
* `'tasks'`: This is the *name* of our Celery application. It's often good practice to use the name of the module where your app is defined. Celery uses this name to automatically name tasks if you don't provide one explicitly.
* `broker='redis://localhost:6379/0'`: This is crucial! It tells Celery where to send the task messages. A "broker" is like a post office for tasks. We're using Redis here, but Celery supports others like RabbitMQ. We'll learn more about the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) in Chapter 4. (Note: AMQP is the protocol often used with brokers like RabbitMQ, but the concept applies even when using Redis).
* `backend='redis://localhost:6379/0'`: This tells Celery where to store the results of your tasks. If your task returns a value (like `2+2` returns `4`), Celery can store this `4` in the backend. We'll cover the [Result Backend](06_result_backend.md) in Chapter 6.
That's it! You now have a `Celery` application instance named `app`. This `app` object is your main tool for working with Celery.
## Defining a Task with the App
Now that we have our `app`, how do we define a task? We use the `@app.task` decorator.
Let's modify `celery_app.py`:
```python
# celery_app.py
from celery import Celery
import time
# Create a Celery app instance
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

# Define a simple task using the app's decorator
@app.task
def add(x, y):
    print(f"Task 'add' started with args: ({x}, {y})")
    time.sleep(2)  # Simulate some work
    result = x + y
    print(f"Task 'add' finished with result: {result}")
    return result
print(f"Task 'add' is registered: {app.tasks.get('celery_app.add')}")
```
**Explanation:**
* `@app.task`: This is the magic decorator. It takes our regular Python function `add(x, y)` and registers it as a Celery task within our `app`.
* Now, `app` knows about a task called `celery_app.add` (Celery automatically generates the name based on the module `celery_app` and function `add`).
* We'll learn all about [Task](03_task.md)s in Chapter 3.
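The automatic naming rule is easy to picture in plain Python. This is only a conceptual sketch, not Celery's actual `gen_task_name` implementation:

```python
def gen_task_name(module_name, func_name):
    # Celery's default naming scheme: "<module>.<function>"
    return f"{module_name}.{func_name}"

print(gen_task_name("celery_app", "add"))  # celery_app.add
```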
## Sending a Task (Conceptual)
How do we actually *run* this `add` task in the background? We use methods like `.delay()` or `.apply_async()` on the task object itself.
```python
# In a separate Python script or interpreter, after importing 'add' from celery_app.py
from celery_app import add
# Send the task to the broker configured in our 'app'
result_promise = add.delay(4, 5)
print(f"Task sent! It will run in the background.")
print(f"We got back a promise object: {result_promise}")
# We can later check the result using result_promise.get()
# (Requires a result backend and a worker running the task)
```
**Explanation:**
* `add.delay(4, 5)`: This doesn't run the `add` function *right now*. Instead, it:
1. Packages the task name (`celery_app.add`) and its arguments (`4`, `5`) into a message.
2. Sends this message to the **broker** (Redis, in our case) that was configured in our `Celery` app instance (`app`).
* It returns an `AsyncResult` object (our `result_promise`), which is like an IOU or a placeholder for the actual result. We can use this later to check if the task finished and what its result was (if we configured a [Result Backend](06_result_backend.md)).
A separate program, called a Celery [Worker](05_worker.md), needs to be running. This worker watches the broker for new task messages, executes the corresponding task function, and (optionally) stores the result in the backend. We'll learn how to run a worker in Chapter 5.
The key takeaway here is that the **Celery App** holds the configuration needed (`broker` and `backend` URLs) for `add.delay()` to know *where* to send the task message and potentially where the result will be stored.
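To build intuition for the "IOU" idea, here is a tiny stand-in. The class and attribute names below are invented for illustration; the real `AsyncResult` lives in `celery.result` and talks to the configured result backend:

```python
class AsyncResultStub:
    """Toy placeholder mimicking the idea of an AsyncResult 'IOU'."""

    def __init__(self, task_id, backend_store):
        self.id = task_id
        self._backend = backend_store  # a dict standing in for the result backend

    def ready(self):
        # Has a worker stored a result for this task id yet?
        return self.id in self._backend

    def get(self):
        if not self.ready():
            raise RuntimeError("result not ready yet")
        return self._backend[self.id]


backend = {}                           # pretend result backend
promise = AsyncResultStub("abc-123", backend)
print(promise.ready())                 # False: no worker has run the task yet
backend["abc-123"] = 9                 # pretend a worker stored add(4, 5)
print(promise.get())                   # 9
```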
## How It Works Internally (High-Level)
Let's visualize the process of creating the app and sending a task:
1. **Initialization (`Celery(...)`)**: When you create `app = Celery(...)`, the app instance stores the `broker` and `backend` URLs and sets up internal components like the task registry.
2. **Task Definition (`@app.task`)**: The decorator tells the `app` instance: "Hey, remember this function `add`? It's a task." The app stores this information in its internal task registry (`app.tasks`).
3. **Sending a Task (`add.delay(4, 5)`)**:
* `add.delay()` looks up the `app` it belongs to.
* It asks the `app` for the `broker` URL.
* It creates a message containing the task name (`celery_app.add`), arguments (`4, 5`), and other details.
* It uses the `broker` URL to connect to the broker (Redis) and sends the message.
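The message-building step above can be sketched as a plain dictionary. The real message format, defined by Celery's task protocol, carries more fields (eta, expires, retries, and so on); this toy version only shows the essentials:

```python
import uuid

def make_task_message(task_name, args, kwargs=None):
    """Build a toy task message like the one sent to the broker."""
    return {
        "id": str(uuid.uuid4()),   # unique task id
        "task": task_name,         # e.g. 'celery_app.add'
        "args": list(args),
        "kwargs": kwargs or {},
    }

msg = make_task_message("celery_app.add", (4, 5))
print(msg["task"], msg["args"])   # celery_app.add [4, 5]
```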
```mermaid
sequenceDiagram
participant Client as Your Python Code
participant CeleryApp as app = Celery(...)
participant AddTask as @app.task add()
participant Broker as Redis/RabbitMQ
Client->>CeleryApp: Create instance (broker='redis://...')
Client->>AddTask: Define add() function with @app.task
Note over AddTask,CeleryApp: Decorator registers 'add' with 'app'
Client->>AddTask: Call add.delay(4, 5)
AddTask->>CeleryApp: Get broker configuration
CeleryApp-->>AddTask: 'redis://...'
AddTask->>Broker: Send task message ('add', 4, 5)
Broker-->>AddTask: Acknowledgment (message sent)
AddTask-->>Client: Return AsyncResult (promise)
```
This diagram shows how the `Celery App` acts as the central coordinator, holding configuration and enabling the task (`add`) to send its execution request to the Broker.
## Code Dive: Inside the `Celery` Class
Let's peek at some relevant code snippets (simplified for clarity).
**Initialization (`app/base.py`)**
When you call `Celery(...)`, the `__init__` method runs:
```python
# Simplified from celery/app/base.py
from .registry import TaskRegistry
from .utils import Settings

class Celery:
    def __init__(self, main=None, broker=None, backend=None,
                 include=None, config_source=None, task_cls=None,
                 autofinalize=True, **kwargs):
        self.main = main  # Store the app name ('tasks' in our example)
        self._tasks = TaskRegistry({})  # Create an empty registry for tasks

        # Store broker/backend/include settings temporarily
        self._preconf = {}
        self.__autoset('broker_url', broker)
        self.__autoset('result_backend', backend)
        self.__autoset('include', include)
        # ... other kwargs ...

        # Configuration object - initially pending, loaded later
        self._conf = Settings(...)
        # ... other setup ...
        _register_app(self)  # Register this app instance globally (sometimes useful)

    # Helper to store initial settings before full configuration load
    def __autoset(self, key, value):
        if value is not None:
            self._preconf[key] = value
```
This shows how the `Celery` object is initialized, storing the name, setting up a task registry, and holding onto initial configuration like the `broker` URL. The full configuration is often loaded later (see [Configuration](02_configuration.md)).
**Task Decorator (`app/base.py`)**
The `@app.task` decorator ultimately calls `_task_from_fun`:
```python
# Simplified from celery/app/base.py
def task(self, *args, **opts):
    # ... logic to handle decorator arguments ...
    def _create_task_cls(fun):
        # If the app isn't finalized, might return a proxy object first.
        # Eventually calls _task_from_fun to create/register the task.
        ret = self._task_from_fun(fun, **opts)
        return ret
    return _create_task_cls

def _task_from_fun(self, fun, name=None, base=None, bind=False, **options):
    # Generate the task name if not provided (e.g., 'celery_app.add')
    name = name or self.gen_task_name(fun.__name__, fun.__module__)
    base = base or self.Task  # Default base Task class

    # Check if the task is already registered
    if name not in self._tasks:
        # Create a Task class dynamically based on the function
        task = type(fun.__name__, (base,), {
            'app': self,  # Link the task back to this app instance!
            'name': name,
            'run': staticmethod(fun),  # The actual function to run
            # ... other attributes and options ...
        })()  # Instantiate the new task class
        self._tasks[task.name] = task  # Add to the app's task registry
        task.bind(self)  # Perform any binding steps
    else:
        task = self._tasks[name]  # Task already exists
    return task
```
This shows how the decorator uses the `app` instance (`self`) to generate a name, create a `Task` object wrapping your function, associate the task with the app (`'app': self`), and store it in the `app._tasks` registry.
**Sending Tasks (`app/base.py`)**
Calling `.delay()` or `.apply_async()` eventually uses `app.send_task`:
```python
# Simplified from celery/app/base.py
def send_task(self, name, args=None, kwargs=None, task_id=None,
              producer=None, connection=None, router=None, **options):
    # ... lots of logic to prepare options, task_id, routing ...

    # Get the routing info (exchange, routing_key, queue).
    # Uses app.conf for defaults if not specified.
    options = self.amqp.router.route(options, name, args, kwargs)

    # Create the message body
    task_id = task_id or uuid()  # Generate a task ID if needed
    message = self.amqp.create_task_message(
        task_id,
        name, args, kwargs,  # Task details
        # ... other arguments like countdown, eta, expires ...
    )

    # Get a producer (handles the connection/channel to the broker).
    # Uses the app's producer pool (app.producer_pool).
    with self.producer_or_acquire(producer) as P:
        # Tell the backend we're about to send (if tracking results)
        if not options.get('ignore_result', False):
            self.backend.on_task_call(P, task_id)
        # Actually send the message via the producer
        self.amqp.send_task_message(P, name, message, **options)

    # Create the AsyncResult object to return to the caller
    result = self.AsyncResult(task_id)
    # ... set result properties ...
    return result
```
This highlights how `send_task` relies on the `app` (via `self`) to:
* Access configuration (`self.conf`).
* Use the AMQP utilities (`self.amqp`) for routing and message creation.
* Access the result backend (`self.backend`).
* Get a connection/producer from the pool (`self.producer_or_acquire`).
* Create the `AsyncResult` using the app's result class (`self.AsyncResult`).
## Conclusion
You've learned that the `Celery App` is the essential starting point for any Celery project.
* It acts as the central **headquarters** or **brain**.
* You create it using `app = Celery(...)`, providing at least a name and a `broker` URL.
* It holds **configuration** (like broker/backend URLs).
* It **registers tasks** defined using the `@app.task` decorator.
* It enables tasks to be **sent** to the broker using methods like `.delay()`.
The app ties everything together. But how do you manage all the different settings Celery offers, beyond just the `broker` and `backend`?
In the next chapter, we'll dive deeper into how to configure your Celery app effectively.
**Next:** [Chapter 2: Configuration](02_configuration.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 2: Configuration - Telling Celery How to Work
In [Chapter 1: The Celery App](01_celery_app.md), we created our first `Celery` app instance. We gave it a name and told it where our message broker and result backend were located using the `broker` and `backend` arguments:
```python
# From Chapter 1
from celery import Celery
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')
```
This worked, but what if we want to change settings later, or manage many different settings? Passing everything directly when creating the `app` can become messy.
## What Problem Does Configuration Solve?
Think of Celery as a busy workshop with different stations (workers, schedulers) and tools (message brokers, result storage). **Configuration** is the central instruction manual or settings panel for this entire workshop.
It tells Celery things like:
* **Where is the message broker?** (The post office for tasks)
* **Where should results be stored?** (The filing cabinet for completed work)
* **How should tasks be handled?** (e.g., What format should the messages use? Are there any speed limits for certain tasks?)
* **How should the workers behave?** (e.g., How many tasks can they work on at once?)
* **How should scheduled tasks run?** (e.g., What timezone should be used?)
Without configuration, Celery wouldn't know how to connect to your broker, where to put results, or how to manage the workflow. Configuration allows you to customize Celery to fit your specific needs.
## Key Configuration Concepts
While Celery has many settings, here are some fundamental ones you'll encounter often:
1. **`broker_url`**: The address of your message broker (like Redis or RabbitMQ). This is essential for sending and receiving task messages. We'll learn more about brokers in [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md).
2. **`result_backend`**: The address of your result store. This is needed if you want to keep track of task status or retrieve return values. We cover this in [Chapter 6: Result Backend](06_result_backend.md).
3. **`include`**: A list of module names that the Celery worker should import when it starts. This is often where your task definitions live (like the `add` task from Chapter 1).
4. **`task_serializer`**: Defines the format used to package task messages before sending them to the broker (e.g., 'json', 'pickle'). 'json' is a safe and common default.
5. **`timezone`**: Sets the timezone Celery uses, which is important for scheduled tasks managed by [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md).
## How to Configure Your Celery App
Celery is flexible and offers several ways to set its configuration.
**Method 1: Directly on the App Object (After Creation)**
You can update the configuration *after* creating the `Celery` app instance using the `app.conf.update()` method. This is handy for simple adjustments or quick tests.
```python
# celery_app.py
from celery import Celery
# Create the app (maybe with initial settings)
app = Celery('tasks', broker='redis://localhost:6379/0')
# Update configuration afterwards
app.conf.update(
    result_backend='redis://localhost:6379/1',  # Use database 1 for results
    task_serializer='json',
    result_serializer='json',
    accept_content=['json'],  # Only accept json formatted tasks
    timezone='Europe/Oslo',
    enable_utc=True,  # Use UTC timezone internally
    # Add task modules to import when a worker starts
    include=['my_tasks'],  # Assumes you have a file my_tasks.py with tasks
)
print(f"Broker URL set to: {app.conf.broker_url}")
print(f"Result backend set to: {app.conf.result_backend}")
print(f"Timezone set to: {app.conf.timezone}")
```
**Explanation:**
* We create the `app` like before, potentially setting some initial config like the `broker`.
* `app.conf.update(...)`: We pass a Python dictionary to this method. The keys are Celery setting names (like `result_backend`, `timezone`), and the values are what we want to set them to.
* `app.conf` is the central configuration object attached to your `app` instance.
**Method 2: Dedicated Configuration Module (Recommended)**
For most projects, especially larger ones, it's cleaner to keep your Celery settings in a separate Python file (e.g., `celeryconfig.py`).
1. **Create `celeryconfig.py`:**
```python
# celeryconfig.py
# Broker settings
broker_url = 'redis://localhost:6379/0'
# Result backend settings
result_backend = 'redis://localhost:6379/1'
# Task settings
task_serializer = 'json'
result_serializer = 'json'
accept_content = ['json']
# Timezone settings
timezone = 'America/New_York'
enable_utc = True # Recommended
# List of modules to import when the Celery worker starts.
imports = ('proj.tasks',) # Example: Assuming tasks are in proj/tasks.py
```
**Explanation:**
* This is just a standard Python file.
* We define variables whose names match the Celery configuration settings (e.g., `broker_url`, `timezone`). Celery expects these specific names.
2. **Load the configuration in your app file (`celery_app.py`):**
```python
# celery_app.py
from celery import Celery
# Create the app instance (no need to pass broker/backend here now)
app = Celery('tasks')
# Load configuration from the 'celeryconfig' module
# Assumes celeryconfig.py is in the same directory or Python path
app.config_from_object('celeryconfig')
print(f"Loaded Broker URL from config file: {app.conf.broker_url}")
print(f"Loaded Timezone from config file: {app.conf.timezone}")
# You might still define tasks in this file or in the modules listed
# in celeryconfig.imports
@app.task
def multiply(x, y):
    return x * y
```
**Explanation:**
* `app = Celery('tasks')`: We create the app instance, but we don't need to specify the broker or backend here because they will be loaded from the file.
* `app.config_from_object('celeryconfig')`: This is the key line. It tells Celery to:
* Find a module named `celeryconfig`.
* Look at the variables defined in that module whose names match Celery settings (like `broker_url` or `timezone`, as in our example).
* Use those variables to configure the `app`.
This approach keeps your settings organized and separate from your application logic.
**Method 3: Environment Variables**
Celery settings can also be controlled via environment variables. This is very useful for deployments (e.g., using Docker) where you might want to change the broker address without changing code.
Environment variable names typically follow the pattern `CELERY_<SETTING_NAME_IN_UPPERCASE>`.
For example, you could set the broker URL in your terminal before running your app or worker:
```bash
# In your terminal (Linux/macOS)
export CELERY_BROKER_URL='amqp://guest:guest@localhost:5672//'
export CELERY_RESULT_BACKEND='redis://localhost:6379/2'
# Now run your Python script or Celery worker
python your_script.py
# or
# celery -A your_app_module worker --loglevel=info
```
Celery automatically picks up these environment variables. They often take precedence over settings defined in a configuration file or directly on the app, making them ideal for overriding settings in different environments (development, staging, production).
*Note: The exact precedence order can sometimes depend on how and when configuration is loaded, but environment variables are generally a high-priority source.*
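The general idea of "environment variable wins, then config file, then built-in default" can be sketched like this. This is a simplified illustration of the precedence described above, not Celery's actual resolution code:

```python
import os

def resolve_setting(name, file_value, default):
    """Resolve a setting: the environment variable wins over the config
    file, which wins over the built-in default (simplified precedence)."""
    env_name = f"CELERY_{name.upper()}"
    if env_name in os.environ:
        return os.environ[env_name]
    if file_value is not None:
        return file_value
    return default

# With no CELERY_TASK_SERIALIZER in the environment, the file value wins:
os.environ.pop("CELERY_TASK_SERIALIZER", None)
print(resolve_setting("task_serializer", "json", "pickle"))  # json

# Once the environment variable is set, it takes priority:
os.environ["CELERY_TASK_SERIALIZER"] = "msgpack"
print(resolve_setting("task_serializer", "json", "pickle"))  # msgpack
```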
## How It Works Internally (Simplified View)
1. **Loading:** When you create a `Celery` app or call `app.config_from_object()`, Celery reads the settings from the specified source (arguments, object/module, environment variables).
2. **Storing:** These settings are stored in a dictionary-like object accessible via `app.conf`. Celery uses a default set of values initially, which are then updated or overridden by your configuration.
3. **Accessing:** When a Celery component needs a setting (e.g., the worker needs the `broker_url` to connect, or a task needs the `task_serializer`), it simply looks up the required key in the `app.conf` object.
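The "defaults overridden by your configuration" behavior can be pictured with a `ChainMap`. This is a simplification: the real `Settings` class in `celery/app/utils.py` also handles key-name conversion and merging from several sources:

```python
from collections import ChainMap

# Built-in defaults (a tiny invented subset, for illustration only)
defaults = {
    "broker_url": None,
    "result_backend": None,
    "task_serializer": "json",
    "timezone": "UTC",
}

# What you put in celeryconfig.py or pass to app.conf.update(...)
user_config = {
    "broker_url": "redis://localhost:6379/0",
    "timezone": "Europe/Oslo",
}

conf = ChainMap(user_config, defaults)  # user settings shadow defaults
print(conf["broker_url"])        # redis://localhost:6379/0
print(conf["task_serializer"])   # json (fell through to the default)
```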
```mermaid
sequenceDiagram
participant ClientCode as Your App Setup (e.g., celery_app.py)
participant CeleryApp as app = Celery(...)
participant ConfigSource as celeryconfig.py / Env Vars
participant Worker as Celery Worker Process
participant Broker as Message Broker (e.g., Redis)
ClientCode->>CeleryApp: Create instance
ClientCode->>CeleryApp: app.config_from_object('celeryconfig')
CeleryApp->>ConfigSource: Read settings (broker_url, etc.)
ConfigSource-->>CeleryApp: Return settings values
Note over CeleryApp: Stores settings in app.conf
Worker->>CeleryApp: Start worker for 'app'
Worker->>CeleryApp: Access app.conf.broker_url
CeleryApp-->>Worker: Return 'redis://localhost:6379/0'
Worker->>Broker: Connect using 'redis://localhost:6379/0'
```
This diagram shows the app loading configuration first, and then the worker using that stored configuration (`app.conf`) to perform its duties, like connecting to the broker.
## Code Dive: Where Configuration Lives
* **`app.conf`:** This is the primary interface you interact with. It's an instance of a special dictionary-like class (`celery.app.utils.Settings`) that handles loading defaults, converting keys (Celery has changed setting names over time), and providing convenient access. You saw this in the direct update example: `app.conf.update(...)`.
* **Loading Logic (`config_from_object`)**: Methods like `app.config_from_object` typically delegate to the app's "loader" (`app.loader`). The loader (e.g., `celery.loaders.base.BaseLoader` or `celery.loaders.app.AppLoader`) handles the actual importing of the configuration module and extracting the settings. See `loaders/base.py` for the `config_from_object` method definition.
* **Default Settings**: Celery has a built-in set of default values for all its settings. These are defined in `celery.app.defaults`. Your configuration overrides these defaults. See `app/defaults.py`.
* **Accessing Settings**: Throughout the Celery codebase, different components access the configuration via `app.conf`. For instance, when sending a task (`app/base.py:send_task`), the code looks up `app.conf.broker_url` (or related settings) to know where and how to send the message.
```python
# Simplified concept from celery/loaders/base.py
class BaseLoader:
    # ...
    def config_from_object(self, obj, silent=False):
        if isinstance(obj, str):
            # Import the module (e.g., 'celeryconfig')
            obj = self._smart_import(obj, imp=self.import_from_cwd)
            # ... error handling ...
        # Store the configuration (simplified - the actual process merges)
        self._conf = force_mapping(obj)  # Treat obj like a dictionary
        # ...
        return True


# Simplified concept from celery/app/base.py (where settings are used)
class Celery:
    # ...
    def send_task(self, name, args=None, kwargs=None, **options):
        # ... other setup ...
        # Access the configuration to know where the broker is
        broker_connection_url = self.conf.broker_url  # Reads from app.conf
        # Use the broker URL to get a connection/producer
        with self.producer_or_acquire(producer) as P:
            # ... create message ...
            # Send the message using the connection derived from broker_url
            self.amqp.send_task_message(P, name, message, **options)
        # ... return result object ...
```
This illustrates the core idea: load configuration into `app.conf`, then components read from `app.conf` when they need instructions.
## Conclusion
Configuration is the backbone of Celery's flexibility. You've learned:
* **Why it's needed:** To tell Celery *how* to operate (broker, backend, tasks settings).
* **What can be configured:** Broker/backend URLs, serializers, timezones, task imports, and much more.
* **How to configure:**
* Directly via `app.conf.update()`.
* Using a dedicated module (`celeryconfig.py`) with `app.config_from_object()`. (Recommended)
* Using environment variables (great for deployment).
* **How it works:** Settings are loaded into `app.conf` and accessed by Celery components as needed.
With your Celery app configured, you're ready to define the actual work you want Celery to do. That's where Tasks come in!
**Next:** [Chapter 3: Task](03_task.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 3: Task - The Job Description
In [Chapter 1: The Celery App](01_celery_app.md), we set up our Celery headquarters, and in [Chapter 2: Configuration](02_configuration.md), we learned how to give it instructions. Now, we need to define the *actual work* we want Celery to do. This is where **Tasks** come in.
## What Problem Does a Task Solve?
Imagine you have a specific job that needs doing, like "Resize this image to thumbnail size" or "Send a welcome email to this new user." In Celery, each of these specific jobs is represented by a **Task**.
A Task is like a **job description** or a **recipe**. It contains the exact steps (the code) needed to complete a specific piece of work. You write this recipe once as a Python function, and then you can tell Celery to follow that recipe whenever you need that job done, potentially many times with different inputs (like resizing different images or sending emails to different users).
The key benefit is that you don't run the recipe immediately yourself. You hand the recipe (the Task) and the ingredients (the arguments, like the image file or the user's email) over to Celery. Celery then finds an available helper (a [Worker](05_worker.md)) who knows how to follow that specific recipe and lets them do the work in the background. This keeps your main application free to do other things.
## Defining Your First Task
Defining a task in Celery is surprisingly simple. You just take a regular Python function and "decorate" it using `@app.task`. Remember our `app` object from [Chapter 1](01_celery_app.md)? We use its `task` decorator.
Let's create a file, perhaps named `tasks.py`, to hold our task definitions:
```python
# tasks.py
import time
from celery_app import app # Import the app instance we created
@app.task
def add(x, y):
    """A simple task that adds two numbers."""
    print(f"Task 'add' starting with ({x}, {y})")
    # Simulate some work taking time
    time.sleep(5)
    result = x + y
    print(f"Task 'add' finished with result: {result}")
    return result

@app.task
def send_welcome_email(user_id):
    """A task simulating sending a welcome email."""
    print(f"Task 'send_welcome_email' starting for user {user_id}")
    # Simulate the email sending process
    time.sleep(3)
    print(f"Welcome email supposedly sent to user {user_id}")
    return f"Email sent to {user_id}"

# You can have many tasks in one file!
```
**Explanation:**
1. **`from celery_app import app`**: We import the `Celery` app instance we configured earlier. This instance holds the knowledge about our broker and backend.
2. **`@app.task`**: This is the magic decorator! When Celery sees this above a function (`add` or `send_welcome_email`), it says, "Ah! This isn't just a regular function; it's a job description that my workers need to know about."
3. **The Function (`add`, `send_welcome_email`)**: This is the actual Python code that performs the work: the steps in the recipe. It can take arguments (like `x`, `y`, or `user_id`) and can return a value.
4. **Registration**: The `@app.task` decorator automatically *registers* this function with our Celery `app`. Now, `app` knows about a task named `tasks.add` and another named `tasks.send_welcome_email` (Celery creates the name from `module_name.function_name`). Workers connected to this `app` will be able to find and execute this code when requested.
*Note:* If you are running this code yourself, make sure you have a `celery_app.py` file containing your Celery app instance as shown in the previous chapters, and that `tasks.py` can import `app` from it.
## Sending a Task for Execution
Okay, we've written our recipes (`add` and `send_welcome_email`). How do we tell Celery, "Please run the `add` recipe with the numbers 5 and 7"?
We **don't call the function directly** like `add(5, 7)`. If we did that, it would just run immediately in our current program, which defeats the purpose of using Celery!
Instead, we use special methods on the task object itself, most commonly `.delay()` or `.apply_async()`.
Let's try this in a separate Python script or an interactive Python session:
```python
# run_tasks.py
from tasks import add, send_welcome_email
print("Let's send some tasks!")
# --- Using .delay() ---
# Tell Celery to run add(5, 7) in the background
result_promise_add = add.delay(5, 7)
print(f"Sent task add(5, 7). Task ID: {result_promise_add.id}")
# Tell Celery to run send_welcome_email(123) in the background
result_promise_email = send_welcome_email.delay(123)
print(f"Sent task send_welcome_email(123). Task ID: {result_promise_email.id}")
# --- Using .apply_async() ---
# Does the same thing as .delay() but allows more options
result_promise_add_later = add.apply_async(args=(10, 20), countdown=10) # Run after 10s
print(f"Sent task add(10, 20) to run in 10s. Task ID: {result_promise_add_later.id}")
print("Tasks have been sent to the broker!")
print("A Celery worker needs to be running to pick them up.")
```
**Explanation:**
1. **`from tasks import add, send_welcome_email`**: We import our *task functions*. Because they were decorated with `@app.task`, they are now special Celery Task objects.
2. **`add.delay(5, 7)`**: This is the simplest way to send a task.
* It *doesn't* run `add(5, 7)` right now.
* It takes the arguments `(5, 7)`.
* It packages them up into a **message** along with the task's name (`tasks.add`).
* It sends this message to the **message broker** (like Redis or RabbitMQ) that we configured in our `celery_app.py`. Think of it like dropping a request slip into a mailbox.
3. **`send_welcome_email.delay(123)`**: Same idea, but for our email task. A message with `tasks.send_welcome_email` and the argument `123` is sent to the broker.
4. **`add.apply_async(args=(10, 20), countdown=10)`**: This is a more powerful way to send tasks.
* It does the same fundamental thing: sends a message to the broker.
* It allows for more options, like `args` (positional arguments as a tuple), `kwargs` (keyword arguments as a dict), `countdown` (delay execution by seconds), `eta` (run at a specific future time), and many others.
* `.delay(*args, **kwargs)` is just a convenient shortcut for `.apply_async(args=args, kwargs=kwargs)`.
5. **`result_promise_... = ...`**: Both `.delay()` and `apply_async()` return an `AsyncResult` object immediately. This is *not* the actual result of the task (like `12` for `add(5, 7)`). It's more like a receipt or a tracking number (notice the `.id` attribute). You can use this object later to check if the task finished and what its result was, but only if you've set up a [Result Backend](06_result_backend.md) (Chapter 6).
6. **The Worker**: Sending the task only puts the message on the queue. A separate process, the Celery [Worker](05_worker.md) (Chapter 5), needs to be running. The worker constantly watches the queue, picks up messages, finds the corresponding task function (using the name like `tasks.add`), and executes it with the provided arguments.
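The relationship between `.delay()` and `.apply_async()` from point 4 can be expressed with a tiny stub. The names and return value here are invented purely for illustration; the real methods build and send a broker message:

```python
class TaskStub:
    """Toy task object showing that .delay() is sugar for .apply_async()."""

    def apply_async(self, args=None, kwargs=None, **options):
        # The real method builds a message and sends it to the broker;
        # here we just record what would have been sent.
        return {"args": args or (), "kwargs": kwargs or {}, "options": options}

    def delay(self, *args, **kwargs):
        # .delay(*args, **kwargs) == .apply_async(args=args, kwargs=kwargs)
        return self.apply_async(args=args, kwargs=kwargs)


add = TaskStub()
print(add.delay(5, 7) == add.apply_async(args=(5, 7)))  # True
```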
## How It Works Internally (Simplified)
Let's trace the journey of defining and sending our `add` task:
1. **Definition (`@app.task` in `tasks.py`)**:
* Python defines the `add` function.
* The `@app.task` decorator sees this function.
* It tells the `Celery` instance (`app`) about this function, registering it under the name `tasks.add` in an internal dictionary (`app.tasks`). The `app` instance knows the broker/backend settings.
2. **Sending (`add.delay(5, 7)` in `run_tasks.py`)**:
* You call `.delay()` on the `add` task object.
* `.delay()` (or `.apply_async()`) internally uses the `app` the task is bound to.
* It asks the `app` for the configured broker URL.
* It creates a message containing:
* Task Name: `tasks.add`
* Arguments: `(5, 7)`
* Other options (like a unique Task ID).
* It connects to the **Broker** (e.g., Redis) using the broker URL.
* It sends the message to a specific queue (usually named 'celery' by default) on the broker.
* It returns an `AsyncResult` object referencing the Task ID.
3. **Waiting**: The message sits in the queue on the broker, waiting.
4. **Execution (by a [Worker](05_worker.md))**:
* A separate Celery Worker process is running, connected to the same broker and `app`.
* The Worker fetches the message from the queue.
* It reads the task name: `tasks.add`.
* It looks up `tasks.add` in its copy of the `app.tasks` registry to find the actual `add` function code.
* It calls the `add` function with the arguments from the message: `add(5, 7)`.
* The function runs (prints logs, sleeps, calculates `12`).
* If a [Result Backend](06_result_backend.md) is configured, the Worker takes the return value (`12`) and stores it in the backend, associated with the Task ID.
* The Worker acknowledges the message to the broker, removing it from the queue.
```mermaid
sequenceDiagram
participant Client as Your Code (run_tasks.py)
participant TaskDef as @app.task def add()
participant App as Celery App Instance
participant Broker as Message Broker (e.g., Redis)
participant Worker as Celery Worker (separate process)
Note over TaskDef, App: 1. @app.task registers 'add' function with App's task registry
Client->>TaskDef: 2. Call add.delay(5, 7)
TaskDef->>App: 3. Get broker config
App-->>TaskDef: Broker URL
TaskDef->>Broker: 4. Send message ('tasks.add', (5, 7), task_id, ...)
Broker-->>TaskDef: Ack (Message Queued)
TaskDef-->>Client: 5. Return AsyncResult(task_id)
Worker->>Broker: 6. Fetch next message
Broker-->>Worker: Message ('tasks.add', (5, 7), task_id)
Worker->>App: 7. Lookup 'tasks.add' in registry
App-->>Worker: add function code
Worker->>Worker: 8. Execute add(5, 7) -> returns 12
Note over Worker: (Optionally store result in Backend)
Worker->>Broker: 9. Acknowledge message completion
```
## Code Dive: Task Creation and Sending
* **Task Definition (`@app.task`)**: This decorator is defined as the `task` method of the `Celery` class in `celery/app/base.py`. It ultimately calls `_task_from_fun`.
```python
# Simplified from celery/app/base.py
class Celery:
# ...
def task(self, *args, **opts):
# ... handles decorator arguments ...
def _create_task_cls(fun):
# Returns a Task instance or a Proxy that creates one later
ret = self._task_from_fun(fun, **opts)
return ret
return _create_task_cls
def _task_from_fun(self, fun, name=None, base=None, bind=False, **options):
# Generate name like 'tasks.add' if not given
name = name or self.gen_task_name(fun.__name__, fun.__module__)
base = base or self.Task # The base Task class (from celery.app.task)
if name not in self._tasks: # If not already registered...
# Dynamically create a Task class wrapping the function
task = type(fun.__name__, (base,), {
'app': self, # Link task back to this app instance!
'name': name,
'run': staticmethod(fun), # The actual function to run
'__doc__': fun.__doc__,
'__module__': fun.__module__,
# ... other options ...
})() # Instantiate the new Task class
self._tasks[task.name] = task # Add to app's registry!
task.bind(self) # Perform binding steps
else:
task = self._tasks[name] # Task already exists
return task
```
This shows how the decorator essentially creates a specialized object (an instance of a class derived from `celery.app.task.Task`) that wraps your original function and registers it with the `app` under a specific name.
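The registration mechanics can be mimicked in a few lines of plain Python. This is a toy stand-in for illustration only, not Celery's actual classes:

```python
class TinyApp:
    """Toy stand-in for Celery: a decorator that registers functions by name."""
    def __init__(self):
        self.tasks = {}  # the task registry: name -> callable

    def task(self, fun):
        # Generate a dotted name like 'tasks.add' from module + function name
        name = f"{fun.__module__}.{fun.__name__}"
        self.tasks[name] = fun
        return fun

app = TinyApp()

@app.task
def add(x, y):
    return x + y

print(sorted(app.tasks))  # the registry now knows this task by its dotted name
```

The real decorator does much more (it wraps the function in a `Task` subclass, binds it to the app, and handles options), but the core idea is the same: decorating a function records it in the app's task registry under a well-known name.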
* **Task Sending (`.delay`)**: The `.delay()` method is defined on the `Task` class itself in `celery/app/task.py`. It's a simple shortcut.
```python
# Simplified from celery/app/task.py
class Task:
# ...
def delay(self, *args, **kwargs):
"""Shortcut for apply_async(args, kwargs)"""
return self.apply_async(args, kwargs)
    def apply_async(self, args=None, kwargs=None, **options):
# ... argument checking, option processing ...
# Get the app associated with this task instance
app = self._get_app()
# If always_eager is set, run locally instead of sending
if app.conf.task_always_eager:
return self.apply(args, kwargs, ...) # Runs inline
# The main path: tell the app to send the task message
return app.send_task(
self.name, args, kwargs, task_type=self,
**options # Includes things like countdown, eta, queue etc.
)
```
You can see how `.delay` just calls `.apply_async`, which then (usually) delegates the actual message sending to the `app.send_task` method we saw briefly in [Chapter 1](01_celery_app.md). The `app` uses its configuration to know *how* and *where* to send the message.
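The `task_always_eager` branch above is especially handy in tests: with it enabled, calling `.delay()` runs the function inline instead of sending a message. A toy version of that dispatch logic (illustrative only, not Celery's real classes):

```python
class TinyTask:
    """Toy model of a task object: .delay() is a shortcut for .apply_async()."""
    def __init__(self, fun, eager=False):
        self.fun = fun
        self.eager = eager   # stands in for app.conf.task_always_eager
        self.sent = []       # messages that would have gone to the broker

    def delay(self, *args, **kwargs):
        return self.apply_async(args, kwargs)  # .delay is just a shortcut

    def apply_async(self, args=(), kwargs=None):
        kwargs = kwargs or {}
        if self.eager:
            return self.fun(*args, **kwargs)   # run inline, like task_always_eager
        self.sent.append((args, kwargs))       # otherwise "send" the message
        return None  # the real method returns an AsyncResult here

eager_add = TinyTask(lambda x, y: x + y, eager=True)
print(eager_add.delay(5, 7))  # -> 12
```

In real Celery, the non-eager branch hands the message to `app.send_task` and you get an `AsyncResult` back instead of `None`.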
## Conclusion
You've learned the core concept of a Celery **Task**:
* It represents a single, well-defined **unit of work** or **job description**.
* You define a task by decorating a normal Python function with `@app.task`. This **registers** the task with your Celery application.
* You **send** a task request (not run it directly) using `.delay()` or `.apply_async()`.
* Sending a task puts a **message** onto a queue managed by a **message broker**.
* A separate **Worker** process picks up the message and executes the corresponding task function.
Tasks are the fundamental building blocks of work in Celery. Now that you know how to define a task and request its execution, let's look more closely at the crucial component that handles passing these requests around: the message broker.
**Next:** [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 4: Broker Connection (AMQP) - Celery's Postal Service
In [Chapter 3: Task](03_task.md), we learned how to define "job descriptions" (Tasks) like `add(x, y)` and how to request them using `.delay()`. But when you call `add.delay(2, 2)`, how does that request actually *get* to a worker process that can perform the addition? It doesn't just magically appear!
This is where the **Broker Connection** comes in. Think of it as Celery's built-in postal service.
## What Problem Does the Broker Connection Solve?
Imagine you want to send a letter (a task request) to a friend (a worker) who lives in another city. You can't just shout the message out your window and hope they hear it. You need:
1. A **Post Office** (the Message Broker, like RabbitMQ or Redis) that handles mail.
2. A way to **talk to the Post Office** (the Broker Connection) to drop off your letter or pick up mail addressed to you.
The Broker Connection is that crucial link between your application (where you call `.delay()`) or your Celery worker and the message broker system. It manages sending messages *to* the broker and receiving messages *from* the broker reliably.
Without this connection, your task requests would never leave your application, and your workers would never know there's work waiting for them.
## Key Concepts: Post Office & Rules
Let's break down the pieces:
1. **The Message Broker (The Post Office):** This is a separate piece of software that acts as a central hub for messages. Common choices are RabbitMQ and Redis. You tell Celery its address using the `broker_url` setting in your [Configuration](02_configuration.md).
```python
# From Chapter 2 - celeryconfig.py
broker_url = 'amqp://guest:guest@localhost:5672//' # Example for RabbitMQ
# Or maybe: broker_url = 'redis://localhost:6379/0' # Example for Redis
```
2. **The Connection (Talking to the Staff):** This is the active communication channel established between your Python code (either your main app or a worker) and the broker. It's like having an open phone line to the post office. Celery, using a library called `kombu`, handles creating and managing these connections based on the `broker_url`.
3. **AMQP (The Postal Rules):** AMQP stands for **Advanced Message Queuing Protocol**. Think of it as a specific set of rules and procedures for how post offices should operate: how letters should be addressed, sorted, delivered, and confirmed.
* RabbitMQ is a broker that speaks AMQP natively.
* Other brokers, like Redis, use different protocols (their own set of rules).
* **Why mention AMQP?** It's a very common and powerful protocol for message queuing, and the principles behind it (exchanges, queues, routing) are fundamental to how Celery routes tasks, even when using other brokers. Celery's internal component for handling this communication is often referred to as `app.amqp` (found in `app/amqp.py`), even though the underlying library (`kombu`) supports multiple protocols. So, we focus on the *concept* of managing the broker connection, often using AMQP terminology as a reference point.
4. **Producer (Sending Mail):** When your application calls `add.delay(2, 2)`, it acts as a *producer*. It uses its broker connection to send a message ("Please run 'add' with arguments (2, 2)") to the broker.
5. **Consumer (Receiving Mail):** A Celery [Worker](05_worker.md) acts as a *consumer*. It uses its *own* broker connection to constantly check a specific mailbox (queue) at the broker for new messages. When it finds one, it takes it, performs the task, and tells the broker it's done.
## How Sending a Task Uses the Connection
Let's revisit sending a task from [Chapter 3: Task](03_task.md):
```python
# run_tasks.py (simplified)
from tasks import add
from celery_app import app # Assume app is configured with a broker_url
# 1. You call .delay()
print("Sending task...")
result_promise = add.delay(2, 2)
# Behind the scenes:
# a. Celery looks at the 'add' task, finds its associated 'app'.
# b. It asks 'app' for the broker_url from its configuration.
# c. It uses the app.amqp component (powered by Kombu) to get a connection
# to the broker specified by the URL (e.g., 'amqp://localhost...').
# d. It packages the task name 'tasks.add' and args (2, 2) into a message.
# e. It uses the connection to 'publish' (send) the message to the broker.
print(f"Task sent! ID: {result_promise.id}")
```
The `add.delay(2, 2)` call triggers this whole process. It needs the configured `broker_url` to know *which* post office to connect to, and the broker connection handles the actual sending of the "letter" (task message).
Similarly, a running Celery [Worker](05_worker.md) establishes its own connection to the *same* broker. It uses this connection to *listen* for incoming messages on the queues it's assigned to.
## How It Works Internally (Simplified)
Celery uses a powerful library called **Kombu** to handle the low-level details of connecting and talking to different types of brokers (RabbitMQ, Redis, etc.). The `app.amqp` object in Celery acts as a high-level interface to Kombu's features.
1. **Configuration:** The `broker_url` tells Kombu where and how to connect.
2. **Connection Pool:** To be efficient, Celery (via Kombu) often maintains a *pool* of connections. When you send a task, it might grab an existing, idle connection from the pool instead of creating a new one every time. This is faster. You can see this managed by `app.producer_pool` in `app/base.py`.
3. **Producer:** When `task.delay()` is called, it ultimately uses a `kombu.Producer` object. This object represents the ability to *send* messages. It's tied to a specific connection and channel.
4. **Publishing:** The producer's `publish()` method is called. This takes the task message (already serialized into a format like JSON), specifies the destination (exchange and routing key - think of these like the address and sorting code on an envelope), and sends it over the connection to the broker.
5. **Consumer:** A Worker uses a `kombu.Consumer` object. This object is set up to listen on specific queues via its connection. When a message arrives in one of those queues, the broker pushes it to the consumer over the connection, and the consumer triggers the appropriate Celery task execution logic.
```mermaid
sequenceDiagram
participant Client as Your App Code
participant Task as add.delay()
participant App as Celery App
participant AppAMQP as app.amqp (Kombu Interface)
participant Broker as RabbitMQ / Redis
Client->>Task: Call add.delay(2, 2)
Task->>App: Get broker config (broker_url)
App-->>Task: broker_url
Task->>App: Ask to send task 'tasks.add'
App->>AppAMQP: Send task message('tasks.add', (2, 2), ...)
Note over AppAMQP: Gets connection/producer (maybe from pool)
AppAMQP->>Broker: publish(message, routing_info) via Connection
Broker-->>AppAMQP: Acknowledge message received
AppAMQP-->>App: Message sent successfully
App-->>Task: Return AsyncResult
Task-->>Client: Return AsyncResult
```
This shows the flow: your code calls `.delay()`, Celery uses its configured connection details (`app.amqp` layer) to get a connection and producer, and then publishes the message to the broker.
## Code Dive: Sending a Message
Let's peek inside `app/amqp.py` where the `AMQP` class orchestrates sending. The `send_task_message` method (simplified below) is key.
```python
# Simplified from app/amqp.py within the AMQP class
# This function is configured internally and gets called by app.send_task
def _create_task_sender(self):
# ... (lots of setup: getting defaults from config, signals) ...
default_serializer = self.app.conf.task_serializer
default_compressor = self.app.conf.task_compression
def send_task_message(producer, name, message,
exchange=None, routing_key=None, queue=None,
serializer=None, compression=None, declare=None,
retry=None, retry_policy=None,
**properties):
# ... (Determine exchange, routing_key, queue based on config/options) ...
# ... (Prepare headers, properties, handle retries) ...
headers, properties, body, sent_event = message # Unpack the prepared message tuple
# The core action: Use the producer to publish the message!
ret = producer.publish(
body, # The actual task payload (args, kwargs, etc.)
exchange=exchange,
routing_key=routing_key,
serializer=serializer or default_serializer, # e.g., 'json'
compression=compression or default_compressor,
retry=retry,
retry_policy=retry_policy,
declare=declare, # Maybe declare queues/exchanges if needed
headers=headers,
**properties # Other message properties (correlation_id, etc.)
)
# ... (Send signals like task_sent, publish events if configured) ...
return ret
return send_task_message
```
**Explanation:**
* This function takes a `producer` object (which is linked to a broker connection via Kombu).
* It figures out the final destination details (exchange, routing key).
* It calls `producer.publish()`, passing the task body and all the necessary options (like serializer). This is the function that actually sends the data over the network connection to the broker.
The `Connection` objects themselves are managed by Kombu (see `kombu/connection.py`). Celery uses these objects via its `app.connection_for_write()` or `app.connection_for_read()` methods, which often pull from the connection pool (`kombu.pools`).
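The "address on the envelope" idea (exchange and routing key) can be sketched as a simple lookup. This toy model mimics an AMQP *direct* exchange, where a message is delivered to every queue bound with a matching routing key; it is not kombu's API:

```python
# Bindings: routing key -> list of queue names (like AMQP queue bindings)
bindings = {"celery": ["celery_queue"], "emails": ["email_queue"]}
queues = {"celery_queue": [], "email_queue": []}

def publish(body, routing_key):
    """Deliver a message to every queue bound with this routing key."""
    for queue_name in bindings.get(routing_key, []):
        queues[queue_name].append(body)

publish({"task": "tasks.add", "args": (2, 2)}, routing_key="celery")
print(queues["celery_queue"])  # the message landed only in the default queue
```

By default Celery sends tasks with the routing key `celery`, which is why they end up in the default `celery` queue that workers listen to.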
## Conclusion
The Broker Connection is Celery's vital communication link, its "postal service."
* It connects your application and workers to the **Message Broker** (like RabbitMQ or Redis).
* It uses the `broker_url` from your [Configuration](02_configuration.md) to know where to connect.
* Protocols like **AMQP** define the "rules" for communication, although Celery's underlying library (Kombu) handles various protocols.
* Your app **produces** task messages and sends them over the connection.
* Workers **consume** task messages received over their connection.
* Celery manages connections efficiently, often using **pools**.
Understanding the broker connection helps clarify how tasks move from where they're requested to where they run. Now that we know how tasks are defined and sent across the wire, let's look at the entity that actually picks them up and does the work.
**Next:** [Chapter 5: Worker](05_worker.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 5: Worker - The Task Doer
In [Chapter 4: Broker Connection (AMQP)](04_broker_connection__amqp_.md), we learned how Celery uses a message broker, like a postal service, to send task messages. When you call `add.delay(2, 2)`, a message asking to run the `add` task with arguments `(2, 2)` gets dropped into a mailbox (the broker queue).
But who actually checks that mailbox, picks up the message, and performs the addition? That's the job of the **Celery Worker**.
## What Problem Does the Worker Solve?
Imagine our workshop analogy again. You've defined the blueprint for a job ([Task](03_task.md)) and you've dropped the work order into the central inbox ([Broker Connection (AMQP)](04_broker_connection__amqp_.md)). Now you need an actual employee or a machine to:
1. Look in the inbox for new work orders.
2. Pick up an order.
3. Follow the instructions (run the task code).
4. Maybe put the finished product (the result) somewhere specific.
5. Mark the order as complete.
The **Celery Worker** is that employee or machine. It's a separate program (process) that you run, whose sole purpose is to execute the tasks you send to the broker. Without a worker running, your task messages would just sit in the queue forever, waiting for someone to process them.
## Starting Your First Worker
Running a worker is typically done from your command line or terminal. You need to tell the worker where to find your [Celery App](01_celery_app.md) instance (which holds the configuration, including the broker address and the list of known tasks).
Assuming you have:
* A file `celery_app.py` containing your `app = Celery(...)` instance.
* A file `tasks.py` containing your task definitions (like `add` and `send_welcome_email`) decorated with `@app.task`.
* Your message broker (e.g., Redis or RabbitMQ) running.
You can start a worker like this:
```bash
# In your terminal, in the same directory as celery_app.py and tasks.py
# Make sure your Python environment has celery and the broker driver installed
# (e.g., pip install celery redis)
celery -A celery_app worker --loglevel=info
```
**Explanation:**
* `celery`: This is the main Celery command-line program.
* `-A celery_app`: The `-A` flag (or `--app`) tells Celery where to find your `Celery` app instance. `celery_app` refers to the `celery_app.py` file (or module) and implies Celery should look for an instance named `app` inside it.
* `worker`: This specifies that you want to run the worker component.
* `--loglevel=info`: This sets the logging level. `info` is a good starting point, showing you when the worker connects, finds tasks, and executes them. Other levels include `debug` (more verbose), `warning`, `error`, and `critical`.
**What You'll See:**
When the worker starts successfully, you'll see a banner like this (details may vary):
```text
-------------- celery@yourhostname v5.x.x (stars)
--- ***** -----
-- ******* ---- Linux-5.15.0...-generic-x86_64-with-... 2023-10-27 10:00:00
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: tasks:0x7f...
- ** ---------- .> transport: redis://localhost:6379/0
- ** ---------- .> results: redis://localhost:6379/0
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. tasks.add
. tasks.send_welcome_email
[2023-10-27 10:00:01,000: INFO/MainProcess] Connected to redis://localhost:6379/0
[2023-10-27 10:00:01,050: INFO/MainProcess] mingle: searching for neighbors
[2023-10-27 10:00:02,100: INFO/MainProcess] mingle: all alone
[2023-10-27 10:00:02,150: INFO/MainProcess] celery@yourhostname ready.
```
**Key Parts of the Banner:**
* `celery@yourhostname`: The unique name of this worker instance.
* `transport`: The broker URL it connected to (from your app config).
* `results`: The result backend URL (if configured).
* `concurrency`: How many tasks this worker can potentially run at once (defaults to the number of CPU cores) and the execution pool type (`prefork` is common). We'll touch on this later.
* `queues`: The specific "mailboxes" (queues) the worker is listening to. `celery` is the default queue name.
* `[tasks]`: A list of all the tasks the worker discovered (like our `tasks.add` and `tasks.send_welcome_email`). If your tasks don't show up here, the worker won't be able to run them!
The final `celery@yourhostname ready.` message means the worker is connected and waiting for jobs!
## What the Worker Does
Now that the worker is running, let's trace what happens when you send a task (e.g., from `run_tasks.py` in [Chapter 3: Task](03_task.md)):
1. **Waiting:** The worker is connected to the broker, listening on the `celery` queue.
2. **Message Arrival:** Your `add.delay(5, 7)` call sends a message to the `celery` queue on the broker. The broker notifies the worker.
3. **Receive & Decode:** The worker receives the raw message. It decodes it to find the task name (`tasks.add`), the arguments (`(5, 7)`), and other info (like a unique task ID).
4. **Find Task Code:** The worker looks up the name `tasks.add` in its internal registry (populated when it started) to find the actual Python function `add` defined in `tasks.py`.
5. **Execute:** The worker executes the function: `add(5, 7)`.
* You will see the `print` statements from your task function appear in the *worker's* terminal output:
```text
[2023-10-27 10:05:00,100: INFO/ForkPoolWorker-1] Task tasks.add[some-task-id] received
Task 'add' starting with (5, 7)
Task 'add' finished with result: 12
[2023-10-27 10:05:05,150: INFO/ForkPoolWorker-1] Task tasks.add[some-task-id] succeeded in 5.05s: 12
```
6. **Store Result (Optional):** If a [Result Backend](06_result_backend.md) is configured, the worker takes the return value (`12`) and sends it to the backend, associating it with the task's unique ID.
7. **Acknowledge:** The worker sends an "acknowledgement" (ack) back to the broker. This tells the broker, "I finished processing this message successfully, you can delete it from the queue." This ensures tasks aren't lost if a worker crashes mid-execution (the message would remain on the queue for another worker to pick up).
8. **Wait Again:** The worker goes back to waiting for the next message.
## Running Multiple Workers and Concurrency
* **Multiple Workers:** You can start multiple worker processes by running the `celery worker` command again, perhaps on different machines or in different terminals on the same machine. They will all connect to the same broker and pull tasks from the queue, allowing you to process tasks in parallel and scale your application.
* **Concurrency within a Worker:** A single worker process can often handle more than one task concurrently. Celery achieves this using *execution pools*.
* **Prefork (Default):** The worker starts several child *processes*. Each child process handles one task at a time. The `-c` (or `--concurrency`) flag controls the number of child processes (default is the number of CPU cores). This is good for CPU-bound tasks.
* **Eventlet/Gevent:** Uses *green threads* (lightweight concurrency managed by libraries like eventlet or gevent). A single worker process can handle potentially hundreds or thousands of tasks concurrently, especially if the tasks are I/O-bound (e.g., waiting for network requests). You select these using the `-P` flag: `celery -A celery_app worker -P eventlet -c 1000`. Requires installing the respective library (`pip install eventlet` or `pip install gevent`).
* **Solo:** Executes tasks one after another in the main worker process. Useful for debugging. `-P solo`.
* **Threads:** Uses regular OS threads. `-P threads`. Less common for Celery tasks due to Python's Global Interpreter Lock (GIL) limitations for CPU-bound tasks, but can be useful for I/O-bound tasks.
For beginners, sticking with the default **prefork** pool is usually fine. Just know that the worker can likely handle multiple tasks simultaneously.
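To get a feel for what an execution pool does, here is a sketch using the standard library's thread pool. Celery's default prefork pool uses child *processes* and is far more involved; this just illustrates "N workers draining many task requests concurrently", loosely analogous to `celery worker -c 4`:

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

# Ten queued "task requests", processed by a pool of 4 concurrent workers
requests = [(i, i + 1) for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda pair: add(*pair), requests))

print(results)  # -> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

The pool size caps how many tasks run at once, just like the `-c`/`--concurrency` flag does for a Celery worker.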
## How It Works Internally (Simplified)
Let's visualize the worker's main job: processing a single task.
1. **Startup:** The `celery worker` command starts the main worker process. It loads the `Celery App`, reads the configuration (`broker_url`, tasks to import, etc.).
2. **Connect & Listen:** The worker establishes a connection to the message broker and tells it, "I'm ready to consume messages from the 'celery' queue."
3. **Message Delivery:** The broker sees a message for the 'celery' queue (sent by `add.delay(5, 7)`) and delivers it to the connected worker.
4. **Consumer Receives:** The worker's internal "Consumer" component receives the message.
5. **Task Dispatch:** The Consumer decodes the message, identifies the task (`tasks.add`), and finds the arguments (`(5, 7)`). It then hands this off to the configured execution pool (e.g., prefork).
6. **Pool Execution:** The pool (e.g., a child process in the prefork pool) gets the task function and arguments and executes `add(5, 7)`.
7. **Result Return:** The pool process finishes execution and returns the result (`12`) back to the main worker process.
8. **Result Handling (Optional):** The main worker process, if a [Result Backend](06_result_backend.md) is configured, sends the result (`12`) and task ID to the backend store.
9. **Acknowledgement:** The main worker process sends an "ack" message back to the broker, confirming the task message was successfully processed. The broker then deletes the message.
```mermaid
sequenceDiagram
participant CLI as Terminal (celery worker)
participant WorkerMain as Worker Main Process
participant App as Celery App Instance
participant Broker as Message Broker
participant Pool as Execution Pool (e.g., Prefork Child)
participant TaskCode as Your Task Function (add)
CLI->>WorkerMain: Start celery -A celery_app worker
WorkerMain->>App: Load App & Config (broker_url, tasks)
WorkerMain->>Broker: Connect & Listen on 'celery' queue
Broker-->>WorkerMain: Deliver Message ('tasks.add', (5, 7), task_id)
WorkerMain->>WorkerMain: Decode Message
WorkerMain->>Pool: Request Execute add(5, 7) with task_id
Pool->>TaskCode: Run add(5, 7)
TaskCode-->>Pool: Return 12
Pool-->>WorkerMain: Result=12 for task_id
Note over WorkerMain: (Optionally) Store 12 in Result Backend
WorkerMain->>Broker: Acknowledge task_id is complete
```
## Code Dive: Where Worker Logic Lives
* **Command Line Entry Point (`celery/bin/worker.py`):** This script handles parsing the command-line arguments (`-A`, `-l`, `-c`, `-P`, etc.) when you run `celery worker ...`. It ultimately creates and starts a `WorkController` instance. (See `worker()` function in the file).
* **Main Worker Class (`celery/worker/worker.py`):** The `WorkController` class is the heart of the worker. It manages all the different components (like the pool, consumer, timer, etc.) using a system called "bootsteps". It handles the overall startup, shutdown, and coordination. (See `WorkController` class).
* **Message Handling (`celery/worker/consumer/consumer.py`):** The `Consumer` class (specifically its `Blueprint` and steps like `Tasks` and `Evloop`) is responsible for the core loop of fetching messages from the broker via the connection, decoding them, and dispatching them to the execution pool using task strategies. (See `Consumer.create_task_handler`).
* **Execution Pools (`celery/concurrency/`):** Modules like `prefork.py`, `solo.py`, `eventlet.py`, `gevent.py` implement the different concurrency models (`-P` flag). The `WorkController` selects and manages one of these pools.
A highly simplified conceptual view of the core message processing logic within the `Consumer`:
```python
# Conceptual loop inside the Consumer (highly simplified)
def message_handler(message):
try:
# 1. Decode message (task name, args, kwargs, id, etc.)
task_name, args, kwargs, task_id = decode_message(message.body, message.headers)
# 2. Find the registered task function
task_func = app.tasks[task_name]
# 3. Prepare execution request for the pool
request = TaskRequest(task_id, task_name, task_func, args, kwargs)
# 4. Send request to the pool for execution
# (Pool runs request.execute() which calls task_func(*args, **kwargs))
        pool.apply_async(request.execute, accept_callback=task_succeeded)
except Exception as e:
# Handle errors (e.g., unknown task, decoding error)
log_error(e)
message.reject() # Tell broker it failed
def task_succeeded(task_id, retval):
# Called by the pool when task finishes successfully
# 5. Store result (optional)
if app.backend:
app.backend.store_result(task_id, retval, status='SUCCESS')
# 6. Acknowledge message to broker
message.ack()
# --- Setup ---
# Worker connects to broker and registers message_handler
# for incoming messages on the subscribed queue(s)
connection.consume(queue_name, callback=message_handler)
# Start the event loop to wait for messages
connection.drain_events()
```
This illustrates the fundamental cycle: receive -> decode -> find task -> execute via pool -> handle result -> acknowledge. The actual code involves much more detail regarding error handling, state management, different protocols, rate limiting, etc., managed through the bootstep system.
## Conclusion
You've now met the **Celery Worker**, the essential component that actually *runs* your tasks.
* It's a **separate process** you start from the command line (`celery worker`).
* It connects to the **broker** using the configuration from your **Celery App**.
* It **listens** for task messages on queues.
* It **executes** the corresponding task code when a message arrives.
* It handles **concurrency** using execution pools (like prefork, eventlet, gevent).
* It **acknowledges** messages to the broker upon successful completion.
Without workers, Celery tasks would never get done. But what happens when a task finishes? What if it returns a value, like our `add` task returning `12`? How can your original application find out the result? That's where the Result Backend comes in.
**Next:** [Chapter 6: Result Backend](06_result_backend.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 6: Result Backend - Checking Your Task's Homework
In [Chapter 5: Worker](05_worker.md), we met the Celery Worker, the diligent entity that picks up task messages from the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) and executes the code defined in our [Task](03_task.md).
But what happens after the worker finishes a task? What if the task was supposed to calculate something, like `add(2, 2)`? How do we, back in our main application, find out the answer (`4`)? Or even just know if the task finished successfully or failed?
This is where the **Result Backend** comes in. It's like a dedicated place to check the status and results of the homework assigned to the workers.
## What Problem Does the Result Backend Solve?
Imagine you give your Celery worker a math problem: "What is 123 + 456?". The worker goes away, calculates the answer (579), and... then what?
If you don't tell the worker *where* to put the answer, it just disappears! You, back in your main program, have no idea if the worker finished, if it got the right answer, or if it encountered an error.
The **Result Backend** solves this by providing a storage location (like a database, a cache like Redis, or even via the message broker itself) where the worker can:
1. Record the final **state** of the task (e.g., `SUCCESS`, `FAILURE`).
2. Store the task's **return value** (e.g., `579`) if it succeeded.
3. Store the **error** information (e.g., `TypeError: unsupported operand type(s)...`) if it failed.
Later, your main application can query this Result Backend using the task's unique ID to retrieve this information.
Think of it as a shared filing cabinet:
* The **Worker** puts the completed homework (result and status) into a specific folder (identified by the task ID).
* Your **Application** can later look inside that folder (using the task ID) to see the results.
## Key Concepts
1. **Storage:** It's a place to store task results and states. This could be Redis, a relational database (like PostgreSQL or MySQL), MongoDB, RabbitMQ (using RPC), and others.
2. **Task ID:** Each task execution gets a unique ID (remember the `result_promise_add.id` from Chapter 3?). This ID is the key used to store and retrieve the result from the backend.
3. **State:** Besides the return value, the backend stores the task's current state (e.g., `PENDING`, `STARTED`, `SUCCESS`, `FAILURE`, `RETRY`, `REVOKED`).
4. **Return Value / Exception:** If the task finishes successfully (`SUCCESS`), the backend stores the value the task function returned. If it fails (`FAILURE`), it stores details about the exception that occurred.
5. **`AsyncResult` Object:** When you call `task.delay()` or `task.apply_async()`, Celery gives you back an `AsyncResult` object. This object holds the task's ID and provides methods to interact with the result backend (check status, get the result, etc.).
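Putting the concepts above together: the backend essentially keeps one small metadata record per task ID. Here is a plain-Python sketch of such a record (the field names follow the description above; real backends store a few more fields and serialize the record before saving it):

```python
# Illustrative sketch only: the kind of metadata record a result backend
# keeps for each task, keyed by the task's unique ID.

def make_result_meta(task_id, state, result=None, traceback=None):
    """Build a metadata record like the one a worker stores."""
    return {
        "task_id": task_id,
        "status": state,         # e.g. 'PENDING', 'SUCCESS', 'FAILURE'
        "result": result,        # return value on SUCCESS, exception info on FAILURE
        "traceback": traceback,  # error traceback string if the task failed
    }

meta = make_result_meta("f5e8a3f6", "SUCCESS", result=579)
print(meta["status"], meta["result"])  # SUCCESS 579
```

The `AsyncResult` methods you'll see below (`.state`, `.get()`, `.traceback`) are essentially convenient views onto a record like this.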
## How to Use a Result Backend
**1. Configure It!**
First, you need to tell your Celery app *where* the result backend is located. You do this using the `result_backend` configuration setting, just like you set the `broker_url` in [Chapter 2: Configuration](02_configuration.md).
Let's configure our app to use Redis (make sure you have Redis running!) as the result backend. We'll use database number `1` for results to keep it separate from the broker which might be using database `0`.
```python
# celery_app.py
from celery import Celery
# Configure BOTH broker and result backend
app = Celery('tasks',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/1') # <-- Result Backend URL
# You could also use app.config_from_object('celeryconfig')
# if result_backend = 'redis://localhost:6379/1' is in celeryconfig.py
# ... your task definitions (@app.task) would go here or be imported ...
@app.task
def add(x, y):
import time
time.sleep(3) # Simulate work
return x + y
@app.task
def fail_sometimes(x):
import random
if random.random() < 0.5:
raise ValueError("Something went wrong!")
return f"Processed {x}"
```
**Explanation:**
* `backend='redis://localhost:6379/1'`: We provide a URL telling Celery to use the Redis server running on `localhost`, port `6379`, and specifically database `1` for storing results. (The `backend` argument is an alias for `result_backend`).
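Other backend types use different URL schemes. A few illustrative values (the hosts, ports, and credentials here are placeholders; adjust them for your setup):

```python
# Illustrative result_backend URLs for different backend types.
result_backend = 'redis://localhost:6379/1'                      # Redis
# result_backend = 'db+postgresql://user:pass@localhost/celery'  # SQL database (SQLAlchemy)
# result_backend = 'rpc://'                                      # RPC back over the broker
```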
**2. Send a Task and Get the `AsyncResult`**
When you send a task, the returned object is your key to the result.
```python
# run_tasks.py
from celery_app import add, fail_sometimes
# Send the add task
result_add = add.delay(10, 20)
print(f"Sent task add(10, 20). Task ID: {result_add.id}")
# Send the task that might fail
result_fail = fail_sometimes.delay("my data")
print(f"Sent task fail_sometimes('my data'). Task ID: {result_fail.id}")
```
**Explanation:**
* `result_add` and `result_fail` are `AsyncResult` objects. They contain the `.id` attribute, which is the unique identifier for *this specific execution* of the task.
**3. Check the Status and Get the Result**
Now, you can use the `AsyncResult` object to interact with the result backend.
**(Run a worker in another terminal first: `celery -A celery_app worker --loglevel=info`)**
```python
# continue in run_tasks.py or a new Python session
from celery_app import app # Need app for AsyncResult if creating from ID
# Use the AsyncResult objects we got earlier
# Or, if you only have the ID, you can recreate the AsyncResult:
# result_add = app.AsyncResult('the-task-id-you-saved-earlier')
print(f"\nChecking results for add task ({result_add.id})...")
# Check if the task is finished (returns True/False immediately)
print(f"Is add ready? {result_add.ready()}")
# Check the state (returns 'PENDING', 'STARTED', 'SUCCESS', 'FAILURE', etc.)
print(f"State of add: {result_add.state}")
# Get the result. IMPORTANT: This call will BLOCK until the task is finished!
# If the task failed, this will raise the exception that occurred in the worker.
try:
# Set a timeout (in seconds) to avoid waiting forever
final_result = result_add.get(timeout=10)
print(f"Result of add: {final_result}")
print(f"Did add succeed? {result_add.successful()}")
print(f"Final state of add: {result_add.state}")
except Exception as e:
print(f"Could not get result for add: {type(e).__name__} - {e}")
print(f"Final state of add: {result_add.state}")
print(f"Did add fail? {result_add.failed()}")
# Get the traceback if it failed
print(f"Traceback: {result_add.traceback}")
print(f"\nChecking results for fail_sometimes task ({result_fail.id})...")
try:
# Wait up to 10 seconds for this task
fail_result = result_fail.get(timeout=10)
print(f"Result of fail_sometimes: {fail_result}")
print(f"Did fail_sometimes succeed? {result_fail.successful()}")
print(f"Final state of fail_sometimes: {result_fail.state}")
except Exception as e:
print(f"Could not get result for fail_sometimes: {type(e).__name__} - {e}")
print(f"Final state of fail_sometimes: {result_fail.state}")
print(f"Did fail_sometimes fail? {result_fail.failed()}")
print(f"Traceback:\n{result_fail.traceback}")
```
**Explanation & Potential Output:**
* `result.ready()`: Checks if the task has finished (reached a `SUCCESS`, `FAILURE`, or other final state). Non-blocking.
* `result.state`: Gets the current state string. Non-blocking.
* `result.successful()`: Returns `True` if the state is `SUCCESS`. Non-blocking.
* `result.failed()`: Returns `True` if the task failed (state `FAILURE`). Non-blocking.
* `result.get(timeout=...)`: This is the most common way to get the actual return value.
* **It blocks** (waits) until the task completes *or* the timeout expires.
* If the task state becomes `SUCCESS`, it returns the value the task function returned (e.g., `30`).
* If the task state becomes `FAILURE`, it **raises** the exception that occurred in the worker (e.g., `ValueError: Something went wrong!`).
* If the timeout is reached before the task finishes, it raises a `celery.exceptions.TimeoutError`.
* `result.traceback`: If the task failed, this contains the error traceback string from the worker.
**(Example Output - might vary for `fail_sometimes` due to randomness)**
```text
Sent task add(10, 20). Task ID: f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a
Sent task fail_sometimes('my data'). Task ID: 9b1d8c7e-a6f5-4b3a-9c8d-7e6f5a4b3c2d
Checking results for add task (f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a)...
Is add ready? False
State of add: PENDING # Or STARTED if checked quickly after worker picks it up
Result of add: 30
Did add succeed? True
Final state of add: SUCCESS
Checking results for fail_sometimes task (9b1d8c7e-a6f5-4b3a-9c8d-7e6f5a4b3c2d)...
Could not get result for fail_sometimes: ValueError - Something went wrong!
Final state of fail_sometimes: FAILURE
Did fail_sometimes fail? True
Traceback:
Traceback (most recent call last):
File "/path/to/celery/app/trace.py", line ..., in trace_task
R = retval = fun(*args, **kwargs)
File "/path/to/celery/app/trace.py", line ..., in __protected_call__
return self.run(*args, **kwargs)
File "/path/to/your/project/celery_app.py", line ..., in fail_sometimes
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
```
## How It Works Internally
1. **Task Sent:** Your application calls `add.delay(10, 20)`. It sends a message to the **Broker** and gets back an `AsyncResult` object containing the unique `task_id`.
2. **Worker Executes:** A **Worker** picks up the task message from the Broker. It finds the `add` function and executes `add(10, 20)`. The function returns `30`.
3. **Worker Stores Result:** Because a `result_backend` is configured (`redis://.../1`), the Worker:
* Connects to the Result Backend (Redis DB 1).
* Prepares the result data (e.g., `{'status': 'SUCCESS', 'result': 30, 'task_id': 'f5e8...', ...}`).
* Stores this data in the backend, using the `task_id` as the key (e.g., in Redis, it might set a key like `celery-task-meta-f5e8a3f6-c7b1-4a9e-8f0a-1b2c3d4e5f6a` to the JSON representation of the result data).
* It might also set an expiry time on the result if configured (`result_expires`).
4. **Client Checks Result:** Your application calls `result_add.get(timeout=10)` on the `AsyncResult` object.
5. **Client Queries Backend:** The `AsyncResult` object uses the `task_id` (`f5e8...`) and the configured `result_backend` URL:
* It connects to the Result Backend (Redis DB 1).
* It repeatedly fetches the data associated with the `task_id` key (e.g., `GET celery-task-meta-f5e8...` in Redis).
* It checks the `status` field in the retrieved data.
* If the status is `PENDING` or `STARTED`, it waits a short interval and tries again, until the timeout is reached.
* If the status is `SUCCESS`, it extracts the `result` field (`30`) and returns it.
* If the status is `FAILURE`, it extracts the `result` field (which contains exception info), reconstructs the exception, and raises it.
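The store-then-poll cycle above can be sketched with an in-memory stand-in for the backend (illustrative only; a real backend is Redis, a database, etc., and Celery re-raises the task's original exception class):

```python
import time

# Illustrative sketch: an in-memory "result backend" plus a client-side
# poll loop mirroring the store/query cycle described above.
backend = {}  # stand-in for Redis/database; maps task_id -> metadata

def store_result(task_id, state, result):
    """What the worker does after running the task."""
    backend[task_id] = {"status": state, "result": result}

def get_result(task_id, timeout=5.0, interval=0.1):
    """What AsyncResult.get() does: poll until a final state or timeout."""
    waited = 0.0
    while waited < timeout:
        meta = backend.get(task_id, {"status": "PENDING"})
        if meta["status"] == "SUCCESS":
            return meta["result"]
        if meta["status"] == "FAILURE":
            raise RuntimeError(meta["result"])  # real Celery re-raises the task's exception
        time.sleep(interval)
        waited += interval
    raise TimeoutError("The operation timed out.")

store_result("t1", "SUCCESS", 30)   # worker side
print(get_result("t1"))             # client side -> 30
```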
```mermaid
sequenceDiagram
participant Client as Your Application
participant Task as add.delay(10, 20)
participant Broker as Message Broker (Redis DB 0)
participant Worker as Celery Worker
participant ResultBackend as Result Backend (Redis DB 1)
participant AsyncResult as result_add = AsyncResult(...)
Client->>Task: Call add.delay(10, 20)
Task->>Broker: Send task message (task_id: 't1')
Task-->>Client: Return AsyncResult (id='t1')
Worker->>Broker: Fetch message (task_id: 't1')
Worker->>Worker: Execute add(10, 20) -> returns 30
Worker->>ResultBackend: Store result (key='t1', value={'status': 'SUCCESS', 'result': 30, ...})
ResultBackend-->>Worker: Ack (Result stored)
Worker->>Broker: Ack message complete
Client->>AsyncResult: Call result_add.get(timeout=10)
loop Check Backend Until Ready or Timeout
AsyncResult->>ResultBackend: Get result for key='t1'
ResultBackend-->>AsyncResult: Return {'status': 'SUCCESS', 'result': 30, ...}
end
AsyncResult-->>Client: Return 30
```
## Code Dive: Storing and Retrieving Results
* **Backend Loading (`celery/app/backends.py`):** When Celery starts, it uses the `result_backend` URL to look up the correct backend class (e.g., `RedisBackend`, `DatabaseBackend`, `RPCBackend`) using functions like `by_url` and `by_name`. These map URL schemes (`redis://`, `db+postgresql://`, `rpc://`) or aliases ('redis', 'db', 'rpc') to the actual Python classes. The mapping is defined in `BACKEND_ALIASES`.
* **Base Classes (`celery/backends/base.py`):** All result backends inherit from `BaseBackend`. Many common backends (like Redis, Memcached) inherit from `BaseKeyValueStoreBackend`, which provides common logic for storing results using keys.
* **Storing Result (`BaseKeyValueStoreBackend._store_result` in `celery/backends/base.py`):** This method (called by the worker) is responsible for actually saving the result.
```python
# Simplified from backends/base.py (inside BaseKeyValueStoreBackend)
def _store_result(self, task_id, result, state,
traceback=None, request=None, **kwargs):
# 1. Prepare the metadata dictionary
meta = self._get_result_meta(result=result, state=state,
traceback=traceback, request=request)
meta['task_id'] = bytes_to_str(task_id) # Ensure task_id is str
# (Check if already successfully stored to prevent overwrites - omitted for brevity)
# 2. Encode the metadata (e.g., to JSON or pickle)
encoded_meta = self.encode(meta)
# 3. Get the specific key for this task
key = self.get_key_for_task(task_id) # e.g., b'celery-task-meta-<task_id>'
# 4. Call the specific backend's 'set' method (implemented by RedisBackend etc.)
# It might also set an expiry time (self.expires)
try:
self._set_with_state(key, encoded_meta, state) # Calls self.set(key, encoded_meta)
except Exception as exc:
# Handle potential storage errors, maybe retry
raise BackendStoreError(...) from exc
return result # Returns the original (unencoded) result
```
The `self.set()` method is implemented by the concrete backend (e.g., `RedisBackend.set` uses `redis-py` client's `setex` or `set` command).
* **Retrieving Result (`BaseBackend.wait_for` or `BaseKeyValueStoreBackend.get_many` in `celery/backends/base.py`):** When you call `AsyncResult.get()`, it often ends up calling `wait_for` or similar methods that poll the backend.
```python
# Simplified from backends/base.py (inside SyncBackendMixin)
def wait_for(self, task_id,
timeout=None, interval=0.5, no_ack=True, on_interval=None):
"""Wait for task and return its result meta."""
self._ensure_not_eager() # Check if running in eager mode
time_elapsed = 0.0
while True:
# 1. Get metadata from backend (calls self._get_task_meta_for)
meta = self.get_task_meta(task_id)
# 2. Check if the task is in a final state
if meta['status'] in states.READY_STATES:
return meta # Return the full metadata dict
# 3. Call interval callback if provided
if on_interval:
on_interval()
# 4. Sleep to avoid busy-waiting
time.sleep(interval)
time_elapsed += interval
# 5. Check for timeout
if timeout and time_elapsed >= timeout:
raise TimeoutError('The operation timed out.')
```
The `self.get_task_meta(task_id)` eventually calls `self._get_task_meta_for(task_id)`, which in `BaseKeyValueStoreBackend` uses `self.get(key)` (e.g., `RedisBackend.get` uses `redis-py` client's `GET` command) and then decodes the result using `self.decode_result`.
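The encode/decode step mentioned above can be illustrated with JSON serialization (a sketch; the actual serializer is configurable via the `result_serializer` setting):

```python
import json

# Illustrative sketch of the encode/decode round trip a key-value
# backend performs when storing and fetching result metadata.
meta = {"task_id": "t1", "status": "SUCCESS", "result": 30}

encoded = json.dumps(meta)     # what _store_result saves under the task's key
decoded = json.loads(encoded)  # what decode_result recovers on the client side

assert decoded == meta
print(decoded["result"])  # 30
```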
## Conclusion
You've learned about the crucial **Result Backend**:
* It acts as a **storage place** (like a filing cabinet or database) for task results and states.
* It's configured using the `result_backend` setting in your [Celery App](01_celery_app.md).
* The [Worker](05_worker.md) stores the outcome (success value or failure exception) in the backend after executing a [Task](03_task.md).
* You use the `AsyncResult` object (returned by `.delay()` or `.apply_async()`) and its methods (`.get()`, `.state`, `.ready()`) to query the backend using the task's unique ID.
* Various backend types exist (Redis, Database, RPC, etc.), each with different characteristics.
Result backends allow your application to track the progress and outcome of background work. But what if you want tasks to run automatically at specific times or on a regular schedule, like sending a report every morning? That's where Celery's scheduler comes in.
**Next:** [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md)
---
# Chapter 7: Beat (Scheduler) - Celery's Alarm Clock
In the last chapter, [Chapter 6: Result Backend](06_result_backend.md), we learned how to track the status and retrieve the results of our background tasks. This is great when we manually trigger tasks from our application. But what if we want tasks to run automatically, without us needing to press a button every time?
Maybe you need to:
* Send out a newsletter email every Friday morning.
* Clean up temporary files in your system every night.
* Check the health of your external services every 5 minutes.
How can you make Celery do these things on a regular schedule? Meet **Celery Beat**.
## What Problem Does Beat Solve?
Imagine you have a task, say `send_daily_report()`, that needs to run every morning at 8:00 AM. How would you achieve this? You could try setting up a system `cron` job to call a Python script that sends the Celery task, but that adds another layer of complexity.
Celery provides its own built-in solution: **Beat**.
**Beat is Celery's periodic task scheduler.** Think of it like a dedicated alarm clock or a `cron` job system built specifically for triggering Celery tasks. It's a separate program that you run alongside your workers. Its job is simple:
1. Read a list of scheduled tasks (e.g., "run `send_daily_report` every day at 8:00 AM").
2. Keep track of the time.
3. When the time comes for a scheduled task, Beat sends the task message to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), just as if you had called `.delay()` yourself.
4. A regular Celery [Worker](05_worker.md) then picks up the task from the broker and executes it.
Beat doesn't run the tasks itself; it just *schedules* them by sending the messages at the right time.
## Key Concepts
1. **Beat Process:** A separate Celery program you run (like `celery -A your_app beat`). It needs access to your Celery app's configuration.
2. **Schedule:** A configuration setting (usually `beat_schedule` in your Celery config) that defines which tasks should run and when. This schedule can use simple intervals (like every 30 seconds) or cron-like patterns (like "every Monday at 9 AM").
3. **Schedule Storage:** Beat needs to remember when each task was last run so it knows when it's due again. By default, it saves this information to a local file named `celerybeat-schedule` (using Python's `shelve` module).
4. **Ticker:** The heart of Beat. It's an internal loop that wakes up periodically, checks the schedule against the current time, and sends messages for any due tasks.
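The schedule-storage idea can be demonstrated with Python's `shelve` module directly, since that is what the default scheduler uses (a sketch only; Beat's actual `celerybeat-schedule` file stores richer entry objects, and the file path here is a temporary placeholder):

```python
import os
import shelve
import tempfile
from datetime import datetime, timezone

# Illustrative sketch: persisting "last run" times the way the default
# persistent scheduler does with its celerybeat-schedule shelve file.
path = os.path.join(tempfile.mkdtemp(), "celerybeat-schedule-demo")

with shelve.open(path) as store:
    store["add-every-15-seconds"] = datetime.now(timezone.utc)

# Reopening the file later (e.g. after a Beat restart) recovers the state:
with shelve.open(path) as store:
    last_run = store["add-every-15-seconds"]
    print("last run at:", last_run)
```

Because the state survives on disk, restarting Beat does not make it forget when tasks last ran.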
## How to Use Beat
Let's schedule two tasks:
* Our `add` task from [Chapter 3: Task](03_task.md) to run every 15 seconds.
* A new (dummy) task `send_report` to run every minute.
**1. Define the Schedule in Configuration**
The best place to define your schedule is in your configuration, either directly on the `app` object or in a separate `celeryconfig.py` file (see [Chapter 2: Configuration](02_configuration.md)). We'll use a separate file.
First, create the new task in your `tasks.py`:
```python
# tasks.py (add this new task)
from celery_app import app
import time
@app.task
def add(x, y):
"""A simple task that adds two numbers."""
print(f"Task 'add' starting with ({x}, {y})")
time.sleep(2) # Simulate short work
result = x + y
print(f"Task 'add' finished with result: {result}")
return result
@app.task
def send_report(name):
"""A task simulating sending a report."""
print(f"Task 'send_report' starting for report: {name}")
time.sleep(5) # Simulate longer work
print(f"Report '{name}' supposedly sent.")
return f"Report {name} sent."
```
Now, update or create `celeryconfig.py`:
```python
# celeryconfig.py
from datetime import timedelta
from celery.schedules import crontab
# Basic Broker/Backend settings (replace with your actual URLs)
broker_url = 'redis://localhost:6379/0'
result_backend = 'redis://localhost:6379/1'
timezone = 'UTC' # Or your preferred timezone, e.g., 'America/New_York'
enable_utc = True
# List of modules to import when the Celery worker starts.
# Make sure tasks.py is discoverable in your Python path
imports = ('tasks',)
# Define the Beat schedule
beat_schedule = {
# Executes tasks.add every 15 seconds with arguments (16, 16)
'add-every-15-seconds': {
'task': 'tasks.add', # The task name
'schedule': 15.0, # Run every 15 seconds (float or timedelta)
'args': (16, 16), # Positional arguments for the task
},
# Executes tasks.send_report every minute
'send-report-every-minute': {
'task': 'tasks.send_report',
'schedule': crontab(), # Use crontab() for "every minute"
'args': ('daily-summary',), # Argument for the report name
# Example using crontab for more specific timing:
# 'schedule': crontab(hour=8, minute=0, day_of_week='fri'), # Every Friday at 8:00 AM
},
}
```
**Explanation:**
* `from datetime import timedelta`: Used for simple interval schedules.
* `from celery.schedules import crontab`: Used for cron-like scheduling.
* `imports = ('tasks',)`: Ensures the worker and beat know about the tasks defined in `tasks.py`.
* `beat_schedule = {...}`: This dictionary holds all your scheduled tasks.
* Each key (`'add-every-15-seconds'`, `'send-report-every-minute'`) is a unique name for the schedule entry.
* Each value is another dictionary describing the schedule:
* `'task'`: The full name of the task to run (e.g., `'module_name.task_name'`).
* `'schedule'`: Defines *when* to run.
* A `float` or `int`: number of seconds between runs.
* A `timedelta` object: the time interval between runs.
* A `crontab` object: for complex schedules (minute, hour, day_of_week, etc.). `crontab()` with no arguments means "every minute".
* `'args'`: A tuple of positional arguments to pass to the task.
* `'kwargs'`: (Optional) A dictionary of keyword arguments to pass to the task.
* `'options'`: (Optional) A dictionary of execution options like `queue`, `priority`.
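The interval-style `'schedule'` check can be sketched in plain Python: given the last run time and the interval, a task is due once the interval has elapsed (illustrative only; Celery's `schedule.is_due` follows the same shape, returning whether the entry is due and the time until the next check):

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch of an interval schedule's is_due check.
def is_due(last_run_at, interval_seconds, now=None):
    """Return (due_now, seconds_until_next_check)."""
    now = now or datetime.now(timezone.utc)
    next_run = last_run_at + timedelta(seconds=interval_seconds)
    if now >= next_run:
        return True, interval_seconds       # due now; check again in one interval
    return False, (next_run - now).total_seconds()

now = datetime(2023, 10, 27, 11, 0, 20, tzinfo=timezone.utc)
last = datetime(2023, 10, 27, 11, 0, 0, tzinfo=timezone.utc)
print(is_due(last, 15.0, now))  # (True, 15.0) -- 20s have passed, so the 15s task is due
```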
**2. Load the Configuration in Your App**
Make sure your `celery_app.py` loads this configuration:
```python
# celery_app.py
from celery import Celery
# Create the app instance
app = Celery('tasks')
# Load configuration from the 'celeryconfig' module
app.config_from_object('celeryconfig')
# Tasks might be defined here, but we put them in tasks.py
# which is loaded via the 'imports' setting in celeryconfig.py
```
**3. Run Celery Beat**
Now, open a terminal and run the Beat process. You need to tell it where your app is (`-A celery_app`):
```bash
# In your terminal
celery -A celery_app beat --loglevel=info
```
**Explanation:**
* `celery`: The Celery command-line tool.
* `-A celery_app`: Points to your app instance (in `celery_app.py`).
* `beat`: Tells Celery to start the scheduler process.
* `--loglevel=info`: Shows informational messages about what Beat is doing.
You'll see output similar to this:
```text
celery beat v5.x.x is starting.
__ - ... __ - _
LocalTime -> 2023-10-27 11:00:00
Configuration ->
. broker -> redis://localhost:6379/0
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@INFO
. maxinterval -> 300.0s (5m0s)
celery beat v5.x.x has started.
```
Beat is now running! It will check the schedule and:
* Every 15 seconds, it will send a message to run `tasks.add(16, 16)`.
* Every minute, it will send a message to run `tasks.send_report('daily-summary')`.
**4. Run a Worker (Crucial!)**
Beat only *sends* the task messages. You still need a [Worker](05_worker.md) running to actually *execute* the tasks. Open **another terminal** and start a worker:
```bash
# In a SECOND terminal
celery -A celery_app worker --loglevel=info
```
Now, watch the output in the **worker's terminal**. You should see logs appearing periodically as the worker receives and executes the tasks sent by Beat:
```text
# Output in the WORKER terminal (example)
[2023-10-27 11:00:15,000: INFO/MainProcess] Task tasks.add[task-id-1] received
Task 'add' starting with (16, 16)
Task 'add' finished with result: 32
[2023-10-27 11:00:17,050: INFO/MainProcess] Task tasks.add[task-id-1] succeeded in 2.05s: 32
[2023-10-27 11:01:00,000: INFO/MainProcess] Task tasks.send_report[task-id-2] received
Task 'send_report' starting for report: daily-summary
[2023-10-27 11:01:00,000: INFO/MainProcess] Task tasks.add[task-id-3] received # Another 'add' task might arrive while 'send_report' runs
Task 'add' starting with (16, 16)
Task 'add' finished with result: 32
[2023-10-27 11:01:02,050: INFO/MainProcess] Task tasks.add[task-id-3] succeeded in 2.05s: 32
Report 'daily-summary' supposedly sent.
[2023-10-27 11:01:05,100: INFO/MainProcess] Task tasks.send_report[task-id-2] succeeded in 5.10s: "Report daily-summary sent."
... and so on ...
```
You have successfully set up scheduled tasks!
## How It Works Internally (Simplified)
1. **Startup:** You run `celery -A celery_app beat`. The Beat process starts.
2. **Load Config:** It loads the Celery app (`celery_app`) and reads its configuration, paying special attention to `beat_schedule`.
3. **Load State:** It opens the schedule file (e.g., `celerybeat-schedule`) to see when each task was last run. If the file doesn't exist, it creates it.
4. **Main Loop (Tick):** Beat enters its main loop (the "ticker").
5. **Calculate Due Tasks:** In each tick, Beat looks at every entry in `beat_schedule`. For each entry, it compares the current time with the task's `schedule` definition and its `last_run_at` time (from the schedule file). It calculates which tasks are due to run *right now*.
6. **Send Task Message:** If a task (e.g., `add-every-15-seconds`) is due, Beat constructs a task message (containing `'tasks.add'`, `args=(16, 16)`, etc.) just like `.delay()` would. It sends this message to the configured **Broker**.
7. **Update State:** Beat updates the `last_run_at` time for the task it just sent in its internal state and saves this back to the schedule file.
8. **Sleep:** Beat calculates the time until the *next* scheduled task is due and sleeps for that duration (or up to a maximum interval, `beat_max_loop_interval`, usually 5 minutes, whichever is shorter).
9. **Repeat:** Go back to step 5.
Meanwhile, a **Worker** process is connected to the same **Broker**, picks up the task messages sent by Beat, and executes them.
```mermaid
sequenceDiagram
participant Beat as Celery Beat Process
participant ScheduleCfg as beat_schedule Config
participant ScheduleDB as celerybeat-schedule File
participant Broker as Message Broker
participant Worker as Celery Worker
Beat->>ScheduleCfg: Load schedule definitions on startup
Beat->>ScheduleDB: Load last run times on startup
loop Tick Loop (e.g., every second or more)
Beat->>Beat: Check current time
Beat->>ScheduleCfg: Get definition for 'add-every-15'
Beat->>ScheduleDB: Get last run time for 'add-every-15'
Beat->>Beat: Calculate if 'add-every-15' is due now
alt Task 'add-every-15' is due
Beat->>Broker: Send task message('tasks.add', (16, 16))
Broker-->>Beat: Ack (Message Queued)
Beat->>ScheduleDB: Update last run time for 'add-every-15'
ScheduleDB-->>Beat: Ack (Saved)
end
Beat->>Beat: Calculate time until next task is due
Beat->>Beat: Sleep until next check
end
Worker->>Broker: Fetch task message ('tasks.add', ...)
Broker-->>Worker: Deliver message
Worker->>Worker: Execute task add(16, 16)
Worker->>Broker: Ack message complete
```
## Code Dive: Where Beat Lives
* **Command Line (`celery/bin/beat.py`):** Handles the `celery beat` command, parses arguments (`-A`, `-s`, `-S`, `--loglevel`), and creates/runs the `Beat` service object.
* **Beat Service Runner (`celery/apps/beat.py`):** The `Beat` class sets up the environment, loads the app, initializes logging, creates the actual scheduler service (`celery.beat.Service`), installs signal handlers, and starts the service.
* **Beat Service (`celery/beat.py:Service`):** This class manages the lifecycle of the scheduler. Its `start()` method contains the main loop that repeatedly calls `scheduler.tick()`. It loads the scheduler class specified in the configuration (defaulting to `PersistentScheduler`).
* **Scheduler (`celery/beat.py:Scheduler` / `PersistentScheduler`):** This is the core logic.
* `Scheduler` is the base class. Its `tick()` method calculates the time until the next event, finds due tasks, calls `apply_entry` for due tasks, and returns the sleep interval.
* `PersistentScheduler` inherits from `Scheduler` and adds the logic to load/save the schedule state (last run times) using `shelve` (the `celerybeat-schedule` file). It overrides methods like `setup_schedule`, `sync`, `close`, and `schedule` property to interact with the `shelve` store (`self._store`).
* **Schedule Types (`celery/schedules.py`):** Defines classes like `schedule` (for `timedelta` intervals) and `crontab`. These classes implement the `is_due(last_run_at)` method, which the `Scheduler.tick()` method uses to determine if a task entry should run.
A simplified conceptual look at the `beat_schedule` config structure:
```python
# Example structure from celeryconfig.py
beat_schedule = {
'schedule-name-1': { # Unique name for this entry
'task': 'my_app.tasks.task1', # Task to run (module.task_name)
'schedule': 30.0, # When to run (e.g., seconds, timedelta, crontab)
'args': (arg1, arg2), # Optional: Positional arguments
'kwargs': {'key': 'value'}, # Optional: Keyword arguments
'options': {'queue': 'hipri'},# Optional: Execution options
},
'schedule-name-2': {
'task': 'my_app.tasks.task2',
'schedule': crontab(minute=0, hour=0), # e.g., Run at midnight
# ... other options ...
},
}
```
And a very simplified concept of the `Scheduler.tick()` method:
```python
# Simplified conceptual logic of Scheduler.tick()
def tick(self):
remaining_times = []
due_tasks = []
# 1. Iterate through schedule entries
for entry in self.schedule.values(): # self.schedule reads from PersistentScheduler._store['entries']
# 2. Check if entry is due using its schedule object (e.g., crontab)
is_due, next_time_to_run = entry.is_due() # Calls schedule.is_due(entry.last_run_at)
if is_due:
due_tasks.append(entry)
else:
remaining_times.append(next_time_to_run) # Store time until next check
# 3. Apply due tasks (send message to broker)
for entry in due_tasks:
self.apply_entry(entry) # Sends task message and updates entry's last_run_at in schedule store
# 4. Calculate minimum sleep time until next event
return min(remaining_times + [self.max_interval])
```
## Conclusion
Celery Beat is your tool for automating task execution within the Celery ecosystem.
* It acts as a **scheduler**, like an alarm clock or `cron` for Celery tasks.
* It runs as a **separate process** (`celery beat`).
* You define the schedule using the `beat_schedule` setting in your configuration, specifying **what** tasks run, **when** (using intervals or crontabs), and with what **arguments**.
* Beat **sends task messages** to the broker at the scheduled times.
* Running **Workers** are still required to pick up and execute these tasks.
Beat allows you to reliably automate recurring background jobs, from simple periodic checks to complex, time-specific operations.
Now that we know how to run individual tasks, get their results, and schedule them automatically, what if we want to create more complex workflows involving multiple tasks that depend on each other? That's where Celery's Canvas comes in.
**Next:** [Chapter 8: Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md)
---
# Chapter 8: Canvas (Signatures & Primitives) - Building Task Workflows
In the previous chapter, [Chapter 7: Beat (Scheduler)](07_beat__scheduler_.md), we learned how to schedule tasks to run automatically at specific times using Celery Beat. This is great for recurring jobs. But what if you need to run a sequence of tasks, where one task depends on the result of another? Or run multiple tasks in parallel and then collect their results?
Imagine you're building a feature where a user uploads an article, and you need to:
1. Fetch the article content from a URL.
2. Process the text to extract keywords.
3. Process the text to detect the language.
4. Once *both* processing steps are done, save the article and the extracted metadata to your database.
Simply running these tasks independently won't work. Keyword extraction and language detection can happen at the same time, but only *after* the content is fetched. Saving can only happen *after* both processing steps are complete. How do you orchestrate this multi-step workflow?
This is where **Celery Canvas** comes in. It provides the building blocks to design complex task workflows.
## What Problem Does Canvas Solve?
Canvas helps you connect individual [Task](03_task.md)s together to form more sophisticated processes. It solves the problem of defining dependencies and flow control between tasks. Instead of just firing off tasks one by one and hoping they complete in the right order or manually checking results, Canvas lets you declare the desired workflow structure directly.
Think of it like having different types of Lego bricks:
* Some bricks represent a single task.
* Other bricks let you connect tasks end-to-end (run in sequence).
* Some let you stack bricks side-by-side (run in parallel).
* Others let you build a structure where several parallel steps must finish before the next piece is added.
Canvas gives you these connecting bricks for your Celery tasks.
## Key Concepts: Signatures and Primitives
The core ideas in Canvas are **Signatures** and **Workflow Primitives**.
1. **Signature (`signature` or `.s()`): The Basic Building Block**
* A `Signature` wraps up everything needed to call a single task: the task's name, the arguments (`args`), the keyword arguments (`kwargs`), and any execution options (like `countdown`, `eta`, queue name).
* Think of it as a **pre-filled request form** or a **recipe card** for a specific task execution. It doesn't *run* the task immediately; it just holds the plan for running it.
* The easiest way to create a signature is using the `.s()` shortcut on a task function.
```python
# tasks.py
from celery_app import app # Assuming app is defined in celery_app.py
@app.task
def add(x, y):
return x + y
# Create a signature for add(2, 3)
add_sig = add.s(2, 3)
# add_sig now holds the 'plan' to run add(2, 3)
print(f"Signature: {add_sig}")
print(f"Task name: {add_sig.task}")
print(f"Arguments: {add_sig.args}")
# To actually run it, you call .delay() or .apply_async() ON the signature
# result_promise = add_sig.delay()
```
**Output:**
```text
Signature: tasks.add(2, 3)
Task name: tasks.add
Arguments: (2, 3)
```
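Under the hood, a signature is little more than a record of the task plus its stored arguments, with any call-time arguments merged in. Here is a minimal pure-Python sketch of that idea (a hypothetical `MiniSignature` class for illustration only, not Celery's real class):

```python
# A minimal sketch of the idea behind a Celery Signature
# (illustration only -- Celery's real class lives in celery/canvas.py).

class MiniSignature:
    def __init__(self, func, *args, **kwargs):
        self.func = func        # the task to call
        self.args = args        # pre-filled positional arguments
        self.kwargs = kwargs    # pre-filled keyword arguments

    def apply(self, *extra_args, **extra_kwargs):
        # Call-time arguments are prepended, similar to how a chain
        # passes the previous task's result to a partial signature.
        merged_args = extra_args + self.args
        merged_kwargs = {**self.kwargs, **extra_kwargs}
        return self.func(*merged_args, **merged_kwargs)

def add(x, y):
    return x + y

full_sig = MiniSignature(add, 2, 3)     # like add.s(2, 3)
partial_sig = MiniSignature(add, 4)     # like add.s(4): still needs one arg

print(full_sig.apply())       # 5
print(partial_sig.apply(4))   # 8: the call-time 4 is prepended
```

This "partial" behavior is what makes signatures composable: a chain can hand the previous result to a signature that was created with only some of its arguments.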
2. **Primitives: Connecting the Blocks**
Canvas provides several functions (primitives) to combine signatures into workflows:
* **`chain`:** Links tasks sequentially. The result of the first task is passed as the first argument to the second task, and so on.
* Analogy: An assembly line where each station passes its output to the next.
* Syntax: `(sig1 | sig2 | sig3)` or `chain(sig1, sig2, sig3)`
* **`group`:** Runs a list of tasks in parallel. It returns a special result object that helps track the group.
* Analogy: Hiring several workers to do similar jobs independently at the same time.
* Syntax: `group(sig1, sig2, sig3)`
* **`chord`:** Runs a group of tasks in parallel (the "header"), and *then*, once *all* tasks in the group have finished successfully, it runs a single callback task (the "body") with the results of the header tasks.
* Analogy: A team of researchers works on different parts of a project in parallel. Once everyone is done, a lead researcher collects all the findings to write the final report.
* Syntax: `chord(group(header_sigs), body_sig)`
There are other primitives like `chunks`, `xmap`, and `starmap`, but `chain`, `group`, and `chord` are the most fundamental ones for building workflows.
## How to Use Canvas: Building the Article Processing Workflow
Let's build the workflow we described earlier: Fetch -> (Process Keywords & Detect Language in parallel) -> Save.
**1. Define the Tasks**
First, we need our basic tasks. Let's create dummy versions in `tasks.py`:
```python
# tasks.py
from celery_app import app
import time
import random
@app.task
def fetch_data(url):
print(f"Fetching data from {url}...")
time.sleep(1)
# Simulate fetching some data
data = f"Content from {url} - {random.randint(1, 100)}"
print(f"Fetched: {data}")
return data
@app.task
def process_part_a(data):
print(f"Processing Part A for: {data}")
time.sleep(2)
result_a = f"Keywords for '{data}'"
print("Part A finished.")
return result_a
@app.task
def process_part_b(data):
print(f"Processing Part B for: {data}")
time.sleep(3) # Simulate slightly longer processing
result_b = f"Language for '{data}'"
print("Part B finished.")
return result_b
@app.task
def combine_results(results):
# 'results' will be a list containing the return values
# of process_part_a and process_part_b
print(f"Combining results: {results}")
time.sleep(1)
final_output = f"Combined: {results[0]} | {results[1]}"
print(f"Final Output: {final_output}")
return final_output
```
**2. Define the Workflow Using Canvas**
Now, in a separate script or Python shell, let's define the workflow using signatures and primitives.
```python
# run_workflow.py
from celery import chain, group, chord
from tasks import fetch_data, process_part_a, process_part_b, combine_results
# The URL we want to process
article_url = "http://example.com/article1"
# Create the workflow structure
# 1. Fetch data. The result (data) is passed to the next step.
# 2. The next step is a chord:
# - Header: A group running process_part_a and process_part_b in parallel.
# Both tasks receive the 'data' from fetch_data.
# - Body: combine_results receives a list of results from the group.
workflow = chain(
fetch_data.s(article_url), # Step 1: Fetch
chord( # Step 2: Chord
group(process_part_a.s(), process_part_b.s()), # Header: Parallel processing
combine_results.s() # Body: Combine results
)
)
print(f"Workflow definition:\n{workflow}")
# Start the workflow
print("\nSending workflow to Celery...")
result_promise = workflow.apply_async()
print(f"Workflow sent! Final result ID: {result_promise.id}")
print("Run a Celery worker to execute the tasks.")
# You can optionally wait for the final result:
# final_result = result_promise.get()
# print(f"\nWorkflow finished! Final result: {final_result}")
```
**Explanation:**
* We import `chain`, `group`, `chord` from `celery`.
* We import our task functions.
* `fetch_data.s(article_url)`: Creates a signature for the first step.
* `process_part_a.s()` and `process_part_b.s()`: Create signatures for the parallel tasks. Note that we *don't* provide the `data` argument here. `chain` automatically passes the result of `fetch_data` to the *next* task in the sequence. Since the next task is a `chord` containing a `group`, Celery cleverly passes the `data` to *each* task within that group.
* `combine_results.s()`: Creates the signature for the final step (the chord's body). It doesn't need arguments initially because the `chord` will automatically pass the list of results from the header group to it.
* `chain(...)`: Connects `fetch_data` to the `chord`.
* `chord(group(...), ...)`: Defines that the group must finish before `combine_results` is called.
* `group(...)`: Defines that `process_part_a` and `process_part_b` run in parallel.
* `workflow.apply_async()`: This sends the *first* task (`fetch_data`) to the broker. The rest of the workflow is encoded in the task's options (like `link` or `chord` information) so that Celery knows what to do next after each step completes.
If you run this script (and have a [Worker](05_worker.md) running), you'll see the tasks execute in the worker logs, respecting the defined dependencies and parallelism. `fetch_data` runs first, then `process_part_a` and `process_part_b` run concurrently, and finally `combine_results` runs after both A and B are done.
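To make the data flow concrete, here is the same fetch, then parallel process, then combine structure sketched locally with `concurrent.futures`. This illustrates only the dependency structure; Celery runs the real thing across distributed workers via the broker:

```python
# The same fetch -> (parallel A, B) -> combine structure, run locally
# with threads instead of Celery workers (data-flow illustration only).
from concurrent.futures import ThreadPoolExecutor

def fetch_data(url):
    return f"Content from {url}"

def process_part_a(data):
    return f"Keywords for '{data}'"

def process_part_b(data):
    return f"Language for '{data}'"

def combine_results(results):
    return f"Combined: {results[0]} | {results[1]}"

with ThreadPoolExecutor() as pool:
    data = fetch_data("http://example.com/article1")   # step 1: sequential
    fut_a = pool.submit(process_part_a, data)          # step 2: parallel "group"
    fut_b = pool.submit(process_part_b, data)
    results = [fut_a.result(), fut_b.result()]         # wait for both, like a chord header
    print(combine_results(results))                    # step 3: the chord "body"
```

The `chain`/`chord`/`group` workflow expresses exactly this structure, but lets Celery handle the waiting and result passing for you, across processes and machines.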
## How It Works Internally (Simplified Walkthrough)
Let's trace a simpler workflow: `my_chain = (add.s(2, 2) | add.s(4))`
1. **Workflow Definition:** When you create `my_chain`, Celery creates a `chain` object containing the signatures `add.s(2, 2)` and `add.s(4)`.
2. **Sending (`my_chain.apply_async()`):**
* Celery looks at the first task in the chain: `add.s(2, 2)`.
* It prepares to send this task message to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md).
* Crucially, it adds a special option to the message, often called `link` (or uses the `chain` field in newer protocols). This option contains the *signature* of the next task in the chain: `add.s(4)`.
* The message for `add(2, 2)` (with the link to `add(4)`) is sent to the broker.
3. **Worker 1 Executes First Task:**
* A [Worker](05_worker.md) picks up the message for `add(2, 2)`.
* It runs the `add` function with arguments `(2, 2)`. The result is `4`.
* The worker stores the result `4` in the [Result Backend](06_result_backend.md) (if configured).
* The worker notices the `link` option in the original message, pointing to `add.s(4)`.
4. **Worker 1 Sends Second Task:**
* The worker takes the result of the first task (`4`).
* It uses the linked signature `add.s(4)`.
* It *prepends* the result (`4`) to the arguments of the linked signature, turning `add.s(4)` into an effective call of `add(4, 4)`. The first `4` is the result of the previous task; the second `4` was stored in the signature when the chain was defined.
* It sends a *new* message to the broker for `add(4, 4)`.
5. **Worker 2 Executes Second Task:**
* Another (or the same) worker picks up the message for `add(4, 4)`.
* It runs `add(4, 4)`. The result is `8`.
* It stores the result `8` in the backend.
* There are no more links, so the chain is complete.
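The result-prepending in steps 4 and 5 can be simulated in a few lines of plain Python (a sketch of the data flow only, not Celery's actual code):

```python
# Simulating how a worker follows a chain: each task's result is
# prepended to the next signature's stored arguments (sketch only).
def add(x, y):
    return x + y

# A chain like (add.s(2, 2) | add.s(4)), as (func, stored_args) pairs:
chain = [(add, (2, 2)), (add, (4,))]

result = None
for func, stored_args in chain:
    if result is None:
        args = stored_args              # first task: just its own args
    else:
        args = (result,) + stored_args  # later tasks: prepend the previous result
    result = func(*args)

print(result)  # 8: add(2, 2) -> 4, then add(4, 4) -> 8
```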
`group` works by sending all task messages in the group concurrently. `chord` is more complex; it involves the workers coordinating via the [Result Backend](06_result_backend.md) to count completed tasks in the header before the callback task is finally sent.
```mermaid
sequenceDiagram
participant Client as Your Code
participant Canvas as workflow = chain(...)
participant Broker as Message Broker
participant Worker as Celery Worker
Client->>Canvas: workflow.apply_async()
Note over Canvas: Prepare msg for add(2, 2) with link=add.s(4)
Canvas->>Broker: Send Task 1 msg ('add', (2, 2), link=add.s(4), id=T1)
Broker-->>Canvas: Ack
Canvas-->>Client: Return AsyncResult(id=T2) # ID of the *last* task in chain
Worker->>Broker: Fetch msg (T1)
Broker-->>Worker: Deliver Task 1 msg
Worker->>Worker: Execute add(2, 2) -> returns 4
Note over Worker: Store result 4 for T1 in Backend
Worker->>Worker: Check 'link' option -> add.s(4)
Note over Worker: Prepare msg for add(4, 4) using result 4 + linked args
Worker->>Broker: Send Task 2 msg ('add', (4, 4), id=T2)
Broker-->>Worker: Ack
Worker->>Broker: Ack Task 1 msg complete
Worker->>Broker: Fetch msg (T2)
Broker-->>Worker: Deliver Task 2 msg
Worker->>Worker: Execute add(4, 4) -> returns 8
Note over Worker: Store result 8 for T2 in Backend
Worker->>Broker: Ack Task 2 msg complete
```
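As noted above, a `chord` relies on the result backend to count finished header tasks before sending the body. The bookkeeping boils down to counting: each finished header task deposits its result, and when the count reaches the header size, the body runs with all results. A minimal in-memory sketch of that idea (a hypothetical `MiniChord` class, not Celery's implementation):

```python
# In-memory sketch of chord coordination. In real Celery this logic
# lives in the result backend (e.g. backend.apply_chord).

class MiniChord:
    def __init__(self, header_size, body):
        self.header_size = header_size
        self.body = body
        self.results = []
        self.body_result = None

    def on_task_done(self, result):
        # Called once per finished header task; fires the body
        # only when every header task has reported in.
        self.results.append(result)
        if len(self.results) == self.header_size:
            self.body_result = self.body(self.results)

def combine(results):
    return sum(results)

chord_state = MiniChord(header_size=3, body=combine)
for value in [1, 2, 3]:          # header tasks finishing in some order
    chord_state.on_task_done(value)

print(chord_state.body_result)   # 6
```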
## Code Dive: Canvas Implementation
The logic for signatures and primitives resides primarily in `celery/canvas.py`.
* **`Signature` Class:**
* Defined in `celery/canvas.py`. It's essentially a dictionary subclass holding `task`, `args`, `kwargs`, `options`, etc.
* The `.s()` method on a `Task` instance (in `celery/app/task.py`) is a shortcut to create a `Signature`.
* `apply_async`: Prepares arguments/options by calling `_merge` and then delegates to `self.type.apply_async` (the task's method) or `app.send_task`.
* `link`, `link_error`: Methods that modify the `options` dictionary to add callbacks.
* `__or__`: The pipe operator (`|`) overload. It checks the type of the right-hand operand (`other`) and constructs a `_chain` object accordingly.
```python
# Simplified from celery/canvas.py
class Signature(dict):
# ... methods like __init__, clone, set, apply_async ...
def link(self, callback):
# Appends callback signature to the 'link' list in options
return self.append_to_list_option('link', callback)
def link_error(self, errback):
# Appends errback signature to the 'link_error' list in options
return self.append_to_list_option('link_error', errback)
def __or__(self, other):
# Called when you use the pipe '|' operator
if isinstance(other, Signature):
# task | task -> chain
return _chain(self, other, app=self._app)
# ... other cases for group, chain ...
return NotImplemented
```
* **`_chain` Class:**
* Also in `celery/canvas.py`, inherits from `Signature`. Its `task` name is hardcoded to `'celery.chain'`. The actual task signatures are stored in `kwargs['tasks']`.
* `apply_async` / `run`: Contains the logic to handle sending the first task with the rest of the chain embedded in the options (either via `link` for protocol 1 or the `chain` message property for protocol 2).
* `prepare_steps`: This complex method recursively unwraps nested primitives (like a chain within a chain, or a group that needs to become a chord) and sets up the linking between steps.
```python
# Simplified concept from celery/canvas.py (chain execution)
class _chain(Signature):
# ... __init__, __or__ ...
def apply_async(self, args=None, kwargs=None, **options):
# ... handle always_eager ...
return self.run(args, kwargs, app=self.app, **options)
def run(self, args=None, kwargs=None, app=None, **options):
# ... setup ...
tasks, results = self.prepare_steps(...) # Unroll and freeze tasks
if results: # If there are tasks to run
first_task = tasks.pop() # Get the first task (list is reversed)
remaining_chain = tasks if tasks else None
# Determine how to pass the chain info (link vs. message field)
use_link = self._use_link # ... logic to decide ...
if use_link:
# Protocol 1: Link first task to the second task
if remaining_chain:
first_task.link(remaining_chain.pop())
# (Worker handles subsequent links)
options_to_apply = options # Pass original options
else:
# Protocol 2: Embed the rest of the reversed chain in options
options_to_apply = ChainMap({'chain': remaining_chain}, options)
# Send the *first* task only
result_from_apply = first_task.apply_async(**options_to_apply)
# Return AsyncResult of the *last* task in the original chain
return results[0]
```
* **`group` Class:**
* In `celery/canvas.py`. Its `task` name is `'celery.group'`.
* `apply_async`: Iterates through its `tasks`, freezes each one (assigning a common `group_id`), sends their messages, and collects the `AsyncResult` objects into a `GroupResult`. It uses a `barrier` (from the `vine` library) to track completion.
* **`chord` Class:**
* In `celery/canvas.py`. Its `task` name is `'celery.chord'`.
* `apply_async` / `run`: Coordinates with the result backend (`backend.apply_chord`). It typically runs the header `group` first, configuring it to notify the backend upon completion. The backend then triggers the `body` task once the count is reached.
## Conclusion
Celery Canvas transforms simple tasks into powerful workflow components.
* A **Signature** (`task.s()`) captures the details for a single task call without running it.
* Primitives like **`chain`** (`|`), **`group`**, and **`chord`** combine signatures to define complex execution flows:
* `chain`: Sequence (output of one to input of next).
* `group`: Parallel execution.
* `chord`: Parallel execution followed by a callback with all results.
* You compose these primitives like building with Lego bricks to model your application's logic.
* Calling `.apply_async()` on a workflow primitive starts the process by sending the first task(s), embedding the rest of the workflow logic in the task options or using backend coordination.
Canvas allows you to move complex orchestration logic out of your application code and into Celery, making your tasks more modular and your overall system more robust.
Now that you can build and run complex workflows, how do you monitor what's happening inside Celery? How do you know when tasks start, finish, or fail in real-time?
**Next:** [Chapter 9: Events](09_events.md)
---
# Chapter 9: Events - Listening to Celery's Heartbeat
In [Chapter 8: Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md), we saw how to build complex workflows by chaining tasks together or running them in parallel. But as your Celery system gets busier, you might wonder: "What are my workers doing *right now*? Which tasks have started? Which ones finished successfully or failed?"
Imagine you're running an important data processing job involving many tasks. Wouldn't it be great to have a live dashboard showing the progress, or get immediate notifications if something goes wrong? This is where **Celery Events** come in.
## What Problem Do Events Solve?
Celery Events provide a **real-time monitoring system** for your tasks and workers. Think of it like a live activity log or a notification system built into Celery.
Without events, finding out what happened requires checking logs or querying the [Result Backend](06_result_backend.md) for each task individually. This isn't ideal for getting a live overview of the entire cluster.
Events solve this by having workers broadcast messages (events) about important actions they take, such as:
* A worker coming online or going offline.
* A worker receiving a task.
* A worker starting to execute a task.
* A task succeeding or failing.
* A worker sending out a heartbeat signal.
Other programs can then listen to this stream of event messages to monitor the health and activity of the Celery cluster in real-time, build dashboards (like the popular tool Flower), or trigger custom alerts.
## Key Concepts
1. **Events:** Special messages sent by workers (and sometimes clients) describing an action. Each event has a `type` (e.g., `task-received`, `worker-online`) and contains details relevant to that action (like the task ID, worker hostname, timestamp).
2. **Event Exchange:** Events aren't sent to the regular task queues. They are published to a dedicated, named exchange on the [Broker Connection (AMQP)](04_broker_connection__amqp_.md). Think of it as a separate broadcast channel just for monitoring messages.
3. **Event Sender (`EventDispatcher`):** A component within the [Worker](05_worker.md) responsible for creating and sending event messages to the broker's event exchange. This is usually disabled by default for performance reasons.
4. **Event Listener (`EventReceiver`):** Any program that connects to the event exchange on the broker and consumes the stream of event messages. This could be the `celery events` command-line tool, Flower, or your own custom monitoring script.
5. **Event Types:** Celery defines many event types. Some common ones include:
* `worker-online`, `worker-offline`, `worker-heartbeat`: Worker status updates.
* `task-sent`: Client sent a task request (requires `task_send_sent_event` setting).
* `task-received`: Worker received the task message.
* `task-started`: Worker started executing the task code.
* `task-succeeded`: Task finished successfully.
* `task-failed`: Task failed with an error.
* `task-retried`: Task is being retried.
* `task-revoked`: Task was cancelled/revoked.
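Concretely, an event is just a small dictionary of fields plus a routing key derived from its type (Celery turns the dashes into dots when routing). A sketch of such a payload, with example field values:

```python
# Sketch of the payload a worker publishes for a task event
# (field names follow the `celery events` output shown below;
# hostname and other values here are examples).
import time

def make_event(event_type, **fields):
    event = {
        "type": event_type,
        "hostname": "celery@myhostname",
        "timestamp": time.time(),
        **fields,
    }
    # Events are routed on the broker by type, with dashes as dots:
    routing_key = event_type.replace("-", ".")
    return event, routing_key

event, key = make_event("task-succeeded",
                        uuid="a1b2c3d4-...", result="15", runtime=3.05)
print(key)             # task.succeeded
print(event["type"])   # task-succeeded
```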
## How to Use Events: Simple Monitoring
Let's see how to enable events and watch the live stream using Celery's built-in tool.
**1. Enable Events in the Worker**
By default, workers don't send events to save resources. You need to explicitly tell them to start sending. You can do this in two main ways:
* **Command-line flag (`-E`):** When starting your worker, add the `-E` flag.
```bash
# Start a worker AND enable sending events
celery -A celery_app worker --loglevel=info -E
```
* **Configuration Setting:** Set `worker_send_task_events = True` in your Celery configuration ([Chapter 2: Configuration](02_configuration.md)). This is useful if you always want events enabled for workers using that configuration. You can also enable worker-specific events (`worker-online`, `worker-heartbeat`) with `worker_send_worker_events = True` (which defaults to True).
```python
# celeryconfig.py (example)
broker_url = 'redis://localhost:6379/0'
result_backend = 'redis://localhost:6379/1'
imports = ('tasks',)
# Enable sending task-related events
task_send_sent_event = False  # Optional: set to True to also emit task-sent events
worker_send_task_events = True
worker_send_worker_events = True # Usually True by default
```
Now, any worker started with this configuration (or the `-E` flag) will publish events to the broker.
**2. Watch the Event Stream**
Celery provides a command-line tool called `celery events` that acts as a simple event listener and prints the events it receives to your console.
Open **another terminal** (while your worker with events enabled is running) and run:
```bash
# Watch for events associated with your app
celery -A celery_app events
```
Alternatively, you can use the remote control commands `celery -A celery_app control enable_events` to tell already-running workers to start sending events, and `celery -A celery_app control disable_events` to stop them.
**What You'll See:**
Initially, `celery events` might show nothing. Now, try sending a task from another script or shell (like the `run_tasks.py` from [Chapter 3: Task](03_task.md)):
```python
# In a third terminal/shell
from tasks import add
result = add.delay(5, 10)
print(f"Sent task {result.id}")
```
Switch back to the terminal running `celery events`. You should see output similar to this (details and timestamps will vary):
```text
-> celery events v5.x.x
-> connected to redis://localhost:6379/0
-------------- task-received celery@myhostname [2023-10-27 12:00:01.100]
uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef
name:tasks.add
args:[5, 10]
kwargs:{}
retries:0
eta:null
hostname:celery@myhostname
timestamp:1666872001.1
pid:12345
...
-------------- task-started celery@myhostname [2023-10-27 12:00:01.150]
uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef
hostname:celery@myhostname
timestamp:1666872001.15
pid:12345
...
-------------- task-succeeded celery@myhostname [2023-10-27 12:00:04.200]
uuid:a1b2c3d4-e5f6-7890-1234-567890abcdef
result:'15'
runtime:3.05
hostname:celery@myhostname
timestamp:1666872004.2
pid:12345
...
```
**Explanation:**
* `celery events` connects to the broker defined in `celery_app`.
* It listens for messages on the event exchange.
* As the worker processes the `add(5, 10)` task, it sends `task-received`, `task-started`, and `task-succeeded` events.
* `celery events` receives these messages and prints their details.
This gives you a raw, real-time feed of what's happening in your Celery cluster!
**Flower: A Visual Monitor**
While `celery events` is useful, it's quite basic. A very popular tool called **Flower** uses the same event stream to provide a web-based dashboard for monitoring your Celery cluster. It shows running tasks, completed tasks, worker status, task details, and more, all updated in real-time thanks to Celery Events. You can typically install it (`pip install flower`) and run it (`celery -A celery_app flower`).
## How It Works Internally (Simplified)
1. **Worker Action:** A worker performs an action (e.g., starts executing task `T1`).
2. **Event Dispatch:** If events are enabled, the worker's internal `EventDispatcher` component is notified.
3. **Create Event Message:** The `EventDispatcher` creates a dictionary representing the event (e.g., `{'type': 'task-started', 'uuid': 'T1', 'hostname': 'worker1', ...}`).
4. **Publish to Broker:** The `EventDispatcher` uses its connection to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md) to publish this event message to a specific **event exchange** (usually named `celeryev`). It uses a routing key based on the event type (e.g., `task.started`).
5. **Listener Connects:** A monitoring tool (like `celery events` or Flower) starts up. It creates an `EventReceiver`.
6. **Declare Queue:** The `EventReceiver` connects to the same broker and declares a temporary, unique queue bound to the event exchange (`celeryev`), often configured to receive all event types (`#` routing key).
7. **Consume Events:** The `EventReceiver` starts consuming messages from its dedicated queue.
8. **Process Event:** When an event message (like the `task-started` message for `T1`) arrives from the broker, the `EventReceiver` decodes it and passes it to a handler (e.g., `celery events` prints it, Flower updates its web UI).
```mermaid
sequenceDiagram
participant Worker
participant Dispatcher as EventDispatcher (in Worker)
participant Broker as Message Broker
participant Receiver as EventReceiver (e.g., celery events tool)
participant Display as Console/UI
Worker->>Worker: Starts executing Task T1
Worker->>Dispatcher: Notify: Task T1 started
Dispatcher->>Dispatcher: Create event message {'type': 'task-started', ...}
Dispatcher->>Broker: Publish event msg to 'celeryev' exchange (routing_key='task.started')
Broker-->>Dispatcher: Ack (Message Sent)
Receiver->>Broker: Connect and declare unique queue bound to 'celeryev' exchange
Broker-->>Receiver: Queue ready
Broker->>Receiver: Deliver event message {'type': 'task-started', ...}
Receiver->>Receiver: Decode message
Receiver->>Display: Process event (e.g., print to console)
```
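The dispatch step at the end of this sequence amounts to looking up a handler function by event type, with `'*'` as a catch-all. A runnable sketch of that dispatch (the handler functions here are hypothetical):

```python
# Sketch of EventReceiver-style dispatch: one handler per event type,
# with '*' as a catch-all for anything unhandled.
seen = []

handlers = {
    "task-succeeded": lambda ev: seen.append(f"OK {ev['uuid']}"),
    "task-failed":    lambda ev: seen.append(f"FAIL {ev['uuid']}"),
    "*":              lambda ev: seen.append(f"other: {ev['type']}"),
}

def process(event):
    # Look up a specific handler first, then fall back to '*'.
    handler = handlers.get(event["type"]) or handlers.get("*")
    if handler:
        handler(event)

process({"type": "task-succeeded", "uuid": "T1"})
process({"type": "worker-heartbeat"})
print(seen)  # ['OK T1', 'other: worker-heartbeat']
```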
## Code Dive: Sending and Receiving Events
* **Enabling Events (`celery/worker/consumer/events.py`):** The `Events` bootstep in the worker process is responsible for initializing the `EventDispatcher`. The `-E` flag or configuration settings control whether this bootstep actually enables the dispatcher.
```python
# Simplified from worker/consumer/events.py
class Events(bootsteps.StartStopStep):
requires = (Connection,)
def __init__(self, c, task_events=True, # Controlled by config/flags
# ... other flags ...
**kwargs):
self.send_events = task_events # or other flags
self.enabled = self.send_events
# ...
super().__init__(c, **kwargs)
def start(self, c):
# ... gets connection ...
# Creates the actual dispatcher instance
dis = c.event_dispatcher = c.app.events.Dispatcher(
c.connection_for_write(),
hostname=c.hostname,
enabled=self.send_events, # Only sends if enabled
# ... other options ...
)
# ... flush buffer ...
```
* **Sending Events (`celery/events/dispatcher.py`):** The `EventDispatcher` class has the `send` method, which creates the event dictionary and calls `publish`.
```python
# Simplified from events/dispatcher.py
class EventDispatcher:
# ... __init__ setup ...
def send(self, type, blind=False, ..., **fields):
if self.enabled:
groups, group = self.groups, group_from(type)
if groups and group not in groups:
return # Don't send if this group isn't enabled
# ... potential buffering logic (omitted) ...
# Call publish to actually send
return self.publish(type, fields, self.producer, blind=blind,
Event=Event, ...)
def publish(self, type, fields, producer, blind=False, Event=Event, **kwargs):
# Create the event dictionary
clock = None if blind else self.clock.forward()
event = Event(type, hostname=self.hostname, utcoffset=utcoffset(),
pid=self.pid, clock=clock, **fields)
# Publish using the underlying Kombu producer
with self.mutex:
return self._publish(event, producer,
routing_key=type.replace('-', '.'), **kwargs)
def _publish(self, event, producer, routing_key, **kwargs):
exchange = self.exchange # The dedicated event exchange
try:
# Kombu's publish method sends the message
producer.publish(
event, # The dictionary payload
routing_key=routing_key,
exchange=exchange.name,
declare=[exchange], # Ensure exchange exists
serializer=self.serializer, # e.g., 'json'
headers=self.headers,
delivery_mode=self.delivery_mode, # e.g., transient
**kwargs
)
except Exception as exc:
# ... error handling / buffering ...
raise
```
* **Receiving Events (`celery/events/receiver.py`):** The `EventReceiver` class (used by tools like `celery events`) sets up a consumer to listen for messages on the event exchange.
```python
# Simplified from events/receiver.py
class EventReceiver(ConsumerMixin): # Uses Kombu's ConsumerMixin
def __init__(self, channel, handlers=None, routing_key='#', ...):
# ... setup app, channel, handlers ...
self.exchange = get_exchange(..., name=self.app.conf.event_exchange)
self.queue = Queue( # Create a unique, auto-deleting queue
'.'.join([self.queue_prefix, self.node_id]),
exchange=self.exchange,
routing_key=routing_key, # Often '#' to get all events
auto_delete=True, durable=False,
# ... other queue options ...
)
# ...
def get_consumers(self, Consumer, channel):
# Tell ConsumerMixin to consume from our event queue
return [Consumer(queues=[self.queue],
callbacks=[self._receive], # Method to call on message
no_ack=True, # Events usually don't need explicit ack
accept=self.accept)]
# This method is registered as the callback for new messages
def _receive(self, body, message):
# Decode message body (can be single event or list in newer Celery)
if isinstance(body, list):
process, from_message = self.process, self.event_from_message
[process(*from_message(event)) for event in body]
else:
self.process(*self.event_from_message(body))
# process() calls the appropriate handler from self.handlers
def process(self, type, event):
"""Process event by dispatching to configured handler."""
handler = self.handlers.get(type) or self.handlers.get('*')
handler and handler(event) # Call the handler function
```
## Conclusion
Celery Events provide a powerful mechanism for **real-time monitoring** of your distributed task system.
* Workers (when enabled via `-E` or configuration) send **event messages** describing their actions (like task start/finish, worker online).
* These messages go to a dedicated **event exchange** on the broker.
* Tools like `celery events` or Flower act as **listeners** (`EventReceiver`), consuming this stream to provide insights into the cluster's activity.
* Events are the foundation for building dashboards, custom monitoring, and diagnostic tools.
Understanding events helps you observe and manage your Celery application more effectively.
So far, we've explored the major components and concepts of Celery. But how does a worker actually start up? How does it initialize all these different parts like the connection, the consumer, the event dispatcher, and the execution pool in the right order? That's orchestrated by a system called Bootsteps.
**Next:** [Chapter 10: Bootsteps](10_bootsteps.md)
---
# Chapter 10: Bootsteps - How Celery Workers Start Up
In [Chapter 9: Events](09_events.md), we learned how to monitor the real-time activity within our Celery system. We've now covered most of the key parts of Celery: the [Celery App](01_celery_app.md), [Task](03_task.md)s, the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), the [Worker](05_worker.md), the [Result Backend](06_result_backend.md), [Beat (Scheduler)](07_beat__scheduler_.md), [Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md), and [Events](09_events.md).
But have you ever wondered how the Celery worker manages to get all these different parts working together when you start it? When you run `celery worker`, it needs to connect to the broker, set up the execution pool, start listening for tasks, maybe start the event dispatcher, and possibly even start an embedded Beat scheduler. How does it ensure all these things happen in the correct order? That's where **Bootsteps** come in.
## What Problem Do Bootsteps Solve?
Imagine you're assembling a complex piece of furniture. You have many parts and screws, and the instructions list a specific sequence of steps. You can't attach the tabletop before you've built the legs! Similarly, a Celery worker has many internal components that need to be initialized and started in a precise order.
For example, the worker needs to:
1. Establish a connection to the [Broker Connection (AMQP)](04_broker_connection__amqp_.md).
2. *Then*, start the consumer logic that uses this connection to fetch tasks.
3. Set up the execution pool (like prefork or eventlet) that will actually run the tasks.
4. Start optional components like the [Events](09_events.md) dispatcher or the embedded [Beat (Scheduler)](07_beat__scheduler_.md).
If these steps happen out of order (e.g., trying to fetch tasks before connecting to the broker), the worker will fail.
**Bootsteps** provide a framework within Celery to define this startup (and shutdown) sequence. It's like the assembly instructions or a detailed checklist for the worker. Each major component or initialization phase is defined as a "step," and steps can declare dependencies on each other (e.g., "Step B requires Step A to be finished"). Celery uses this information to automatically figure out the correct order to start everything up and, just as importantly, the correct reverse order to shut everything down cleanly.
This makes the worker's internal structure more organized, modular, and easier for Celery developers to extend with new features. As a user, you generally don't write bootsteps yourself, but understanding the concept helps demystify the worker's startup process.
## Key Concepts
1. **Step (`Step`):** A single, distinct part of the worker's startup or shutdown logic. Think of it as one instruction in the assembly manual. Examples include initializing the broker connection, starting the execution pool, or starting the component that listens for task messages (the consumer).
2. **Blueprint (`Blueprint`):** A collection of related steps that manage a larger component. For instance, the main "Consumer" component within the worker has its own blueprint defining steps for connection, event handling, task fetching, etc.
3. **Dependencies (`requires`):** A step can declare that it needs other steps to be completed first. For example, the step that starts fetching tasks (`Tasks`) *requires* the step that establishes the broker connection (`Connection`).
4. **Order:** Celery analyzes the `requires` declarations of all steps within a blueprint (and potentially across blueprints) to build a dependency graph. It then sorts this graph to determine the exact order in which steps must be started. Shutdown usually happens in the reverse order.
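These concepts map naturally onto a topological sort. The sketch below uses plain Python and the standard-library `graphlib` module (not Celery's own graph utilities); the step names mirror real worker steps, but the dependency table itself is illustrative:

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must start before it,
# mirroring the Step.requires declarations (illustrative, not Celery's API).
requires = {
    "Connection": set(),
    "Pool": set(),
    "Events": {"Connection"},
    "Tasks": {"Connection"},
    "Consumer": {"Connection", "Events", "Tasks", "Pool"},
}

# static_order() yields a valid startup order: every step appears
# only after all of its dependencies.
order = list(TopologicalSorter(requires).static_order())
print(order)
```

Because `Consumer` depends on every other step, any valid ordering must place it last, while `Connection` and `Pool` (which require nothing) come first.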
## How It Works: The Worker Startup Sequence
You don't typically interact with bootsteps directly, but you see their effect every time you start a worker.
When you run:

```bash
celery -A your_app worker --loglevel=info
```
Celery initiates the **Worker Controller** (`WorkController`). This controller uses the Bootstep framework, specifically a main **Blueprint**, to manage its initialization.
Here's a simplified idea of what happens under the hood, orchestrated by Bootsteps:
1. **Load Blueprint:** The `WorkController` loads its main blueprint, which includes steps for core functionalities.
2. **Build Graph:** Celery looks at all the steps defined in the blueprint (e.g., `Connection`, `Pool`, `Consumer`, `Timer`, `Events`, potentially `Beat`) and their `requires` attributes. It builds a dependency graph.
3. **Determine Order:** It calculates the correct startup order from the graph (a "topological sort"). For example, it determines that `Connection` must start before `Consumer`, and `Pool` must start before `Consumer` can start dispatching tasks to it.
4. **Execute Steps:** The `WorkController` iterates through the steps in the determined order and calls each step's `start` method.
* The `Connection` step establishes the link to the broker.
* The `Timer` step sets up internal timers.
* The `Pool` step initializes the execution pool (e.g., starts prefork child processes).
* The `Events` step starts the event dispatcher (if `-E` was used).
* The `Consumer` step (usually last) starts the main loop that fetches tasks from the broker and dispatches them to the pool.
5. **Worker Ready:** Once all essential bootsteps have successfully started, the worker prints the "ready" message and begins processing tasks.
When you stop the worker (e.g., with Ctrl+C), a similar process happens in reverse using the steps' `stop` or `terminate` methods, ensuring connections are closed, pools are shut down, etc., in the correct order.
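The start-in-order, stop-in-reverse discipline can be sketched in a few lines of plain Python (the step names are illustrative, and real steps do far more than append to a log):

```python
class DemoStep:
    """A toy step that records when it starts and stops."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def start(self):
        self.log.append(f"start {self.name}")

    def stop(self):
        self.log.append(f"stop {self.name}")

log = []
# Assume the dependency graph already produced this startup order.
order = [DemoStep(n, log) for n in ("Connection", "Pool", "Consumer")]

for step in order:            # startup: sorted order
    step.start()
for step in reversed(order):  # shutdown: reverse order
    step.stop()

print(log)
# ['start Connection', 'start Pool', 'start Consumer',
#  'stop Consumer', 'stop Pool', 'stop Connection']
```

Note how the component started last (`Consumer`) is stopped first, so nothing tries to dispatch tasks to a pool or connection that has already gone away.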
## Internal Implementation Walkthrough
Let's visualize the simplified startup flow managed by bootsteps:
```mermaid
sequenceDiagram
participant CLI as `celery worker ...`
participant WorkerMain as Worker Main Process
participant Blueprint as Main Worker Blueprint
participant DepGraph as Dependency Graph Builder
participant Step1 as Connection Step
participant Step2 as Pool Step
participant Step3 as Consumer Step
CLI->>WorkerMain: Start worker command
WorkerMain->>Blueprint: Load blueprint definition (steps & requires)
Blueprint->>DepGraph: Define steps and dependencies
DepGraph->>Blueprint: Return sorted startup order [Step1, Step2, Step3]
WorkerMain->>Blueprint: Iterate through sorted steps
Blueprint->>Step1: Call start()
Step1-->>Blueprint: Connection established
Blueprint->>Step2: Call start()
Step2-->>Blueprint: Pool initialized
Blueprint->>Step3: Call start()
Step3-->>Blueprint: Consumer loop started
Blueprint-->>WorkerMain: Startup complete
WorkerMain->>WorkerMain: Worker is Ready
```
The Bootstep framework relies on classes defined mainly in `celery/bootsteps.py`.
## Code Dive: Anatomy of a Bootstep
Bootsteps are defined as classes inheriting from `Step` or `StartStopStep`.
* **Defining a Step:** A step class defines its logic and dependencies.
```python
# Simplified concept from celery/bootsteps.py

# Base class for all steps
class Step:
    # List of other Step classes needed before this one runs
    requires = ()

    def __init__(self, parent, **kwargs):
        # Called when the blueprint is applied to the parent (e.g., Worker).
        # Can be used to set initial attributes on the parent.
        pass

    def create(self, parent):
        # Create the service/component managed by this step.
        # Often returns an object to be stored.
        pass

    def should_include(self, parent):
        # Whether this step applies to this parent
        # (always True here; the real implementation checks a condition).
        return True

    def include(self, parent):
        # Logic to add this step to the parent's step list.
        # Called after __init__.
        if self.should_include(parent):
            self.obj = self.create(parent)  # Store created object if needed
            parent.steps.append(self)
            return True
        return False


# A common step type with start/stop/terminate methods
class StartStopStep(Step):
    obj = None  # Holds the object created by self.create

    def start(self, parent):
        # Logic to start the component/service
        if self.obj and hasattr(self.obj, 'start'):
            self.obj.start()

    def stop(self, parent):
        # Logic to stop the component/service gracefully
        if self.obj and hasattr(self.obj, 'stop'):
            self.obj.stop()

    def terminate(self, parent):
        # Logic to force shutdown (if different from stop)
        if self.obj:
            term_func = getattr(self.obj, 'terminate', None) or getattr(self.obj, 'stop', None)
            if term_func:
                term_func()

    # include() (inherited) adds self to parent.steps if created
```
**Explanation:**
* `requires`: A tuple (or set) of other Step classes — or their string names, such as `'celery.worker.components:Timer'` — that must be fully started *before* this step's `start` method is called. This defines the dependencies.
* `__init__`, `create`, `include`: Methods involved in setting up the step and potentially creating the component it manages.
* `start`, `stop`, `terminate`: Methods called during the worker's lifecycle (startup, graceful shutdown, forced shutdown).
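To see these pieces interact, here is a self-contained toy that re-declares minimal `Step`/`StartStopStep` classes (rather than importing Celery) and uses them to manage an invented "heartbeat" service on a parent worker:

```python
class Step:
    requires = ()
    def __init__(self, parent, **kwargs):
        pass
    def create(self, parent):
        pass
    def include(self, parent):
        # Create the managed object and register this step on the parent.
        self.obj = self.create(parent)
        parent.steps.append(self)

class StartStopStep(Step):
    obj = None
    def start(self, parent):
        if self.obj:
            self.obj.start()
    def stop(self, parent):
        if self.obj:
            self.obj.stop()

class Heartbeat:
    """Dummy service: just tracks whether it is running."""
    running = False
    def start(self):
        self.running = True
    def stop(self):
        self.running = False

class HeartbeatStep(StartStopStep):
    def create(self, parent):
        return Heartbeat()

class Worker:
    def __init__(self):
        self.steps = []

worker = Worker()
step = HeartbeatStep(worker)
step.include(worker)      # create() runs, step registered on worker
step.start(worker)
print(step.obj.running)   # True
step.stop(worker)
print(step.obj.running)   # False
```

The blueprint's job is simply to do this `include`/`start`/`stop` dance for every step, in dependency order.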
* **Blueprint:** Manages a collection of steps.
```python
# Simplified concept from celery/bootsteps.py
from celery.utils.graph import DependencyGraph
class Blueprint:
# Set of default step classes (or string names) included in this blueprint
default_steps = set()
def __init__(self, steps=None, name=None, **kwargs):
self.name = name or self.__class__.__name__
# Combine default steps with any provided steps
self.types = set(steps or []) | set(self.default_steps)
self.steps = {} # Will hold step instances
self.order = [] # Will hold sorted step instances
# ... other callbacks ...
def apply(self, parent, **kwargs):
# 1. Load step classes from self.types
step_classes = self.claim_steps() # {name: StepClass, ...}
# 2. Build the dependency graph
self.graph = DependencyGraph(
((Cls, Cls.requires) for Cls in step_classes.values()),
# ... formatter options ...
)
# 3. Get the topologically sorted order
sorted_classes = self.graph.topsort()
# 4. Instantiate and include each step
self.order = []
for S in sorted_classes:
step = S(parent, **kwargs) # Call Step.__init__
self.steps[step.name] = step
self.order.append(step)
for step in self.order:
step.include(parent) # Call Step.include -> Step.create
return self
def start(self, parent):
# Called by the parent (e.g., Worker) to start all steps
for step in self.order: # Use the sorted order
if hasattr(step, 'start'):
step.start(parent)
def stop(self, parent):
# Called by the parent to stop all steps (in reverse order)
for step in reversed(self.order):
if hasattr(step, 'stop'):
step.stop(parent)
# ... other methods like close, terminate, restart ...
```
**Explanation:**
* `default_steps`: Defines the standard components managed by this blueprint.
* `apply`: The core method that takes the step definitions, builds the `DependencyGraph` based on `requires`, gets the sorted execution `order`, and then instantiates and includes each step.
* `start`/`stop`: Iterate through the calculated `order` (or its reverse) to start/stop the components managed by each step.
* **Example Usage (Worker Components):** The worker's main components are defined as bootsteps in `celery/worker/components.py`. You can see classes like `Pool`, `Consumer`, `Timer`, `Beat`, each inheriting from `bootsteps.Step` or `bootsteps.StartStopStep` and potentially defining `requires`. The `Consumer` blueprint in `celery/worker/consumer/consumer.py` then lists many of these (`Connection`, `Events`, `Tasks`, etc.) in its `default_steps`.
## Conclusion
You've learned about Bootsteps, the underlying framework that brings order to the Celery worker's startup and shutdown procedures.
* They act as an **assembly guide** or **checklist** for the worker.
* Each core function (connecting, starting pool, consuming tasks) is a **Step**.
* Steps declare **Dependencies** (`requires`) on each other.
* A **Blueprint** groups related steps.
* Celery uses a **Dependency Graph** to determine the correct **order** to start and stop steps.
* This ensures components like the [Broker Connection (AMQP)](04_broker_connection__amqp_.md), [Worker](05_worker.md) pool, and task consumer initialize and terminate predictably.
While you typically don't write bootsteps as an end-user, understanding their role clarifies how the complex machinery of a Celery worker reliably comes to life and shuts down.
---
This concludes our introductory tour of Celery's core concepts! We hope these chapters have given you a solid foundation for understanding how Celery works and how you can use it to build robust and scalable distributed applications. Happy tasking!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Tutorial: Celery
Celery is a system for running **distributed tasks** *asynchronously*. You define *units of work* (Tasks) in your Python code. When you want a task to run, you send a message using a **message broker** (like RabbitMQ or Redis). One or more **Worker** processes are running in the background, listening for these messages. When a worker receives a message, it executes the corresponding task. Optionally, the task's result (or any error) can be stored in a **Result Backend** (like Redis or a database) so you can check its status or retrieve the output later. Celery helps manage this whole process, making it easier to handle background jobs, scheduled tasks, and complex workflows.
**Source Repository:** [https://github.com/celery/celery/tree/d1c35bbdf014f13f4ab698d75e3ea381a017b090/celery](https://github.com/celery/celery/tree/d1c35bbdf014f13f4ab698d75e3ea381a017b090/celery)
```mermaid
flowchart TD
A0["Celery App"]
A1["Task"]
A2["Worker"]
A3["Broker Connection (AMQP)"]
A4["Result Backend"]
A5["Canvas (Signatures & Primitives)"]
A6["Beat (Scheduler)"]
A7["Configuration"]
A8["Events"]
A9["Bootsteps"]
A0 -- "Defines and sends" --> A1
A0 -- "Uses for messaging" --> A3
A0 -- "Uses for results" --> A4
A0 -- "Loads and uses" --> A7
A1 -- "Updates state in" --> A4
A2 -- "Executes" --> A1
A2 -- "Fetches tasks from" --> A3
A2 -- "Uses for lifecycle" --> A9
A5 -- "Represents task invocation" --> A1
A6 -- "Sends scheduled tasks via" --> A3
A8 -- "Sends events via" --> A3
A9 -- "Manages connection via" --> A3
```
## Chapters
1. [Celery App](01_celery_app.md)
2. [Configuration](02_configuration.md)
3. [Task](03_task.md)
4. [Broker Connection (AMQP)](04_broker_connection__amqp_.md)
5. [Worker](05_worker.md)
6. [Result Backend](06_result_backend.md)
7. [Beat (Scheduler)](07_beat__scheduler_.md)
8. [Canvas (Signatures & Primitives)](08_canvas__signatures___primitives_.md)
9. [Events](09_events.md)
10. [Bootsteps](10_bootsteps.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)