init push

This commit is contained in:
zachary62
2025-04-04 13:03:54 -04:00
parent e62ee2cb13
commit 2ebad5e5f2
160 changed files with 2 additions and 0 deletions


@@ -0,0 +1,263 @@
# Chapter 1: Modules and Programs: Building Blocks of DSPy
Welcome to the first chapter of our journey into DSPy! We're excited to have you here.
Imagine you want to build something cool with AI, like a smart assistant that can answer questions based on your documents. This involves several steps: understanding the question, finding the right information in the documents, and then crafting a clear answer. How do you organize all these steps in your code?
That's where **Modules** and **Programs** come in! They are the fundamental building blocks in DSPy, helping you structure your AI applications cleanly and effectively.
Think of it like building with **Lego bricks**:
* A **`Module`** is like a single Lego brick. It's a basic unit that performs a specific, small task.
* A **`Program`** is like your final Lego creation (a car, a house). It's built by combining several Lego bricks (`Module`s) together in a specific way to achieve a bigger goal.
In this chapter, we'll learn:
* What a `Module` is and what it does.
* How `Program`s use `Module`s to solve complex tasks.
* How they create structure and manage the flow of information.
Let's start building!
## What is a `Module`?
A `dspy.Module` is the most basic building block in DSPy. Think of it as:
* **A Function:** Like a function in Python, it takes some input, does something, and produces an output.
* **A Lego Brick:** It performs one specific job.
* **A Specialist:** It often specializes in one task, frequently involving interaction with a powerful AI model like a Language Model ([LM](05_lm__language_model_client_.md)) or a Retrieval Model ([RM](06_rm__retrieval_model_client_.md)). We'll learn more about LMs and RMs later!
The key idea is **encapsulation**. A `Module` bundles a piece of logic together, hiding the internal complexity. You just need to know what it does, not necessarily *every single detail* of how it does it.
Every `Module` has two main parts:
1. `__init__`: This is where you set up the module, like defining any internal components or settings it needs.
2. `forward`: This is where the main logic happens. It defines *what the module does* when you call it with some input.
Let's look at a conceptual example. DSPy provides pre-built modules. One common one is `dspy.Predict`, which is designed to call a Language Model to generate an output based on some input, following specific instructions.
```python
import dspy
# Conceptual structure of a simple Module like dspy.Predict
class BasicPredict(dspy.Module): # Inherits from dspy.Module
    def __init__(self, instructions):
        super().__init__() # Important initialization
        self.instructions = instructions
        # In a real DSPy module, we'd set up LM connection here
        # self.lm = ... (connect to language model)

    def forward(self, input_data):
        # 1. Combine instructions and input_data
        prompt = self.instructions + "\nInput: " + input_data + "\nOutput:"
        # 2. Call the Language Model (LM) with the prompt
        # lm_output = self.lm(prompt) # Simplified call
        lm_output = f"Generated answer for '{input_data}' based on instructions." # Dummy output
        # 3. Return the result
        return lm_output
# How you might use it (conceptual)
# predictor = BasicPredict(instructions="Translate the input to French.")
# french_text = predictor(input_data="Hello")
# print(french_text) # Might output: "Generated answer for 'Hello' based on instructions."
```
In this simplified view:
* `BasicPredict` inherits from `dspy.Module`. All your custom modules will do this.
* `__init__` stores the `instructions`. Real DSPy modules might initialize connections to LMs or load settings here.
* `forward` defines the core task: combining instructions and input, (conceptually) calling an LM, and returning the result.
Don't worry about the LM details yet! The key takeaway is that a `Module` wraps a specific piece of work, defined in its `forward` method. DSPy provides useful pre-built modules like `dspy.Predict` and `dspy.ChainOfThought` (which encourages step-by-step reasoning), and you can also build your own.
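As a quick taste, here is how those built-in modules are typically created. This is only a sketch: it assumes an LM has already been configured (see [Chapter 5](05_lm__language_model_client_.md)) and uses the compact `"inputs -> outputs"` signature shorthand explained in [Chapter 2](02_signature.md).
```python
import dspy

# Sketch only: assumes an LM has been configured, e.g.
# dspy.settings.configure(lm=...), and uses the "question -> answer"
# signature shorthand covered in Chapter 2.
simple_qa = dspy.Predict("question -> answer")             # one direct LM call
thoughtful_qa = dspy.ChainOfThought("question -> answer")  # adds step-by-step reasoning

# Both are called the same way; they differ in how they prompt the LM.
# result = thoughtful_qa(question="What is DSPy?")
# print(result.answer)
```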
## What is a `Program`?
Now, what if your task is more complex than a single LM call? For instance, answering a question based on documents might involve:
1. Understanding the `question`.
2. Generating search queries based on the `question`.
3. Using a Retrieval Model ([RM](06_rm__retrieval_model_client_.md)) to find relevant `context` documents using the queries.
4. Using a Language Model ([LM](05_lm__language_model_client_.md)) to generate the final `answer` based on the `question` and `context`.
This is too much for a single simple `Module`. We need to combine multiple modules!
This is where a `Program` comes in. **Technically, a `Program` in DSPy is also just a `dspy.Module`!** The difference is in how we use it: a `Program` is typically a `Module` that *contains and coordinates other `Module`s*.
Think back to the Lego analogy:
* Small `Module`s are like bricks for the engine, wheels, and chassis.
* The `Program` is the main `Module` representing the whole car, defining how the engine, wheels, and chassis bricks connect and work together in its `forward` method.
A `Program` defines the **data flow** between its sub-modules. It orchestrates the sequence of operations.
Let's sketch out a simple `Program` for our question-answering example:
```python
import dspy
# Assume we have these pre-built or custom Modules (simplified)
class GenerateSearchQuery(dspy.Module):
    def forward(self, question):
        # Logic to create search queries from the question
        print(f"Generating query for: {question}")
        return f"search query for '{question}'"

class RetrieveContext(dspy.Module):
    def forward(self, query):
        # Logic to find documents using the query
        print(f"Retrieving context for: {query}")
        return f"Relevant context document about '{query}'"

class GenerateAnswer(dspy.Module):
    def forward(self, question, context):
        # Logic to generate answer using question and context
        print(f"Generating answer for: {question} using context: {context}")
        return f"Final answer about '{question}' based on context."

# Now, let's build the Program (which is also a Module!)
class RAG(dspy.Module): # RAG = Retrieval-Augmented Generation
    def __init__(self):
        super().__init__()
        # Initialize the sub-modules it will use
        self.generate_query = GenerateSearchQuery()
        self.retrieve = RetrieveContext()
        self.generate_answer = GenerateAnswer()

    def forward(self, question):
        # Define the flow of data through the sub-modules
        print("\n--- RAG Program Start ---")
        search_query = self.generate_query(question=question)
        context = self.retrieve(query=search_query)
        answer = self.generate_answer(question=question, context=context)
        print("--- RAG Program End ---")
        return answer
# How to use the Program
rag_program = RAG()
final_answer = rag_program(question="What is DSPy?")
print(f"\nFinal Output: {final_answer}")
```
If you run this conceptual code, you'd see output like:
```
--- RAG Program Start ---
Generating query for: What is DSPy?
Retrieving context for: search query for 'What is DSPy?'
Generating answer for: What is DSPy? using context: Relevant context document about 'search query for 'What is DSPy?''
--- RAG Program End ---
Final Output: Final answer about 'What is DSPy?' based on context.
```
See how the `RAG` program works?
1. In `__init__`, it creates instances of the smaller modules it needs (`GenerateSearchQuery`, `RetrieveContext`, `GenerateAnswer`).
2. In `forward`, it calls these modules *in order*, passing the output of one as the input to the next. It defines the workflow!
## Hierarchical Structure
Modules can contain other modules, which can contain *even more* modules! This allows you to build complex systems by breaking them down into manageable, hierarchical parts.
Imagine our `GenerateAnswer` module was actually quite complex. Maybe it first summarizes the context, then drafts an answer, then refines it. We could implement `GenerateAnswer` as *another* program containing these sub-modules!
```mermaid
graph TD
A[RAG Program] --> B(GenerateSearchQuery Module);
A --> C(RetrieveContext Module);
A --> D(GenerateAnswer Module / Program);
D --> D1(SummarizeContext Module);
D --> D2(DraftAnswer Module);
D --> D3(RefineAnswer Module);
```
This diagram shows how the `RAG` program uses `GenerateAnswer`, which itself could be composed of smaller modules like `SummarizeContext`, `DraftAnswer`, and `RefineAnswer`. This nesting makes complex systems easier to design, understand, and debug.
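Here is what that nesting could look like in code. This is only a sketch: `SummarizeContext`, `DraftAnswer`, and `RefineAnswer` are made-up placeholder modules mirroring the diagram, written in the same dummy style as the earlier examples.
```python
import dspy

# Hypothetical sketch: GenerateAnswer as a small program of its own.
# The three sub-modules are placeholders matching the diagram above.
class SummarizeContext(dspy.Module):
    def forward(self, context):
        return f"Summary of: {context}"

class DraftAnswer(dspy.Module):
    def forward(self, question, summary):
        return f"Draft answer to '{question}' using {summary}"

class RefineAnswer(dspy.Module):
    def forward(self, draft):
        return f"Refined: {draft}"

class GenerateAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = SummarizeContext()
        self.draft = DraftAnswer()
        self.refine = RefineAnswer()

    def forward(self, question, context):
        # Same inputs and output as before, so RAG doesn't need to change.
        summary = self.summarize(context=context)
        draft = self.draft(question=question, summary=summary)
        return self.refine(draft=draft)
```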
## How It Works Under the Hood (A Tiny Peek)
You don't need to know the deep internals right now, but it's helpful to have a basic mental model.
1. **Foundation:** All DSPy modules, whether simple bricks or complex programs, inherit from a base class (`dspy.primitives.module.BaseModule`). This provides common functionality like saving, loading, and finding internal parameters (we'll touch on saving/loading later).
2. **Execution:** When you call a module (e.g., `rag_program(question="...")`), Python executes its `__call__` method. In DSPy, this typically just calls the `forward` method you defined.
3. **Orchestration:** If a module's `forward` method calls other modules (like in our `RAG` example), it simply executes their `forward` methods in turn, passing the data as defined in the code.
Here's a simplified sequence of what happens when we call `rag_program("What is DSPy?")`:
```mermaid
sequenceDiagram
participant User
participant RAGProgram as RAG Program (forward)
participant GenQuery as GenerateSearchQuery (forward)
participant Retrieve as RetrieveContext (forward)
participant GenAnswer as GenerateAnswer (forward)
User->>RAGProgram: Call with "What is DSPy?"
RAGProgram->>GenQuery: Call with question="What is DSPy?"
GenQuery-->>RAGProgram: Return "search query..."
RAGProgram->>Retrieve: Call with query="search query..."
Retrieve-->>RAGProgram: Return "Relevant context..."
RAGProgram->>GenAnswer: Call with question, context
GenAnswer-->>RAGProgram: Return "Final answer..."
RAGProgram-->>User: Return "Final answer..."
```
The core files involved are:
* `primitives/module.py`: Defines the `BaseModule` class, the ancestor of all modules.
* `primitives/program.py`: Defines the `Module` class itself (the one you inherit from), adding core methods like `__call__`, which invokes `forward`.
DSPy's built-in modules (like `ChainOfThought` or `Predict`) follow this same pattern: they inherit from `dspy.Module` and define `__init__` and `forward`, just like our examples.
```python
# Snippet from dspy/primitives/program.py (Simplified)
from dspy.primitives.module import BaseModule
class Module(BaseModule): # Inherits from BaseModule
    def __init__(self):
        super()._base_init()
        # ... initialization ...

    def forward(self, *args, **kwargs):
        # This is where the main logic of the module goes.
        # Users override this method in their own modules.
        raise NotImplementedError # Needs to be implemented by subclasses

    def __call__(self, *args, **kwargs):
        # When you call module_instance(), this runs...
        # ...and typically calls self.forward()
        return self.forward(*args, **kwargs)

# You write classes like this:
class MyModule(dspy.Module):
    def __init__(self):
        super().__init__()
        # Your setup

    def forward(self, input_data):
        # Your logic
        result = ...
        return result
```
The important part is the pattern: inherit from `dspy.Module`, set things up in `__init__`, and define the core logic in `forward`.
## Conclusion
Congratulations! You've learned about the fundamental organizing principle in DSPy: **Modules** and **Programs**.
* **Modules** are the basic building blocks, like Lego bricks, often handling a specific task (maybe calling an [LM](05_lm__language_model_client_.md) or [RM](06_rm__retrieval_model_client_.md)).
* **Programs** are also Modules, but they typically combine *other* modules to orchestrate a more complex workflow, defining how data flows between them.
* The `forward` method is key: it contains the logic of what a module *does*.
* This structure allows you to build complex AI systems in a clear, manageable, and hierarchical way.
Now that we understand how modules provide structure, how do they know what kind of input data they expect and what kind of output data they should produce? That's where **Signatures** come in!
Let's dive into that next!
**Next:** [Chapter 2: Signature](02_signature.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/DSPy/02_signature.md Normal file

@@ -0,0 +1,201 @@
# Chapter 2: Signatures - Defining the Task
In [Chapter 1: Modules and Programs](01_module___program.md), we learned that `Module`s are like Lego bricks that perform specific tasks, often using Language Models ([LM](05_lm__language_model_client_.md)). We saw how `Program`s combine these modules.
But how does a `Module`, especially one using an LM like `dspy.Predict`, know *exactly* what job to do?
Imagine you ask a chef (our LM) to cook something. Just saying "cook" isn't enough! You need to tell them:
1. **What ingredients to use** (the inputs).
2. **What dish to make** (the outputs).
3. **The recipe or instructions** (how to make it).
This is precisely what a **`Signature`** does in DSPy!
A `Signature` acts like a clear recipe or contract for a DSPy `Module`. It defines:
* **Input Fields:** What information the module needs to start its work.
* **Output Fields:** What information the module is expected to produce.
* **Instructions:** Natural language guidance (like a recipe!) telling the underlying LM *how* to transform the inputs into the outputs.
Think of it as specifying the 'shape' and 'purpose' of a module, making sure everyone (you, DSPy, and the LM) understands the task.
## Why Do We Need Signatures?
Without a clear definition, how would a module like `dspy.Predict` know what to ask the LM?
Let's say we want a module to translate English text to French. We need to tell it:
* It needs an `english_sentence` as input.
* It should produce a `french_sentence` as output.
* The *task* is to translate the input sentence into French.
A `Signature` bundles all this information together neatly.
## Defining a Signature: The Recipe Card
The most common way to define a Signature is by creating a Python class that inherits from `dspy.Signature`.
Let's create our English-to-French translation signature:
```python
import dspy
from dspy.signatures.field import InputField, OutputField
class TranslateToFrench(dspy.Signature):
    """Translates English text to French.""" # <-- These are the Instructions!

    # Define the Input Field the module expects
    english_sentence = dspy.InputField(desc="The original sentence in English")
    # Define the Output Field the module should produce
    french_sentence = dspy.OutputField(desc="The translated sentence in French")
```
Let's break this down:
1. **`class TranslateToFrench(dspy.Signature):`**: We declare a new class named `TranslateToFrench` that inherits from `dspy.Signature`. This tells DSPy it's a signature definition.
2. **`"""Translates English text to French."""`**: This is the **docstring**. It's crucial! DSPy uses this docstring as the natural language **Instructions** for the LM. It tells the LM the *goal* of the task.
3. **`english_sentence = dspy.InputField(...)`**: We define an input field named `english_sentence`. `dspy.InputField` marks this as required input. The `desc` provides a helpful description (good for documentation and potentially useful for the LM later).
4. **`french_sentence = dspy.OutputField(...)`**: We define an output field named `french_sentence`. `dspy.OutputField` marks this as the expected output. The `desc` describes what this field should contain.
That's it! We've created a reusable "recipe card" that clearly defines our translation task.
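As a side note, DSPy also accepts a compact string form for simple signatures. The sketch below shows the idea; the class-based form above is usually preferable because it lets you attach field descriptions.
```python
import dspy

# Shorthand sketch: field names inline, instructions as the second argument.
# Equivalent in spirit to the TranslateToFrench class above, minus the
# per-field descriptions.
InlineTranslate = dspy.Signature(
    "english_sentence -> french_sentence",
    "Translates English text to French.",
)

# Many modules accept the string directly as well:
# translator = dspy.Predict("english_sentence -> french_sentence")
```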
## How Modules Use Signatures
Now, how does a `Module` like `dspy.Predict` use this `TranslateToFrench` signature?
`dspy.Predict` is a pre-built module designed to take a signature and use an LM to generate the output fields based on the input fields and instructions.
Here's how you might use our signature with `dspy.Predict` (we'll cover `dspy.Predict` in detail in [Chapter 4](04_predict.md)):
```python
# Assume 'lm' is a configured Language Model client (more in Chapter 5)
# lm = dspy.OpenAI(model='gpt-3.5-turbo')
# dspy.settings.configure(lm=lm)
# Create an instance of dspy.Predict, giving it our Signature
translator = dspy.Predict(TranslateToFrench)
# Call the predictor with the required input field
english = "Hello, how are you?"
result = translator(english_sentence=english)
# The result object will contain the output field defined in the signature
print(f"English: {english}")
# Assuming the LM works correctly, it might print:
# print(f"French: {result.french_sentence}") # => French: Bonjour, comment ça va?
```
In this (slightly simplified) example:
1. `translator = dspy.Predict(TranslateToFrench)`: We create a `Predict` module. Crucially, we pass our `TranslateToFrench` **class** itself to it. `dspy.Predict` now knows the input/output fields and the instructions from the signature.
2. `result = translator(english_sentence=english)`: When we call the `translator`, we provide the input data using the exact name defined in our signature (`english_sentence`).
3. `result.french_sentence`: `dspy.Predict` uses the LM, guided by the signature's instructions and fields, to generate the output. It then returns an object where you can access the generated French text using the output field name (`french_sentence`).
The `Signature` acts as the bridge, ensuring the `Predict` module knows its job specification.
## How It Works Under the Hood (A Peek)
You don't need to memorize this, but understanding the flow helps! When a module like `dspy.Predict` uses a `Signature`:
1. **Inspection:** The module looks at the `Signature` class (`TranslateToFrench` in our case).
2. **Extract Info:** It identifies the `InputField`s (`english_sentence`), `OutputField`s (`french_sentence`), and the `Instructions` (the docstring: `"Translates English text to French."`).
3. **Prompt Formatting:** When you call the module (e.g., `translator(english_sentence="Hello")`), it uses this information to build a prompt for the [LM](05_lm__language_model_client_.md). This prompt typically includes:
* The **Instructions**.
* Clearly labeled **Input Fields** and their values.
* Clearly labeled **Output Fields** (often just the names, indicating what the LM should generate).
4. **LM Call:** The formatted prompt is sent to the configured LM.
5. **Parsing Output:** The LM's response is received. DSPy tries to parse this response to extract the values for the defined `OutputField`s (like `french_sentence`).
6. **Return Result:** A structured result object containing the parsed outputs is returned.
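To make step 3 a bit more concrete, here is a rough sketch of how a prompt could be assembled from a signature. The `sketch_prompt` helper is hypothetical and only illustrates the general shape; DSPy's real prompt formatting is richer than this.
```python
def sketch_prompt(signature, **inputs):
    """Hypothetical illustration of prompt assembly from a Signature."""
    lines = [signature.instructions, ""]
    # Labeled input fields with their provided values.
    for name in signature.input_fields:
        lines.append(f"{name}: {inputs[name]}")
    # Labeled output fields the LM is expected to fill in.
    for name in signature.output_fields:
        lines.append(f"{name}:")
    return "\n".join(lines)

# print(sketch_prompt(TranslateToFrench, english_sentence="Hello"))
# Translates English text to French.
#
# english_sentence: Hello
# french_sentence:
```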
Let's visualize this flow:
```mermaid
sequenceDiagram
participant User
participant PredictModule as dspy.Predict(TranslateToFrench)
participant Signature as TranslateToFrench
participant LM as Language Model
User->>PredictModule: Call with english_sentence="Hello"
PredictModule->>Signature: Get Instructions, Input/Output Fields
Signature-->>PredictModule: Return structure ("Translates...", "english_sentence", "french_sentence")
PredictModule->>LM: Send formatted prompt (e.g., "Translate...\nEnglish: Hello\nFrench:")
LM-->>PredictModule: Return generated text (e.g., "Bonjour")
PredictModule->>Signature: Parse LM output into 'french_sentence' field
Signature-->>PredictModule: Return structured output {french_sentence: "Bonjour"}
PredictModule-->>User: Return structured output (Prediction object)
```
The core logic for defining signatures resides in:
* `dspy/signatures/signature.py`: Defines the base `Signature` class and the logic for handling instructions and fields.
* `dspy/signatures/field.py`: Defines `InputField` and `OutputField`.
Modules like `dspy.Predict` (in `dspy/predict/predict.py`) contain the code to *read* these Signatures and interact with LMs accordingly.
```python
# Simplified view inside dspy/signatures/signature.py
from pydantic import BaseModel
from pydantic.fields import FieldInfo
# ... other imports ...
class SignatureMeta(type(BaseModel)):
    # Metaclass magic to handle fields and docstring
    def __new__(mcs, name, bases, namespace, **kwargs):
        # ... logic to find fields, handle docstring ...
        cls = super().__new__(mcs, name, bases, namespace, **kwargs)
        cls.__doc__ = cls.__doc__ or _default_instructions(cls) # Default instructions if none provided
        # ... logic to validate fields ...
        return cls

    @property
    def instructions(cls) -> str:
        # Retrieves the docstring as instructions
        return inspect.cleandoc(getattr(cls, "__doc__", ""))

    @property
    def input_fields(cls) -> dict[str, FieldInfo]:
        # Finds fields marked as input
        return cls._get_fields_with_type("input")

    @property
    def output_fields(cls) -> dict[str, FieldInfo]:
        # Finds fields marked as output
        return cls._get_fields_with_type("output")

class Signature(BaseModel, metaclass=SignatureMeta):
    # The base class you inherit from
    pass

# Simplified view inside dspy/signatures/field.py
import pydantic

def InputField(**kwargs):
    # Creates a Pydantic field marked as input for DSPy
    return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="input"))

def OutputField(**kwargs):
    # Creates a Pydantic field marked as output for DSPy
    return pydantic.Field(**move_kwargs(**kwargs, __dspy_field_type="output"))
```
The key takeaway is that the `Signature` class structure (using `InputField`, `OutputField`, and the docstring) provides a standardized way for modules to understand the task specification.
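You can also inspect this structure directly. A small sketch, assuming the `TranslateToFrench` class defined earlier and the `instructions` / `input_fields` / `output_fields` properties shown in the simplified code above:
```python
# Looking at the signature the same way a module would.
print(TranslateToFrench.instructions)        # Translates English text to French.
print(list(TranslateToFrench.input_fields))  # ['english_sentence']
print(list(TranslateToFrench.output_fields)) # ['french_sentence']
```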
## Conclusion
You've now learned about `Signatures`, the essential component for defining *what* a DSPy module should do!
* A `Signature` specifies the **Inputs**, **Outputs**, and **Instructions** for a task.
* It acts like a contract or recipe card for modules, especially those using LMs.
* You typically define them by subclassing `dspy.Signature`, using `InputField`, `OutputField`, and a descriptive **docstring** for instructions.
* Modules like `dspy.Predict` use Signatures to understand the task and generate appropriate prompts for the LM.
Signatures bring clarity and structure to LM interactions. But how do we provide concrete examples to help the LM learn or perform better? That's where `Examples` come in!
**Next:** [Chapter 3: Example](03_example.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/DSPy/03_example.md Normal file

@@ -0,0 +1,229 @@
# Chapter 3: Example - Your Data Points
In [Chapter 2: Signature](02_signature.md), we learned how to define the *task* for a DSPy module using `Signatures`, which specify the inputs, outputs, and instructions. It's like writing a recipe card.
But sometimes, just giving instructions isn't enough. Imagine teaching someone to translate by just giving the rule "Translate English to French". They might struggle! It often helps to show them a few *examples* of correct translations.
This is where **`dspy.Example`** comes in! It's how you represent individual data points or examples within DSPy.
Think of a `dspy.Example` as:
* **A Single Row:** Like one row in a spreadsheet or database table.
* **A Flashcard:** Holding a specific question and its answer, or an input and its desired output.
* **A Test Case:** A concrete instance of the task defined by your `Signature`.
In this chapter, we'll learn:
* What a `dspy.Example` is and how it stores data.
* How to create `Example` objects.
* Why `Example`s are essential for few-shot learning, training, and evaluation.
* How to mark specific fields as inputs using `.with_inputs()`.
Let's dive into representing our data!
## What is a `dspy.Example`?
A `dspy.Example` is a fundamental data structure in DSPy designed to hold the information for a single instance of your task. It essentially acts like a flexible container (similar to a Python dictionary) where you store key-value pairs.
Crucially, the **keys** in your `Example` should generally match the **field names** you defined in your [Signature](02_signature.md).
Let's revisit our `TranslateToFrench` signature from Chapter 2:
```python
# From Chapter 2
import dspy
from dspy.signatures.field import InputField, OutputField
class TranslateToFrench(dspy.Signature):
    """Translates English text to French."""
    english_sentence = dspy.InputField(desc="The original sentence in English")
    french_sentence = dspy.OutputField(desc="The translated sentence in French")
```
This signature has two fields: `english_sentence` (input) and `french_sentence` (output).
An `Example` representing one instance of this task would need to contain values for these keys.
## Creating an Example
Creating a `dspy.Example` is straightforward. You can initialize it with keyword arguments, where the argument names match the fields you care about (usually your Signature fields).
```python
import dspy
# Create an example for our translation task
example1 = dspy.Example(
    english_sentence="Hello, world!",
    french_sentence="Bonjour le monde!"
)
# You can access the values like attributes
print(f"English: {example1.english_sentence}")
print(f"French: {example1.french_sentence}")
```
**Output:**
```
English: Hello, world!
French: Bonjour le monde!
```
See? `example1` now holds one complete data point for our translation task. It bundles the input (`english_sentence`) and the corresponding desired output (`french_sentence`) together.
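Because an `Example` behaves much like a dictionary, dictionary-style access works alongside attribute access. A small sketch using `example1` from above (the `keys()` helper is mentioned in the simplified code later in this chapter):
```python
# Dictionary-style access on the same example1 object.
print(example1["english_sentence"])  # Hello, world!
print(list(example1.keys()))         # ['english_sentence', 'french_sentence']
```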
You can also create examples from dictionaries:
```python
data_dict = {
    "english_sentence": "How are you?",
    "french_sentence": "Comment ça va?"
}
example2 = dspy.Example(data_dict)
print(f"Example 2 English: {example2.english_sentence}")
```
**Output:**
```
Example 2 English: How are you?
```
## Why Use Examples? The Three Main Roles
`Example` objects are the standard way DSPy handles data, and they are used in three critical ways:
1. **Few-Shot Demonstrations:** When using modules like `dspy.Predict` (which we'll see in [Chapter 4: Predict](04_predict.md)), you can provide a few `Example` objects directly in the prompt sent to the Language Model (LM). This shows the LM *exactly* how to perform the task, often leading to much better results than instructions alone. It's like showing the chef pictures of the final dish alongside the recipe.
2. **Training Data:** When you want to *optimize* your DSPy program (e.g., automatically find the best prompts or few-shot examples), you use **Teleprompters** ([Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md)). Teleprompters require a training set, which is simply a list of `dspy.Example` objects representing the tasks you want your program to learn to do well.
3. **Evaluation Data:** How do you know if your DSPy program is working correctly? You test it on a dataset! The `dspy.evaluate` module ([Chapter 7: Evaluate](07_evaluate.md)) takes a list of `dspy.Example` objects (your test set or development set) and measures your program's performance against the expected outputs (labels) in those examples.
In all these cases, `dspy.Example` provides a consistent way to package and manage your data points.
## Marking Inputs: `.with_inputs()`
Often, especially during training and evaluation, DSPy needs to know which fields in your `Example` represent the *inputs* to your program and which represent the *outputs* or *labels* (the ground truth answers).
The `.with_inputs()` method allows you to explicitly mark certain keys as input fields. This method returns a *new* `Example` object with this input information attached, leaving the original unchanged.
Let's mark `english_sentence` as the input for our `example1`:
```python
# Our original example
example1 = dspy.Example(
    english_sentence="Hello, world!",
    french_sentence="Bonjour le monde!"
)
# Mark 'english_sentence' as the input field
input_marked_example = example1.with_inputs("english_sentence")
# Let's check the inputs and labels (non-inputs)
print(f"Inputs: {input_marked_example.inputs()}")
print(f"Labels: {input_marked_example.labels()}")
```
**Output:**
```
Inputs: Example({'english_sentence': 'Hello, world!'}) (input_keys={'english_sentence'})
Labels: Example({'french_sentence': 'Bonjour le monde!'}) (input_keys=set())
```
Notice:
* `.with_inputs("english_sentence")` didn't change `example1`. It created `input_marked_example`.
* `input_marked_example.inputs()` returns a new `Example` containing only the fields marked as inputs.
* `input_marked_example.labels()` returns a new `Example` containing the remaining fields (the outputs/labels).
This distinction is vital for evaluation (comparing predictions against labels) and optimization (knowing what the program receives vs. what it should produce). Datasets loaded within DSPy often automatically handle marking inputs for you based on common conventions.
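Putting this together, a small training or evaluation set is just a list of `Example`s with their inputs marked. A minimal sketch (the sentence pairs are made up):
```python
import dspy

# A tiny dataset for the translation task: each item is an Example with
# its input field marked, ready for evaluation or optimization later on.
raw_pairs = [
    ("Good night.", "Bonne nuit."),
    ("I like cheese.", "J'aime le fromage."),
]

trainset = [
    dspy.Example(english_sentence=en, french_sentence=fr).with_inputs("english_sentence")
    for en, fr in raw_pairs
]

print(len(trainset))         # 2
print(trainset[0].inputs())  # Example containing only english_sentence
```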
## How It Works Under the Hood (A Peek)
The `dspy.Example` object is fundamentally quite simple. It's designed to behave much like a Python dictionary but with some added conveniences like attribute-style access (`example.field`) and the special `.with_inputs()` method.
1. **Storage:** Internally, an `Example` uses a dictionary (often named `_store`) to hold all the key-value pairs you provide.
```python
# Conceptual internal structure
example = dspy.Example(question="What is DSPy?", answer="A framework...")
# example._store == {'question': 'What is DSPy?', 'answer': 'A framework...'}
```
2. **Attribute Access:** When you access `example.question`, Python's magic methods (`__getattr__`) look up `'question'` in the internal `_store`. Similarly, setting `example.new_field = value` uses `__setattr__` to update the `_store`.
3. **`.with_inputs()`:** This method creates a *copy* of the current `Example`'s `_store`. It then stores the provided input keys (like `{'english_sentence'}`) in a separate internal attribute (like `_input_keys`) on the *new* copied object. It doesn't modify the original `Example`.
4. **`.inputs()` and `.labels()`:** These methods check the `_input_keys` attribute. `.inputs()` creates a new `Example` containing only the key-value pairs whose keys are *in* `_input_keys`. `.labels()` creates a new `Example` containing the key-value pairs whose keys are *not* in `_input_keys`.
Let's look at a simplified view of the code from `dspy/primitives/example.py`:
```python
# Simplified view from dspy/primitives/example.py
class Example:
    def __init__(self, base=None, **kwargs):
        self._store = {} # The internal dictionary
        self._input_keys = None # Stores the input keys after with_inputs()
        # Simplified: copy from another Example or a plain dictionary if provided
        if isinstance(base, Example): self._store = base._store.copy()
        elif isinstance(base, dict): self._store = base.copy()
        # Simplified: Update with keyword arguments
        self._store.update(kwargs)

    # Allows accessing self.key like dictionary lookup self._store[key]
    def __getattr__(self, key):
        if key in self._store: return self._store[key]
        raise AttributeError(f"No attribute '{key}'")

    # Allows setting self.key like dictionary assignment self._store[key] = value
    def __setattr__(self, key, value):
        if key.startswith("_"): super().__setattr__(key, value) # Handle internal attributes
        else: self._store[key] = value

    # Allows dictionary-style access example[key]
    def __getitem__(self, key): return self._store[key]

    # Creates a *copy* and marks input keys on the copy.
    def with_inputs(self, *keys):
        copied = self.copy() # Make a shallow copy
        copied._input_keys = set(keys) # Store the input keys on the copy
        return copied

    # Returns a new Example containing only input fields.
    def inputs(self):
        if self._input_keys is None: raise ValueError("Inputs not set.")
        # Create a dict with only input keys
        input_dict = {k: v for k, v in self._store.items() if k in self._input_keys}
        # Return a new Example wrapping this dict
        return type(self)(base=input_dict).with_inputs(*self._input_keys)

    # Returns a new Example containing only non-input fields (labels).
    def labels(self):
        # Everything that is not an input key counts as a label
        input_keys = self._input_keys or set()
        label_dict = {k: v for k, v in self._store.items() if k not in input_keys}
        # Return a new Example wrapping this dict
        return type(self)(base=label_dict)

    # Helper to create a copy
    def copy(self, **kwargs):
        return type(self)(base=self, **kwargs)

    # ... other helpful methods like keys(), values(), items(), etc. ...
```
The key idea is that `dspy.Example` provides a convenient and standardized wrapper around your data points, making it easy to use them for few-shot examples, training, and evaluation, while also allowing you to specify which parts are inputs versus labels.
## Conclusion
You've now mastered `dspy.Example`, the way DSPy represents individual data points!
* An `Example` holds key-value pairs, like a **row in a spreadsheet** or a **flashcard**.
* Its keys typically correspond to the fields defined in a [Signature](02_signature.md).
* `Example`s are essential for providing **few-shot demonstrations**, **training data** for optimizers ([Teleprompter / Optimizer](08_teleprompter___optimizer.md)), and **evaluation data** for testing ([Evaluate](07_evaluate.md)).
* The `.with_inputs()` method lets you mark which fields are inputs, crucial for distinguishing inputs from labels.
Now that we have `Signatures` to define *what* task to do, and `Examples` to hold the *data* for that task, how do we actually get a Language Model to *do* the task based on the signature? That's the job of the `dspy.Predict` module!
**Next:** [Chapter 4: Predict](04_predict.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/DSPy/04_predict.md Normal file

@@ -0,0 +1,249 @@
# Chapter 4: Predict - The Basic LM Caller
In [Chapter 3: Example](03_example.md), we learned how to create `dspy.Example` objects to represent our data points, like flashcards holding an input and its corresponding desired output. We also saw in [Chapter 2: Signature](02_signature.md) how to define the *task* itself using `dspy.Signature`.
Now, we have the recipe (`Signature`) and some sample dishes (`Example`s). How do we actually get the chef (our Language Model or LM) to cook? How do we combine the instructions from the `Signature` and maybe some `Example`s to prompt the LM and get a result back?
This is where **`dspy.Predict`** comes in! It's the most fundamental way in DSPy to make a single call to a Language Model.
Think of `dspy.Predict` as:
* **A Basic Request:** Like asking the LM to do *one specific thing* based on instructions.
* **The Workhorse:** It handles formatting the input, calling the LM, and extracting the answer.
* **A Single Lego Brick:** It's the simplest "thinking" block in DSPy, directly using the LM's power.
In this chapter, we'll learn:
* What `dspy.Predict` does.
* How to use it with a `Signature`.
* How it turns your instructions and data into an LM call.
* How to get the generated output.
Let's make our first LM call!
## What is `dspy.Predict`?
`dspy.Predict` is a DSPy [Module](01_module___program.md). Its job is simple but essential:
1. **Takes a `Signature`:** When you create a `dspy.Predict` module, you tell it which `Signature` to use. This tells `Predict` what inputs to expect, what outputs to produce, and the instructions for the LM.
2. **Receives Inputs:** When you call the `Predict` module, you provide the input data (matching the `Signature`'s input fields).
3. **Formats a Prompt:** It combines the `Signature`'s instructions, the input data you provided, and potentially some `Example`s (called demonstrations or "demos") into a text prompt suitable for an LM.
4. **Calls the LM:** It sends this carefully crafted prompt to the configured Language Model ([Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md)).
5. **Parses the Output:** It takes the LM's generated text response and tries to extract the specific pieces of information defined by the `Signature`'s output fields.
6. **Returns a `Prediction`:** It gives you back a structured object (a `dspy.Prediction`) containing the extracted output fields.
It's the core mechanism for executing a single, defined prediction task using an LM.
## Using `dspy.Predict`
Let's use our `TranslateToFrench` signature from Chapter 2 to see `dspy.Predict` in action.
**1. Define the Signature (Recap):**
```python
import dspy
from dspy.signatures.field import InputField, OutputField
class TranslateToFrench(dspy.Signature):
    """Translates English text to French."""
    english_sentence = dspy.InputField(desc="The original sentence in English")
    french_sentence = dspy.OutputField(desc="The translated sentence in French")
```
This signature tells our module it needs `english_sentence` and should produce `french_sentence`, following the instruction "Translates English text to French."
**2. Configure the Language Model (A Sneak Peek):**
Before using `Predict`, DSPy needs to know *which* LM to talk to (like OpenAI's GPT-3.5, a local model, etc.). We'll cover this fully in [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md), but here's a quick example:
```python
# Assume you have an OpenAI API key configured
# We'll explain this properly in the next chapter!
gpt3_turbo = dspy.OpenAI(model='gpt-3.5-turbo')
dspy.settings.configure(lm=gpt3_turbo)
```
This tells DSPy to use the `gpt-3.5-turbo` model for any LM calls.
**3. Create and Use `dspy.Predict`:**
Now we can create our translator module using `dspy.Predict` and our signature.
```python
# Create a Predict module using our signature
translator = dspy.Predict(TranslateToFrench)
# Prepare the input data
english_input = "Hello, how are you?"
# Call the predictor with the input field name from the signature
result = translator(english_sentence=english_input)
# Access the output field name from the signature
print(f"English: {english_input}")
print(f"French: {result.french_sentence}")
```
**What happens here?**
1. `translator = dspy.Predict(TranslateToFrench)`: We create an instance of `Predict`, telling it to use the `TranslateToFrench` signature.
2. `result = translator(english_sentence=english_input)`: We *call* the `translator` module like a function. We pass the input using the keyword argument `english_sentence`, which matches the `InputField` name in our signature.
3. `result.french_sentence`: `Predict` works its magic! It builds a prompt (using the signature's instructions and the input), sends it to GPT-3.5 Turbo, gets the French translation back, parses it, and stores it in the `result` object. We access the translation using the `OutputField` name, `french_sentence`.
**Expected Output (might vary slightly based on the LM):**
```
English: Hello, how are you?
French: Bonjour, comment ça va?
```
It worked! `dspy.Predict` successfully used the LM to perform the translation task defined by our signature.
## Giving Examples (Few-Shot Learning)
Sometimes, just instructions aren't enough for the LM to understand the *exact format* or style you want. You can provide a few examples (`dspy.Example` objects from [Chapter 3: Example](03_example.md)) to guide it better. This is called "few-shot learning".
You pass these examples using the `demos` argument when calling the `Predict` module.
```python
# Create some example translations (from Chapter 3)
demo1 = dspy.Example(english_sentence="Good morning!", french_sentence="Bonjour!")
demo2 = dspy.Example(english_sentence="Thank you.", french_sentence="Merci.")
# Our translator module (same as before)
translator = dspy.Predict(TranslateToFrench)
# Input we want to translate
english_input = "See you later."
# Call the predictor, this time providing demos
result_with_demos = translator(
    english_sentence=english_input,
    demos=[demo1, demo2] # Pass our examples here!
)
print(f"English: {english_input}")
print(f"French (with demos): {result_with_demos.french_sentence}")
```
**What's different?**
* We created `demo1` and `demo2`, which are `dspy.Example` objects containing both the English and French sentences.
* We passed `demos=[demo1, demo2]` when calling `translator`.
Now, `dspy.Predict` will format the prompt to include these examples *before* asking the LM to translate the new input. This often leads to more accurate or better-formatted results, especially for complex tasks.
**Expected Output (likely similar, but potentially more consistent):**
```
English: See you later.
French (with demos): À plus tard.
```
## How It Works Under the Hood
What actually happens when you call `translator(english_sentence=...)`?
1. **Gather Information:** The `Predict` module (`translator`) gets the input value (`"Hello, how are you?"`) and any `demos` provided. It already knows its `Signature` (`TranslateToFrench`).
2. **Format Prompt:** It constructs a text prompt for the LM. This prompt usually includes:
* The `Signature`'s instructions (`"Translates English text to French."`).
* The `demos` (if provided), formatted clearly (e.g., "English: Good morning!\nFrench: Bonjour!\n---\nEnglish: Thank you.\nFrench: Merci.\n---").
* The current input, labeled according to the `Signature` (`"English: Hello, how are you?"`).
* A label indicating where the LM should put its answer (`"French:"`).
3. **LM Call:** The `Predict` module sends this complete prompt string to the configured [LM](05_lm__language_model_client_.md) (e.g., GPT-3.5 Turbo).
4. **Receive Completion:** The LM generates text based on the prompt (e.g., it might return `"Bonjour, comment ça va?"`).
5. **Parse Output:** `Predict` looks at the `Signature`'s `OutputField`s (`french_sentence`). It parses the LM's completion to extract the value corresponding to `french_sentence`.
6. **Return Prediction:** It bundles the extracted output(s) into a `dspy.Prediction` object and returns it. You can then access the results like `result.french_sentence`.
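If you want to see the exact prompt that was sent, the LM client keeps a history of calls (see `inspect_history` on `BaseLM` in [Chapter 5](05_lm__language_model_client_.md)). A small sketch, assuming the `gpt3_turbo` client configured earlier; the exact output format can vary between DSPy versions:
```python
# Peek at the most recent LM call to see the formatted prompt and the
# raw completion. Assumes the `gpt3_turbo` client from the earlier setup;
# output formatting may differ across DSPy versions.
gpt3_turbo.inspect_history(n=1)
```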
Let's visualize this flow:
```mermaid
sequenceDiagram
participant User
participant PredictModule as translator (Predict)
participant Signature as TranslateToFrench
participant LM as Language Model Client
User->>PredictModule: Call with english_sentence="Hello", demos=[...]
PredictModule->>Signature: Get Instructions, Input/Output Fields
Signature-->>PredictModule: Return structure ("Translate...", "english_sentence", "french_sentence")
PredictModule->>PredictModule: Format prompt (Instructions + Demos + Input + Output Label)
PredictModule->>LM: Send formatted prompt ("Translate...\nEnglish: ...\nFrench: ...\n---\nEnglish: Hello\nFrench:")
LM-->>PredictModule: Return completion text ("Bonjour, comment ça va?")
PredictModule->>Signature: Parse completion for 'french_sentence'
Signature-->>PredictModule: Return parsed value {"french_sentence": "Bonjour, comment ça va?"}
PredictModule-->>User: Return Prediction object (result)
```
The core logic resides in `dspy/predict/predict.py`.
```python
# Simplified view from dspy/predict/predict.py
from dspy.primitives.program import Module
from dspy.primitives.prediction import Prediction
from dspy.signatures.signature import ensure_signature
from dspy.dsp.utils import settings # To get the configured LM
class Predict(Module):
    def __init__(self, signature, **config):
        super().__init__()
        # Store the signature and any extra configuration
        self.signature = ensure_signature(signature)
        self.config = config
        # Other initializations (demos, etc.)
        self.demos = []
        self.lm = None # LM will be set later or taken from settings

    def forward(self, **kwargs):
        # Get signature, demos, and LM (either passed in or from settings)
        signature = self.signature # Use the stored signature
        demos = kwargs.pop("demos", self.demos) # Get demos if provided
        lm = kwargs.pop("lm", self.lm) or settings.lm # Find the LM to use
        # Prepare inputs for the LM call
        inputs = kwargs # Remaining kwargs are the inputs

        # --- This is where the magic happens ---
        # 1. Format the prompt using signature, demos, inputs
        #    (Simplified - actual formatting is more complex)
        prompt = format_prompt(signature, demos, inputs)
        # 2. Call the Language Model
        #    (Simplified - handles retries, multiple generations etc.)
        lm_output_text = lm(prompt, **self.config)
        # 3. Parse the LM's output text based on the signature's output fields
        #    (Simplified - extracts fields like 'french_sentence')
        parsed_output = parse_output(signature, lm_output_text)
        # --- End Magic ---

        # 4. Create and return a Prediction object
        prediction = Prediction(signature=signature, **parsed_output)
        # (Optionally trace the call)
        # settings.trace.append(...)
        return prediction
# (Helper functions format_prompt and parse_output would exist elsewhere)
```
This simplified code shows the key steps: initialize with a signature, and in the `forward` method, use the signature, demos, and inputs to format a prompt, call the LM, parse the output, and return a `Prediction`. The `dspy.Prediction` object itself (defined in `dspy/primitives/prediction.py`) is essentially a specialized container holding the results corresponding to the signature's output fields.
## Conclusion
You've now learned about `dspy.Predict`, the fundamental building block in DSPy for making a single call to a Language Model!
* `dspy.Predict` takes a `Signature` to understand the task (inputs, outputs, instructions).
* It formats a prompt, calls the LM, and parses the response.
* You call it like a function, passing inputs that match the `Signature`'s `InputField`s.
* It returns a `dspy.Prediction` object containing the results, accessible via the `Signature`'s `OutputField` names.
* You can provide few-shot `Example`s via the `demos` argument to guide the LM.
`Predict` is the simplest way to leverage an LM in DSPy. But how do we actually connect DSPy to different LMs like those from OpenAI, Anthropic, Cohere, or even models running on your own machine? That's what we'll explore next!
**Next:** [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)


@@ -0,0 +1,304 @@
# Chapter 5: LM (Language Model Client) - The Engine Room
In [Chapter 4: Predict](04_predict.md), we saw how `dspy.Predict` takes a [Signature](02_signature.md) and input data to magically generate an output. We used our `translator` example:
```python
# translator = dspy.Predict(TranslateToFrench)
# result = translator(english_sentence="Hello, how are you?")
# print(result.french_sentence) # --> Bonjour, comment ça va?
```
But wait... how did `dspy.Predict` *actually* produce that French sentence? It didn't just invent it! It needed to talk to a powerful Language Model (LM) like GPT-3.5, GPT-4, Claude, Llama, or some other AI brain.
How does DSPy connect your program (`dspy.Predict` in this case) to these external AI brains? That's the job of the **LM (Language Model Client)** abstraction!
Think of the LM Client as:
* **The Engine:** It's the core component that provides the "thinking" power to your DSPy modules.
* **The Translator:** It speaks the specific language (API calls, parameters) required by different LM providers (like OpenAI, Anthropic, Cohere, Hugging Face, or models running locally).
* **The Connection:** It bridges the gap between your abstract DSPy code and the concrete LM service.
In this chapter, you'll learn:
* What the LM Client does and why it's crucial.
* How to tell DSPy which Language Model to use.
* How this setup lets you easily switch between different LMs.
* A peek under the hood at how the connection works.
Let's connect our program to an AI brain!
## What Does the LM Client Do?
When a module like `dspy.Predict` needs an LM to generate text, it doesn't make the raw API call itself. Instead, it relies on the configured **LM Client**. The LM Client handles several important tasks:
1. **API Interaction:** It knows how to format the request (the prompt, parameters like `temperature`, `max_tokens`) in the exact way the target LM provider expects. It then makes the actual network call to the provider's API (or interacts with a local model).
2. **Parameter Management:** You can set standard parameters like `temperature` (controlling randomness) or `max_tokens` (limiting output length) when you configure the LM Client. It ensures these are sent correctly with each request.
3. **Authentication:** It usually handles sending your API keys securely (often by reading them from environment variables).
4. **Retries:** If an API call fails due to a temporary issue (like a network glitch or the LM service being busy), the LM Client often automatically retries the request a few times.
5. **Standard Interface:** It provides a consistent way for DSPy modules (`Predict`, `ChainOfThought`, etc.) to interact with *any* supported LM. This means you can swap the underlying LM without changing your module code.
6. **Caching:** To save time and money, the LM Client usually caches responses. If you make the exact same request again, it can return the saved result instantly instead of calling the LM API again.
Essentially, the LM Client abstracts away all the messy details of talking to different AI models, giving your DSPy program a clean and consistent engine to rely on.
## Configuring Which LM to Use
So, how do you tell DSPy *which* LM engine to use? You do this using `dspy.settings.configure`.
First, you need to import and create an instance of the specific client for your desired LM provider. DSPy integrates with many models primarily through the `litellm` library, but also provides direct wrappers for common ones like OpenAI.
**Example: Configuring OpenAI's GPT-3.5 Turbo**
Let's say you want to use OpenAI's `gpt-3.5-turbo` model.
1. **Import the client:**
```python
import dspy
```
*(Note: For many common providers like OpenAI, Anthropic, Cohere, etc., you can use the general `dspy.LM` client which leverages `litellm`)*
2. **Create an instance:** You specify the model name. API keys are typically picked up automatically from environment variables (e.g., `OPENAI_API_KEY`). You can also set default parameters here.
```python
# Use the generic dspy.LM for LiteLLM integration
# Model name follows 'provider/model_name' format for many models
turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=100)
# Or, if you prefer the dedicated OpenAI client wrapper (functionally similar for basic use)
# from dspy.models.openai import OpenAI
# turbo = OpenAI(model='gpt-3.5-turbo', max_tokens=100)
```
This creates an object `turbo` that knows how to talk to the `gpt-3.5-turbo` model via OpenAI's API (using `litellm`'s connection logic) and will limit responses to 100 tokens by default.
3. **Configure DSPy settings:** You tell DSPy globally that this is the LM engine to use for subsequent calls.
```python
dspy.settings.configure(lm=turbo)
```
That's it! Now, any DSPy module (like `dspy.Predict`) that needs to call an LM will automatically use the `turbo` instance we just configured.
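You can also call the client object directly for a quick sanity check (modules normally do this for you). A minimal sketch using the `turbo` instance from above; per the `BaseLM` interface shown later in this chapter, the call returns a list of completion strings, and repeating an identical request is typically served from the cache:
```python
# Quick sanity check: call the configured client directly.
completions = turbo("Say hello in French.")  # returns a list of strings
print(completions[0])

# The identical request again is usually answered from the cache,
# not by a fresh API call.
print(turbo("Say hello in French.")[0])
```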
**Using Other Models (via `dspy.LM` and LiteLLM)**
The `dspy.LM` client is very powerful because it uses `litellm` under the hood, which supports a vast number of models from providers like Anthropic, Cohere, Google, Hugging Face, Ollama (for local models), and more. You generally just need to change the `model` string.
```python
# Example: Configure Anthropic's Claude 3 Haiku
# (Assumes ANTHROPIC_API_KEY environment variable is set)
# Note: Provider prefix 'anthropic/' is often optional if model name is unique
claude_haiku = dspy.LM(model='anthropic/claude-3-haiku-20240307', max_tokens=200)
dspy.settings.configure(lm=claude_haiku)
# Now DSPy modules will use Claude 3 Haiku
# Example: Configure a local model served via Ollama
# (Assumes Ollama server is running and has the 'llama3' model)
local_llama = dspy.LM(model='ollama/llama3', max_tokens=500, temperature=0.7)
dspy.settings.configure(lm=local_llama)
# Now DSPy modules will use the local Llama 3 model via Ollama
```
You only need to configure the LM **once** (usually at the start of your script).
## How Modules Use the Configured LM
Remember our `translator` module from [Chapter 4: Predict](04_predict.md)?
```python
# Define signature (same as before)
class TranslateToFrench(dspy.Signature):
    """Translates English text to French."""
    english_sentence = dspy.InputField()
    french_sentence = dspy.OutputField()
# Configure the LM (e.g., using OpenAI)
# turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=100)
# dspy.settings.configure(lm=turbo)
# Create the Predict module
translator = dspy.Predict(TranslateToFrench)
# Use the module - NO need to pass the LM here!
result = translator(english_sentence="Hello, how are you?")
print(result.french_sentence)
```
Notice that we didn't pass `turbo` or `claude_haiku` or `local_llama` directly to `dspy.Predict`. When `translator(...)` is called, `dspy.Predict` internally asks `dspy.settings` for the currently configured `lm`. It then uses that client object to handle the actual LM interaction.
## The Power of Swapping LMs
This setup makes it incredibly easy to experiment with different language models. Want to see if Claude does a better job at translation than GPT-3.5? Just change the configuration!
```python
# --- Experiment 1: Using GPT-3.5 Turbo ---
print("Testing with GPT-3.5 Turbo...")
turbo = dspy.LM(model='openai/gpt-3.5-turbo', max_tokens=100)
dspy.settings.configure(lm=turbo)
translator = dspy.Predict(TranslateToFrench)
result_turbo = translator(english_sentence="Where is the library?")
print(f"GPT-3.5: {result_turbo.french_sentence}")
# --- Experiment 2: Using Claude 3 Haiku ---
print("\nTesting with Claude 3 Haiku...")
claude_haiku = dspy.LM(model='anthropic/claude-3-haiku-20240307', max_tokens=100)
dspy.settings.configure(lm=claude_haiku)
# We can reuse the SAME translator object, or create a new one
# It will pick up the NEWLY configured LM from settings
result_claude = translator(english_sentence="Where is the library?")
print(f"Claude 3 Haiku: {result_claude.french_sentence}")
```
**Expected Output:**
```
Testing with GPT-3.5 Turbo...
GPT-3.5: Où est la bibliothèque?
Testing with Claude 3 Haiku...
Claude 3 Haiku: Où se trouve la bibliothèque ?
```
Look at that! We changed the underlying AI brain just by modifying the `dspy.settings.configure` call. The core logic of our `translator` module remained untouched. This flexibility is a key advantage of DSPy.
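If you only need a different LM for part of your program, recent DSPy versions also offer a context manager for temporary overrides. A sketch, assuming `dspy.context` is available in your version and reusing the clients and `translator` from the experiments above:
```python
# Temporary override: the global default stays `turbo`, but inside the
# block modules use `claude_haiku`. (Assumes dspy.context is available.)
dspy.settings.configure(lm=turbo)

with dspy.context(lm=claude_haiku):
    result = translator(english_sentence="Good evening.")  # uses Claude here

# Outside the block, modules fall back to the globally configured `turbo`.
```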
## How It Works Under the Hood (A Peek)
Let's trace what happens when `translator(english_sentence=...)` runs:
1. **Module Execution:** The `forward` method of the `dspy.Predict` module (`translator`) starts executing.
2. **Get LM Client:** Inside its logic, `Predict` needs to call an LM. It accesses `dspy.settings.lm`. This returns the currently configured LM client object (e.g., the `claude_haiku` instance we set).
3. **Format Prompt:** `Predict` uses the [Signature](02_signature.md) and the input (`english_sentence`) to prepare the text prompt.
4. **LM Client Call:** `Predict` calls the LM client object, passing the formatted prompt and any necessary parameters (like `max_tokens` which might come from the client's defaults or be overridden). Let's say it calls `claude_haiku(prompt, max_tokens=100, ...)`.
5. **API Interaction (Inside LM Client):**
* The `claude_haiku` object (an instance of `dspy.LM`) checks its cache first. If the same request was made recently, it might return the cached response directly.
* If not cached, it constructs the specific API request for Anthropic's Claude 3 Haiku model (using `litellm`). This includes setting headers, API keys, and formatting the prompt/parameters correctly for Anthropic.
* It makes the HTTPS request to the Anthropic API endpoint.
* It handles potential retries if the API returns specific errors.
* It receives the raw response from the API.
6. **Parse Response (Inside LM Client):** The client extracts the generated text content from the API response structure.
7. **Return to Module:** The LM client returns the generated text (e.g., `"Où se trouve la bibliothèque ?"`) back to the `dspy.Predict` module.
8. **Module Finishes:** `Predict` takes this text, parses it according to the `OutputField` (`french_sentence`) in the signature, and returns the final `Prediction` object.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant PredictModule as translator (Predict)
participant Settings as dspy.settings
participant LMClient as LM Client (e.g., dspy.LM instance)
participant ActualAPI as Actual LM API (e.g., Anthropic)
User->>PredictModule: Call translator(english_sentence="...")
PredictModule->>Settings: Get configured lm
Settings-->>PredictModule: Return LMClient instance
PredictModule->>PredictModule: Format prompt for LM
PredictModule->>LMClient: __call__(prompt, **params)
LMClient->>LMClient: Check Cache (Cache Miss)
LMClient->>ActualAPI: Send formatted API request (prompt, key, params)
ActualAPI-->>LMClient: Return API response
LMClient->>LMClient: Parse response, extract text
LMClient-->>PredictModule: Return generated text
PredictModule->>PredictModule: Parse text into output fields
PredictModule-->>User: Return Prediction object
```
**Relevant Code Files:**
* `dspy/clients/lm.py`: Defines the main `dspy.LM` class which uses `litellm` for broad compatibility. It handles caching (in-memory and disk via `litellm`), retries, parameter mapping, and calling the appropriate `litellm` functions.
* `dspy/clients/base_lm.py`: Defines the `BaseLM` abstract base class that all LM clients inherit from. It includes the basic `__call__` structure, history tracking, and requires subclasses to implement the core `forward` method for making the actual API call. It also defines `inspect_history`.
* `dspy/models/openai.py` (and others like `anthropic.py`, `cohere.py` - though `dspy.LM` is often preferred now): Specific client implementations (often inheriting from `BaseLM` or using `dspy.LM` internally).
* `dspy/dsp/utils/settings.py`: Defines the `Settings` singleton object where the configured `lm` (and other components like `rm`) are stored and accessed globally or via thread-local context.
```python
# Simplified structure from dspy/clients/base_lm.py
class BaseLM:
def __init__(self, model, **kwargs):
self.model = model
self.kwargs = kwargs # Default params like temp, max_tokens
self.history = [] # Stores records of calls
@with_callbacks # Handles logging, potential custom hooks
def __call__(self, prompt=None, messages=None, **kwargs):
# 1. Call the actual request logic (implemented by subclasses)
response = self.forward(prompt=prompt, messages=messages, **kwargs)
# 2. Extract the output text(s)
outputs = [choice.message.content for choice in response.choices] # Simplified
# 3. Log the interaction (prompt, response, cost, etc.)
# (self.history.append(...))
# 4. Return the list of generated texts
return outputs
def forward(self, prompt=None, messages=None, **kwargs):
# Subclasses MUST implement this method to make the actual API call
# It should return an object similar to OpenAI's API response structure
raise NotImplementedError
# Simplified structure from dspy/clients/lm.py
import litellm
class LM(BaseLM): # Inherits from BaseLM
def __init__(self, model, model_type="chat", ..., num_retries=8, **kwargs):
super().__init__(model=model, **kwargs)
self.model_type = model_type
self.num_retries = num_retries
# ... other setup ...
def forward(self, prompt=None, messages=None, **kwargs):
# Combine default and call-specific kwargs
request_kwargs = {**self.kwargs, **kwargs}
messages = messages or [{"role": "user", "content": prompt}]
# Use litellm to make the call, handles different providers
# Simplified - handles caching, retries, model types under the hood
if self.model_type == "chat":
response = litellm.completion(
model=self.model,
messages=messages,
# Pass combined parameters
**request_kwargs,
# Configure retries and caching via litellm
num_retries=self.num_retries,
# cache=...
)
else: # Text completion model type
response = litellm.text_completion(...) # Simplified
# LiteLLM returns an object compatible with BaseLM's expectations
return response
# Simplified Usage in a Module (like Predict)
# from dspy.dsp.utils import settings
# Inside Predict's forward method:
# lm_client = settings.lm # Get the globally configured client
# prompt_text = self._generate_prompt(...) # Format the prompt
# parameters = self.config # Get parameters specific to this Predict instance
# generated_texts = lm_client(prompt_text, **parameters) # Call the LM Client!
# output_text = generated_texts[0]
# parsed_result = self._parse_output(output_text) # Parse based on signature
# return Prediction(**parsed_result)
```
The key is that modules interact with the standard `BaseLM` interface (primarily its `__call__` method), and the specific LM client implementation handles the rest.
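Because every client follows the same interface, you can plug in your own. Below is a minimal sketch of a toy LM client for offline testing, assuming `BaseLM` is importable as `dspy.BaseLM` (it lives in `dspy/clients/base_lm.py`) and following the simplified `__call__`/`forward` contract shown above; the real `BaseLM` may expect a few extra fields on the response object, so treat this as a shape illustration rather than a drop-in replacement:

```python
import dspy
from types import SimpleNamespace

class EchoLM(dspy.BaseLM):
    """Toy client: 'generates' text by echoing the prompt back."""

    def forward(self, prompt=None, messages=None, **kwargs):
        text = prompt or messages[-1]["content"]
        # Mimic the OpenAI-style response shape that __call__ expects:
        # an object with .choices, each holding .message.content.
        message = SimpleNamespace(content=f"[echo] {text}")
        choice = SimpleNamespace(message=message)
        return SimpleNamespace(choices=[choice], usage={}, model=self.model)

# Configure it like any other LM client (commented out, since it is a toy):
# dspy.settings.configure(lm=EchoLM(model="local/echo"))
```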
## Conclusion
You've now demystified the **LM (Language Model Client)**! It's the essential engine connecting your DSPy programs to the power of large language models.
* The LM Client acts as a **translator** and **engine**, handling API calls, parameters, retries, and caching.
* You configure which LM to use **globally** via `dspy.settings.configure(lm=...)`, usually using `dspy.LM` for broad compatibility via `litellm`.
* DSPy modules like `dspy.Predict` automatically **use the configured LM** without needing it passed explicitly.
* This makes it easy to **swap out different LMs** (like GPT-4, Claude, Llama) with minimal code changes, facilitating experimentation.
Now that we know how to connect to the "brain" (LM), what about connecting to external knowledge sources like databases or document collections? That's where the **RM (Retrieval Model Client)** comes in.
**Next:** [Chapter 6: RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 6: RM (Retrieval Model Client) - Your Program's Librarian
In [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md), we learned how to connect our DSPy programs to the powerful "brain" of a Language Model (LM) using the LM Client. The LM is great at generating creative text, answering questions based on its vast training data, and reasoning.
But what if your program needs information that the LM wasn't trained on?
* Maybe it's very recent news (LMs often have knowledge cut-offs).
* Maybe it's private information from your company's documents.
* Maybe it's specific details from a large technical manual.
LMs can't know *everything*. Sometimes, your program needs to **look things up** in an external knowledge source before it can generate an answer.
Imagine you're building a chatbot that answers questions about your company's latest product manuals. The LM itself probably hasn't read them. Your program needs a way to:
1. Receive the user's question (e.g., "How do I reset the Frobozz device?").
2. **Search** through the product manuals for relevant sections about resetting the Frobozz.
3. Give those relevant sections to the LM as **context**.
4. Ask the LM to generate a final answer based on the user's question *and* the context it just found.
This "search" step is where the **RM (Retrieval Model Client)** comes in!
Think of the RM as:
* **A Specialized Librarian:** Your program asks it to find relevant information on a topic (the query).
* **A Search Engine Interface:** It connects your DSPy program to an external search system or database.
* **The Knowledge Fetcher:** It retrieves relevant text snippets (passages) to help the LM.
In this chapter, you'll learn:
* What an RM Client does and why it's essential for knowledge-intensive tasks.
* How to configure DSPy to use a specific Retrieval Model.
* How DSPy modules can use the configured RM to find information.
* A glimpse into how the RM fetches data behind the scenes.
Let's give our program access to external knowledge!
## What Does the RM Client Do?
The RM Client acts as a bridge between your DSPy program and an external knowledge source. Its main job is to:
1. **Receive a Search Query:** Your program gives it a text query (e.g., "reset Frobozz device").
2. **Interface with a Retrieval System:** It talks to the actual search engine or database. This could be:
* A **Vector Database:** Like Pinecone, Weaviate, Chroma, Milvus (great for searching based on meaning).
* A **Specialized Retrieval API:** Like ColBERTv2 (a powerful neural search model), You.com Search API, or a custom company search API.
* A **Local Index:** A search index built over your own files (e.g., using ColBERT locally).
3. **Fetch Relevant Passages:** It asks the retrieval system to find the top `k` most relevant text documents or passages based on the query.
4. **Return the Passages:** It gives these retrieved passages back to your DSPy program, usually as a list of text strings or structured objects.
The key goal is to provide **relevant context** that the [LM (Language Model Client)](05_lm__language_model_client_.md) can then use to perform its task more accurately, often within a structure called Retrieval-Augmented Generation (RAG).
## Configuring Which RM to Use
Just like we configured the LM in the previous chapter, we need to tell DSPy which RM to use. This is done using `dspy.settings.configure`.
First, you import and create an instance of the specific RM client you want to use. DSPy has built-in clients for several common retrieval systems.
**Example: Configuring ColBERTv2 (a hosted endpoint)**
ColBERTv2 is a powerful retrieval model. Let's imagine there's a public server running ColBERTv2 that has indexed Wikipedia.
1. **Import the client:**
```python
import dspy
```
*(For many RMs like ColBERTv2, Pinecone, Weaviate, the client is directly available under `dspy` or `dspy.retrieve`)*
2. **Create an instance:** You need to provide the URL and port (if applicable) of the ColBERTv2 server.
```python
# Assume a ColBERTv2 server is running at this URL indexing Wikipedia
colbertv2_wiki = dspy.ColBERTv2(url='http://your-colbertv2-endpoint.com:8893', port=None)
```
This creates an object `colbertv2_wiki` that knows how to talk to that specific ColBERTv2 server.
3. **Configure DSPy settings:** Tell DSPy globally that this is the RM to use.
```python
dspy.settings.configure(rm=colbertv2_wiki)
```
Now, any DSPy module that needs to retrieve information will automatically use the `colbertv2_wiki` instance.
**Using Other RMs (e.g., Pinecone, Weaviate)**
Configuring other RMs follows a similar pattern. You'll typically need to provide details like index names, API keys (often via environment variables), and the client object for that specific service.
```python
# Example: Configuring Pinecone (Conceptual - requires setup)
# from dspy.retrieve.pinecone_rm import PineconeRM
# Assumes PINECONE_API_KEY and PINECONE_ENVIRONMENT are set in environment
# pinecone_retriever = PineconeRM(
# pinecone_index_name='my-company-docs-index',
# # Assuming embeddings are done via OpenAI's model
# openai_embed_model='text-embedding-ada-002'
# )
# dspy.settings.configure(rm=pinecone_retriever)
# Example: Configuring Weaviate (Conceptual - requires setup)
# import weaviate
# from dspy.retrieve.weaviate_rm import WeaviateRM
# weaviate_client = weaviate.connect_to_local() # Or connect_to_wcs, etc.
# weaviate_retriever = WeaviateRM(
# weaviate_collection_name='my_manuals',
# weaviate_client=weaviate_client
# )
# dspy.settings.configure(rm=weaviate_retriever)
```
*(Don't worry about the specifics of connecting to Pinecone or Weaviate here; the key takeaway is the `dspy.settings.configure(rm=...)` pattern.)*
## How Modules Use the Configured RM: `dspy.Retrieve`
Usually, you don't call `dspy.settings.rm(...)` directly in your main program logic. Instead, you use a DSPy module designed for retrieval. The most basic one is `dspy.Retrieve`.
The `dspy.Retrieve` module is a simple [Module](01_module___program.md) whose job is to:
1. Take a query as input.
2. Call the currently configured RM (`dspy.settings.rm`).
3. Return the retrieved passages.
Here's how you typically use it within a DSPy `Program`:
```python
import dspy
# Assume RM is already configured (e.g., colbertv2_wiki from before)
# dspy.settings.configure(rm=colbertv2_wiki)
class SimpleRAG(dspy.Module):
def __init__(self, num_passages=3):
super().__init__()
# Initialize the Retrieve module, asking for top 3 passages
self.retrieve = dspy.Retrieve(k=num_passages)
# Initialize a Predict module to generate the answer
self.generate_answer = dspy.Predict('context, question -> answer')
def forward(self, question):
# 1. Retrieve relevant context using the configured RM
context = self.retrieve(query=question).passages # Note: Pass query=...
# 2. Generate the answer using the LM, providing context
prediction = self.generate_answer(context=context, question=question)
return prediction
# --- Let's try it ---
# Assume LM is also configured (e.g., gpt3_turbo from Chapter 5)
# dspy.settings.configure(lm=gpt3_turbo)
rag_program = SimpleRAG()
question = "What is the largest rodent?"
result = rag_program(question=question)
print(f"Question: {question}")
# The retrieve module would fetch passages about rodents...
# print(f"Context: {context}") # (Would show passages about capybaras, etc.)
print(f"Answer: {result.answer}")
```
**What's happening?**
1. `self.retrieve = dspy.Retrieve(k=3)`: Inside our `SimpleRAG` program, we create an instance of `dspy.Retrieve`. We tell it we want the top `k=3` passages.
2. `context = self.retrieve(query=question).passages`: In the `forward` method, we call the `retrieve` module with the input `question` as the `query`.
* **Crucially:** The `dspy.Retrieve` module automatically looks up `dspy.settings.rm` (our configured `colbertv2_wiki`).
* It calls `colbertv2_wiki(question, k=3)`.
* The RM client fetches the passages.
* `dspy.Retrieve` returns a `dspy.Prediction` object, and we access the list of passage texts using `.passages`.
3. `self.generate_answer(context=context, question=question)`: We then pass the fetched `context` (along with the original `question`) to our `generate_answer` module (a `dspy.Predict` instance), which uses the configured [LM](05_lm__language_model_client_.md) to produce the final answer.
**Expected Output (using a Wikipedia RM and a capable LM):**
```
Question: What is the largest rodent?
Answer: The largest rodent is the capybara.
```
The `dspy.Retrieve` module handles the interaction with the configured RM seamlessly.
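One small convenience worth knowing: the `k` you pass at construction time is only a default. Per the simplified `Retrieve.forward(query, k=None)` shown later in this chapter, you can override it per call (this sketch assumes an RM is configured as above):

```python
retrieve = dspy.Retrieve(k=3)

# Ask for more passages just for this one query.
top5 = retrieve(query="What is the largest rodent?", k=5).passages
print(len(top5))  # at most 5 passages
```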
## Calling the RM Directly (for Testing)
While `dspy.Retrieve` is the standard way, you *can* call the configured RM directly if you want to quickly test it or see what it returns.
```python
import dspy
# Assume colbertv2_wiki is configured as the RM
# dspy.settings.configure(rm=colbertv2_wiki)
query = "Stanford University mascot"
k = 2 # Ask for top 2 passages
# Call the configured RM directly
retrieved_passages = dspy.settings.rm(query, k=k)
# Print the results
print(f"Query: {query}")
print(f"Retrieved Passages (Top {k}):")
for i, passage in enumerate(retrieved_passages):
# RM clients often return dotdict objects with 'long_text'
print(f"--- Passage {i+1} ---")
print(passage.long_text) # Access the text content
```
**Expected Output (might vary depending on the RM and its index):**
```
Query: Stanford University mascot
Retrieved Passages (Top 2):
--- Passage 1 ---
Stanford Tree | Stanford University Athletics The Stanford Tree is the Stanford Band's mascot and the unofficial mascot of Stanford University. Stanford's team name is "Cardinal", referring to the vivid red color (not the bird as at several other schools). The Tree, in various versions, has been called one of America's most bizarre and controversial college mascots. The tree costume is created anew by the Band member selected to be the Tree each year. The Tree appears at football games, basketball games, and other Stanford Athletic events. Any current student may petition to become the Tree for the following year....
--- Passage 2 ---
Stanford Cardinal | The Official Site of Stanford Athletics Stanford University is home to 36 varsity sports programs, 20 for women and 16 for men. Stanford participates in the NCAA's Division I (Football Bowl Subdivision subdivision for football). Stanford is a member of the Pac-12 Conference in most sports; the men's and women's water polo teams are members of the Mountain Pacific Sports Federation, the men's volleyball team is a member of the Mountain Pacific Sports Federation, the field hockey team is a member of the America East Conference, and the sailing team competes in the Pacific Coast Collegiate Sailing Conference....
```
This shows how you can directly interact with the RM client configured in `dspy.settings`. Notice the output is often a list of `dspy.dsp.utils.dotdict` objects, where the actual text is usually in the `long_text` attribute. `dspy.Retrieve` conveniently extracts just the text into its `.passages` list.
## How It Works Under the Hood
Let's trace the journey of a query when using `dspy.Retrieve` within our `SimpleRAG` program:
1. **Module Call:** The `SimpleRAG` program's `forward` method calls `self.retrieve(query="What is the largest rodent?")`.
2. **Get RM Client:** The `dspy.Retrieve` module (`self.retrieve`) needs an RM. It looks up `dspy.settings.rm`. This returns the configured RM client object (e.g., our `colbertv2_wiki` instance).
3. **RM Client Call:** The `Retrieve` module calls the RM client object's `forward` (or `__call__`) method, passing the query and `k` (e.g., `colbertv2_wiki("What is the largest rodent?", k=3)`).
4. **External Interaction (Inside RM Client):**
* The `colbertv2_wiki` object (an instance of `dspy.ColBERTv2`) constructs an HTTP request to the ColBERTv2 server URL (`http://your-colbertv2-endpoint.com:8893`). The request includes the query and `k`.
* It sends the request over the network.
* The external ColBERTv2 server receives the request, searches its index (e.g., Wikipedia), and finds the top 3 relevant passages.
* The server sends the passages back in the HTTP response (often as JSON).
5. **Parse Response (Inside RM Client):** The `colbertv2_wiki` client receives the response, parses the JSON, and converts the passages into a list of `dspy.dsp.utils.dotdict` objects (each containing `long_text`, potentially `pid`, `score`, etc.).
6. **Return to Module:** The RM client returns this list of `dotdict` passages back to the `dspy.Retrieve` module.
7. **Extract Text:** The `Retrieve` module takes the list of `dotdict` objects and extracts the `long_text` from each, creating a simple list of strings.
8. **Return Prediction:** It packages this list of strings into a `dspy.Prediction` object under the `passages` key and returns it to the `SimpleRAG` program.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant RAGProgram as SimpleRAG (forward)
participant RetrieveMod as dspy.Retrieve
participant Settings as dspy.settings
participant RMClient as RM Client (e.g., ColBERTv2)
participant ExtSearch as External Search (e.g., ColBERT Server)
User->>RAGProgram: Call with question="..."
RAGProgram->>RetrieveMod: Call retrieve(query=question)
RetrieveMod->>Settings: Get configured rm
Settings-->>RetrieveMod: Return RMClient instance
RetrieveMod->>RMClient: __call__(query, k=3)
RMClient->>ExtSearch: Send Search Request (query, k)
ExtSearch-->>RMClient: Return Found Passages
RMClient->>RMClient: Parse Response into dotdicts
RMClient-->>RetrieveMod: Return list[dotdict]
RetrieveMod->>RetrieveMod: Extract 'long_text' into list[str]
RetrieveMod-->>RAGProgram: Return Prediction(passages=list[str])
RAGProgram->>RAGProgram: Use context for LM call...
RAGProgram-->>User: Return final answer
```
**Relevant Code Files:**
* `dspy/retrieve/retrieve.py`: Defines the `dspy.Retrieve` module. Its `forward` method gets the query, retrieves the RM from `dspy.settings`, calls the RM, and processes the results into a `Prediction`.
* `dspy/dsp/colbertv2.py`: Defines the `dspy.ColBERTv2` client. Its `__call__` method makes HTTP requests (`requests.get` or `requests.post`) to a ColBERTv2 endpoint and parses the JSON response. (Other clients like `dspy/retrieve/pinecone_rm.py` or `dspy/retrieve/weaviate_rm.py` contain logic specific to those services).
* `dspy/dsp/utils/settings.py`: Where the configured `rm` instance is stored and accessed globally (as seen in [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md)).
```python
# Simplified view from dspy/retrieve/retrieve.py
import dspy
from typing import Optional
from dspy.primitives.prediction import Prediction
class Retrieve(dspy.Module):
def __init__(self, k=3):
super().__init__()
self.k = k
def forward(self, query: str, k: Optional[int] = None) -> Prediction:
# Determine how many passages to retrieve
k = k if k is not None else self.k
# Get the configured RM client from global settings
rm_client = dspy.settings.rm
if not rm_client:
raise AssertionError("No RM is loaded. Configure with dspy.settings.configure(rm=...).")
# Call the RM client instance
# The RM client handles communication with the actual search system
passages_or_dotdicts = rm_client(query, k=k) # e.g., calls colbertv2_wiki(query, k=k)
# Ensure output is iterable and extract text
# (Simplified - handles different return types from RMs)
if isinstance(passages_or_dotdicts, list) and hasattr(passages_or_dotdicts[0], 'long_text'):
passages = [psg.long_text for psg in passages_or_dotdicts]
else:
# Assume it's already a list of strings or handle other cases
passages = list(passages_or_dotdicts)
# Return passages wrapped in a Prediction object
return Prediction(passages=passages)
# Simplified view from dspy/dsp/colbertv2.py
import requests
from dspy.dsp.utils import dotdict
class ColBERTv2:
def __init__(self, url: str, port: Optional[int] = None, **kwargs):
self.url = f"{url}:{port}" if port else url
# ... other init ...
def __call__(self, query: str, k: int = 10, **kwargs) -> list[dotdict]:
# Construct the payload for the API request
payload = {"query": query, "k": k}
try:
# Make the HTTP GET request to the ColBERTv2 server
res = requests.get(self.url, params=payload, timeout=10)
res.raise_for_status() # Raise an exception for bad status codes
# Parse the JSON response
json_response = res.json()
topk = json_response.get("topk", [])[:k]
# Convert results into dotdict objects for consistency
passages = [dotdict({**d, "long_text": d.get("text", "")}) for d in topk]
return passages
except requests.exceptions.RequestException as e:
print(f"Error calling ColBERTv2 server: {e}")
return [] # Return empty list on error
```
The key idea is abstraction: `dspy.Retrieve` uses whatever RM is configured in `dspy.settings`, and the specific RM client hides the details of talking to its particular backend search system.
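To make that abstraction concrete, here is a minimal sketch of a custom RM. It assumes only the `(query, k)` calling convention and the `dotdict`/`long_text` shape used in the simplified code above, and ranks passages by naive keyword overlap; fine for experiments, not a real retriever:

```python
import dspy
from dspy.dsp.utils import dotdict

class TinyLocalRM:
    """Toy retriever: keyword overlap over an in-memory list of passages."""

    def __init__(self, passages):
        self.passages = passages

    def __call__(self, query, k=3, **kwargs):
        words = set(query.lower().split())
        ranked = sorted(
            self.passages,
            key=lambda p: len(words & set(p.lower().split())),
            reverse=True,
        )
        # Return dotdicts with `long_text`, the shape dspy.Retrieve expects.
        return [dotdict(long_text=p) for p in ranked[:k]]

# docs = ["The capybara is the largest living rodent.", "Paris is in France."]
# dspy.settings.configure(rm=TinyLocalRM(docs))
# print(dspy.Retrieve(k=1)(query="largest rodent").passages)
```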
## Conclusion
You've now met the **RM (Retrieval Model Client)**, your DSPy program's connection to external knowledge sources!
* An RM acts like a **librarian** or **search engine interface**.
* It takes a **query** and fetches **relevant text passages** from systems like vector databases (Pinecone, Weaviate) or APIs (ColBERTv2).
* It provides crucial **context** for LMs, enabling tasks like answering questions about recent events or private documents (Retrieval-Augmented Generation - RAG).
* You configure it globally using `dspy.settings.configure(rm=...)`.
* The `dspy.Retrieve` module is the standard way to use the configured RM within your programs.
With LMs providing reasoning and RMs providing knowledge, we can build powerful DSPy programs. But how do we know if our program is actually working well? How do we measure its performance? That's where evaluation comes in!
**Next:** [Chapter 7: Evaluate](07_evaluate.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 7: Evaluate - Grading Your Program
In the previous chapter, [Chapter 6: RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md), we learned how to connect our DSPy program to external knowledge sources using Retrieval Models (RMs). We saw how combining RMs with Language Models (LMs) allows us to build sophisticated programs like Retrieval-Augmented Generation (RAG) systems.
Now that we can build these powerful programs, a crucial question arises: **How good are they?** If we build a RAG system to answer questions, how often does it get the answer right? How do we measure its performance objectively?
This is where **`dspy.Evaluate`** comes in! It's DSPy's built-in tool for testing and grading your programs.
Think of `dspy.Evaluate` as:
* **An Automated Grader:** Like a teacher grading a batch of homework assignments based on an answer key.
* **A Test Suite Runner:** Similar to how software developers use test suites to check if their code works correctly.
* **Your Program's Report Card:** It gives you a score that tells you how well your DSPy program is performing on a specific set of tasks.
In this chapter, you'll learn:
* What you need to evaluate a DSPy program.
* How to define a metric (a grading rule).
* How to use `dspy.Evaluate` to run the evaluation and get a score.
* How it works behind the scenes.
Let's learn how to grade our DSPy creations!
## The Ingredients for Evaluation
To grade your program using `dspy.Evaluate`, you need three main ingredients:
1. **Your DSPy `Program`:** The program you want to test. This could be a simple `dspy.Predict` module or a complex multi-step program like the `SimpleRAG` we sketched out in the last chapter.
2. **A Dataset (`devset`):** A list of `dspy.Example` objects ([Chapter 3: Example](03_example.md)). Crucially, these examples must contain not only the **inputs** your program expects but also the **gold standard outputs** (the correct answers or desired results) that you want to compare against. This dataset is often called a "development set" or "dev set".
3. **A Metric Function (`metric`):** A Python function you define. This function takes one gold standard `Example` and the `Prediction` generated by your program for that example's inputs. It then compares them and returns a score indicating how well the prediction matched the gold standard. The score is often `1.0` for a perfect match and `0.0` for a mismatch, but it can also be a fractional score (e.g., for F1 score).
`dspy.Evaluate` takes these three ingredients, runs your program on all examples in the dataset, uses your metric function to score each prediction against the gold standard, and finally reports the average score across the entire dataset.
## Evaluating a Simple Question Answering Program
Let's illustrate this with a simple example. Suppose we have a basic DSPy program that's supposed to answer simple questions.
```python
import dspy
# Assume we have configured an LM client (Chapter 5)
# gpt3_turbo = dspy.LM(model='openai/gpt-3.5-turbo')
# dspy.settings.configure(lm=gpt3_turbo)
# A simple program using dspy.Predict (Chapter 4)
class BasicQA(dspy.Module):
def __init__(self):
super().__init__()
# Use a simple signature: question -> answer
self.predictor = dspy.Predict('question -> answer')
def forward(self, question):
return self.predictor(question=question)
# Create an instance of our program
qa_program = BasicQA()
```
Now, let's prepare the other ingredients for evaluation.
**1. Prepare the Dataset (`devset`)**
We need a list of `dspy.Example` objects, each containing a `question` (input) and the correct `answer` (gold standard output).
```python
# Create example data points with questions and gold answers
dev_example1 = dspy.Example(question="What color is the sky?", answer="blue")
dev_example2 = dspy.Example(question="What is 2 + 2?", answer="4")
dev_example3 = dspy.Example(question="What is the capital of France?", answer="Paris")
dev_example_wrong = dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare") # Let's assume our QA program might get this wrong
# Create the development set (list of examples)
devset = [dev_example1, dev_example2, dev_example3, dev_example_wrong]
# We need to tell DSPy which fields are inputs vs outputs for evaluation
# The .with_inputs() method marks the input keys.
# The remaining keys ('answer' in this case) are treated as labels.
devset = [d.with_inputs('question') for d in devset]
```
Here, we've created a small dataset `devset` with four question-answer pairs. We used `.with_inputs('question')` to mark the `question` field as the input; `dspy.Evaluate` will automatically treat the remaining field (`answer`) as the gold label to compare against.
**2. Define a Metric Function (`metric`)**
We need a function that compares the program's predicted answer to the gold answer in an example. Let's create a simple "exact match" metric.
```python
def simple_exact_match_metric(gold_example, prediction, trace=None):
# Does the predicted 'answer' EXACTLY match the gold 'answer'?
# '.answer' field comes from our Predict signature 'question -> answer'
# 'gold_example.answer' is the gold label from the devset example
return prediction.answer == gold_example.answer
# Note: DSPy often provides common metrics too, like dspy.evaluate.answer_exact_match
# import dspy.evaluate
# metric = dspy.evaluate.answer_exact_match
```
Our `simple_exact_match_metric` function takes the gold `dspy.Example` (`gold_example`) and the program's output `dspy.Prediction` (`prediction`). It returns `True` (which Python treats as `1.0`) if the predicted `answer` matches the gold `answer`, and `False` (`0.0`) otherwise. The `trace` argument is optional and can be ignored for basic metrics; it sometimes contains information about the program's execution steps.
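Exact string equality is strict: "William Shakespeare" versus "Shakespeare" counts as a miss. Because the metric is just a Python function, you can loosen it however you like. Here is a small sketch of a more forgiving variant (not a built-in DSPy metric):

```python
def lenient_match_metric(gold_example, prediction, trace=None):
    pred = prediction.answer.strip().lower()
    gold = gold_example.answer.strip().lower()
    # Accept an exact (case-insensitive) match, or the gold answer
    # appearing anywhere inside the predicted answer.
    return pred == gold or gold in pred
```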
**3. Create and Run `dspy.Evaluate`**
Now we have all the ingredients: `qa_program`, `devset`, and `simple_exact_match_metric`. Let's use `dspy.Evaluate`.
```python
from dspy.evaluate import Evaluate
# 1. Create the Evaluator instance
evaluator = Evaluate(
devset=devset, # The dataset to evaluate on
metric=simple_exact_match_metric, # The function to score predictions
num_threads=4, # Run 4 evaluations in parallel (optional)
display_progress=True, # Show a progress bar (optional)
display_table=True # Display results in a table (optional)
)
# 2. Run the evaluation by calling the evaluator with the program
# This will run qa_program on each example in devset,
# score it using simple_exact_match_metric, and return the average score.
average_score = evaluator(qa_program)
print(f"Average Score: {average_score}%")
```
**What happens here?**
1. We create an `Evaluate` object, providing our dataset and metric. We also request parallel execution (`num_threads=4`) for speed and ask for progress/table display.
2. We call the `evaluator` instance with our `qa_program`.
3. `Evaluate` iterates through `devset`:
* For `dev_example1`, it calls `qa_program(question="What color is the sky?")`. Let's assume the program predicts `answer="blue"`.
* It calls `simple_exact_match_metric(dev_example1, predicted_output)`. Since `"blue" == "blue"`, the score is `1.0`.
* It does the same for `dev_example2` (input: "What is 2 + 2?"). Assume prediction is `answer="4"`. Score: `1.0`.
* It does the same for `dev_example3` (input: "What is the capital of France?"). Assume prediction is `answer="Paris"`. Score: `1.0`.
* It does the same for `dev_example_wrong` (input: "Who wrote Hamlet?"). Maybe the simple LM messes up and predicts `answer="William Shakespeare"`. Since `"William Shakespeare" != "Shakespeare"`, the score is `0.0`.
4. `Evaluate` calculates the average score: `(1.0 + 1.0 + 1.0 + 0.0) / 4 = 0.75`.
5. It prints the average score as a percentage.
**Expected Output:**
A progress bar will be shown (if `tqdm` is installed), followed by a table like this (requires `pandas`):
```text
Average Metric: 3 / 4 (75.0%)
question answer simple_exact_match_metric
0 What color is the sky? blue ✔️ [True]
1 What is 2 + 2? 4 ✔️ [True]
2 What is the capital of France? Paris ✔️ [True]
3 Who wrote Hamlet? Shakespeare
```
*(Note: The table shows the predicted answer if different, otherwise just the metric outcome. The exact table format might vary slightly).*
And finally:
```text
Average Score: 75.0%
```
This tells us our simple QA program achieved 75% accuracy on our small development set using the exact match criterion.
## Getting More Details (Optional Flags)
Sometimes, just the average score isn't enough. You might want to see the score for each individual example or the actual predictions made by the program. `Evaluate` provides flags for this:
* `return_all_scores=True`: Returns the average score *and* a list containing the individual score for each example.
* `return_outputs=True`: Returns the average score *and* a list of tuples, where each tuple contains `(example, prediction, score)`.
```python
# Re-run evaluation asking for more details
evaluator_detailed = Evaluate(devset=devset, metric=simple_exact_match_metric)
# Get individual scores
avg_score, individual_scores = evaluator_detailed(qa_program, return_all_scores=True)
print(f"Individual Scores: {individual_scores}") # Output: [True, True, True, False]
# Get full outputs
avg_score, outputs_list = evaluator_detailed(qa_program, return_outputs=True)
# outputs_list[0] would be roughly: (dev_example1, Prediction(answer='blue'), True)
# outputs_list[3] would be roughly: (dev_example_wrong, Prediction(answer='William Shakespeare'), False)
print(f"Number of outputs returned: {len(outputs_list)}") # Output: 4
```
These flags are useful for more detailed error analysis to understand *where* your program is failing.
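For instance, here is a quick sketch that lists only the failures, assuming the `outputs_list` of `(example, prediction, score)` tuples returned above:

```python
for example, prediction, score in outputs_list:
    if not score:  # score is False / 0.0 for a miss
        print("MISS:", example.question)
        print("  gold:", example.answer)
        print("  got :", getattr(prediction, "answer", None))
```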
## How It Works Under the Hood
What happens internally when you call `evaluator(program)`?
1. **Initialization:** The `Evaluate` instance stores the `devset`, `metric`, `num_threads`, and other settings.
2. **Parallel Executor:** It creates a `ParallelExecutor` (if `num_threads > 1`) to manage running the evaluations concurrently.
3. **Iteration:** It iterates through each `example` in the `devset`.
4. **Program Execution:** For each `example`, it calls `program(**example.inputs())` (e.g., `qa_program(question=example.question)`). This runs your DSPy program's `forward` method to get a `prediction`.
5. **Metric Calculation:** It calls the provided `metric` function, passing it the original `example` (which contains the gold labels) and the `prediction` object returned by the program (e.g., `metric(example, prediction)`). This yields a `score`.
6. **Error Handling:** If running the program or the metric causes an error for a specific example, `Evaluate` catches it (up to `max_errors`), records a default `failure_score` (usually 0.0), and continues with the rest of the dataset.
7. **Aggregation:** It collects all the individual scores (including failure scores).
8. **Calculate Average:** It computes the average score by summing all scores and dividing by the total number of examples in the `devset`.
9. **Return Results:** It returns the average score (and optionally the individual scores or full output tuples based on the flags).
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant Evaluator as dspy.Evaluate
participant Executor as ParallelExecutor
participant Program as Your DSPy Program
participant Metric as Your Metric Function
User->>Evaluator: __call__(program)
Evaluator->>Executor: Create (manages threads)
loop For each example in devset
Executor->>Executor: Assign task to a thread
Note over Executor, Program: In parallel thread:
Executor->>Program: Call program(**example.inputs())
Program-->>Executor: Return prediction
Executor->>Metric: Call metric(example, prediction)
Metric-->>Executor: Return score
end
Executor->>Evaluator: Collect all results (predictions, scores)
Evaluator->>Evaluator: Calculate average score
Evaluator-->>User: Return average score (and other requested data)
```
**Relevant Code Files:**
* `dspy/evaluate/evaluate.py`: Defines the `Evaluate` class.
* The `__init__` method stores the configuration.
* The `__call__` method orchestrates the evaluation: sets up the `ParallelExecutor`, defines the `process_item` function (which runs the program and metric for one example), executes it over the `devset`, aggregates results, and handles display/return logic.
* `dspy/utils/parallelizer.py`: Contains the `ParallelExecutor` class used for running tasks concurrently across multiple threads or processes.
* `dspy/evaluate/metrics.py`: Contains implementations of common metrics like `answer_exact_match`.
```python
# Simplified view from dspy/evaluate/evaluate.py
# ... imports ...
from dspy.utils.parallelizer import ParallelExecutor
class Evaluate:
def __init__(self, devset, metric, num_threads=1, ..., failure_score=0.0):
self.devset = devset
self.metric = metric
self.num_threads = num_threads
self.display_progress = ...
self.display_table = ...
# ... store other flags ...
self.failure_score = failure_score
# @with_callbacks # Decorator handles optional logging/callbacks
def __call__(self, program, metric=None, devset=None, ...):
# Use provided args or fall back to instance attributes
metric = metric if metric is not None else self.metric
devset = devset if devset is not None else self.devset
num_threads = ... # Similar logic for other args
# Create executor for parallelism
executor = ParallelExecutor(num_threads=num_threads, ...)
# Define the work to be done for each example
def process_item(example):
try:
# Run the program with the example's inputs
prediction = program(**example.inputs())
# Call the metric function with the gold example and prediction
score = metric(example, prediction)
return prediction, score
except Exception as e:
# Handle errors during program/metric execution
# Log error, return None or failure score
print(f"Error processing example: {e}")
return None # Executor will handle None later
# Execute process_item for all examples in devset using the executor
raw_results = executor.execute(process_item, devset)
# Process results, handle failures (replace None with failure score)
results = []
for i, r in enumerate(raw_results):
example = devset[i]
if r is None: # Execution failed for this example
prediction, score = dspy.Prediction(), self.failure_score
else:
prediction, score = r
results.append((example, prediction, score))
# Calculate the average score
total_score = sum(score for *_, score in results)
num_examples = len(devset)
average_score = round(100 * total_score / num_examples, 2) if num_examples > 0 else 0
# Display table if requested
if self.display_table:
self._display_result_table(...) # Internal helper function
# Return results based on flags (return_all_scores, return_outputs)
# ... logic to construct return tuple ...
return average_score # Base return value
```
The core logic involves running the program and the metric function for each data point, handling potential errors, and averaging the results, with parallel processing to speed things up.
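Stripped of parallelism, error handling, and display, that core loop is only a few lines. A sketch, assuming `qa_program`, `devset`, and `simple_exact_match_metric` from earlier in this chapter:

```python
def evaluate_sequentially(program, devset, metric):
    scores = []
    for example in devset:
        prediction = program(**example.inputs())            # run the program
        scores.append(float(metric(example, prediction)))   # grade it
    return round(100 * sum(scores) / len(devset), 2)

# evaluate_sequentially(qa_program, devset, simple_exact_match_metric)  # e.g. 75.0
```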
## Conclusion
You've now learned about `dspy.Evaluate`, the standard way to measure the performance of your DSPy programs!
* `Evaluate` acts as an **automated grader** for your DSPy modules.
* It requires three ingredients: your **program**, a **dataset (`devset`)** with gold labels, and a **metric function** to compare predictions against labels.
* It runs the program on the dataset, applies the metric, and reports the **average score**.
* It supports **parallel execution** for speed and offers options to display progress, show results tables, and return detailed outputs.
Knowing how well your program performs is essential. But what if the score isn't good enough? How can we *improve* the program, perhaps by automatically finding better prompts or few-shot examples?
That's precisely what **Teleprompters** (Optimizers) are designed for! Let's dive into how DSPy can help automatically optimize your programs next.
**Next:** [Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 8: Teleprompter / Optimizer - Your Program's Coach
Welcome to Chapter 8! In [Chapter 7: Evaluate](07_evaluate.md), we learned how to grade our DSPy programs using metrics and datasets to see how well they perform. That's great for knowing our score, but what if the score isn't high enough?
Think about building our `BasicQA` program from the last chapter. Maybe we tried running it and found it only got 75% accuracy. How do we improve it?
Traditionally, we might start **manually tweaking prompts**:
* "Maybe I should rephrase the instructions?"
* "Should I add some examples (few-shot demonstrations)?"
* "Which examples work best?"
This manual process, often called "prompt engineering," can be slow and tedious, and it requires a lot of guesswork. Wouldn't it be amazing if DSPy could **automatically figure out the best prompts and examples** for us?
That's exactly what **Teleprompters** (also called Optimizers) do! They are DSPy's built-in automated prompt engineers and program tuners.
Think of a Teleprompter as a **coach** for your DSPy program (the 'student'):
* The coach observes how the student performs on practice drills (a dataset).
* It uses feedback (a metric) to figure out weaknesses.
* It suggests new strategies (better instructions, better examples) to improve performance.
* It repeats this until the student performs much better!
In this chapter, we'll learn:
* What a Teleprompter is and the problem it solves.
* The key ingredients needed to use a Teleprompter.
* How to use a simple Teleprompter (`BootstrapFewShot`) to automatically find good few-shot examples.
* The basic idea behind how Teleprompters optimize programs.
Let's automate the improvement process!
## What is a Teleprompter / Optimizer?
A `Teleprompter` in DSPy is an algorithm that takes your DSPy [Program](01_module___program.md) (the 'student') and automatically tunes its internal parameters to maximize performance on a given task. These parameters are most often:
1. **Instructions:** The natural language guidance given to the Language Models ([LM](05_lm__language_model_client_.md)) within your program's modules (like `dspy.Predict`).
2. **Few-Shot Examples (Demos):** The `dspy.Example` objects provided in prompts to show the LM how to perform the task.
Some advanced Teleprompters can even fine-tune the weights of the LM itself!
To work its magic, a Teleprompter needs three things (sound familiar? They're similar to evaluation!):
1. **The Student Program:** The DSPy program you want to improve.
2. **A Training Dataset (`trainset`):** A list of `dspy.Example` objects ([Chapter 3: Example](03_example.md)) representing the task. The Teleprompter will use this data to practice and learn.
3. **A Metric Function (`metric`):** The same kind of function we used in [Chapter 7: Evaluate](07_evaluate.md). It tells the Teleprompter how well the student program is doing on each example in the `trainset`.
The Teleprompter uses the `metric` to guide its search for better instructions or demos, trying different combinations and keeping the ones that yield the highest score on the `trainset`. The output is an **optimized version of your student program**.
## Use Case: Automatically Finding Good Few-Shot Examples with `BootstrapFewShot`
Let's revisit our `BasicQA` program and the evaluation setup from Chapter 7.
```python
import dspy
from dspy.evaluate import Evaluate
# Assume LM is configured (e.g., dspy.settings.configure(lm=...))
# Our simple program
class BasicQA(dspy.Module):
def __init__(self):
super().__init__()
self.predictor = dspy.Predict('question -> answer')
def forward(self, question):
return self.predictor(question=question)
# Our metric from Chapter 7
def simple_exact_match_metric(gold, prediction, trace=None):
return prediction.answer.lower() == gold.answer.lower()
# Our dataset from Chapter 7 (let's use it as a trainset now)
dev_example1 = dspy.Example(question="What color is the sky?", answer="blue")
dev_example2 = dspy.Example(question="What is 2 + 2?", answer="4")
dev_example3 = dspy.Example(question="What is the capital of France?", answer="Paris")
# Example our program might struggle with initially
dev_example_hard = dspy.Example(question="Who painted the Mona Lisa?", answer="Leonardo da Vinci")
trainset = [dev_example1, dev_example2, dev_example3, dev_example_hard]
trainset = [d.with_inputs('question') for d in trainset]
# Let's evaluate the initial program (likely imperfect)
initial_program = BasicQA()
evaluator = Evaluate(devset=trainset, metric=simple_exact_match_metric, display_progress=False)
initial_score = evaluator(initial_program)
print(f"Initial Score (on trainset): {initial_score}%")
# Might output: Initial Score (on trainset): 75.0% (assuming it fails the last one)
```
Our initial program gets 75%. We could try adding few-shot examples manually, but which ones? And how many?
Let's use `dspy.teleprompt.BootstrapFewShot`. This Teleprompter automatically creates and selects few-shot demonstrations for the predictors in your program.
**1. Import the Teleprompter:**
```python
from dspy.teleprompt import BootstrapFewShot
```
**2. Instantiate the Teleprompter:**
We need to give it the `metric` function it should use to judge success. We can also specify how many candidate demos (`max_bootstrapped_demos`) it should try to find for each predictor.
```python
# Configure the BootstrapFewShot optimizer
# It will use the metric to find successful demonstrations
# max_bootstrapped_demos=4 means it will try to find up to 4 good examples for EACH predictor
config = dict(max_bootstrapped_demos=4, metric=simple_exact_match_metric)
teleprompter = BootstrapFewShot(**config)
```
**3. Compile the Program:**
This is the main step. We call the Teleprompter's `compile` method, giving it our initial `student` program and the `trainset`. It returns a *new*, optimized program.
```python
# Compile the program!
# This runs the optimization process using the trainset.
# It uses a 'teacher' model (often the student itself or a copy)
# to generate traces, finds successful ones via the metric,
# and adds them as demos to the student's predictors.
compiled_program = teleprompter.compile(student=initial_program, trainset=trainset)
# The 'compiled_program' is a new instance of BasicQA,
# but its internal predictor now has few-shot examples added!
```
**What just happened?**
Behind the scenes, `BootstrapFewShot` (conceptually):
* Used a "teacher" program (often a copy of the student or another specified LM configuration) to run each example in the `trainset`.
* For each example, it checked if the teacher's output was correct using our `simple_exact_match_metric`.
* If an example was processed correctly, the Teleprompter saved the input/output pair as a potential "demonstration" (a good example).
* It collected these successful demonstrations.
* It assigned a selection of these good demonstrations (up to `max_bootstrapped_demos`) to the `demos` attribute of the corresponding predictor inside our `compiled_program`; you can inspect this yourself, as sketched below.
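Here is a small sketch for inspecting the result; `named_predictors()` and the `demos` attribute appear in the simplified optimizer code later in this chapter:

```python
for name, predictor in compiled_program.named_predictors():
    print(f"{name}: {len(predictor.demos)} bootstrapped demo(s)")
    for demo in predictor.demos[:1]:
        # Each demo is a dspy.Example captured from a successful teacher run.
        print("  e.g.", demo.question, "->", demo.answer)
```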
**4. Evaluate the Compiled Program:**
Now, let's see if the optimized program performs better on the same `trainset`.
```python
# Evaluate the compiled program
compiled_score = evaluator(compiled_program)
print(f"Compiled Score (on trainset): {compiled_score}%")
# If the optimization worked, the score should be higher!
# Might output: Compiled Score (on trainset): 100.0%
```
If `BootstrapFewShot` found good examples (like the "Mona Lisa" one after the teacher model successfully answered it), the `compiled_program` now has these examples embedded in its prompts, helping the LM perform better on similar questions. We automated the process of finding effective few-shot examples!
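Once you are happy with a compiled program, you will usually want to keep it. DSPy modules provide `save` and `load` for persisting their state (including bootstrapped demos); a sketch, noting that the exact file contents can vary between DSPy versions:

```python
# Persist the optimized state (demos, etc.) to disk...
compiled_program.save("compiled_qa.json")

# ...and restore it later into a fresh instance of the same program.
restored_program = BasicQA()
restored_program.load("compiled_qa.json")
```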
## How Optimization Works (Conceptual)
Different Teleprompters use different strategies, but the core idea is usually:
1. **Goal:** Find program parameters (instructions, demos) that maximize the `metric` score on the `trainset`.
2. **Search Space:** The "space" of all possible instructions or combinations of demos.
3. **Search Strategy:** How the Teleprompter explores this space.
* `BootstrapFewShot`: Generates candidate demos based on successful teacher executions.
* Other optimizers (like `COPRO` or `MIPROv2` mentioned in the code snippets) might use an LM to *propose* new instructions, evaluate them, and iterate. Some use sophisticated search algorithms like Bayesian Optimization or random search.
4. **Evaluation:** Use the `metric` and `trainset` to score each candidate configuration (e.g., a program with specific demos or instructions).
5. **Selection:** Keep the configuration that resulted in the best score.
**Analogy Revisited:**
* **Coach:** The Teleprompter algorithm (`BootstrapFewShot`).
* **Student:** Your DSPy `Program` (`initial_program`).
* **Practice Drills:** The `trainset`.
* **Scoring:** The `metric` function (`simple_exact_match_metric`).
* **Trying Techniques:** Generating/selecting different demos or instructions.
* **Adopting Best Techniques:** Creating the `compiled_program` with the highest-scoring demos/instructions found.
## How It Works Under the Hood (`BootstrapFewShot` Peek)
Let's briefly look at the internal flow for `BootstrapFewShot.compile()`:
1. **Prepare Teacher:** It sets up a 'teacher' program. This is often a copy of the student program, sometimes configured with specific settings (like a higher temperature for more exploration) or potentially using labeled examples if provided (`LabeledFewShot` within `BootstrapFewShot`).
2. **Iterate Trainset:** It goes through each `example` in the `trainset`.
3. **Teacher Execution:** For each `example`, it runs the `teacher` program (`teacher(**example.inputs())`). This happens within a `dspy.settings.context` block to capture the execution `trace`.
4. **Metric Check:** It uses the provided `metric` to compare the `teacher`'s prediction against the `example`'s gold label (`metric(example, prediction, trace)`).
5. **Collect Demos:** If the `metric` returns success (e.g., `True` or a score above a threshold), the Teleprompter extracts the input/output steps from the execution `trace`. Each successful trace step can become a candidate `dspy.Example` demonstration.
6. **Assign Demos:** After iterating through the `trainset`, it takes the collected successful demonstrations (up to `max_bootstrapped_demos`) and assigns them to the `demos` attribute of the corresponding predictors in the `student` program instance.
7. **Return Compiled Student:** It returns the modified `student` program, which now contains the bootstrapped few-shot examples.
```mermaid
sequenceDiagram
participant User
participant Teleprompter as BootstrapFewShot
participant StudentProgram as Student Program
participant TeacherProgram as Teacher Program
participant LM as Language Model
participant Metric as Metric Function
participant CompiledProgram as Compiled Program (Student with Demos)
User->>Teleprompter: compile(student=StudentProgram, trainset=...)
Teleprompter->>TeacherProgram: Set up (copy of student, potentially modified)
loop For each example in trainset
Teleprompter->>TeacherProgram: Run example.inputs()
TeacherProgram->>LM: Make calls (via Predictors)
LM-->>TeacherProgram: Return predictions
TeacherProgram-->>Teleprompter: Return final prediction & trace
Teleprompter->>Metric: Evaluate(example, prediction, trace)
Metric-->>Teleprompter: Return score (success/fail)
alt Metric returns success
Teleprompter->>Teleprompter: Extract demo from trace
end
end
Teleprompter->>StudentProgram: Assign selected demos to predictors
StudentProgram-->>CompiledProgram: Create compiled version
Teleprompter-->>User: Return CompiledProgram
```
**Relevant Code Files:**
* `dspy/teleprompt/teleprompt.py`: Defines the base `Teleprompter` class.
* `dspy/teleprompt/bootstrap.py`: Contains the implementation for `BootstrapFewShot`. Key methods include `compile` (orchestrates the process) and `_bootstrap_one_example` (handles running the teacher and checking the metric for a single training example).
```python
# Simplified view from dspy/teleprompt/bootstrap.py
# ... imports ...
from .teleprompt import Teleprompter
from .vanilla import LabeledFewShot # Used for teacher setup if labeled demos are needed
import dspy
class BootstrapFewShot(Teleprompter):
def __init__(self, metric=None, max_bootstrapped_demos=4, ...):
self.metric = metric
self.max_bootstrapped_demos = max_bootstrapped_demos
# ... other initializations ...
def compile(self, student, *, teacher=None, trainset):
self.trainset = trainset
self._prepare_student_and_teacher(student, teacher) # Sets up self.student and self.teacher
self._prepare_predictor_mappings() # Links student predictors to teacher predictors
self._bootstrap() # Runs the core bootstrapping logic
self.student = self._train() # Assigns collected demos to the student
self.student._compiled = True
return self.student
def _bootstrap(self):
# ... setup ...
self.name2traces = {name: [] for name in self.name2predictor} # Store successful traces per predictor
for example_idx, example in enumerate(tqdm.tqdm(self.trainset)):
# ... logic to stop early if enough demos found ...
success = self._bootstrap_one_example(example, round_idx=0) # Try to get a demo from this example
# ... potentially multiple rounds ...
# ... logging ...
def _bootstrap_one_example(self, example, round_idx=0):
# ... setup teacher context (e.g., temperature) ...
try:
with dspy.settings.context(trace=[], **self.teacher_settings):
# Optionally modify teacher LM settings for exploration
# ...
# Run the teacher program
prediction = self.teacher(**example.inputs())
trace = dspy.settings.trace # Get the execution trace
# Evaluate the prediction using the metric
if self.metric:
metric_val = self.metric(example, prediction, trace)
# Determine success based on metric value/threshold
success = bool(metric_val) # Simplified
else:
success = True # Assume success if no metric provided
except Exception:
success = False
# ... error handling ...
if success:
# If successful, extract demos from the trace
for step in trace:
predictor, inputs, outputs = step
demo = dspy.Example(augmented=True, **inputs, **outputs)
try:
predictor_name = self.predictor2name[id(predictor)]
# Store the successful demo example
self.name2traces[predictor_name].append(demo)
except KeyError:
continue # Handle potential issues finding the predictor
return success
def _train(self):
# Assign the collected demos to the student's predictors
for name, predictor in self.student.named_predictors():
demos_for_predictor = self.name2traces[name][:self.max_bootstrapped_demos]
# Potentially mix with labeled demos if configured
# ...
predictor.demos = demos_for_predictor # Assign the demos!
return self.student
```
This simplified view shows the core loop: run the teacher, check the metric, collect successful traces as demos, and finally assign those demos to the student program.
## Conclusion
You've now learned about DSPy's **Teleprompters / Optimizers**, the powerful tools for automating prompt engineering!
* Teleprompters act like **coaches**, automatically tuning your DSPy programs (students).
* They optimize parameters like **instructions** and **few-shot examples (demos)**.
* They require a **student program**, a **training dataset**, and a **metric** function.
* We saw how `BootstrapFewShot` automatically finds effective few-shot examples by running a teacher model and collecting successful execution traces.
* The result of `teleprompter.compile()` is an **optimized program** instance, ready to be used or evaluated further.
Teleprompters save you from the tedious process of manual tuning, allowing you to build high-performing LM-based programs more efficiently.
Now that we understand how to build, evaluate, and automatically optimize DSPy programs, how can we make them interact smoothly with different data formats or models, especially when integrating with other systems? That's where **Adapters** come in.
**Next:** [Chapter 9: Adapter](09_adapter.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 9: Adapter - The Universal Translator
Welcome to Chapter 9! In [Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md), we saw how DSPy can automatically optimize our programs by finding better prompts or few-shot examples. We ended up with a `compiled_program` that should perform better.
Now, this optimized program needs to communicate with a Language Model ([LM](05_lm__language_model_client_.md)) to actually do its work. But here's a potential challenge: different types of LMs expect different kinds of input!
* Older **Completion Models** (like GPT-3 `davinci`) expect a single, long text prompt.
* Newer **Chat Models** (like GPT-4, Claude 3, Llama 3 Chat) expect a structured list of messages, each with a role (like "system", "user", or "assistant").
Our DSPy program, using its [Signature](02_signature.md), defines the task in an abstract way (inputs, outputs, instructions). How does this abstract definition get translated into the specific format required by the LM we're using, especially these modern chat models?
That's where the **`Adapter`** comes in! It acts like a universal translator.
Think of it like this:
* Your DSPy program (using a `Signature`) has a message it wants to send to the LM.
* The LM speaks a specific language (e.g., "chat message list" language).
* The `Adapter` translates your program's message into the LM's language, handles the conversation, and translates the LM's reply back into a format your DSPy program understands.
In this chapter, you'll learn:
* What problem Adapters solve.
* What an `Adapter` does (formatting and parsing).
* How they allow your DSPy code to work with different LMs seamlessly.
* How they work behind the scenes (mostly automatically!).
Let's meet the translator!
## The Problem: Different LMs, Different Languages
Imagine you have a DSPy Signature for summarizing text:
```python
import dspy
class Summarize(dspy.Signature):
"""Summarize the given text."""
text = dspy.InputField(desc="The text to summarize.")
summary = dspy.OutputField(desc="A concise summary.")
```
And you use it in a `dspy.Predict` module:
```python
# Assume LM is configured (Chapter 5)
summarizer = dspy.Predict(Summarize)
long_text = "DSPy is a framework for programming foundation models..." # (imagine longer text)
result = summarizer(text=long_text)
# We expect result.summary to contain the summary
```
Now, if the configured LM is a **completion model**, the `summarizer` needs to create a single prompt like:
```text
Summarize the given text.
---
Follow the following format.
Text: ${text}
Summary: ${summary}
---
Text: DSPy is a framework for programming foundation models...
Summary:
```
But if the configured LM is a **chat model**, it needs a structured list of messages, perhaps like this:
```python
[
{"role": "system", "content": "Summarize the given text.\n\nFollow the following format.\n\nText: ${text}\nSummary: ${summary}"},
{"role": "user", "content": "Text: DSPy is a framework for programming foundation models...\nSummary:"}
]
```
*(Simplified - actual chat formatting can be more complex)*
How does `dspy.Predict` know which format to use? And how does it extract the `summary` from the potentially differently formatted responses? It doesn't! That's the job of the **Adapter**.
## What Does an Adapter Do?
An `Adapter` is a component that sits between your DSPy module (like `dspy.Predict`) and the [LM Client](05_lm__language_model_client_.md). Its main tasks are:
1. **Formatting:** It takes the abstract pieces of your DSPy program: the [Signature](02_signature.md) (instructions and input/output fields), any few-shot `demos` ([Example](03_example.md)), and the current `inputs`. It then **formats** them into the specific structure the target LM expects (either a single prompt string or a list of chat messages).
2. **Parsing:** After the LM generates its response (which is usually just raw text), the `Adapter` **parses** this text to extract the values for the output fields defined in the `Signature` (like extracting the generated `summary` text).
The most common adapter is the `dspy.adapters.ChatAdapter`, which is specifically designed to translate between the DSPy format and the message list format expected by chat models.
## Why Use Adapters? Flexibility!
The main benefit of using Adapters is **flexibility**.
* **Write Once, Run Anywhere:** Your core DSPy program logic (your `Module`s, `Program`s, and `Signature`s) remains the same regardless of whether you're using a completion LM or a chat LM.
* **Easy Switching:** You can switch the underlying [LM Client](05_lm__language_model_client_.md) (e.g., from OpenAI GPT-3 to Anthropic Claude 3) in `dspy.settings`, and the appropriate Adapter (usually the default `ChatAdapter`) handles the communication differences automatically.
* **Standard Interface:** Adapters ensure that modules like `dspy.Predict` have a consistent way to interact with LMs, hiding the complexities of different API formats.
## How Adapters Work: Format and Parse
Let's look conceptually at what the `ChatAdapter` does:
**1. Formatting (`format` method):**
Imagine calling our `summarizer` with one demo example:
```python
# Demo example
demo = dspy.Example(
text="Long article about cats.",
summary="Cats are popular pets."
).with_inputs("text")
# Call the summarizer with the demo
result = summarizer(text=long_text, demos=[demo])
```
The `ChatAdapter`'s `format` method might take the `Summarize` signature, the `demo`, and the `long_text` input and produce a list of messages like this:
```python
# Conceptual Output of ChatAdapter.format()
[
# 1. System message from Signature instructions
{"role": "system", "content": "Summarize the given text.\n\n---\n\nFollow the following format.\n\nText: ${text}\nSummary: ${summary}\n\n---\n\n"},
# 2. User turn for the demo input
{"role": "user", "content": "Text: Long article about cats.\nSummary:"},
# 3. Assistant turn for the demo output
{"role": "assistant", "content": "Summary: Cats are popular pets."}, # (Might use special markers like [[ ## Summary ## ]])
# 4. User turn for the actual input
{"role": "user", "content": "Text: DSPy is a framework for programming foundation models...\nSummary:"}
]
```
*(Note: `ChatAdapter` uses specific markers like `[[ ## field_name ## ]]` to clearly separate fields in the content, making parsing easier)*
This message list is then passed to the chat-based LM Client.
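If you're curious, you can call the adapter's `format` method yourself to inspect the messages it would send. This is a sketch based on the simplified `format(signature, demos, inputs)` interface shown later in this chapter; the exact keyword names may differ slightly across DSPy versions.

```python
# A sketch: peek at the messages the ChatAdapter would produce.
adapter = dspy.adapters.ChatAdapter()

messages = adapter.format(
    signature=Summarize,
    demos=[{"text": "Long article about cats.", "summary": "Cats are popular pets."}],
    inputs={"text": long_text},
)

for message in messages:
    print(message["role"], "->", message["content"][:60], "...")
```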
**2. Parsing (`parse` method):**
The chat LM responds, likely mimicking the format. Its response might be a string like:
```text
[[ ## summary ## ]]
DSPy helps build and optimize language model pipelines.
```
The `ChatAdapter`'s `parse` method takes this string. It looks for the markers (`[[ ## summary ## ]]`) defined by the `Summarize` signature's output fields. It extracts the content associated with each marker and returns a dictionary:
```python
# Conceptual Output of ChatAdapter.parse()
{
"summary": "DSPy helps build and optimize language model pipelines."
}
```
This dictionary is then packaged into the `dspy.Prediction` object (as `result.summary`) that your `summarizer` module returns.
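You can sketch the same round trip for parsing: hand the adapter a marker-formatted completion string and get back a dictionary keyed by the signature's output fields. Again, this follows the simplified `parse(signature, completion)` interface shown later in the chapter.

```python
# A sketch: parsing a marker-formatted completion by hand.
raw_completion = "[[ ## summary ## ]]\nDSPy helps build and optimize language model pipelines."

parsed = dspy.adapters.ChatAdapter().parse(Summarize, raw_completion)
print(parsed["summary"])
# -> "DSPy helps build and optimize language model pipelines."
```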
## Using Adapters (It's Often Automatic!)
The good news is that you usually don't interact with Adapters directly. Modules like `dspy.Predict` are designed to use the currently configured adapter automatically.
DSPy sets a default adapter (usually `ChatAdapter`) in its global `dspy.settings`. When you configure your [LM Client](05_lm__language_model_client_.md) like this:
```python
import dspy
# Configure LM (Chapter 5)
# turbo = dspy.LM(model='openai/gpt-3.5-turbo')
# dspy.settings.configure(lm=turbo)
# Default Adapter (ChatAdapter) is usually active automatically!
# You typically DON'T need to configure it unless you want a different one.
# dspy.settings.configure(adapter=dspy.adapters.ChatAdapter())
```
Now, when you use `dspy.Predict` or other modules that call LMs, they will internally use `dspy.settings.adapter` (the `ChatAdapter` in this case) to handle the formatting and parsing needed to talk to the configured `dspy.settings.lm` (`turbo`).
```python
# The summarizer automatically uses the configured LM and Adapter
summarizer = dspy.Predict(Summarize)
result = summarizer(text=long_text) # Adapter works its magic here!
print(result.summary)
```
You write your DSPy code at a higher level of abstraction, and the Adapter handles the translation details for you.
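For example, because the adapter sits between your module and the LM client, the same `summarizer` can be pointed at different models just by changing the configuration. The model names below are placeholders; use whichever providers you actually have credentials for.

```python
# A sketch: the same program, two different LMs, no changes to Summarize or summarizer.
summarizer = dspy.Predict(Summarize)

dspy.settings.configure(lm=dspy.LM('openai/gpt-4o-mini'))
print(summarizer(text=long_text).summary)

dspy.settings.configure(lm=dspy.LM('anthropic/claude-3-haiku-20240307'))
print(summarizer(text=long_text).summary)
```

Chapter 10 shows a cleaner way to scope this kind of switch using `dspy.settings.context`.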
## How It Works Under the Hood
Let's trace the flow when `summarizer(text=long_text)` is called, assuming a chat LM and the `ChatAdapter` are configured:
1. **`Predict.__call__`:** The `summarizer` (`dspy.Predict`) instance is called.
2. **Get Components:** It retrieves the `Signature` (`Summarize`), `demos`, `inputs` (`text`), the configured `LM` client, and the configured `Adapter` (e.g., `ChatAdapter`) from `dspy.settings`.
3. **`Adapter.__call__`:** `Predict` calls the `Adapter` instance, passing it the LM, signature, demos, and inputs.
4. **`Adapter.format`:** The `Adapter`'s `__call__` method first calls its own `format` method. `ChatAdapter.format` generates the list of chat messages (system prompt, demo turns, final user turn).
5. **`LM.__call__`:** The `Adapter`'s `__call__` method then passes the formatted messages to the `LM` client instance (e.g., `turbo(messages=...)`).
6. **API Call:** The `LM` client sends the messages to the actual LM API (e.g., OpenAI API).
7. **API Response:** The LM API returns the generated completion text (e.g., `[[ ## summary ## ]]\nDSPy helps...`).
8. **`LM.__call__` Returns:** The `LM` client returns the raw completion string(s) back to the `Adapter`.
9. **`Adapter.parse`:** The `Adapter`'s `__call__` method calls its own `parse` method with the completion string. `ChatAdapter.parse` extracts the content based on the `[[ ## ... ## ]]` markers and the `Signature`'s output fields.
10. **`Adapter.__call__` Returns:** The `Adapter` returns a list of dictionaries, each representing a parsed completion (e.g., `[{'summary': 'DSPy helps...'}]`).
11. **`Predict.__call__` Returns:** `Predict` packages these parsed dictionaries into `dspy.Prediction` objects and returns the result.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant User
participant PredictMod as dspy.Predict (summarizer)
participant Adapter as Adapter (e.g., ChatAdapter)
participant LMClient as LM Client (e.g., turbo)
participant LMApi as Actual LM API
User->>PredictMod: Call summarizer(text=...)
PredictMod->>Adapter: __call__(lm=LMClient, signature, demos, inputs)
Adapter->>Adapter: format(signature, demos, inputs)
Adapter-->>Adapter: Return formatted_messages (list)
Adapter->>LMClient: __call__(messages=formatted_messages)
LMClient->>LMApi: Send API Request
LMApi-->>LMClient: Return raw_completion_text
LMClient-->>Adapter: Return raw_completion_text
Adapter->>Adapter: parse(signature, raw_completion_text)
Adapter-->>Adapter: Return parsed_output (dict)
Adapter-->>PredictMod: Return list[parsed_output]
PredictMod->>PredictMod: Create Prediction object(s)
PredictMod-->>User: Return Prediction object(s)
```
**Relevant Code Files:**
* `dspy/adapters/base.py`: Defines the abstract `Adapter` class.
* Requires subclasses to implement `format` and `parse`.
* The `__call__` method orchestrates the format -> LM call -> parse sequence.
* `dspy/adapters/chat_adapter.py`: Defines `ChatAdapter`, the default implementation.
* `format`: Implements logic to create the system/user/assistant message list, using `[[ ## ... ## ]]` markers. Includes helper functions like `format_turn` and `prepare_instructions`.
* `parse`: Implements logic to find the `[[ ## ... ## ]]` markers in the LM's output string and extract the corresponding values.
* `dspy/predict/predict.py`: The `Predict` module's `forward` method retrieves the adapter from `dspy.settings` and calls it.
```python
# Simplified view from dspy/adapters/base.py
from abc import ABC, abstractmethod
# ... other imports ...
class Adapter(ABC):
# ... init ...
# The main orchestration method
def __call__(
self,
lm: "LM",
lm_kwargs: dict[str, Any],
signature: Type[Signature],
demos: list[dict[str, Any]],
inputs: dict[str, Any],
) -> list[dict[str, Any]]:
# 1. Format the inputs for the LM
# Returns either a string or list[dict] (for chat)
formatted_input = self.format(signature, demos, inputs)
# Prepare arguments for the LM call
lm_call_args = dict(prompt=formatted_input) if isinstance(formatted_input, str) else dict(messages=formatted_input)
# 2. Call the Language Model Client
outputs = lm(**lm_call_args, **lm_kwargs) # Returns list of strings or dicts
# 3. Parse the LM outputs
parsed_values = []
for output in outputs:
# Extract raw text (simplified)
raw_text = output if isinstance(output, str) else output["text"]
# Parse the raw text based on the signature
value = self.parse(signature, raw_text)
# Validate fields (simplified)
# ...
parsed_values.append(value)
return parsed_values
@abstractmethod
def format(self, signature, demos, inputs) -> list[dict[str, Any]] | str:
# Subclasses must implement this to format input for the LM
raise NotImplementedError
@abstractmethod
def parse(self, signature: Type[Signature], completion: str) -> dict[str, Any]:
# Subclasses must implement this to parse the LM's output string
raise NotImplementedError
# ... other helper methods (format_fields, format_turn, etc.) ...
# Simplified view from dspy/adapters/chat_adapter.py
# ... imports ...
import re
field_header_pattern = re.compile(r"\[\[ ## (\w+) ## \]\]") # Matches [[ ## field_name ## ]]
class ChatAdapter(Adapter):
# ... init ...
def format(self, signature, demos, inputs) -> list[dict[str, Any]]:
messages = []
# 1. Create system message from signature instructions
# (Uses helper `prepare_instructions`)
prepared_instructions = prepare_instructions(signature)
messages.append({"role": "system", "content": prepared_instructions})
# 2. Format demos into user/assistant turns
# (Uses helper `format_turn`)
for demo in demos:
messages.append(self.format_turn(signature, demo, role="user"))
messages.append(self.format_turn(signature, demo, role="assistant"))
# 3. Format final input into a user turn
# (Handles chat history if present, uses `format_turn`)
# ... logic for chat history or simple input ...
messages.append(self.format_turn(signature, inputs, role="user"))
# Expand image tags if needed
messages = try_expand_image_tags(messages)
return messages
def parse(self, signature: Type[Signature], completion: str) -> dict[str, Any]:
# Logic to split completion string by [[ ## field_name ## ]] markers
# Finds matches using `field_header_pattern`
sections = self._split_completion_by_markers(completion)
fields = {}
for field_name, field_content in sections:
if field_name in signature.output_fields:
try:
# Use helper `parse_value` to cast string to correct type
fields[field_name] = parse_value(field_content, signature.output_fields[field_name].annotation)
except Exception as e:
# Handle parsing errors
# ...
pass
# Check if all expected output fields were found
# ...
return fields
# ... helper methods: format_turn, format_fields, _split_completion_by_markers ...
```
The key takeaway is that `Adapter` subclasses provide concrete implementations for `format` (DSPy -> LM format) and `parse` (LM output -> DSPy format), enabling smooth communication.
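To make that contract concrete, here is a toy adapter written against the simplified base-class interface above. It formats everything as one completion-style prompt and parses plain `field: value` lines; it is an illustration of the `format`/`parse` contract, not how `ChatAdapter` itself works.

```python
import re
from dspy.adapters.base import Adapter  # import path taken from the file listed above

class TinyCompletionAdapter(Adapter):
    """Toy adapter: one plain-text prompt in, 'field: value' lines parsed out."""

    def format(self, signature, demos, inputs):
        lines = [signature.instructions, ""]
        for demo in demos:  # demos are plain dicts in the simplified interface
            for name in list(signature.input_fields) + list(signature.output_fields):
                lines.append(f"{name}: {demo.get(name, '')}")
            lines.append("")
        for name, value in inputs.items():
            lines.append(f"{name}: {value}")
        first_output = next(iter(signature.output_fields))
        lines.append(f"{first_output}:")  # ask the LM to continue from here
        return "\n".join(lines)  # a single completion-style prompt

    def parse(self, signature, completion):
        fields = {}
        for name in signature.output_fields:
            match = re.search(rf"{name}:\s*(.*)", completion)
            fields[name] = match.group(1).strip() if match else completion.strip()
        return fields
```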
## Conclusion
You've now met the **`Adapter`**, DSPy's universal translator!
* Adapters solve the problem of **different LMs expecting different input formats** (e.g., completion prompts vs. chat messages).
* They act as a bridge, **formatting** DSPy's abstract [Signature](02_signature.md), demos, and inputs into the LM-specific format, and **parsing** the LM's raw output back into structured DSPy data.
* The primary benefit is **flexibility**, allowing you to use the same DSPy program with various LM types without changing your core logic.
* Adapters like `ChatAdapter` usually work **automatically** behind the scenes, configured via `dspy.settings`.
With Adapters handling the translation, LM Clients providing the connection, and RMs fetching knowledge, we have a powerful toolkit. But how do we manage all these configurations globally? That's the role of `dspy.settings`.
**Next:** [Chapter 10: Settings](10_settings.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/DSPy/10_settings.md Normal file
@@ -0,0 +1,367 @@
# Chapter 10: Settings - Your Program's Control Panel
Welcome to the final chapter of our introductory DSPy tutorial! In [Chapter 9: Adapter](09_adapter.md), we saw how Adapters act as translators, allowing our DSPy programs to communicate seamlessly with different types of Language Models (LMs).
Throughout the previous chapters, we've seen snippets like `dspy.settings.configure(lm=...)` and `dspy.settings.configure(rm=...)`. We mentioned that modules like `dspy.Predict` or `dspy.Retrieve` automatically find and use these configured components. But how does this central configuration work? How do we manage these important defaults for our entire project?
That's where **`dspy.settings`** comes in! It's the central control panel for your DSPy project.
Think of `dspy.settings` like the **Defaults menu** in a software application:
* You set your preferred font, theme, or language once in the settings.
* The entire application then uses these defaults unless you specifically choose something different for a particular document or window.
`dspy.settings` does the same for your DSPy programs. It holds the default [LM (Language Model Client)](05_lm__language_model_client_.md), [RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md), and [Adapter](09_adapter.md) that your modules will use.
In this chapter, you'll learn:
* Why a central settings object is useful.
* How to configure global defaults using `dspy.settings.configure`.
* How modules automatically use these settings.
* How to temporarily override settings for specific parts of your code using `dspy.context`.
Let's learn how to manage our program's defaults!
## Why Use `dspy.settings`?
Imagine building a complex DSPy [Program](01_module___program.md) with many sub-modules that need to call an LM or an RM. Without a central settings object, you might have to pass the LM and RM instances explicitly to every single module during initialization or when calling them. This would be tedious and make your code harder to manage.
```python
# --- WITHOUT dspy.settings (Conceptual - DON'T DO THIS) ---
import dspy
# Assume lm_instance and rm_instance are created somewhere
class GenerateSearchQuery(dspy.Module):
def __init__(self, lm): # Needs LM passed in
self.predictor = dspy.Predict('question -> query', lm=lm) # Pass LM to Predict
# ... forward ...
class RetrieveContext(dspy.Module):
def __init__(self, rm): # Needs RM passed in
self.retriever = dspy.Retrieve(rm=rm, k=3) # Pass RM to Retrieve
# ... forward ...
# ... other modules needing lm or rm ...
class ComplexRAG(dspy.Module):
def __init__(self, lm, rm): # Needs LM and RM passed in
self.gen_query = GenerateSearchQuery(lm=lm) # Pass LM down
self.retrieve = RetrieveContext(rm=rm) # Pass RM down
# ... other sub-modules needing lm or rm ...
def forward(self, question, lm=None, rm=None): # Maybe pass them here too? Messy!
# ... use sub-modules ...
```
This gets complicated quickly!
`dspy.settings` solves this by providing a single, global place to store these configurations. You configure it once, and all DSPy modules can access the defaults they need automatically.
## Configuring Global Defaults
The primary way to set defaults is using the `dspy.settings.configure` method. You typically do this once near the beginning of your script or application.
Let's set up a default LM and RM:
```python
import dspy
# 1. Create your LM and RM instances (as seen in Chapters 5 & 6)
# Example using OpenAI and a dummy RM
try:
# Assumes OPENAI_API_KEY is set
turbo = dspy.LM(model='openai/gpt-3.5-turbo-instruct', max_tokens=100)
except ImportError:
print("Note: dspy[openai] not installed. Using dummy LM.")
# Define a dummy LM if OpenAI isn't available
class DummyLM(dspy.LM):
def __init__(self): super().__init__(model="dummy")
def basic_request(self, prompt, **kwargs): return {"choices": [{"text": "Dummy LM Response"}]}
def __call__(self, prompt, **kwargs): return ["Dummy LM Response"]
turbo = DummyLM()
# Dummy RM for demonstration
class DummyRM(dspy.Retrieve):
def __init__(self, k=3): super().__init__(k=k)
def forward(self, query, k=None):
k = k if k is not None else self.k
return dspy.Prediction(passages=[f"Dummy passage {i+1} for '{query}'" for i in range(k)])
my_rm = DummyRM(k=3)
# 2. Configure dspy.settings with these instances
dspy.settings.configure(lm=turbo, rm=my_rm)
# That's it! Defaults are now set globally.
print(f"Default LM: {dspy.settings.lm}")
print(f"Default RM: {dspy.settings.rm}")
```
**Output (example):**
```text
Default LM: LM(model='openai/gpt-3.5-turbo-instruct', temperature=0.0, max_tokens=100, ...) # Or DummyLM
Default RM: Retrieve(k=3) # Or DummyRM
```
Now, any `dspy.Predict`, `dspy.ChainOfThought`, or `dspy.Retrieve` module created *after* this configuration will automatically use `turbo` as the LM and `my_rm` as the RM, unless told otherwise explicitly.
## How Modules Use the Settings
Modules like `dspy.Predict` and `dspy.Retrieve` are designed to look for their required components (LM or RM) in `dspy.settings` if they aren't provided directly.
Consider `dspy.Predict`:
```python
import dspy
# Assume settings were configured as above
# Create a Predict module WITHOUT passing 'lm' explicitly
simple_predictor = dspy.Predict('input -> output')
# When we call it, it will automatically use dspy.settings.lm
result = simple_predictor(input="Tell me a fact.")
print(result.output)
```
**Output (using DummyLM):**
```text
Dummy LM Response
```
Inside its `forward` method, `dspy.Predict` essentially does this (simplified):
```python
# Simplified internal logic of dspy.Predict.forward()
def forward(self, **kwargs):
# ... get signature, demos, config ...
# Get the LM: Use 'lm' passed in kwargs, OR self.lm (if set), OR dspy.settings.lm
lm_to_use = kwargs.pop("lm", self.lm) or dspy.settings.lm
assert lm_to_use is not None, "No LM configured!"
# ... format prompt using signature/demos/inputs ...
# ... call lm_to_use(prompt, ...) ...
# ... parse output ...
# ... return Prediction ...
```
Similarly, `dspy.Retrieve` looks for `dspy.settings.rm`:
```python
import dspy
# Assume settings were configured as above
# Create a Retrieve module WITHOUT passing 'rm' explicitly
retriever = dspy.Retrieve() # Uses default k=3 from DummyRM initialization
# When called, it uses dspy.settings.rm
results = retriever(query="DSPy benefits")
print(results.passages)
```
**Output (using DummyRM):**
```text
["Dummy passage 1 for 'DSPy benefits'", "Dummy passage 2 for 'DSPy benefits'", "Dummy passage 3 for 'DSPy benefits'"]
```
This automatic lookup makes your program code much cleaner, as you don't need to thread the `lm` and `rm` objects through every part of your application.
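To see the payoff, here is a minimal sketch of the earlier RAG idea rewritten to rely on the configured defaults. Nothing is passed in: `dspy.Retrieve` picks up `dspy.settings.rm` and `dspy.ChainOfThought` picks up `dspy.settings.lm`.

```python
# A minimal sketch: no lm/rm threading, everything comes from dspy.settings.
class SimpleRAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=k)                                   # uses dspy.settings.rm
        self.generate = dspy.ChainOfThought('context, question -> answer')   # uses dspy.settings.lm

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

rag = SimpleRAG()
# result = rag(question="What does DSPy optimize?")
# print(result.answer)
```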
## Temporary Overrides with `dspy.context`
Sometimes, you might want to use a *different* LM or RM for just a specific part of your code, without changing the global default. For example, maybe you want to use a more powerful (and expensive) LM like GPT-4 for a critical reasoning step, while using a cheaper LM like GPT-3.5 for the rest of the program.
You can achieve this using the `dspy.settings.context` context manager. Changes made inside a `with dspy.settings.context(...)` block are **thread-local** and only last until the block exits.
```python
import dspy
# Assume global settings have 'turbo' (GPT-3.5 or Dummy) as the LM
# dspy.settings.configure(lm=turbo, rm=my_rm)
print(f"Outside context: {dspy.settings.lm}")
# Let's create a more powerful (dummy) LM for demonstration
class DummyGPT4(dspy.LM):
def __init__(self): super().__init__(model="dummy-gpt4")
def basic_request(self, prompt, **kwargs): return {"choices": [{"text": "GPT-4 Dummy Response"}]}
def __call__(self, prompt, **kwargs): return ["GPT-4 Dummy Response"]
gpt4_dummy = DummyGPT4()
# Use dspy.context to temporarily switch the LM
with dspy.settings.context(lm=gpt4_dummy, rm=None): # Temporarily set lm, unset rm
print(f"Inside context: {dspy.settings.lm}")
print(f"Inside context (RM): {dspy.settings.rm}")
# Modules used inside this block will use the temporary settings
predictor_in_context = dspy.Predict('input -> output')
result_in_context = predictor_in_context(input="Complex reasoning task")
print(f"Prediction in context: {result_in_context.output}")
# Trying to use RM here would fail as it's None in this context
# retriever_in_context = dspy.Retrieve()
# retriever_in_context(query="something") # This would raise an error
# Settings revert back automatically outside the block
print(f"Outside context again: {dspy.settings.lm}")
print(f"Outside context again (RM): {dspy.settings.rm}")
```
**Output (example):**
```text
Outside context: LM(model='openai/gpt-3.5-turbo-instruct', ...) # Or DummyLM
Inside context: LM(model='dummy-gpt4', ...)
Inside context (RM): None
Prediction in context: GPT-4 Dummy Response
Outside context again: LM(model='openai/gpt-3.5-turbo-instruct', ...) # Or DummyLM
Outside context again (RM): Retrieve(k=3) # Or DummyRM
```
Inside the `with` block, `dspy.settings.lm` temporarily pointed to `gpt4_dummy`, and `dspy.settings.rm` was temporarily `None`. The `predictor_in_context` used the temporary LM. Once the block ended, the settings automatically reverted to the global defaults.
This is crucial for writing clean code where different parts might need different configurations, and also essential for how DSPy's optimizers ([Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md)) work internally to manage different model configurations during optimization.
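For instance, you can scope a stronger model to a single step inside a program. This sketch reuses the `gpt4_dummy` instance defined above; in practice you would substitute a real, more capable LM.

```python
# A sketch: one sub-module runs under a temporarily overridden LM.
class DraftThenRefine(dspy.Module):
    def __init__(self):
        super().__init__()
        self.draft = dspy.Predict('question -> draft')
        self.refine = dspy.Predict('question, draft -> answer')

    def forward(self, question):
        draft = self.draft(question=question).draft       # uses the global default LM
        with dspy.settings.context(lm=gpt4_dummy):         # temporary override for this step
            return self.refine(question=question, draft=draft)

# result = DraftThenRefine()(question="Why do optimizers need a metric?")
```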
## How It Works Under the Hood
`dspy.settings` uses a combination of global variables and thread-local storage to manage configurations.
1. **Global Defaults:** There's a primary configuration dictionary (`main_thread_config`) that holds the settings configured by `dspy.settings.configure()`.
2. **Ownership:** To prevent race conditions in multi-threaded applications, only the *first* thread that calls `configure` becomes the "owner" and is allowed to make further global changes using `configure`.
3. **Thread-Local Overrides:** `dspy.settings.context()` uses Python's `threading.local` storage. When you enter a `with dspy.settings.context(...)` block, it stores the specified overrides (`lm=gpt4_dummy`, etc.) in a place specific to the current thread.
4. **Attribute Access:** When code accesses `dspy.settings.lm`, the `Settings` object first checks if there's an override for `lm` in the current thread's local storage.
* If yes, it returns the thread-local override.
* If no, it returns the value from the global `main_thread_config`.
5. **Context Exit:** When the `with` block finishes, the `context` manager restores the thread-local storage to its state *before* the block was entered, effectively removing the temporary overrides for that thread.
**Sequence Diagram: Module Accessing Settings**
```mermaid
sequenceDiagram
participant User
participant Module as Your Module (e.g., Predict)
participant Settings as dspy.settings
participant ThreadLocalStorage as Thread-Local Storage
participant GlobalConfig as Global Defaults
User->>Module: Call module(input=...)
Module->>Settings: Get configured lm (`settings.lm`)
Settings->>ThreadLocalStorage: Check for 'lm' override?
alt Override Exists
ThreadLocalStorage-->>Settings: Return thread-local lm
Settings-->>Module: Return thread-local lm
else No Override
ThreadLocalStorage-->>Settings: No override found
Settings->>GlobalConfig: Get global 'lm'
GlobalConfig-->>Settings: Return global lm
Settings-->>Module: Return global lm
end
Module->>Module: Use the returned lm for processing...
Module-->>User: Return result
```
This mechanism ensures that global settings are the default, but thread-specific overrides via `dspy.context` take precedence when active, providing both convenience and flexibility.
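A small sketch makes the thread-local behaviour visible: a thread that enters `dspy.settings.context` sees its override, while another thread running at the same time still sees the global default. This mirrors the simplified implementation shown below and reuses the `gpt4_dummy` instance from earlier.

```python
import threading

def with_override():
    with dspy.settings.context(lm=gpt4_dummy):
        print("override thread sees:", dspy.settings.lm)

def without_override():
    print("plain thread sees:", dspy.settings.lm)

t1 = threading.Thread(target=with_override)
t2 = threading.Thread(target=without_override)
t1.start(); t2.start()
t1.join(); t2.join()
```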
**Relevant Code Files:**
* `dspy/dsp/utils/settings.py`: Defines the `Settings` class, the `DEFAULT_CONFIG`, manages global state (`main_thread_config`, `config_owner_thread_id`), uses `threading.local` for overrides, and implements the `configure` method and the `context` context manager.
```python
# Simplified view from dspy/dsp/utils/settings.py
import copy
import threading
from contextlib import contextmanager
# from dspy.dsp.utils.utils import dotdict # Simplified as dict
DEFAULT_CONFIG = dict(lm=None, rm=None, adapter=None)  # ... plus other default values ...
# Global state
main_thread_config = copy.deepcopy(DEFAULT_CONFIG)
config_owner_thread_id = None
global_lock = threading.Lock()
# Thread-local storage for overrides
class ThreadLocalOverrides(threading.local):
def __init__(self):
self.overrides = {}
thread_local_overrides = ThreadLocalOverrides()
class Settings:
_instance = None
def __new__(cls): # Singleton pattern
if cls._instance is None: cls._instance = super().__new__(cls)
return cls._instance
# When you access settings.lm or settings['lm']
def __getattr__(self, name):
# Check thread-local overrides first
overrides = getattr(thread_local_overrides, "overrides", {})
if name in overrides: return overrides[name]
# Fall back to global config
elif name in main_thread_config: return main_thread_config[name]
else: raise AttributeError(f"'Settings' object has no attribute '{name}'")
def __getitem__(self, key): return self.__getattr__(key)
# dspy.settings.configure(...)
def configure(self, **kwargs):
global main_thread_config, config_owner_thread_id
current_thread_id = threading.get_ident()
with global_lock: # Ensure thread safety for configuration
if config_owner_thread_id is None: config_owner_thread_id = current_thread_id
elif config_owner_thread_id != current_thread_id:
raise RuntimeError("dspy.settings can only be changed by the thread that initially configured it.")
# Update global config
for k, v in kwargs.items(): main_thread_config[k] = v
# with dspy.settings.context(...)
@contextmanager
def context(self, **kwargs):
# Save current overrides
original_overrides = getattr(thread_local_overrides, "overrides", {}).copy()
# Create new overrides for this context (combining global + old local + new)
new_overrides = {**main_thread_config, **original_overrides, **kwargs}
# Apply new overrides to thread-local storage
thread_local_overrides.overrides = new_overrides
try:
yield # Code inside the 'with' block runs here
finally:
# Restore original overrides when exiting the block
thread_local_overrides.overrides = original_overrides
# The global instance you use
settings = Settings()
```
This structure elegantly handles both global defaults and safe, temporary, thread-specific overrides.
## Conclusion
Congratulations! You've reached the end of this introductory DSPy tutorial and learned about `dspy.settings`, the central control panel.
* `dspy.settings` holds **global default configurations** like the [LM](05_lm__language_model_client_.md), [RM](06_rm__retrieval_model_client_.md), and [Adapter](09_adapter.md).
* You configure it **once** using `dspy.settings.configure(lm=..., rm=...)`.
* DSPy modules like `dspy.Predict` and `dspy.Retrieve` automatically **use these defaults**, simplifying your code.
* `dspy.context` allows for **temporary, thread-local overrides**, providing flexibility without affecting the global state.
By mastering these 10 chapters, you've gained a solid foundation in the core concepts of DSPy:
1. Structuring programs with [Modules and Programs](01_module___program.md).
2. Defining tasks with [Signatures](02_signature.md).
3. Representing data with [Examples](03_example.md).
4. Making basic LM calls with [Predict](04_predict.md).
5. Connecting to AI brains with [LM Clients](05_lm__language_model_client_.md).
6. Accessing external knowledge with [RM Clients](06_rm__retrieval_model_client_.md).
7. Measuring performance with [Evaluate](07_evaluate.md).
8. Automating optimization with [Teleprompters](08_teleprompter___optimizer.md).
9. Ensuring compatibility with [Adapters](09_adapter.md).
10. Managing configuration with [Settings](10_settings.md).
You're now equipped to start building, evaluating, and optimizing your own sophisticated language model pipelines with DSPy. Happy programming!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

docs/DSPy/index.md Normal file
@@ -0,0 +1,58 @@
# Tutorial: DSPy
DSPy helps you build and optimize *programs* that use **Language Models (LMs)** and **Retrieval Models (RMs)**.
Think of it like composing Lego bricks (**Modules**) where each brick performs a specific task (like generating text or retrieving information).
**Signatures** define what each Module does (its inputs and outputs), and **Teleprompters** automatically tune these modules (like optimizing prompts or examples) to get the best performance on your data.
**Source Repository:** [https://github.com/stanfordnlp/dspy/tree/7cdfe988e6404289b896d946d957f17bb4d9129b/dspy](https://github.com/stanfordnlp/dspy/tree/7cdfe988e6404289b896d946d957f17bb4d9129b/dspy)
```mermaid
flowchart TD
A0["Module / Program"]
A1["Signature"]
A2["Predict"]
A3["LM (Language Model Client)"]
A4["RM (Retrieval Model Client)"]
A5["Teleprompter / Optimizer"]
A6["Example"]
A7["Evaluate"]
A8["Adapter"]
A9["Settings"]
A0 -- "Contains / Composes" --> A0
A0 -- "Uses (via Retrieve)" --> A4
A1 -- "Defines structure for" --> A6
A2 -- "Implements" --> A1
A2 -- "Calls" --> A3
A2 -- "Uses demos from" --> A6
A2 -- "Formats prompts using" --> A8
A5 -- "Optimizes" --> A0
A5 -- "Fine-tunes" --> A3
A5 -- "Uses training data from" --> A6
A5 -- "Uses metric from" --> A7
A7 -- "Tests" --> A0
A7 -- "Evaluates on dataset of" --> A6
A8 -- "Translates" --> A1
A8 -- "Formats demos from" --> A6
A9 -- "Configures default" --> A3
A9 -- "Configures default" --> A4
A9 -- "Configures default" --> A8
```
## Chapters
1. [Module / Program](01_module___program.md)
2. [Signature](02_signature.md)
3. [Example](03_example.md)
4. [Predict](04_predict.md)
5. [LM (Language Model Client)](05_lm__language_model_client_.md)
6. [RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md)
7. [Evaluate](07_evaluate.md)
8. [Teleprompter / Optimizer](08_teleprompter___optimizer.md)
9. [Adapter](09_adapter.md)
10. [Settings](10_settings.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)