# Chapter 5: ChatCompletionClient - Talking to the Brains
So far, we've learned about:
* [Agents](01_agent.md): The workers in our system.
* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly.
* [AgentRuntime](03_agentruntime.md): The manager that runs the show.
* [Tools](04_tool.md): How agents get specific skills.

But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs): think of models like GPT-4, Claude, or Gemini as their "brains". How does an agent in AutoGen Core communicate with these external LLM services?

This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs.
## Motivation: Bridging the Gap to LLMs
Imagine you want to build an agent that can summarize long articles.
1. You give the agent an article (as a message).
2. The agent needs to send this article to an LLM (like GPT-4).
3. It also needs to tell the LLM: "Please summarize this."
4. The LLM processes the request and generates a summary.
5. The agent needs to receive this summary back from the LLM.

How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response?

The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.
## Key Concepts: Understanding the LLM Communicator
Let's break down the `ChatCompletionClient`:
1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.
2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
   * `messages`: A list of messages representing the conversation history so far.
   * Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use.
   * Other parameters (like `json_output` hints, `cancellation_token`).
3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
   * `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
   * `UserMessage`: Input from the user or another agent (e.g., the article text).
   * `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
   * `FunctionExecutionResultMessage`: The results of executing a tool/function call.
4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (`FunctionCall` inside an `AssistantMessage`).
5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object (see the short sketch after this list) containing:
   * `content`: The LLM's generated text or a list of `FunctionCall` requests.
   * `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
   * `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
   * `cached`: Whether the response came from a cache.
6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total usage via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge based on tokens.
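
Putting concepts 4 through 6 together, agent code usually branches on the type of `CreateResult.content`: a plain string is the model's answer, while a list of `FunctionCall` objects is a request to run tools. Here is a minimal, illustrative sketch (the `response` variable is assumed to be the result of a `create` call):

```python
# Minimal sketch: inspecting a CreateResult (assumes `response` came from client.create).
from autogen_core.models import CreateResult


def inspect_result(response: CreateResult) -> None:
    if isinstance(response.content, str):
        # The LLM answered with plain text (e.g., a summary).
        print(f"LLM text: {response.content}")
    else:
        # Otherwise content is a list of FunctionCall requests for tools to execute.
        for call in response.content:
            print(f"Requested tool call: {call.name}({call.arguments})")
    # Per-call token usage is always reported, which helps with cost monitoring.
    print(f"Tokens: prompt={response.usage.prompt_tokens}, completion={response.usage.completion_tokens}")
```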
## Use Case Example: Summarizing Text with an LLM
Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.

**Goal:** Send text to an LLM via a client and get a summary back.

**Step 1: Prepare the Input Messages**

We need to structure our request as a list of `LLMMessage` objects.
```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage

# Instructions for the LLM
system_prompt = SystemMessage(
    content="You are a helpful assistant designed to summarize text concisely."
)

# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""

user_request = UserMessage(
    content=f"Please summarize the following text in one sentence:\n\n{article_text}",
    source="User"  # Indicate who provided this input
)

# Combine into a list for the client
messages_to_send = [system_prompt, user_request]

print("Messages prepared:")
for msg in messages_to_send:
    print(f"- {msg.type}: {msg.content[:50]}...")  # Print first 50 chars
```
This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.

**Step 2: Use the ChatCompletionClient (Conceptual)**

Now, we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.
```python
# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage

# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(config=...))

async def get_summary(client, messages):
    print("\nSending messages to LLM via ChatCompletionClient...")
    try:
        # The core call: send messages, get structured result
        response: CreateResult = await client.create(
            messages=messages,
            # We aren't providing tools in this simple example
            tools=[]
        )
        print("Received response:")
        print(f"- Finish Reason: {response.finish_reason}")
        print(f"- Content: {response.content}")  # This should be the summary
        print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
        print(f"- Cached: {response.cached}")

        # Also, check total usage tracked by the client
        total_usage = client.total_usage()
        print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")
    except Exception as e:
        print(f"An error occurred: {e}")

# --- Placeholder for actual client ---
class MockChatCompletionClient:  # Simulate a real client
    _total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)

    async def create(self, messages, tools=[], **kwargs) -> CreateResult:
        # Simulate API call and response
        prompt_len = sum(len(str(m.content)) for m in messages) // 4  # Rough token estimate
        summary = "AutoGen is a multi-agent framework for developing LLM applications."
        completion_len = len(summary) // 4  # Rough token estimate
        usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
        self._total_usage.prompt_tokens += usage.prompt_tokens
        self._total_usage.completion_tokens += usage.completion_tokens
        return CreateResult(
            finish_reason="stop", content=summary, usage=usage, cached=False
        )

    def total_usage(self) -> RequestUsage:
        return self._total_usage
    # Other required methods (count_tokens, model_info etc.) omitted for brevity

async def main():
    from prepare_messages import messages_to_send  # Get messages from previous step
    mock_client = MockChatCompletionClient()
    await get_summary(mock_client, messages_to_send)

# asyncio.run(main())  # If you run this, it uses the mock client
```
This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`) and the token usage reported for that specific call (`response.usage`) and the total tracked by the client (`client.total_usage()`).
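
Besides `create`, the client protocol also defines `create_stream` (shown later in the "Code Glimpse"), which yields the response incrementally instead of all at once, which is handy for displaying partial output as it arrives. Here is a minimal, hedged sketch, assuming the configured client supports streaming:

```python
# Illustrative sketch: streaming a response chunk by chunk.
# Assumes `client` and `messages` are set up as in the example above.
from typing import Optional
from autogen_core.models import CreateResult


async def stream_summary(client, messages) -> Optional[CreateResult]:
    final_result = None
    async for item in client.create_stream(messages=messages, tools=[]):
        if isinstance(item, str):
            # Partial text: print it as it is generated.
            print(item, end="", flush=True)
        elif isinstance(item, CreateResult):
            # The final item is the complete, standardized result.
            final_result = item
    print()  # end the streamed line
    return final_result
```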

**How an Agent Uses It:**

Typically, an agent's logic (e.g., inside its `on_message` handler) would:
1. Receive an incoming message (like the article to summarize).
2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
3. Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
4. Call `await client.create(...)`.
5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
6. Potentially send the result as a new message to another agent or return it.
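
Sketched in code, that flow might look like the following. This is a hedged, illustrative example: the class and message names (`SummarizerAgent`, `ArticleText`, `Summary`) are made up for this chapter, and the client is assumed to be handed to the agent when it is constructed.

```python
# Illustrative sketch of agent logic that uses a ChatCompletionClient.
from dataclasses import dataclass
from autogen_core.models import ChatCompletionClient, SystemMessage, UserMessage


@dataclass
class ArticleText:   # hypothetical incoming message type
    text: str


@dataclass
class Summary:       # hypothetical outgoing message type
    text: str


class SummarizerAgent:
    def __init__(self, model_client: ChatCompletionClient) -> None:
        # 3. The client is typically provided during agent setup.
        self._model_client = model_client
        self._system = SystemMessage(content="You summarize text concisely.")

    async def on_message(self, message: ArticleText) -> Summary:
        # 1-2. Build the LLM conversation from the incoming message.
        llm_messages = [
            self._system,
            UserMessage(content=f"Summarize in one sentence:\n\n{message.text}", source="User"),
        ]
        # 4. Call the LLM through the standard client interface.
        result = await self._model_client.create(messages=llm_messages)
        # 5-6. Extract the text and pass it along as a new message.
        assert isinstance(result.content, str)
        return Summary(text=result.content)
```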
## Under the Hood: How the Client Talks to the LLM
What happens when you call `await client.create(...)`?

**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Agent as Agent Logic
participant Client as ChatCompletionClient
participant Formatter as API Formatter
participant HTTP as HTTP Client
participant LLM_API as External LLM API
Agent->>+Client: create(messages, tools)
Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
Formatter-->>-Client: Return formatted request body
Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API Key
HTTP->>+LLM_API: Transmit request over network
LLM_API->>LLM_API: Process request, generate completion/function call
LLM_API-->>-HTTP: Return API response (e.g., JSON)
HTTP-->>-Client: Receive HTTP response
Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
Formatter-->>-Client: Return parsed data
Client->>Client: Create standard CreateResult object
Client-->>-Agent: Return CreateResult
```
1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc.
3. **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
5. **Parse:** It receives the raw HTTP response (usually JSON) from the API.
6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
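
To make step 2 a bit more concrete, here is a purely illustrative sketch of how a client might translate the standard `LLMMessage` objects into an OpenAI-style `chat/completions` payload. This is not the library's actual code, and other providers use different wire formats:

```python
# Illustrative sketch only: mapping LLMMessage objects to an OpenAI-style request body.
# Real clients also handle images, tool schemas, function results, retries, etc.
from typing import Any, Dict, List, Sequence
from autogen_core.models import AssistantMessage, LLMMessage, SystemMessage, UserMessage


def to_openai_style_payload(messages: Sequence[LLMMessage], model: str) -> Dict[str, Any]:
    api_messages: List[Dict[str, Any]] = []
    for msg in messages:
        if isinstance(msg, SystemMessage):
            api_messages.append({"role": "system", "content": msg.content})
        elif isinstance(msg, UserMessage):
            api_messages.append({"role": "user", "content": msg.content})
        elif isinstance(msg, AssistantMessage):
            api_messages.append({"role": "assistant", "content": msg.content})
        # FunctionExecutionResultMessage handling omitted in this sketch.
    # This dict would be sent as the JSON body of a POST to the provider's endpoint,
    # along with the API key in the request headers.
    return {"model": model, "messages": api_messages}
```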

**Code Glimpse:**
* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow.
```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Sequence, Optional, Mapping, Any, AsyncGenerator, Union
from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken

class ChatCompletionClient(ABC):
    @abstractmethod
    async def create(
        self, messages: Sequence[LLMMessage], *,
        tools: Sequence[Tool | ToolSchema] = [],
        json_output: Optional[bool] = None,  # Hint for JSON mode
        extra_create_args: Mapping[str, Any] = {},  # API-specific args
        cancellation_token: Optional[CancellationToken] = None,
    ) -> CreateResult: ...  # The core method

    @abstractmethod
    def create_stream(
        self,  # Similar to create, but yields results incrementally
        # ... parameters ...
    ) -> AsyncGenerator[Union[str, CreateResult], None]: ...

    @abstractmethod
    def total_usage(self) -> RequestUsage: ...  # Get total tracked usage

    @abstractmethod
    def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ...  # Estimate token count

    # Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```
Concrete classes such as `OpenAIChatCompletionClient` and `AnthropicChatCompletionClient` implement these methods using the specific libraries and API calls for each service.
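
For example, switching the earlier mock for a real OpenAI-backed client could look roughly like the sketch below. The import path and constructor arguments are assumptions based on the separately installed `autogen_ext` package; check its documentation for the exact API of your version.

```python
# Hedged sketch: swapping the mock for a concrete client (assumed autogen_ext API).
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient  # assumed import path

from call_llm_client import get_summary
from prepare_messages import messages_to_send


async def main() -> None:
    # Assumed constructor; the API key is typically read from the OPENAI_API_KEY env var.
    llm_client = OpenAIChatCompletionClient(model="gpt-4o")
    # The calling code is identical to the mock example: only the client changes.
    await get_summary(llm_client, messages_to_send)

# asyncio.run(main())  # Requires a valid API key and network access.
```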
* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client.
```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from typing import List, Union, Literal
from .. import FunctionCall, Image  # FunctionCall from Chapter 4 context; Image enables multimodal input

class SystemMessage(BaseModel):
    content: str
    type: Literal["SystemMessage"] = "SystemMessage"

class UserMessage(BaseModel):
    content: Union[str, List[Union[str, Image]]]  # Can include images!
    source: str
    type: Literal["UserMessage"] = "UserMessage"

class AssistantMessage(BaseModel):
    content: Union[str, List[FunctionCall]]  # Can be text or function calls
    source: str
    type: Literal["AssistantMessage"] = "AssistantMessage"

# FunctionExecutionResultMessage also exists here...
```
* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client.
```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from dataclasses import dataclass
from typing import Union, List, Literal
from .. import FunctionCall

@dataclass
class RequestUsage:
    prompt_tokens: int
    completion_tokens: int

FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]

class CreateResult(BaseModel):
    finish_reason: FinishReasons
    content: Union[str, List[FunctionCall]]  # LLM output
    usage: RequestUsage  # Token usage for this call
    cached: bool
    # Optional fields like logprobs, thought...
```
Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.
## Next Steps
You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.
Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?
* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)