# Chapter 5: ChatCompletionClient - Talking to the Brains

So far, we've learned about:

* [Agents](01_agent.md): The workers in our system.
* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly.
* [AgentRuntime](03_agentruntime.md): The manager that runs the show.
* [Tools](04_tool.md): How agents get specific skills.

But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs) – think of models like GPT-4, Claude, or Gemini – as their "brains". How does an agent in AutoGen Core communicate with these external LLM services?

This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs.

## Motivation: Bridging the Gap to LLMs

Imagine you want to build an agent that can summarize long articles.

1. You give the agent an article (as a message).
2. The agent needs to send this article to an LLM (like GPT-4).
3. It also needs to tell the LLM: "Please summarize this."
4. The LLM processes the request and generates a summary.
5. The agent needs to receive this summary back from the LLM.

How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response?

The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.

## Key Concepts: Understanding the LLM Communicator

Let's break down the `ChatCompletionClient`:

1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.

2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
   * `messages`: A list of messages representing the conversation history so far.
   * Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use.
   * Other parameters (like `json_output` hints and `cancellation_token`).

3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
   * `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
   * `UserMessage`: Input from the user or another agent (e.g., the article text).
   * `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
   * `FunctionExecutionResultMessage`: The results of executing a tool/function call.

4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (a `FunctionCall` inside an `AssistantMessage`).

5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing:
   * `content`: The LLM's generated text or a list of `FunctionCall` requests.
   * `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
   * `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
   * `cached`: Whether the response came from a cache.

6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge per token; see the sketch below.
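
For example, a rough cost estimate can be derived from the tracked totals. This is a minimal sketch: the per-1K-token prices are made-up placeholders, and `llm_client` is assumed to be an already-configured client.

```python
# Hypothetical cost check based on tracked usage (prices are placeholders).
usage = llm_client.total_usage()
PROMPT_PRICE_PER_1K = 0.01      # assumed input price per 1K tokens
COMPLETION_PRICE_PER_1K = 0.03  # assumed output price per 1K tokens

estimated_cost = (
    usage.prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
    + usage.completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
)
print(f"~{usage.prompt_tokens + usage.completion_tokens} tokens used, approx ${estimated_cost:.4f}")
```
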
## Use Case Example: Summarizing Text with an LLM

Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.

**Goal:** Send text to an LLM via a client and get a summary back.

**Step 1: Prepare the Input Messages**

We need to structure our request as a list of `LLMMessage` objects.

```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage

# Instructions for the LLM
system_prompt = SystemMessage(
    content="You are a helpful assistant designed to summarize text concisely."
)

# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""

user_request = UserMessage(
    content=f"Please summarize the following text in one sentence:\n\n{article_text}",
    source="User"  # Indicate who provided this input
)

# Combine into a list for the client
messages_to_send = [system_prompt, user_request]

print("Messages prepared:")
for msg in messages_to_send:
    print(f"- {msg.type}: {msg.content[:50]}...")  # Print the first 50 chars
```

This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.

**Step 2: Use the ChatCompletionClient (Conceptual)**

Now, we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.

```python
# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage

# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(config=...))

async def get_summary(client, messages):
    print("\nSending messages to LLM via ChatCompletionClient...")
    try:
        # The core call: send messages, get a structured result
        response: CreateResult = await client.create(
            messages=messages,
            # We aren't providing tools in this simple example
            tools=[]
        )
        print("Received response:")
        print(f"- Finish Reason: {response.finish_reason}")
        print(f"- Content: {response.content}")  # This should be the summary
        print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
        print(f"- Cached: {response.cached}")

        # Also check the total usage tracked by the client
        total_usage = client.total_usage()
        print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")

    except Exception as e:
        print(f"An error occurred: {e}")

# --- Placeholder for an actual client ---
class MockChatCompletionClient:  # Simulates a real client
    def __init__(self):
        # Track usage per instance (a class attribute would be shared across instances)
        self._total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)

    async def create(self, messages, tools=[], **kwargs) -> CreateResult:
        # Simulate the API call and response
        prompt_len = sum(len(str(m.content)) for m in messages) // 4  # Rough token estimate
        summary = "AutoGen is a multi-agent framework for developing LLM applications."
        completion_len = len(summary) // 4  # Rough token estimate
        usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
        self._total_usage.prompt_tokens += usage.prompt_tokens
        self._total_usage.completion_tokens += usage.completion_tokens
        return CreateResult(
            finish_reason="stop", content=summary, usage=usage, cached=False
        )

    def total_usage(self) -> RequestUsage:
        return self._total_usage

    # Other required methods (count_tokens, model_info, etc.) omitted for brevity

async def main():
    from prepare_messages import messages_to_send  # Get messages from the previous step
    mock_client = MockChatCompletionClient()
    await get_summary(mock_client, messages_to_send)

# asyncio.run(main())  # If you run this, it uses the mock client
```

This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`), the token usage reported for that specific call (`response.usage`), and the running total tracked by the client (`client.total_usage()`).

**How an Agent Uses It:**

Typically, an agent's logic (e.g., inside its `on_message` handler) would:

1. Receive an incoming message (like the article to summarize).
2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
3. Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
4. Call `await client.create(...)`.
5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
6. Potentially send the result as a new message to another agent or return it, as in the sketch below.
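
Putting those steps together, the inside of an agent might look like this minimal sketch (the `SummarizerAgent` class, its `summarize` method, and its `model_client` constructor argument are illustrative assumptions, not actual AutoGen Core classes):

```python
from autogen_core.models import SystemMessage, UserMessage

class SummarizerAgent:  # Hypothetical agent for illustration
    def __init__(self, model_client):
        self._model_client = model_client  # A ChatCompletionClient, provided at setup

    async def summarize(self, article_text: str) -> str:
        # Steps 2-5: build the LLMMessage list, call the client, extract the text
        llm_messages = [
            SystemMessage(content="You are a helpful assistant designed to summarize text concisely."),
            UserMessage(content=f"Please summarize:\n\n{article_text}", source="User"),
        ]
        result = await self._model_client.create(messages=llm_messages, tools=[])
        assert isinstance(result.content, str)  # No tools offered, so we expect text
        return result.content
```
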
## Under the Hood: How the Client Talks to the LLM

What happens when you call `await client.create(...)`?

**Conceptual Flow:**

```mermaid
sequenceDiagram
    participant Agent as Agent Logic
    participant Client as ChatCompletionClient
    participant Formatter as API Formatter
    participant HTTP as HTTP Client
    participant LLM_API as External LLM API

    Agent->>+Client: create(messages, tools)
    Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
    Formatter-->>-Client: Return formatted request body
    Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API key
    HTTP->>+LLM_API: Transmit request over network
    LLM_API->>LLM_API: Process request, generate completion/function call
    LLM_API-->>-HTTP: Return API response (e.g., JSON)
    HTTP-->>-Client: Receive HTTP response
    Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
    Formatter-->>-Client: Return parsed data
    Client->>Client: Create standard CreateResult object
    Client-->>-Agent: Return CreateResult
```

1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc.; see the sketch after this list.
3. **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
5. **Parse:** It receives the raw HTTP response (usually JSON) from the API.
6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
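
For intuition, the translation in step 2 for an OpenAI-style API might produce a request body roughly like this (an illustrative sketch of the wire format, not the client's actual internals):

```python
# Hypothetical OpenAI-style /chat/completions request body (illustrative only).
request_body = {
    "model": "gpt-4",
    "messages": [
        # SystemMessage becomes role "system", UserMessage becomes role "user"
        {"role": "system", "content": "You are a helpful assistant designed to summarize text concisely."},
        {"role": "user", "content": "Please summarize the following text in one sentence: ..."},
    ],
    # Tool schemas, temperature, and other parameters would also appear here.
}
```
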
**Code Glimpse:**

* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow.

```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Any, AsyncGenerator, Mapping, Optional, Sequence, Union
from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken

class ChatCompletionClient(ABC):
    @abstractmethod
    async def create(
        self, messages: Sequence[LLMMessage], *,
        tools: Sequence[Tool | ToolSchema] = [],
        json_output: Optional[bool] = None,  # Hint for JSON mode
        extra_create_args: Mapping[str, Any] = {},  # API-specific args
        cancellation_token: Optional[CancellationToken] = None,
    ) -> CreateResult: ...  # The core method

    @abstractmethod
    def create_stream(
        self,  # Similar to create, but yields results incrementally
        # ... parameters ...
    ) -> AsyncGenerator[Union[str, CreateResult], None]: ...

    @abstractmethod
    def total_usage(self) -> RequestUsage: ...  # Get total tracked usage

    @abstractmethod
    def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ...  # Estimate token count

    # Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```
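
The `count_tokens` method is handy for checking that a conversation fits the model's context window before paying for a call. A quick sketch, assuming the configured `llm_client` and the `messages_to_send` list from Step 1:

```python
# Estimate the prompt size before actually calling create().
n_tokens = llm_client.count_tokens(messages_to_send, tools=[])
print(f"This request would send roughly {n_tokens} prompt tokens.")
```
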

Concrete classes like `OpenAIChatCompletionClient`, `AnthropicChatCompletionClient`, etc., implement these methods using the specific libraries and API calls for each service.

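For instance, setting up an OpenAI-backed client might look roughly like this (a sketch; the `autogen_ext.models.openai` import path and constructor arguments are assumptions that depend on which extension package and version you have installed):

```python
# Hypothetical setup of a concrete client; check your installed package's docs.
from autogen_ext.models.openai import OpenAIChatCompletionClient

llm_client = OpenAIChatCompletionClient(
    model="gpt-4o",  # Which model to call
    # api_key is typically read from the OPENAI_API_KEY environment variable
)
# llm_client can now be used anywhere a ChatCompletionClient is expected.
```
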
* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client.

```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from typing import List, Union, Literal
from .. import FunctionCall, Image  # FunctionCall from the Chapter 4 context

class SystemMessage(BaseModel):
    content: str
    type: Literal["SystemMessage"] = "SystemMessage"

class UserMessage(BaseModel):
    content: Union[str, List[Union[str, Image]]]  # Can include images!
    source: str
    type: Literal["UserMessage"] = "UserMessage"

class AssistantMessage(BaseModel):
    content: Union[str, List[FunctionCall]]  # Can be text or function calls
    source: str
    type: Literal["AssistantMessage"] = "AssistantMessage"

# FunctionExecutionResultMessage also exists here...
```
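
To see how these types compose, here is a short tool-using exchange represented as a history (a sketch; the trailing `FunctionExecutionResultMessage` is elided because its exact fields vary by version):

```python
from autogen_core import FunctionCall
from autogen_core.models import AssistantMessage, SystemMessage, UserMessage

history = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="What's the weather in Paris?", source="User"),
    # A turn where the LLM answered with a tool request instead of text:
    AssistantMessage(
        content=[FunctionCall(id="call_1", name="get_weather", arguments='{"city": "Paris"}')],
        source="Assistant",
    ),
    # ...followed by a FunctionExecutionResultMessage carrying the tool's output.
]
```
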

* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client.

```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from dataclasses import dataclass
from typing import Union, List, Literal, Optional

from .. import FunctionCall

@dataclass
class RequestUsage:
    prompt_tokens: int
    completion_tokens: int

FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]

class CreateResult(BaseModel):
    finish_reason: FinishReasons
    content: Union[str, List[FunctionCall]]  # LLM output
    usage: RequestUsage  # Token usage for this call
    cached: bool
    # Optional fields like logprobs, thought...
```

Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.

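For example, downstream code can branch on the shape of `content` without caring which client produced it (a minimal sketch):

```python
from autogen_core.models import CreateResult

def handle_result(result: CreateResult) -> None:
    # Identical logic whether the client wraps OpenAI, Anthropic, or a mock.
    if isinstance(result.content, str):
        print(f"LLM replied with text: {result.content}")
    else:
        # A list of FunctionCall requests: the LLM wants tools executed.
        for call in result.content:
            print(f"LLM requested tool '{call.name}' with args {call.arguments}")
```
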
## Next Steps

You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.

Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?

* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)