# Chapter 5: ChatCompletionClient - Talking to the Brains

So far, we've learned about:

* [Agents](01_agent.md): The workers in our system.
* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly.
* [AgentRuntime](03_agentruntime.md): The manager that runs the show.
* [Tools](04_tool.md): How agents get specific skills.

But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs) – think of models like GPT-4, Claude, or Gemini – as their "brains". How does an agent in AutoGen Core communicate with these external LLM services?

This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs.

## Motivation: Bridging the Gap to LLMs

Imagine you want to build an agent that can summarize long articles.

1. You give the agent an article (as a message).
2. The agent needs to send this article to an LLM (like GPT-4).
3. It also needs to tell the LLM: "Please summarize this."
4. The LLM processes the request and generates a summary.
5. The agent needs to receive this summary back from the LLM.

How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response?

The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.

## Key Concepts: Understanding the LLM Communicator

Let's break down the `ChatCompletionClient`:

1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.
2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
   * `messages`: A list of messages representing the conversation history so far.
   * Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use.
   * Other parameters (like a `json_output` hint or a `cancellation_token`).
3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
   * `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
   * `UserMessage`: Input from the user or another agent (e.g., the article text).
   * `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
   * `FunctionExecutionResultMessage`: The results of executing a tool/function call.
4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (a `FunctionCall`, which gets recorded in the conversation history inside an `AssistantMessage`).
5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing (see the sketch after this list):
   * `content`: The LLM's generated text or a list of `FunctionCall` requests.
   * `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
   * `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
   * `cached`: Whether the response came from a cache.
6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total usage via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge based on tokens.
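To make concepts 3–6 concrete, here is a minimal sketch of how agent code might inspect a `CreateResult`. The `describe_result` helper is purely illustrative (it is not part of AutoGen Core), but the fields it reads are the ones listed above:

```python
# Illustrative helper (not part of AutoGen Core): inspect a CreateResult,
# distinguishing a plain text reply from a list of FunctionCall requests.
from autogen_core import FunctionCall
from autogen_core.models import CreateResult, RequestUsage


def describe_result(result: CreateResult) -> None:
    if isinstance(result.content, str):
        # The LLM answered with plain text
        print(f"Text reply ({result.finish_reason}): {result.content}")
    else:
        # The LLM asked us to execute one or more tools
        for call in result.content:
            print(f"Tool request: {call.name}({call.arguments})")

    usage: RequestUsage = result.usage
    print(f"Tokens - prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens}")
```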
## Use Case Example: Summarizing Text with an LLM

Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.

**Goal:** Send text to an LLM via a client and get a summary back.

**Step 1: Prepare the Input Messages**

We need to structure our request as a list of `LLMMessage` objects.

```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage

# Instructions for the LLM
system_prompt = SystemMessage(
    content="You are a helpful assistant designed to summarize text concisely."
)

# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using
multiple agents that can converse with each other to solve tasks. AutoGen agents
are customizable, conversable, and can seamlessly allow human participation.
They can operate in various modes that employ combinations of LLMs, human inputs,
and tools.
"""

user_request = UserMessage(
    content=f"Please summarize the following text in one sentence:\n\n{article_text}",
    source="User"  # Indicate who provided this input
)

# Combine into a list for the client
messages_to_send = [system_prompt, user_request]

print("Messages prepared:")
for msg in messages_to_send:
    print(f"- {msg.type}: {msg.content[:50]}...")  # Print first 50 chars
```

This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.

**Step 2: Use the ChatCompletionClient (Conceptual)**

Now we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.
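If you want to run this against a real model, a concrete client is typically constructed once and then reused. The sketch below assumes the OpenAI client implementation from the separate `autogen_ext` package; the import path, constructor parameters, and model name are assumptions that may differ in your AutoGen version:

```python
# Hedged sketch: constructing a real client (assumes the autogen_ext package).
# The import path and constructor arguments are assumptions - check your
# installed version's documentation before relying on them.
import os

from autogen_ext.models.openai import OpenAIChatCompletionClient

llm_client = OpenAIChatCompletionClient(
    model="gpt-4o",                        # Any chat-completion-capable model name
    api_key=os.environ["OPENAI_API_KEY"],  # Often picked up from the environment automatically
)
```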
```python
# File: call_llm_client.py
import asyncio

from autogen_core.models import CreateResult, RequestUsage

# Assume 'messages_to_send' is from the previous step.
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(config=...)).

async def get_summary(client, messages):
    print("\nSending messages to LLM via ChatCompletionClient...")
    try:
        # The core call: send messages, get a structured result back
        response: CreateResult = await client.create(
            messages=messages,
            tools=[]  # We aren't providing tools in this simple example
        )

        print("Received response:")
        print(f"- Finish Reason: {response.finish_reason}")
        print(f"- Content: {response.content}")  # This should be the summary
        print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
        print(f"- Cached: {response.cached}")

        # Also check the total usage tracked by the client across all calls
        total_usage = client.total_usage()
        print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")

    except Exception as e:
        print(f"An error occurred: {e}")


# --- Placeholder for an actual client ---
class MockChatCompletionClient:
    """Simulates a real ChatCompletionClient."""

    def __init__(self):
        self._total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)

    async def create(self, messages, tools=[], **kwargs) -> CreateResult:
        # Simulate the API call and response
        prompt_len = sum(len(str(m.content)) for m in messages) // 4  # Rough token estimate
        summary = "AutoGen is a multi-agent framework for developing LLM applications."
        completion_len = len(summary) // 4  # Rough token estimate
        usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
        self._total_usage.prompt_tokens += usage.prompt_tokens
        self._total_usage.completion_tokens += usage.completion_tokens
        return CreateResult(
            finish_reason="stop",
            content=summary,
            usage=usage,
            cached=False,
        )

    def total_usage(self) -> RequestUsage:
        return self._total_usage

    # Other required methods (count_tokens, model_info, etc.) omitted for brevity.


async def main():
    from prepare_messages import messages_to_send  # Get messages from the previous step
    mock_client = MockChatCompletionClient()
    await get_summary(mock_client, messages_to_send)

# asyncio.run(main())  # If you run this, it uses the mock client
```

This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`), the token usage reported for that specific call (`response.usage`), and the running total tracked by the client (`client.total_usage()`).

**How an Agent Uses It:**

Typically, an agent's logic (e.g., inside its `on_message` handler) would follow these steps (a sketch follows the list):

1. Receive an incoming message (like the article to summarize).
2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
3. Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
4. Call `await client.create(...)`.
5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
6. Potentially send the result as a new message to another agent or return it.
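Put together, those six steps might look roughly like the following inside an agent. This is a hedged sketch: the `SummarizerAgent` class, its `handle_article` method, and the way the client is injected are hypothetical and not tied to a particular base class from the earlier chapters:

```python
# Hypothetical sketch of steps 1-6; class and method names are illustrative only.
from autogen_core.models import (
    ChatCompletionClient,
    CreateResult,
    SystemMessage,
    UserMessage,
)


class SummarizerAgent:
    def __init__(self, model_client: ChatCompletionClient) -> None:
        self._model_client = model_client  # Step 3: client provided at setup time

    async def handle_article(self, article_text: str) -> str:
        # Steps 1-2: receive the input and build the conversation to send
        messages = [
            SystemMessage(content="You summarize text in one sentence."),
            UserMessage(content=article_text, source="User"),
        ]
        # Step 4: call the LLM through the client
        result: CreateResult = await self._model_client.create(messages=messages, tools=[])
        # Step 5: process the result (plain text expected here, since no tools were given)
        assert isinstance(result.content, str)
        # Step 6: return (or forward) the summary
        return result.content
```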
## Under the Hood: How the Client Talks to the LLM

What happens when you call `await client.create(...)`?

**Conceptual Flow:**

```mermaid
sequenceDiagram
    participant Agent as Agent Logic
    participant Client as ChatCompletionClient
    participant Formatter as API Formatter
    participant HTTP as HTTP Client
    participant LLM_API as External LLM API

    Agent->>+Client: create(messages, tools)
    Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
    Formatter-->>-Client: Return formatted request body
    Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API Key
    HTTP->>+LLM_API: Transmit request over network
    LLM_API->>LLM_API: Process request, generate completion/function call
    LLM_API-->>-HTTP: Return API response (e.g., JSON)
    HTTP-->>-Client: Receive HTTP response
    Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
    Formatter-->>-Client: Return parsed data
    Client->>Client: Create standard CreateResult object
    Client-->>-Agent: Return CreateResult
```

1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc. (see the sketch after this list).
3. **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
5. **Parse:** It receives the raw HTTP response (usually JSON) from the API.
6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
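To make the **Format** step (step 2 above) more tangible, here is a rough illustration of the kind of translation an OpenAI-style client performs internally. This is not AutoGen's actual code, and real clients also handle images, tool calls, and tool results:

```python
# Illustrative only (not AutoGen's internal code): translating standard
# LLMMessage objects into an OpenAI-style list of role/content dicts.
from typing import List

from autogen_core.models import (
    AssistantMessage,
    LLMMessage,
    SystemMessage,
    UserMessage,
)


def to_openai_style_messages(messages: List[LLMMessage]) -> List[dict]:
    wire_format: List[dict] = []
    for msg in messages:
        if isinstance(msg, SystemMessage):
            wire_format.append({"role": "system", "content": msg.content})
        elif isinstance(msg, UserMessage):
            wire_format.append({"role": "user", "content": msg.content})
        elif isinstance(msg, AssistantMessage):
            wire_format.append({"role": "assistant", "content": msg.content})
        # Real clients also map FunctionExecutionResultMessage, images, and tool calls.
    return wire_format
```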
**Code Glimpse:**

* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow.

```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Any, AsyncGenerator, Mapping, Optional, Sequence, Union

from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken


class ChatCompletionClient(ABC):
    @abstractmethod
    async def create(
        self,
        messages: Sequence[LLMMessage],
        *,
        tools: Sequence[Tool | ToolSchema] = [],
        json_output: Optional[bool] = None,         # Hint for JSON mode
        extra_create_args: Mapping[str, Any] = {},  # API-specific args
        cancellation_token: Optional[CancellationToken] = None,
    ) -> CreateResult: ...  # The core method

    @abstractmethod
    def create_stream(
        self,
        # Similar to create, but yields results incrementally
        # ... parameters ...
    ) -> AsyncGenerator[Union[str, CreateResult], None]: ...

    @abstractmethod
    def total_usage(self) -> RequestUsage: ...  # Get total tracked usage

    @abstractmethod
    def count_tokens(
        self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []
    ) -> int: ...  # Estimate token count

    # Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```

Concrete classes like `OpenAIChatCompletionClient`, `AnthropicChatCompletionClient`, etc., implement these methods using the specific libraries and API calls for each service.

* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client.

```python
# From: models/_types.py (Simplified)
from typing import List, Literal, Union

from pydantic import BaseModel

from .. import FunctionCall, Image  # FunctionCall: from Chapter 4 context


class SystemMessage(BaseModel):
    content: str
    type: Literal["SystemMessage"] = "SystemMessage"


class UserMessage(BaseModel):
    content: Union[str, List[Union[str, Image]]]  # Can include images!
    source: str
    type: Literal["UserMessage"] = "UserMessage"


class AssistantMessage(BaseModel):
    content: Union[str, List[FunctionCall]]  # Can be text or function calls
    source: str
    type: Literal["AssistantMessage"] = "AssistantMessage"

# FunctionExecutionResultMessage also exists here...
```

* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client.

```python
# From: models/_types.py (Simplified)
from dataclasses import dataclass
from typing import List, Literal, Union

from pydantic import BaseModel

from .. import FunctionCall


@dataclass
class RequestUsage:
    prompt_tokens: int
    completion_tokens: int


FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]


class CreateResult(BaseModel):
    finish_reason: FinishReasons
    content: Union[str, List[FunctionCall]]  # LLM output
    usage: RequestUsage                      # Token usage for this call
    cached: bool
    # Optional fields like logprobs, thought...
```

Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.

## Next Steps

You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.

Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?

* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)