# Chapter 5: ChatCompletionClient - Talking to the Brains

So far, we've learned about:

* [Agents](01_agent.md): The workers in our system.
* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly.
* [AgentRuntime](03_agentruntime.md): The manager that runs the show.
* [Tools](04_tool.md): How agents get specific skills.

But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs) – think of models like GPT-4, Claude, or Gemini – as their "brains". How does an agent in AutoGen Core communicate with these external LLM services?

This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs.

## Motivation: Bridging the Gap to LLMs

Imagine you want to build an agent that can summarize long articles.

1. You give the agent an article (as a message).
2. The agent needs to send this article to an LLM (like GPT-4).
3. It also needs to tell the LLM: "Please summarize this."
4. The LLM processes the request and generates a summary.
5. The agent needs to receive this summary back from the LLM.

How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response?

The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.

## Key Concepts: Understanding the LLM Communicator

Let's break down the `ChatCompletionClient`:

1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.

2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
   * `messages`: A list of messages representing the conversation history so far.
   * Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use.
   * Other parameters (like `json_output` hints and `cancellation_token`).

3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
   * `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
   * `UserMessage`: Input from the user or another agent (e.g., the article text).
   * `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
   * `FunctionExecutionResultMessage`: The results of executing a tool/function call.

4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (a `FunctionCall` inside an `AssistantMessage`).

5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing:
   * `content`: The LLM's generated text or a list of `FunctionCall` requests.
   * `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
   * `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
   * `cached`: Whether the response came from a cache.

6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge per token; see the sketch below.
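
For example, a rough cost estimate can be derived from the tracked totals. This is a minimal sketch: the per-1K-token prices are made-up placeholders, and `llm_client` is assumed to be an already-configured client.

```python
# Hypothetical cost check based on tracked usage (prices are placeholders).
usage = llm_client.total_usage()
PROMPT_PRICE_PER_1K = 0.01      # assumed input price per 1K tokens
COMPLETION_PRICE_PER_1K = 0.03  # assumed output price per 1K tokens

estimated_cost = (
    usage.prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
    + usage.completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
)
print(f"~{usage.prompt_tokens + usage.completion_tokens} tokens used, approx ${estimated_cost:.4f}")
```
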
## Use Case Example: Summarizing Text with an LLM

Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.

**Goal:** Send text to an LLM via a client and get a summary back.

**Step 1: Prepare the Input Messages**

We need to structure our request as a list of `LLMMessage` objects.

```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage

# Instructions for the LLM
system_prompt = SystemMessage(
    content="You are a helpful assistant designed to summarize text concisely."
)

# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""

user_request = UserMessage(
    content=f"Please summarize the following text in one sentence:\n\n{article_text}",
    source="User"  # Indicate who provided this input
)

# Combine into a list for the client
messages_to_send = [system_prompt, user_request]

print("Messages prepared:")
for msg in messages_to_send:
    print(f"- {msg.type}: {msg.content[:50]}...")  # Print the first 50 chars
```

This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.

**Step 2: Use the ChatCompletionClient (Conceptual)**

Now, we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.

```python
# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage

# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(config=...))

async def get_summary(client, messages):
    print("\nSending messages to LLM via ChatCompletionClient...")
    try:
        # The core call: send messages, get a structured result
        response: CreateResult = await client.create(
            messages=messages,
            # We aren't providing tools in this simple example
            tools=[]
        )
        print("Received response:")
        print(f"- Finish Reason: {response.finish_reason}")
        print(f"- Content: {response.content}")  # This should be the summary
        print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
        print(f"- Cached: {response.cached}")

        # Also check the total usage tracked by the client
        total_usage = client.total_usage()
        print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")

    except Exception as e:
        print(f"An error occurred: {e}")

# --- Placeholder for an actual client ---
class MockChatCompletionClient:  # Simulates a real client
    def __init__(self):
        # Track usage per instance (a class attribute would be shared across instances)
        self._total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)

    async def create(self, messages, tools=[], **kwargs) -> CreateResult:
        # Simulate the API call and response
        prompt_len = sum(len(str(m.content)) for m in messages) // 4  # Rough token estimate
        summary = "AutoGen is a multi-agent framework for developing LLM applications."
        completion_len = len(summary) // 4  # Rough token estimate
        usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
        self._total_usage.prompt_tokens += usage.prompt_tokens
        self._total_usage.completion_tokens += usage.completion_tokens
        return CreateResult(
            finish_reason="stop", content=summary, usage=usage, cached=False
        )

    def total_usage(self) -> RequestUsage:
        return self._total_usage

    # Other required methods (count_tokens, model_info, etc.) omitted for brevity

async def main():
    from prepare_messages import messages_to_send  # Get messages from the previous step
    mock_client = MockChatCompletionClient()
    await get_summary(mock_client, messages_to_send)

# asyncio.run(main())  # If you run this, it uses the mock client
```

This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`), the token usage reported for that specific call (`response.usage`), and the running total tracked by the client (`client.total_usage()`).

**How an Agent Uses It:**

Typically, an agent's logic (e.g., inside its `on_message` handler) would:

1. Receive an incoming message (like the article to summarize).
2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
3. Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
4. Call `await client.create(...)`.
5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
6. Potentially send the result as a new message to another agent or return it, as in the sketch below.
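
Putting those steps together, the inside of an agent might look like this minimal sketch (the `SummarizerAgent` class, its `summarize` method, and its `model_client` constructor argument are illustrative assumptions, not actual AutoGen Core classes):

```python
from autogen_core.models import SystemMessage, UserMessage

class SummarizerAgent:  # Hypothetical agent for illustration
    def __init__(self, model_client):
        self._model_client = model_client  # A ChatCompletionClient, provided at setup

    async def summarize(self, article_text: str) -> str:
        # Steps 2-5: build the LLMMessage list, call the client, extract the text
        llm_messages = [
            SystemMessage(content="You are a helpful assistant designed to summarize text concisely."),
            UserMessage(content=f"Please summarize:\n\n{article_text}", source="User"),
        ]
        result = await self._model_client.create(messages=llm_messages, tools=[])
        assert isinstance(result.content, str)  # No tools offered, so we expect text
        return result.content
```
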
## Under the Hood: How the Client Talks to the LLM

What happens when you call `await client.create(...)`?

**Conceptual Flow:**

```mermaid
sequenceDiagram
    participant Agent as Agent Logic
    participant Client as ChatCompletionClient
    participant Formatter as API Formatter
    participant HTTP as HTTP Client
    participant LLM_API as External LLM API

    Agent->>+Client: create(messages, tools)
    Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
    Formatter-->>-Client: Return formatted request body
    Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API key
    HTTP->>+LLM_API: Transmit request over network
    LLM_API->>LLM_API: Process request, generate completion/function call
    LLM_API-->>-HTTP: Return API response (e.g., JSON)
    HTTP-->>-Client: Receive HTTP response
    Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
    Formatter-->>-Client: Return parsed data
    Client->>Client: Create standard CreateResult object
    Client-->>-Agent: Return CreateResult
```

1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc.; see the sketch after this list.
3. **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
5. **Parse:** It receives the raw HTTP response (usually JSON) from the API.
6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
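
For intuition, the translation in step 2 for an OpenAI-style API might produce a request body roughly like this (an illustrative sketch of the wire format, not the client's actual internals):

```python
# Hypothetical OpenAI-style /chat/completions request body (illustrative only).
request_body = {
    "model": "gpt-4",
    "messages": [
        # SystemMessage becomes role "system", UserMessage becomes role "user"
        {"role": "system", "content": "You are a helpful assistant designed to summarize text concisely."},
        {"role": "user", "content": "Please summarize the following text in one sentence: ..."},
    ],
    # Tool schemas, temperature, and other parameters would also appear here.
}
```
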
**Code Glimpse:**

* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow.

```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Any, AsyncGenerator, Mapping, Optional, Sequence, Union
from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken

class ChatCompletionClient(ABC):
    @abstractmethod
    async def create(
        self, messages: Sequence[LLMMessage], *,
        tools: Sequence[Tool | ToolSchema] = [],
        json_output: Optional[bool] = None,  # Hint for JSON mode
        extra_create_args: Mapping[str, Any] = {},  # API-specific args
        cancellation_token: Optional[CancellationToken] = None,
    ) -> CreateResult: ...  # The core method

    @abstractmethod
    def create_stream(
        self,  # Similar to create, but yields results incrementally
        # ... parameters ...
    ) -> AsyncGenerator[Union[str, CreateResult], None]: ...

    @abstractmethod
    def total_usage(self) -> RequestUsage: ...  # Get total tracked usage

    @abstractmethod
    def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ...  # Estimate token count

    # Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```
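
The `count_tokens` method is handy for checking that a conversation fits the model's context window before paying for a call. A quick sketch, assuming the configured `llm_client` and the `messages_to_send` list from Step 1:

```python
# Estimate the prompt size before actually calling create().
n_tokens = llm_client.count_tokens(messages_to_send, tools=[])
print(f"This request would send roughly {n_tokens} prompt tokens.")
```
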

Concrete classes like `OpenAIChatCompletionClient`, `AnthropicChatCompletionClient`, etc., implement these methods using the specific libraries and API calls for each service.

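For instance, setting up an OpenAI-backed client might look roughly like this (a sketch; the `autogen_ext.models.openai` import path and constructor arguments are assumptions that depend on which extension package and version you have installed):

```python
# Hypothetical setup of a concrete client; check your installed package's docs.
from autogen_ext.models.openai import OpenAIChatCompletionClient

llm_client = OpenAIChatCompletionClient(
    model="gpt-4o",  # Which model to call
    # api_key is typically read from the OPENAI_API_KEY environment variable
)
# llm_client can now be used anywhere a ChatCompletionClient is expected.
```
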
* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client.

```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from typing import List, Union, Literal
from .. import FunctionCall, Image  # FunctionCall from the Chapter 4 context

class SystemMessage(BaseModel):
    content: str
    type: Literal["SystemMessage"] = "SystemMessage"

class UserMessage(BaseModel):
    content: Union[str, List[Union[str, Image]]]  # Can include images!
    source: str
    type: Literal["UserMessage"] = "UserMessage"

class AssistantMessage(BaseModel):
    content: Union[str, List[FunctionCall]]  # Can be text or function calls
    source: str
    type: Literal["AssistantMessage"] = "AssistantMessage"

# FunctionExecutionResultMessage also exists here...
```
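
To see how these types compose, here is a short tool-using exchange represented as a history (a sketch; the trailing `FunctionExecutionResultMessage` is elided because its exact fields vary by version):

```python
from autogen_core import FunctionCall
from autogen_core.models import AssistantMessage, SystemMessage, UserMessage

history = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="What's the weather in Paris?", source="User"),
    # A turn where the LLM answered with a tool request instead of text:
    AssistantMessage(
        content=[FunctionCall(id="call_1", name="get_weather", arguments='{"city": "Paris"}')],
        source="Assistant",
    ),
    # ...followed by a FunctionExecutionResultMessage carrying the tool's output.
]
```
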

* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client.

```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from dataclasses import dataclass
from typing import Union, List, Literal, Optional

from .. import FunctionCall

@dataclass
class RequestUsage:
    prompt_tokens: int
    completion_tokens: int

FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]

class CreateResult(BaseModel):
    finish_reason: FinishReasons
    content: Union[str, List[FunctionCall]]  # LLM output
    usage: RequestUsage  # Token usage for this call
    cached: bool
    # Optional fields like logprobs, thought...
```

Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.

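For example, downstream code can branch on the shape of `content` without caring which client produced it (a minimal sketch):

```python
from autogen_core.models import CreateResult

def handle_result(result: CreateResult) -> None:
    # Identical logic whether the client wraps OpenAI, Anthropic, or a mock.
    if isinstance(result.content, str):
        print(f"LLM replied with text: {result.content}")
    else:
        # A list of FunctionCall requests: the LLM wants tools executed.
        for call in result.content:
            print(f"LLM requested tool '{call.name}' with args {call.arguments}")
```
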
## Next Steps

You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.

Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?

* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)