Tutorial improvement (#5736)

Co-authored-by: Silen Naihin <silen.naihin@gmail.com>
This commit is contained in:
PaperMoose
2023-10-17 17:52:33 -07:00
committed by GitHub
parent bceb66f3b0
commit d173dd772d
9 changed files with 234 additions and 149 deletions


@@ -22,7 +22,7 @@ The getting started [tutorial series](https://aiedge.medium.com/autogpt-forge-e3
4. [AutoGPT Forge: Crafting Intelligent Agent Logic](https://medium.com/@aiedge/autogpt-forge-crafting-intelligent-agent-logic-bc5197b14cb4)
-Comming soon:
+Coming soon:
3. Interacting with and Benchmarking your Agent


@@ -139,6 +139,9 @@ class ForgeAgent(Agent):
step.output = "Washington D.C"
-LOG.info(f"\t✅ Final Step completed: {step.step_id}")
+LOG.info(f"\t✅ Final Step completed: {step.step_id}. \n" +
+         f"Output should be placeholder text Washington D.C. You'll need to \n" +
+         f"modify execute_step to include LLM behavior. Follow the tutorial " +
+         f"if confused. ")
return step


@@ -1,4 +1,4 @@
-You developed a tool that could help people build agents ?
+You developed a tool that could help people build agents?
Fork this repository, integrate your tool into the Forge, and send us the link to your fork in the AutoGPT Discord: https://discord.gg/autogpt (ping maintainers)


@@ -1,24 +1,19 @@
-## [AutoGPT Forge: A Comprehensive Guide to Your First Steps](https://aiedge.medium.com/autogpt-forge-a-comprehensive-guide-to-your-first-steps-a1dfdf46e3b4)
+## [AutoGPT Forge Part 1: A Comprehensive Guide to Your First Steps](https://aiedge.medium.com/autogpt-forge-a-comprehensive-guide-to-your-first-steps-a1dfdf46e3b4)
![Header](../../../docs/content/imgs/quickstart/000_header_img.png)
**Written by Craig Swift & [Ryan Brandt](https://github.com/paperMoose)**
Welcome to the Getting Started tutorial! This tutorial is designed to walk you through the process of setting up and running your own AutoGPT agent in the Forge environment. Whether you are a seasoned AI developer or just starting out, this guide will equip you with the necessary steps to jumpstart your journey in the world of AI development with AutoGPT.
## Section 1: Understanding the Forge
-The Forge serves as a comprehensive template for building your own AutoGPT agent. It not only provides the setting for setting up, creating, and running your agent, but also includes the benchmarking system and the frontend. These integrated components facilitate the development and performance evaluation of your agent.
-It plays a pivotal role in the AutoGPT ecosystem, functioning as the stem from which an agent is created. It is designed to be integrated with the agent protocol, the benchmark system, and the AutoGPT frontend, thereby forming a cohesive and robust environment for agent development.
-This harmonization ensures that developers adhere to a standardized framework, which significantly streamlines the development process. Consequently, it eliminates the need to construct boilerplate code, allowing developers to channel their efforts and creativity directly into crafting the “brains” of the agent. By focusing on enhancing the agent's intelligence and functionalities, developers can truly leverage the potential of AutoGPT, creating agents that are not only efficient but also innovative and advanced. The Forge, therefore, stands as a beacon of innovation and efficiency, propelling the development of AutoGPT agents to new heights.
-### System Requirements
-This project supports Linux (Debian based), Mac, and Windows Subsystem for Linux (WSL). If you are using a Windows system, you will need to install WSL. You can find the installation instructions for WSL [here](https://learn.microsoft.com/en-us/windows/wsl/).
+The Forge serves as a comprehensive template for building your own AutoGPT agent. It not only provides the setting for setting up, creating, and running your agent, but also includes the benchmarking system and the frontend for testing it. We'll touch more on those later! For now, just think of the Forge as a way to easily generate your boilerplate in a standardized way.
## Section 2: Setting up the Forge Environment
-To begin, you need to fork the [repository](https://github.com/Significant-Gravitas/AutoGPT) by navigating to the main page of the repository and clicking "Fork" in the top-right corner.
+To begin, you need to fork the [repository](https://github.com/Significant-Gravitas/AutoGPT) by navigating to the main page of the repository and clicking **Fork** in the top-right corner.
![The Github repository](../../../docs/content/imgs/quickstart/001_repo.png)
@@ -27,11 +22,12 @@ Follow the on-screen instructions to complete the process.
![Create Fork Page](../../../docs/content/imgs/quickstart/002_fork.png)
### Cloning the Repository
-Next, clone the repository to your local system. Ensure you have Git installed to proceed with this step. You can download Git from [here](https://git-scm.com/downloads). Then clone the repo using the following command and the url for your repo. You can find the corect url by clicking on the green Code button on your repos main page.
+Next, clone your newly forked repository to your local system. Ensure you have Git installed to proceed with this step. You can download Git from [here](https://git-scm.com/downloads). Then clone the repo using the following command with the URL for your fork. You can find the correct URL by clicking the green **Code** button on your repo's main page.
![img_1.png](../../../docs/content/imgs/quickstart/003A_clone.png)
```bash
# replace the url with the one for your forked repo
-git clone https://github.com/Significant-Gravitas/AutoGPT.git
+git clone https://github.com/<YOUR REPO PATH HERE>
```
![Clone the Repository](../../../docs/content/imgs/quickstart/003_clone.png)
@@ -62,7 +58,7 @@ Create your agent template using the command:
![Create an Agent](../../../docs/content/imgs/quickstart/007_create_agent.png)
### Entering the Arena
-The Arena is a collection of all AutoGPT agents. It serves as a competitive environment where all agents are assessed to find the best generalist agent. Entering the Arena is a required step for participating in AutoGPT hackathons. It allows your agent to be part of a diverse and dynamic ecosystem, where it is periodically assessed by the benchmark to be scored on the official leaderboard.
+The Arena is a collection of all AutoGPT agents ranked by performance on our benchmark. Entering the Arena is a required step for participating in AutoGPT hackathons. It's early days, so show us what you've got!
Officially enter the Arena by executing the command:
@@ -84,7 +80,7 @@ This will initiate the agent on `http://localhost:8000/`.
![Start the Agent](../../../docs/content/imgs/quickstart/009_start_agent.png)
### Logging in and Sending Tasks to Your Agent
-Access the frontend at `http://localhost:8000/` and log in using a Google or GitHub account. You can then send tasks to your agent through the interface.
+Access the frontend at `http://localhost:8000/` and log in using a Google or GitHub account. Once you're logged in, you'll see the agent tasking interface! However... the agent won't do anything yet. We'll implement the logic for our agent to run tasks in the upcoming tutorial chapters.
![Login](../../../docs/content/imgs/quickstart/010_login.png)
![Home](../../../docs/content/imgs/quickstart/011_home.png)
@@ -96,12 +92,16 @@ When needed, use Ctrl+C to end the session or use the stop command:
```
This command forcefully stops the agent. You can also restart it using the start command.
-## Conclusion
-In our exploration today, we've covered the essentials of working with AutoGPT projects. We began by laying out the groundwork, ensuring you have all the right tools in place. From there, we delved into the specifics of building an effective AutoGPT agent. Trust me, with the right steps, it becomes a straightforward process.
+## To Recap
+- We've forked the AutoGPT repo and cloned it locally to our machine.
+- We've connected the repo with our personal GitHub access token as part of the setup.
+- We've created and named our first agent, and entered it into the Arena!
+- We've run the agent and its tasking server successfully without an error.
+- We've logged into the server site at localhost:8000 using our GitHub account.
+Make sure you've completed every step successfully before moving on :).
### Next Steps: Building and Enhancing Your Agent
-With the foundation set, you are now ready to build and enhance your agent, exploring various functionalities and improving its performance. The next tutorial will look into the anatomy of an agent and how to add some basic functionality.
+With our foundation set, you are now ready to build and enhance your agent! The next tutorial will look into the anatomy of an agent and how to add basic functionality.
## Additional Resources
@@ -115,6 +115,7 @@ With the foundation set, you are now ready to build and enhance your agent, expl
- Ensure Git is correctly installed before cloning the repository.
- Follow the setup instructions carefully to avoid issues during project setup.
- If encountering issues during agent creation, refer to the guide for naming conventions.
+- Make sure your GitHub token has the `repo` scope enabled.
### Glossary of Terms
- **Repository**: A storage space where your project resides.
@@ -126,3 +127,6 @@ With the foundation set, you are now ready to build and enhance your agent, expl
- **Frontend**: The user interface where you can log in, send tasks to your agent, and view the task history.
+### System Requirements
+This project supports Linux (Debian based), Mac, and Windows Subsystem for Linux (WSL). If you are using a Windows system, you will need to install WSL. You can find the installation instructions for WSL [here](https://learn.microsoft.com/en-us/windows/wsl/).


@@ -1,39 +1,29 @@
-# AutoGPT Forge: The Blueprint of an AI Agent
+# AutoGPT Forge Part 2: The Blueprint of an AI Agent
+**Written by Craig Swift & [Ryan Brandt](https://github.com/paperMoose)**
-**Craig Swift**
-Craig Swift
-*8 min read*
-·
-*Just now*
---
![Header](../../../docs/content/imgs/quickstart/t2_01.png)
Hello there, fellow pioneers of the AI frontier!
-If you've landed here, chances are you've been bitten by the AI bug, eager to harness the incredible power of Large Language Models (LLMs) to build your own intelligent agents, commonly known as AutoGPTs. Remember the thrill when we set up our first project in the initial tutorial? Well, buckle up, because things are about to get even more exciting!
-In this tutorial — the sequel to our AI adventure — we're going to embark on a journey into the very heart of an AI agent. Imagine peeling back the layers of an onion, but instead of tears, there's a wealth of knowledge waiting at each layer. We'll explore the intricate web of components that make these agents tick, take a guided tour of the revered AutoGPT Forge's project structure, and yes, get our hands back into the coding trenches to enhance the step function.
-By the time we wrap up, you won't just have a working LLM-powered agent; you'll have one that passes the essential “write file” test, a testament to your growing prowess in the world of AI development.
-So, my fellow agent developers, are you ready to leap into this world where code meets cognition? Let the exploration begin!
## What are LLM-Based AI Agents?
-Large Language Models (LLMs) are state-of-the-art machine learning models that harness vast amounts of web knowledge. But what happens when you blend these LLMs with autonomous agents? You get LLM-based AI agents — a new breed of artificial intelligence that promises more human-like decision-making.
+Before we add logic to our new agent, we have to understand what an agent actually IS.
-Traditional autonomous agents operated with limited knowledge, often confined to specific tasks or environments. They were like calculators — efficient but limited to predefined functions. LLM-based agents, on the other hand, are akin to having an encyclopedia combined with a calculator. They don't just compute; they understand, reason, and then act, drawing from a vast reservoir of information.
+Large Language Models (LLMs) are state-of-the-art machine learning models that harness vast amounts of web knowledge. But what happens when you give the LLM the ability to use tools based on its output? You get LLM-based AI agents — a new breed of artificial intelligence that promises more human-like decision-making in the real world.
+Traditional autonomous agents operated with limited knowledge, often confined to specific tasks or environments. They were like calculators — efficient but limited to predefined functions. LLM-based agents, on the other hand, don't just compute; they understand, reason, and then act, drawing from a vast reservoir of information.
![AI visualising AI researchers hard at work](../../../docs/content/imgs/quickstart/t2_02.png)
The Agent Landscape Survey underscores this evolution, detailing the remarkable potential LLMs have shown in achieving human-like intelligence. They're not just about more data; they represent a more holistic approach to AI, bridging gaps between isolated task knowledge and expansive web information.
Further expanding on this, *The Rise and Potential of Large Language Model Based Agents: A Survey* portrays LLMs as the foundational blocks for the next generation of AI agents. These agents sense, decide, and act, all backed by the comprehensive knowledge and adaptability of LLMs. It is an incredible source of knowledge on AI Agent Research with almost 700 papers referenced and organized by research area.
## The Anatomy of an LLM-Based AI Agent
Diving deep into the core of an LLM-based AI agent, we find it's structured much like a human, with distinct components akin to personality, memory, thought process, and abilities. Let's break these down:
@@ -41,22 +31,56 @@ Diving deep into the core of an LLM-based AI agent, we find its structured mu
![The Github repository](../../../docs/content/imgs/quickstart/t2_03.png)
Anatomy of an Agent from the Agent Landscape Survey
-1. **Profile**
-When we humans focus on various tasks, we condition ourselves for those tasks. Whether we're writing, chopping vegetables, driving, or playing sports, we concentrate and even adopt different mindsets. This adaptability is what the concept of profile alludes to when discussing agents. Research has shown that simply informing an agent that it is an expert in a specific task can enhance its performance.
-The profiling module has potential applications beyond just prompt engineering. It could be used to adjust an agent's memory functions, available actions, or even the underlying large language model (LLM) that drives the agent.
-2. **Memory**
-Memory, for an agent, is more than just storage — it's the bedrock of its identity, capabilities and fundamental for it to learn. Just as our memories inform our decisions, reactions, and even our very personalities, an agent's memory serves as its cumulative record of past interactions, learnings, and feedback. Two primary types of memories shape an agent's cognition: long-term and short-term.
-The Long-Term Memory is akin to the agent's foundational knowledge, a vast reservoir that encompasses data and interactions spanning extended periods. It's the agent's historical archive, guiding its core behaviors and understanding.
-On the other hand, the Short-Term (or Working) Memory focuses on the immediate, handling transient memories much like our recollection of recent events. While essential for real-time tasks, not all short-term memories make it to the agent's long-term storage.
-An emerging concept in this realm is Memory Reflection. Here, the agent doesn't just store memories but actively revisits them. This introspection allows the agent to reassess, prioritize, or even discard information, akin to a human reminiscing and learning from past experiences.
-3. **Planning**
-Planning is the agent's roadmap to problem-solving. When faced with a complex challenge, humans instinctively break it down into bite-sized, manageable tasks — a strategy mirrored in LLM-based agents. This methodical approach enables agents to navigate problems with a structured mindset, ensuring comprehensive and systematic solutions.
-There are two dominant strategies in the agent's planning toolkit. The first, Planning with Feedback, is an adaptive approach. Here, the agent refines its strategy based on outcomes, much like iterating through versions of a design based on user feedback.
-The second, Planning without Feedback, sees the agent as a strategist, relying solely on its pre-existing knowledge and foresight. It's a game of chess, with the agent anticipating challenges and preparing several moves in advance.
-4. **Action**
+### **Profile**
+Humans naturally adapt our mindset based on the tasks we're tackling, whether it's writing, cooking, or playing sports. Similarly, agents can be conditioned or "profiled" to specialize in specific tasks.
+The profile of an agent is its personality, mindset, and high-level instructions. Research indicates that merely informing an agent that it's an expert in a certain domain can boost its performance.
+| **Potential Applications of Profiling** | **Description** |
+|-----------------------------------------|----------------------------------------------------------------------------------------------------------|
+| **Prompt Engineering** | Tailoring agent prompts for better results. |
+| **Memory Adjustments** | Modifying how an agent recalls or prioritizes information. |
+| **Action Selection** | Influencing the set of actions an agent might consider. |
+| **Driving Mechanism** | Potentially tweaking the underlying large language model (LLM) that powers the agent. |
+#### Example Agent Profile: Weather Expert
+- **Profile Name:** Weather Specialist
+- **Purpose:** Provide detailed and accurate weather information.
+- **Preferred Memory Sources:** Meteorological databases, recent weather news, and scientific journals.
+- **Action Set:** Fetching weather data, analyzing weather patterns, and providing forecasts.
+- **Base Model Tweaks:** Prioritize meteorological terminology and understanding.
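To make the profile idea concrete, here is a minimal, self-contained sketch of turning a profile like the Weather Specialist above into a system prompt. This is not the Forge SDK's API; the class and field names are illustrative assumptions.

```python
# Minimal sketch (not the Forge SDK's API) of turning a profile into a
# system prompt; class and field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    name: str
    purpose: str
    action_set: list = field(default_factory=list)

    def to_system_prompt(self) -> str:
        # Telling the model it is an expert is the performance trick
        # mentioned above.
        actions = ", ".join(self.action_set)
        return (
            f"You are {self.name}, an expert agent. {self.purpose} "
            f"You may use these actions: {actions}."
        )

profile = AgentProfile(
    name="Weather Specialist",
    purpose="Provide detailed and accurate weather information.",
    action_set=["fetch_weather_data", "analyze_patterns", "forecast"],
)
print(profile.to_system_prompt())
```

The rendered string would then be sent as the system message of an LLM call, conditioning every subsequent response.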
+### **Memory**
+Just as our memories shape our decisions, reactions, and identities, an agent's memory is the cornerstone of its identity and capabilities. Memory is fundamental for an agent to learn and adapt. At a high level, agents possess two core types of memories: long-term and short-term.
+| | **Long-Term Memory** | **Short-Term (Working) Memory** |
+|-------------------|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
+| **Purpose** | Serves as the agent's foundational knowledge base. | Handles recent or transient memories, much like our recollection of events from the past few days. |
+| **What it Stores**| Historical data and interactions that have taken place over extended periods. | Immediate experiences and interactions. |
+| **Role** | Guides the agent's core behaviors and understanding, acting as a vast reservoir of accumulated knowledge. | Essential for real-time tasks and decision-making. Not all these memories transition into long-term storage. |
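The long-/short-term split can be sketched in a few lines. This is an illustrative toy, not the Forge's memory implementation: recent events pass through a bounded working memory, and only events marked as important are consolidated into long-term storage.

```python
# Illustrative toy (not the Forge's memory implementation): recent events
# pass through a bounded short-term memory; only important ones persist.
from collections import deque

class AgentMemory:
    def __init__(self, short_term_capacity=5):
        self.long_term = []                                   # foundational knowledge
        self.short_term = deque(maxlen=short_term_capacity)   # recent, transient events

    def remember(self, event, important=False):
        self.short_term.append(event)     # everything passes through working memory
        if important:
            self.long_term.append(event)  # only salient events are consolidated

memory = AgentMemory(short_term_capacity=3)
for i in range(5):
    memory.remember(f"event {i}", important=(i % 2 == 0))

print(list(memory.short_term))  # ['event 2', 'event 3', 'event 4'] — older events evicted
print(memory.long_term)         # ['event 0', 'event 2', 'event 4'] — consolidated events persist
```

The bounded `deque` mirrors how short-term memories fade, while the plain list mirrors the durable archive described in the table.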
+### **Planning**
+Planning is essential for agents to systematically tackle challenges, mirroring how humans break down complex problems into smaller tasks.
+#### **1. What is Planning?**
+- **Concept:** It's the agent's strategy for problem-solving, ensuring solutions are both comprehensive and systematic.
+- **Human Analogy:** Just like humans split challenges into smaller, more manageable tasks, agents adopt a similar methodical approach.
+#### **2. Key Planning Strategies**
+| **Strategy** | **Description** |
+|----------------------------|----------------------------------------------------------------------------------------------------------|
+| **Planning with Feedback** | An adaptive approach where agents refine their strategy based on outcomes, similar to iterative design processes.|
+| **Planning without Feedback** | The agent acts as a strategist, using only its existing knowledge. It's like playing chess, anticipating challenges and planning several moves ahead. |
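A toy sketch of the planning-with-feedback loop: propose a plan, execute each step, and revise the remaining plan when a step fails. The `make_plan` and `execute` functions below are stand-ins for real LLM calls and agent abilities, not anything from the Forge SDK.

```python
# Toy "planning with feedback" loop; make_plan and execute are stand-ins
# for real LLM calls and agent abilities.
def make_plan(task):
    return [f"{task}: step {i}" for i in range(1, 4)]

def execute(step):
    # Pretend the original step 2 fails until the plan is revised.
    return "step 2" not in step or "revised" in step

def run_with_feedback(task):
    plan = make_plan(task)
    done = []
    while plan:
        step = plan.pop(0)
        if execute(step):
            done.append(step)
        else:
            # Feedback: adapt the plan instead of blindly continuing.
            plan.insert(0, step + " (revised)")
    return done

print(run_with_feedback("write report"))
```

Planning without feedback would simply execute `make_plan`'s output in order, with no revision branch.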
### **Action**
After the introspection of memory and the strategizing of planning, comes the finale: Action. This is where the agent's cognitive processes manifest into tangible outcomes using the agent's Abilities. Every decision, every thought, culminates in the action phase, translating abstract concepts into definitive results.
Whether it's penning a response, saving a file, or initiating a new process, the action component is the culmination of the agent's decision-making journey. It's the bridge between digital cognition and real-world impact, turning the agent's electronic impulses into meaningful and purposeful outcomes.
![t2_agent_flow.png](../../../docs/content/imgs/quickstart/t2_agent_flow.png)
An example of how a basic agent works
## The Agent Protocol: The Linguistics of AI Communication
After diving deep into the anatomy of an agent, understanding its core components, there emerges a pivotal question: How do we effectively communicate with these diverse, intricately-designed agents? The answer lies in the Agent Protocol.
@@ -69,27 +93,27 @@ In an ecosystem where every developer might have their unique approach to crafti
## AutoGPT Forge: A Peek Inside the LLM Agent Template
-Now we understand the architecture of an agent let's look inside the Forge. It's a well-organized template, meticulously architected to cater to the needs of agent developers. Let
![The Github repository](../../../docs/content/imgs/quickstart/t2_04.png)
+Now we understand the architecture of an agent, let's look inside the Forge. It's a well-organized template, meticulously architected to cater to the needs of agent developers.
#### Forge's Project Structure: A Bird's-Eye View
![t2_diagram.png](../../../docs/content/imgs/quickstart/t2_diagram.png)
The Forge's directory structure can be likened to a well-organized library, where every book (file or directory) has its designated place:
-- **agent.py**: The heart of the Forge, where the agent's logic resides.
-- **prompts**: A treasure trove of predefined templates, instrumental for guiding the LLM's responses.
-- **sdk**: The boilerplate code and the foundational bedrock of the Forge.
+The Forge's agent directory structure consists of three parts:
+- **agent.py**: The heart of the Forge, where the agent's actual business logic is.
+- **prompts**: A directory of prompts used in agent.py's LLM logic.
+- **sdk**: The boilerplate code and the lower level APIs of the Forge.
-Let's examine these core sections.
+Let's break them down.
-#### Unraveling the SDK
+#### Understanding the SDK
-The sdk directory is the Forge's control center. Think of it as the engine room of a ship, containing the gears and mechanisms that drive the entire vessel. Here's what it encapsulates:
-- **Core Components**: The SDK hosts the integral parts of the Forge, like Memory, Abilities, and Planning. These components are fundamental to an agent's cognition and actions.
-- **Agent Protocol Routes**: Within the routes sub-directory, you'll find the implementation of our previously discussed Agent Protocol. It's here that the standard communication interface is brought to life.
-- **Database (db.py)**: The agent's memory bank. It's where experiences, learnings, and other crucial data get stored.
-- **Prompting Engine (prompting.py)**: This engine utilizes the templates from the prompts directory to formulate queries for the LLM, ensuring consistent and apt interactions.
-- **Agent Class**: Acts as a bridge, connecting the agent's logic with the Agent Protocol routes.
+The SDK is the main directory for the Forge. Here's a breakdown:
+- **Core Components**: These are key parts of the Forge including Memory, Abilities, and Planning. They help the agent think and act.
+- **Agent Protocol Routes**: In the routes sub-directory, you'll see the Agent Protocol. This is how the agent communicates.
+- **Database (db.py)**: This is where the agent stores its data like experiences and learnings.
+- **Prompting Engine (prompting.py)**: This tool uses templates to ask questions to the LLM for consistent interactions.
+- **Agent Class**: This connects the agent's actions with the Agent Protocol routes.
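The prompting-engine idea (named templates filled with task-specific values before being sent to the LLM) can be sketched with the standard library. The real engine loads templates from the prompts directory; this standalone version is only illustrative, and the template name and fields are assumptions.

```python
# Self-contained sketch of the prompting-engine idea: prompts live as
# named templates and are filled with task-specific values before being
# sent to the LLM. Template names and fields here are assumptions.
from string import Template

TEMPLATES = {
    "task-step": Template(
        "You are $role. Your current task is: $task\n"
        "Reply with the single next action to take."
    ),
}

def render_prompt(name, **values):
    return TEMPLATES[name].substitute(**values)

prompt = render_prompt(
    "task-step",
    role="a helpful autonomous agent",
    task="Write the word 'Washington' to a .txt file",
)
print(prompt)
```

Keeping prompts in named templates, rather than inline strings, is what makes the LLM interactions consistent across the agent's code.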
#### Configurations and Environment
@@ -100,12 +124,24 @@ Configuration is key to ensuring our agent runs seamlessly. The .env.example fil
- **Port**: `PORT` specifies the listening port for the agent's server.
- **Workspace**: `AGENT_WORKSPACE` points to the agent's working directory.
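Reading these settings can be sketched as follows; the fallback values below are assumptions for illustration, not necessarily what .env.example ships.

```python
# Hedged sketch of reading the documented settings; the fallback values
# are assumptions for illustration, not what .env.example necessarily ships.
import os

PORT = int(os.environ.get("PORT", "8000"))                         # agent server's listening port
AGENT_WORKSPACE = os.environ.get("AGENT_WORKSPACE", "workspace")   # agent's working directory

print(f"Serving on port {PORT}, workspace at '{AGENT_WORKSPACE}'")
```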
-## Wrapping Up: From Blueprint to Reality
+## To Recap
-And there we have it — a comprehensive dive into the world of AutoGPTs. We've traversed the intricate pathways of agent anatomy, understood how the Agent Protocol fits in, and peeked under the hood of the Forge, understanding its core components and structure.
+- **LLM-Based AI Agents**:
+  - LLMs are machine learning models with vast knowledge. When equipped with tools to utilize their outputs, they evolve into LLM-based AI agents, enabling human-like decision-making.
-If this tutorial was a journey, think of it as a hike up a mountain. We started at the base, with a broad view of LLM-based AI agents, understanding their significance and potential. As we climbed, the trail led us to the anatomy of these agents, dissecting their structure and functionality. Nearing the summit, we delved into the Agent Protocol, understanding its pivotal role in standardizing communication. And finally, standing at the peak, we had a bird's-eye view of the Forge, observing its organized layout and appreciating its design intricacies.
+- **Anatomy of an Agent**:
+  - **Profile**: Sets an agent's personality and specialization.
+  - **Memory**: Encompasses the agent's long-term and short-term memory, storing both historical data and recent interactions.
+  - **Planning**: The strategy the agent employs to tackle problems.
+  - **Action**: The stage where the agent's decisions translate to tangible results.
+- **Agent Protocol**:
+  - A uniform communication interface ensuring smooth interactions between agents and their developers.
-But remember, every mountain peak is the bottom of another adventure. Having grasped the theoretical aspects, it's time to transition from blueprint to reality. Now that the foundations are laid, our next steps involve breathing life into these concepts, turning lines of code into intelligent, responsive agents.
+- **AutoGPT Forge**:
+  - A foundational template for creating agents. Components include:
+    - **agent.py**: Houses the agent's core logic.
+    - **prompts**: Directory of templates aiding LLM logic.
+    - **sdk**: Boilerplate code and essential APIs.
-To all the budding agent developers out there, gear up for the next phase of our expedition — the hands-on time! Until then, keep the AI flame burning bright and never stop exploring.
+Let's put this blueprint into practice in part 3!


@@ -1,19 +1,21 @@
# AutoGPT Forge: Crafting Intelligent Agent Logic
![Header](../../../docs/content/imgs/quickstart/t3_01.png)
**By Craig Swift & [Ryan Brandt](https://github.com/paperMoose)**
-Greetings, AI enthusiasts! Today, we're about to embark on an enlightening journey of crafting intelligent agent logic. This is part 3 in a tutorial series on using the AutoGPT Forge, you can find the earlier parts here:
+Hey there! Ready for part 3 of our AutoGPT Forge tutorial series? If you missed the earlier parts, catch up here:
-Part 1: AutoGPT Forge: A Comprehensive Guide to Your First Step
-Part 2: AutoGPT Forge: The Blueprint of an AI Agent
+- [Getting Started](001_getting_started.md)
+- [Blueprint of an Agent](002_blueprint_of_an_agent.md)
-Alright, folks, let's dive right into the fun part: coding! We're about to set up a nifty system that showcases how to use an LLM as the brainpower behind our agent. The mission? To tackle the simple task of jotting down the capital of the United States into a txt file. The coolest part? We won't spoon-feed our agent the steps. Instead, we'll just hand over the task: "Write the word 'Washington' to a .txt file," and watch in awe as it figures out the 'how-to' all by itself, then swiftly executes the necessary commands. How cool is that?
+Now, let's get hands-on! We'll use an LLM to power our agent and complete a task. The challenge? Making the agent write "Washington" to a .txt file. We won't give it step-by-step instructions—just the task. Let's see our agent in action and watch it figure out the steps on its own!
---
-## Setting Up Your Smart Agent Project
-Before diving in, ensure you've prepped your project and crafted an agent as detailed in our kick-off tutorial. Missed that step? No worries! Just hop over to the project setup by clicking here. Once you're all set, come back and we'll hit the ground running.
-In the following screenshot, you'll notice I've crafted an agent named "SmartAgent" and then accessed the agent.py file located in the 'forge' subfolder. This will be our workspace for integrating the LLM-driven logic. While our previous tutorial touched upon the project layout and agent operations, don't fret! I'll highlight the essentials as we delve into the logic implementation.
+## Get Your Smart Agent Project Ready
+Make sure you've set up your project and created an agent as described in our initial guide. If you skipped that part, [click here](#) to get started. Once you're done, come back, and we'll move forward.
+In the image below, you'll see my "SmartAgent" and the agent.py file inside the 'forge' folder. That's where we'll be adding our LLM-based logic. If you're unsure about the project structure or agent functions from our last guide, don't worry. We'll cover the basics as we go!
![SmartAgent](../../../docs/content/imgs/quickstart/t3_02.png)
@@ -25,7 +27,7 @@ The lifecycle of a task, from its creation to execution, is outlined in the agen
Want your agent to perform an action? Start by dispatching a create_task request. This crucial step involves specifying the task details, much like how you'd send a prompt to ChatGPT, using the input field. If you're giving this a shot on your own, the UI is your best friend; it effortlessly handles all the API calls on your behalf.
-Once your agent receives this, it triggers the create_task function. The method super().create_task(task_request) effortlessly manages all the requisite protocol record keeping on your behalf. Subsequently, it simply logs the task's creation. For the scope of this tutorial, there's no need to tweak this function.
+When the agent gets this, it runs the create_task function. The code `super().create_task(task_request)` takes care of protocol steps. It then logs the task's start. For this guide, you don't need to change this function.
```python
async def create_task(self, task_request: TaskRequestBody) -> Task:
@@ -44,7 +46,7 @@ async def create_task(self, task_request: TaskRequestBody) -> Task:
return task
```
-Once a task is initiated, the execute_step function is invoked repeatedly until the very last step is executed. Below is the initial look of the execute_step, and note that I've omitted the lengthy docstring explanation for the sake of brevity, but you'll encounter it in your project.
+After starting a task, the `execute_step` function runs until all steps are done. Here's a basic view of `execute_step`. I've left out the detailed comments for simplicity, but you'll find them in your project.
```python
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
@@ -70,67 +72,72 @@ async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Ste
return step
```
Here's what you're witnessing: a clever way to pass the 'write file' test, broken down into four clear-cut stages:
Here's the breakdown of the 'write file' process in four steps:
1. Database Step Creation: The first stage is all about creating a step within the database, an essential aspect of the agent protocol. You'll observe that while setting up this step, we've flagged it with is_last=True. This signals to the agent protocol that no more steps are pending. For the purpose of this guide, let's work under the assumption that our agent will only tackle single-step tasks. However, hang tight for future tutorials, where we'll level up and let the agent determine its completion point.
1. **Database Step Creation**: The first stage is all about creating a step within the database, an essential aspect of the agent protocol. You'll observe that while setting up this step, we've flagged it with `is_last=True`. This signals to the agent protocol that no more steps are pending. For the purpose of this guide, let's work under the assumption that our agent will only tackle single-step tasks. However, hang tight for future tutorials, where we'll level up and let the agent determine its completion point.
2. **File Writing**: Next, we pen down "Washington D.C." using the workspace.write function.
3. **Artifact Database Update**: After writing, we record the file in the agent's artifact database.
4. **Step Output & Logging**: Finally, we set the step output to match the file content, log the executed step, and use the step object.
With the 'write file' process clear, let's make our agent smarter and more autonomous. Ready to dive in?
---
## Building the Foundations For Our Smart Agent
First, we need to update the `execute_step()` function. Instead of a fixed solution, it should use the given request.
To do this, we'll fetch the task details using the provided `task_id`:
```python
task = await self.db.get_task(task_id)
```
Next, remember to create a database record and mark it as a single-step task with `is_last=True`:
```python
step = await self.db.create_step(
    task_id=task_id, input=step_request, is_last=True
)
```
Your updated `execute_step` function will look like this:
```python
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    # Get the task details
    task = await self.db.get_task(task_id)

    # Add a new step to the database
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )
    return step
```
Now that we've set this up, let's move to the next exciting part: The PromptEngine.
---
## The Art of Prompting
![Prompting 101](../../../docs/content/imgs/quickstart/t3_03.png)
Prompting is like shaping messages for powerful language models like ChatGPT. Since these models respond to input details, creating the right prompt can be a challenge. That's where the **PromptEngine** comes in.
The `PromptEngine` lets you store prompts in text files, specifically Jinja2 templates. This means you can refine the prompts given to your agent without changing the code. It also lets you customize prompts for different LLMs. Here's how to use it:
First, add the PromptEngine from the SDK:
```python
from .sdk import PromptEngine
```
In your `execute_step` function, set up the engine for the `gpt-3.5-turbo` LLM:
```python
prompt_engine = PromptEngine("gpt-3.5-turbo")
system_prompt = prompt_engine.load_prompt("system-format")
```
For intricate use cases, like the `task-step` prompt which requires parameters, employ the following method:
```python
# Define the task parameters
task_kwargs = {
"task": task.input,
"abilities": self.abilities.list_abilities_for_prompt(),
}
# Load the task prompt with those parameters
task_prompt = prompt_engine.load_prompt("task-step", **task_kwargs)
```
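Under the hood, loading a prompt is just template rendering: the engine reads the template for the chosen model and substitutes your parameters. Here's a toy stand-in for the idea, using the stdlib's `string.Template` instead of Jinja2 and made-up template text:

```python
from string import Template

class ToyPromptEngine:
    """Illustrative stand-in for PromptEngine; the real one loads
    Jinja2 templates from prompts/<model>/<name>.j2 on disk."""

    def __init__(self):
        self.templates = {
            "task-step": Template(
                "Your task is:\n$task\n\nAbilities:\n$abilities"
            ),
        }

    def load_prompt(self, name: str, **kwargs) -> str:
        # Substitute the keyword arguments into the named template
        return self.templates[name].substitute(**kwargs)

engine = ToyPromptEngine()
prompt = engine.load_prompt(
    "task-step",
    task="Write 'Washington' to a .txt file",
    abilities="- write_file",
)
print(prompt)
```

Changing the template text never touches the rendering code, which is exactly the flexibility the PromptEngine gives you.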
Delving deeper, let's look at the `task-step` prompt template in `prompts/gpt-3.5-turbo/task-step.j2`:
```jinja
{% extends "techniques/expert.j2" %}
{% block prompt %}
Your task is:
{{ task }}
Ensure to respond in the given format. Always make autonomous decisions, devoid of user guidance. Harness the power of your LLM, opting for straightforward tactics sans any legal entanglements.
{% if constraints %}
## Constraints
Operate under these confines:
{% for constraint in constraints %}
- {{ constraint }}
{% endfor %}
{% endif %}
{% if resources %}
## Resources
Utilize these resources:
{% for resource in resources %}
- {{ resource }}
{% endfor %}
{% endif %}
{% if abilities %}
## Abilities
Use these abilities:
{% for ability in abilities %}
- {{ ability }}
{% endfor %}
{% endif %}
{% if best_practices %}
## Best Practices
{% for best_practice in best_practices %}
- {{ best_practice }}
{% endfor %}
{% endif %}
{% endblock %}
```
This template is modular. It uses the `extends` directive to build on the `expert.j2` template. The different sections like constraints, resources, abilities, and best practices make the prompt dynamic. It guides the LLM in understanding the task and using resources and abilities.
The PromptEngine gives us a clean way to converse with large language models. By externalizing prompts into templates, the agent can adapt to new challenges without a code overhaul. Keep this foundation in mind as we move forward; it's the bedrock of our agent's intelligence.
## Engaging with your LLM
To make the most of the LLM, you'll send a series of organized instructions, not just one prompt. Structure your prompts as a list of messages for the LLM. Using the `system_prompt` and `task_prompt` from before, create the `messages` list:
```python
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": task_prompt}
]
```
With the prompt set, it's time to send it to the LLM. The key call here is `chat_completion_request`, which sends your messages to the model and returns its output; the surrounding code packages the request and interprets the response:
```python
try:
    # Set the parameters for the chat completion
    chat_completion_kwargs = {
        "messages": messages,
        "model": "gpt-3.5-turbo",
    }
    # Get the LLM's response and interpret it
    chat_response = await chat_completion_request(**chat_completion_kwargs)
    answer = json.loads(chat_response["choices"][0]["message"]["content"])

    # Log the answer for reference
    LOG.info(pprint.pformat(answer))
except json.JSONDecodeError as e:
    # Handle JSON decoding errors
    LOG.error(f"Can't decode chat response: {chat_response}")
except Exception as e:
    # Handle other errors
    LOG.error(f"Can't get chat response: {e}")
```
Extracting clear messages from LLM outputs can be complex. Our method is simple and works with GPT-3.5 and GPT-4. Future guides will show more ways to interpret LLM outputs. The goal? To go beyond JSON, as some LLMs work best with other response types. Stay tuned!
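As a taste of that, here's a small illustrative helper (not part of the SDK) that falls back to grabbing the first `{...}` span when the model wraps its JSON in prose or a markdown fence:

```python
import json
import re

def extract_json(text: str) -> dict:
    """Best-effort extraction of the first JSON object in an LLM reply."""
    try:
        # Happy path: the reply is pure JSON
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback: pull out the first {...} span and parse that
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise
        return json.loads(match.group(0))

# Handles a reply wrapped in prose and a markdown fence
reply = 'Sure! Here you go:\n```json\n{"ability": {"name": "write_file"}}\n```'
print(extract_json(reply))  # {'ability': {'name': 'write_file'}}
```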
---
## Using and Creating Abilities
Abilities are the gears and levers that enable the agent to interact with tasks at hand. Let's unpack the mechanisms behind these abilities and how you can harness, and even extend, them.
In the SDK, there's an `abilities` folder containing `registry.py`, `finish.py`, and a `file_system` subfolder. You can also add your own abilities here. `registry.py` is the main file for abilities. It contains the `@ability` decorator and the `AbilityRegister` class, which actively tracks available abilities and defines the function needed to run them. The base `Agent` class includes a default ability register, available via `self.abilities` and set up in its `__init__` like so:
```python
self.abilities = AbilityRegister(self)
```
The `AbilityRegister` has two key methods. `list_abilities_for_prompt` prepares abilities for prompts. `run_ability` makes the ability work. An ability is a function with the `@ability` decorator. It must have specific parameters, including the agent and `task_id`.
```python
@ability(
    name="write_file",
    description="Write data to a file",
    parameters=[
        {"name": "file_path", "description": "Path to the file", "type": "string", "required": True},
        {"name": "data", "description": "Data to write to the file", "type": "bytes", "required": True},
    ],
    output_type="None",
)
async def write_file(agent, task_id: str, file_path: str, data: bytes) -> None:
    pass
```
The `@ability` decorator defines the ability's details, like its identity (name), functionality (description), and operational parameters.
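To demystify the mechanism, here's a stripped-down sketch of the decorator-plus-registry pattern; it illustrates the idea, not the SDK's actual implementation:

```python
ABILITIES = {}  # maps ability name -> {"func": ..., plus metadata}

def ability(name, description, parameters=None, output_type=None):
    """Simplified stand-in for the SDK's @ability decorator."""
    def decorator(func):
        # Register the function together with its metadata
        ABILITIES[name] = {
            "func": func,
            "description": description,
            "parameters": parameters or [],
            "output_type": output_type,
        }
        return func
    return decorator

@ability(name="shout", description="Upper-case a string", output_type="string")
def shout(agent, task_id: str, text: str) -> str:
    return text.upper()

def run_ability(agent, task_id, name, **kwargs):
    # Look up the registered function and invoke it
    return ABILITIES[name]["func"](agent, task_id, **kwargs)

print(run_ability(None, "task-1", "shout", text="hello"))  # HELLO
```

The registry decouples "what abilities exist" from "how they're called", which is what lets `list_abilities_for_prompt` describe them to the LLM and `run_ability` execute whichever one the LLM picks.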
## Example of a Custom Ability: Webpage Fetcher
```python
import requests

@ability(
    name="fetch_webpage",
    description="Retrieve the HTML content of a webpage",
    parameters=[
        {"name": "url", "description": "Webpage URL", "type": "string", "required": True},
    ],
    output_type="string",
)
async def fetch_webpage(agent, task_id: str, url: str) -> str:
    response = requests.get(url)
    return response.text
```
This ability, `fetch_webpage`, accepts a URL as input and returns the webpage's HTML content as a string. Custom abilities let you extend your agent seamlessly, integrating external tools and libraries to augment its capabilities. To create one, define a function, give it the required `agent` and `task_id` parameters, and describe it in the `@ability` decorator. With abilities like `fetch_webpage`, your agent can tackle complex tasks efficiently.
## Running an Ability
Now that you understand abilities and how to create them, let's use them. The last piece is the `execute_step` function. Our goal is to understand the agent's response, find the ability, and use it.
First, we get the ability details from the agent's answer:
```python
# Extract the ability from the answer
ability = answer["ability"]
```
With the ability details in hand, we call the `run_ability` function:
```python
# Run the ability and get the output
# We don't actually use the output in this example
output = await self.abilities.run_ability(
    task_id, ability["name"], **ability["args"]
)
```
Here, we're invoking the specified ability. The `task_id` ensures continuity, `ability["name"]` pinpoints the exact function, and `ability["args"]` provides the necessary context.
Finally, we make the step's output show the agent's thinking:
```python
# Set the step output to the "speak" part of the answer
step.output = answer["thoughts"]["speak"]
return step
```
And there you have it! Your first Smart Agent is ready to take on challenges. The stage is set. It's showtime!
Here is what your function should look like:
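Here's a sketch assembled from the snippets above; it assumes the `ForgeAgent` context from earlier sections (`self.db`, `self.abilities`, `PromptEngine`, `chat_completion_request`, and `LOG` all come from the SDK):

```python
from __future__ import annotations  # lets the SDK type hints resolve lazily

import json
import pprint

async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    # Get the task details
    task = await self.db.get_task(task_id)

    # Add a new step to the database; single-step tasks, so is_last=True
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )

    # Load the system and task prompts
    prompt_engine = PromptEngine("gpt-3.5-turbo")
    system_prompt = prompt_engine.load_prompt("system-format")
    task_kwargs = {
        "task": task.input,
        "abilities": self.abilities.list_abilities_for_prompt(),
    }
    task_prompt = prompt_engine.load_prompt("task-step", **task_kwargs)

    # Build the message list and ask the LLM
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task_prompt},
    ]
    try:
        chat_completion_kwargs = {
            "messages": messages,
            "model": "gpt-3.5-turbo",
        }
        chat_response = await chat_completion_request(**chat_completion_kwargs)
        answer = json.loads(chat_response["choices"][0]["message"]["content"])
        LOG.info(pprint.pformat(answer))
    except json.JSONDecodeError as e:
        LOG.error(f"Can't decode chat response: {chat_response}")
    except Exception as e:
        LOG.error(f"Can't get chat response: {e}")

    # Run the ability the LLM chose
    ability = answer["ability"]
    output = await self.abilities.run_ability(
        task_id, ability["name"], **ability["args"]
    )

    # Set the step output to the "speak" part of the answer
    step.output = answer["thoughts"]["speak"]
    return step
```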
Run your agent again; after the AutoGPT banner, the console shows:

```bash
[2023-09-27 15:39:07,832] [forge.sdk.agent] [INFO] 📝 Agent server starting on http://localhost:8000
```
1. **Get Started**
- Click the link to access the AutoGPT Agent UI.
2. **Login**
- Log in using your Gmail or Github credentials.
3. **Navigate to Benchmarking**
- Look to the left, and you'll spot a trophy icon. Click it to enter the benchmarking arena.
![Benchmarking page of the AutoGPT UI](../../../docs/content/imgs/quickstart/t3_04.png)
4. **Select the 'WriteFile' Test**
- Choose the 'WriteFile' test from the available options.
5. **Initiate the Test Suite**
- Hit 'Initiate test suite' to start the benchmarking process.
6. **Monitor in Real-Time**
- Keep your eyes on the right panel as it displays real-time output.
7. **Check the Console**
- For additional information, you can also monitor your console for progress updates and messages.
```bash
📝 📦 Task created: 70518b75-0104-49b0-923e-f607719d042b input: Write the word 'Washington' to a .txt fi...
📝 ✅ Final Step completed: a736c45f-65a5-4c44-a697-f1d6dcd94d5c input: y
```
If you see this, you've done it!
8. **Troubleshooting**
- If you encounter any issues or see cryptic error messages, don't worry. Just hit the retry button. Remember, LLMs are powerful but may occasionally need some guidance.
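If you find yourself hitting retry often, you can automate the nudge. Here's an illustrative retry helper (not part of the SDK) you could wrap around a flaky LLM call:

```python
import time

def with_retries(func, attempts=3, delay=1.0):
    """Re-run a flaky callable a few times before giving up.

    Illustrative only; tune attempts and delay to taste.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff

# Simulate a call that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient LLM hiccup")
    return "ok"

print(with_retries(flaky, attempts=3, delay=0))  # ok
```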
## Wrap Up
- Stay tuned for our next tutorial, where we'll enhance the agent's capabilities by adding memory!
## Keep Exploring
- Keep experimenting and pushing the boundaries of AI. Happy coding! 🚀
