mirror of
https://github.com/aljazceru/Auto-GPT.git
synced 2025-12-30 12:24:29 +01:00
Documentation/collate rearch notes (#4958)
* Add links to github issues in the README and clarify run instructions * Added a new doc in the core package with architecture notes.
This commit is contained in:
272
autogpt/core/ARCHITECTURE_NOTES.md
Normal file
272
autogpt/core/ARCHITECTURE_NOTES.md
Normal file
@@ -0,0 +1,272 @@
|
||||
# Re-architecture Notes
|
||||
|
||||
## Key Documents
|
||||
|
||||
- [Planned Agent Workflow](https://whimsical.com/agent-workflow-v2-NmnTQ8R7sVo7M3S43XgXmZ)
|
||||
- [Original Architecture Diagram](https://www.figma.com/file/fwdj44tPR7ArYtnGGUKknw/Modular-Architecture?type=whiteboard&node-id=0-1) - This is sadly well out of date at this point.
|
||||
- [Kanban](https://github.com/orgs/Significant-Gravitas/projects/1/views/1?filterQuery=label%3Are-arch)
|
||||
|
||||
## The Motivation
|
||||
|
||||
The `master` branch of Auto-GPT is an organically grown amalgamation of many thoughts
|
||||
and ideas about agent-driven autonomous systems. It lacks clear abstraction boundaries,
|
||||
has issues of global state and poorly encapsulated state, and is generally just hard to
|
||||
make effective changes to. Mainly it's just a system that's hard to make changes to.
|
||||
And research in the field is moving fast, so we want to be able to try new ideas
|
||||
quickly.
|
||||
|
||||
## Initial Planning
|
||||
|
||||
A large group of maintainers and contributors met do discuss the architectural
|
||||
challenges associated with the existing codebase. Many much-desired features (building
|
||||
new user interfaces, enabling project-specific agents, enabling multi-agent systems)
|
||||
are bottlenecked by the global state in the system. We discussed the tradeoffs between
|
||||
an incremental system transition and a big breaking version change and decided to go
|
||||
for the breaking version change. We justified this by saying:
|
||||
|
||||
- We can maintain, in essence, the same user experience as now even with a radical
|
||||
restructuring of the codebase
|
||||
- Our developer audience is struggling to use the existing codebase to build
|
||||
applications and libraries of their own, so this breaking change will largely be
|
||||
welcome.
|
||||
|
||||
## Primary Goals
|
||||
|
||||
- Separate the AutoGPT application code from the library code.
|
||||
- Remove global state from the system
|
||||
- Allow for multiple agents per user (with facilities for running simultaneously)
|
||||
- Create a serializable representation of an Agent
|
||||
- Encapsulate the core systems in abstractions with clear boundaries.
|
||||
|
||||
## Secondary goals
|
||||
|
||||
- Use existing tools to ditch any unneccesary cruft in the codebase (document loading,
|
||||
json parsing, anything easier to replace than to port).
|
||||
- Bring in the [core agent loop updates](https://whimsical.com/agent-workflow-v2-NmnTQ8R7sVo7M3S43XgXmZ)
|
||||
being developed simultaneously by @Pwuts
|
||||
|
||||
# The Agent Subsystems
|
||||
|
||||
## Configuration
|
||||
|
||||
We want a lot of things from a configuration system. We lean heavily on it in the
|
||||
`master` branch to allow several parts of the system to communicate with each other.
|
||||
[Recent work](https://github.com/Significant-Gravitas/Auto-GPT/pull/4737) has made it
|
||||
so that the config is no longer a singleton object that is materialized from the import
|
||||
state, but it's still treated as a
|
||||
[god object](https://en.wikipedia.org/wiki/God_object) containing all information about
|
||||
the system and _critically_ allowing any system to reference configuration information
|
||||
about other parts of the system.
|
||||
|
||||
### What we want
|
||||
|
||||
- It should still be reasonable to collate the entire system configuration in a
|
||||
sensible way.
|
||||
- The configuration should be validatable and validated.
|
||||
- The system configuration should be a _serializable_ representation of an `Agent`.
|
||||
- The configuration system should provide a clear (albeit very low-level) contract
|
||||
about user-configurable aspects of the system.
|
||||
- The configuration should reasonably manage default values and user-provided overrides.
|
||||
- The configuration system needs to handle credentials in a reasonable way.
|
||||
- The configuration should be the representation of some amount of system state, like
|
||||
api budgets and resource usage. These aspects are recorded in the configuration and
|
||||
updated by the system itself.
|
||||
- Agent systems should have encapsulated views of the configuration. E.g. the memory
|
||||
system should know about memory configuration but nothing about command configuration.
|
||||
|
||||
## Workspace
|
||||
|
||||
There are two ways to think about the workspace:
|
||||
|
||||
- The workspace is a scratch space for an agent where it can store files, write code,
|
||||
and do pretty much whatever else it likes.
|
||||
- The workspace is, at any given point in time, the single source of truth for what an
|
||||
agent is. It contains the serializable state (the configuration) as well as all
|
||||
other working state (stored files, databases, memories, custom code).
|
||||
|
||||
In the existing system there is **one** workspace. And because the workspace holds so
|
||||
much agent state, that means a user can only work with one agent at a time.
|
||||
|
||||
## Memory
|
||||
|
||||
The memory system has been under extremely active development.
|
||||
See [#3536](https://github.com/Significant-Gravitas/Auto-GPT/issues/3536) and
|
||||
[#4208](https://github.com/Significant-Gravitas/Auto-GPT/pull/4208) for discussion and
|
||||
work in the `master` branch. The TL;DR is
|
||||
that we noticed a couple of months ago that the `Agent` performed **worse** with
|
||||
permanent memory than without it. Since then the knowledge storage and retrieval
|
||||
system has been [redesigned](https://whimsical.com/memory-system-8Ae6x6QkjDwQAUe9eVJ6w1)
|
||||
and partially implemented in the `master` branch.
|
||||
|
||||
## Planning/Prompt-Engineering
|
||||
|
||||
The planning system is the system that translates user desires/agent intentions into
|
||||
language model prompts. In the course of development, it has become pretty clear
|
||||
that `Planning` is the wrong name for this system
|
||||
|
||||
### What we want
|
||||
|
||||
- It should be incredibly obvious what's being passed to a language model, when it's
|
||||
being passed, and what the language model response is. The landscape of language
|
||||
model research is developing very rapidly, so building complex abstractions between
|
||||
users/contributors and the language model interactions is going to make it very
|
||||
difficult for us to nimbly respond to new research developments.
|
||||
- Prompt-engineering should ideally be exposed in a parameterizeable way to users.
|
||||
- We should, where possible, leverage OpenAI's new
|
||||
[function calling api](https://openai.com/blog/function-calling-and-other-api-updates)
|
||||
to get outputs in a standard machine-readable format and avoid the deep pit of
|
||||
parsing json (and fixing unparsable json).
|
||||
|
||||
### Planning Strategies
|
||||
|
||||
The [new agent workflow](https://whimsical.com/agent-workflow-v2-NmnTQ8R7sVo7M3S43XgXmZ)
|
||||
has many, many interaction points for language models. We really would like to not
|
||||
distribute prompt templates and raw strings all through the system. The re-arch solution
|
||||
is to encapsulate language model interactions into planning strategies.
|
||||
These strategies are defined by
|
||||
|
||||
- The `LanguageModelClassification` they use (`FAST` or `SMART`)
|
||||
- A function `build_prompt` that takes strategy specific arguments and constructs a
|
||||
`LanguageModelPrompt` (a simple container for lists of messages and functions to
|
||||
pass to the language model)
|
||||
- A function `parse_content` that parses the response content (a dict) into a better
|
||||
formatted dict. Contracts here are intentionally loose and will tighten once we have
|
||||
at least one other language model provider.
|
||||
|
||||
## Resources
|
||||
|
||||
Resources are kinds of services we consume from external APIs. They may have associated
|
||||
credentials and costs we need to manage. Management of those credentials is implemented
|
||||
as manipulation of the resource configuration. We have two categories of resources
|
||||
currently
|
||||
|
||||
- AI/ML model providers (including language model providers and embedding model providers, ie OpenAI)
|
||||
- Memory providers (e.g. Pinecone, Weaviate, ChromaDB, etc.)
|
||||
|
||||
### What we want
|
||||
|
||||
- Resource abstractions should provide a common interface to different service providers
|
||||
for a particular kind of service.
|
||||
- Resource abstractions should manipulate the configuration to manage their credentials
|
||||
and budget/accounting.
|
||||
- Resource abstractions should be composable over an API (e.g. I should be able to make
|
||||
an OpenAI provider that is both a LanguageModelProvider and an EmbeddingModelProvider
|
||||
and use it wherever I need those services).
|
||||
|
||||
## Abilities
|
||||
|
||||
Along with planning and memory usage, abilities are one of the major augmentations of
|
||||
augmented language models. They allow us to expand the scope of what language models
|
||||
can do by hooking them up to code they can execute to obtain new knowledge or influence
|
||||
the world.
|
||||
|
||||
### What we want
|
||||
|
||||
- Abilities should have an extremely clear interface that users can write to.
|
||||
- Abilities should have an extremely clear interface that a language model can
|
||||
understand
|
||||
- Abilities should be declarative about their dependencies so the system can inject them
|
||||
- Abilities should be executable (where sensible) in an async run loop.
|
||||
- Abilities should be not have side effects unless those side effects are clear in
|
||||
their representation to an agent (e.g. the BrowseWeb ability shouldn't write a file,
|
||||
but the WriteFile ability can).
|
||||
|
||||
## Plugins
|
||||
|
||||
Users want to add lots of features that we don't want to support as first-party.
|
||||
Or solution to this is a plugin system to allow users to plug in their functionality or
|
||||
to construct their agent from a public plugin marketplace. Our primary concern in the
|
||||
re-arch is to build a stateless plugin service interface and a simple implementation
|
||||
that can load plugins from installed packages or from zip files. Future efforts will
|
||||
expand this system to allow plugins to load from a marketplace or some other kind
|
||||
of service.
|
||||
|
||||
### What is a Plugin
|
||||
|
||||
Plugins are a kind of garbage term. They refer to a number of things.
|
||||
|
||||
- New commands for the agent to execute. This is the most common usage.
|
||||
- Replacements for entire subsystems like memory or language model providers
|
||||
- Application plugins that do things like send emails or communicate via whatsapp
|
||||
- The repositories contributors create that may themselves have multiple plugins in them.
|
||||
|
||||
### Usage in the existing system
|
||||
|
||||
The current plugin system is _hook-based_. This means plugins don't correspond to
|
||||
kinds of objects in the system, but rather to times in the system at which we defer
|
||||
execution to them. The main advantage of this setup is that user code can hijack
|
||||
pretty much any behavior of the agent by injecting code that supercedes the normal
|
||||
agent execution. The disadvantages to this approach are numerous:
|
||||
|
||||
- We have absolutely no mechanisms to enforce any security measures because the threat
|
||||
surface is everything.
|
||||
- We cannot reason about agent behavior in a cohesive way because control flow can be
|
||||
ceded to user code at pretty much any point and arbitrarily change or break the
|
||||
agent behavior
|
||||
- The interface for designing a plugin is kind of terrible and difficult to standardize
|
||||
- The hook based implementation means we couple ourselves to a particular flow of
|
||||
control (or otherwise risk breaking plugin behavior). E.g. many of the hook targets
|
||||
in the [old workflow](https://whimsical.com/agent-workflow-VAzeKcup3SR7awpNZJKTyK)
|
||||
are not present or mean something entirely different in the
|
||||
[new workflow](https://whimsical.com/agent-workflow-v2-NmnTQ8R7sVo7M3S43XgXmZ).
|
||||
- Etc.
|
||||
|
||||
### What we want
|
||||
|
||||
- A concrete definition of a plugin that is narrow enough in scope that we can define
|
||||
it well and reason about how it will work in the system.
|
||||
- A set of abstractions that let us define a plugin by its storage format and location
|
||||
- A service interface that knows how to parse the plugin abstractions and turn them
|
||||
into concrete classes and objects.
|
||||
|
||||
|
||||
## Some Notes on how and why we'll use OO in this project
|
||||
|
||||
First and foremost, Python itself is an object-oriented language. It's
|
||||
underlying [data model](https://docs.python.org/3/reference/datamodel.html) is built
|
||||
with object-oriented programming in mind. It offers useful tools like abstract base
|
||||
classes to communicate interfaces to developers who want to, e.g., write plugins, or
|
||||
help work on implementations. If we were working in a different language that offered
|
||||
different tools, we'd use a different paradigm.
|
||||
|
||||
While many things are classes in the re-arch, they are not classes in the same way.
|
||||
There are three kinds of things (roughly) that are written as classes in the re-arch:
|
||||
1. **Configuration**: Auto-GPT has *a lot* of configuration. This configuration
|
||||
is *data* and we use **[Pydantic](https://docs.pydantic.dev/latest/)** to manage it as
|
||||
pydantic is basically industry standard for this stuff. It provides runtime validation
|
||||
for all the configuration and allows us to easily serialize configuration to both basic
|
||||
python types (dicts, lists, and primatives) as well as serialize to json, which is
|
||||
important for us being able to put representations of agents
|
||||
[on the wire](https://en.wikipedia.org/wiki/Wire_protocol) for web applications and
|
||||
agent-to-agent communication. *These are essentially
|
||||
[structs](https://en.wikipedia.org/wiki/Struct_(C_programming_language)) rather than
|
||||
traditional classes.*
|
||||
2. **Internal Data**: Very similar to configuration, Auto-GPT passes around boatloads
|
||||
of internal data. We are interacting with language models and language model APIs
|
||||
which means we are handling lots of *structured* but *raw* text. Here we also
|
||||
leverage **pydantic** to both *parse* and *validate* the internal data and also to
|
||||
give us concrete types which we can use static type checkers to validate against
|
||||
and discover problems before they show up as bugs at runtime. *These are
|
||||
essentially [structs](https://en.wikipedia.org/wiki/Struct_(C_programming_language))
|
||||
rather than traditional classes.*
|
||||
3. **System Interfaces**: This is our primary traditional use of classes in the
|
||||
re-arch. We have a bunch of systems. We want many of those systems to have
|
||||
alternative implementations (e.g. via plugins). We use abstract base classes to
|
||||
define interfaces to communicate with people who might want to provide those
|
||||
plugins. We provide a single concrete implementation of most of those systems as a
|
||||
subclass of the interface. This should not be controversial.
|
||||
|
||||
The approach is consistent with
|
||||
[prior](https://github.com/Significant-Gravitas/Auto-GPT/issues/2458)
|
||||
[work](https://github.com/Significant-Gravitas/Auto-GPT/pull/2442) done by other
|
||||
maintainers in this direction.
|
||||
|
||||
From an organization standpoint, OO programming is by far the most popular programming
|
||||
paradigm (especially for Python). It's the one most often taught in programming classes
|
||||
and the one with the most available online training for people interested in
|
||||
contributing.
|
||||
|
||||
Finally, and importantly, we scoped the plan and initial design of the re-arch as a
|
||||
large group of maintainers and collaborators early on. This is consistent with the
|
||||
design we chose and no-one offered alternatives.
|
||||
|
||||
@@ -6,6 +6,12 @@ a work in progress and is not yet feature complete. In particular, it does not
|
||||
have many of the Auto-GPT commands implemented and is pending ongoing work to
|
||||
[re-incorporate vector-based memory and knowledge retrieval](https://github.com/Significant-Gravitas/Auto-GPT/issues/3536).
|
||||
|
||||
## [Overview](ARCHITECTURE_NOTES.md)
|
||||
|
||||
The Auto-GPT Re-arch is a re-implementation of the Auto-GPT agent that is designed to be more modular,
|
||||
more extensible, and more maintainable than the original Auto-GPT agent. It is also designed to be
|
||||
more accessible to new developers and to be easier to contribute to. The re-arch is a work in progress
|
||||
and is not yet feature complete. It is also not yet ready for production use.
|
||||
|
||||
## Running the Re-arch Code
|
||||
|
||||
|
||||
Reference in New Issue
Block a user