Auto-GPT Benchmark

A repo built to benchmark the performance of agents, regardless of how they are set up or how they work.

Scores:

Radar chart for each agent coming soon!

Detailed results

⚠️ These results are still changing. We will publish official benchmark results soon.

Auto-GPT

Interface

Task Results
Write File
Read File
Search File

Code

Task Results
Debug Simple Typo With Guidance
Debug Simple Typo Without Guidance
Basic Code Generation
Create Simple Web Server

Memory

Task Results
Basic Memory
Remember Multiple Ids
Remember Multiple Ids With Noise
Remember Multiple Phrases With Noise

gpt-engineer

Interface

Task Results
Write File
Read File
Search File

Code

Task Results
Debug Simple Typo With Guidance
Debug Simple Typo Without Guidance
Basic Code Generation
Create Simple Web Server

mini-agi

Coming Soon!

smol-developer

Interface

Task Results
Write File
Read File
Search File

Code

Task Results
Debug Simple Typo With Guidance
Debug Simple Typo Without Guidance
Basic Code Generation
Create Simple Web Server
License: MIT