From 2aa88fd163f6a6fa7c61bbdfb24b2bebddea62d4 Mon Sep 17 00:00:00 2001
From: merwanehamadi
Date: Tue, 25 Jul 2023 11:09:49 -0700
Subject: [PATCH] Update Scores Benchmark (#192)

---
 README.md | 38 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 28 deletions(-)

diff --git a/README.md b/README.md
index 368c79ee..a1b54a14 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,19 @@
-# Auto-GPT Benchmark
+# Auto-GPT Benchmarks
 
 A repo built for the purpose of benchmarking the performance of agents far and wide, regardless of how they are set up and how they work
 
 ## Scores:
 
-Radio chart for each agent coming soon !
+Screenshot 2023-07-25 at 10 35 01 AM
 
-## Detailed results
-:warning: These results are constantly evolving at the moment. We will publish an official benchmark result very soon.
-
-Interface
-
-| Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
-|--------------|----------|--------------------|----------|--------------------|
-| Write File | :x: | :white_check_mark: | tbd | :white_check_mark: |
-| Read File | :x: | :x: | tbd | :x: |
-| Search File | :x: | :x: | tbd | :x: |
+## Ranking overall:
+- 1- [Beebot](https://github.com/Significant-Gravitas/Auto-GPT)
+- 2- [mini-agi](https://github.com/muellerberndt/mini-agi)
+- 3- [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT)
 
+## Detailed results:
 
-Code
+Screenshot 2023-07-25 at 10 42 15 AM
 
-| Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
-|------------------------------------|----------|--------------------|----------|--------------------|
-| Debug Simple Typo With Guidance | :x: | :x: | tbd | :x: |
-| Debug Simple Typo Without Guidance | :x: | :x: | tbd | :x: |
-| Basic Code Generation | :x: | :white_check_mark: | tbd | :white_check_mark: |
-| Create Simple Web Server | :x: | :x: | tbd | :x: |
+[Click here to see the results and the raw data!](https://docs.google.com/spreadsheets/d/1WXm16P2AHNbKpkOI0LYBpcsGG0O7D8HYTG5Uj0PaJjA/edit#gid=203558751)!
-
-Memory
-
-| Task | Auto-GPT |
-|--------------------------------------------|----------|
-| Basic Memory | :x: |
-| Remember Multiple Ids | :x: |
-| Remember Multiple Ids With Noise | :x: |
-| Remember Multiple Phrases With Noise | :x: |
+More agents coming soon !