Start showing benchmark results (#100)

2026-02-21 22:24:30 +01:00 · 2023-07-14 14:56:56 -07:00
parent 7bc7d9213d
commit 281cb0ef37
1 changed files with 20 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -2,13 +2,26 @@

 A repo built for the purpose of benchmarking the performance of agents far and wide, regardless of how they are set up and how they work

-### Scores:
+## Scores:
+Spider chart for each agent coming soon!

-Scoring of agents will go here. Both overall and by category.
+## Detailed results
+:warning: These results are still changing frequently. We will publish official benchmark results very soon.

-### Integrated Agents
+### Auto-GPT
+Coming Soon!

- Auto-GPT
- gpt-engineer
- mini-agi
- smol-developer
+### gpt-engineer
+
+| Task                               | Results            |
+|------------------------------------|--------------------|
+| Debug Simple Typo With Guidance    | :x:                |
+| Debug Simple Typo Without Guidance | :x:                |
+| Basic Code Generation              | :white_check_mark: |
+| Create Simple Web Server           | :x:                |
+
+### mini-agi
+Coming Soon!
+
+### smol-developer
+Coming Soon!