From 281cb0ef37c3b8934af787f6681858b0c472556b Mon Sep 17 00:00:00 2001
From: merwanehamadi
Date: Fri, 14 Jul 2023 14:56:56 -0700
Subject: [PATCH] Start showing benchmark results (#100)

---
 README.md | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index ed348b5a..e73f3989 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,26 @@
 
 A repo built for the purpose of benchmarking the performance of agents far and wide, regardless of how they are set up and how they work
 
-### Scores:
+## Scores:
+Spider chart for each agent coming soon !
 
-Scoring of agents will go here. Both overall and by category.
+## Detailed results
+:warning: These results are constantly evolving at the moment. We will publish an official benchmark result very soon.
 
-### Integrated Agents
+### Auto-GPT
+Coming Soon!
 
-- Auto-GPT
-- gpt-engineer
-- mini-agi
-- smol-developer
+### gpt-engineer
+
+| Task | Results |
+|-----------------------------------|----------------------|
+| Debug Simple Typo With Guidance | :x: |
+| Debug Simple Typo Without Guidance| :x: |
+| Basic Code Generation | :white_check_mark: |
+| Create Simple Web Server | :x: |
+
+### mini-agi
+Coming Soon!
+
+### smol-developer
+Coming Soon!