aljaz/Auto-GPT

mirror of https://github.com/aljazceru/Auto-GPT.git synced 2026-02-10 08:44:27 +01:00

merwanehamadi 66fc7ccb31 Display smol-developer-results (#103)

2023-07-14 18:26:17 -07:00

File	Last commit	Date
.github	Add basic code generation challenge (#98)	2023-07-14 13:27:48 -04:00
.vscode	init agbenchmark	2023-06-18 11:14:54 -04:00
agbenchmark	Replace hidden files with custom python (#99)	2023-07-14 14:39:47 -07:00
agent	Display smol-developer-results (#103)	2023-07-14 18:26:17 -07:00
.env.example	moving run agent to tests & agnostic run working	2023-06-30 10:50:54 -04:00
.flake8	Add static linters ci (#45)	2023-07-02 16:14:49 -04:00
.gitignore	Quality of life improvements & fixes (#75)	2023-07-08 18:43:38 -07:00
.gitmodules	reverting accidental previous changes	2023-07-08 12:50:39 -04:00
.python-version	Add static linters ci (#45)	2023-07-02 16:14:49 -04:00
LICENSE	init agbenchmark	2023-06-18 11:14:54 -04:00
mypy.ini	Added --test, consolidate files, reports working (#83)	2023-07-10 19:25:19 -07:00
poetry.lock	Fix tests ci (#82)	2023-07-10 21:54:25 -07:00
pyproject.toml	Add basic code generation challenge (#98)	2023-07-14 13:27:48 -04:00
README.md	Display smol-developer-results (#103)	2023-07-14 18:26:17 -07:00

README.md

Auto-GPT Benchmark

A repo for benchmarking the performance of agents, regardless of how they are set up or how they work.

Scores:

Radar chart for each agent coming soon!

Detailed results

⚠️ These results are constantly evolving at the moment. We will publish an official benchmark result very soon.

Auto-GPT

Interface

Task	Results
Write File	✅
Read File	✅
Search File	❌

Code

Task	Results
Debug Simple Typo With Guidance	❌
Debug Simple Typo Without Guidance	❌
Basic Code Generation	✅
Create Simple Web Server	❌

Memory

Task	Results
Basic Memory	✅
Remember Multiple Ids	❌
Remember Multiple Ids With Noise	❌
Remember Multiple Phrases With Noise	❌

gpt-engineer

Interface

Task	Results
Write File	✅
Read File	❌
Search File	❌

Code

Task	Results
Debug Simple Typo With Guidance	❌
Debug Simple Typo Without Guidance	❌
Basic Code Generation	✅
Create Simple Web Server	❌

mini-agi

Coming Soon!

smol-developer

Interface

Task	Results
Write File	✅
Read File	❌
Search File	❌

Code

Task	Results
Debug Simple Typo With Guidance	❌
Debug Simple Typo Without Guidance	❌
Basic Code Generation	✅
Create Simple Web Server	❌
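
The per-agent tables above can be condensed into pass counts per category. The following is a minimal sketch, with the results transcribed by hand from this README as of this snapshot; `RESULTS` and `pass_counts` are hypothetical names for illustration and are not part of agbenchmark.

```python
# Results transcribed from the tables above: True = ✅, False = ❌.
# RESULTS and pass_counts are illustrative names, not agbenchmark APIs.
RESULTS = {
    "Auto-GPT": {
        "Interface": {"Write File": True, "Read File": True, "Search File": False},
        "Code": {
            "Debug Simple Typo With Guidance": False,
            "Debug Simple Typo Without Guidance": False,
            "Basic Code Generation": True,
            "Create Simple Web Server": False,
        },
        "Memory": {
            "Basic Memory": True,
            "Remember Multiple Ids": False,
            "Remember Multiple Ids With Noise": False,
            "Remember Multiple Phrases With Noise": False,
        },
    },
    "gpt-engineer": {
        "Interface": {"Write File": True, "Read File": False, "Search File": False},
        "Code": {
            "Debug Simple Typo With Guidance": False,
            "Debug Simple Typo Without Guidance": False,
            "Basic Code Generation": True,
            "Create Simple Web Server": False,
        },
    },
    "smol-developer": {
        "Interface": {"Write File": True, "Read File": False, "Search File": False},
        "Code": {
            "Debug Simple Typo With Guidance": False,
            "Debug Simple Typo Without Guidance": False,
            "Basic Code Generation": True,
            "Create Simple Web Server": False,
        },
    },
}


def pass_counts(results):
    """Return {agent: {category: (passed, total)}} for nested pass/fail results."""
    return {
        agent: {
            category: (sum(tasks.values()), len(tasks))
            for category, tasks in categories.items()
        }
        for agent, categories in results.items()
    }


# Print one line per agent and category, e.g. "Auto-GPT  Interface  2/3".
for agent, categories in pass_counts(RESULTS).items():
    for category, (passed, total) in categories.items():
        print(f"{agent:15} {category:10} {passed}/{total}")
```

Because the results are still changing, the dictionary would need to be updated by hand whenever the tables change.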
