mirror of https://github.com/aljazceru/Auto-GPT.git synced 2026-02-09 08:14:27 +01:00

Silen Naihin f07e7b60d4 Advanced LLM Evaluation Implementation (#205)

Co-authored-by: Auto-GPT-Bot <github-bot@agpt.co>

2023-07-29 10:26:19 +01:00

.github | Advanced LLM Evaluation Implementation (#205) | 2023-07-29 10:26:19 +01:00
.vscode | init agbenchmark | 2023-06-18 11:14:54 -04:00
agbenchmark | Advanced LLM Evaluation Implementation (#205) | 2023-07-29 10:26:19 +01:00
agent | Fix tests not being run (#207) | 2023-07-27 20:50:53 -07:00
benchmark_runs | gpt-engineer-20230716225908 | 2023-07-16 22:59:08 +00:00
notebooks | Advanced LLM Evaluation Implementation (#205) | 2023-07-29 10:26:19 +01:00
reports | Advanced LLM Evaluation Implementation (#205) | 2023-07-29 10:26:19 +01:00
.env.example | Advanced LLM Evaluation Implementation (#205) | 2023-07-29 10:26:19 +01:00
.flake8 | Use beebot autopackai (#203) | 2023-07-27 12:21:43 -07:00
.gitignore | Push reports to google drive (#167) | 2023-07-18 09:17:45 -07:00
.gitmodules | Use beebot autopackai (#203) | 2023-07-27 12:21:43 -07:00
.python-version | Add static linters ci (#45) | 2023-07-02 16:14:49 -04:00
get_data_from_helicone.py | Delete reports (#201) | 2023-07-27 11:42:24 -07:00
json_to_base_64.py | Push reports to google drive (#167) | 2023-07-18 09:17:45 -07:00
LICENSE | init agbenchmark | 2023-06-18 11:14:54 -04:00
mypy.ini | report # bug, adding submodule challenges (#193) | 2023-07-26 13:53:10 +01:00
poetry.lock | Add dynamic headers using environment variables (#200) | 2023-07-26 21:26:03 -07:00
pyproject.toml | Advanced LLM Evaluation Implementation (#205) | 2023-07-29 10:26:19 +01:00
README.md | Update Scores Benchmark (#192) | 2023-07-25 11:09:49 -07:00
send_to_googledrive.py | Add helicone dynamic headers (#199) | 2023-07-26 16:03:13 -07:00

README.md

Auto-GPT Benchmarks

A repository for benchmarking the performance of agents, regardless of how they are set up or how they work.

Scores:

Ranking overall:

Detailed results:

Click here to see the results and the raw data.

More agents coming soon!
