Reinier van der Leer 9012ff4db2 refactor(benchmark): Interface & type consolidation, and arch change, to allow adding challenge providers
Squashed commit of the following:

commit 7d6476d3297860f74c276d571da995d958a8cc1a
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 18:10:45 2024 +0100

    refactor(benchmark/challenge): Set up structure to support more challenge providers

    - Move `Challenge`, `ChallengeData`, `load_challenges` to `challenges/builtin.py` and rename to `BuiltinChallenge`, `BuiltinChallengeSpec`, `load_builtin_challenges`
    - Create `BaseChallenge` to serve as interface and base class for different challenge implementations
    - Create `ChallengeInfo` model to serve as universal challenge info object
    - Create `get_challenge_from_source_uri` function in `challenges/__init__.py`
    - Replace `ChallengeData` by `ChallengeInfo` everywhere except in `BuiltinChallenge`
    - Add strong typing to `task_informations` store in app.py
    - Use `call.duration` in `finalize_test_report` and remove `timer` fixture
    - Update docstring on `challenges/__init__.py:get_unique_categories`
    - Add docstring to `generate_test.py`
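
    A minimal sketch of how these pieces could fit together, based only on the names
    above; the fields, signatures, and URI prefix below are assumptions for
    illustration, not the actual agbenchmark code:

        # Hypothetical sketch: a provider-agnostic challenge interface and info model.
        # All fields and signatures are assumptions, not the real implementation.
        from abc import ABC, abstractmethod
        from typing import ClassVar

        from pydantic import BaseModel


        class ChallengeInfo(BaseModel):
            """Universal challenge info object, independent of the provider."""
            name: str
            task: str
            category: list[str] = []
            source_uri: str  # identifies the provider and location of the challenge


        class BaseChallenge(ABC):
            """Interface and base class shared by all challenge implementations."""
            info: ClassVar[ChallengeInfo]

            @classmethod
            @abstractmethod
            def from_source_uri(cls, source_uri: str) -> type["BaseChallenge"]:
                """Reconstruct the concrete challenge class from its source URI."""
                ...


        def get_challenge_from_source_uri(source_uri: str) -> type[BaseChallenge]:
            """Dispatch to the right provider based on the URI prefix (illustrative)."""
            if source_uri.startswith("__BUILTIN__"):  # prefix is an assumption
                from challenges.builtin import BuiltinChallenge
                return BuiltinChallenge.from_source_uri(source_uri)
            raise ValueError(f"Unknown challenge provider for: {source_uri}")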

commit 5df2aa7939b45d85a2c2b5de9ac0522330d1502a
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 16:58:01 2024 +0100

    refactor(benchmark): Refactor & rename functions in agent_interface.py and agent_api_interface.py

    - `copy_artifacts_into_temp_folder` -> `copy_challenge_artifacts_into_workspace`
    - `copy_agent_artifacts_into_folder` -> `download_agent_artifacts_into_folder`
    - Reorder parameters of `run_api_agent`, `copy_challenge_artifacts_into_workspace`; use `Path` instead of `str`
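
    As a rough illustration of the Path-based signature change (parameter names and
    body are assumptions, not the project's actual implementation):

        # Illustrative only: assumed shape of the renamed helper, using Path instead of str.
        import shutil
        from pathlib import Path


        def copy_challenge_artifacts_into_workspace(
            challenge_dir_path: Path, artifact_folder_name: str, workspace: Path
        ) -> None:
            source_dir = challenge_dir_path / artifact_folder_name
            if not source_dir.exists():
                return
            for file_path in source_dir.iterdir():
                if file_path.is_file():
                    shutil.copy(file_path, workspace)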

commit 6a256fef4c7950b7ee82fb801e70c83afe6b6f8b
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 16:02:25 2024 +0100

    refactor(benchmark): Refactor & typefix report generation and handling logic

    - Rename functions in reports.py and ReportManager.py to better reflect what they do
       - `get_previous_test_results` -> `get_and_update_success_history`
       - `generate_single_call_report` -> `initialize_test_report`
       - `finalize_reports` -> `finalize_test_report`
       - `ReportManager.end_info_report` -> `SessionReportManager.finalize_session_report`
    - Modify `pytest_runtest_makereport` hook in conftest.py to finalize the report immediately after the challenge finishes running instead of after teardown
       - Move result processing logic from `initialize_test_report` to `finalize_test_report` in reports.py
    - Use `Test` and `Report` types from report_types.py where possible instead of untyped dicts: reports.py, utils.py, ReportManager.py
    - Differentiate `ReportManager` into `SessionReportManager`, `RegressionTestsTracker`, `SuccessRateTracker`
    - Move filtering of optional challenge categories from challenge.py (`Challenge.skip_optional_categories`) to conftest.py (`pytest_collection_modifyitems`)
    - Remove unused `scores` fixture in conftest.py
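
    A hedged sketch of what the modified hook could look like (the hookwrapper
    mechanics follow pytest's documented API; `finalize_test_report` and its
    signature here are stand-ins for illustration):

        # Sketch of a conftest.py hook that finalizes the test report right after
        # the challenge's call phase instead of waiting for teardown.
        import pytest


        def finalize_test_report(
            item: pytest.Item, report: pytest.TestReport, run_time: float
        ) -> None:
            """Placeholder for the real function in reports.py (assumption)."""


        @pytest.hookimpl(hookwrapper=True)
        def pytest_runtest_makereport(item: pytest.Item, call: pytest.CallInfo):
            outcome = yield
            report: pytest.TestReport = outcome.get_result()
            if call.when == "call":
                # call.duration removes the need for a separate `timer` fixture
                finalize_test_report(item, report, run_time=call.duration)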

commit 370d6dbf5df75d78e3878877968e8cd309d6d7fb
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 15:16:43 2024 +0100

    refactor(benchmark): Simplify models in report_types.py

    - Removed ForbidOptionalMeta and BaseModelBenchmark classes.
    - Changed model attributes to optional: `Metrics.difficulty`, `Metrics.success`, `Metrics.success_percentage`, `Metrics.run_time`, and `Test.reached_cutoff`.
    - Added validator to `Metrics` model to require `success` and `run_time` fields if `attempted=True`.
    - Added default values to all optional model fields.
    - Removed duplicate imports.
    - Added condition in process_report.py to prevent null lookups if `metrics.difficulty` is not set.
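
    For illustration, a minimal sketch of the simplified `Metrics` model and its
    validator (field types and validator mechanics are assumptions based on the
    bullets above, not the actual report_types.py):

        # Hedged sketch; assumes pydantic v1-style validators.
        from typing import Optional

        from pydantic import BaseModel, validator


        class Metrics(BaseModel):
            attempted: bool
            difficulty: Optional[str] = None
            success: Optional[bool] = None
            success_percentage: Optional[float] = None
            run_time: Optional[str] = None

            @validator("success", "run_time", always=True)
            def require_if_attempted(cls, value, values, field):
                # success and run_time must be set whenever the challenge was attempted
                if values.get("attempted") and value is None:
                    raise ValueError(f"{field.name} is required when attempted=True")
                return value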

AutoGPT: build & use AI agents


AutoGPT is the vision of the power of AI, accessible to everyone, to use and to build on. Our mission is to provide the tools so that you can focus on what matters:

  • 🏗️ Building - Lay the foundation for something amazing.
  • 🧪 Testing - Fine-tune your agent to perfection.
  • 🤝 Delegating - Let AI work for you, and have your ideas come to life.

Be part of the revolution! AutoGPT is here to stay, at the forefront of AI innovation.

📖 Documentation | 🚀 Contributing | 🛠️ Build your own Agent - Quickstart

🥇 Current Best Agent: evo.ninja

The AutoGPT Arena Hackathon saw evo.ninja earn the top spot on our Arena Leaderboard, proving itself as the best open-source generalist agent. Try it now at https://evo.ninja!

📈 To challenge evo.ninja, AutoGPT, and others, submit your benchmark run to the Leaderboard, and maybe your agent will be up here next!

🧱 Building blocks

🏗️ Forge

Forge your own agent! Forge is a ready-to-go template for your agent application. All the boilerplate code is already handled, letting you channel all your creativity into the things that set your agent apart. All tutorials are located here. Components from the forge.sdk can also be used individually to speed up development and reduce boilerplate in your agent project.

🚀 Getting Started with Forge: this guide will walk you through the process of creating your own agent and using the benchmark and user interface.

📘 Learn More about Forge

🎯 Benchmark

Measure your agent's performance! The agbenchmark can be used with any agent that supports the agent protocol, and the integration with the project's CLI makes it even easier to use with AutoGPT and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.

📦 agbenchmark on PyPI | 📘 Learn More about the Benchmark

🏆 Leaderboard

Submit your benchmark run through the UI and claim your place on the AutoGPT Arena Leaderboard! The best-scoring general agent earns the title of Current Best Agent and will be adopted into our repo so people can easily run it through the CLI.

Screenshot of the AutoGPT Arena leaderboard

💻 UI

Makes agents easy to use! The frontend gives you a user-friendly interface to control and monitor your agents. It connects to agents through the agent protocol, ensuring compatibility with many agents from both inside and outside of our ecosystem.

The frontend works out-of-the-box with all agents in the repo. Just use the CLI to run your agent of choice!

📘 Learn More about the Frontend

⌨️ CLI

To make it as easy as possible to use all of the tools offered by the repository, a CLI is included at the root of the repo:

$ ./run
Usage: cli.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  agent      Commands to create, start and stop agents
  arena      Commands to enter the arena
  benchmark  Commands to start the benchmark and list tests and categories
  setup      Installs dependencies needed for your system.

Just clone the repo, install dependencies with ./run setup, and you should be good to go!

🤔 Questions? Problems? Suggestions?

Get help - Discord 💬

Join us on Discord

To report a bug or request a feature, create a GitHub Issue. Please ensure someone else hasn't created an issue for the same topic.

🤝 Sister projects

🔄 Agent Protocol

To maintain a uniform standard and ensure seamless compatibility with many current and future applications, AutoGPT employs the agent protocol standard by the AI Engineer Foundation. This standardizes the communication pathways from your agent to the frontend and benchmark.
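
As a quick illustration of what talking to a protocol-compliant agent looks like, the snippet below creates a task over HTTP. The endpoint path follows the public Agent Protocol v1 spec; the base URL and task input are placeholder assumptions.

    # Illustrative only: create a task on a locally running, protocol-compliant agent.
    import requests

    AGENT_BASE_URL = "http://localhost:8000"  # assumption: adjust to where your agent listens

    response = requests.post(
        f"{AGENT_BASE_URL}/ap/v1/agent/tasks",
        json={"input": "Write the word 'Washington' to a .txt file"},
    )
    response.raise_for_status()
    task = response.json()
    print("Created task:", task["task_id"])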


Star History Chart
