Auto-GPT/benchmark/agbenchmark
Reinier van der Leer b106a61352 Clean up & fix GitHub workflows (#6313)
* ci: Mitigate security issues in autogpt-ci.yml

- Remove unnecessary pull_request_target paths and related variables and config
- Set permissions for contents to read only

* ci: Simplify steps in autogpt-ci.yml workflow using GitHub CLI

- Simplify step in 'autogpt-ci.yml' by using GitHub CLI instead of API for adding label and comment functionality
- Replace curl command with 'gh issue edit' to add "behaviour change" label to the pull request
- Replace gh api command with 'gh issue comment' to leave a comment about the changed behavior of AutoGPT in the pull request

* ci: Fix issues in workflows

- Move environment variable definition to top level in benchmark-ci.yml (because the other job also needs it)
- Removed invalid 'branches: [hackathon]' restriction in hackathon.yml workflow
- Removed redundant 'ref' and 'repository' fields in the 'checkout' step of both workflows.

* ci: Delete legacy benchmarks.yml workflow

* ci: Add triggers for CI workflows

- Add triggers to run CI workflows when they are edited.
- Update the paths for the CI workflows in the trigger configuration.

* fix: Fix benchmark lint error

- Removed unnecessary blank lines in report_types.py
- Fixed string quotes in challenge.py to maintain consistency

* fix: Update task description in password generator data.json

- Update task description in `data.json` file for the password generator challenge to clarify the input requirements and error handling.
- This change is made in an attempt to make the Benchmark CI pass.

* fix: Fix PasswordGenerator challenge in CI

- Fix the behavior of the reference password_generator.py to align with the task description
- Use default password length 8 instead of a random length in the generate_password function
- Retrieve the password length from the command line arguments if "--length" is provided, else set it to 8
2023-11-21 10:58:54 +01:00
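
The last fix above describes the reference password_generator.py defaulting to a length of 8 and reading an optional "--length" command-line argument. A minimal Python sketch of that described behavior (not the actual challenge file) could look like:

    import random
    import string
    import sys

    def generate_password(length: int = 8) -> str:
        # Default length is 8 rather than a random length.
        charset = string.ascii_letters + string.digits + string.punctuation
        return "".join(random.choice(charset) for _ in range(length))

    if __name__ == "__main__":
        # Use the value after "--length" if provided, otherwise fall back to 8.
        args = sys.argv[1:]
        length = int(args[args.index("--length") + 1]) if "--length" in args else 8
        print(generate_password(length))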

As a user

  1. pip install auto-gpt-benchmarks
  2. Add boilerplate code to run and kill your agent (see the hypothetical sketch after this list)
  3. agbenchmark
    • --category challenge_category to run tests in a specific category
    • --mock to only run mock tests, if they exist for each test
    • --noreg to skip any tests that have passed in the past. If you run without this flag and a previously passing challenge fails, it will be removed from the regression tests
  4. We call boilerplate code for your agent
  5. Show pass rate of tests, logs, and any other metrics
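
Step 2 depends on your agent, and the exact hook agbenchmark expects is not documented here, so the following is only a hypothetical sketch of "run and kill" boilerplate; the function name, the agent command, and the timeout are illustrative:

    # Hypothetical boilerplate: run the agent on one task as a subprocess and
    # kill it if it exceeds a timeout. Not the actual agbenchmark API.
    import subprocess

    def run_agent(task: str, timeout: int = 60) -> None:
        # Start the agent on a single task (command is illustrative).
        proc = subprocess.Popen(["python", "-m", "my_agent", task])
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # stop the agent if it runs too long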

Contributing

Diagrams: https://whimsical.com/agbenchmark-5n4hXBq1ZGzBwRsK4TVY7x

To run the existing mocks

  1. clone the repo auto-gpt-benchmarks
  2. pip install poetry
  3. poetry shell
  4. poetry install
  5. cp .env_example .env
  6. git submodule update --init --remote --recursive
  7. uvicorn server:app --reload
  8. Run agbenchmark --mock, keep the config the same, and watch the logs :)

To run with mini-agi

  1. Navigate to auto-gpt-benchmarks/agent/mini-agi
  2. pip install -r requirements.txt
  3. cp .env_example .env, set PROMPT_USER=false, and add your OPENAI_API_KEY=. Set MODEL="gpt-3.5-turbo" if you don't have access to gpt-4 yet. Also make sure you have Python 3.10 or higher installed
  4. Set AGENT_NAME=mini-agi in the .env file, along with the REPORT_LOCATION you want (a consolidated example .env follows this list)
  5. Make sure to follow the commands above, then run agbenchmark without the --mock flag
  • To add requirements, use poetry add <requirement>
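
Pulling the .env settings from steps 3 and 4 together, the file for mini-agi might look like the following; the API key and report path are placeholders you must fill in yourself:

    PROMPT_USER=false
    OPENAI_API_KEY=<your OpenAI API key>
    MODEL="gpt-3.5-turbo"
    AGENT_NAME=mini-agi
    REPORT_LOCATION=<where you want your reports>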

Feel free to create PRs to merge with main at will (but also feel free to ask for review). If you can't, send a message in the R&D chat for access.

If you push at any point and break things (it'll happen to everyone), fix it ASAP. Step 1 is to revert master to the last working commit.

Let people know what your beautiful code does, and document everything well.

Share your progress :)

Dataset

Manually created challenges, existing challenges within Auto-GPT, and https://osu-nlp-group.github.io/Mind2Web/

How do I add new agents to agbenchmark?

Example with smol developer.

1. Create a GitHub branch with your agent, following the same pattern as this example:

https://github.com/smol-ai/developer/pull/114/files

2. Create the submodule and the GitHub workflow by following the same pattern as this example:

https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks/pull/48/files

How do I run an agent in different environments?

To use it purely as a benchmark for your agent, pip install the package and run agbenchmark.

For internal Auto-GPT CI runs, specify the AGENT_NAME you want to use and set the HOME_ENV, e.g. AGENT_NAME=mini-agi.

To develop an agent alongside the benchmark, specify the AGENT_NAME you want to use and add your agent as a submodule to the repo.