59 Commits

Author SHA1 Message Date
Reinier van der Leer
b106a61352 Clean up & fix GitHub workflows (#6313)
* ci: Mitigate security issues in autogpt-ci.yml

- Remove unnecessary pull_request_target paths and related variables and config
- Set permissions for contents to read only

* ci: Simplify steps in autogpt-ci.yml workflow using GitHub CLI

- Simplify step in 'autogpt-ci.yml' by using GitHub CLI instead of API for adding label and comment functionality
- Replace curl command with 'gh issue edit' to add "behaviour change" label to the pull request
- Replace gh api command with 'gh issue comment' to leave a comment about the changed behavior of AutoGPT in the pull request

* ci: Fix issues in workflows

- Move environment variable definition to top level in benchmark-ci.yml (because the other job also needs it)
- Removed invalid 'branches: [hackathon]' restriction in hackathon.yml workflow
- Removed redundant 'ref' and 'repository' fields in the 'checkout' step of both workflows.

* ci: Delete legacy benchmarks.yml workflow

* ci: Add triggers for CI workflows

- Add triggers to run CI workflows when they are edited.
- Update the paths for the CI workflows in the trigger configuration.

* fix: Fix benchmark lint error

- Removed unnecessary blank lines in report_types.py
- Fixed string quotes in challenge.py to maintain consistency

* fix: Update task description in password generator data.json

- Update task description in `data.json` file for the password generator challenge to clarify the input requirements and error handling.
- This change is made in an attempt to make the Benchmark CI pass.

* fix: Fix PasswordGenerator challenge in CI

- Fix the behavior of the reference password_generator.py to align with the task description
- Use default password length 8 instead of a random length in the generate_password function
- Retrieve the password length from the command line arguments if "--length" is provided, else set it to 8
2023-11-21 10:58:54 +01:00
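
A minimal sketch of the behavior described in the last item of the commit above (default length 8, overridden by "--length" on the command line). This is not the repository's actual password_generator.py. The helper name parse_length, the character set, and the use of the secrets module are assumptions made for illustration.

```python
import secrets
import string
import sys

DEFAULT_LENGTH = 8  # default stated in the commit message


def parse_length(argv):
    """Return the value following "--length" in argv, else the default.

    Hypothetical helper: the commit only says the length is read from the
    command line when "--length" is provided.
    """
    if "--length" in argv:
        # Raises IndexError/ValueError if the value is missing or not an integer.
        return int(argv[argv.index("--length") + 1])
    return DEFAULT_LENGTH


def generate_password(length=DEFAULT_LENGTH):
    """Build a password of exactly `length` characters.

    The alphabet (letters, digits, punctuation) is an assumption; the commit
    message does not specify it.
    """
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password(parse_length(sys.argv[1:])))
```

Run with no arguments, the script prints an 8-character password; with `--length 12` it prints a 12-character one. A fixed default keeps the output length predictable for a CI check, which a random default would not.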
SwiftyOS
fa357dd139 fix: Fixing Benchmarking
- Importing missing metadata field in Test class in report_types.py
- Adding GAIA categories 1, 2, and 3 in data_types.py
2023-11-09 10:00:50 +01:00
Silen Naihin
e5e0c4bf9d reverting new challenges 2023-10-20 21:13:09 -07:00
Silen Naihin
825c3adf62 case sensitivity, updating challenges 2023-10-20 08:26:29 -07:00
Silen Naihin
09f6a37292 fix capitalization, rename 2023-10-20 07:21:41 -07:00
Silen Naihin
655bc8b08e fix data challenges 2023-10-19 17:42:24 -07:00
Silen Naihin
7ddef39918 scrape synthesize challenge additions 2023-10-19 17:39:09 -07:00
Silen Naihin
344ef3bf8b fixing password gen and revenue retrieval 2 challenges 2023-10-17 20:28:49 -07:00
Silen Naihin
74ee69daf1 Update data.json 2023-10-14 08:04:37 -07:00
merwanehamadi
93e3ec36ed Update test.py (#5721) 2023-10-13 06:56:52 -07:00
merwanehamadi
4841d31179 fix label csv (#5656) 2023-10-09 09:22:36 -07:00
merwanehamadi
abf88fe509 Fix password generator (#5581) 2023-10-06 12:55:48 -07:00
merwanehamadi
1d80969b7f Fix agbenchmark client (#5578)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-10-06 12:02:59 -07:00
merwanehamadi
bcb24c1a58 Fix challenges (#5561)
Fix challenges and CI

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-10-05 10:59:50 -07:00
merwanehamadi
c7a9ac3bf7 Fix custom_python not being copied (#5512) 2023-10-03 11:24:16 -07:00
Albert Örwall
949ab477a8 Correct create_game method definition in the challenge input (#5460)
Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>
2023-10-02 16:07:18 -07:00
merwanehamadi
a30cbcc2ce Fix benchmark ci (#5478)
Fix benchmark CI

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-10-02 12:41:32 -07:00
merwanehamadi
80487bc4b1 add load_dotenv (#5474) 2023-10-02 10:56:56 -07:00
merwanehamadi
2be14cab3e Correct revenue retrieval challenge (#5471) 2023-10-02 09:52:59 -07:00
merwanehamadi
8252a2fa8f Correct Battleship Challenge (#5450)
* Update abstract_class.py

* Update abstract_class.py
2023-10-01 13:53:46 -07:00
merwanehamadi
0e804e27dd Add more data challenges (#5390)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-28 19:30:08 -07:00
SwiftyOS
4f15b1c582 Fix pathing issues 2023-09-28 12:29:03 +02:00
SwiftyOS
f0087ab80a fix artifact bug 2023-09-28 12:01:02 +02:00
SwiftyOS
5360313271 Fixed CORS and proxy timeout issues 2023-09-28 11:39:15 +02:00
merwanehamadi
37fbb52d19 Add more challenges + cleanup (#5368)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-27 17:58:58 -07:00
merwanehamadi
793ff1c163 Add data challenges (#5361)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-27 10:47:34 -07:00
merwanehamadi
e0aa11f4d7 Duplicate tasks created (#5358)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-27 07:41:51 -07:00
merwanehamadi
fa9fc18e22 Validate skill tree so the UI never breaks (#5306)
Validate skill tree to prevent it from breaking the UI

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-22 17:32:05 -07:00
merwanehamadi
a0e383f4d9 Fix skill tree (#5303)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-22 13:09:57 -07:00
merwanehamadi
18e576cb53 Structure challenges (#5296) 2023-09-21 20:06:37 -07:00
merwanehamadi
f67a352937 Add categories skill tree (#5295)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-21 17:39:16 -07:00
merwanehamadi
f4e7b1c61c Add eval_id and sync Skill Tree with Frontend(#5287)
Add eval_id to skill tree

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-21 13:36:17 -07:00
merwanehamadi
ff4c76ba00 Make agbenchmark a proxy of the evaluated agent (#5279)
Make agbenchmark a Proxy of the evaluated agent

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-20 16:06:00 -07:00
merwanehamadi
c09a0e7afa Implement old polling mechanism (#5248)
Implement old polling mechanism

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-18 16:23:06 -07:00
merwanehamadi
2cf350b783 Agent Protocol v1 (#5254)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-18 11:09:55 -07:00
merwanehamadi
f4d319cee4 Refactor benchmark (#5247)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-17 06:55:20 -07:00
merwanehamadi
f76d45cd9e Remove start from agbenchmark (#5241)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-16 17:22:49 -07:00
merwanehamadi
ece9e85b41 Add agent protocol within agbenchmark (#5239)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-16 15:31:12 -07:00
merwanehamadi
b101fec16b Add ability to run multiple tests (#5233)
Add multiple tests

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-16 13:01:11 -07:00
merwanehamadi
991e816ea2 Fix CORS issue (#5232)
* Allow Cors

* Update app.py
2023-09-16 10:56:21 -07:00
merwanehamadi
295702867a Ability to run by categories (#5229)
* Ability to run by categories

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>

* always use Path.cwd()

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-15 20:04:12 -07:00
merwanehamadi
b4401cd409 add benchmark endpoints mock (#5221)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-15 08:48:12 -07:00
merwanehamadi
6342a77037 Fix SDK client (#5214) 2023-09-13 20:25:58 -07:00
merwanehamadi
4bb86c0cb5 Support agent protocol in benchmark (#5213)
Benchmark/Forge/Agent Protocol
2023-09-13 18:50:39 -07:00
merwanehamadi
52c8b53122 Fix API Mode (#5209) 2023-09-13 07:30:46 -07:00
Luke
d319473e3c Fix TestUrlShortener to prevent conflicting test.py file and clarify instructions (#5177) 2023-09-13 06:11:40 -07:00
SwiftyOS
ed172dec19 fixed datetime and changed benchmark defaults for autogpt 2023-09-13 13:47:26 +02:00
SwiftyOS
9eb01d85a3 fixed multiple report folder bug 2023-09-13 12:18:04 +02:00
SwiftyOS
d44a4f591d Added ability to keep answers 2023-09-13 11:56:31 +02:00
SwiftyOS
bacd0e5e4e Added answers to the report 2023-09-13 10:40:55 +02:00