5268 Commits

Author SHA1 Message Date
Reinier van der Leer
1f1e8c9f7d Update CODEOWNERS 2024-02-22 17:26:46 +01:00
Reinier van der Leer
e44ca4185a fix(frontend): Unbreak ChatInputField
Fix specification of `onSubmitted` hook in the `TextField`.
2024-02-21 02:09:23 +01:00
Reinier van der Leer
8fd2e48c1b fix(ci/frontend): Add trigger on push including workflow file 2024-02-21 02:04:13 +01:00
Reinier van der Leer
69ccb185e8 fix(ci/frontend): Add and fix trigger on workflow file 2024-02-21 02:02:41 +01:00
Reinier van der Leer
a88e833831 ci: Revise Frontend CI
- Rename build-frontend.yml to frontend-ci.yml
- Add a `pull_request` trigger
- Disable committing and pushing to a `frontend_build_{hash}` branch
- (Re)enable auto-creating a pull request for the new frontend build
2024-02-21 02:00:33 +01:00
Reinier van der Leer
64f48df62d chore(agent/llm): Update model alias gpt-3.5-turbo -> gpt-3.5-turbo-0125 2024-02-20 17:13:51 +01:00
Reinier van der Leer
0f5490075b fix(ci/benchmark): Install benchmark dependencies
Otherwise `poetry -C benchmark run benchmark/reports/format.py` fails.
2024-02-20 16:56:47 +01:00
Reinier van der Leer
d5f2bbf093 fix(benchmark/reports): Make format.py executable 2024-02-20 14:50:32 +01:00
Reinier van der Leer
7dd97f2f74 fix(agent/browser): Print descriptive error if ChromeDriver install fails
We have been seeing `AttributeError: 'NoneType' object has no attribute 'split'` in Sentry.
According to https://github.com/SergeyPirogov/webdriver_manager/issues/649, this occurs
when Chrome is not installed, or when no suitable ChromeDriver version is available for
the installed version of Chrome.
Instead of the `AttributeError`, we can print a better message for the agent and user to help them fix the issue.

---
Co-authored-by: kcze <kpczerwinski@gmail.com>
2024-02-20 14:04:15 +01:00
Reinier van der Leer
8e464c53a8 fix(agent/llm): Include id in tool_calls in prompt
OpenAI requires the `id` property on tool calls in assistant messages.
We weren't storing it in the `AssistantChatMessage` that is created from the LLM's response,
causing an error when adding those messages back to the prompt.

Fix:
* Add `id` to `AssistantToolCall` and `AssistantToolCallDict` types in model_providers/schema.py
* Amend `_tool_calls_compat_extract_calls` to generate an ID for extracted tool calls

---
Co-authored-by: kcze <kpczerwinski@gmail.com>
2024-02-20 13:28:57 +01:00
Reinier van der Leer
7689a51f53 fix(autogpt/llm): Omit AssistantChatMessage.tool_calls if no tool calls are present
OpenAI likes neither `tool_calls=[]` nor `tool_calls=None`. If no `tool_calls` are in the message, the key must be omitted.

This partially reverts commit 67bafa6302.

---

Co-authored-by: kcze <kpczerwinski@gmail.com>
2024-02-20 13:04:55 +01:00
Reinier van der Leer
c8a40727d1 fix(ci/benchmark): Specify poetry env path for report conversion step 2024-02-20 12:10:49 +01:00
Albert Örwall
4ef912d734 fix(benchmark/challenges): Improve spec and eval of TicTacToe challenge
* In challenge specification, specify `subprocess.PIPE` for `stdin` and `stderr` for completeness
* Additional tweak: let Pytest load only the current file when running the test file as a script

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2024-02-20 11:52:59 +01:00
Thunder Drag
49a6d68200 fix(agent/setup): Fix revising constraints and best practices (#6777)
In a `for item in list` loop, removing items from the list while iterating causes it to skip over the next item. Fix: refactor `interactively_revise_ai_settings` routine to use while loop for iterating over constraints, resources, and best practices.

---------

Co-authored-by: Kripanshu Jindal <polaris@Polaris.local>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2024-02-20 11:06:30 +01:00
Ethan Presberg
6cfe229332 feat(frontend): Allow sending a message with the enter key (#6378)
This has not yet been tested due to an issue with compiling on WSL. This was the fix suggested by Pwuts.
2024-02-20 10:49:37 +01:00
Reinier van der Leer
1079d71699 fix(ci/benchmark): Unbreak "Push reports to data branch" step
The `report_subfolder` variable was being populated with two identical lines, because there will be two untracked files in the folder, resulting in the same dirname.
This caused later commands using that variable to fail. Fix is to `sort -u` before storing the value to `report_subfolder`.
2024-02-20 10:35:14 +01:00
Reinier van der Leer
e104427767 feat(ci/benchmark): Generate step summary from benchmark report 2024-02-19 17:13:41 +01:00
Reinier van der Leer
bfd479a50b feat(benchmark): Add reports/format.py script to convert report.json to markdown 2024-02-19 17:13:05 +01:00
Reinier van der Leer
fb63bf4425 chore: Update agbenchmark dependency for agent and forge 2024-02-19 17:11:19 +01:00
Reinier van der Leer
3a17011129 feat(benchmark): Include Steps in Report 2024-02-19 17:08:24 +01:00
Reinier van der Leer
c339c6b54f chore: Update agbenchmark dependency for agent and forge 2024-02-18 17:37:03 +01:00
Reinier van der Leer
7f71d6d9fd debug(benchmark): Improve TestResult validation error output format 2024-02-18 17:10:14 +01:00
Reinier van der Leer
784e2bbb1c fix(ci/benchmark): Mitigate VCS conflicts with files in data branch
`agbenchmark` currently creates files like success_rate.json in the base REPORTS_FOLDER, which causes conflicts in the last step of the benchmark workflow.
To prevent issues, these files must be removed prior to switching to the data branch.
2024-02-17 18:09:44 +01:00
Reinier van der Leer
959377f54c fix(ci/benchmark): Add set +e because we expect (some) challenges to fail 2024-02-17 15:56:55 +01:00
Reinier van der Leer
6bc83e925c chore: Update agbenchmark dependency for agent and forge 2024-02-17 15:56:33 +01:00
Reinier van der Leer
4ede773f5a debug(benchmark): Add more debug code to pinpoint cause of rare crash
Target: https://github.com/Significant-Gravitas/AutoGPT/actions/runs/7941977633/job/21684817491
2024-02-17 15:48:57 +01:00
Reinier van der Leer
d5ad719757 ci: Allow telemetry for non-push events, as long as it's on master
Also disable telemetry for AutoGPT's unit/integration tests.
2024-02-17 15:12:43 +01:00
Reinier van der Leer
1ca9b9fa93 ci: Fix setting/passing TELEMETRY_* environment variables 2024-02-17 14:26:03 +01:00
Reinier van der Leer
15024fb5a1 chore: Update agbenchmark dependency for agent and forge 2024-02-17 14:18:02 +01:00
Reinier van der Leer
fa4bdef17c ci: Update actions to newest versions
- `actions/stale` -> `v9`
- `actions/cache` -> `v4`
- `actions/checkout` -> `v4`
- `actions/setup-node` -> `v4`
- `docker/login-action` -> `v3`
- `actions/setup-python` -> `v5`
- `codecov/codecov-action` -> `v4`
- `actions/upload-artifact` -> `v4`
- `subosito/flutter-action` -> `v2`
- `docker/build-push-action` -> `v5`
- `docker/setup-buildx-action` -> `v3`
2024-02-17 13:59:13 +01:00
Reinier van der Leer
e2b519ef3b debug(benchmark): Make sure TestResult validator error output is sufficient to debug 2024-02-17 13:36:17 +01:00
Reinier van der Leer
09c307d679 debug(benchmark): Add log statement to validator on TestResult
Validation errors don't mention the values causing the error, making it hard to debug. This happened a few times in autogpts-benchmark.yml, so let's put this log statement here until we figure out what makes it crash.
2024-02-17 13:32:22 +01:00
Reinier van der Leer
880c8e804c fix(ci/benchmark): Allow workflow to continue regardless of challenge outcomes 2024-02-17 11:52:26 +01:00
Reinier van der Leer
5f0764b65c chore: Update agbenchmark dependency for agent and forge 2024-02-16 19:07:37 +01:00
Reinier van der Leer
63e6014b27 fix(benchmark): Fix TestResult.fail_reason assignment condition
The condition must be the same as for `success`, because otherwise it causes a crash when `call.excinfo` evaluates to `False` but is not `None`.
2024-02-16 19:05:00 +01:00
Reinier van der Leer
83fcd9ad16 chore: Update agbenchmark dependency for agent and forge 2024-02-16 18:44:58 +01:00
Reinier van der Leer
f9792ed7f3 fix(benchmark): Unbreak -N/--attempts option 2024-02-16 18:43:37 +01:00
Reinier van der Leer
d6ab470c58 Rename autogpts-benchmark-nightly.yml to autogpts-benchmark.yml 2024-02-16 18:32:50 +01:00
Reinier van der Leer
666a5a8777 feat(agent/serve): Report task cost through Step.additional_output
- Added `task_cumulative_cost` and `task_total_cost` attributes to the `Step.additional_output` in the `AgentProtocolServer.execute_step` endpoint.
- Updated `agbenchmark` dependency in Agent and Forge
2024-02-16 18:19:04 +01:00
Reinier van der Leer
21f1e64559 feat(benchmark): Get agent task cost from Step.additional_output 2024-02-16 18:10:46 +01:00
Reinier van der Leer
752bac099b feat(benchmark/report): Add and record TestResult.n_steps
- Added `n_steps` attribute to `TestResult` type
- Added logic to record the number of steps to `BuiltinChallenge.test_method`, `WebArenaChallenge.test_method`, and `.reports.add_test_result_to_report`
2024-02-16 17:53:19 +01:00
Reinier van der Leer
a5de79beb6 ci(benchmark): Add nightly benchmark workflow
Added autogpts-benchmark-nightly.yml, which will run every night at 02:00 UTC with a selection of challenges.
2024-02-16 17:41:58 +01:00
Reinier van der Leer
483c01b681 lint(benchmark): Remove unnecessary pass statement in __main__.py 2024-02-16 17:27:56 +01:00
Reinier van der Leer
992b8874fc chore: Update agbenchmark dependency for agent and forge 2024-02-16 17:22:58 +01:00
Reinier van der Leer
2a55efb322 fix(benchmark): Include WebArenaSiteInfo.additional_info (e.g. credentials) in task input
Without the `additional_info`, it is impossible to get past the login page on challenges where that is necessary.
2024-02-16 17:20:44 +01:00
Reinier van der Leer
23d58a3cc0 feat(benchmark/cli): Add challenge list, challenge info subcommands
- Add `challenge list` command with options `--all`, `--names`, `--json`
   - Add `tabular` dependency
   - Add `.utils.utils.sorted_by_enum_index` function to easily sort lists by an enum value/property based on the order of the enum's definition
- Add `challenge info [name]` command with option `--json`
   - Add `.utils.utils.pretty_print_model` routine to pretty-print Pydantic models
- Refactor `config` subcommand to use `pretty_print_model`
2024-02-16 15:17:11 +01:00
Reinier van der Leer
70e345b2ce refactor(benchmark): load_webarena_challenges
- Reduce duplicate and nested statements
- Add `skip_unavailable` parameter

Related changes:
- Add `available` and `unavailable_reason` attributes to `ChallengeInfo` and `WebArenaChallengeSpec`
- Add `pytest.skip` statement to `WebArenaChallenge.test_method` to make sure unavailable challenges are not run
2024-02-16 15:11:48 +01:00
Reinier van der Leer
650a701317 chore: Update agbenchmark dependency for agent and forge 2024-02-15 18:19:06 +01:00
Reinier van der Leer
679339d00c feat(benchmark): Make report output folder configurable
- Make `AgentBenchmarkConfig.reports_folder` directly configurable (through `REPORTS_FOLDER` env variable). The default is still `./agbenchmark_config/reports`.
- Change all mentions of `REPORT_LOCATION` (which fulfilled the same function at some point in the past) to `REPORTS_FOLDER`.
2024-02-15 18:07:45 +01:00
Reinier van der Leer
fd5730b04a feat(agent/telemetry): Distinguish between production and dev environment based on VCS state
- Added a helper function `.app.utils.vcs_state_diverges_from_master()`. This function determines whether the relevant part of the codebase diverges from our `master`.
- Updated `.app.telemetry._setup_sentry()` to determine the default environment name using `vcs_state_diverges_from_master`.
2024-02-15 16:00:30 +01:00