- Rename build-frontend.yml to frontend-ci.yml
- Add a `pull_request` trigger
- Disable committing and pushing to a `frontend_build_{hash}` branch
- (Re)enable auto-creating a pull request for the new frontend build
We have been seeing `AttributeError: 'NoneType' object has no attribute 'split'` in Sentry.
According to https://github.com/SergeyPirogov/webdriver_manager/issues/649, this occurs
when Chrome is not installed, or when no suitable ChromeDriver version is available for
the installed version of Chrome.
Instead of the `AttributeError`, we can print a better message for the agent and user to help them fix the issue.
---
Co-authored-by: kcze <kpczerwinski@gmail.com>
OpenAI requires the `id` property on tool calls in assistant messages.
We weren't storing it in the `AssistantChatMessage` that is created from the LLM's response,
causing an error when adding those messages back to the prompt.
Fix:
* Add `id` to `AssistantToolCall` and `AssistantToolCallDict` types in model_providers/schema.py
* Amend `_tool_calls_compat_extract_calls` to generate an ID for extracted tool calls
---
Co-authored-by: kcze <kpczerwinski@gmail.com>
OpenAI likes neither `tool_calls=[]` nor `tool_calls=None`. If no `tool_calls` are in the message, the key must be omitted.
This partially reverts commit 67bafa6302.
---
Co-authored-by: kcze <kpczerwinski@gmail.com>
* In challenge specification, specify `subprocess.PIPE` for `stdin` and `stderr` for completeness
* Additional tweak: let Pytest load only the current file when running the test file as a script
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
In a `for item in list` loop, removing items from the list while iterating causes it to skip over the next item. Fix: refactor `interactively_revise_ai_settings` routine to use while loop for iterating over constraints, resources, and best practices.
---------
Co-authored-by: Kripanshu Jindal <polaris@Polaris.local>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
The `report_subfolder` variable was being populated with two identical lines, because there will be two untracked files in the folder, resulting in the same dirname.
This caused later commands using that variable to fail. Fix is to `sort -u` before storing the value to `report_subfolder`.
`agbenchmark` currently creates files like success_rate.json in the base REPORTS_FOLDER, which causes conflicts in the last step of the benchmark workflow.
To prevent issues, these files must be removed prior to switching to the data branch.
Validation errors don't mention the values causing the error, making it hard to debug. This happened a few times in autogpts-benchmark.yml, so let's put this log statement here until we figure out what makes it crash.
- Added `task_cumulative_cost` and `task_total_cost` attributes to the `Step.additional_output` in the `AgentProtocolServer.execute_step` endpoint.
- Updated `agbenchmark` dependency in Agent and Forge
- Added `n_steps` attribute to `TestResult` type
- Added logic to record the number of steps to `BuiltinChallenge.test_method`, `WebArenaChallenge.test_method`, and `.reports.add_test_result_to_report`
- Add `challenge list` command with options `--all`, `--names`, `--json`
- Add `tabular` dependency
- Add `.utils.utils.sorted_by_enum_index` function to easily sort lists by an enum value/property based on the order of the enum's definition
- Add `challenge info [name]` command with option `--json`
- Add `.utils.utils.pretty_print_model` routine to pretty-print Pydantic models
- Refactor `config` subcommand to use `pretty_print_model`
- Reduce duplicate and nested statements
- Add `skip_unavailable` parameter
Related changes:
- Add `available` and `unavailable_reason` attributes to `ChallengeInfo` and `WebArenaChallengeSpec`
- Add `pytest.skip` statement to `WebArenaChallenge.test_method` to make sure unavailable challenges are not run
- Make `AgentBenchmarkConfig.reports_folder` directly configurable (through `REPORTS_FOLDER` env variable). The default is still `./agbenchmark_config/reports`.
- Change all mentions of `REPORT_LOCATION` (which fulfilled the same function at some point in the past) to `REPORTS_FOLDER`.
- Added a helper function `.app.utils.vcs_state_diverges_from_master()`. This function determines whether the relevant part of the codebase diverges from our `master`.
- Updated `.app.telemetry._setup_sentry()` to determine the default environment name using `vcs_state_diverges_from_master`.