`agbenchmark` currently creates files such as `success_rate.json` directly in the base `REPORTS_FOLDER`, which causes conflicts in the last step of the benchmark workflow.
To prevent these conflicts, such files must be removed before switching to the data branch.
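For illustration, a minimal sketch of the cleanup in Python; the glob pattern and the `REPORTS_FOLDER` default are assumptions, and in the workflow itself this could just as well be a plain shell `rm` step.

```python
import os
from pathlib import Path

# Remove loose report files (e.g. success_rate.json) from the base REPORTS_FOLDER
# so they cannot conflict with the subsequent checkout of the data branch.
reports_folder = Path(os.getenv("REPORTS_FOLDER", "./agbenchmark_config/reports"))
for leftover in reports_folder.glob("*.json"):
    leftover.unlink()
```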
Validation errors don't mention the values that caused them, which makes them hard to debug. This has happened a few times in the autogpts-benchmark.yml workflow, so let's keep this log statement here until we figure out what makes it crash.
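A sketch of the kind of temporary debug log meant here, assuming a pydantic-style model is being validated; the function and parameter names are illustrative.

```python
import json
import logging

logger = logging.getLogger(__name__)

def validate_with_debug_log(model_cls, raw_data: dict):
    """Log the raw values before validating, so that a ValidationError raised
    in CI shows what it was actually raised on. Temporary debugging aid."""
    logger.debug("Data being validated:\n%s", json.dumps(raw_data, indent=2, default=str))
    return model_cls.parse_obj(raw_data)  # pydantic v1-style; use model_validate on v2
```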
- Added `task_cumulative_cost` and `task_total_cost` attributes to `Step.additional_output` in the `AgentProtocolServer.execute_step` endpoint.
- Updated `agbenchmark` dependency in Agent and Forge
- Added `n_steps` attribute to `TestResult` type
- Added logic to record the number of steps to `BuiltinChallenge.test_method`, `WebArenaChallenge.test_method`, and `.reports.add_test_result_to_report`
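For the cost attributes above, a rough sketch of how they could be merged into `Step.additional_output` inside `execute_step`; the helper name and the cost variables are assumptions.

```python
from typing import Any, Optional

def with_cost_info(
    additional_output: Optional[dict[str, Any]],
    cumulative_cost: float,
    total_cost: Optional[float],
) -> dict[str, Any]:
    """Merge the task's cost figures into a step's additional_output dict."""
    return {
        **(additional_output or {}),
        "task_cumulative_cost": cumulative_cost,
        "task_total_cost": total_cost,
    }

# e.g. step.additional_output = with_cost_info(step.additional_output, 0.042, None)
```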
- Add `challenge list` command with options `--all`, `--names`, `--json`
- Add `tabulate` dependency
- Add `.utils.utils.sorted_by_enum_index` function to sort lists by an enum value/property, following the order in which the enum's members are defined (sketched after this list)
- Add `challenge info [name]` command with option `--json`
- Add `.utils.utils.pretty_print_model` routine to pretty-print Pydantic models
- Refactor `config` subcommand to use `pretty_print_model`
- Reduce duplicate and nested statements
- Add `skip_unavailable` parameter
Related changes:
- Add `available` and `unavailable_reason` attributes to `ChallengeInfo` and `WebArenaChallengeSpec`
- Add `pytest.skip` statement to `WebArenaChallenge.test_method` to make sure unavailable challenges are not run
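As referenced above, a sketch of what `sorted_by_enum_index` could look like; the exact signature is an assumption.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def sorted_by_enum_index(
    items: Iterable[T],
    enum: type[Enum],
    *,
    key: Optional[Callable[[T], Enum]] = None,
    reverse: bool = False,
) -> list[T]:
    """Sort `items` by an enum member (taken from each item via `key`),
    using the order in which the enum's members are defined."""
    order = {member: i for i, member in enumerate(enum)}
    get_member = key or (lambda item: item)
    return sorted(
        items,
        key=lambda item: order.get(get_member(item), len(order)),
        reverse=reverse,
    )
```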
- Make `AgentBenchmarkConfig.reports_folder` directly configurable (through the `REPORTS_FOLDER` environment variable). The default is still `./agbenchmark_config/reports`.
- Change all mentions of `REPORT_LOCATION` (which fulfilled the same function at some point in the past) to `REPORTS_FOLDER`.
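A minimal sketch of the resolution logic, assuming the agbenchmark config directory is known; the helper name is illustrative.

```python
import os
from pathlib import Path

def resolve_reports_folder(agbenchmark_config_dir: Path) -> Path:
    """The REPORTS_FOLDER env variable wins; otherwise fall back to the
    default `reports` folder inside the agbenchmark config directory."""
    if env_value := os.getenv("REPORTS_FOLDER"):
        return Path(env_value)
    return agbenchmark_config_dir / "reports"
```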
- Added a helper function `.app.utils.vcs_state_diverges_from_master()`. This function determines whether the relevant part of the codebase diverges from our `master`.
- Updated `.app.telemetry._setup_sentry()` to determine the default environment name using `vcs_state_diverges_from_master`.
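Roughly how such a divergence check could work, here using plain `git` via `subprocess`; the remote/branch name and the path being compared are assumptions.

```python
import subprocess
from pathlib import Path

def vcs_state_diverges_from_master(relevant_path: str = ".") -> bool:
    """Return True if the working tree differs from origin/master in the
    relevant part of the codebase. Sketch only; assumes origin/master is fetched."""
    result = subprocess.run(
        ["git", "diff", "--quiet", "origin/master", "--", relevant_path],
        cwd=Path(__file__).parent,
        check=False,
    )
    # `git diff --quiet` exits with 1 when there are differences.
    return result.returncode != 0
```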
- Added a helper function `wait_until_conn_ready(port)` to wait for the benchmark and agent applications to finish starting (see the sketch after this list)
- Improved the CLI's own logging (within the `agent start` command)
- Fixed `--mock` mode
- Moved the interrupt to the beginning of the step iterator pipeline (from `BuiltinChallenge` to `agent_api_interface.py:run_api_agent`). This ensures that any finish-up code still runs properly after a single step has been executed.
- Implemented mock mode in `WebArenaChallenge`
- Fixed `fixture 'i_attempt' not found` error when `--attempts`/`-N` is omitted
- Fixed handling of `python`/`pytest` evals in `BuiltinChallenge`
- Disabled left-over Helicone code (see 056163e)
- Fixed a couple of challenge definitions
- WebArena task 107: fix spelling of months (Sepetember, Octorbor *lmao*)
- synthesize/1_basic_content_gen (SynthesizeInfo): remove empty string from `should_contain` list
- Added some debug logging in agent_api_interface.py and challenges/builtin.py
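As referenced above, a sketch of what `wait_until_conn_ready(port)` could look like; the host, polling interval and timeout are assumptions.

```python
import socket
import time

def wait_until_conn_ready(port: int, timeout: int = 30) -> None:
    """Block until something accepts TCP connections on localhost:port,
    or raise TimeoutError after `timeout` seconds."""
    deadline = time.time() + timeout
    while True:
        try:
            with socket.create_connection(("localhost", port), timeout=1):
                return
        except OSError:
            if time.time() > deadline:
                raise TimeoutError(f"Port {port} did not open within {timeout}s")
            time.sleep(0.5)
```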
OpenAI's newest models return JSON with markdown fences around it, breaking the `json.loads` parser.
This commit adds an `extract_list_from_response` function to json_utils/utilities.py and uses this function to replace `json.loads` in `_process_text`.
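A sketch of the fence-stripping approach; the regex details are illustrative, not necessarily the exact implementation.

```python
import json
import re

def extract_list_from_response(response_content: str) -> list:
    """Parse a JSON list from an LLM response, tolerating a surrounding
    markdown code fence (e.g. a block opened with three backticks and 'json')."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", response_content, re.DOTALL)
    if match:
        response_content = match.group(1)
    result = json.loads(response_content)
    if not isinstance(result, list):
        raise ValueError(f"Response content is not a JSON list: {result!r}")
    return result
```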
* Add Sentry integration for telemetry
- Add `sentry_sdk` dependency
- Add setup logic and config flow using `TELEMETRY_OPT_IN` environment variable
- Add app/telemetry.py with `setup_telemetry` helper routine
- Call `setup_telemetry` in `cli()` in app/cli.py
- Add `TELEMETRY_OPT_IN` to .env.template
- Add helper function `env_file_exists` and routine `set_env_config_value` to app/utils.py
- Add unit tests for `set_env_config_value` in test_utils.py
- Add a prompt at startup asking whether the user wants to enable telemetry if the env variable isn't set
* Add `capture_exception` statements for LLM parsing errors and command failures
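A sketch of the opt-in flow described above; the DSN placeholder, the prompt wording, and the variable used for the environment name are assumptions — only `TELEMETRY_OPT_IN`, the prompt behaviour, and the `sentry_sdk` calls come from the changes themselves.

```python
import os

import click  # the CLI already uses click; assumed available here
import sentry_sdk

def setup_telemetry() -> None:
    """Ask once if TELEMETRY_OPT_IN is unset, then initialise Sentry only
    when the user has opted in."""
    opt_in = os.getenv("TELEMETRY_OPT_IN")
    if opt_in is None:
        # Persisting the answer to .env is what set_env_config_value is for.
        opt_in = "true" if click.confirm("Enable anonymous telemetry?", default=False) else "false"
    if opt_in.lower() not in ("1", "true", "yes"):
        return
    sentry_sdk.init(
        dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
        environment=os.getenv("TELEMETRY_ENVIRONMENT", "production"),  # assumed variable
    )

# Failures can then be reported explicitly, e.g.:
#     except Exception as e:
#         sentry_sdk.capture_exception(e)
```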