Commit Graph

  • 1f1e8c9f7d Update CODEOWNERS Reinier van der Leer 2024-02-22 17:26:46 +01:00
  • e44ca4185a fix(frontend): Unbreak ChatInputField Reinier van der Leer 2024-02-21 02:09:23 +01:00
  • 8fd2e48c1b fix(ci/frontend): Add trigger on push including workflow file Reinier van der Leer 2024-02-21 02:04:13 +01:00
  • 69ccb185e8 fix(ci/frontend): Add and fix trigger on workflow file Reinier van der Leer 2024-02-21 02:02:41 +01:00
  • a88e833831 ci: Revise Frontend CI Reinier van der Leer 2024-02-21 02:00:33 +01:00
  • 64f48df62d chore(agent/llm): Update model alias gpt-3.5-turbo -> gpt-3.5-turbo-0125 Reinier van der Leer 2024-02-20 17:13:51 +01:00
  • 0f5490075b fix(ci/benchmark): Install benchmark dependencies Reinier van der Leer 2024-02-20 16:56:47 +01:00
  • d5f2bbf093 fix(benchmark/reports): Make format.py executable Reinier van der Leer 2024-02-20 14:50:32 +01:00
  • 7dd97f2f74 fix(agent/browser): Print descriptive error if ChromeDriver install fails Reinier van der Leer 2024-02-20 14:04:15 +01:00
  • 8e464c53a8 fix(agent/llm): Include id in tool_calls in prompt Reinier van der Leer 2024-02-20 13:25:37 +01:00
  • 7689a51f53 fix(autogpt/llm): Omit AssistantChatMessage.tool_calls if no tool calls are present Reinier van der Leer 2024-02-20 13:04:55 +01:00
  • c8a40727d1 fix(ci/benchmark): Specify poetry env path for report conversion step Reinier van der Leer 2024-02-20 12:10:49 +01:00
  • 4ef912d734 fix(benchmark/challenges): Improve spec and eval of TicTacToe challenge Albert Örwall 2024-02-20 11:52:59 +01:00
  • 49a6d68200 fix(agent/setup): Fix revising constraints and best practices (#6777) Thunder Drag 2024-02-20 15:36:30 +05:30
  • 6cfe229332 feat(frontend): Allow sending a message with the enter key (#6378) Ethan Presberg 2024-02-20 04:49:37 -05:00
  • 1079d71699 fix(ci/benchmark): Unbreak "Push reports to data branch" step Reinier van der Leer 2024-02-20 10:35:14 +01:00
  • e104427767 feat(ci/benchmark): Generate step summary from benchmark report Reinier van der Leer 2024-02-19 17:13:41 +01:00
  • bfd479a50b feat(benchmark): Add reports/format.py script to convert report.json to markdown Reinier van der Leer 2024-02-19 17:13:05 +01:00
  • fb63bf4425 chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-19 17:11:19 +01:00
  • 3a17011129 feat(benchmark): Include Steps in Report Reinier van der Leer 2024-02-19 17:08:24 +01:00
  • c339c6b54f chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-18 17:37:03 +01:00
  • 7f71d6d9fd debug(benchmark): Improve TestResult validation error output format Reinier van der Leer 2024-02-18 17:10:14 +01:00
  • 784e2bbb1c fix(ci/benchmark): Mitigate VCS conflicts with files in data branch Reinier van der Leer 2024-02-17 18:09:44 +01:00
  • 959377f54c fix(ci/benchmark): Add set +e because we expect (some) challenges to fail Reinier van der Leer 2024-02-17 15:56:55 +01:00
  • 6bc83e925c chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-17 15:56:33 +01:00
  • 4ede773f5a debug(benchmark): Add more debug code to pinpoint cause of rare crash Reinier van der Leer 2024-02-17 15:48:57 +01:00
  • d5ad719757 ci: Allow telemetry for non-push events, as long as it's on master Reinier van der Leer 2024-02-17 15:10:11 +01:00
  • 1ca9b9fa93 ci: Fix setting/passing TELEMETRY_* environment variables Reinier van der Leer 2024-02-17 14:26:03 +01:00
  • 15024fb5a1 chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-17 14:18:02 +01:00
  • fa4bdef17c ci: Update actions to newest versions Reinier van der Leer 2024-02-17 13:59:13 +01:00
  • e2b519ef3b debug(benchmark): Make sure TestResult validator error output is sufficient to debug Reinier van der Leer 2024-02-17 13:36:17 +01:00
  • 09c307d679 debug(benchmark): Add log statement to validator on TestResult Reinier van der Leer 2024-02-17 13:32:22 +01:00
  • 880c8e804c fix(ci/benchmark): Allow workflow to continue regardless of challenge outcomes Reinier van der Leer 2024-02-17 11:52:26 +01:00
  • 5f0764b65c chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-16 19:07:37 +01:00
  • 63e6014b27 fix(benchmark): Fix TestResult.fail_reason assignment condition Reinier van der Leer 2024-02-16 19:05:00 +01:00
  • 83fcd9ad16 chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-16 18:44:58 +01:00
  • f9792ed7f3 fix(benchmark): Unbreak -N/--attempts option Reinier van der Leer 2024-02-16 18:43:37 +01:00
  • d6ab470c58 Rename autogpts-benchmark-nightly.yml to autogpts-benchmark.yml Reinier van der Leer 2024-02-16 18:32:50 +01:00
  • 666a5a8777 feat(agent/serve): Report task cost through Step.additional_output Reinier van der Leer 2024-02-16 18:17:51 +01:00
  • 21f1e64559 feat(benchmark): Get agent task cost from Step.additional_output Reinier van der Leer 2024-02-16 18:10:46 +01:00
  • 752bac099b feat(benchmark/report): Add and record TestResult.n_steps Reinier van der Leer 2024-02-16 17:53:19 +01:00
  • a5de79beb6 ci(benchmark): Add nightly benchmark workflow Reinier van der Leer 2024-02-16 17:41:58 +01:00
  • 483c01b681 lint(benchmark): Remove unnecessary pass statement in __main__.py Reinier van der Leer 2024-02-16 17:27:04 +01:00
  • 992b8874fc chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-16 17:22:58 +01:00
  • 2a55efb322 fix(benchmark): Include WebArenaSiteInfo.additional_info (e.g. credentials) in task input Reinier van der Leer 2024-02-16 17:18:49 +01:00
  • 23d58a3cc0 feat(benchmark/cli): Add challenge list, challenge info subcommands Reinier van der Leer 2024-02-16 15:17:11 +01:00
  • 70e345b2ce refactor(benchmark): load_webarena_challenges Reinier van der Leer 2024-02-16 14:58:53 +01:00
  • 650a701317 chore: Update agbenchmark dependency for agent and forge Reinier van der Leer 2024-02-15 18:19:06 +01:00
  • 679339d00c feat(benchmark): Make report output folder configurable Reinier van der Leer 2024-02-15 18:07:45 +01:00
  • fd5730b04a feat(agent/telemetry): Distinguish between production and dev environment based on VCS state Reinier van der Leer 2024-02-15 16:00:30 +01:00
  • b7f08cd0f7 feat(agent/telemetry): Enable performance tracing & update opt-in prompt accordingly Reinier van der Leer 2024-02-15 14:46:36 +01:00
  • 8762f7ab3d fix(forge): Make watchfiles pattern more specific to prevent unwanted (breaking) reloads Reinier van der Leer 2024-02-15 13:42:38 +01:00
  • a9b7b175ff fix(agent/profile_generator): Improve robustness by leveraging create_chat_completion's parse handling Reinier van der Leer 2024-02-15 11:48:07 +01:00
  • 52b93dd84e fix(cli/agent start): Wait for applications to finish starting before returning Reinier van der Leer 2024-02-15 11:26:26 +01:00
  • 6a09a44ef7 lint(agent): Fix telemetry.py linting error & formatting Reinier van der Leer 2024-02-14 23:31:35 +01:00
  • 32a627eda9 Add Privacy Policy link to telemetry opt-in. Toran Bruce Richards 2024-02-14 16:42:34 +00:00
  • 67bafa6302 fix(autogpt/llm): AssistantChatMessage.tool_calls default [] instead of None Reinier van der Leer 2024-02-14 14:34:04 +01:00
  • 6017eefb32 ci: Enable telemetry in CI runs on master Reinier van der Leer 2024-02-14 12:03:54 +01:00
  • ae197fc85f feat(agent/telemetry): Distinguish between users Reinier van der Leer 2024-02-14 11:50:45 +01:00
  • 22aba6dd8a fix(agent/llm): Include bad response in parse-fix prompt in OpenAIProvider.create_chat_completion Reinier van der Leer 2024-02-14 11:17:34 +01:00
  • 88bbdfc7fc ci: Pick 3 challenges to run with --mock in smoke test CI Reinier van der Leer 2024-02-14 02:30:03 +01:00
  • d0c9b7c405 lint(benchmark): Remove unused imports Reinier van der Leer 2024-02-14 01:34:30 +01:00
  • e7698a4610 chore(agent): Update forge and agbenchmark dependencies Reinier van der Leer 2024-02-14 01:32:28 +01:00
  • ab05b7ae70 chore(forge): Update agbenchmark dependency Reinier van der Leer 2024-02-14 01:27:07 +01:00
  • 327fb1f916 fix(benchmark): Mock mode, python evals, --attempts flag, challenge definitions Reinier van der Leer 2024-02-14 01:05:34 +01:00
  • bb7f5abc6c fix(agent/text_processing): Fix extract_information LLM response parsing Reinier van der Leer 2024-02-13 18:28:17 +01:00
  • 393d6b97e6 feat(agent): Add Sentry integration for telemetry Reinier van der Leer 2024-02-13 18:10:52 +01:00
  • 3b8d63dfb6 chore(agent): Update autogpt-forge and agbenchmark dependencies to propagate dependency updates Reinier van der Leer 2024-02-13 13:24:24 +01:00
  • 6763196d78 chore(forge): Update agbenchmark dependency Reinier van der Leer 2024-02-13 12:44:17 +01:00
  • e1da58da02 chore(forge): Update aiohttp, fastapi, and python-multipart dependencies to mitigate vulnerabilities Reinier van der Leer 2024-02-13 12:38:36 +01:00
  • 91cec515d4 chore(benchmark): Update python-multipart dependency to mitigate vulnerability Reinier van der Leer 2024-02-13 12:36:00 +01:00
  • cc585a014f chore(agent): Update aiohttp and fastapi dependencies to mitigate vulnerabilities Reinier van der Leer 2024-02-13 12:30:12 +01:00
  • e641cccb42 chore(benchmark): Update aiohttp and fastapi dependencies to mitigate vulnerabilities Reinier van der Leer 2024-02-13 12:21:52 +01:00
  • cc73d4104b fix(forge): incorrect import 'sdk' in .actions.finish (#6822) Mahdi Karami 2024-02-13 13:32:03 +03:30
  • 250552cb3d fix(agent/tests): Update test_config.py:test_initial_values Reinier van der Leer 2024-02-12 13:26:47 +01:00
  • 1d653973e9 feat(agent/llm): Use new OpenAI models as default SMART_LLM, FAST_LLM, and EMBEDDING_MODEL Reinier van der Leer 2024-02-12 13:19:37 +01:00
  • 7bf9ba5502 chore(agent/llm): Update OpenAI model info Reinier van der Leer 2024-02-12 12:59:58 +01:00
  • 14c9773890 ci(agent): Add GIT_REVISION label to Docker builds Reinier van der Leer 2024-02-12 12:31:04 +01:00
  • 39fddb1214 fix(agent): Fix application of extra_request_headers in OpenAIProvider Reinier van der Leer 2024-02-12 12:21:30 +01:00
  • fe0923ba6c feat(agent/web): Add browser extensions to deal with cookie walls and ads (#6778) Reinier van der Leer 2024-02-02 18:30:37 +01:00
  • dfaeda7cd5 lint(agent/tests): Fix line length in test_utils.py Reinier van der Leer 2024-02-02 18:29:28 +01:00
  • 9b7fee673e fix(agent/tests): Update test_utils.py:test_extract_json_from_response* in accordance with 956cdc7 Reinier van der Leer 2024-02-02 18:18:45 +01:00
  • 925269d17b lint(agent): Fix line length in docstring of EpisodicActionHistory.handle_compression Reinier van der Leer 2024-02-02 17:43:42 +01:00
  • 266fe3a3f7 fix(forge): Fix "no module named 'forge.sdk.abilities'" (#6571) Fernando Navarro Páez 2024-02-01 11:23:35 +01:00
  • 66e0c87894 feat(agent): Add history compression to increase longevity and efficiency Reinier van der Leer 2024-01-31 17:51:45 +01:00
  • 55433f468a feat(agent/web): Improve read_webpage information extraction abilities Reinier van der Leer 2024-01-31 15:08:08 +01:00
  • 956cdc77fa fix(agent/json_utils): Decode as JSON rather than Python objects Reinier van der Leer 2024-01-31 14:15:02 +01:00
  • 83a0b03523 fix(agent/prompting): Fix representation of (optional) command parameters in prompt Reinier van der Leer 2024-01-31 14:10:22 +01:00
  • 25b9e290a5 fix(agent/json_utils): Make extract_dict_from_response more robust Reinier van der Leer 2024-01-29 15:03:09 +01:00
  • ab860981d8 feat(agent/llm): Add support for gpt-4-0125-preview Reinier van der Leer 2024-01-29 11:22:32 +01:00
  • a0cae78ba3 feat(benchmark): Add -N, --attempts option for multiple attempts per challenge Reinier van der Leer 2024-01-22 14:37:12 +01:00
  • 488f40a20f feat(benchmark): JungleGym WebArena (#6691) Reinier van der Leer 2024-01-19 20:34:04 +01:00
  • 05b018a837 fix(benchmark/report): Fix and clean up logic in update_challenges_already_beaten Reinier van der Leer 2024-01-19 19:52:09 +01:00
  • fc37ffdfcf feat(agent/llm/openai): Include compatibility tool call extraction in LLM response parse-fix loop Reinier van der Leer 2024-01-19 19:23:17 +01:00
  • 8c65f3c748 fix(agent/serve): Fix task cost tracking persistence in AgentProtocolServer Reinier van der Leer 2024-01-19 19:17:36 +01:00
  • 354106be7b feat(agent/llm): Add cost tracking and logging to AgentProtocolServer Reinier van der Leer 2024-01-19 17:31:59 +01:00
  • 9e4dfd8058 fix(benchmark): Fix challenge input artifact upload Reinier van der Leer 2024-01-19 17:29:03 +01:00
  • faf5f9e5a4 fix(agent): Fix extract_dict_from_response flakiness Reinier van der Leer 2024-01-19 15:49:32 +01:00
  • e4687e0f03 fix(agent): Fix "ChatModelResponse not subscriptable" errors in summarize_text and QueryLanguageModel ability Reinier van der Leer 2024-01-19 15:45:31 +01:00
  • c5b17851e0 fix(agent): Handle artifact modification properly Reinier van der Leer 2024-01-19 12:08:59 +01:00