Commit Graph

25 Commits

Author SHA1 Message Date
NeonN3mesis
2fcd91b765 New Challenge test_information_retrieval_challenge_c (#4855)
* New Challenge test_information_retrieval_challenge_c

I created a new challenge; it needs a bit of work

* Update current_score.json

Changed max level beaten to null

* reformatted test_information_retrieval_challenge_c with black

---------

Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>
2023-07-06 10:05:48 -07:00
uta
bfb45f2cbd Fix errors in Mandatory Tasks of Benchmarks (#4893)
Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>
2023-07-05 16:37:01 -07:00
merwanehamadi
cfdb24efac Link all challenges to benchmark python hook (#4786) 2023-06-24 06:20:58 -07:00
merwanehamadi
222101b30e Create run_task python hook to interface with benchmarks (#4778)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-06-23 21:15:20 -07:00
merwanehamadi
4e3f832dc3 Remove config singleton (#4737) 2023-06-20 06:47:59 -07:00
merwanehamadi
a30e5a85b2 Remove write_tests command (#4707)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
Co-authored-by: Erik Peterson <e@eriklp.com>
2023-06-15 13:32:20 -07:00
merwanehamadi
3df8c1b501 Make benchmarks create cassettes without using them (#4664) 2023-06-13 11:29:20 -07:00
Auto-GPT-Bot
857c330d58 Update challenge scores 2023-06-13 16:59:27 +00:00
Erik Peterson
07d9b584f7 Correct and clean up JSON handling (#4655)
* Correct and clean up JSON handling

* Use ast for message history too

* Lint

* Add comments explaining why we use literal_eval

* Add descriptions to llm_response_format schema

* Parse responses in code blocks

* Be more careful when parsing in code blocks

* Lint
2023-06-13 09:54:50 -07:00
merwanehamadi
d5afbbee26 Add challenge name and level to pytest logs (#4661) 2023-06-12 08:03:14 -07:00
merwanehamadi
2ce6ae6707 Change memory challenge expectations (#4657) 2023-06-11 14:34:02 -07:00
Erik Peterson
fd04db12fa Use prompt_toolkit to enable keyboard navigation in CLI (#4649)
* Use prompt_toolkit to enable keyboard navigation in CLI

* Also update other tests

---------

Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>
2023-06-11 14:19:42 -07:00
merwanehamadi
9150f32f8b Fix benchmark logs (#4653) 2023-06-11 15:34:57 +01:00
merwanehamadi
6fb9b6d03b Retry regression tests (#4648) 2023-06-11 15:21:26 +01:00
Erik Peterson
0594ba33a2 Pass agent to commands instead of config (#4645)
* Add config as attribute to Agent, rename old config to ai_config

* Code review: Pass ai_config

* Pass agent to commands instead of config

* Lint

* Fix merge error

* Fix memory challenge a

---------

Co-authored-by: Nicholas Tindle <nick@ntindle.com>
Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>
2023-06-10 15:48:50 -07:00
merwanehamadi
097ce08908 Create benchmarks.yml (#4647) 2023-06-10 15:11:24 -07:00
merwanehamadi
3c51ff501f Decrement memory challenge c (#4639) 2023-06-09 20:46:06 -07:00
javableu
474a9c4d95 False beliefs challenge based on the Sally-Anne test. (#4167)
* False beliefs challenge based on the Sally-Anne test.

* Update test_memory_challenge_d.py

* Update challenge_d.md

Some text appearing in bold

* Update test_memory_challenge_d.py

* Update test_memory_challenge_d.py

* Update test_memory_challenge_d.py

* Update test_memory_challenge_d.py

black test_memory_challenge_d.py

* Update test_memory_challenge_d.py

replaced the level-dependent dynamic time with a fixed time

* Update test_memory_challenge_d.py

isort command for the libraries

* Refactored memory challenge a

---------

Co-authored-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-06-09 15:02:41 -07:00
merwanehamadi
3d06b2e4c0 Decrement information retrieval challenge a (#4637) 2023-06-09 10:02:03 -07:00
merwanehamadi
12ed5a957b Fix debug code challenge (#4632) 2023-06-09 08:40:06 -07:00
merwanehamadi
3b0d49a3e0 Make test write file hard (#4481) 2023-06-09 07:50:57 -07:00
merwanehamadi
bd2e26a20f Inform users that challenges can be flaky (#4616)
* Inform users that challenges can be flaky

* Update challenge_decorator.py
2023-06-09 07:43:56 -07:00
Auto-GPT-Bot
835decc6c1 Update challenge scores 2023-06-06 21:48:57 +00:00
merwanehamadi
53efa8f6bf Update cassette submodule & fix current_score.json generation (#4601)
* Update cassette submodule

* add a new line when building current_score.json
2023-06-06 23:46:41 +02:00
Reinier van der Leer
dafbd11686 Rearrange tests & fix CI (#4596)
* Rearrange tests into unit/integration/challenge categories

* Fix linting + `tests.challenges` imports

* Fix obscured duplicate test in test_url_validation.py

* Move VCR conftest to tests.vcr

* Specify tests to run & their order (unit -> integration -> challenges) in CI

* Fail Docker CI when tests fail

* Fix import & linting errors in tests

* Fix `get_text_summary`

* Fix linting errors

* Clean up pytest args in CI

* Remove bogus tests from GoCodeo
2023-06-06 10:48:49 -07:00