feat(benchmark): JungleGym WebArena (#6691)

* feat(benchmark): Add JungleGym WebArena challenges
   - Add `WebArenaChallenge`, `WebArenaChallengeSpec`, and other logic to make these challenges work
   - Add WebArena challenges to the Pytest collection endpoint `generate_test.py`

* feat(benchmark/webarena): Add hand-picked selection of WebArena challenges
Reinier van der Leer
2024-01-19 20:34:04 +01:00
committed by GitHub
parent 05b018a837
commit 488f40a20f
4 changed files with 1005 additions and 1 deletions

@@ -7,15 +7,17 @@ classes in the module that conform to the `Test*` pattern are collected.
 import importlib
 import logging
+from itertools import chain
 from agbenchmark.challenges.builtin import load_builtin_challenges
+from agbenchmark.challenges.webarena import load_webarena_challenges
 logger = logging.getLogger(__name__)
 DATA_CATEGORY = {}
 # Load challenges and attach them to this module
-for challenge in load_builtin_challenges():
+for challenge in chain(load_builtin_challenges(), load_webarena_challenges()):
     # Attach the Challenge class to this module so it can be discovered by pytest
     module = importlib.import_module(__name__)
     setattr(module, challenge.__name__, challenge)
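
The change above chains two challenge loaders and attaches each yielded class to the module so that pytest can collect it by its `Test*` name. A minimal, self-contained sketch of that pattern follows. The two loader functions and the `generate_test_sketch` module are hypothetical stand-ins, not the real `agbenchmark` code; the real loaders build challenge classes from spec files.

```python
import types
from itertools import chain
from typing import Iterator


def load_builtin_challenges() -> Iterator[type]:
    # Hypothetical stand-in: yields one dynamically created test class.
    yield type("TestBuiltinExample", (), {"test_method": lambda self: None})


def load_webarena_challenges() -> Iterator[type]:
    # Hypothetical stand-in for the WebArena loader added in this commit.
    yield type("TestWebArenaExample", (), {"test_method": lambda self: None})


# Use a throwaway module object instead of importlib.import_module(__name__)
# so the sketch has no side effects on the module that runs it.
module = types.ModuleType("generate_test_sketch")

# chain() iterates the first loader to exhaustion, then the second,
# without building an intermediate list.
for challenge in chain(load_builtin_challenges(), load_webarena_challenges()):
    # Attaching the class as a module attribute is what makes it
    # visible to name-based collection such as pytest's `Test*` pattern.
    setattr(module, challenge.__name__, challenge)

collected = sorted(name for name in vars(module) if name.startswith("Test"))
print(collected)  # ['TestBuiltinExample', 'TestWebArenaExample']
```

Because `chain` is lazy, adding another challenge source only requires passing one more loader call to it; the loop body stays unchanged.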