* feat(benchmark): Add JungleGym WebArena challenges
- Add `WebArenaChallenge`, `WebArenaChallengeSpec`, and the supporting logic needed to run these challenges
- Add WebArena challenges to the Pytest collection endpoint `generate_test.py`
* feat(benchmark/webarena): Add hand-picked selection of WebArena challenges