| Alice Hau | d47ab3e31b | fix: fix allowing multiple selectors in goosebench (#1814) | 2025-03-23 22:29:16 -04:00 |
| marcelle | 4c03b34058 | feat: refactor register eval (#1713) | 2025-03-18 15:18:09 -04:00 |
| Zaki Ali | a9fefd0e43 | feat: add default metrics for core evals (#1602) | 2025-03-14 18:11:31 -07:00 |
| marcelle | c23be1eb19 | fix: ensure repeating benches return to initial run-dir (#1617) | 2025-03-11 11:44:57 -04:00 |
| Zaki Ali | c0e719eaba | fix: merge error logging in goose bench (#1545) | 2025-03-10 15:45:00 -07:00 |
| Alice Hau | bb4feacf03 | feat: add additional goosebench evals (#1571); Co-authored-by: Alice Hau <alice.a.hau@gmail.com> | 2025-03-10 15:11:44 -04:00 |
| marcelle | 1354133b18 | fix: included files was panicing because dir didnt exist (#1583) | 2025-03-09 22:47:46 -04:00 |
| marcelle | 00fc3a5de8 | Feat: support auto-including dirs in binary/bench-work-dir (#1576) | 2025-03-07 17:53:39 -05:00 |
| marcelle | 798d657e7e | bugfix: refactor workdirs to be async-safe, and simpler (#1558) | 2025-03-06 21:11:35 -05:00 |
| Zaki Ali | ebf7cb1231 | feat: split required_extensions in bench to builtin/external (#1547) | 2025-03-06 17:12:21 -08:00 |
| marcelle | 49dee048e4 | feat: goose bench framework for functional and regression testing; Co-authored-by: Zaki Ali <zaki@squareup.com> | 2025-03-05 21:23:00 -05:00 |