Commit Graph

19 Commits

Author SHA1 Message Date
Alice Hau
4f590175cb feat: update goosebench vibes suite metrics (#2135) 2025-04-10 16:06:22 -04:00
Alice Hau
21971db722 chore: cleanup bench evals copy session dir code (#2131) 2025-04-10 14:05:37 -04:00
Alice Hau
2a70707c91 fix: goosebench selector collection issue (#2129) 2025-04-10 12:16:31 -04:00
marcelle
45a520a42e bugfix: multiple runs appending to session file (#2095) 2025-04-08 21:50:06 -04:00
marcelle
8fbd9eb327 feat: efficient benching (#1921) 2025-04-08 14:43:43 -04:00
Co-authored-by: Tyler Rockwood <rockwotj@gmail.com>
Co-authored-by: Kalvin C <kalvinnchau@users.noreply.github.com>
Co-authored-by: Alice Hau <110418948+ahau-square@users.noreply.github.com>
dependabot[bot]
48ac6a3925 chore(deps): bump tokio from 1.43.0 to 1.43.1 (#2077) 2025-04-08 09:47:50 -07:00
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Jim Bennett
050a8f2f42 Add -with-remote-extension (#2062) 2025-04-07 16:42:38 -04:00
Michael Neale
c719518683 chore: add to the benchmark suite core developer a git project scenario (#2032) 2025-04-07 08:48:30 +10:00
Alice Hau
d47ab3e31b fix: fix allowing multiple selectors in goosebench (#1814) 2025-03-23 22:29:16 -04:00
marcelle
4c03b34058 feat: refactor register eval (#1713) 2025-03-18 15:18:09 -04:00
Zaki Ali
a9fefd0e43 feat: add default metrics for core evals (#1602) 2025-03-14 18:11:31 -07:00
marcelle
c23be1eb19 fix: ensure repeating benches return to initial run-dir (#1617) 2025-03-11 11:44:57 -04:00
Zaki Ali
c0e719eaba fix: merge error logging in goose bench (#1545) 2025-03-10 15:45:00 -07:00
Alice Hau
bb4feacf03 feat: add additional goosebench evals (#1571) 2025-03-10 15:11:44 -04:00
Co-authored-by: Alice Hau <alice.a.hau@gmail.com>
marcelle
1354133b18 fix: included files was panicing because dir didnt exist (#1583) 2025-03-09 22:47:46 -04:00
marcelle
00fc3a5de8 Feat: support auto-including dirs in binary/bench-work-dir (#1576) 2025-03-07 17:53:39 -05:00
marcelle
798d657e7e bugfix: refactor workdirs to be async-safe, and simpler (#1558) 2025-03-06 21:11:35 -05:00
Zaki Ali
ebf7cb1231 feat: split required_extensions in bench to builtin/external (#1547) 2025-03-06 17:12:21 -08:00
marcelle
49dee048e4 feat: goose bench framework for functional and regression testing 2025-03-05 21:23:00 -05:00
Co-authored-by: Zaki Ali <zaki@squareup.com>