95 Commits

Author SHA1 Message Date
Silen Naihin
4011cb228f working bar and radar charts (#221) 2023-07-31 12:22:38 +01:00
merwanehamadi
ad00a0634e Get helicone costs (#220)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-30 21:33:09 -07:00
merwanehamadi
6309bc9c3d Update submodule (#219) 2023-07-30 20:03:53 -07:00
merwanehamadi
d93950e6d9 Fix timeout not working (#218) 2023-07-30 19:05:09 -07:00
Silen Naihin
19db3151dd Feature: Visualize Test Results (#211) 2023-07-30 23:51:17 +01:00
merwanehamadi
a6c3730ac8 Add timeout that allows teardown (#216)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-29 20:02:41 -07:00
merwanehamadi
c4554225bd Update submodules (#212)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-29 10:18:35 -07:00
Silen Naihin
ecc386ec7b returning scores (#210)
Co-authored-by: Auto-GPT-Bot <github-bot@agpt.co>
2023-07-29 11:43:22 +01:00
Silen Naihin
f07e7b60d4 Advanced LLM Evaluation Implementation (#205)
Co-authored-by: Auto-GPT-Bot <github-bot@agpt.co>
2023-07-29 10:26:19 +01:00
merwanehamadi
80bd0c4260 Fix tests not being run (#207)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-27 20:50:53 -07:00
merwanehamadi
6098b70408 Use beebot autopackai (#203)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-27 12:21:43 -07:00
merwanehamadi
31897e7892 Delete reports (#201) 2023-07-27 11:42:24 -07:00
Silen Naihin
71e0c598d6 forcing AGENT_NAME to be defined from repo 2023-07-27 14:28:11 +01:00
Silen Naihin
0e6be16d07 helicone and llm eval fixes 2023-07-27 14:07:46 +01:00
merwanehamadi
eb57b15380 Add dynamic headers using environment variables (#200)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-26 21:26:03 -07:00
merwanehamadi
5df710fd35 Add helicone dynamic headers (#199)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-26 16:03:13 -07:00
Silen Naihin
66d1fec07e attempting more logs 2023-07-26 23:36:45 +01:00
merwanehamadi
01b118e590 Add llm eval (#197)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-26 14:00:24 -07:00
Silen Naihin
80506e9a3b report # bug, adding submodule challenges (#193) 2023-07-26 13:53:10 +01:00
merwanehamadi
a1e02f243c Add safety suite (#196)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-25 20:13:01 -07:00
Silen Naihin
5e3bbb946f fix suite dependencies (#194) 2023-07-26 01:50:53 +01:00
Silen Naihin
b82277515f hotfix reports (#191) 2023-07-25 19:07:24 +01:00
Silen Naihin
d9b3d7da37 Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
Silen Naihin
2b3abeff4e Integrate baby-agi (#168)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
Co-authored-by: merwanehamadi <merwanehamadi@gmail.com>
2023-07-21 11:15:42 -07:00
Erik Peterson
5a3b4f3d1d Kill subprocesses when test ends (#172)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
Co-authored-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-20 15:41:59 -07:00
Silen Naihin
12c5d54583 Fixing memory challenges, naming, testing mini-agi, smooth retrieval scaling (#166) 2023-07-17 19:41:58 -07:00
merwanehamadi
2d8fa5ca6f Use report location (#165) 2023-07-17 20:15:10 -04:00
Silen Naihin
8aa6452cc4 file naming when --test (#164) 2023-07-17 11:24:16 -04:00
Silen Naihin
dffc1dfd51 internal_info.json dynamic changes (#163) 2023-07-17 09:39:24 -04:00
Silen Naihin
ce4cefe7e7 Dynamic home path for runs (#119) 2023-07-16 18:24:06 -07:00
merwanehamadi
2704bcee5e Allow change location of reports (#115)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-16 07:26:36 -07:00
Silen Naihin
9f3a2d4f05 Dynamic cutoff and other quality of life (#101) 2023-07-15 22:10:20 -04:00
merwanehamadi
5886d75059 Add three sum challenge (#108)
Co-authored-by: Silen Naihin <silen.naihin@gmail.com>
2023-07-15 19:52:42 -04:00
Erik Peterson
cbd2e49d97 Clean up workspace between each test (#109) 2023-07-15 16:23:49 -07:00
merwanehamadi
7bc7d9213d Replace hidden files with custom python (#99)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-14 14:39:47 -07:00
merwanehamadi
a9702e4629 Add basic code generation challenge (#98) 2023-07-14 13:27:48 -04:00
merwanehamadi
78df4915cf Remove dependencies if a specific test is asked by the user (#95)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-07-12 14:35:12 -07:00
Silen Naihin
8d0c5179ed fixing backslashes, adding basic metrics (#89) 2023-07-12 01:37:59 -04:00
merwanehamadi
b3c506cd94 Fix Auto-GPT looping forever (#87) 2023-07-11 20:02:29 -04:00
merwanehamadi
4ecb70c5e3 Fix Auto-GPT integration by adding python module as entrypoint (#86)
Co-authored-by: Silen Naihin <silen.naihin@gmail.com>
2023-07-11 15:11:24 -04:00
merwanehamadi
0799be7e28 Fix tests ci (#82) 2023-07-10 21:54:25 -07:00
Silen Naihin
8df82909b2 Added --test, consolidate files, reports working (#83) 2023-07-10 19:25:19 -07:00
merwanehamadi
437e066a66 Add "Simple web server" challenge (#74)
Co-authored-by: Silen Naihin <silen.naihin@gmail.com>
2023-07-10 20:46:03 -04:00
merwanehamadi
30ba51593f Add Helicone (#81) 2023-07-10 12:19:12 -04:00
Silen Naihin
b8830f8625 Adding search interface challenge and cleaning repo (#80) 2023-07-09 18:33:08 -07:00
Silen Naihin
3d43117554 Just json, no test files (#77) 2023-07-09 17:27:21 -07:00
merwanehamadi
573130549f Add gpt engineer to ci (#78) 2023-07-09 13:31:31 -07:00
merwanehamadi
d89264998d Fix debug code challenge (#76)
Co-authored-by: Silen Naihin <silen.naihin@gmail.com>
2023-07-08 21:46:37 -04:00
Silen Naihin
69bd41f741 Quality of life improvements & fixes (#75) 2023-07-08 18:43:38 -07:00
Silen Naihin
e56b112aab i/o workspace, adding superagi (#60) 2023-07-08 03:27:31 -04:00