From 1eb278f3cc36ad5087f3ec30ea8c4e6fc8efca3a Mon Sep 17 00:00:00 2001
From: Silen Naihin
Date: Mon, 19 Jun 2023 09:53:30 -0400
Subject: [PATCH] Update README.md

---
 README.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 820c0f51..02f792b7 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,14 @@
 A repo built for the purpose of benchmarking the performance of agents far and wide, regardless of how they are set up and how they work
 
-#### MVP: function calls api, api returns presigned url, folder is uploaded, write file challenge is measured, score is given
+Simple boilerplate code that spins up a webserver to plug their agent into. We run multiple tasks by invoking different pytest commands on folders, ending a run once the agent stops or reaches 50 loops (a limit the agent will have to define). We handle the deletion of files after a run loop ends, then call the POST request for the next task. Once all tests have run, we produce a combined benchmark.
 
-#### Diagrams: https://whimsical.com/agbenchmark-5n4hXBq1ZGzBwRsK4TVY7x
+- Agents add tests by contributing them to our repo
+- Agent is abstracted from the benchmark
+- Scalable (parallel servers running tests)
+- Better standardization
+
+##### Diagrams (out of date, cloud oriented): https://whimsical.com/agbenchmark-5n4hXBq1ZGzBwRsK4TVY7x
 
 ## Contributing
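
For illustration only, here is a minimal sketch (not part of the patch) of the flow the new README paragraph describes: POST the next task to the agent's webserver, score the run with pytest on that task's folder, clean up files after the run loop, and combine the results at the end. The endpoint URL, workspace path, and function name are assumptions for this example, not the repo's actual API.

```python
import shutil
import subprocess
from pathlib import Path

import requests  # assumed HTTP client for talking to the agent's webserver

AGENT_URL = "http://localhost:8000/agent/tasks"  # hypothetical endpoint the agent exposes
WORKSPACE = Path("workspace")                    # hypothetical folder the agent writes into
MAX_LOOPS = 50                                   # loop cap the agent is expected to enforce


def run_benchmark(task_dirs: list[Path]) -> None:
    results = []
    for task_dir in task_dirs:
        # Hand the next task to the agent's webserver via a POST request.
        requests.post(AGENT_URL, json={"task": task_dir.name, "max_loops": MAX_LOOPS})

        # Score the run by invoking pytest on that task's folder.
        outcome = subprocess.run(["pytest", str(task_dir)], capture_output=True, text=True)
        results.append({"task": task_dir.name, "passed": outcome.returncode == 0})

        # Delete files the agent produced before the next run loop starts.
        if WORKSPACE.exists():
            shutil.rmtree(WORKSPACE)
        WORKSPACE.mkdir(exist_ok=True)

    # Combined benchmark once all tests have run.
    passed = sum(r["passed"] for r in results)
    print(f"{passed}/{len(results)} tasks passed")
```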