Commit Graph

4198 Commits

Author SHA1 Message Date
merwanehamadi
ff4c76ba00 Make agbenchmark a proxy of the evaluated agent (#5279)
Make agbenchmark a Proxy of the evaluated agent

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-20 16:06:00 -07:00
hunteraraujo
1a471b73cd Fix _leaderboardBaseUrl 2023-09-20 14:44:29 -07:00
hunteraraujo
7901c750b6 Extend RestApiUtility to Support Leaderboard Base URL
This commit extends the `RestApiUtility` class to include support for a new leaderboard base URL. A new `ApiType` enum value `ApiType.leaderboard` has been added, and the `_getEffectiveBaseUrl` method has been updated to handle this new type. The leaderboard base URL is "https://leaderboard.vercel.app/".
2023-09-20 14:43:45 -07:00
hunteraraujo
a0512254ca Add LeaderboardService with submitReport Method
This commit adds a new `LeaderboardService` class featuring a `submitReport` method. This method allows for the submission of `BenchmarkRun` objects to the leaderboard via a POST request to the `/api/reports` endpoint. The new service uses the `ApiType.leaderboard` enum value.
2023-09-20 14:38:48 -07:00
hunteraraujo
fe96664afb Update ApiType Enum to Include Leaderboard 2023-09-20 14:31:27 -07:00
SwiftyOS
d4222519eb Added instructions about cloning and changing dir 2023-09-20 22:50:39 +02:00
SwiftyOS
88f0b04015 fixed grammer 2023-09-20 22:50:39 +02:00
hunteraraujo
cfc6180233 Add BenchmarkRun Class to Model Complete Benchmark Runs
This commit introduces the `BenchmarkRun` class, designed to model a complete benchmark run. The class encapsulates all data and sub-models related to a benchmark, providing a centralized object to handle various aspects of a benchmark run.

The `BenchmarkRun` class includes the following sub-models:
- `RepositoryInfo`: Information about the repository and team.
- `RunDetails`: Specific details like the run identifier, command, and timings.
- `TaskInfo`: Information about the task being benchmarked.
- `Metrics`: Performance metrics for the benchmark run.
- `Config`: Configuration settings for the benchmark run.

A `reachedCutoff` field is also included to indicate whether a certain cutoff was reached during the benchmark run.

Methods for serializing and deserializing the object to and from JSON are also provided.
2023-09-20 13:24:19 -07:00
hunteraraujo
311f69b7cf Add RepositoryInfo Class for Benchmark Repository and Team Details
This commit introduces the RepositoryInfo class, designed to encapsulate details about the repository and team associated with a benchmark run.

The class includes the following fields:
- repoUrl: The URL of the repository where the benchmark code resides.
- teamName: The name of the team responsible for the benchmark.
- benchmarkGitCommitSha: The Git commit SHA for the benchmark code.
- agentGitCommitSha: The Git commit SHA for the agent code.

The class supports JSON serialization and deserialization, making it easy to use with Flutter's JSON handling mechanisms.
2023-09-20 13:17:46 -07:00
hunteraraujo
fc193568b9 Add RunDetails class for encapsulating benchmark run information
Added a new Dart class called `RunDetails` to represent specific details related to a benchmark run.

The class includes fields for:
- The unique run identifier (`runId`)
- The command used to initiate the benchmark (`command`)
- The time the benchmark was completed (`completionTime`)
- The time the benchmark started (`benchmarkStartTime`)
- The name of the test being run (`testName`)

Serialization and deserialization methods are also provided for JSON compatibility.
2023-09-20 13:12:03 -07:00
hunteraraujo
afe77bbc4f Add TaskInfo class with serialization and documentation
Added a new TaskInfo class to encapsulate information related to a specific benchmark task.

- The TaskInfo class holds attributes like the data file path, regression status, task categories, task details, expected answer, and description.
- Included methods for JSON serialization and deserialization.
- Added comprehensive documentation to describe the purpose, properties, and methods of the TaskInfo class.
2023-09-20 13:07:54 -07:00
hunteraraujo
50ef7b31eb Add Metrics class with serialization and documentation
Added a new Metrics class to represent key performance metrics of a benchmark test run.

- The Metrics class encapsulates various data points like difficulty, success rate, attempted status, success percentage, cost, and runtime.
- Included serialization and deserialization methods for converting between Metrics objects and JSON.
- Added comprehensive documentation to describe the purpose, properties, and methods of the Metrics class.
2023-09-20 13:04:47 -07:00
hunteraraujo
39f8ae515b Add Config Class for Benchmark Configuration Management
This commit introduces a new `Config` class, designed to manage and store configuration settings related to the benchmark run. The class contains two key fields:

1. `agentBenchmarkConfigPath`: The path to the agent's benchmark configuration file.
2. `host`: The address of the host where the benchmark is running.

The class includes methods for serialization and deserialization, allowing easy conversion between `Config` objects and JSON maps.

Documentation comments have also been added for better code readability and understanding.
2023-09-20 13:00:22 -07:00
SwiftyOS
c72a35e92e Added blueprint of an agent tutorial 2023-09-20 17:29:14 +02:00
SwiftyOS
7e65df3f39 Changed repos stats to run daily 2023-09-20 16:46:03 +02:00
SwiftyOS
4d629960bb renamed skills -> abilities 2023-09-20 16:45:47 +02:00
SwiftyOS
9c4617eefa Added the getting started tutorial 2023-09-20 16:45:32 +02:00
SwiftyOS
b952d0d2e0 Updated server endpoint message 2023-09-20 16:24:18 +02:00
SwiftyOS
55bcb99e91 Edited the cron to run every 90mins 2023-09-20 13:23:35 +02:00
SwiftyOS
6dcee70eab Added repo stats 2023-09-20 12:53:29 +02:00
SwiftyOS
8fdccfa05a Added outline for memory tutorial 2023-09-20 12:41:04 +02:00
SwiftyOS
4f002d66be Added outline for skills tutorial 2023-09-20 12:40:52 +02:00
SwiftyOS
93be3f54e3 Adding outline of the planning tutorial 2023-09-20 11:48:42 +02:00
SwiftyOS
309a6af359 Added outline of benchmarking tutorial 2023-09-20 11:48:05 +02:00
SwiftyOS
585ba1a1fd Add outline of agent overview tutorial 2023-09-20 11:47:36 +02:00
SwiftyOS
c707cec362 Added outline of tutorial 1 2023-09-20 11:46:55 +02:00
SwiftyOS
edcd103958 Added llm functions 2023-09-20 09:57:10 +02:00
Swifty
8897e47691 Update frontend build (#5270)
Co-authored-by: GitHub Action <action@github.com>
2023-09-20 09:53:00 +02:00
hunteraraujo
377d0af228 Refactor SkillTreeViewModel and Update TaskQueueView UI for Task Status (#5269)
* Refactor SkillTreeViewModel and Update TaskQueueView UI for Task Status

* Notify UI when updating benchmark status
2023-09-19 23:30:22 -07:00
hunteraraujo
99035103e0 Rename benchmark_service directory to benchmark 2023-09-19 22:16:58 -07:00
hunteraraujo
525571c32e Enhance runBenchmark with TestSuite Tracking (#5268) 2023-09-19 21:31:02 -07:00
hunteraraujo
80682b41cb Add Early Termination to runBenchmark on Benchmark Failure (#5267) 2023-09-19 20:24:52 -07:00
hunteraraujo
a37b486227 Enhance SkillTreeViewModel to Manage Benchmark Status (#5266)
Enhance SkillTreeViewModel to Manage Benchmark Execution and Status
2023-09-19 20:20:31 -07:00
hunteraraujo
f130aa7972 Correct triggerEvaluation endpoint 2023-09-19 17:19:59 -07:00
hunteraraujo
5afab461ee Refactor Benchmarking Workflow and Introduce New Data Models (#5264)
* New benchmark data models

* Update _benchmarkBaseUrl

* Remove ReportRequestBody

* Update benchmark service methods for proxy approach

* Add eval id to SkillNodeData

* Refactor runBenchmark Method for proxy approach
2023-09-19 17:01:15 -07:00
SwiftyOS
2098e192da Removed additional refs to frontend 2023-09-19 15:09:51 +02:00
SwiftyOS
cc7476656f removed frontend command from the cli 2023-09-19 15:08:26 +02:00
SwiftyOS
fa265fdf25 Updated quickstart 2023-09-19 15:02:06 +02:00
SwiftyOS
08db74b8ee Updated the forge readme 2023-09-19 14:53:53 +02:00
SwiftyOS
aa1a65c59c Updated forge to server the frontend again 2023-09-19 13:24:06 +02:00
Swifty
ccd0eb800b Update frontend build (#5258)
Co-authored-by: GitHub Action <action@github.com>
2023-09-19 13:06:20 +02:00
SwiftyOS
360ce60b83 commened out create PR bit 2023-09-19 13:04:57 +02:00
SwiftyOS
172d256e15 Switched pull request step 2023-09-19 12:57:49 +02:00
SwiftyOS
2c187b66b7 More messing with the action 2023-09-19 12:50:44 +02:00
SwiftyOS
9a94ce31d8 Testing PR creation 2023-09-19 12:44:21 +02:00
SwiftyOS
c7f4bd265d Changed to push to a branch and make a pr 2023-09-19 12:35:04 +02:00
SwiftyOS
de4839b050 Testing build action 2023-09-19 12:11:32 +02:00
SwiftyOS
50842af1e5 Made the action only trigger if the frontend is modified 2023-09-19 12:10:39 +02:00
SwiftyOS
833a37e9a6 Added action to build and commit the frontend 2023-09-19 12:02:50 +02:00
hunteraraujo
bf03dd8739 Refactor runBenchmark in SkillTreeViewModel for New Report Generation Flow
This commit updates the runBenchmark method in the SkillTreeViewModel class to align with the new report generation flow. The updated method does the following:

1. Checks if a benchmark is already running to prevent overlapping runs.
2. Sets a flag to indicate that the benchmark is running and notifies the UI.
3. Reverses the selected node hierarchy for report generation.
4. Loops through each node in the reversed hierarchy to:
  - Generate a unique UUID for each test run.
  - Create a ReportRequestBody object.
  - Call the generateSingleReport method in the BenchmarkService.
  - Update the UI after each single report is generated.

5. After all single reports are generated, it calls the generateCombinedReport method in the BenchmarkService, passing in all the generated UUIDs.

6. Finally, it sets the benchmark running flag to false and notifies the UI.

This change improves the report generation flow and allows for both individual and combined reports.
2023-09-18 19:55:01 -07:00