- Introduced a new `SkillTreeType` enum to represent different skill tree categories: General, Coding, Data, and Scrape/Synthesize.
- Extended the `SkillTreeType` enum to provide associated string values and JSON file names for each category.
- Refactored the `SkillTreeViewModel` to reload the skill tree data based on the selected category.
- Enhanced `SkillTreeView` by adding a positioned dropdown in the top-left corner to allow users to select and load different skill tree categories dynamically.
This commit introduces the `BenchmarkRun` class, designed to model a complete benchmark run. The class encapsulates all data and sub-models related to a benchmark, providing a centralized object to handle various aspects of a benchmark run.
The `BenchmarkRun` class includes the following sub-models:
- `RepositoryInfo`: Information about the repository and team.
- `RunDetails`: Specific details like the run identifier, command, and timings.
- `TaskInfo`: Information about the task being benchmarked.
- `Metrics`: Performance metrics for the benchmark run.
- `Config`: Configuration settings for the benchmark run.
A `reachedCutoff` field is also included to indicate whether a certain cutoff was reached during the benchmark run.
Methods for serializing and deserializing the object to and from JSON are also provided.
This commit introduces the RepositoryInfo class, designed to encapsulate details about the repository and team associated with a benchmark run.
The class includes the following fields:
- repoUrl: The URL of the repository where the benchmark code resides.
- teamName: The name of the team responsible for the benchmark.
- benchmarkGitCommitSha: The Git commit SHA for the benchmark code.
- agentGitCommitSha: The Git commit SHA for the agent code.
The class supports JSON serialization and deserialization, making it easy to use with Flutter's JSON handling mechanisms.
Added a new Dart class called `RunDetails` to represent specific details related to a benchmark run.
The class includes fields for:
- The unique run identifier (`runId`)
- The command used to initiate the benchmark (`command`)
- The time the benchmark was completed (`completionTime`)
- The time the benchmark started (`benchmarkStartTime`)
- The name of the test being run (`testName`)
Serialization and deserialization methods are also provided for JSON compatibility.
Added a new TaskInfo class to encapsulate information related to a specific benchmark task.
- The TaskInfo class holds attributes like the data file path, regression status, task categories, task details, expected answer, and description.
- Included methods for JSON serialization and deserialization.
- Added comprehensive documentation to describe the purpose, properties, and methods of the TaskInfo class.
Added a new Metrics class to represent key performance metrics of a benchmark test run.
- The Metrics class encapsulates various data points like difficulty, success rate, attempted status, success percentage, cost, and runtime.
- Included serialization and deserialization methods for converting between Metrics objects and JSON.
- Added comprehensive documentation to describe the purpose, properties, and methods of the Metrics class.
This commit introduces a new `Config` class, designed to manage and store configuration settings related to the benchmark run. The class contains two key fields:
1. `agentBenchmarkConfigPath`: The path to the agent's benchmark configuration file.
2. `host`: The address of the host where the benchmark is running.
The class includes methods for serialization and deserialization, allowing easy conversion between `Config` objects and JSON maps.
Documentation comments have also been added for better code readability and understanding.
* New benchmark data models
* Update _benchmarkBaseUrl
* Remove ReportRequestBody
* Update benchmark service methods for proxy approach
* Add eval id to SkillNodeData
* Refactor runBenchmark Method for proxy approach
This commit introduces a new class, TestSuite, designed to encapsulate a collection of Task objects under a common timestamp. This will help in grouping tasks that belong to a particular test suite.
Key Features:
- Add a TestSuite class with fields for `timestamp` and a list of `tests` (Task objects).
- Implement `toJson` method for serializing TestSuite objects to JSON-compatible format.
- Implement `fromJson` factory method for deserializing JSON data back into a TestSuite object.
By providing serialization and deserialization support directly in the model, we facilitate easier storage and data exchange for test suites.
This commit adds serialization support to the Task model by including a `toJson` method. This will allow easy conversion of Task objects to a JSON-compatible format, facilitating storage or network transmission.
This commit adds a new boolean field, "mock", to the `ReportRequestBody` class. This additional field is in line with the new requirements to specify whether the report is a mock or not.
The `toJson()` method is also updated to include this new field during serialization.
This commit adds a new enum, `ApiType`, to allow dynamic selection between different base URLs for API calls. The enum has two values: `agent` and `benchmark`, corresponding to different services.
The `ApiType` enum is designed to be passed as a parameter to the `RestApiUtility` methods, enabling the utility to decide which base URL to use for HTTP requests.
This commit introduces a new Dart class, `ReportRequestBody`, which represents the request body for generating reports in the `BenchmarkService`. The class includes a `toJson` method for easy serialization to JSON format.
- `category`: Specifies the category of the report (e.g., "coding").
- `tests`: A list of tests to be included in the report.
This commit extends the SkillTreeNode class to incorporate new attributes such as 'data', 'label', and 'shape', making the model more comprehensive. The JSON deserialization is also updated to handle optional or missing fields by providing default values, improving the robustness of the model.
This commit updates the SkillNodeData class to handle optional or missing JSON fields more robustly. Now, the model provides default values for each field, ensuring that the object can be instantiated successfully even if some JSON fields are missing or set to null.
This commit updates the Info class to provide default values for optional or missing fields in the JSON payload. This ensures that the model can be successfully instantiated even when some JSON fields are absent or set to null.
This commit modifies the Ground class to make it more robust against optional or missing fields in the incoming JSON data. Default values have been added to ensure that the model can be instantiated even if some JSON fields are missing or set to null.
The SkillTreeEdge model represents the relationship between different skill nodes.
It includes:
- Edge ID
- Source node ID
- Destination node ID
- Arrows property to indicate directionality
The SkillNodeData model aggregates various data related to a skill node.
It includes:
- Node name
- Node category
- Associated task
- Dependencies
- Cutoff value
- Ground object for evaluation details
- Info object for metadata
The Info data model holds metadata about a skill node.
It includes:
- The difficulty level of the skill node
- A description of the skill node
- A list of potential side effects related to the skill node
The Ground data model stores evaluation information for each skill node.
It includes:
- The answer to be evaluated
- A list of terms that should be contained in the answer
- A list of terms that should not be contained in the answer
- A list of associated files
- A map for additional evaluation criteria
Updated the StepRequestBody class to allow both 'input' and 'additionalInput' to be optional. Added logic in toJson() method to return an empty JSON object if both fields are null.