mirror of
https://github.com/aljazceru/Auto-GPT.git
synced 2026-01-04 14:54:32 +01:00
49 lines
1.8 KiB
Markdown
49 lines
1.8 KiB
Markdown
# Challenges Data Schema of Benchmark
|
|
|
|
## General challenges
|
|
|
|
Input:
|
|
|
|
- **name** (str): Name of the challenge.
|
|
- **category** (str[]): Category of the challenge such as 'basic', 'retrieval', 'comprehension', etc. _this is not currently used. for the future it may be needed_
|
|
- **task** (str): The task that the agent needs to solve.
|
|
- **dependencies** (str[]): The dependencies that the challenge needs to run. Needs to be the full node to the test function.
|
|
- **ground** (dict): The ground truth.
|
|
- **answer** (str): The raw text of the ground truth answer.
|
|
- **should_contain** (list): The exact strings that are required in the final answer.
|
|
- **should_not_contain** (list): The exact strings that should not be in the final answer.
|
|
- **files** (list): Files that are used for retrieval. Can specify file here or an extension.
|
|
- **mock** (dict): Mock response for testing.
|
|
- **mock_func** (str): Function to mock the agent's response. This is used for testing purposes.
|
|
- **mock_task** (str): Task to provide for the mock function.
|
|
- **info** (dict): Additional info about the challenge.
|
|
- **difficulty** (str): The difficulty of this query.
|
|
- **description** (str): Description of the challenge.
|
|
- **side_effects** (str[]): Describes the effects of the challenge.
|
|
|
|
Example:
|
|
|
|
```python
|
|
{
|
|
"category": ["basic"],
|
|
"task": "Print the the capital of America to a .txt file",
|
|
"dependencies": ["TestWriteFile"], # the class name of the test
|
|
"ground": {
|
|
"answer": "Washington",
|
|
"should_contain": ["Washington"],
|
|
"should_not_contain": ["New York", "Los Angeles", "San Francisco"],
|
|
"files": [".txt"],
|
|
"type": "file"
|
|
},
|
|
"info": {
|
|
"difficulty": "basic",
|
|
"description": "Tests the writing to file",
|
|
"side_effects": ["tests if there is in fact an LLM attached"]
|
|
}
|
|
}
|
|
```
|
|
|
|
Current Output:
|
|
|
|
- **score** (float): scores range from [0, 1]
|