Files

Challenge definitions live under Auto-GPT/agbenchmark/challenges.

Benchmark Challenge Data Schema

General challenges

Input:

  • category (str): the category of the challenge, e.g. information-retrieval
  • difficulty (str): the difficulty of this query. Choices: TODO

Information-retrieval challenges

Input:

  • category (str): information-retrieval
  • task (str): the question the agent needs to solve.
  • ground (dict): the ground truth (see the schema sketch after this list).
    • answer (str): the raw text of the ground-truth answer
    • should_contain (list): the exact strings that are required in the final answer
    • should_not_contain (list): the exact strings that must not appear in the final answer
    • files (list): the files used for retrieval. A specific file or an extension can be given here (TODO: e.g. .txt)
  • difficulty (str): the difficulty of this query. Choices: TODO
  • mock_func: a function that mocks the agent's response, used for testing purposes
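
For reference, the input above can be captured as a small typed structure. The following is a minimal sketch, assuming plain Python dataclasses; Ground and Challenge are hypothetical names introduced here for illustration, and the benchmark's actual types may differ.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Ground:
    answer: str                    # raw text of the ground-truth answer
    should_contain: List[str]      # exact strings required in the final answer
    should_not_contain: List[str]  # exact strings that must not appear
    files: List[str]               # file names, or an extension such as ".txt"

@dataclass
class Challenge:
    category: str                  # e.g. "retrieval"
    task: str                      # the question the agent needs to solve
    ground: Ground
    difficulty: str                # e.g. "easy"
    mock_func: Optional[Callable[[], str]] = None  # mocks the agent's response in tests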

Example:

{
  "category": "retrieval",
  "task": "What is the capital of America?",
  "ground": {
    "answer": "Washington",
    "should_contain": ["Washington"],
    "should_not_contain": ["New York", "Los Angeles", "San Francisco"],
    "files": ["file_to_check.txt"]
  },
  "difficulty": "easy"
}

Output:

  • score (float): a score in the range [0, 1]
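
How the score is computed from the ground truth is not specified above. The sketch below shows one plausible check, under the assumption that the score is the fraction of should_contain strings found in the answer, forced to 0.0 if any should_not_contain string appears; grade_answer is a hypothetical name, not the benchmark's actual API.

from typing import Dict, List

def grade_answer(answer: str, ground: Dict) -> float:
    # Assumption: the score is the fraction of required strings present,
    # and any forbidden string zeroes it out. The real logic may differ.
    should: List[str] = ground.get("should_contain", [])
    should_not: List[str] = ground.get("should_not_contain", [])

    if any(s in answer for s in should_not):
        return 0.0
    if not should:
        return 1.0
    return sum(1 for s in should if s in answer) / len(should)

# Example, reusing the ground truth from the JSON above:
ground = {
    "answer": "Washington",
    "should_contain": ["Washington"],
    "should_not_contain": ["New York", "Los Angeles", "San Francisco"],
}
print(grade_answer("The capital of America is Washington.", ground))  # 1.0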