AGBenchmark codebase clean-up (#6650)
* refactor(benchmark): Deduplicate configuration loading logic
- Move the configuration loading logic to a dedicated `load_agbenchmark_config` function in the `agbenchmark/config.py` module (see the sketch below).
- Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to the `load_agbenchmark_config` function.
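A minimal sketch of what such a shared loader might look like (only the function name and module are stated above; the config path and the import location of `AgentBenchmarkConfig` at this point in the history are assumptions):

```python
# Hypothetical sketch of the deduplicated loader; paths and field handling are illustrative.
import json
from pathlib import Path

from agbenchmark.utils.data_types import AgentBenchmarkConfig  # moved to config.py in a later commit


def load_agbenchmark_config() -> AgentBenchmarkConfig:
    """Load the benchmark config from agbenchmark_config/config.json in the working directory."""
    config_path = Path.cwd() / "agbenchmark_config" / "config.json"
    with open(config_path) as f:
        return AgentBenchmarkConfig(**json.load(f))
```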
* fix(benchmark): Fix type errors, linting errors, and clean up CLI validation in __main__.py
- Fixed type errors and linting errors in `__main__.py`
- Improved the readability of CLI argument validation by introducing a separate function for it
* refactor(benchmark): Lint and typefix app.py
- Rearranged and cleaned up import statements
- Fixed type errors caused by improper use of `psutil` objects
- Simplified a number of `os.path` usages by converting to `pathlib`
- Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema`
* refactor(benchmark): Replace `.agent_protocol_client` with `agent-protocol-client`, clean up schema.py
- Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`).
- Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`.
- Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`).
- Remove all unused types from schema.py (= most of them).
* refactor(benchmark): Use pathlib in agent_interface.py and agent_api_interface.py
* refactor(benchmark): Improve typing, response validation, and readability in app.py
- Simplified response generation by leveraging type checking and conversion by FastAPI.
- Introduced use of `HTTPException` for error responses.
- Improved naming, formatting, and typing in `app.py::create_evaluation`.
- Updated the docstring on `app.py::create_agent_task`.
- Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py.
- Added default values to optional attributes on models in report_types_v2.py.
- Removed unused imports in `generate_test.py`
* refactor(benchmark): Clean up logging and print statements
- Introduced use of the `logging` library for unified logging and better readability.
- Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`.
- Improved descriptiveness of log statements.
- Removed unnecessary print statements.
- Added log statements to broad, previously silent `except` blocks.
- Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format.
- Added a `.utils.logging` module with a `configure_logging` function to make configuring the logging library easy (sketched below).
- Converted raw escape sequences in `.utils.challenge` to use `colorama`.
- Renamed `generate_test.py::generate_tests` to `load_challenges`.
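A rough sketch of the `configure_logging` helper described above (the format strings are assumptions; per the diff further down, the CLI calls it as `configure_logging(logging.DEBUG if debug else logging.INFO)`):

```python
# Illustrative sketch only; the real agbenchmark.utils.logging module may differ.
import logging


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger; DEBUG enables a more comprehensive log format."""
    log_format = (
        "%(asctime)s %(levelname)s %(name)s:%(lineno)d %(message)s"  # verbose, for --debug
        if level == logging.DEBUG
        else "%(levelname)s %(message)s"
    )
    logging.basicConfig(level=level, format=log_format)
```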
* refactor(benchmark): Remove unused server.py and agent_interface.py::run_agent
- Remove unused server.py file
- Remove unused run_agent function from agent_interface.py
* refactor(benchmark): Clean up conftest.py
- Fix and add type annotations
- Rewrite docstrings
- Disable or remove unused code
- Fix definition of arguments and their types in `pytest_addoption`
* refactor(benchmark): Clean up generate_test.py file
- Refactored the `create_single_test` function for clarity and readability
- Removed unused variables
- Made creation of `Challenge` subclasses more straightforward
- Made bare `except` more specific
- Renamed `Challenge.setup_challenge` method to `run_challenge`
- Updated type hints and annotations
- Made minor code/readability improvements in `load_challenges`
- Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module
* fix(benchmark): Fix and add type annotations in execute_sub_process.py
* refactor(benchmark): Simplify const determination in agent_interface.py
- Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS`
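For reference, the before/after of this simplification as it appears in the diff below:

```python
import os

# Before: intermediate variable plus a conditional expression
helicone_graphql_logs = os.getenv("HELICONE_GRAPHQL_LOGS")
HELICONE_GRAPHQL_LOGS = (
    helicone_graphql_logs.lower() == "true" if helicone_graphql_logs else False
)

# After: a single expression with a default value
HELICONE_GRAPHQL_LOGS = os.getenv("HELICONE_GRAPHQL_LOGS", "").lower() == "true"
```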
* fix(benchmark): Register category markers to prevent warnings
- Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime.
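A minimal sketch of how such a hook can register markers (the category names shown are hypothetical; the real hook derives them from the challenge specs):

```python
# Sketch of marker registration in conftest.py; category names are placeholders.
import pytest


def pytest_configure(config: pytest.Config) -> None:
    for category in ("code", "retrieval", "data"):  # hypothetical examples
        config.addinivalue_line("markers", f"{category}: challenge category marker")
```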
* refactor(benchmark/challenges): Fix indentation in 4_revenue_retrieval_2/data.json
* refactor(benchmark): Update agent_api_interface.py
- Add type annotations to `copy_agent_artifacts_into_temp_folder` function
- Add note about broken endpoint in the `agent_protocol_client` library
- Remove unused variable in `run_api_agent` function
- Improve readability and resolve linting error
* feat(benchmark): Improve and centralize pathfinding
- Search the path hierarchy for an applicable `agbenchmark_config`, rather than assuming it is in the current folder (see the sketch below).
- Create `agbenchmark.utils.path_manager` containing `AGBenchmarkPathManager` and exporting a `PATH_MANAGER` constant.
- Replace path constants defined in __main__.py with usages of `PATH_MANAGER`.
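A minimal sketch of the hierarchy search, assuming a helper along these lines (names other than `agbenchmark_config` are made up):

```python
from pathlib import Path


def find_agbenchmark_config_folder(start: Path | None = None) -> Path:
    """Walk up from the starting directory until an agbenchmark_config folder is found."""
    start = start or Path.cwd()
    for directory in (start, *start.parents):
        candidate = directory / "agbenchmark_config"
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError("No 'agbenchmark_config' folder found in the path hierarchy")
```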
* feat(benchmark/cli): Clean up and improve CLI
- Updated commands, options, and their descriptions to be more intuitive and consistent
- Moved slow imports into the entrypoints that use them to speed up application startup
- Fixed type hints to match output types of Click options
- Hid deprecated `agbenchmark start` command
- Refactored code to improve readability and maintainability
- Moved main entrypoint into `run` subcommand
- Fixed `version` and `serve` subcommands
- Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility)
- Renamed `--no_dep` to `--no-dep` for consistency
- Fixed string formatting issues in log statements
* refactor(benchmark/config): Move AgentBenchmarkConfig and related functions to config.py
- Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`.
- Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`.
- Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`.
- Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties.
- Update all code references according to the changes mentioned above.
* refactor(benchmark): Fix ReportManager init parameter types and use pathlib
- Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, which was mistyped as `str` instead of `datetime`.
- Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`.
- Rename `self.filename` to `self.report_file` in `ReportManager`.
- Change the way the report file is created, opened and saved to use the `Path` object.
* refactor(benchmark): Improve typing surrounding ChallengeData and clean up its implementation
- Use `ChallengeData` objects instead of untyped `dict` in app.py, generate_test.py, reports.py.
- Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from `ChallengeData` class.
- Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from the `ChallengeData` class.
- Update function signatures and annotations of `create_challenge` and `generate_single_test` functions in generate_test.py.
- Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py.
- Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py).
* refactor(benchmark): Clean up generate_test.py, conftest.py and __main__.py
- Cleaned up generate_test.py and conftest.py
- Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method.
- Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py.
- Converted methods in the `Challenge` class to class methods where appropriate.
- Improved argument handling in the `run_benchmark` function in `__main__.py`.
* refactor(benchmark/config): Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state
- Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management.
- Remove the `.path_manager` module containing `AGBenchmarkPathManager`.
- Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity.
* feat(benchmark/serve): Configurable port for `serve` subcommand
- Added `--port` option to `serve` subcommand to allow for specifying the port to run the API on.
- If no `--port` option is provided, the port will default to the value specified in the `PORT` environment variable, or 8080 if not set.
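The resulting precedence (CLI option, then `PORT`, then 8080) boils down to the snippet below, assembled from the diff further down and shown without its surrounding imports; the decorator placement is inferred:

```python
@cli.command()
@click.option("--port", type=int, help="Port to run the API on.")
def serve(port: Optional[int] = None) -> None:
    """Serve the benchmark frontend and API."""
    import uvicorn

    from agbenchmark.app import setup_fastapi_app

    config = AgentBenchmarkConfig.load()
    app = setup_fastapi_app(config)

    # CLI option takes precedence, then the PORT env var, then 8080
    port = port or int(os.getenv("PORT", 8080))
    uvicorn.run(app, host="0.0.0.0", port=port)
```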
* feat(benchmark/cli): Add `config` subcommand
- Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config.
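The new subcommand (as it appears in the diff below) simply loads the config and prints an aligned key/value listing:

```python
@cli.command()
def config():
    """Displays info regarding the present AGBenchmark config."""
    try:
        config = AgentBenchmarkConfig.load()
    except FileNotFoundError as e:
        click.echo(e, err=True)
        return 1

    k_col_width = max(len(k) for k in config.dict().keys())
    for k, v in config.dict().items():
        click.echo(f"{k: <{k_col_width}} = {v}")
```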
* fix(benchmark): Gracefully handle incompatible challenge spec files in app.py
- Added a check to skip deprecated challenges
- Added logging to allow debugging of the loading process
- Added handling of validation errors when parsing challenge spec files
- Added missing `spec_file` attribute to `ChallengeData`
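A hedged sketch of the tolerant loading described above, assuming `ChallengeData` is a Pydantic model (the helper name is made up):

```python
import logging
from pathlib import Path
from typing import Optional

from pydantic import ValidationError

from agbenchmark.utils.data_types import ChallengeData

logger = logging.getLogger(__name__)


def load_challenge_spec(spec_file: Path) -> Optional[ChallengeData]:
    """Parse a challenge spec file, skipping specs that no longer match the schema."""
    logger.debug(f"Loading challenge spec: {spec_file}")
    try:
        challenge = ChallengeData.parse_file(spec_file)
    except ValidationError as e:
        logger.warning(f"Skipping incompatible challenge spec {spec_file}: {e}")
        return None
    challenge.spec_file = spec_file  # the newly added attribute
    return challenge
```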
* refactor(benchmark): Move `run_benchmark` entrypoint to main.py, use it in `/reports` endpoint
- Move `run_benchmark` and `validate_args` from __main__.py to main.py
- Replace agbenchmark subprocess in `app.py:run_single_test` with `run_benchmark`
- Move `get_unique_categories` from __main__.py to challenges/__init__.py
- Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py
- Reduce operations on updates.json (including `initialize_updates_file`) outside of the API
* refactor(benchmark): Remove unused `/updates` endpoint and all related code
- Remove `updates_json_file` attribute from `AgentBenchmarkConfig`
- Remove `get_updates` and `_initialize_updates_file` in app.py
- Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py
- Remove call to `append_updates_file` in challenge.py
* refactor(benchmark/config): Clean up and update docstrings on `AgentBenchmarkConfig`
- Add and update docstrings
- Change the base class from `BaseModel` to `BaseSettings`, allowing extra fields for backwards compatibility (see the sketch below)
- Make naming of path attributes on `AgentBenchmarkConfig` more consistent
- Remove unused `agent_home_directory` attribute
- Remove unused `workspace` attribute
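A rough sketch of the resulting class shape, assuming Pydantic v1 `BaseSettings` (attribute and property names are illustrative except `host`, `agbenchmark_config_dir`, and `temp_folder`, which appear elsewhere in this changeset):

```python
from pathlib import Path

from pydantic import BaseSettings


class AgentBenchmarkConfig(BaseSettings):
    """AGBenchmark configuration, loaded from agbenchmark_config/config.json."""

    class Config:
        extra = "allow"  # tolerate unknown keys for backwards compatibility

    agbenchmark_config_dir: Path
    host: str

    @property
    def temp_folder(self) -> Path:
        # simple getters were converted to calculated properties like this one
        return self.agbenchmark_config_dir / "temp_folder"
```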
* fix(benchmark): Restore mechanism to select (optional) categories in agent benchmark config
* fix(benchmark): Update agent-protocol-client to v1.1.0
- Fixes issue with fetching task artifact listings
committed by: GitHub
parent: b8238c2228
commit: 25cc6ad6ae

.github/workflows/hackathon.yml (2 changes, vendored)
@@ -121,7 +121,7 @@ jobs:
           ./run agent start $AGENT_NAME
           cd ../benchmark
           poetry install
-          poetry run agbenchmark --no_dep
+          poetry run agbenchmark --no-dep
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           SERP_API_KEY: ${{ secrets.SERP_API_KEY }}

@@ -1,5 +1,4 @@
|
||||
import glob
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
from datetime import datetime, timezone
|
||||
@@ -7,205 +6,97 @@ from pathlib import Path
|
||||
from typing import Any, Optional
|
||||
|
||||
import click
|
||||
import pytest
|
||||
import toml
|
||||
from click_default_group import DefaultGroup
|
||||
from dotenv import load_dotenv
|
||||
from helicone.lock import HeliconeLockManager
|
||||
|
||||
from agbenchmark.app import app
|
||||
from agbenchmark.reports.ReportManager import SingletonReportManager
|
||||
from agbenchmark.utils.data_types import AgentBenchmarkConfig
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.utils.logging import configure_logging
|
||||
|
||||
load_dotenv()
|
||||
|
||||
try:
|
||||
if os.getenv("HELICONE_API_KEY"):
|
||||
import helicone # noqa
|
||||
|
||||
helicone_enabled = True
|
||||
else:
|
||||
helicone_enabled = False
|
||||
except ImportError:
|
||||
helicone_enabled = False
|
||||
|
||||
|
||||
class InvalidInvocationError(ValueError):
|
||||
pass
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
BENCHMARK_START_TIME_DT = datetime.now(timezone.utc)
|
||||
BENCHMARK_START_TIME = BENCHMARK_START_TIME_DT.strftime("%Y-%m-%dT%H:%M:%S+00:00")
|
||||
TEMP_FOLDER_ABS_PATH = Path.cwd() / "agbenchmark_config" / "temp_folder"
|
||||
CHALLENGES_ALREADY_BEATEN = (
|
||||
Path.cwd() / "agbenchmark_config" / "challenges_already_beaten.json"
|
||||
)
|
||||
UPDATES_JSON_PATH = Path.cwd() / "agbenchmark_config" / "updates.json"
|
||||
|
||||
|
||||
if os.environ.get("HELICONE_API_KEY"):
|
||||
if helicone_enabled:
|
||||
from helicone.lock import HeliconeLockManager
|
||||
|
||||
HeliconeLockManager.write_custom_property(
|
||||
"benchmark_start_time", BENCHMARK_START_TIME
|
||||
)
|
||||
|
||||
with open(
|
||||
Path(__file__).resolve().parent / "challenges" / "optional_categories.json"
|
||||
) as f:
|
||||
OPTIONAL_CATEGORIES = json.load(f)["optional_categories"]
|
||||
|
||||
@click.group(cls=DefaultGroup, default_if_no_args=True)
|
||||
@click.option("--debug", is_flag=True, help="Enable debug output")
|
||||
def cli(
|
||||
debug: bool,
|
||||
) -> Any:
|
||||
configure_logging(logging.DEBUG if debug else logging.INFO)
|
||||
|
||||
|
||||
def get_unique_categories() -> set[str]:
|
||||
"""Find all data.json files in the directory relative to this file and its subdirectories,
|
||||
read the "category" field from each file, and return a set of unique categories."""
|
||||
categories = set()
|
||||
|
||||
# Get the directory of this file
|
||||
this_dir = os.path.dirname(os.path.abspath(__file__))
|
||||
|
||||
glob_path = os.path.join(this_dir, "./challenges/**/data.json")
|
||||
# Use it as the base for the glob pattern
|
||||
for data_file in glob.glob(glob_path, recursive=True):
|
||||
with open(data_file, "r") as f:
|
||||
try:
|
||||
data = json.load(f)
|
||||
categories.update(data.get("category", []))
|
||||
except json.JSONDecodeError:
|
||||
print(f"Error: {data_file} is not a valid JSON file.")
|
||||
continue
|
||||
except IOError:
|
||||
print(f"IOError: file could not be read: {data_file}")
|
||||
continue
|
||||
|
||||
return categories
|
||||
@cli.command(hidden=True)
|
||||
def start():
|
||||
raise DeprecationWarning(
|
||||
"`agbenchmark start` is deprecated. Use `agbenchmark run` instead."
|
||||
)
|
||||
|
||||
|
||||
def run_benchmark(
|
||||
maintain: bool = False,
|
||||
improve: bool = False,
|
||||
explore: bool = False,
|
||||
mock: bool = False,
|
||||
no_dep: bool = False,
|
||||
nc: bool = False,
|
||||
keep_answers: bool = False,
|
||||
category: Optional[tuple[str]] = None,
|
||||
skip_category: Optional[tuple[str]] = None,
|
||||
test: Optional[str] = None,
|
||||
cutoff: Optional[int] = None,
|
||||
server: bool = False,
|
||||
) -> int:
|
||||
"""Start the benchmark tests. If a category flag is provided, run the categories with that mark."""
|
||||
# Check if configuration file exists and is not empty
|
||||
|
||||
initialize_updates_file()
|
||||
SingletonReportManager()
|
||||
agent_benchmark_config_path = str(Path.cwd() / "agbenchmark_config" / "config.json")
|
||||
try:
|
||||
with open(agent_benchmark_config_path, "r") as f:
|
||||
agent_benchmark_config = AgentBenchmarkConfig(**json.load(f))
|
||||
agent_benchmark_config.agent_benchmark_config_path = (
|
||||
agent_benchmark_config_path
|
||||
)
|
||||
except json.JSONDecodeError:
|
||||
print("Error: benchmark_config.json is not a valid JSON file.")
|
||||
return 1
|
||||
|
||||
if maintain and improve and explore:
|
||||
print(
|
||||
"Error: You can't use --maintain, --improve or --explore at the same time. Please choose one."
|
||||
)
|
||||
return 1
|
||||
|
||||
if test and (category or skip_category or maintain or improve or explore):
|
||||
print(
|
||||
"Error: If you're running a specific test make sure no other options are selected. Please just pass the --test."
|
||||
)
|
||||
return 1
|
||||
|
||||
assert agent_benchmark_config.host, "Error: host needs to be added to the config."
|
||||
|
||||
print("Current configuration:")
|
||||
for key, value in vars(agent_benchmark_config).items():
|
||||
print(f"{key}: {value}")
|
||||
|
||||
pytest_args = ["-vs"]
|
||||
if keep_answers:
|
||||
pytest_args.append("--keep-answers")
|
||||
|
||||
if test:
|
||||
print("Running specific test:", test)
|
||||
else:
|
||||
# Categories that are used in the challenges
|
||||
categories = get_unique_categories()
|
||||
if category:
|
||||
invalid_categories = set(category) - categories
|
||||
assert (
|
||||
not invalid_categories
|
||||
), f"Invalid categories: {invalid_categories}. Valid categories are: {categories}"
|
||||
|
||||
if category:
|
||||
categories_to_run = set(category)
|
||||
if skip_category:
|
||||
categories_to_run = categories_to_run.difference(set(skip_category))
|
||||
assert categories_to_run, "Error: You can't skip all categories"
|
||||
pytest_args.extend(["-m", " or ".join(categories_to_run), "--category"])
|
||||
print("Running tests of category:", categories_to_run)
|
||||
elif skip_category:
|
||||
categories_to_run = categories - set(skip_category)
|
||||
assert categories_to_run, "Error: You can't skip all categories"
|
||||
pytest_args.extend(["-m", " or ".join(categories_to_run), "--category"])
|
||||
print("Running tests of category:", categories_to_run)
|
||||
else:
|
||||
print("Running all categories")
|
||||
|
||||
if maintain:
|
||||
print("Running only regression tests")
|
||||
pytest_args.append("--maintain")
|
||||
elif improve:
|
||||
print("Running only non-regression tests")
|
||||
pytest_args.append("--improve")
|
||||
elif explore:
|
||||
print("Only attempt challenges that have never been beaten")
|
||||
pytest_args.append("--explore")
|
||||
|
||||
if mock:
|
||||
pytest_args.append("--mock")
|
||||
os.environ[
|
||||
"IS_MOCK"
|
||||
] = "True" # ugly hack to make the mock work when calling from API
|
||||
|
||||
if no_dep:
|
||||
pytest_args.append("--no_dep")
|
||||
|
||||
if nc and cutoff:
|
||||
print(
|
||||
"Error: You can't use both --nc and --cutoff at the same time. Please choose one."
|
||||
)
|
||||
return 1
|
||||
|
||||
if nc:
|
||||
pytest_args.append("--nc")
|
||||
if cutoff:
|
||||
pytest_args.append("--cutoff")
|
||||
print(f"Setting cuttoff override to {cutoff} seconds.")
|
||||
current_dir = Path(__file__).resolve().parent
|
||||
print(f"Current directory: {current_dir}")
|
||||
pytest_args.extend((str(current_dir), "--cache-clear"))
|
||||
exit_code = pytest.main(pytest_args)
|
||||
SingletonReportManager().clear_instance()
|
||||
|
||||
|
||||
@click.group(invoke_without_command=True)
|
||||
@click.option("--backend", is_flag=True, help="If it's being run from the cli")
|
||||
@click.option("-c", "--category", multiple=True, help="Specific category to run")
|
||||
@cli.command(default=True)
|
||||
@click.option(
|
||||
"-c",
|
||||
"--category",
|
||||
multiple=True,
|
||||
help="(+) Select a category to run.",
|
||||
)
|
||||
@click.option(
|
||||
"-s",
|
||||
"--skip-category",
|
||||
multiple=True,
|
||||
help="Skips preventing the tests from this category from running",
|
||||
help="(+) Exclude a category from running.",
|
||||
)
|
||||
@click.option("--test", multiple=True, help="Specific test to run")
|
||||
@click.option("--maintain", is_flag=True, help="Runs only regression tests")
|
||||
@click.option("--improve", is_flag=True, help="Run only non-regression tests")
|
||||
@click.option("--test", multiple=True, help="(+) Select a test to run.")
|
||||
@click.option("--maintain", is_flag=True, help="Run only regression tests.")
|
||||
@click.option("--improve", is_flag=True, help="Run only non-regression tests.")
|
||||
@click.option(
|
||||
"--explore",
|
||||
is_flag=True,
|
||||
help="Only attempt challenges that have never been beaten",
|
||||
help="Run only challenges that have never been beaten.",
|
||||
)
|
||||
@click.option("--mock", is_flag=True, help="Run with mock")
|
||||
@click.option(
|
||||
"--no_dep",
|
||||
"--no-dep",
|
||||
is_flag=True,
|
||||
help="Run without dependencies",
|
||||
help="Run all (selected) challenges, regardless of dependency success/failure.",
|
||||
)
|
||||
@click.option("--nc", is_flag=True, help="Run without cutoff")
|
||||
@click.option("--cutoff", type=int, help="Override the challenge time limit (seconds).")
|
||||
@click.option("--nc", is_flag=True, help="Disable the challenge time limit.")
|
||||
@click.option("--mock", is_flag=True, help="Run with mock")
|
||||
@click.option("--keep-answers", is_flag=True, help="Keep answers")
|
||||
@click.option("--cutoff", help="Set or override tests cutoff (seconds)")
|
||||
@click.argument("value", type=str, required=False)
|
||||
def cli(
|
||||
@click.option(
|
||||
"--backend",
|
||||
is_flag=True,
|
||||
help="Write log output to a file instead of the terminal.",
|
||||
)
|
||||
# @click.argument(
|
||||
# "agent_path", type=click.Path(exists=True, file_okay=False), required=False
|
||||
# )
|
||||
def run(
|
||||
maintain: bool,
|
||||
improve: bool,
|
||||
explore: bool,
|
||||
@@ -213,18 +104,37 @@ def cli(
|
||||
no_dep: bool,
|
||||
nc: bool,
|
||||
keep_answers: bool,
|
||||
category: Optional[list[str]] = None,
|
||||
skip_category: Optional[list[str]] = None,
|
||||
test: Optional[str] = None,
|
||||
test: tuple[str],
|
||||
category: tuple[str],
|
||||
skip_category: tuple[str],
|
||||
cutoff: Optional[int] = None,
|
||||
backend: Optional[bool] = False,
|
||||
value: Optional[str] = None,
|
||||
) -> Any:
|
||||
# Redirect stdout if backend is True
|
||||
if value == "start":
|
||||
raise ("`agbenchmark start` is removed. Run `agbenchmark` instead.")
|
||||
if value == "serve":
|
||||
return serve()
|
||||
# agent_path: Optional[Path] = None,
|
||||
) -> None:
|
||||
"""
|
||||
Run the benchmark on the agent in the current directory.
|
||||
|
||||
Options marked with (+) can be specified multiple times, to select multiple items.
|
||||
"""
|
||||
from agbenchmark.main import run_benchmark, validate_args
|
||||
|
||||
agbenchmark_config = AgentBenchmarkConfig.load()
|
||||
logger.debug(f"agbenchmark_config: {agbenchmark_config.agbenchmark_config_dir}")
|
||||
try:
|
||||
validate_args(
|
||||
maintain=maintain,
|
||||
improve=improve,
|
||||
explore=explore,
|
||||
tests=test,
|
||||
categories=category,
|
||||
skip_categories=skip_category,
|
||||
no_cutoff=nc,
|
||||
cutoff=cutoff,
|
||||
)
|
||||
except InvalidInvocationError as e:
|
||||
logger.error("Error: " + "\n".join(e.args))
|
||||
sys.exit(1)
|
||||
|
||||
original_stdout = sys.stdout # Save the original standard output
|
||||
exit_code = None
|
||||
|
||||
@@ -232,16 +142,17 @@ def cli(
|
||||
with open("backend/backend_stdout.txt", "w") as f:
|
||||
sys.stdout = f
|
||||
exit_code = run_benchmark(
|
||||
config=agbenchmark_config,
|
||||
maintain=maintain,
|
||||
improve=improve,
|
||||
explore=explore,
|
||||
mock=mock,
|
||||
no_dep=no_dep,
|
||||
nc=nc,
|
||||
no_cutoff=nc,
|
||||
keep_answers=keep_answers,
|
||||
category=category,
|
||||
skip_category=skip_category,
|
||||
test=test,
|
||||
tests=test,
|
||||
categories=category,
|
||||
skip_categories=skip_category,
|
||||
cutoff=cutoff,
|
||||
)
|
||||
|
||||
@@ -249,16 +160,17 @@ def cli(
|
||||
|
||||
else:
|
||||
exit_code = run_benchmark(
|
||||
config=agbenchmark_config,
|
||||
maintain=maintain,
|
||||
improve=improve,
|
||||
explore=explore,
|
||||
mock=mock,
|
||||
no_dep=no_dep,
|
||||
nc=nc,
|
||||
no_cutoff=nc,
|
||||
keep_answers=keep_answers,
|
||||
category=category,
|
||||
skip_category=skip_category,
|
||||
test=test,
|
||||
tests=test,
|
||||
categories=category,
|
||||
skip_categories=skip_category,
|
||||
cutoff=cutoff,
|
||||
)
|
||||
|
||||
@@ -266,33 +178,44 @@ def cli(
|
||||
|
||||
|
||||
@cli.command()
|
||||
def version():
|
||||
"""Print the version of the benchmark tool."""
|
||||
current_directory = Path(__file__).resolve().parent
|
||||
version = toml.load(current_directory / ".." / "pyproject.toml")["tool"]["poetry"][
|
||||
"version"
|
||||
]
|
||||
print(f"Benchmark Tool Version {version}")
|
||||
|
||||
|
||||
def serve():
|
||||
@click.option("--port", type=int, help="Port to run the API on.")
|
||||
def serve(port: Optional[int] = None):
|
||||
"""Serve the benchmark frontend and API on port 8080."""
|
||||
import uvicorn
|
||||
|
||||
from agbenchmark.app import setup_fastapi_app
|
||||
|
||||
config = AgentBenchmarkConfig.load()
|
||||
app = setup_fastapi_app(config)
|
||||
|
||||
# Run the FastAPI application using uvicorn
|
||||
uvicorn.run(app, host="0.0.0.0", port=8080)
|
||||
port = port or int(os.getenv("PORT", 8080))
|
||||
uvicorn.run(app, host="0.0.0.0", port=port)
|
||||
|
||||
|
||||
def initialize_updates_file():
|
||||
if os.path.exists(UPDATES_JSON_PATH):
|
||||
# If the file already exists, overwrite it with an empty list
|
||||
with open(UPDATES_JSON_PATH, "w") as file:
|
||||
json.dump([], file, indent=2)
|
||||
print("Initialized updates.json by overwriting with an empty array")
|
||||
else:
|
||||
# If the file doesn't exist, create it and write an empty list
|
||||
with open(UPDATES_JSON_PATH, "w") as file:
|
||||
json.dump([], file, indent=2)
|
||||
print("Created updates.json and initialized it with an empty array")
|
||||
@cli.command()
|
||||
def config():
|
||||
"""Displays info regarding the present AGBenchmark config."""
|
||||
try:
|
||||
config = AgentBenchmarkConfig.load()
|
||||
except FileNotFoundError as e:
|
||||
click.echo(e, err=True)
|
||||
return 1
|
||||
|
||||
k_col_width = max(len(k) for k in config.dict().keys())
|
||||
for k, v in config.dict().items():
|
||||
click.echo(f"{k: <{k_col_width}} = {v}")
|
||||
|
||||
|
||||
@cli.command()
|
||||
def version():
|
||||
"""Print version info for the AGBenchmark application."""
|
||||
import toml
|
||||
|
||||
package_root = Path(__file__).resolve().parent.parent
|
||||
pyproject = toml.load(package_root / "pyproject.toml")
|
||||
version = pyproject["tool"]["poetry"]["version"]
|
||||
click.echo(f"AGBenchmark version {version}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -1,30 +1,25 @@
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import pathlib
|
||||
import time
|
||||
from typing import Any, Dict, Optional
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from agent_protocol_client import AgentApi, ApiClient, Configuration, TaskRequestBody
|
||||
|
||||
from agbenchmark.__main__ import TEMP_FOLDER_ABS_PATH, UPDATES_JSON_PATH
|
||||
from agbenchmark.agent_interface import get_list_of_file_paths
|
||||
from agbenchmark.agent_protocol_client import (
|
||||
AgentApi,
|
||||
ApiClient,
|
||||
Configuration,
|
||||
TaskRequestBody,
|
||||
)
|
||||
from agbenchmark.agent_protocol_client.models.step import Step
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.utils.data_types import ChallengeData
|
||||
|
||||
LOG = logging.getLogger(__name__)
|
||||
|
||||
|
||||
async def run_api_agent(
|
||||
task: ChallengeData, config: Dict[str, Any], artifacts_location: str, timeout: int
|
||||
task: ChallengeData,
|
||||
config: AgentBenchmarkConfig,
|
||||
artifacts_location: str,
|
||||
timeout: int,
|
||||
) -> None:
|
||||
host_value = None
|
||||
|
||||
configuration = Configuration(host=config["AgentBenchmarkConfig"].host + "/ap/v1")
|
||||
configuration = Configuration(host=config.host)
|
||||
async with ApiClient(configuration) as api_client:
|
||||
api_instance = AgentApi(api_client)
|
||||
task_request_body = TaskRequestBody(input=task.task)
|
||||
@@ -45,7 +40,6 @@ async def run_api_agent(
|
||||
# Read the existing JSON data from the file
|
||||
|
||||
step = await api_instance.execute_agent_task_step(task_id=task_id)
|
||||
await append_updates_file(step)
|
||||
|
||||
print(f"[{task.name}] - step {step.name} ({i}. request)")
|
||||
i += 1
|
||||
@@ -54,34 +48,38 @@ async def run_api_agent(
|
||||
raise TimeoutError("Time limit exceeded")
|
||||
if not step or step.is_last:
|
||||
steps_remaining = False
|
||||
# if we're calling a mock agent, we "cheat" and give the correct artifacts to pass the tests
|
||||
|
||||
# In "mock" mode, we cheat by giving the correct artifacts to pass the challenge
|
||||
if os.getenv("IS_MOCK"):
|
||||
await upload_artifacts(
|
||||
api_instance, artifacts_location, task_id, "artifacts_out"
|
||||
)
|
||||
|
||||
await copy_agent_artifacts_into_temp_folder(api_instance, task_id)
|
||||
await copy_agent_artifacts_into_folder(
|
||||
api_instance, task_id, config.temp_folder
|
||||
)
|
||||
|
||||
|
||||
async def copy_agent_artifacts_into_temp_folder(api_instance, task_id):
|
||||
async def copy_agent_artifacts_into_folder(
|
||||
api_instance: AgentApi, task_id: str, folder: Path
|
||||
):
|
||||
artifacts = await api_instance.list_agent_task_artifacts(task_id=task_id)
|
||||
|
||||
for artifact in artifacts.artifacts:
|
||||
# current absolute path of the directory of the file
|
||||
directory_location = pathlib.Path(TEMP_FOLDER_ABS_PATH)
|
||||
if artifact.relative_path:
|
||||
path = (
|
||||
path: str = (
|
||||
artifact.relative_path
|
||||
if not artifact.relative_path.startswith("/")
|
||||
else artifact.relative_path[1:]
|
||||
)
|
||||
directory_location = pathlib.Path(
|
||||
os.path.dirname(directory_location / path)
|
||||
)
|
||||
LOG.info(f"Creating directory {directory_location}")
|
||||
folder = (folder / path).parent
|
||||
|
||||
directory_location.mkdir(parents=True, exist_ok=True)
|
||||
if not folder.exists():
|
||||
LOG.info(f"Creating directory {folder}")
|
||||
folder.mkdir(parents=True)
|
||||
|
||||
file_path = directory_location / artifact.file_name
|
||||
file_path = folder / artifact.file_name
|
||||
LOG.info(f"Writing file {file_path}")
|
||||
with open(file_path, "wb") as f:
|
||||
content = await api_instance.download_agent_task_artifact(
|
||||
@@ -91,35 +89,16 @@ async def copy_agent_artifacts_into_temp_folder(api_instance, task_id):
|
||||
f.write(content)
|
||||
|
||||
|
||||
async def append_updates_file(step: Step):
|
||||
with open(UPDATES_JSON_PATH, "r") as file:
|
||||
existing_data = json.load(file)
|
||||
# Append the new update to the existing array
|
||||
new_update = create_update_json(step)
|
||||
|
||||
existing_data.append(new_update)
|
||||
# Write the updated array back to the file
|
||||
with open(UPDATES_JSON_PATH, "w") as file:
|
||||
file.write(json.dumps(existing_data, indent=2))
|
||||
|
||||
|
||||
async def upload_artifacts(
|
||||
api_instance: ApiClient, artifacts_location: str, task_id: str, type: str
|
||||
api_instance: AgentApi, artifacts_location: str, task_id: str, type: str
|
||||
) -> None:
|
||||
for file_path in get_list_of_file_paths(artifacts_location, type):
|
||||
relative_path: Optional[str] = "/".join(
|
||||
file_path.split(f"{type}/", 1)[-1].split("/")[:-1]
|
||||
str(file_path).split(f"{type}/", 1)[-1].split("/")[:-1]
|
||||
)
|
||||
if not relative_path:
|
||||
relative_path = None
|
||||
|
||||
await api_instance.upload_agent_task_artifacts(
|
||||
task_id=task_id, file=file_path, relative_path=relative_path
|
||||
task_id=task_id, file=str(file_path), relative_path=relative_path
|
||||
)
|
||||
|
||||
|
||||
def create_update_json(step: Step):
|
||||
now = int(time.time())
|
||||
content = {"content": step.to_dict(), "timestamp": now}
|
||||
|
||||
return content
|
||||
|
||||
@@ -1,45 +1,27 @@
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
from typing import List
|
||||
from pathlib import Path
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from agbenchmark.execute_sub_process import execute_subprocess
|
||||
|
||||
load_dotenv()
|
||||
|
||||
helicone_graphql_logs = os.getenv("HELICONE_GRAPHQL_LOGS")
|
||||
HELICONE_GRAPHQL_LOGS = (
|
||||
helicone_graphql_logs.lower() == "true" if helicone_graphql_logs else False
|
||||
)
|
||||
|
||||
|
||||
def run_agent(task: str, timeout: int) -> None:
|
||||
print(f"Running agbenchmark/benchmarks.py with timeout {timeout}")
|
||||
|
||||
command = [sys.executable, "-m", "agbenchmark_config.benchmarks", str(task)]
|
||||
|
||||
execute_subprocess(command, timeout)
|
||||
HELICONE_GRAPHQL_LOGS = os.getenv("HELICONE_GRAPHQL_LOGS", "").lower() == "true"
|
||||
|
||||
|
||||
def get_list_of_file_paths(
|
||||
challenge_dir_path: str, artifact_folder_name: str
|
||||
) -> List[str]:
|
||||
# this file is at agbenchmark\agent_interface.py
|
||||
source_dir = os.path.join(
|
||||
challenge_dir_path,
|
||||
artifact_folder_name,
|
||||
)
|
||||
if not os.path.exists(source_dir):
|
||||
challenge_dir_path: str | Path, artifact_folder_name: str
|
||||
) -> list[Path]:
|
||||
source_dir = Path(challenge_dir_path) / artifact_folder_name
|
||||
if not source_dir.exists():
|
||||
return []
|
||||
return [os.path.join(source_dir, file_name) for file_name in os.listdir(source_dir)]
|
||||
return list(source_dir.iterdir())
|
||||
|
||||
|
||||
def copy_artifacts_into_temp_folder(
|
||||
workspace: str | dict[str, str], artifact_folder_name: str, challenge_dir_path: str
|
||||
workspace: str | Path, artifact_folder_name: str, challenge_dir_path: str | Path
|
||||
) -> None:
|
||||
file_paths = get_list_of_file_paths(challenge_dir_path, artifact_folder_name)
|
||||
for file_path in file_paths:
|
||||
if os.path.isfile(file_path):
|
||||
if file_path.is_file():
|
||||
shutil.copy(file_path, workspace)
|
||||
|
||||
@@ -1,42 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
# flake8: noqa
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
__version__ = "1.0.0"
|
||||
|
||||
# import apis into sdk package
|
||||
from agbenchmark.agent_protocol_client.api.agent_api import AgentApi
|
||||
from agbenchmark.agent_protocol_client.api_client import ApiClient
|
||||
|
||||
# import ApiClient
|
||||
from agbenchmark.agent_protocol_client.api_response import ApiResponse
|
||||
from agbenchmark.agent_protocol_client.configuration import Configuration
|
||||
from agbenchmark.agent_protocol_client.exceptions import (
|
||||
ApiAttributeError,
|
||||
ApiException,
|
||||
ApiKeyError,
|
||||
ApiTypeError,
|
||||
ApiValueError,
|
||||
OpenApiException,
|
||||
)
|
||||
|
||||
# import models into sdk package
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
from agbenchmark.agent_protocol_client.models.step import Step
|
||||
from agbenchmark.agent_protocol_client.models.step_all_of import StepAllOf
|
||||
from agbenchmark.agent_protocol_client.models.step_request_body import StepRequestBody
|
||||
from agbenchmark.agent_protocol_client.models.task import Task
|
||||
from agbenchmark.agent_protocol_client.models.task_all_of import TaskAllOf
|
||||
from agbenchmark.agent_protocol_client.models.task_request_body import TaskRequestBody
|
||||
@@ -1,4 +0,0 @@
-# flake8: noqa
-
-# import apis into api package
-from agbenchmark.agent_protocol_client.api.agent_api import AgentApi
File diff suppressed because it is too large
@@ -1,838 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
import atexit
|
||||
import datetime
|
||||
import json
|
||||
import mimetypes
|
||||
import os
|
||||
import re
|
||||
import tempfile
|
||||
from multiprocessing.pool import ThreadPool
|
||||
from urllib.parse import quote
|
||||
|
||||
from dateutil.parser import parse
|
||||
|
||||
import agbenchmark.agent_protocol_client.models
|
||||
from agbenchmark.agent_protocol_client import rest
|
||||
from agbenchmark.agent_protocol_client.api_response import ApiResponse
|
||||
from agbenchmark.agent_protocol_client.configuration import Configuration
|
||||
from agbenchmark.agent_protocol_client.exceptions import ApiException, ApiValueError
|
||||
|
||||
|
||||
class ApiClient(object):
|
||||
"""Generic API client for OpenAPI client library builds.
|
||||
|
||||
OpenAPI generic API client. This client handles the client-
|
||||
server communication, and is invariant across implementations. Specifics of
|
||||
the methods and models for each application are generated from the OpenAPI
|
||||
templates.
|
||||
|
||||
:param configuration: .Configuration object for this client
|
||||
:param header_name: a header to pass when making calls to the API.
|
||||
:param header_value: a header value to pass when making calls to
|
||||
the API.
|
||||
:param cookie: a cookie to include in the header when making calls
|
||||
to the API
|
||||
:param pool_threads: The number of threads to use for async requests
|
||||
to the API. More threads means more concurrent API requests.
|
||||
"""
|
||||
|
||||
PRIMITIVE_TYPES = (float, bool, bytes, str, int)
|
||||
NATIVE_TYPES_MAPPING = {
|
||||
"int": int,
|
||||
"long": int, # TODO remove as only py3 is supported?
|
||||
"float": float,
|
||||
"str": str,
|
||||
"bool": bool,
|
||||
"date": datetime.date,
|
||||
"datetime": datetime.datetime,
|
||||
"object": object,
|
||||
}
|
||||
_pool = None
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
configuration=None,
|
||||
header_name=None,
|
||||
header_value=None,
|
||||
cookie=None,
|
||||
pool_threads=1,
|
||||
):
|
||||
# use default configuration if none is provided
|
||||
if configuration is None:
|
||||
configuration = Configuration.get_default()
|
||||
self.configuration = configuration
|
||||
self.pool_threads = pool_threads
|
||||
|
||||
self.rest_client = rest.RESTClientObject(configuration)
|
||||
self.default_headers = {}
|
||||
if header_name is not None:
|
||||
self.default_headers[header_name] = header_value
|
||||
self.cookie = cookie
|
||||
# Set default User-Agent.
|
||||
self.user_agent = "OpenAPI-Generator/1.0.0/python"
|
||||
self.client_side_validation = configuration.client_side_validation
|
||||
|
||||
async def __aenter__(self):
|
||||
return self
|
||||
|
||||
async def __aexit__(self, exc_type, exc_value, traceback):
|
||||
await self.close()
|
||||
|
||||
async def close(self):
|
||||
await self.rest_client.close()
|
||||
if self._pool:
|
||||
self._pool.close()
|
||||
self._pool.join()
|
||||
self._pool = None
|
||||
if hasattr(atexit, "unregister"):
|
||||
atexit.unregister(self.close)
|
||||
|
||||
@property
|
||||
def pool(self):
|
||||
"""Create thread pool on first request
|
||||
avoids instantiating unused threadpool for blocking clients.
|
||||
"""
|
||||
if self._pool is None:
|
||||
atexit.register(self.close)
|
||||
self._pool = ThreadPool(self.pool_threads)
|
||||
return self._pool
|
||||
|
||||
@property
|
||||
def user_agent(self):
|
||||
"""User agent for this API client"""
|
||||
return self.default_headers["User-Agent"]
|
||||
|
||||
@user_agent.setter
|
||||
def user_agent(self, value):
|
||||
self.default_headers["User-Agent"] = value
|
||||
|
||||
def set_default_header(self, header_name, header_value):
|
||||
self.default_headers[header_name] = header_value
|
||||
|
||||
_default = None
|
||||
|
||||
@classmethod
|
||||
def get_default(cls):
|
||||
"""Return new instance of ApiClient.
|
||||
|
||||
This method returns newly created, based on default constructor,
|
||||
object of ApiClient class or returns a copy of default
|
||||
ApiClient.
|
||||
|
||||
:return: The ApiClient object.
|
||||
"""
|
||||
if cls._default is None:
|
||||
cls._default = ApiClient()
|
||||
return cls._default
|
||||
|
||||
@classmethod
|
||||
def set_default(cls, default):
|
||||
"""Set default instance of ApiClient.
|
||||
|
||||
It stores default ApiClient.
|
||||
|
||||
:param default: object of ApiClient.
|
||||
"""
|
||||
cls._default = default
|
||||
|
||||
async def __call_api(
|
||||
self,
|
||||
resource_path,
|
||||
method,
|
||||
path_params=None,
|
||||
query_params=None,
|
||||
header_params=None,
|
||||
body=None,
|
||||
post_params=None,
|
||||
files=None,
|
||||
response_types_map=None,
|
||||
auth_settings=None,
|
||||
_return_http_data_only=None,
|
||||
collection_formats=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
_host=None,
|
||||
_request_auth=None,
|
||||
):
|
||||
config = self.configuration
|
||||
|
||||
# header parameters
|
||||
header_params = header_params or {}
|
||||
header_params.update(self.default_headers)
|
||||
if self.cookie:
|
||||
header_params["Cookie"] = self.cookie
|
||||
if header_params:
|
||||
header_params = self.sanitize_for_serialization(header_params)
|
||||
header_params = dict(
|
||||
self.parameters_to_tuples(header_params, collection_formats)
|
||||
)
|
||||
|
||||
# path parameters
|
||||
if path_params:
|
||||
path_params = self.sanitize_for_serialization(path_params)
|
||||
path_params = self.parameters_to_tuples(path_params, collection_formats)
|
||||
for k, v in path_params:
|
||||
# specified safe chars, encode everything
|
||||
resource_path = resource_path.replace(
|
||||
"{%s}" % k, quote(str(v), safe=config.safe_chars_for_path_param)
|
||||
)
|
||||
|
||||
# post parameters
|
||||
if post_params or files:
|
||||
post_params = post_params if post_params else []
|
||||
post_params = self.sanitize_for_serialization(post_params)
|
||||
post_params = self.parameters_to_tuples(post_params, collection_formats)
|
||||
post_params.extend(self.files_parameters(files))
|
||||
|
||||
# auth setting
|
||||
self.update_params_for_auth(
|
||||
header_params,
|
||||
query_params,
|
||||
auth_settings,
|
||||
resource_path,
|
||||
method,
|
||||
body,
|
||||
request_auth=_request_auth,
|
||||
)
|
||||
|
||||
# body
|
||||
if body:
|
||||
body = self.sanitize_for_serialization(body)
|
||||
|
||||
# request url
|
||||
if _host is None:
|
||||
url = self.configuration.host + resource_path
|
||||
else:
|
||||
# use server/host defined in path or operation instead
|
||||
url = _host + resource_path
|
||||
|
||||
# query parameters
|
||||
if query_params:
|
||||
query_params = self.sanitize_for_serialization(query_params)
|
||||
url_query = self.parameters_to_url_query(query_params, collection_formats)
|
||||
url += "?" + url_query
|
||||
|
||||
try:
|
||||
# perform request and return response
|
||||
response_data = await self.request(
|
||||
method,
|
||||
url,
|
||||
query_params=query_params,
|
||||
headers=header_params,
|
||||
post_params=post_params,
|
||||
body=body,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
)
|
||||
except ApiException as e:
|
||||
if e.body:
|
||||
e.body = e.body.decode("utf-8")
|
||||
raise e
|
||||
|
||||
self.last_response = response_data
|
||||
|
||||
return_data = None # assuming derialization is not needed
|
||||
# data needs deserialization or returns HTTP data (deserialized) only
|
||||
if _preload_content or _return_http_data_only:
|
||||
response_type = response_types_map.get(str(response_data.status), None)
|
||||
|
||||
if response_type == "bytearray":
|
||||
response_data.data = response_data.data
|
||||
else:
|
||||
match = None
|
||||
content_type = response_data.getheader("content-type")
|
||||
if content_type is not None:
|
||||
match = re.search(r"charset=([a-zA-Z\-\d]+)[\s;]?", content_type)
|
||||
encoding = match.group(1) if match else "utf-8"
|
||||
response_data.data = response_data.data.decode(encoding)
|
||||
|
||||
# deserialize response data
|
||||
if response_type == "bytearray":
|
||||
return_data = response_data.data
|
||||
elif response_type:
|
||||
return_data = self.deserialize(response_data, response_type)
|
||||
else:
|
||||
return_data = None
|
||||
|
||||
if _return_http_data_only:
|
||||
return return_data
|
||||
else:
|
||||
return ApiResponse(
|
||||
status_code=response_data.status,
|
||||
data=return_data,
|
||||
headers=response_data.getheaders(),
|
||||
raw_data=response_data.data,
|
||||
)
|
||||
|
||||
def sanitize_for_serialization(self, obj):
|
||||
"""Builds a JSON POST object.
|
||||
|
||||
If obj is None, return None.
|
||||
If obj is str, int, long, float, bool, return directly.
|
||||
If obj is datetime.datetime, datetime.date
|
||||
convert to string in iso8601 format.
|
||||
If obj is list, sanitize each element in the list.
|
||||
If obj is dict, return the dict.
|
||||
If obj is OpenAPI model, return the properties dict.
|
||||
|
||||
:param obj: The data to serialize.
|
||||
:return: The serialized form of data.
|
||||
"""
|
||||
if obj is None:
|
||||
return None
|
||||
elif isinstance(obj, self.PRIMITIVE_TYPES):
|
||||
return obj
|
||||
elif isinstance(obj, list):
|
||||
return [self.sanitize_for_serialization(sub_obj) for sub_obj in obj]
|
||||
elif isinstance(obj, tuple):
|
||||
return tuple(self.sanitize_for_serialization(sub_obj) for sub_obj in obj)
|
||||
elif isinstance(obj, (datetime.datetime, datetime.date)):
|
||||
return obj.isoformat()
|
||||
|
||||
if isinstance(obj, dict):
|
||||
obj_dict = obj
|
||||
else:
|
||||
# Convert model obj to dict except
|
||||
# attributes `openapi_types`, `attribute_map`
|
||||
# and attributes which value is not None.
|
||||
# Convert attribute name to json key in
|
||||
# model definition for request.
|
||||
obj_dict = obj.to_dict()
|
||||
|
||||
return {
|
||||
key: self.sanitize_for_serialization(val) for key, val in obj_dict.items()
|
||||
}
|
||||
|
||||
def deserialize(self, response, response_type):
|
||||
"""Deserializes response into an object.
|
||||
|
||||
:param response: RESTResponse object to be deserialized.
|
||||
:param response_type: class literal for
|
||||
deserialized object, or string of class name.
|
||||
|
||||
:return: deserialized object.
|
||||
"""
|
||||
# handle file downloading
|
||||
# save response body into a tmp file and return the instance
|
||||
if response_type == "file":
|
||||
return self.__deserialize_file(response)
|
||||
|
||||
# fetch data from response object
|
||||
try:
|
||||
data = json.loads(response.data)
|
||||
except ValueError:
|
||||
data = response.data
|
||||
|
||||
return self.__deserialize(data, response_type)
|
||||
|
||||
def __deserialize(self, data, klass):
|
||||
"""Deserializes dict, list, str into an object.
|
||||
|
||||
:param data: dict, list or str.
|
||||
:param klass: class literal, or string of class name.
|
||||
|
||||
:return: object.
|
||||
"""
|
||||
if data is None:
|
||||
return None
|
||||
|
||||
if type(klass) == str:
|
||||
if klass.startswith("List["):
|
||||
sub_kls = re.match(r"List\[(.*)]", klass).group(1)
|
||||
return [self.__deserialize(sub_data, sub_kls) for sub_data in data]
|
||||
|
||||
if klass.startswith("Dict["):
|
||||
sub_kls = re.match(r"Dict\[([^,]*), (.*)]", klass).group(2)
|
||||
return {k: self.__deserialize(v, sub_kls) for k, v in data.items()}
|
||||
|
||||
# convert str to class
|
||||
if klass in self.NATIVE_TYPES_MAPPING:
|
||||
klass = self.NATIVE_TYPES_MAPPING[klass]
|
||||
else:
|
||||
klass = getattr(agbenchmark.agent_protocol_client.models, klass)
|
||||
|
||||
if klass in self.PRIMITIVE_TYPES:
|
||||
return self.__deserialize_primitive(data, klass)
|
||||
elif klass == object:
|
||||
return self.__deserialize_object(data)
|
||||
elif klass == datetime.date:
|
||||
return self.__deserialize_date(data)
|
||||
elif klass == datetime.datetime:
|
||||
return self.__deserialize_datetime(data)
|
||||
else:
|
||||
return self.__deserialize_model(data, klass)
|
||||
|
||||
def call_api(
|
||||
self,
|
||||
resource_path,
|
||||
method,
|
||||
path_params=None,
|
||||
query_params=None,
|
||||
header_params=None,
|
||||
body=None,
|
||||
post_params=None,
|
||||
files=None,
|
||||
response_types_map=None,
|
||||
auth_settings=None,
|
||||
async_req=None,
|
||||
_return_http_data_only=None,
|
||||
collection_formats=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
_host=None,
|
||||
_request_auth=None,
|
||||
):
|
||||
"""Makes the HTTP request (synchronous) and returns deserialized data.
|
||||
|
||||
To make an async_req request, set the async_req parameter.
|
||||
|
||||
:param resource_path: Path to method endpoint.
|
||||
:param method: Method to call.
|
||||
:param path_params: Path parameters in the url.
|
||||
:param query_params: Query parameters in the url.
|
||||
:param header_params: Header parameters to be
|
||||
placed in the request header.
|
||||
:param body: Request body.
|
||||
:param post_params dict: Request post form parameters,
|
||||
for `application/x-www-form-urlencoded`, `multipart/form-data`.
|
||||
:param auth_settings list: Auth Settings names for the request.
|
||||
:param response: Response data type.
|
||||
:param files dict: key -> filename, value -> filepath,
|
||||
for `multipart/form-data`.
|
||||
:param async_req bool: execute request asynchronously
|
||||
:param _return_http_data_only: response data instead of ApiResponse
|
||||
object with status code, headers, etc
|
||||
:param _preload_content: if False, the ApiResponse.data will
|
||||
be set to none and raw_data will store the
|
||||
HTTP response body without reading/decoding.
|
||||
Default is True.
|
||||
:param collection_formats: dict of collection formats for path, query,
|
||||
header, and post parameters.
|
||||
:param _request_timeout: timeout setting for this request. If one
|
||||
number provided, it will be total request
|
||||
timeout. It can also be a pair (tuple) of
|
||||
(connection, read) timeouts.
|
||||
:param _request_auth: set to override the auth_settings for an a single
|
||||
request; this effectively ignores the authentication
|
||||
in the spec for a single request.
|
||||
:type _request_token: dict, optional
|
||||
:return:
|
||||
If async_req parameter is True,
|
||||
the request will be called asynchronously.
|
||||
The method will return the request thread.
|
||||
If parameter async_req is False or missing,
|
||||
then the method will return the response directly.
|
||||
"""
|
||||
if not async_req:
|
||||
return self.__call_api(
|
||||
resource_path,
|
||||
method,
|
||||
path_params,
|
||||
query_params,
|
||||
header_params,
|
||||
body,
|
||||
post_params,
|
||||
files,
|
||||
response_types_map,
|
||||
auth_settings,
|
||||
_return_http_data_only,
|
||||
collection_formats,
|
||||
_preload_content,
|
||||
_request_timeout,
|
||||
_host,
|
||||
_request_auth,
|
||||
)
|
||||
|
||||
return self.pool.apply_async(
|
||||
self.__call_api,
|
||||
(
|
||||
resource_path,
|
||||
method,
|
||||
path_params,
|
||||
query_params,
|
||||
header_params,
|
||||
body,
|
||||
post_params,
|
||||
files,
|
||||
response_types_map,
|
||||
auth_settings,
|
||||
_return_http_data_only,
|
||||
collection_formats,
|
||||
_preload_content,
|
||||
_request_timeout,
|
||||
_host,
|
||||
_request_auth,
|
||||
),
|
||||
)
|
||||
|
||||
def request(
|
||||
self,
|
||||
method,
|
||||
url,
|
||||
query_params=None,
|
||||
headers=None,
|
||||
post_params=None,
|
||||
body=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
"""Makes the HTTP request using RESTClient."""
|
||||
if method == "GET":
|
||||
return self.rest_client.get_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
headers=headers,
|
||||
)
|
||||
elif method == "HEAD":
|
||||
return self.rest_client.head_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
headers=headers,
|
||||
)
|
||||
elif method == "OPTIONS":
|
||||
return self.rest_client.options_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
headers=headers,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
)
|
||||
elif method == "POST":
|
||||
return self.rest_client.post_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
headers=headers,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
elif method == "PUT":
|
||||
return self.rest_client.put_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
headers=headers,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
elif method == "PATCH":
|
||||
return self.rest_client.patch_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
headers=headers,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
elif method == "DELETE":
|
||||
return self.rest_client.delete_request(
|
||||
url,
|
||||
query_params=query_params,
|
||||
headers=headers,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
else:
|
||||
raise ApiValueError(
|
||||
"http method must be `GET`, `HEAD`, `OPTIONS`,"
|
||||
" `POST`, `PATCH`, `PUT` or `DELETE`."
|
||||
)
|
||||
|
||||
def parameters_to_tuples(self, params, collection_formats):
|
||||
"""Get parameters as list of tuples, formatting collections.
|
||||
|
||||
:param params: Parameters as dict or list of two-tuples
|
||||
:param dict collection_formats: Parameter collection formats
|
||||
:return: Parameters as list of tuples, collections formatted
|
||||
"""
|
||||
new_params = []
|
||||
if collection_formats is None:
|
||||
collection_formats = {}
|
||||
for k, v in (
|
||||
params.items() if isinstance(params, dict) else params
|
||||
): # noqa: E501
|
||||
if k in collection_formats:
|
||||
collection_format = collection_formats[k]
|
||||
if collection_format == "multi":
|
||||
new_params.extend((k, value) for value in v)
|
||||
else:
|
||||
if collection_format == "ssv":
|
||||
delimiter = " "
|
||||
elif collection_format == "tsv":
|
||||
delimiter = "\t"
|
||||
elif collection_format == "pipes":
|
||||
delimiter = "|"
|
||||
else: # csv is the default
|
||||
delimiter = ","
|
||||
new_params.append((k, delimiter.join(str(value) for value in v)))
|
||||
else:
|
||||
new_params.append((k, v))
|
||||
return new_params
|
||||
|
||||
def parameters_to_url_query(self, params, collection_formats):
|
||||
"""Get parameters as list of tuples, formatting collections.
|
||||
|
||||
:param params: Parameters as dict or list of two-tuples
|
||||
:param dict collection_formats: Parameter collection formats
|
||||
:return: URL query string (e.g. a=Hello%20World&b=123)
|
||||
"""
|
||||
new_params = []
|
||||
if collection_formats is None:
|
||||
collection_formats = {}
|
||||
for k, v in (
|
||||
params.items() if isinstance(params, dict) else params
|
||||
): # noqa: E501
|
||||
if isinstance(v, (int, float)):
|
||||
v = str(v)
|
||||
if isinstance(v, bool):
|
||||
v = str(v).lower()
|
||||
if isinstance(v, dict):
|
||||
v = json.dumps(v)
|
||||
|
||||
if k in collection_formats:
|
||||
collection_format = collection_formats[k]
|
||||
if collection_format == "multi":
|
||||
new_params.extend((k, value) for value in v)
|
||||
else:
|
||||
if collection_format == "ssv":
|
||||
delimiter = " "
|
||||
elif collection_format == "tsv":
|
||||
delimiter = "\t"
|
||||
elif collection_format == "pipes":
|
||||
delimiter = "|"
|
||||
else: # csv is the default
|
||||
delimiter = ","
|
||||
new_params.append(
|
||||
(k, delimiter.join(quote(str(value)) for value in v))
|
||||
)
|
||||
else:
|
||||
new_params.append((k, quote(str(v))))
|
||||
|
||||
return "&".join(["=".join(item) for item in new_params])
|
||||
|
||||
def files_parameters(self, files=None):
|
||||
"""Builds form parameters.
|
||||
|
||||
:param files: File parameters.
|
||||
:return: Form parameters with files.
|
||||
"""
|
||||
params = []
|
||||
|
||||
if files:
|
||||
for k, v in files.items():
|
||||
if not v:
|
||||
continue
|
||||
file_names = v if type(v) is list else [v]
|
||||
for n in file_names:
|
||||
with open(n, "rb") as f:
|
||||
filename = os.path.basename(f.name)
|
||||
filedata = f.read()
|
||||
mimetype = (
|
||||
mimetypes.guess_type(filename)[0]
|
||||
or "application/octet-stream"
|
||||
)
|
||||
params.append(tuple([k, tuple([filename, filedata, mimetype])]))
|
||||
|
||||
return params
|
||||
|
||||
    def select_header_accept(self, accepts):
        """Returns `Accept` based on an array of accepts provided.

        :param accepts: List of headers.
        :return: Accept (e.g. application/json).
        """
        if not accepts:
            return

        for accept in accepts:
            if re.search("json", accept, re.IGNORECASE):
                return accept

        return accepts[0]

    def select_header_content_type(self, content_types):
        """Returns `Content-Type` based on an array of content_types provided.

        :param content_types: List of content-types.
        :return: Content-Type (e.g. application/json).
        """
        if not content_types:
            return None

        for content_type in content_types:
            if re.search("json", content_type, re.IGNORECASE):
                return content_type

        return content_types[0]
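Both header selectors apply the same rule: prefer a JSON media type if one is offered, otherwise fall back to the first entry. A standalone sketch of that rule (the media types are examples):

```python
import re

def pick_json_or_first(media_types):
    # Same selection rule as select_header_accept / select_header_content_type.
    for media_type in media_types:
        if re.search("json", media_type, re.IGNORECASE):
            return media_type
    return media_types[0]

print(pick_json_or_first(["application/xml", "application/json"]))  # application/json
print(pick_json_or_first(["text/plain", "application/xml"]))        # text/plain
```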

    def update_params_for_auth(
        self,
        headers,
        queries,
        auth_settings,
        resource_path,
        method,
        body,
        request_auth=None,
    ):
        """Updates header and query params based on authentication setting.

        :param headers: Header parameters dict to be updated.
        :param queries: Query parameters tuple list to be updated.
        :param auth_settings: Authentication setting identifiers list.
        :param resource_path: A string representation of the HTTP request resource path.
        :param method: A string representation of the HTTP request method.
        :param body: An object representing the body of the HTTP request.
            The object type is the return value of sanitize_for_serialization().
        :param request_auth: if set, the provided settings will
            override the token in the configuration.
        """
        if not auth_settings:
            return

        if request_auth:
            self._apply_auth_params(
                headers, queries, resource_path, method, body, request_auth
            )
            return

        for auth in auth_settings:
            auth_setting = self.configuration.auth_settings().get(auth)
            if auth_setting:
                self._apply_auth_params(
                    headers, queries, resource_path, method, body, auth_setting
                )

    def _apply_auth_params(
        self, headers, queries, resource_path, method, body, auth_setting
    ):
        """Updates the request parameters based on a single auth_setting

        :param headers: Header parameters dict to be updated.
        :param queries: Query parameters tuple list to be updated.
        :param resource_path: A string representation of the HTTP request resource path.
        :param method: A string representation of the HTTP request method.
        :param body: An object representing the body of the HTTP request.
            The object type is the return value of sanitize_for_serialization().
        :param auth_setting: auth settings for the endpoint
        """
        if auth_setting["in"] == "cookie":
            headers["Cookie"] = auth_setting["value"]
        elif auth_setting["in"] == "header":
            if auth_setting["type"] != "http-signature":
                headers[auth_setting["key"]] = auth_setting["value"]
        elif auth_setting["in"] == "query":
            queries.append((auth_setting["key"], auth_setting["value"]))
        else:
            raise ApiValueError("Authentication token must be in `query` or `header`")

    def __deserialize_file(self, response):
        """Deserializes body to file

        Saves response body into a file in a temporary folder,
        using the filename from the `Content-Disposition` header if provided.

        :param response: RESTResponse.
        :return: file path.
        """
        fd, path = tempfile.mkstemp(dir=self.configuration.temp_folder_path)
        os.close(fd)
        os.remove(path)

        content_disposition = response.getheader("Content-Disposition")
        if content_disposition:
            filename = re.search(
                r'filename=[\'"]?([^\'"\s]+)[\'"]?', content_disposition
            ).group(1)
            path = os.path.join(os.path.dirname(path), filename)

        with open(path, "wb") as f:
            f.write(response.data)

        return path
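The filename is recovered from the `Content-Disposition` header with a simple regular expression; a standalone example with a made-up header value:

```python
import re

content_disposition = 'attachment; filename="report.csv"'
match = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?', content_disposition)
print(match.group(1))  # report.csv
```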

    def __deserialize_primitive(self, data, klass):
        """Deserializes string to primitive type.

        :param data: str.
        :param klass: class literal.

        :return: int, long, float, str, bool.
        """
        try:
            return klass(data)
        except UnicodeEncodeError:
            return str(data)
        except TypeError:
            return data

    def __deserialize_object(self, value):
        """Return an original value.

        :return: object.
        """
        return value

    def __deserialize_date(self, string):
        """Deserializes string to date.

        :param string: str.
        :return: date.
        """
        try:
            return parse(string).date()
        except ImportError:
            return string
        except ValueError:
            raise rest.ApiException(
                status=0, reason="Failed to parse `{0}` as date object".format(string)
            )

    def __deserialize_datetime(self, string):
        """Deserializes string to datetime.

        The string should be in iso8601 datetime format.

        :param string: str.
        :return: datetime.
        """
        try:
            return parse(string)
        except ImportError:
            return string
        except ValueError:
            raise rest.ApiException(
                status=0,
                reason=("Failed to parse `{0}` as datetime object".format(string)),
            )
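Both date helpers delegate to `parse`, which generated clients of this kind import from `python-dateutil` (`dateutil.parser`); for example:

```python
from dateutil.parser import parse

print(parse("2023-12-21").date())              # 2023-12-21
print(parse("2023-12-21T10:30:00.000+00:00"))  # 2023-12-21 10:30:00+00:00
```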

    def __deserialize_model(self, data, klass):
        """Deserializes list or dict to model.

        :param data: dict, list.
        :param klass: class literal.
        :return: model object.
        """

        return klass.from_dict(data)
@@ -1,28 +0,0 @@
"""API response object."""

from __future__ import annotations

from typing import Any, Dict, Optional

from pydantic import Field, StrictInt, StrictStr


class ApiResponse:
    """
    API response object
    """

    status_code: Optional[StrictInt] = Field(None, description="HTTP status code")
    headers: Optional[Dict[StrictStr, StrictStr]] = Field(
        None, description="HTTP headers"
    )
    data: Optional[Any] = Field(
        None, description="Deserialized data given the data type"
    )
    raw_data: Optional[Any] = Field(None, description="Raw data (HTTP response body)")

    def __init__(self, status_code=None, headers=None, data=None, raw_data=None):
        self.status_code = status_code
        self.headers = headers
        self.data = data
        self.raw_data = raw_data
|
||||
@@ -1,447 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
import copy
|
||||
import http.client as httplib
|
||||
import logging
|
||||
import sys
|
||||
|
||||
import urllib3
|
||||
|
||||
JSON_SCHEMA_VALIDATION_KEYWORDS = {
|
||||
"multipleOf",
|
||||
"maximum",
|
||||
"exclusiveMaximum",
|
||||
"minimum",
|
||||
"exclusiveMinimum",
|
||||
"maxLength",
|
||||
"minLength",
|
||||
"pattern",
|
||||
"maxItems",
|
||||
"minItems",
|
||||
}
|
||||
|
||||
|
||||
class Configuration(object):
|
||||
"""This class contains various settings of the API client.
|
||||
|
||||
:param host: Base url.
|
||||
:param api_key: Dict to store API key(s).
|
||||
Each entry in the dict specifies an API key.
|
||||
The dict key is the name of the security scheme in the OAS specification.
|
||||
The dict value is the API key secret.
|
||||
:param api_key_prefix: Dict to store API prefix (e.g. Bearer).
|
||||
The dict key is the name of the security scheme in the OAS specification.
|
||||
The dict value is an API key prefix when generating the auth data.
|
||||
:param username: Username for HTTP basic authentication.
|
||||
:param password: Password for HTTP basic authentication.
|
||||
:param access_token: Access token.
|
||||
:param server_index: Index to servers configuration.
|
||||
:param server_variables: Mapping with string values to replace variables in
|
||||
templated server configuration. The validation of enums is performed for
|
||||
variables with defined enum values before.
|
||||
:param server_operation_index: Mapping from operation ID to an index to server
|
||||
configuration.
|
||||
:param server_operation_variables: Mapping from operation ID to a mapping with
|
||||
string values to replace variables in templated server configuration.
|
||||
The validation of enums is performed for variables with defined enum values before.
|
||||
:param ssl_ca_cert: str - the path to a file of concatenated CA certificates
|
||||
in PEM format.
|
||||
|
||||
"""
|
||||
|
||||
_default = None
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
host=None,
|
||||
api_key=None,
|
||||
api_key_prefix=None,
|
||||
username=None,
|
||||
password=None,
|
||||
access_token=None,
|
||||
server_index=None,
|
||||
server_variables=None,
|
||||
server_operation_index=None,
|
||||
server_operation_variables=None,
|
||||
ssl_ca_cert=None,
|
||||
):
|
||||
"""Constructor"""
|
||||
self._base_path = "http://localhost" if host is None else host
|
||||
"""Default Base url
|
||||
"""
|
||||
self.server_index = 0 if server_index is None and host is None else server_index
|
||||
self.server_operation_index = server_operation_index or {}
|
||||
"""Default server index
|
||||
"""
|
||||
self.server_variables = server_variables or {}
|
||||
self.server_operation_variables = server_operation_variables or {}
|
||||
"""Default server variables
|
||||
"""
|
||||
self.temp_folder_path = None
|
||||
"""Temp file folder for downloading files
|
||||
"""
|
||||
# Authentication Settings
|
||||
self.api_key = {}
|
||||
if api_key:
|
||||
self.api_key = api_key
|
||||
"""dict to store API key(s)
|
||||
"""
|
||||
self.api_key_prefix = {}
|
||||
if api_key_prefix:
|
||||
self.api_key_prefix = api_key_prefix
|
||||
"""dict to store API prefix (e.g. Bearer)
|
||||
"""
|
||||
self.refresh_api_key_hook = None
|
||||
"""function hook to refresh API key if expired
|
||||
"""
|
||||
self.username = username
|
||||
"""Username for HTTP basic authentication
|
||||
"""
|
||||
self.password = password
|
||||
"""Password for HTTP basic authentication
|
||||
"""
|
||||
self.access_token = access_token
|
||||
"""Access token
|
||||
"""
|
||||
self.logger = {}
|
||||
"""Logging Settings
|
||||
"""
|
||||
self.logger["package_logger"] = logging.getLogger("agent_protocol_client")
|
||||
self.logger["urllib3_logger"] = logging.getLogger("urllib3")
|
||||
self.logger_format = "%(asctime)s %(levelname)s %(message)s"
|
||||
"""Log format
|
||||
"""
|
||||
self.logger_stream_handler = None
|
||||
"""Log stream handler
|
||||
"""
|
||||
self.logger_file_handler = None
|
||||
"""Log file handler
|
||||
"""
|
||||
self.logger_file = None
|
||||
"""Debug file location
|
||||
"""
|
||||
self.debug = False
|
||||
"""Debug switch
|
||||
"""
|
||||
|
||||
self.verify_ssl = True
|
||||
"""SSL/TLS verification
|
||||
Set this to false to skip verifying SSL certificate when calling API
|
||||
from https server.
|
||||
"""
|
||||
self.ssl_ca_cert = ssl_ca_cert
|
||||
"""Set this to customize the certificate file to verify the peer.
|
||||
"""
|
||||
self.cert_file = None
|
||||
"""client certificate file
|
||||
"""
|
||||
self.key_file = None
|
||||
"""client key file
|
||||
"""
|
||||
self.assert_hostname = None
|
||||
"""Set this to True/False to enable/disable SSL hostname verification.
|
||||
"""
|
||||
self.tls_server_name = None
|
||||
"""SSL/TLS Server Name Indication (SNI)
|
||||
Set this to the SNI value expected by the server.
|
||||
"""
|
||||
|
||||
self.connection_pool_maxsize = 100
|
||||
"""This value is passed to the aiohttp to limit simultaneous connections.
|
||||
Default values is 100, None means no-limit.
|
||||
"""
|
||||
|
||||
self.proxy = None
|
||||
"""Proxy URL
|
||||
"""
|
||||
self.proxy_headers = None
|
||||
"""Proxy headers
|
||||
"""
|
||||
self.safe_chars_for_path_param = ""
|
||||
"""Safe chars for path_param
|
||||
"""
|
||||
self.retries = None
|
||||
"""Adding retries to override urllib3 default value 3
|
||||
"""
|
||||
# Enable client side validation
|
||||
self.client_side_validation = True
|
||||
|
||||
self.socket_options = None
|
||||
"""Options to pass down to the underlying urllib3 socket
|
||||
"""
|
||||
|
||||
self.datetime_format = "%Y-%m-%dT%H:%M:%S.%f%z"
|
||||
"""datetime format
|
||||
"""
|
||||
|
||||
self.date_format = "%Y-%m-%d"
|
||||
"""date format
|
||||
"""
|
||||
|
||||
def __deepcopy__(self, memo):
|
||||
cls = self.__class__
|
||||
result = cls.__new__(cls)
|
||||
memo[id(self)] = result
|
||||
for k, v in self.__dict__.items():
|
||||
if k not in ("logger", "logger_file_handler"):
|
||||
setattr(result, k, copy.deepcopy(v, memo))
|
||||
# shallow copy of loggers
|
||||
result.logger = copy.copy(self.logger)
|
||||
# use setters to configure loggers
|
||||
result.logger_file = self.logger_file
|
||||
result.debug = self.debug
|
||||
return result
|
||||
|
||||
def __setattr__(self, name, value):
|
||||
object.__setattr__(self, name, value)
|
||||
|
||||
@classmethod
|
||||
def set_default(cls, default):
|
||||
"""Set default instance of configuration.
|
||||
|
||||
It stores default configuration, which can be
|
||||
returned by get_default_copy method.
|
||||
|
||||
:param default: object of Configuration
|
||||
"""
|
||||
cls._default = default
|
||||
|
||||
@classmethod
|
||||
def get_default_copy(cls):
|
||||
"""Deprecated. Please use `get_default` instead.
|
||||
|
||||
Deprecated. Please use `get_default` instead.
|
||||
|
||||
:return: The configuration object.
|
||||
"""
|
||||
return cls.get_default()
|
||||
|
||||
@classmethod
|
||||
def get_default(cls):
|
||||
"""Return the default configuration.
|
||||
|
||||
        Returns the default Configuration object, creating it with the
        default constructor if no default has been set.
|
||||
|
||||
:return: The configuration object.
|
||||
"""
|
||||
if cls._default is None:
|
||||
cls._default = Configuration()
|
||||
return cls._default
|
||||
|
||||
@property
|
||||
def logger_file(self):
|
||||
"""The logger file.
|
||||
|
||||
If the logger_file is None, then add stream handler and remove file
|
||||
handler. Otherwise, add file handler and remove stream handler.
|
||||
|
||||
:param value: The logger_file path.
|
||||
:type: str
|
||||
"""
|
||||
return self.__logger_file
|
||||
|
||||
@logger_file.setter
|
||||
def logger_file(self, value):
|
||||
"""The logger file.
|
||||
|
||||
If the logger_file is None, then add stream handler and remove file
|
||||
handler. Otherwise, add file handler and remove stream handler.
|
||||
|
||||
:param value: The logger_file path.
|
||||
:type: str
|
||||
"""
|
||||
self.__logger_file = value
|
||||
if self.__logger_file:
|
||||
# If set logging file,
|
||||
# then add file handler and remove stream handler.
|
||||
self.logger_file_handler = logging.FileHandler(self.__logger_file)
|
||||
self.logger_file_handler.setFormatter(self.logger_formatter)
|
||||
for _, logger in self.logger.items():
|
||||
logger.addHandler(self.logger_file_handler)
|
||||
|
||||
@property
|
||||
def debug(self):
|
||||
"""Debug status
|
||||
|
||||
:param value: The debug status, True or False.
|
||||
:type: bool
|
||||
"""
|
||||
return self.__debug
|
||||
|
||||
@debug.setter
|
||||
def debug(self, value):
|
||||
"""Debug status
|
||||
|
||||
:param value: The debug status, True or False.
|
||||
:type: bool
|
||||
"""
|
||||
self.__debug = value
|
||||
if self.__debug:
|
||||
# if debug status is True, turn on debug logging
|
||||
for _, logger in self.logger.items():
|
||||
logger.setLevel(logging.DEBUG)
|
||||
# turn on httplib debug
|
||||
httplib.HTTPConnection.debuglevel = 1
|
||||
else:
|
||||
# if debug status is False, turn off debug logging,
|
||||
# setting log level to default `logging.WARNING`
|
||||
for _, logger in self.logger.items():
|
||||
logger.setLevel(logging.WARNING)
|
||||
# turn off httplib debug
|
||||
httplib.HTTPConnection.debuglevel = 0
|
||||
|
||||
@property
|
||||
def logger_format(self):
|
||||
"""The logger format.
|
||||
|
||||
        The logger_formatter will be updated when logger_format is set.
|
||||
|
||||
:param value: The format string.
|
||||
:type: str
|
||||
"""
|
||||
return self.__logger_format
|
||||
|
||||
@logger_format.setter
|
||||
def logger_format(self, value):
|
||||
"""The logger format.
|
||||
|
||||
        The logger_formatter will be updated when logger_format is set.
|
||||
|
||||
:param value: The format string.
|
||||
:type: str
|
||||
"""
|
||||
self.__logger_format = value
|
||||
self.logger_formatter = logging.Formatter(self.__logger_format)
|
||||
|
||||
def get_api_key_with_prefix(self, identifier, alias=None):
|
||||
"""Gets API key (with prefix if set).
|
||||
|
||||
:param identifier: The identifier of apiKey.
|
||||
:param alias: The alternative identifier of apiKey.
|
||||
:return: The token for api key authentication.
|
||||
"""
|
||||
if self.refresh_api_key_hook is not None:
|
||||
self.refresh_api_key_hook(self)
|
||||
key = self.api_key.get(
|
||||
identifier, self.api_key.get(alias) if alias is not None else None
|
||||
)
|
||||
if key:
|
||||
prefix = self.api_key_prefix.get(identifier)
|
||||
if prefix:
|
||||
return "%s %s" % (prefix, key)
|
||||
else:
|
||||
return key
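A usage sketch for `get_api_key_with_prefix`; the scheme name and token are made up, and since this client's `auth_settings()` is empty, API keys are not actually consumed anywhere:

```python
from agent_protocol_client import Configuration  # assumed package-level export

conf = Configuration(
    api_key={"api_key": "secret-token"},       # hypothetical scheme name and token
    api_key_prefix={"api_key": "Bearer"},
)
print(conf.get_api_key_with_prefix("api_key"))  # Bearer secret-token
```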
|
||||
|
||||
def get_basic_auth_token(self):
|
||||
"""Gets HTTP basic authentication header (string).
|
||||
|
||||
:return: The token for basic HTTP authentication.
|
||||
"""
|
||||
username = ""
|
||||
if self.username is not None:
|
||||
username = self.username
|
||||
password = ""
|
||||
if self.password is not None:
|
||||
password = self.password
|
||||
return urllib3.util.make_headers(basic_auth=username + ":" + password).get(
|
||||
"authorization"
|
||||
)
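`urllib3.util.make_headers` performs the Base64 encoding; for example, with made-up credentials:

```python
import urllib3

headers = urllib3.util.make_headers(basic_auth="user:pass")
print(headers)  # {'authorization': 'Basic dXNlcjpwYXNz'}
```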
|
||||
|
||||
def auth_settings(self):
|
||||
"""Gets Auth Settings dict for api client.
|
||||
|
||||
:return: The Auth Settings information dict.
|
||||
"""
|
||||
auth = {}
|
||||
return auth
|
||||
|
||||
def to_debug_report(self):
|
||||
"""Gets the essential information for debugging.
|
||||
|
||||
:return: The report for debugging.
|
||||
"""
|
||||
return (
|
||||
"Python SDK Debug Report:\n"
|
||||
"OS: {env}\n"
|
||||
"Python Version: {pyversion}\n"
|
||||
"Version of the API: v0.2\n"
|
||||
"SDK Package Version: 1.0.0".format(env=sys.platform, pyversion=sys.version)
|
||||
)
|
||||
|
||||
def get_host_settings(self):
|
||||
"""Gets an array of host settings
|
||||
|
||||
:return: An array of host settings
|
||||
"""
|
||||
return [
|
||||
{
|
||||
"url": "",
|
||||
"description": "No description provided",
|
||||
}
|
||||
]
|
||||
|
||||
def get_host_from_settings(self, index, variables=None, servers=None):
|
||||
"""Gets host URL based on the index and variables
|
||||
:param index: array index of the host settings
|
||||
:param variables: hash of variable and the corresponding value
|
||||
:param servers: an array of host settings or None
|
||||
:return: URL based on host settings
|
||||
"""
|
||||
if index is None:
|
||||
return self._base_path
|
||||
|
||||
variables = {} if variables is None else variables
|
||||
servers = self.get_host_settings() if servers is None else servers
|
||||
|
||||
try:
|
||||
server = servers[index]
|
||||
except IndexError:
|
||||
raise ValueError(
|
||||
"Invalid index {0} when selecting the host settings. "
|
||||
"Must be less than {1}".format(index, len(servers))
|
||||
)
|
||||
|
||||
url = server["url"]
|
||||
|
||||
# go through variables and replace placeholders
|
||||
for variable_name, variable in server.get("variables", {}).items():
|
||||
used_value = variables.get(variable_name, variable["default_value"])
|
||||
|
||||
if "enum_values" in variable and used_value not in variable["enum_values"]:
|
||||
raise ValueError(
|
||||
"The variable `{0}` in the host URL has invalid value "
|
||||
"{1}. Must be {2}.".format(
|
||||
variable_name, variables[variable_name], variable["enum_values"]
|
||||
)
|
||||
)
|
||||
|
||||
url = url.replace("{" + variable_name + "}", used_value)
|
||||
|
||||
return url
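This client's host settings define no template variables, but the substitution loop above can be illustrated with a hypothetical servers list:

```python
# Hypothetical host settings with one templated variable.
servers = [
    {
        "url": "https://{environment}.example.com/v1",
        "variables": {
            "environment": {
                "default_value": "api",
                "enum_values": ["api", "staging"],
            }
        },
    }
]

requested = {"environment": "staging"}
url = servers[0]["url"]
for name, variable in servers[0].get("variables", {}).items():
    value = requested.get(name, variable["default_value"])
    if "enum_values" in variable and value not in variable["enum_values"]:
        raise ValueError(f"invalid value {value!r} for variable {name!r}")
    url = url.replace("{" + name + "}", value)
print(url)  # https://staging.example.com/v1
```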
|
||||
|
||||
@property
|
||||
def host(self):
|
||||
"""Return generated host."""
|
||||
return self.get_host_from_settings(
|
||||
self.server_index, variables=self.server_variables
|
||||
)
|
||||
|
||||
@host.setter
|
||||
def host(self, value):
|
||||
"""Fix base path."""
|
||||
self._base_path = value
|
||||
self.server_index = None
|
||||
@@ -1,615 +0,0 @@
|
||||
# agbenchmark.agent_protocol_client.AgentApi
|
||||
|
||||
All URIs are relative to _http://localhost_
|
||||
|
||||
| Method | HTTP request | Description |
|
||||
| ---------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------------- |
|
||||
| [**create_agent_task**](AgentApi.md#create_agent_task) | **POST** /agent/tasks | Creates a task for the agent. |
|
||||
| [**download_agent_task_artifact**](AgentApi.md#download_agent_task_artifact) | **GET** /agent/tasks/{task_id}/artifacts/{artifact_id} | Download a specified artifact. |
|
||||
| [**execute_agent_task_step**](AgentApi.md#execute_agent_task_step) | **POST** /agent/tasks/{task_id}/steps | Execute a step in the specified agent task. |
|
||||
| [**get_agent_task**](AgentApi.md#get_agent_task) | **GET** /agent/tasks/{task_id} | Get details about a specified agent task. |
|
||||
| [**get_agent_task_step**](AgentApi.md#get_agent_task_step) | **GET** /agent/tasks/{task_id}/steps/{step_id} | Get details about a specified task step. |
|
||||
| [**list_agent_task_artifacts**](AgentApi.md#list_agent_task_artifacts) | **GET** /agent/tasks/{task_id}/artifacts | List all artifacts that have been created for the given task. |
|
||||
| [**list_agent_task_steps**](AgentApi.md#list_agent_task_steps) | **GET** /agent/tasks/{task_id}/steps | List all steps for the specified task. |
|
||||
| [**list_agent_tasks_ids**](AgentApi.md#list_agent_tasks_ids) | **GET** /agent/tasks | List all tasks that have been created for the agent. |
|
||||
| [**upload_agent_task_artifacts**](AgentApi.md#upload_agent_task_artifacts) | **POST** /agent/tasks/{task_id}/artifacts | Upload an artifact for the specified task. |
|
||||
|
||||
# **create_agent_task**
|
||||
|
||||
> Task create_agent_task(task_request_body=task_request_body)
|
||||
|
||||
Creates a task for the agent.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.models.task import Task
|
||||
from agbenchmark.agent_protocol_client.models.task_request_body import TaskRequestBody
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_request_body = agbenchmark.agent_protocol_client.TaskRequestBody() # TaskRequestBody | (optional)
|
||||
|
||||
try:
|
||||
# Creates a task for the agent.
|
||||
api_response = await api_instance.create_agent_task(task_request_body=task_request_body)
|
||||
print("The response of AgentApi->create_agent_task:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->create_agent_task: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| --------------------- | ----------------------------------------- | ----------- | ---------- |
|
||||
| **task_request_body** | [**TaskRequestBody**](TaskRequestBody.md) | | [optional] |
|
||||
|
||||
### Return type
|
||||
|
||||
[**Task**](Task.md)
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: application/json
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------------ | ---------------- |
|
||||
| **200** | A new agent task was successfully created. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **download_agent_task_artifact**
|
||||
|
||||
> bytearray download_agent_task_artifact(task_id, artifact_id)
|
||||
|
||||
Download a specified artifact.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
artifact_id = 'artifact_id_example' # str | ID of the artifact
|
||||
|
||||
try:
|
||||
# Download a specified artifact.
|
||||
api_response = await api_instance.download_agent_task_artifact(task_id, artifact_id)
|
||||
print("The response of AgentApi->download_agent_task_artifact:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->download_agent_task_artifact: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| --------------- | ------- | ------------------ | ----- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
| **artifact_id** | **str** | ID of the artifact |
|
||||
|
||||
### Return type
|
||||
|
||||
**bytearray**
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: Not defined
|
||||
- **Accept**: application/octet-stream
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------- | ---------------- |
|
||||
| **200** | Returned the content of the artifact. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **execute_agent_task_step**
|
||||
|
||||
> Step execute_agent_task_step(task_id, step_request_body=step_request_body)
|
||||
|
||||
Execute a step in the specified agent task.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.models.step import Step
|
||||
from agbenchmark.agent_protocol_client.models.step_request_body import StepRequestBody
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
step_request_body = agbenchmark.agent_protocol_client.StepRequestBody() # StepRequestBody | (optional)
|
||||
|
||||
try:
|
||||
# Execute a step in the specified agent task.
|
||||
api_response = await api_instance.execute_agent_task_step(task_id, step_request_body=step_request_body)
|
||||
print("The response of AgentApi->execute_agent_task_step:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->execute_agent_task_step: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| --------------------- | ----------------------------------------- | -------------- | ---------- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
| **step_request_body** | [**StepRequestBody**](StepRequestBody.md) | | [optional] |
|
||||
|
||||
### Return type
|
||||
|
||||
[**Step**](Step.md)
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: application/json
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | --------------------------------- | ---------------- |
|
||||
| **200** | Executed step for the agent task. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **get_agent_task**
|
||||
|
||||
> Task get_agent_task(task_id)
|
||||
|
||||
Get details about a specified agent task.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.models.task import Task
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
|
||||
try:
|
||||
# Get details about a specified agent task.
|
||||
api_response = await api_instance.get_agent_task(task_id)
|
||||
print("The response of AgentApi->get_agent_task:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->get_agent_task: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| ----------- | ------- | -------------- | ----- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
|
||||
### Return type
|
||||
|
||||
[**Task**](Task.md)
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: Not defined
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------- | ---------------- |
|
||||
| **200** | Returned details about an agent task. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **get_agent_task_step**
|
||||
|
||||
> Step get_agent_task_step(task_id, step_id)
|
||||
|
||||
Get details about a specified task step.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.models.step import Step
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
step_id = 'step_id_example' # str | ID of the step
|
||||
|
||||
try:
|
||||
# Get details about a specified task step.
|
||||
api_response = await api_instance.get_agent_task_step(task_id, step_id)
|
||||
print("The response of AgentApi->get_agent_task_step:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->get_agent_task_step: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| ----------- | ------- | -------------- | ----- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
| **step_id** | **str** | ID of the step |
|
||||
|
||||
### Return type
|
||||
|
||||
[**Step**](Step.md)
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: Not defined
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------------ | ---------------- |
|
||||
| **200** | Returned details about an agent task step. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **list_agent_task_artifacts**
|
||||
|
||||
> List[Artifact] list_agent_task_artifacts(task_id)
|
||||
|
||||
List all artifacts that have been created for the given task.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
|
||||
try:
|
||||
# List all artifacts that have been created for the given task.
|
||||
api_response = await api_instance.list_agent_task_artifacts(task_id)
|
||||
print("The response of AgentApi->list_agent_task_artifacts:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->list_agent_task_artifacts: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| ----------- | ------- | -------------- | ----- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
|
||||
### Return type
|
||||
|
||||
[**List[Artifact]**](Artifact.md)
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: Not defined
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------- | ---------------- |
|
||||
| **200** | Returned the content of the artifact. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **list_agent_task_steps**
|
||||
|
||||
> List[str] list_agent_task_steps(task_id)
|
||||
|
||||
List all steps for the specified task.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
|
||||
try:
|
||||
# List all steps for the specified task.
|
||||
api_response = await api_instance.list_agent_task_steps(task_id)
|
||||
print("The response of AgentApi->list_agent_task_steps:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->list_agent_task_steps: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| ----------- | ------- | -------------- | ----- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
|
||||
### Return type
|
||||
|
||||
**List[str]**
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: Not defined
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------------------------------- | ---------------- |
|
||||
| **200** | Returned list of agent's step IDs for the specified task. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **list_agent_tasks_ids**
|
||||
|
||||
> List[str] list_agent_tasks_ids()
|
||||
|
||||
List all tasks that have been created for the agent.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
|
||||
try:
|
||||
# List all tasks that have been created for the agent.
|
||||
api_response = await api_instance.list_agent_tasks_ids()
|
||||
print("The response of AgentApi->list_agent_tasks_ids:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->list_agent_tasks_ids: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
This endpoint does not need any parameter.
|
||||
|
||||
### Return type
|
||||
|
||||
**List[str]**
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: Not defined
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | -------------------------------------- | ---------------- |
|
||||
| **200** | Returned list of agent's task IDs. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
|
||||
# **upload_agent_task_artifacts**
|
||||
|
||||
> Artifact upload_agent_task_artifacts(task_id, file, relative_path=relative_path)
|
||||
|
||||
Upload an artifact for the specified task.
|
||||
|
||||
### Example
|
||||
|
||||
```python
|
||||
import time
|
||||
import os
|
||||
import agent_protocol_client
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
from agbenchmark.agent_protocol_client.rest import ApiException
|
||||
from pprint import pprint
|
||||
|
||||
# Defining the host is optional and defaults to http://localhost
|
||||
# See configuration.py for a list of all supported configuration parameters.
|
||||
configuration = agbenchmark.agent_protocol_client.Configuration(
|
||||
host = "http://localhost"
|
||||
)
|
||||
|
||||
|
||||
# Enter a context with an instance of the API client
|
||||
async with agbenchmark.agent_protocol_client.ApiClient(configuration) as api_client:
|
||||
# Create an instance of the API class
|
||||
api_instance = agbenchmark.agent_protocol_client.AgentApi(api_client)
|
||||
task_id = 'task_id_example' # str | ID of the task
|
||||
file = None # bytearray | File to upload.
|
||||
relative_path = 'relative_path_example' # str | Relative path of the artifact in the agent's workspace. (optional)
|
||||
|
||||
try:
|
||||
# Upload an artifact for the specified task.
|
||||
api_response = await api_instance.upload_agent_task_artifacts(task_id, file, relative_path=relative_path)
|
||||
print("The response of AgentApi->upload_agent_task_artifacts:\n")
|
||||
pprint(api_response)
|
||||
except Exception as e:
|
||||
print("Exception when calling AgentApi->upload_agent_task_artifacts: %s\n" % e)
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Name | Type | Description | Notes |
|
||||
| ----------------- | ------------- | ----------------------------------------------------------- | ---------- |
|
||||
| **task_id** | **str** | ID of the task |
|
||||
| **file** | **bytearray** | File to upload. |
|
||||
| **relative_path** | **str** | Relative path of the artifact in the agent's workspace. | [optional] |
|
||||
|
||||
### Return type
|
||||
|
||||
[**Artifact**](Artifact.md)
|
||||
|
||||
### Authorization
|
||||
|
||||
No authorization required
|
||||
|
||||
### HTTP request headers
|
||||
|
||||
- **Content-Type**: multipart/form-data
|
||||
- **Accept**: application/json
|
||||
|
||||
### HTTP response details
|
||||
|
||||
| Status code | Description | Response headers |
|
||||
| ----------- | ------------------------------------- | ---------------- |
|
||||
| **200** | Returned the content of the artifact. | - |
|
||||
| **0** | Internal Server Error | - |
|
||||
|
||||
[[Back to top]](#) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to Model list]](../README.md#documentation-for-models) [[Back to README]](../README.md)
|
||||
@@ -1,154 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
class OpenApiException(Exception):
|
||||
"""The base exception class for all OpenAPIExceptions"""
|
||||
|
||||
|
||||
class ApiTypeError(OpenApiException, TypeError):
|
||||
def __init__(self, msg, path_to_item=None, valid_classes=None, key_type=None):
|
||||
"""Raises an exception for TypeErrors
|
||||
|
||||
Args:
|
||||
msg (str): the exception message
|
||||
|
||||
Keyword Args:
|
||||
            path_to_item (list): a list of keys and indices to get to the
|
||||
current_item
|
||||
None if unset
|
||||
valid_classes (tuple): the primitive classes that current item
|
||||
should be an instance of
|
||||
None if unset
|
||||
key_type (bool): False if our value is a value in a dict
|
||||
True if it is a key in a dict
|
||||
False if our item is an item in a list
|
||||
None if unset
|
||||
"""
|
||||
self.path_to_item = path_to_item
|
||||
self.valid_classes = valid_classes
|
||||
self.key_type = key_type
|
||||
full_msg = msg
|
||||
if path_to_item:
|
||||
full_msg = "{0} at {1}".format(msg, render_path(path_to_item))
|
||||
super(ApiTypeError, self).__init__(full_msg)
|
||||
|
||||
|
||||
class ApiValueError(OpenApiException, ValueError):
|
||||
def __init__(self, msg, path_to_item=None):
|
||||
"""
|
||||
Args:
|
||||
msg (str): the exception message
|
||||
|
||||
Keyword Args:
|
||||
path_to_item (list) the path to the exception in the
|
||||
received_data dict. None if unset
|
||||
"""
|
||||
|
||||
self.path_to_item = path_to_item
|
||||
full_msg = msg
|
||||
if path_to_item:
|
||||
full_msg = "{0} at {1}".format(msg, render_path(path_to_item))
|
||||
super(ApiValueError, self).__init__(full_msg)
|
||||
|
||||
|
||||
class ApiAttributeError(OpenApiException, AttributeError):
|
||||
def __init__(self, msg, path_to_item=None):
|
||||
"""
|
||||
Raised when an attribute reference or assignment fails.
|
||||
|
||||
Args:
|
||||
msg (str): the exception message
|
||||
|
||||
Keyword Args:
|
||||
path_to_item (None/list) the path to the exception in the
|
||||
received_data dict
|
||||
"""
|
||||
self.path_to_item = path_to_item
|
||||
full_msg = msg
|
||||
if path_to_item:
|
||||
full_msg = "{0} at {1}".format(msg, render_path(path_to_item))
|
||||
super(ApiAttributeError, self).__init__(full_msg)
|
||||
|
||||
|
||||
class ApiKeyError(OpenApiException, KeyError):
|
||||
def __init__(self, msg, path_to_item=None):
|
||||
"""
|
||||
Args:
|
||||
msg (str): the exception message
|
||||
|
||||
Keyword Args:
|
||||
path_to_item (None/list) the path to the exception in the
|
||||
received_data dict
|
||||
"""
|
||||
self.path_to_item = path_to_item
|
||||
full_msg = msg
|
||||
if path_to_item:
|
||||
full_msg = "{0} at {1}".format(msg, render_path(path_to_item))
|
||||
super(ApiKeyError, self).__init__(full_msg)
|
||||
|
||||
|
||||
class ApiException(OpenApiException):
|
||||
def __init__(self, status=None, reason=None, http_resp=None):
|
||||
if http_resp:
|
||||
self.status = http_resp.status
|
||||
self.reason = http_resp.reason
|
||||
self.body = http_resp.data
|
||||
self.headers = http_resp.getheaders()
|
||||
else:
|
||||
self.status = status
|
||||
self.reason = reason
|
||||
self.body = None
|
||||
self.headers = None
|
||||
|
||||
def __str__(self):
|
||||
"""Custom error messages for exception"""
|
||||
error_message = "({0})\n" "Reason: {1}\n".format(self.status, self.reason)
|
||||
if self.headers:
|
||||
error_message += "HTTP response headers: {0}\n".format(self.headers)
|
||||
|
||||
if self.body:
|
||||
error_message += "HTTP response body: {0}\n".format(self.body)
|
||||
|
||||
return error_message
|
||||
|
||||
|
||||
class NotFoundException(ApiException):
|
||||
def __init__(self, status=None, reason=None, http_resp=None):
|
||||
super(NotFoundException, self).__init__(status, reason, http_resp)
|
||||
|
||||
|
||||
class UnauthorizedException(ApiException):
|
||||
def __init__(self, status=None, reason=None, http_resp=None):
|
||||
super(UnauthorizedException, self).__init__(status, reason, http_resp)
|
||||
|
||||
|
||||
class ForbiddenException(ApiException):
|
||||
def __init__(self, status=None, reason=None, http_resp=None):
|
||||
super(ForbiddenException, self).__init__(status, reason, http_resp)
|
||||
|
||||
|
||||
class ServiceException(ApiException):
|
||||
def __init__(self, status=None, reason=None, http_resp=None):
|
||||
super(ServiceException, self).__init__(status, reason, http_resp)
|
||||
|
||||
|
||||
def render_path(path_to_item):
|
||||
"""Returns a string representation of a path"""
|
||||
result = ""
|
||||
for pth in path_to_item:
|
||||
if isinstance(pth, int):
|
||||
result += "[{0}]".format(pth)
|
||||
else:
|
||||
result += "['{0}']".format(pth)
|
||||
return result
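`render_path` turns a `path_to_item` list into a readable accessor string. A small usage sketch (the function is reproduced inline so the snippet runs on its own):

```python
def render_path(path_to_item):
    """Returns a string representation of a path (copy of the function above)."""
    result = ""
    for pth in path_to_item:
        if isinstance(pth, int):
            result += "[{0}]".format(pth)
        else:
            result += "['{0}']".format(pth)
    return result

print(render_path(["data", 0, "file_name"]))  # ['data'][0]['file_name']
```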
|
||||
@@ -1,25 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
# flake8: noqa
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
# import models into model package
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
from agbenchmark.agent_protocol_client.models.artifacts import Artifacts
|
||||
from agbenchmark.agent_protocol_client.models.pagination import Pagination
|
||||
from agbenchmark.agent_protocol_client.models.step import Step
|
||||
from agbenchmark.agent_protocol_client.models.step_all_of import StepAllOf
|
||||
from agbenchmark.agent_protocol_client.models.step_request_body import StepRequestBody
|
||||
from agbenchmark.agent_protocol_client.models.task import Task
|
||||
from agbenchmark.agent_protocol_client.models.task_all_of import TaskAllOf
|
||||
from agbenchmark.agent_protocol_client.models.task_request_body import TaskRequestBody
|
||||
@@ -1,72 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictStr
|
||||
|
||||
|
||||
class Artifact(BaseModel):
|
||||
"""
|
||||
Artifact that the task has produced.
|
||||
"""
|
||||
|
||||
artifact_id: StrictStr = Field(..., description="ID of the artifact.")
|
||||
file_name: StrictStr = Field(..., description="Filename of the artifact.")
|
||||
relative_path: Optional[StrictStr] = Field(
|
||||
None, description="Relative path of the artifact in the agent's workspace."
|
||||
)
|
||||
__properties = ["artifact_id", "file_name", "relative_path"]
|
||||
created_at: StrictStr = Field(..., description="Creation date of the artifact.")
|
||||
# modified_at: StrictStr = Field(..., description="Modification date of the artifact.")
|
||||
agent_created: bool = Field(..., description="True if created by the agent")
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> Artifact:
|
||||
"""Create an instance of Artifact from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> Artifact:
|
||||
"""Create an instance of Artifact from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return Artifact.parse_obj(obj)
|
||||
|
||||
_obj = Artifact.parse_obj(
|
||||
{
|
||||
"artifact_id": obj.get("artifact_id"),
|
||||
"file_name": obj.get("file_name"),
|
||||
"relative_path": obj.get("relative_path"),
|
||||
"created_at": obj.get("created_at"),
|
||||
"modified_at": obj.get("modified_at"),
|
||||
"agent_created": obj.get("agent_created"),
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,77 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
from agbenchmark.agent_protocol_client.models.pagination import Pagination
|
||||
|
||||
|
||||
class Artifacts(BaseModel):
|
||||
"""
|
||||
Artifacts that the task has produced.
|
||||
"""
|
||||
|
||||
artifacts: list[Artifact]
|
||||
pagination: Pagination
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> Artifacts:
|
||||
"""Create an instance of Artifacts from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> Artifacts:
|
||||
"""Create an instance of Artifacts from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return Artifacts.parse_obj(obj)
|
||||
|
||||
_obj = Artifacts.parse_obj(
|
||||
{
|
||||
"artifacts": obj.get("artifacts"),
|
||||
"pagination": obj.get("pagination"),
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
|
||||
|
||||
Artifacts.update_forward_refs()
|
||||
@@ -1,75 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
|
||||
class Pagination(BaseModel):
|
||||
"""
|
||||
Pagination that the task has produced.
|
||||
"""
|
||||
|
||||
total_items: int
|
||||
total_pages: int
|
||||
current_page: int
|
||||
page_size: int
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> Pagination:
|
||||
"""Create an instance of Pagination from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> Pagination:
|
||||
"""Create an instance of Pagination from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return Pagination.parse_obj(obj)
|
||||
|
||||
_obj = Pagination.parse_obj(
|
||||
{
|
||||
"total_items": obj.get("total_items"),
|
||||
"total_pages": obj.get("total_pages"),
|
||||
"current_page": obj.get("current_page"),
|
||||
"page_size": obj.get("page_size"),
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,146 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Any, Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictBool, StrictStr, conlist, validator
|
||||
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
|
||||
|
||||
class Step(BaseModel):
|
||||
"""
|
||||
Step
|
||||
"""
|
||||
|
||||
input: Optional[StrictStr] = Field(None, description="Input prompt for the step.")
|
||||
additional_input: Optional[Any] = Field(
|
||||
None, description="Input parameters for the task step. Any value is allowed."
|
||||
)
|
||||
task_id: StrictStr = Field(
|
||||
..., description="The ID of the task this step belongs to."
|
||||
)
|
||||
step_id: StrictStr = Field(..., description="The ID of the task step.")
|
||||
name: Optional[StrictStr] = Field(None, description="The name of the task step.")
|
||||
status: StrictStr = Field(..., description="The status of the task step.")
|
||||
output: Optional[StrictStr] = Field(None, description="Output of the task step.")
|
||||
additional_output: Optional[Any] = Field(
|
||||
None,
|
||||
description="Output that the task step has produced. Any value is allowed.",
|
||||
)
|
||||
artifacts: conlist(Artifact) = Field(
|
||||
..., description="A list of artifacts that the step has produced."
|
||||
)
|
||||
is_last: Optional[StrictBool] = Field(
|
||||
False, description="Whether this is the last step in the task."
|
||||
)
|
||||
__properties = [
|
||||
"input",
|
||||
"additional_input",
|
||||
"task_id",
|
||||
"step_id",
|
||||
"name",
|
||||
"status",
|
||||
"output",
|
||||
"additional_output",
|
||||
"artifacts",
|
||||
"is_last",
|
||||
]
|
||||
|
||||
@validator("status")
|
||||
def status_validate_enum(cls, value):
|
||||
"""Validates the enum"""
|
||||
if value not in ("created", "completed"):
|
||||
raise ValueError("must be one of enum values ('created', 'completed')")
|
||||
return value
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> Step:
|
||||
"""Create an instance of Step from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# override the default output from pydantic by calling `to_dict()` of each item in artifacts (list)
|
||||
_items = []
|
||||
if self.artifacts:
|
||||
for _item in self.artifacts:
|
||||
if _item:
|
||||
_items.append(_item.to_dict())
|
||||
_dict["artifacts"] = _items
|
||||
# set to None if additional_input (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if self.additional_input is None and "additional_input" in self.__fields_set__:
|
||||
_dict["additional_input"] = None
|
||||
|
||||
# set to None if additional_output (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if (
|
||||
self.additional_output is None
|
||||
and "additional_output" in self.__fields_set__
|
||||
):
|
||||
_dict["additional_output"] = None
|
||||
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> Step:
|
||||
"""Create an instance of Step from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return Step.parse_obj(obj)
|
||||
|
||||
_obj = Step.parse_obj(
|
||||
{
|
||||
"input": obj.get("input"),
|
||||
"additional_input": obj.get("additional_input"),
|
||||
"task_id": obj.get("task_id"),
|
||||
"step_id": obj.get("step_id"),
|
||||
"name": obj.get("name"),
|
||||
"status": obj.get("status"),
|
||||
"output": obj.get("output"),
|
||||
"additional_output": obj.get("additional_output"),
|
||||
"artifacts": [
|
||||
Artifact.from_dict(_item) for _item in obj.get("artifacts")
|
||||
]
|
||||
if obj.get("artifacts") is not None
|
||||
else None,
|
||||
"is_last": obj.get("is_last")
|
||||
if obj.get("is_last") is not None
|
||||
else False,
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,133 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Any, Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictBool, StrictStr, conlist, validator
|
||||
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
|
||||
|
||||
class StepAllOf(BaseModel):
|
||||
"""
|
||||
StepAllOf
|
||||
"""
|
||||
|
||||
task_id: StrictStr = Field(
|
||||
..., description="The ID of the task this step belongs to."
|
||||
)
|
||||
step_id: StrictStr = Field(..., description="The ID of the task step.")
|
||||
name: Optional[StrictStr] = Field(None, description="The name of the task step.")
|
||||
status: StrictStr = Field(..., description="The status of the task step.")
|
||||
output: Optional[StrictStr] = Field(None, description="Output of the task step.")
|
||||
additional_output: Optional[Any] = Field(
|
||||
None,
|
||||
description="Output that the task step has produced. Any value is allowed.",
|
||||
)
|
||||
artifacts: conlist(Artifact) = Field(
|
||||
..., description="A list of artifacts that the step has produced."
|
||||
)
|
||||
is_last: Optional[StrictBool] = Field(
|
||||
False, description="Whether this is the last step in the task."
|
||||
)
|
||||
__properties = [
|
||||
"task_id",
|
||||
"step_id",
|
||||
"name",
|
||||
"status",
|
||||
"output",
|
||||
"additional_output",
|
||||
"artifacts",
|
||||
"is_last",
|
||||
]
|
||||
|
||||
@validator("status")
|
||||
def status_validate_enum(cls, value):
|
||||
"""Validates the enum"""
|
||||
if value not in ("created", "completed"):
|
||||
raise ValueError("must be one of enum values ('created', 'completed')")
|
||||
return value
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> StepAllOf:
|
||||
"""Create an instance of StepAllOf from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# override the default output from pydantic by calling `to_dict()` of each item in artifacts (list)
|
||||
_items = []
|
||||
if self.artifacts:
|
||||
for _item in self.artifacts:
|
||||
if _item:
|
||||
_items.append(_item.to_dict())
|
||||
_dict["artifacts"] = _items
|
||||
# set to None if additional_output (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if (
|
||||
self.additional_output is None
|
||||
and "additional_output" in self.__fields_set__
|
||||
):
|
||||
_dict["additional_output"] = None
|
||||
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> StepAllOf:
|
||||
"""Create an instance of StepAllOf from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return StepAllOf.parse_obj(obj)
|
||||
|
||||
_obj = StepAllOf.parse_obj(
|
||||
{
|
||||
"task_id": obj.get("task_id"),
|
||||
"step_id": obj.get("step_id"),
|
||||
"name": obj.get("name"),
|
||||
"status": obj.get("status"),
|
||||
"output": obj.get("output"),
|
||||
"additional_output": obj.get("additional_output"),
|
||||
"artifacts": [
|
||||
Artifact.from_dict(_item) for _item in obj.get("artifacts")
|
||||
]
|
||||
if obj.get("artifacts") is not None
|
||||
else None,
|
||||
"is_last": obj.get("is_last")
|
||||
if obj.get("is_last") is not None
|
||||
else False,
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,77 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Any, Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictStr
|
||||
|
||||
|
||||
class StepRequestBody(BaseModel):
|
||||
"""
|
||||
Body of the task request.
|
||||
"""
|
||||
|
||||
input: Optional[StrictStr] = Field(None, description="Input prompt for the step.")
|
||||
additional_input: Optional[Any] = Field(
|
||||
None, description="Input parameters for the task step. Any value is allowed."
|
||||
)
|
||||
__properties = ["input", "additional_input"]
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> StepRequestBody:
|
||||
"""Create an instance of StepRequestBody from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# set to None if additional_input (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if self.additional_input is None and "additional_input" in self.__fields_set__:
|
||||
_dict["additional_input"] = None
|
||||
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> StepRequestBody:
|
||||
"""Create an instance of StepRequestBody from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return StepRequestBody.parse_obj(obj)
|
||||
|
||||
_obj = StepRequestBody.parse_obj(
|
||||
{"input": obj.get("input"), "additional_input": obj.get("additional_input")}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,89 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v1
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Any, Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictBool, conlist
|
||||
|
||||
|
||||
class StepResult(BaseModel):
|
||||
"""
|
||||
Result of the task step.
|
||||
"""
|
||||
|
||||
output: Optional[Any] = Field(
|
||||
None,
|
||||
description="Output that the task step has produced. Any value is allowed.",
|
||||
)
|
||||
artifacts: conlist(Any) = Field(
|
||||
..., description="A list of artifacts that the step has produced."
|
||||
)
|
||||
is_last: Optional[StrictBool] = Field(
|
||||
False, description="Whether this is the last step in the task."
|
||||
)
|
||||
__properties = ["output", "artifacts", "is_last"]
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> StepResult:
|
||||
"""Create an instance of StepResult from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# set to None if output (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if self.output is None and "output" in self.__fields_set__:
|
||||
_dict["output"] = None
|
||||
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> StepResult:
|
||||
"""Create an instance of StepResult from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return StepResult.parse_obj(obj)
|
||||
|
||||
_obj = StepResult.parse_obj(
|
||||
{
|
||||
"output": obj.get("output"),
|
||||
"artifacts": obj.get("artifacts"),
|
||||
"is_last": obj.get("is_last")
|
||||
if obj.get("is_last") is not None
|
||||
else False,
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,99 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Any, Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictStr, conlist
|
||||
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
|
||||
|
||||
class Task(BaseModel):
|
||||
"""
|
||||
Task
|
||||
"""
|
||||
|
||||
input: Optional[StrictStr] = Field(None, description="Input prompt for the task.")
|
||||
additional_input: Optional[Any] = Field(
|
||||
None, description="Input parameters for the task. Any value is allowed."
|
||||
)
|
||||
task_id: StrictStr = Field(..., description="The ID of the task.")
|
||||
artifacts: conlist(Artifact) = Field(
|
||||
..., description="A list of artifacts that the task has produced."
|
||||
)
|
||||
__properties = ["input", "additional_input", "task_id", "artifacts"]
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> Task:
|
||||
"""Create an instance of Task from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# override the default output from pydantic by calling `to_dict()` of each item in artifacts (list)
|
||||
_items = []
|
||||
if self.artifacts:
|
||||
for _item in self.artifacts:
|
||||
if _item:
|
||||
_items.append(_item.to_dict())
|
||||
_dict["artifacts"] = _items
|
||||
# set to None if additional_input (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if self.additional_input is None and "additional_input" in self.__fields_set__:
|
||||
_dict["additional_input"] = None
|
||||
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> Task:
|
||||
"""Create an instance of Task from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return Task.parse_obj(obj)
|
||||
|
||||
_obj = Task.parse_obj(
|
||||
{
|
||||
"input": obj.get("input"),
|
||||
"additional_input": obj.get("additional_input"),
|
||||
"task_id": obj.get("task_id"),
|
||||
"artifacts": [
|
||||
Artifact.from_dict(_item) for _item in obj.get("artifacts")
|
||||
]
|
||||
if obj.get("artifacts") is not None
|
||||
else None,
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,87 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
|
||||
from pydantic import BaseModel, Field, StrictStr, conlist
|
||||
|
||||
from agbenchmark.agent_protocol_client.models.artifact import Artifact
|
||||
|
||||
|
||||
class TaskAllOf(BaseModel):
|
||||
"""
|
||||
Definition of an agent task.
|
||||
"""
|
||||
|
||||
task_id: StrictStr = Field(..., description="The ID of the task.")
|
||||
artifacts: conlist(Artifact) = Field(
|
||||
..., description="A list of artifacts that the task has produced."
|
||||
)
|
||||
__properties = ["task_id", "artifacts"]
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> TaskAllOf:
|
||||
"""Create an instance of TaskAllOf from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# override the default output from pydantic by calling `to_dict()` of each item in artifacts (list)
|
||||
_items = []
|
||||
if self.artifacts:
|
||||
for _item in self.artifacts:
|
||||
if _item:
|
||||
_items.append(_item.to_dict())
|
||||
_dict["artifacts"] = _items
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> TaskAllOf:
|
||||
"""Create an instance of TaskAllOf from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return TaskAllOf.parse_obj(obj)
|
||||
|
||||
_obj = TaskAllOf.parse_obj(
|
||||
{
|
||||
"task_id": obj.get("task_id"),
|
||||
"artifacts": [
|
||||
Artifact.from_dict(_item) for _item in obj.get("artifacts")
|
||||
]
|
||||
if obj.get("artifacts") is not None
|
||||
else None,
|
||||
}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,77 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import pprint
|
||||
import re # noqa: F401
|
||||
from typing import Any, Optional
|
||||
|
||||
from pydantic import BaseModel, Field, StrictStr
|
||||
|
||||
|
||||
class TaskRequestBody(BaseModel):
|
||||
"""
|
||||
Body of the task request.
|
||||
"""
|
||||
|
||||
input: Optional[StrictStr] = Field(None, description="Input prompt for the task.")
|
||||
additional_input: Optional[Any] = Field(
|
||||
None, description="Input parameters for the task. Any value is allowed."
|
||||
)
|
||||
__properties = ["input", "additional_input"]
|
||||
|
||||
class Config:
|
||||
"""Pydantic configuration"""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
validate_assignment = True
|
||||
|
||||
def to_str(self) -> str:
|
||||
"""Returns the string representation of the model using alias"""
|
||||
return pprint.pformat(self.dict(by_alias=True))
|
||||
|
||||
def to_json(self) -> str:
|
||||
"""Returns the JSON representation of the model using alias"""
|
||||
return json.dumps(self.to_dict())
|
||||
|
||||
@classmethod
|
||||
def from_json(cls, json_str: str) -> TaskRequestBody:
|
||||
"""Create an instance of TaskRequestBody from a JSON string"""
|
||||
return cls.from_dict(json.loads(json_str))
|
||||
|
||||
def to_dict(self):
|
||||
"""Returns the dictionary representation of the model using alias"""
|
||||
_dict = self.dict(by_alias=True, exclude={}, exclude_none=True)
|
||||
# set to None if additional_input (nullable) is None
|
||||
# and __fields_set__ contains the field
|
||||
if self.additional_input is None and "additional_input" in self.__fields_set__:
|
||||
_dict["additional_input"] = None
|
||||
|
||||
return _dict
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, obj: dict) -> TaskRequestBody:
|
||||
"""Create an instance of TaskRequestBody from a dict"""
|
||||
if obj is None:
|
||||
return None
|
||||
|
||||
if not isinstance(obj, dict):
|
||||
return TaskRequestBody.parse_obj(obj)
|
||||
|
||||
_obj = TaskRequestBody.parse_obj(
|
||||
{"input": obj.get("input"), "additional_input": obj.get("additional_input")}
|
||||
)
|
||||
return _obj
|
||||
@@ -1,311 +0,0 @@
|
||||
# coding: utf-8
|
||||
|
||||
"""
|
||||
Agent Communication Protocol
|
||||
|
||||
Specification of the API protocol for communication with an agent. # noqa: E501
|
||||
|
||||
The version of the OpenAPI document: v0.2
|
||||
Generated by OpenAPI Generator (https://openapi-generator.tech)
|
||||
|
||||
Do not edit the class manually.
|
||||
"""
|
||||
|
||||
|
||||
import io
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
import ssl
|
||||
from urllib.parse import urlencode
|
||||
|
||||
import aiohttp
|
||||
|
||||
from agbenchmark.agent_protocol_client.exceptions import ApiException, ApiValueError
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class RESTResponse(io.IOBase):
|
||||
def __init__(self, resp, data):
|
||||
self.aiohttp_response = resp
|
||||
self.status = resp.status
|
||||
self.reason = resp.reason
|
||||
self.data = data
|
||||
|
||||
def getheaders(self):
|
||||
"""Returns a CIMultiDictProxy of the response headers."""
|
||||
return self.aiohttp_response.headers
|
||||
|
||||
def getheader(self, name, default=None):
|
||||
"""Returns a given response header."""
|
||||
return self.aiohttp_response.headers.get(name, default)
|
||||
|
||||
|
||||
class RESTClientObject(object):
|
||||
def __init__(self, configuration, pools_size=4, maxsize=None):
|
||||
# maxsize is number of requests to host that are allowed in parallel
|
||||
if maxsize is None:
|
||||
maxsize = configuration.connection_pool_maxsize
|
||||
|
||||
ssl_context = ssl.create_default_context(cafile=configuration.ssl_ca_cert)
|
||||
if configuration.cert_file:
|
||||
ssl_context.load_cert_chain(
|
||||
configuration.cert_file, keyfile=configuration.key_file
|
||||
)
|
||||
|
||||
if not configuration.verify_ssl:
|
||||
ssl_context.check_hostname = False
|
||||
ssl_context.verify_mode = ssl.CERT_NONE
|
||||
|
||||
connector = aiohttp.TCPConnector(limit=maxsize, ssl=ssl_context)
|
||||
|
||||
self.proxy = configuration.proxy
|
||||
self.proxy_headers = configuration.proxy_headers
|
||||
|
||||
# https pool manager
|
||||
self.pool_manager = aiohttp.ClientSession(connector=connector, trust_env=True)
|
||||
|
||||
async def close(self):
|
||||
await self.pool_manager.close()
|
||||
|
||||
async def request(
|
||||
self,
|
||||
method,
|
||||
url,
|
||||
query_params=None,
|
||||
headers=None,
|
||||
body=None,
|
||||
post_params=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
"""Execute request
|
||||
|
||||
:param method: http request method
|
||||
:param url: http request url
|
||||
:param query_params: query parameters in the url
|
||||
:param headers: http request headers
|
||||
:param body: request json body, for `application/json`
|
||||
:param post_params: request post parameters,
|
||||
`application/x-www-form-urlencoded`
|
||||
and `multipart/form-data`
|
||||
:param _preload_content: this is a non-applicable field for
|
||||
the AiohttpClient.
|
||||
:param _request_timeout: timeout setting for this request. If one
|
||||
number provided, it will be total request
|
||||
timeout. It can also be a pair (tuple) of
|
||||
(connection, read) timeouts.
|
||||
"""
|
||||
method = method.upper()
|
||||
assert method in ["GET", "HEAD", "DELETE", "POST", "PUT", "PATCH", "OPTIONS"]
|
||||
|
||||
if post_params and body:
|
||||
raise ApiValueError(
|
||||
"body parameter cannot be used with post_params parameter."
|
||||
)
|
||||
|
||||
post_params = post_params or {}
|
||||
headers = headers or {}
|
||||
# url already contains the URL query string
|
||||
# so reset query_params to empty dict
|
||||
query_params = {}
|
||||
timeout = _request_timeout or 5 * 60
|
||||
|
||||
if "Content-Type" not in headers:
|
||||
headers["Content-Type"] = "application/json"
|
||||
|
||||
args = {"method": method, "url": url, "timeout": timeout, "headers": headers}
|
||||
|
||||
if self.proxy:
|
||||
args["proxy"] = self.proxy
|
||||
if self.proxy_headers:
|
||||
args["proxy_headers"] = self.proxy_headers
|
||||
|
||||
if query_params:
|
||||
args["url"] += "?" + urlencode(query_params)
|
||||
|
||||
# For `POST`, `PUT`, `PATCH`, `OPTIONS`, `DELETE`
|
||||
if method in ["POST", "PUT", "PATCH", "OPTIONS", "DELETE"]:
|
||||
if re.search("json", headers["Content-Type"], re.IGNORECASE):
|
||||
if body is not None:
|
||||
body = json.dumps(body)
|
||||
args["data"] = body
|
||||
elif (
|
||||
headers["Content-Type"] == "application/x-www-form-urlencoded"
|
||||
): # noqa: E501
|
||||
args["data"] = aiohttp.FormData(post_params)
|
||||
elif headers["Content-Type"] == "multipart/form-data":
|
||||
# must del headers['Content-Type'], or the correct
|
||||
# Content-Type which generated by aiohttp
|
||||
del headers["Content-Type"]
|
||||
data = aiohttp.FormData()
|
||||
for param in post_params:
|
||||
k, v = param
|
||||
if isinstance(v, tuple) and len(v) == 3:
|
||||
data.add_field(k, value=v[1], filename=v[0], content_type=v[2])
|
||||
else:
|
||||
data.add_field(k, v)
|
||||
args["data"] = data
|
||||
|
||||
# Pass a `bytes` parameter directly in the body to support
|
||||
# other content types than Json when `body` argument is provided
|
||||
# in serialized form
|
||||
elif isinstance(body, bytes):
|
||||
args["data"] = body
|
||||
else:
|
||||
# Cannot generate the request from given parameters
|
||||
msg = """Cannot prepare a request message for provided
|
||||
arguments. Please check that your arguments match
|
||||
declared content type."""
|
||||
raise ApiException(status=0, reason=msg)
|
||||
|
||||
r = await self.pool_manager.request(**args)
|
||||
if _preload_content:
|
||||
data = await r.read()
|
||||
r = RESTResponse(r, data)
|
||||
|
||||
# log response body
|
||||
logger.debug("response body: %s", r.data)
|
||||
|
||||
if not 200 <= r.status <= 299:
|
||||
raise ApiException(http_resp=r)
|
||||
|
||||
return r
|
||||
|
||||
async def get_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"GET",
|
||||
url,
|
||||
headers=headers,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
query_params=query_params,
|
||||
)
|
||||
|
||||
async def head_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"HEAD",
|
||||
url,
|
||||
headers=headers,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
query_params=query_params,
|
||||
)
|
||||
|
||||
async def options_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
post_params=None,
|
||||
body=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"OPTIONS",
|
||||
url,
|
||||
headers=headers,
|
||||
query_params=query_params,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
|
||||
async def delete_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
body=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"DELETE",
|
||||
url,
|
||||
headers=headers,
|
||||
query_params=query_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
|
||||
async def post_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
post_params=None,
|
||||
body=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"POST",
|
||||
url,
|
||||
headers=headers,
|
||||
query_params=query_params,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
|
||||
async def put_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
post_params=None,
|
||||
body=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"PUT",
|
||||
url,
|
||||
headers=headers,
|
||||
query_params=query_params,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
|
||||
async def patch_request(
|
||||
self,
|
||||
url,
|
||||
headers=None,
|
||||
query_params=None,
|
||||
post_params=None,
|
||||
body=None,
|
||||
_preload_content=True,
|
||||
_request_timeout=None,
|
||||
):
|
||||
return await self.request(
|
||||
"PATCH",
|
||||
url,
|
||||
headers=headers,
|
||||
query_params=query_params,
|
||||
post_params=post_params,
|
||||
_preload_content=_preload_content,
|
||||
_request_timeout=_request_timeout,
|
||||
body=body,
|
||||
)
|
||||
@@ -1,78 +1,74 @@
|
||||
import datetime
|
||||
import glob
|
||||
import json
|
||||
import logging
|
||||
import sys
|
||||
import time
|
||||
import uuid
|
||||
from collections import defaultdict, deque
|
||||
from multiprocessing import Process
|
||||
from pathlib import Path
|
||||
|
||||
import httpx
|
||||
|
||||
from agbenchmark.agent_protocol_client import (
|
||||
AgentApi,
|
||||
ApiClient,
|
||||
ApiException,
|
||||
Configuration,
|
||||
)
|
||||
from agbenchmark.reports.processing.report_types_v2 import BenchmarkRun
|
||||
from agbenchmark.schema import TaskEvalRequestBody
|
||||
from agbenchmark.utils.utils import write_pretty_json
|
||||
|
||||
configuration = Configuration(host="http://localhost:8000" + "/ap/v1")
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from typing import Any, Optional
|
||||
|
||||
import httpx
|
||||
import psutil
|
||||
from fastapi import APIRouter, FastAPI
|
||||
from fastapi import (
|
||||
HTTPException as FastAPIHTTPException, # Import HTTPException from FastAPI
|
||||
)
|
||||
from fastapi import Request, Response
|
||||
from agent_protocol_client import AgentApi, ApiClient, ApiException, Configuration
|
||||
from agent_protocol_client.models import Task, TaskRequestBody
|
||||
from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from pydantic import BaseModel, Extra, ValidationError
|
||||
|
||||
from agbenchmark.execute_sub_process import execute_subprocess
|
||||
from agbenchmark.schema import Task, TaskRequestBody
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.reports.processing.report_types_v2 import (
|
||||
BenchmarkRun,
|
||||
Metrics,
|
||||
RepositoryInfo,
|
||||
RunDetails,
|
||||
TaskInfo,
|
||||
)
|
||||
from agbenchmark.schema import TaskEvalRequestBody
|
||||
from agbenchmark.utils.data_types import ChallengeData
|
||||
from agbenchmark.utils.utils import write_pretty_json
|
||||
|
||||
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
from fastapi import FastAPI
|
||||
from pydantic import BaseModel, Extra
|
||||
sys.path.append(str(Path(__file__).parent.parent))
|
||||
|
||||
router = APIRouter()
|
||||
import glob
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Change the current working directory to the benchmark path
|
||||
# home_path = find_absolute_benchmark_path()
|
||||
# os.chdir(home_path)
|
||||
|
||||
general_command = ["poetry", "run", "agbenchmark", "start", "--backend"]
|
||||
|
||||
import psutil
|
||||
|
||||
challenges_path = os.path.join(os.path.dirname(__file__), "challenges")
|
||||
|
||||
json_files = deque(
|
||||
CHALLENGES: dict[str, ChallengeData] = {}
|
||||
challenges_path = Path(__file__).parent / "challenges"
|
||||
challenge_spec_files = deque(
|
||||
glob.glob(
|
||||
f"{challenges_path}/**/data.json",
|
||||
recursive=True,
|
||||
)
|
||||
)
|
||||
|
||||
CHALLENGES = {}
|
||||
task_informations = defaultdict(dict)
|
||||
logger.debug("Loading challenges...")
|
||||
while challenge_spec_files:
|
||||
challenge_spec_file = Path(challenge_spec_files.popleft())
|
||||
challenge_relpath = challenge_spec_file.relative_to(challenges_path.parent)
|
||||
if challenge_relpath.is_relative_to("challenges/deprecated"):
|
||||
continue
|
||||
|
||||
while json_files:
|
||||
json_file = json_files.popleft()
|
||||
logger.debug(f"Loading {challenge_relpath}...")
|
||||
try:
|
||||
challenge_info = ChallengeData.parse_file(challenge_spec_file)
|
||||
except ValidationError as e:
|
||||
if logging.getLogger().level == logging.DEBUG:
|
||||
logger.warning(f"Spec file {challenge_relpath} failed to load:\n{e}")
|
||||
logger.debug(f"Invalid challenge spec: {challenge_spec_file.read_text()}")
|
||||
continue
|
||||
challenge_info.spec_file = challenge_spec_file
|
||||
|
||||
with open(json_file, "r") as file:
|
||||
data = json.load(file)
|
||||
if not challenge_info.eval_id:
|
||||
challenge_info.eval_id = str(uuid.uuid4())
|
||||
# this will sort all the keys of the JSON systematically
|
||||
# so that the order is always the same
|
||||
write_pretty_json(challenge_info.dict(), challenge_spec_file)
|
||||
|
||||
if "eval_id" not in data:
|
||||
data["eval_id"] = str(uuid.uuid4())
|
||||
# this will sort all the keys of the JSON systematically so that the order is always the same
|
||||
write_pretty_json(data, json_file)
|
||||
# ok
|
||||
CHALLENGES[data["eval_id"]] = data
|
||||
CHALLENGES[data["eval_id"]]["path"] = json_file
|
||||
CHALLENGES[challenge_info.eval_id] = challenge_info
|
||||
|
||||
task_informations = defaultdict(dict[str, Any])
|
||||
|
||||
|
||||
def find_agbenchmark_without_uvicorn():
|
||||
@@ -93,10 +89,10 @@ def find_agbenchmark_without_uvicorn():
|
||||
):
|
||||
try:
|
||||
# Convert the process.info dictionary values to strings and concatenate them
|
||||
full_info = " ".join([str(v) for k, v in process.info.items()])
|
||||
full_info = " ".join([str(v) for k, v in process.as_dict().items()])
|
||||
|
||||
if "agbenchmark" in full_info and "uvicorn" not in full_info:
|
||||
pids.append(process.info["pid"])
|
||||
pids.append(process.pid)
|
||||
except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
|
||||
pass
|
||||
return pids
|
||||
@@ -114,24 +110,12 @@ class CreateReportRequest(BaseModel):
|
||||
|
||||
updates_list = []
|
||||
|
||||
updates_list = []
|
||||
|
||||
import json
|
||||
|
||||
origins = [
|
||||
"http://localhost:8000",
|
||||
"http://localhost:8080",
|
||||
"http://127.0.0.1:5000",
|
||||
"http://localhost:5000",
|
||||
]
|
||||
app = FastAPI()
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=origins,
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
|
||||
def stream_output(pipe):
|
||||
@@ -139,275 +123,210 @@ def stream_output(pipe):
|
||||
print(line, end="")
|
||||
|
||||
|
||||
@router.post("/reports")
|
||||
def run_single_test(body: CreateReportRequest) -> Any:
|
||||
pids = find_agbenchmark_without_uvicorn()
|
||||
print(f"pids already running with agbenchmark: {pids}")
|
||||
print(body.dict())
|
||||
# it's a hack because other parts of the code are using sys.argv
|
||||
print(os.getcwd())
|
||||
command_options = ["agbenchmark"]
|
||||
# if body.category:
|
||||
# sys.argv.append(f"--category={body.category}")
|
||||
command_options.append(f"--test={body.test}")
|
||||
if body.mock:
|
||||
command_options.append("--mock")
|
||||
|
||||
execute_subprocess(command_options, 200)
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
print("finished running")
|
||||
# List all folders in the current working directory
|
||||
path_reports = Path.cwd() / "agbenchmark_config" / "reports"
|
||||
folders = [folder for folder in path_reports.iterdir() if folder.is_dir()]
|
||||
|
||||
# Sort the folders based on their names
|
||||
sorted_folders = sorted(folders, key=lambda x: x.name)
|
||||
|
||||
# Get the last folder
|
||||
last_folder = sorted_folders[-1] if sorted_folders else None
|
||||
|
||||
# Read report.json from this folder
|
||||
if last_folder:
|
||||
report_path = last_folder / "report.json"
|
||||
print(report_path)
|
||||
if report_path.exists():
|
||||
with report_path.open() as file:
|
||||
data = json.load(file)
|
||||
print(data)
|
||||
else:
|
||||
print(f"'report.json' does not exist in '{last_folder}'")
|
||||
else:
|
||||
print("No folders found.")
|
||||
|
||||
return Response(
|
||||
content=json.dumps(data),
|
||||
status_code=200,
|
||||
media_type="application/json",
|
||||
def setup_fastapi_app(agbenchmark_config: AgentBenchmarkConfig) -> FastAPI:
|
||||
from agbenchmark.agent_api_interface import (
|
||||
copy_agent_artifacts_into_folder,
|
||||
upload_artifacts,
|
||||
)
|
||||
|
||||
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
from fastapi import FastAPI, Request, Response
|
||||
|
||||
|
||||
@router.get("/updates")
|
||||
def get_updates(request: Request) -> Any:
|
||||
from agbenchmark.__main__ import UPDATES_JSON_PATH
|
||||
|
||||
try:
|
||||
# Read data from the "update.json" file (provide the correct file path)
|
||||
with open(UPDATES_JSON_PATH, "r") as file:
|
||||
data = json.load(file)
|
||||
|
||||
# Get the last_update_time from the query parameter
|
||||
query_param = request.query_params.get("last_update_time")
|
||||
|
||||
if query_param is None:
|
||||
# Handle the case when last_update_time is not provided
|
||||
print("ERROR: last_update_time parameter is missing")
|
||||
return Response(
|
||||
content=json.dumps({"error": "last_update_time parameter is missing"}),
|
||||
status_code=400,
|
||||
media_type="application/json",
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
|
||||
# Convert query_param to a Unix timestamp (assuming it's in seconds as a string)
|
||||
query_timestamp = int(query_param)
|
||||
|
||||
# Filter the data based on the timestamp (keep timestamps before query_timestamp)
|
||||
filtered_data = [item for item in data if item["timestamp"] > query_timestamp]
|
||||
|
||||
# Extract only the "content" field from each item
|
||||
filtered_data = [item["content"] for item in filtered_data]
|
||||
|
||||
# Convert the filtered data to JSON
|
||||
filtered_json = json.dumps(filtered_data, indent=2)
|
||||
|
||||
print("INFO: Returning filtered data to the client")
|
||||
return Response(
|
||||
content=filtered_json,
|
||||
status_code=200,
|
||||
media_type="application/json",
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
except FileNotFoundError:
|
||||
print("ERROR: File not found: updates.json")
|
||||
return Response(
|
||||
content=json.dumps({"error": "File not found"}),
|
||||
status_code=404,
|
||||
media_type="application/json",
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
|
||||
|
||||
@router.post("/agent/tasks", tags=["agent"], response_model=Task)
|
||||
async def create_agent_task(task_eval_request: TaskEvalRequestBody) -> Task:
|
||||
"""
|
||||
Creates a new task using the provided TaskRequestBody and returns a Task.
|
||||
|
||||
Args:
|
||||
request (Request): FastAPI request object.
|
||||
task (TaskRequestBody): The task request containing input and additional input data.
|
||||
|
||||
Returns:
|
||||
Task: A new task with task_id, input, additional_input, and empty lists for artifacts and steps.
|
||||
|
||||
Example:
|
||||
Request (TaskRequestBody defined in schema.py):
|
||||
{
|
||||
"input": "Write the words you receive to the file 'output.txt'.",
|
||||
"additional_input": "python/code"
|
||||
}
|
||||
|
||||
Response (Task defined in schema.py):
|
||||
{
|
||||
"task_id": "50da533e-3904-4401-8a07-c49adf88b5eb",
|
||||
"input": "Write the word 'Washington' to a .txt file",
|
||||
"additional_input": "python/code",
|
||||
"artifacts": [],
|
||||
}
|
||||
"""
|
||||
from agbenchmark.agent_api_interface import upload_artifacts
|
||||
|
||||
try:
|
||||
async with ApiClient(configuration) as api_client:
|
||||
api_instance = AgentApi(api_client)
|
||||
task_input = CHALLENGES[task_eval_request.eval_id]["task"]
|
||||
|
||||
task_request_body = TaskRequestBody(input=task_input)
|
||||
task_response = await api_instance.create_agent_task(
|
||||
task_request_body=task_request_body
|
||||
)
|
||||
task_informations[task_response.task_id][
|
||||
"benchmark_start_time"
|
||||
] = datetime.datetime.now(datetime.timezone.utc).strftime(
|
||||
"%Y-%m-%dT%H:%M:%S+00:00"
|
||||
)
|
||||
task_informations[task_response.task_id][
|
||||
"eval_id"
|
||||
] = task_eval_request.eval_id
|
||||
await upload_artifacts(
|
||||
api_instance,
|
||||
str(Path(CHALLENGES[task_eval_request.eval_id]["path"]).parent),
|
||||
task_response.task_id,
|
||||
"artifacts_in",
|
||||
)
|
||||
return Response(
|
||||
content=task_response.json(),
|
||||
status_code=200,
|
||||
media_type="application/json",
|
||||
)
|
||||
except ApiException as e:
|
||||
print(f"Error whilst trying to create a task: {task_eval_request}")
|
||||
return Response(
|
||||
content=json.dumps({"error": "Internal server error"}),
|
||||
status_code=500,
|
||||
media_type="application/json",
|
||||
)
|
||||
|
||||
|
||||
@router.post("/agent/tasks/{task_id}/steps")
|
||||
async def proxy(request: Request, task_id: str):
|
||||
timeout = httpx.Timeout(300.0, read=300.0) # 5 minutes
|
||||
async with httpx.AsyncClient(timeout=timeout) as client:
|
||||
# Construct the new URL
|
||||
new_url = f"http://localhost:8000/ap/v1/agent/tasks/{task_id}/steps"
|
||||
|
||||
# Forward the request
|
||||
response = await client.post(
|
||||
new_url,
|
||||
data=await request.body(),
|
||||
headers=dict(request.headers),
|
||||
)
|
||||
|
||||
# Return the response from the forwarded request
|
||||
return Response(content=response.content, status_code=response.status_code)
|
||||
|
||||
|
||||
@router.post("/agent/tasks/{task_id}/evaluations")
|
||||
async def create_evaluation(task_id: str) -> deque:
|
||||
from agbenchmark.__main__ import TEMP_FOLDER_ABS_PATH
|
||||
from agbenchmark.agent_api_interface import copy_agent_artifacts_into_temp_folder
|
||||
from agbenchmark.agent_interface import copy_artifacts_into_temp_folder
|
||||
from agbenchmark.generate_test import create_challenge
|
||||
from agbenchmark.generate_test import create_challenge_from_spec_file
|
||||
from agbenchmark.main import run_benchmark
|
||||
|
||||
try:
|
||||
async with ApiClient(configuration) as api_client:
|
||||
api_instance = AgentApi(api_client)
|
||||
await copy_agent_artifacts_into_temp_folder(api_instance, task_id)
|
||||
# add custom python
|
||||
data = CHALLENGES[task_informations[task_id]["eval_id"]]
|
||||
configuration = Configuration(
|
||||
host=agbenchmark_config.host or "http://localhost:8000"
|
||||
)
|
||||
app = FastAPI()
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=origins,
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
router = APIRouter()
|
||||
|
||||
artifact_path = str(Path(data["path"]).parent)
|
||||
copy_artifacts_into_temp_folder(
|
||||
TEMP_FOLDER_ABS_PATH, "custom_python", artifact_path
|
||||
@router.post("/reports")
|
||||
def run_single_test(body: CreateReportRequest) -> dict:
|
||||
pids = find_agbenchmark_without_uvicorn()
|
||||
logger.info(f"pids already running with agbenchmark: {pids}")
|
||||
|
||||
logger.debug(f"Request to /reports: {body.dict()}")
|
||||
|
||||
# Start the benchmark in a separate thread
|
||||
benchmark_process = Process(
|
||||
target=lambda: run_benchmark(
|
||||
config=agbenchmark_config,
|
||||
tests=(body.test,),
|
||||
mock=body.mock or False,
|
||||
)
|
||||
)
|
||||
json_file = CHALLENGES[task_informations[task_id]["eval_id"]]["path"]
|
||||
json_files = deque()
|
||||
benchmark_process.start()
|
||||
|
||||
_, challenge_class = create_challenge(data, json_file, json_files)
|
||||
challenge_instance = challenge_class()
|
||||
scores = challenge_instance.get_scores(config={})
|
||||
test_name = "Test" + data["name"]
|
||||
is_score_100 = 1 in scores["values"]
|
||||
# Wait for the benchmark to finish, with a timeout of 200 seconds
|
||||
timeout = 200
|
||||
start_time = time.time()
|
||||
while benchmark_process.is_alive():
|
||||
if time.time() - start_time > timeout:
|
||||
logger.warning(f"Benchmark run timed out after {timeout} seconds")
|
||||
benchmark_process.terminate()
|
||||
break
|
||||
time.sleep(1)
|
||||
else:
|
||||
logger.debug(f"Benchmark finished running in {time.time() - start_time} s")
|
||||
|
||||
info_details = {
|
||||
"repository_info": {
|
||||
"repo_url": None,
|
||||
"team_name": None,
|
||||
"benchmark_git_commit_sha": None,
|
||||
"agent_git_commit_sha": None,
|
||||
},
|
||||
"run_details": {
|
||||
"run_id": None,
|
||||
"command": "agbenchmark" + " --test=" + test_name,
|
||||
"completion_time": None,
|
||||
"benchmark_start_time": task_informations[task_id][
|
||||
# List all folders in the current working directory
|
||||
path_reports = agbenchmark_config.reports_folder
|
||||
folders = [folder for folder in path_reports.iterdir() if folder.is_dir()]
|
||||
|
||||
# Sort the folders based on their names
|
||||
sorted_folders = sorted(folders, key=lambda x: x.name)
|
||||
|
||||
# Get the last folder
|
||||
latest_folder = sorted_folders[-1] if sorted_folders else None
|
||||
|
||||
# Read report.json from this folder
|
||||
if latest_folder:
|
||||
report_path = latest_folder / "report.json"
|
||||
logger.debug(f"Getting latest report from {report_path}")
|
||||
if report_path.exists():
|
||||
with report_path.open() as file:
|
||||
data = json.load(file)
|
||||
logger.debug(f"Report data: {data}")
|
||||
else:
|
||||
logger.error(
|
||||
"Could not get result after running benchmark: "
|
||||
f"'report.json' does not exist in '{latest_folder}'"
|
||||
)
|
||||
else:
|
||||
logger.error(
|
||||
"Could not get result after running benchmark: no reports found"
|
||||
)
|
||||
|
||||
return data
|
||||
|
||||
@router.post("/agent/tasks", tags=["agent"])
|
||||
async def create_agent_task(task_eval_request: TaskEvalRequestBody) -> Task:
|
||||
"""
|
||||
Creates a new task using the provided TaskEvalRequestBody and returns a Task.
|
||||
|
||||
Args:
|
||||
task_eval_request: `TaskRequestBody` including an eval_id.
|
||||
|
||||
Returns:
|
||||
Task: A new task with task_id, input, additional_input,
|
||||
and empty lists for artifacts and steps.
|
||||
|
||||
Example:
|
||||
Request (TaskEvalRequestBody defined in schema.py):
|
||||
{
|
||||
...,
|
||||
"eval_id": "50da533e-3904-4401-8a07-c49adf88b5eb"
|
||||
}
|
||||
|
||||
Response (Task defined in `agent_protocol_client.models`):
|
||||
{
|
||||
"task_id": "50da533e-3904-4401-8a07-c49adf88b5eb",
|
||||
"input": "Write the word 'Washington' to a .txt file",
|
||||
"artifacts": []
|
||||
}
|
||||
"""
|
||||
try:
|
||||
async with ApiClient(configuration) as api_client:
|
||||
api_instance = AgentApi(api_client)
|
||||
task_input = CHALLENGES[task_eval_request.eval_id].task
|
||||
|
||||
task_request_body = TaskRequestBody(input=task_input)
|
||||
task_response = await api_instance.create_agent_task(
|
||||
task_request_body=task_request_body
|
||||
)
|
||||
task_informations[task_response.task_id][
|
||||
"benchmark_start_time"
|
||||
],
|
||||
"test_name": data["name"],
|
||||
},
|
||||
"task_info": {
|
||||
"data_path": data["path"].split("benchmark/", 1)[-1],
|
||||
"is_regression": None,
|
||||
"category": data["category"],
|
||||
"task": data["task"],
|
||||
"answer": data["ground"]["answer"],
|
||||
"description": data["info"]["description"],
|
||||
},
|
||||
"metrics": {
|
||||
"difficulty": None,
|
||||
"success": is_score_100,
|
||||
"attempted": True,
|
||||
"success_percentage": None,
|
||||
"cost": None,
|
||||
"run_time": None,
|
||||
},
|
||||
"reached_cutoff": None,
|
||||
"config": {},
|
||||
}
|
||||
] = datetime.datetime.now(datetime.timezone.utc).strftime(
|
||||
"%Y-%m-%dT%H:%M:%S+00:00"
|
||||
)
|
||||
task_informations[task_response.task_id][
|
||||
"eval_id"
|
||||
] = task_eval_request.eval_id
|
||||
await upload_artifacts(
|
||||
api_instance,
|
||||
str(CHALLENGES[task_eval_request.eval_id].spec_file.parent),
|
||||
task_response.task_id,
|
||||
"artifacts_in",
|
||||
)
|
||||
return task_response
|
||||
except ApiException as e:
|
||||
logger.error(f"Error whilst trying to create a task:\n{e}")
|
||||
logger.error(
|
||||
"The above error was caused while processing request: "
|
||||
f"{task_eval_request}"
|
||||
)
|
||||
raise HTTPException(500)
|
||||
|
||||
BenchmarkRun.parse_obj(info_details)
|
||||
@router.post("/agent/tasks/{task_id}/steps")
|
||||
async def proxy(request: Request, task_id: str):
|
||||
timeout = httpx.Timeout(300.0, read=300.0) # 5 minutes
|
||||
async with httpx.AsyncClient(timeout=timeout) as client:
|
||||
# Construct the new URL
|
||||
new_url = f"{configuration.host}/ap/v1/agent/tasks/{task_id}/steps"
|
||||
|
||||
print(json.dumps(info_details, indent=4))
|
||||
return Response(
|
||||
content=json.dumps(info_details),
|
||||
status_code=200,
|
||||
media_type="application/json",
|
||||
)
|
||||
except ApiException as e:
|
||||
print(f"Error whilst trying to evaluate the task: {task_id}")
|
||||
return Response(
|
||||
content=json.dumps({"error": "Internal server error"}),
|
||||
status_code=500,
|
||||
media_type="application/json",
|
||||
)
|
||||
# path = Path(json_file).resolve()
|
||||
# Forward the request
|
||||
response = await client.post(
|
||||
new_url,
|
||||
data=await request.body(),
|
||||
headers=dict(request.headers),
|
||||
)
|
||||
|
||||
# Return the response from the forwarded request
|
||||
return Response(content=response.content, status_code=response.status_code)
|
||||
|
||||
app.include_router(router, prefix="/ap/v1")
|
||||
@router.post("/agent/tasks/{task_id}/evaluations")
|
||||
async def create_evaluation(task_id: str) -> BenchmarkRun:
|
||||
challenge_info = CHALLENGES[task_informations[task_id]["eval_id"]]
|
||||
workspace = agbenchmark_config.temp_folder
|
||||
try:
|
||||
async with ApiClient(configuration) as api_client:
|
||||
api_instance = AgentApi(api_client)
|
||||
await copy_agent_artifacts_into_folder(api_instance, task_id, workspace)
|
||||
|
||||
artifact_path = challenge_info.spec_file.parent
|
||||
copy_artifacts_into_temp_folder(workspace, "custom_python", artifact_path)
|
||||
|
||||
challenge = create_challenge_from_spec_file(challenge_info.spec_file)
|
||||
scores = challenge.get_scores(workspace)
|
||||
is_score_100 = 1 in scores["values"]
|
||||
|
||||
eval_info = BenchmarkRun(
|
||||
repository_info=RepositoryInfo(),
|
||||
run_details=RunDetails(
|
||||
command=f"agbenchmark --test={challenge_info.name}",
|
||||
benchmark_start_time=(
|
||||
task_informations[task_id]["benchmark_start_time"]
|
||||
),
|
||||
test_name=challenge_info.name,
|
||||
),
|
||||
task_info=TaskInfo(
|
||||
data_path=str(
|
||||
challenge_info.spec_file.relative_to(challenges_path.parent)
|
||||
),
|
||||
is_regression=None,
|
||||
category=[c.value for c in challenge_info.category],
|
||||
task=challenge_info.task,
|
||||
answer=challenge_info.ground.answer,
|
||||
description=challenge_info.info.description,
|
||||
),
|
||||
metrics=Metrics(
|
||||
success=is_score_100,
|
||||
attempted=True,
|
||||
),
|
||||
config={},
|
||||
)
|
||||
|
||||
logger.debug(f"Returning evaluation data:\n{eval_info.json(indent=4)}")
|
||||
return eval_info
|
||||
except ApiException as e:
|
||||
logger.error(f"Error {e} whilst trying to evaluate task: {task_id}")
|
||||
raise HTTPException(500)
|
||||
|
||||
app.include_router(router, prefix="/ap/v1")
|
||||
|
||||
return app
|
||||
|
||||
@@ -0,0 +1,32 @@
|
||||
import glob
|
||||
import json
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def get_unique_categories() -> set[str]:
|
||||
"""
|
||||
Find all data.json files in the directory relative to this file and its
|
||||
subdirectories, read the "category" field from each file, and return a set of unique
|
||||
categories.
|
||||
"""
|
||||
categories = set()
|
||||
|
||||
challenges_dir = Path(__file__).parent
|
||||
glob_path = f"{challenges_dir}/**/data.json"
|
||||
|
||||
for data_file in glob.glob(glob_path, recursive=True):
|
||||
with open(data_file, "r") as f:
|
||||
try:
|
||||
challenge_data = json.load(f)
|
||||
categories.update(challenge_data.get("category", []))
|
||||
except json.JSONDecodeError:
|
||||
logger.error(f"Error: {data_file} is not a valid JSON file.")
|
||||
continue
|
||||
except IOError:
|
||||
logger.error(f"IOError: file could not be read: {data_file}")
|
||||
continue
|
||||
|
||||
return categories
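A minimal usage sketch of the helper above; the import path matches how `main.py` (later in this diff) imports it, and the printed wording is purely illustrative.

from agbenchmark.challenges import get_unique_categories

# Collect the set of categories declared across all data.json challenge specs.
all_categories = get_unique_categories()
print("Available challenge categories:", ", ".join(sorted(all_categories)))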
@@ -16,21 +16,21 @@
|
||||
".txt"
|
||||
],
|
||||
"should_contain": [
|
||||
"15",
|
||||
"112",
|
||||
"117",
|
||||
"204",
|
||||
"413",
|
||||
"2,0",
|
||||
"3,198",
|
||||
"4,046",
|
||||
"7,000",
|
||||
"11,759",
|
||||
"21,461",
|
||||
"24,578",
|
||||
"31,536",
|
||||
"53,823",
|
||||
"81,462"
|
||||
"15",
|
||||
"112",
|
||||
"117",
|
||||
"204",
|
||||
"413",
|
||||
"2,0",
|
||||
"3,198",
|
||||
"4,046",
|
||||
"7,000",
|
||||
"11,759",
|
||||
"21,461",
|
||||
"24,578",
|
||||
"31,536",
|
||||
"53,823",
|
||||
"81,462"
|
||||
],
|
||||
"should_not_contain": []
|
||||
},
|
||||
|
||||
119
benchmark/agbenchmark/config.py
Normal file
@@ -0,0 +1,119 @@
|
||||
import json
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from pydantic import BaseSettings
|
||||
|
||||
|
||||
def _calculate_info_test_path(base_path: Path, benchmark_start_time: datetime) -> Path:
|
||||
"""
|
||||
Calculates the path to the directory where the test report will be saved.
|
||||
"""
|
||||
# Ensure the reports path exists
|
||||
base_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Get current UTC date-time stamp
|
||||
date_stamp = benchmark_start_time.strftime("%Y%m%dT%H%M%S")
|
||||
|
||||
# Default run name
|
||||
run_name = "full_run"
|
||||
|
||||
# Map command-line arguments to their respective labels
|
||||
arg_labels = {
|
||||
"--test": None,
|
||||
"--category": None,
|
||||
"--maintain": "maintain",
|
||||
"--improve": "improve",
|
||||
"--explore": "explore",
|
||||
}
|
||||
|
||||
# Identify the relevant command-line argument
|
||||
for arg, label in arg_labels.items():
|
||||
if arg in sys.argv:
|
||||
test_arg = sys.argv[sys.argv.index(arg) + 1] if label is None else None
|
||||
run_name = arg.strip("--")
|
||||
if test_arg:
|
||||
run_name = f"{run_name}_{test_arg}"
|
||||
break
|
||||
|
||||
# Create the full new directory path with ISO standard UTC date-time stamp
|
||||
report_path = base_path / f"{date_stamp}_{run_name}"
|
||||
|
||||
# Ensure the new directory is created
|
||||
# FIXME: this is not a desirable side-effect of loading the config
|
||||
report_path.mkdir(exist_ok=True)
|
||||
|
||||
return report_path
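As a worked illustration of the naming scheme above (all values invented): a run started at 2024-01-05 12:30:00 UTC with `--test TestWriteFile` on the command line is written to `<reports>/20240105T123000_test_TestWriteFile`, while a run without any of the listed flags lands in `<reports>/20240105T123000_full_run`. A minimal in-module sketch using those hypothetical values:

from datetime import datetime
from pathlib import Path

# Illustrative only; as the FIXME above notes, calling this also creates the
# report directory as a side effect.
example_dir = _calculate_info_test_path(
    base_path=Path("agbenchmark_config/reports"),
    benchmark_start_time=datetime(2024, 1, 5, 12, 30, 0),
)
# -> agbenchmark_config/reports/20240105T123000_full_run (no filter flags in sys.argv)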
class AgentBenchmarkConfig(BaseSettings, extra="allow"):
|
||||
"""
|
||||
Configuration model and loader for the AGBenchmark.
|
||||
|
||||
Projects that want to use AGBenchmark should contain an agbenchmark_config folder
|
||||
with a config.json file that - at minimum - specifies the `host` at which the
|
||||
subject application exposes an Agent Protocol compliant API.
|
||||
"""
|
||||
|
||||
agbenchmark_config_dir: Path
|
||||
"""Path to the agbenchmark_config folder of the subject agent application."""
|
||||
|
||||
categories: list[str] | None = None
|
||||
"""Categories to benchmark the agent for. If omitted, all categories are assumed."""
|
||||
|
||||
host: str
|
||||
"""Host (scheme://address:port) of the subject agent application."""
|
||||
|
||||
@classmethod
|
||||
def load(cls, config_dir: Optional[Path] = None) -> "AgentBenchmarkConfig":
|
||||
config_dir = config_dir or cls.find_config_folder()
|
||||
with (config_dir / "config.json").open("r") as f:
|
||||
return cls(
|
||||
agbenchmark_config_dir=config_dir,
|
||||
**json.load(f),
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def find_config_folder(for_dir: Path = Path.cwd()) -> Path:
|
||||
"""
|
||||
Find the closest ancestor folder containing an agbenchmark_config folder,
|
||||
and return the path of that agbenchmark_config folder.
|
||||
"""
|
||||
current_directory = for_dir
|
||||
while current_directory != Path("/"):
|
||||
if (path := current_directory / "agbenchmark_config").exists():
|
||||
if (path / "config.json").is_file():
|
||||
return path
|
||||
current_directory = current_directory.parent
|
||||
raise FileNotFoundError(
|
||||
"No 'agbenchmark_config' directory found in the path hierarchy."
|
||||
)
|
||||
|
||||
@property
|
||||
def config_file(self) -> Path:
|
||||
return self.agbenchmark_config_dir / "config.json"
|
||||
|
||||
@property
|
||||
def reports_folder(self) -> Path:
|
||||
return self.agbenchmark_config_dir / "reports"
|
||||
|
||||
def get_report_dir(self, benchmark_start_time: datetime) -> Path:
|
||||
return _calculate_info_test_path(self.reports_folder, benchmark_start_time)
|
||||
|
||||
@property
|
||||
def regression_tests_file(self) -> Path:
|
||||
return self.reports_folder / "regression_tests.json"
|
||||
|
||||
@property
|
||||
def success_rate_file(self) -> Path:
|
||||
return self.reports_folder / "success_rate.json"
|
||||
|
||||
@property
|
||||
def challenges_already_beaten_file(self) -> Path:
|
||||
return self.agbenchmark_config_dir / "challenges_already_beaten.json"
|
||||
|
||||
@property
|
||||
def temp_folder(self) -> Path:
|
||||
return self.agbenchmark_config_dir / "temp_folder"
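A minimal sketch of how a consumer might load and inspect this configuration, assuming the working directory (or an ancestor) contains an `agbenchmark_config/config.json` with at least a `host` entry, as the class docstring above requires; the printed host value is illustrative.

from agbenchmark.config import AgentBenchmarkConfig

config = AgentBenchmarkConfig.load()   # walks up from cwd to find agbenchmark_config/
print(config.host)                     # e.g. "http://localhost:8000" (illustrative)
print(config.reports_folder)           # <agbenchmark_config>/reports
print(config.regression_tests_file)    # <agbenchmark_config>/reports/regression_tests.json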
@@ -1,167 +1,127 @@
|
||||
import contextlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
from pathlib import Path # noqa
|
||||
from pathlib import Path
|
||||
from typing import Any, Generator
|
||||
|
||||
import pytest
|
||||
|
||||
from agbenchmark.__main__ import TEMP_FOLDER_ABS_PATH
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.reports.reports import (
|
||||
finalize_reports,
|
||||
generate_single_call_report,
|
||||
session_finish,
|
||||
)
|
||||
from agbenchmark.utils.data_types import AgentBenchmarkConfig
|
||||
from agbenchmark.utils.challenge import Challenge
|
||||
from agbenchmark.utils.data_types import Category
|
||||
|
||||
GLOBAL_TIMEOUT = (
|
||||
1500 # The tests will stop after 25 minutes so we can send the reports.
|
||||
)
|
||||
|
||||
agbenchmark_config = AgentBenchmarkConfig.load()
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
pytest_plugins = ["agbenchmark.utils.dependencies"]
|
||||
collect_ignore = ["challenges"]
|
||||
suite_reports: dict[str, list] = {}
|
||||
|
||||
|
||||
def load_config_from_request(request: Any) -> AgentBenchmarkConfig:
|
||||
"""
|
||||
This function loads the configuration for the agent benchmark from a given request.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the agent benchmark configuration is to be loaded.
|
||||
|
||||
Returns:
|
||||
AgentBenchmarkConfig: The loaded agent benchmark configuration.
|
||||
|
||||
Raises:
|
||||
json.JSONDecodeError: If the benchmark configuration file is not a valid JSON file.
|
||||
"""
|
||||
agent_benchmark_config_path = Path.cwd() / "agbenchmark_config" / "config.json"
|
||||
try:
|
||||
with open(agent_benchmark_config_path, "r") as f:
|
||||
agent_benchmark_config = AgentBenchmarkConfig(**json.load(f))
|
||||
agent_benchmark_config.agent_benchmark_config_path = (
|
||||
agent_benchmark_config_path
|
||||
)
|
||||
return agent_benchmark_config
|
||||
except json.JSONDecodeError:
|
||||
print("Error: benchmark_config.json is not a valid JSON file.")
|
||||
raise
|
||||
|
||||
|
||||
@pytest.fixture(scope="module")
|
||||
def config(request: Any) -> Any:
|
||||
"""
|
||||
This pytest fixture is responsible for loading the agent benchmark configuration from a given request.
|
||||
This fixture is scoped to the module level, meaning it's invoked once per test module.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the agent benchmark configuration is to be loaded.
|
||||
|
||||
Returns:
|
||||
Any: The loaded configuration dictionary.
|
||||
|
||||
Raises:
|
||||
json.JSONDecodeError: If the benchmark configuration file is not a valid JSON file.
|
||||
"""
|
||||
config = {}
|
||||
agent_benchmark_config_path = Path.cwd() / "agbenchmark_config" / "config.json"
|
||||
try:
|
||||
with open(agent_benchmark_config_path, "r") as f:
|
||||
agent_benchmark_config = AgentBenchmarkConfig(**json.load(f))
|
||||
agent_benchmark_config.agent_benchmark_config_path = (
|
||||
agent_benchmark_config_path
|
||||
)
|
||||
except json.JSONDecodeError:
|
||||
print("Error: benchmark_config.json is not a valid JSON file.")
|
||||
raise
|
||||
|
||||
config["AgentBenchmarkConfig"] = agent_benchmark_config
|
||||
|
||||
return config
|
||||
def config() -> AgentBenchmarkConfig:
|
||||
return agbenchmark_config
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def temp_folder() -> Generator[str, None, None]:
|
||||
def temp_folder() -> Generator[Path, None, None]:
|
||||
"""
|
||||
This pytest fixture is responsible for setting up and tearing down the temporary folder for each test.
|
||||
Pytest fixture that sets up and tears down the temporary folder for each test.
|
||||
It is automatically used in every test due to the 'autouse=True' parameter.
|
||||
It is used in order to let agbenchmark store files so they can then be evaluated.
|
||||
"""
|
||||
|
||||
# create output directory if it doesn't exist
|
||||
if not os.path.exists(TEMP_FOLDER_ABS_PATH):
|
||||
os.makedirs(TEMP_FOLDER_ABS_PATH, exist_ok=True)
|
||||
if not os.path.exists(agbenchmark_config.temp_folder):
|
||||
os.makedirs(agbenchmark_config.temp_folder, exist_ok=True)
|
||||
|
||||
yield
|
||||
yield agbenchmark_config.temp_folder
|
||||
# teardown after test function completes
|
||||
if not os.getenv("KEEP_TEMP_FOLDER_FILES"):
|
||||
for filename in os.listdir(TEMP_FOLDER_ABS_PATH):
|
||||
file_path = os.path.join(TEMP_FOLDER_ABS_PATH, filename)
|
||||
for filename in os.listdir(agbenchmark_config.temp_folder):
|
||||
file_path = os.path.join(agbenchmark_config.temp_folder, filename)
|
||||
try:
|
||||
if os.path.isfile(file_path) or os.path.islink(file_path):
|
||||
os.unlink(file_path)
|
||||
elif os.path.isdir(file_path):
|
||||
shutil.rmtree(file_path)
|
||||
except Exception as e:
|
||||
print(f"Failed to delete {file_path}. Reason: {e}")
|
||||
logger.warning(f"Failed to delete {file_path}. Reason: {e}")
|
||||
|
||||
|
||||
def pytest_addoption(parser: Any) -> None:
|
||||
def pytest_addoption(parser: pytest.Parser) -> None:
|
||||
"""
|
||||
This function is a pytest hook that is called to add command-line options.
|
||||
It is used to add custom command-line options that are specific to the agent benchmark tests.
|
||||
These options can be used to control the behavior of the tests.
|
||||
The "--mock" option is used to run the tests in mock mode.
|
||||
The "--host" option is used to specify the host for the tests.
|
||||
The "--category" option is used to run only tests of a specific category.
|
||||
The "--nc" option is used to run the tests without caching.
|
||||
The "--cutoff" option is used to specify a cutoff time for the tests.
|
||||
The "--improve" option is used to run only the tests that are marked for improvement.
|
||||
The "--maintain" option is used to run only the tests that are marked for maintenance.
|
||||
The "--explore" option is used to run the tests in exploration mode.
|
||||
The "--test" option is used to run a specific test.
|
||||
The "--no_dep" option is used to run the tests without dependencies.
|
||||
The "--keep_answers" option is used to keep the answers of the tests.
|
||||
Pytest hook that adds command-line options to the `pytest` command.
|
||||
The added options are specific to agbenchmark and control its behavior:
|
||||
* `--mock` is used to run the tests in mock mode.
|
||||
* `--host` is used to specify the host for the tests.
|
||||
* `--category` is used to run only tests of a specific category.
|
||||
* `--nc` is used to run the tests without caching.
|
||||
* `--cutoff` is used to specify a cutoff time for the tests.
|
||||
* `--improve` is used to run only the tests that are marked for improvement.
|
||||
* `--maintain` is used to run only the tests that are marked for maintenance.
|
||||
* `--explore` is used to run the tests in exploration mode.
|
||||
* `--test` is used to run a specific test.
|
||||
* `--no-dep` is used to run the tests without dependencies.
|
||||
* `--keep-answers` is used to keep the answers of the tests.
|
||||
|
||||
Args:
|
||||
parser (Any): The parser object to which the command-line options are added.
|
||||
parser: The Pytest CLI parser to which the command-line options are added.
|
||||
"""
|
||||
parser.addoption("--no_dep", action="store_true", default=False)
|
||||
parser.addoption("--mock", action="store_true", default=False)
|
||||
parser.addoption("--host", action="store_true", default=None)
|
||||
parser.addoption("--nc", action="store_true", default=False)
|
||||
parser.addoption("--cutoff", action="store_true", default=False)
|
||||
parser.addoption("--category", action="store_true", default=False)
|
||||
parser.addoption("--test", action="store_true", default=None)
|
||||
parser.addoption("--improve", action="store_true", default=False)
|
||||
parser.addoption("--maintain", action="store_true", default=False)
|
||||
parser.addoption("--explore", action="store_true", default=False)
|
||||
parser.addoption("--keep-answers", action="store_true", default=False)
|
||||
parser.addoption("--no-dep", action="store_true")
|
||||
parser.addoption("--mock", action="store_true")
|
||||
parser.addoption("--host", default=None)
|
||||
parser.addoption("--nc", action="store_true")
|
||||
parser.addoption("--cutoff", action="store")
|
||||
parser.addoption("--category", action="append")
|
||||
parser.addoption("--test", action="append")
|
||||
parser.addoption("--improve", action="store_true")
|
||||
parser.addoption("--maintain", action="store_true")
|
||||
parser.addoption("--explore", action="store_true")
|
||||
parser.addoption("--keep-answers", action="store_true")
def pytest_configure(config: pytest.Config) -> None:
|
||||
# Register category markers to prevent "unknown marker" warnings
|
||||
for category in Category:
|
||||
config.addinivalue_line("markers", f"{category.value}: {category}")
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def check_regression(request: Any) -> None:
|
||||
def check_regression(request: pytest.FixtureRequest) -> None:
|
||||
"""
|
||||
This pytest fixture is responsible for checking if a test is a regression test.
|
||||
It is automatically used in every test due to the 'autouse=True' parameter.
|
||||
The test name and the agent benchmark configuration are retrieved from the request object.
|
||||
The regression reports are loaded from the path specified in the agent benchmark configuration.
|
||||
If the "--improve" option is used and the test name exists in the regression tests, the test is skipped.
|
||||
If the "--maintain" option is used and the test name does not exist in the regression tests, the test is also skipped.
|
||||
Fixture that checks for every test if it should be treated as a regression test,
|
||||
and whether to skip it based on that.
|
||||
|
||||
The test name is retrieved from the `request` object. Regression reports are loaded
|
||||
from the path specified in the benchmark configuration.
|
||||
|
||||
Effect:
|
||||
* If the `--improve` option is used and the current test is considered a regression
|
||||
test, it is skipped.
|
||||
* If the `--maintain` option is used and the current test is not considered a
|
||||
regression test, it is also skipped.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the test name and the agent benchmark configuration are retrieved.
|
||||
request: The request object from which the test name and the benchmark
|
||||
configuration are retrieved.
|
||||
"""
|
||||
test_name = request.node.parent.name
|
||||
agent_benchmark_config = load_config_from_request(request)
|
||||
with contextlib.suppress(Exception):
|
||||
test = agent_benchmark_config.get_regression_reports_path()
|
||||
data = json.loads(test)
|
||||
with contextlib.suppress(FileNotFoundError):
|
||||
regression_report = agbenchmark_config.regression_tests_file
|
||||
data = json.loads(regression_report.read_bytes())
|
||||
challenge_location = getattr(request.node.parent.cls, "CHALLENGE_LOCATION", "")
|
||||
|
||||
skip_string = f"Skipping {test_name} at {challenge_location}"
|
||||
@@ -173,55 +133,33 @@ def check_regression(request: Any) -> None:
|
||||
pytest.skip(f"{skip_string} because it's not a regression test")
|
||||
|
||||
|
||||
# this is to get the challenge_data from every test
|
||||
@pytest.fixture(autouse=True)
|
||||
def challenge_data(request: Any) -> None:
|
||||
"""
|
||||
This pytest fixture is responsible for providing the challenge data for each test.
|
||||
It is automatically used in every test due to the 'autouse=True' parameter.
|
||||
The challenge data is retrieved from the request object's parameters.
|
||||
This fixture is essential for the pytest system as it provides the necessary data for each test.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the challenge data is retrieved.
|
||||
|
||||
Returns:
|
||||
None: The challenge data is directly passed to the test function and does not need to be returned.
|
||||
"""
|
||||
return request.param
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True, scope="session")
|
||||
def mock(request: Any) -> None:
|
||||
def mock(request: pytest.FixtureRequest) -> bool:
|
||||
"""
|
||||
This pytest fixture is responsible for retrieving the value of the "--mock" command-line option.
|
||||
It is automatically used in every test session due to the 'autouse=True' parameter and 'session' scope.
|
||||
The "--mock" option is used to run the tests in mock mode.
|
||||
This fixture is essential for the pytest system as it provides the necessary command-line option value for each test session.
|
||||
Pytest fixture that retrieves the value of the `--mock` command-line option.
|
||||
The `--mock` option is used to run the tests in mock mode.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the "--mock" option value is retrieved.
|
||||
request: The `pytest.FixtureRequest` from which the `--mock` option value
|
||||
is retrieved.
|
||||
|
||||
Returns:
|
||||
None: The "--mock" option value is directly passed to the test session and does not need to be returned.
|
||||
bool: Whether `--mock` is set for this session.
|
||||
"""
|
||||
return request.config.getoption("--mock")
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True, scope="function")
|
||||
def timer(request: Any) -> Any:
|
||||
def timer(request: pytest.FixtureRequest) -> Generator[None, None, None]:
|
||||
"""
|
||||
This pytest fixture is responsible for timing the execution of each test.
|
||||
It is automatically used in every test due to the 'autouse=True' parameter and 'function' scope.
|
||||
Pytest fixture that times the execution of each test.
|
||||
At the start of each test, it records the current time.
|
||||
After the test function completes, it calculates the run time and appends it to the test node's user properties.
|
||||
This allows the run time of each test to be accessed later for reporting or analysis.
|
||||
After the test function completes, it calculates the run time and adds it to
|
||||
the test node's `user_properties`.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the test node is retrieved.
|
||||
|
||||
Yields:
|
||||
None: Control is yielded back to the test function.
|
||||
request: The `pytest.FixtureRequest` object through which the run time is stored
|
||||
in the test node's `user_properties`.
|
||||
"""
|
||||
start_time = time.time()
|
||||
yield
|
||||
@@ -229,33 +167,21 @@ def timer(request: Any) -> Any:
|
||||
request.node.user_properties.append(("run_time", run_time))
|
||||
|
||||
|
||||
def pytest_runtest_makereport(item: Any, call: Any) -> None:
|
||||
def pytest_runtest_makereport(item: pytest.Item, call: pytest.CallInfo) -> None:
|
||||
"""
|
||||
This function is a pytest hook that is called when a test report is being generated.
|
||||
Pytest hook that is called when a test report is being generated.
|
||||
It is used to generate and finalize reports for each test.
|
||||
|
||||
Args:
|
||||
item (Any): The test item for which the report is being generated.
|
||||
call (Any): The call object from which the test result is retrieved.
|
||||
item: The test item for which the report is being generated.
|
||||
call: The call object from which the test result is retrieved.
|
||||
"""
|
||||
challenge_data = item.funcargs.get("challenge_data", None)
|
||||
|
||||
if not challenge_data:
|
||||
# this will only happen for dummy dependency setup tests
|
||||
return
|
||||
|
||||
challenge_location: str = getattr(item.cls, "CHALLENGE_LOCATION", "")
|
||||
|
||||
flags = (
|
||||
"--test" in sys.argv
|
||||
or "--maintain" in sys.argv
|
||||
or "--improve" in sys.argv
|
||||
or "--explore" in sys.argv
|
||||
)
|
||||
challenge: type[Challenge] = item.cls # type: ignore
|
||||
challenge_data = challenge.data
|
||||
challenge_location = challenge.CHALLENGE_LOCATION
|
||||
|
||||
if call.when == "call":
|
||||
answers = getattr(item, "answers", None)
|
||||
challenge_location: str = getattr(item.cls, "CHALLENGE_LOCATION", "")
|
||||
test_name = item.nodeid.split("::")[1]
|
||||
item.test_name = test_name
|
||||
|
||||
@@ -264,14 +190,14 @@ def pytest_runtest_makereport(item: Any, call: Any) -> None:
|
||||
)
|
||||
|
||||
if call.when == "teardown":
|
||||
finalize_reports(item, challenge_data)
|
||||
finalize_reports(agbenchmark_config, item, challenge_data)
|
||||
|
||||
|
||||
def timeout_monitor(start_time: int) -> None:
|
||||
"""
|
||||
This function is responsible for monitoring the total execution time of the test suite.
|
||||
It runs in a separate thread and checks every second if the total execution time has exceeded the global timeout.
|
||||
If the global timeout is exceeded, it terminates the pytest session with a specific return code.
|
||||
Function that limits the total execution time of the test suite.
|
||||
This function is supposed to be run in a separate thread and calls `pytest.exit`
|
||||
if the total execution time has exceeded the global timeout.
|
||||
|
||||
Args:
|
||||
start_time (int): The start time of the test suite.
|
||||
@@ -282,14 +208,11 @@ def timeout_monitor(start_time: int) -> None:
|
||||
pytest.exit("Test suite exceeded the global timeout", returncode=1)
|
||||
|
||||
|
||||
def pytest_sessionstart(session: Any) -> None:
|
||||
def pytest_sessionstart(session: pytest.Session) -> None:
|
||||
"""
|
||||
This function is a pytest hook that is called at the start of the test session.
|
||||
It starts the timeout monitor in a separate thread.
|
||||
The timeout monitor checks if the total execution time of the test suite has exceeded the global timeout.
|
||||
Pytest hook that is called at the start of a test session.
|
||||
|
||||
Args:
|
||||
session (Any): The pytest session object.
|
||||
Sets up and runs a `timeout_monitor` in a separate thread.
|
||||
"""
|
||||
start_time = time.time()
|
||||
t = threading.Thread(target=timeout_monitor, args=(start_time,))
|
||||
@@ -297,94 +220,125 @@ def pytest_sessionstart(session: Any) -> None:
|
||||
t.start()
|
||||
|
||||
|
||||
def pytest_sessionfinish(session: Any) -> None:
|
||||
def pytest_sessionfinish(session: pytest.Session) -> None:
|
||||
"""
|
||||
This function is a pytest hook that is called at the end of the test session.
|
||||
It is used to finalize and save the test reports.
|
||||
The reports are saved in a specific location defined in the suite reports.
|
||||
Pytest hook that is called at the end of a test session.
|
||||
|
||||
Args:
|
||||
session (Any): The pytest session object.
|
||||
Finalizes and saves the test reports.
|
||||
"""
|
||||
session_finish(suite_reports)
|
||||
session_finish(agbenchmark_config, suite_reports)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def scores(request: Any) -> None:
|
||||
def scores(request: pytest.FixtureRequest) -> None:
|
||||
"""
|
||||
This pytest fixture is responsible for retrieving the scores of the test class.
|
||||
The scores are retrieved from the test class's 'scores' attribute using the test class name.
|
||||
This fixture is essential for the pytest system as it provides the necessary scores for each test.
|
||||
Pytest fixture that retrieves the scores of the test class.
|
||||
The scores are retrieved from the `Challenge.scores` attribute
|
||||
using the test class name.
|
||||
|
||||
Args:
|
||||
request (Any): The request object from which the test class is retrieved.
|
||||
|
||||
Returns:
|
||||
None: The scores are directly passed to the test function and do not need to be returned.
|
||||
request: The request object.
|
||||
"""
|
||||
test_class_name = request.node.cls.__name__
|
||||
return request.node.cls.scores.get(test_class_name)
|
||||
challenge: type[Challenge] = request.node.cls
|
||||
return challenge.scores.get(challenge.__name__)
|
||||
|
||||
|
||||
# this is adding the dependency marker and category markers automatically from the json
|
||||
def pytest_collection_modifyitems(items: Any, config: Any) -> None:
|
||||
def pytest_collection_modifyitems(
|
||||
items: list[pytest.Item], config: pytest.Config
|
||||
) -> None:
|
||||
"""
|
||||
This function is a pytest hook that is called after the test collection has been performed.
|
||||
It is used to modify the collected test items based on the agent benchmark configuration.
|
||||
The function loads the agent benchmark configuration from the specified path and retrieves the regression reports.
|
||||
For each test item, it checks if the test method exists and retrieves the dependencies and categories from the test class instance.
|
||||
If the "--improve" or "--category" options are used, the dependencies are filtered based on the regression data.
|
||||
If the "--test", "--no_dep", or "--maintain" options are used, the dependencies are cleared.
|
||||
The function then dynamically adds the 'depends' and 'category' markers to the test item.
|
||||
This function is essential for the pytest system as it provides the necessary modification of the test items based on the agent benchmark configuration.
|
||||
Pytest hook that is called after initial test collection has been performed.
|
||||
Modifies the collected test items based on the agent benchmark configuration,
|
||||
adding the dependency marker and category markers.
|
||||
|
||||
Args:
|
||||
items (Any): The collected test items to be modified.
|
||||
config (Any): The pytest configuration object from which the agent benchmark configuration path is retrieved.
|
||||
items: The collected test items to be modified.
|
||||
config: The active pytest configuration.
|
||||
"""
|
||||
agent_benchmark_config_path = str(Path.cwd() / "agbenchmark_config" / "config.json")
|
||||
try:
|
||||
with open(agent_benchmark_config_path) as f:
|
||||
agent_benchmark_config = AgentBenchmarkConfig(**json.load(f))
|
||||
except json.JSONDecodeError:
|
||||
print("Error: benchmark_config.json is not a valid JSON file.")
|
||||
raise
|
||||
|
||||
regression_file = agent_benchmark_config.get_regression_reports_path()
|
||||
data = (
|
||||
json.loads(open(regression_file, "r").read())
|
||||
if os.path.exists(regression_file)
|
||||
else {}
|
||||
regression_file = agbenchmark_config.regression_tests_file
|
||||
regression_tests: dict[str, Any] = (
|
||||
json.loads(regression_file.read_bytes()) if regression_file.is_file() else {}
|
||||
)
|
||||
|
||||
for item in items:
|
||||
# Assuming item.cls is your test class
|
||||
test_class_instance = item.cls()
|
||||
try:
|
||||
challenges_beaten_in_the_past = json.loads(
|
||||
agbenchmark_config.challenges_already_beaten_file.read_bytes()
|
||||
)
|
||||
except FileNotFoundError:
|
||||
challenges_beaten_in_the_past = {}
|
||||
|
||||
if "test_method" not in item.name:
|
||||
selected_tests: tuple[str] = config.getoption("--test") # type: ignore
|
||||
selected_categories: tuple[str] = config.getoption("--category") # type: ignore
|
||||
|
||||
# Can't use a for-loop to remove items in-place
|
||||
i = 0
|
||||
while i < len(items):
|
||||
item = items[i]
|
||||
challenge = item.cls
|
||||
challenge_name = item.cls.__name__
|
||||
|
||||
if not issubclass(challenge, Challenge):
|
||||
item.warn(
|
||||
pytest.PytestCollectionWarning(
|
||||
f"Non-challenge item collected: {challenge}"
|
||||
)
|
||||
)
|
||||
i += 1
|
||||
continue
|
||||
|
||||
# Then you can access your properties
|
||||
name = item.parent.cls.__name__
|
||||
# dependencies = test_class_instance.data.dependencies
|
||||
# --test: remove the test from the set if it's not specifically selected
|
||||
if selected_tests and challenge.data.name not in selected_tests:
|
||||
items.remove(item)
|
||||
continue
|
||||
|
||||
# Filter dependencies if they exist in regression data if its an improvement test
|
||||
# if config.getoption("--improve") or config.getoption(
|
||||
# "--category"
|
||||
# ):
|
||||
# dependencies = [dep for dep in dependencies if not data.get(dep, None)]
|
||||
# if (
|
||||
# config.getoption("--test")
|
||||
# or config.getoption("--no_dep")
|
||||
# or config.getoption("--maintain")
|
||||
# ):
|
||||
dependencies = test_class_instance.dependencies
|
||||
# Filter challenges for --maintain, --improve, and --explore:
|
||||
# --maintain -> only challenges expected to be passed (= regression tests)
|
||||
# --improve -> only challenges that so far are not passed (reliably)
|
||||
# --explore -> only challenges that have never been passed
|
||||
is_regression_test = regression_tests.get(challenge.data.name, None)
|
||||
has_been_passed = challenges_beaten_in_the_past.get(challenge.data.name, False)
|
||||
if (
|
||||
(config.getoption("--maintain") and not is_regression_test)
|
||||
or (config.getoption("--improve") and is_regression_test)
|
||||
or (config.getoption("--explore") and has_been_passed)
|
||||
):
|
||||
items.remove(item)
|
||||
continue
|
||||
|
||||
# Add depends marker dynamically
|
||||
item.add_marker(pytest.mark.depends(on=dependencies, name=name))
|
||||
dependencies = challenge.data.dependencies
|
||||
if (
|
||||
config.getoption("--test")
|
||||
or config.getoption("--no-dep")
|
||||
or config.getoption("--maintain")
|
||||
):
|
||||
# Ignore dependencies:
|
||||
# --test -> user selected specific tests to run, don't care about deps
|
||||
# --no-dep -> ignore dependency relations regardless of test selection
|
||||
# --maintain -> all "regression" tests must pass, so run all of them
|
||||
dependencies = []
|
||||
elif config.getoption("--improve"):
|
||||
# Filter dependencies, keep only deps that are not "regression" tests
|
||||
dependencies = [
|
||||
d for d in dependencies if not regression_tests.get(d, None)
|
||||
]
|
||||
|
||||
categories = test_class_instance.data.category
|
||||
# Set category markers
|
||||
challenge_categories = [c.value for c in challenge.data.category]
|
||||
for category in challenge_categories:
|
||||
item.add_marker(category)
|
||||
|
||||
# Add category marker dynamically
|
||||
for category in categories:
|
||||
item.add_marker(getattr(pytest.mark, category))
|
||||
# Enforce category selection
|
||||
if selected_categories:
|
||||
if not set(challenge_categories).intersection(set(selected_categories)):
|
||||
items.remove(item)
|
||||
continue
|
||||
# # Filter dependencies, keep only deps from selected categories
|
||||
# dependencies = [
|
||||
# d for d in dependencies
|
||||
# if not set(d.categories).intersection(set(selected_categories))
|
||||
# ]
|
||||
|
||||
# Add marker for the DependencyManager
|
||||
item.add_marker(pytest.mark.depends(on=dependencies, name=challenge_name))
|
||||
|
||||
i += 1
|
||||
|
||||
@@ -1,79 +0,0 @@
|
||||
import platform
|
||||
import queue
|
||||
import select
|
||||
import subprocess
|
||||
import time
|
||||
from threading import Thread
|
||||
from typing import Any
|
||||
|
||||
import psutil
|
||||
|
||||
|
||||
def run_linux_env(process: Any, start_time: float, timeout: float) -> None:
|
||||
while True:
|
||||
try:
|
||||
# This checks if there's data to be read from stdout without blocking.
|
||||
if process.stdout and select.select([process.stdout], [], [], 0)[0]:
|
||||
output = process.stdout.readline()
|
||||
print(output.strip())
|
||||
except Exception as e:
|
||||
continue
|
||||
|
||||
# Check if process has ended, has no more output, or exceeded timeout
|
||||
if process.poll() is not None or (time.time() - start_time > timeout):
|
||||
break
|
||||
|
||||
if time.time() - start_time > timeout:
|
||||
print("The Python function has exceeded the time limit and was terminated.")
|
||||
parent = psutil.Process(process.pid)
|
||||
for child in parent.children(recursive=True):
|
||||
child.kill()
|
||||
parent.kill()
|
||||
|
||||
else:
|
||||
print("The Python function has finished running.")
|
||||
|
||||
|
||||
def enqueue_output(out: Any, my_queue: Any) -> None:
|
||||
for line in iter(out.readline, b""):
|
||||
my_queue.put(line)
|
||||
out.close()
|
||||
|
||||
|
||||
def run_windows_env(process: Any, start_time: float, timeout: float) -> None:
|
||||
my_queue: Any = queue.Queue()
|
||||
thread = Thread(target=enqueue_output, args=(process.stdout, my_queue))
|
||||
thread.daemon = True
|
||||
thread.start()
|
||||
|
||||
while True:
|
||||
try:
|
||||
output = my_queue.get_nowait().strip()
|
||||
print(output)
|
||||
except queue.Empty:
|
||||
pass
|
||||
|
||||
if process.poll() is not None or (time.time() - start_time > timeout):
|
||||
break
|
||||
|
||||
if time.time() - start_time > timeout:
|
||||
print("The Python function has exceeded the time limit and was terminated.")
|
||||
process.terminate()
|
||||
|
||||
|
||||
def execute_subprocess(command, timeout):
|
||||
process = subprocess.Popen(
|
||||
command,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.STDOUT,
|
||||
universal_newlines=True,
|
||||
bufsize=1,
|
||||
)
|
||||
start_time = time.time()
|
||||
if platform.system() == "Windows":
|
||||
run_windows_env(process, start_time, timeout)
|
||||
else:
|
||||
run_linux_env(process, start_time, timeout)
|
||||
process.wait()
|
||||
if process.returncode != 0:
|
||||
print(f"The agent timed out")
|
||||
@@ -1,147 +1,34 @@
|
||||
import glob
|
||||
import importlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import types
|
||||
from collections import deque
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, Optional, Union
|
||||
|
||||
import pytest
|
||||
|
||||
from agbenchmark.__main__ import CHALLENGES_ALREADY_BEATEN
|
||||
from agbenchmark.agent_api_interface import append_updates_file
|
||||
from agbenchmark.agent_protocol_client.models.step import Step
|
||||
from agbenchmark.utils.challenge import Challenge
|
||||
from agbenchmark.utils.data_types import AgentBenchmarkConfig, ChallengeData
|
||||
from agbenchmark.utils.data_types import ChallengeData
|
||||
|
||||
DATA_CATEGORY = {}
|
||||
|
||||
|
||||
def create_single_test(
|
||||
data: Dict[str, Any] | ChallengeData,
|
||||
challenge_location: str,
|
||||
file_datum: Optional[list[dict[str, Any]]] = None,
|
||||
) -> None:
|
||||
challenge_data = None
|
||||
artifacts_location = None
|
||||
if isinstance(data, ChallengeData):
|
||||
challenge_data = data
|
||||
data = data.get_data()
|
||||
|
||||
DATA_CATEGORY[data["name"]] = data["category"][0]
|
||||
|
||||
# Define test class dynamically
|
||||
challenge_class = types.new_class(f"Test{data['name']}", (Challenge,))
|
||||
print(challenge_location)
|
||||
# clean_challenge_location = get_test_path(challenge_location)
|
||||
setattr(challenge_class, "CHALLENGE_LOCATION", challenge_location)
|
||||
|
||||
setattr(
|
||||
challenge_class,
|
||||
"ARTIFACTS_LOCATION",
|
||||
artifacts_location or str(Path(challenge_location).resolve().parent),
|
||||
)
|
||||
|
||||
# Define test method within the dynamically created class
|
||||
@pytest.mark.asyncio
|
||||
async def test_method(self, config: Dict[str, Any], request) -> None: # type: ignore
|
||||
# create a random number between 0 and 1
|
||||
test_name = self.data.name
|
||||
|
||||
try:
|
||||
with open(CHALLENGES_ALREADY_BEATEN, "r") as f:
|
||||
challenges_beaten_in_the_past = json.load(f)
|
||||
except:
|
||||
challenges_beaten_in_the_past = {}
|
||||
|
||||
if request.config.getoption("--explore") and challenges_beaten_in_the_past.get(
|
||||
test_name, False
|
||||
):
|
||||
return None
|
||||
|
||||
# skip optional categories
|
||||
self.skip_optional_categories(config)
|
||||
|
||||
from helicone.lock import HeliconeLockManager
|
||||
|
||||
if os.environ.get("HELICONE_API_KEY"):
|
||||
HeliconeLockManager.write_custom_property("challenge", self.data.name)
|
||||
|
||||
cutoff = self.data.cutoff or 60
|
||||
|
||||
timeout = cutoff
|
||||
if "--nc" in sys.argv:
|
||||
timeout = 100000
|
||||
if "--cutoff" in sys.argv:
|
||||
timeout = int(sys.argv[sys.argv.index("--cutoff") + 1])
|
||||
|
||||
await self.setup_challenge(config, timeout)
|
||||
|
||||
scores = self.get_scores(config)
|
||||
request.node.answers = (
|
||||
scores["answers"] if "--keep-answers" in sys.argv else None
|
||||
)
|
||||
del scores["answers"] # remove answers from scores
|
||||
request.node.scores = scores # store scores in request.node
|
||||
is_score_100 = 1 in scores["values"]
|
||||
|
||||
evaluation = "Correct!" if is_score_100 else "Incorrect."
|
||||
eval_step = Step(
|
||||
input=evaluation,
|
||||
additional_input=None,
|
||||
task_id="irrelevant, this step is a hack",
|
||||
step_id="irrelevant, this step is a hack",
|
||||
name="",
|
||||
status="created",
|
||||
output=None,
|
||||
additional_output=None,
|
||||
artifacts=[],
|
||||
is_last=True,
|
||||
)
|
||||
await append_updates_file(eval_step)
|
||||
|
||||
assert is_score_100
|
||||
|
||||
# Parametrize the method here
|
||||
test_method = pytest.mark.parametrize(
|
||||
"challenge_data",
|
||||
[data],
|
||||
indirect=True,
|
||||
)(test_method)
|
||||
|
||||
setattr(challenge_class, "test_method", test_method)
|
||||
|
||||
# Attach the new class to a module so it can be discovered by pytest
|
||||
module = importlib.import_module(__name__)
|
||||
setattr(module, f"Test{data['name']}", challenge_class)
|
||||
return challenge_class
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def create_single_suite_challenge(challenge_data: ChallengeData, path: Path) -> None:
|
||||
create_single_test(challenge_data, str(path))
|
||||
def create_challenge_from_spec_file(spec_file: Path) -> type[Challenge]:
|
||||
challenge = Challenge.from_challenge_spec(spec_file)
|
||||
DATA_CATEGORY[challenge.data.name] = challenge.data.category[0].value
|
||||
return challenge
|
||||
|
||||
|
||||
def create_challenge(
|
||||
data: Dict[str, Any],
|
||||
json_file: str,
|
||||
json_files: deque,
|
||||
) -> Union[deque, Any]:
|
||||
path = Path(json_file).resolve()
|
||||
print("Creating challenge for", path)
|
||||
|
||||
challenge_class = create_single_test(data, str(path))
|
||||
print("Creation complete for", path)
|
||||
|
||||
return json_files, challenge_class
|
||||
def create_challenge_from_spec_file_path(spec_file_path: str) -> type[Challenge]:
|
||||
spec_file = Path(spec_file_path).resolve()
|
||||
return create_challenge_from_spec_file(spec_file)
|
||||
|
||||
|
||||
def generate_tests() -> None: # sourcery skip: invert-any-all
|
||||
print("Generating tests...")
|
||||
def load_challenges() -> None:
|
||||
logger.info("Loading challenges...")
|
||||
|
||||
challenges_path = os.path.join(os.path.dirname(__file__), "challenges")
|
||||
print(f"Looking for challenges in {challenges_path}...")
|
||||
logger.debug(f"Looking for challenges in {challenges_path}...")
|
||||
|
||||
json_files = deque(
|
||||
glob.glob(
|
||||
@@ -150,74 +37,39 @@ def generate_tests() -> None: # sourcery skip: invert-any-all
|
||||
)
|
||||
)
|
||||
|
||||
print(f"Found {len(json_files)} challenges.")
|
||||
print(f"Sample path: {json_files[0]}")
|
||||
|
||||
agent_benchmark_config_path = str(Path.cwd() / "agbenchmark_config" / "config.json")
|
||||
try:
|
||||
with open(agent_benchmark_config_path, "r") as f:
|
||||
agent_benchmark_config = AgentBenchmarkConfig(**json.load(f))
|
||||
agent_benchmark_config.agent_benchmark_config_path = (
|
||||
agent_benchmark_config_path
|
||||
)
|
||||
except json.JSONDecodeError:
|
||||
print("Error: benchmark_config.json is not a valid JSON file.")
|
||||
raise
|
||||
|
||||
regression_reports_path = agent_benchmark_config.get_regression_reports_path()
|
||||
if regression_reports_path and os.path.exists(regression_reports_path):
|
||||
with open(regression_reports_path, "r") as f:
|
||||
regression_tests = json.load(f)
|
||||
else:
|
||||
regression_tests = {}
|
||||
logger.debug(f"Found {len(json_files)} challenges.")
|
||||
logger.debug(f"Sample path: {json_files[0]}")
|
||||
|
||||
loaded, ignored = 0, 0
|
||||
while json_files:
|
||||
json_file = (
|
||||
json_files.popleft()
|
||||
) # Take and remove the first element from json_files
|
||||
# Take and remove the first element from json_files
|
||||
json_file = json_files.popleft()
|
||||
if challenge_should_be_ignored(json_file):
|
||||
ignored += 1
|
||||
continue
|
||||
|
||||
data = ChallengeData.get_json_from_path(json_file)
|
||||
challenge_info = ChallengeData.parse_file(json_file)
|
||||
|
||||
commands = sys.argv
|
||||
# --by flag
|
||||
if "--category" in commands:
|
||||
categories = data.get("category", [])
|
||||
commands_set = set(commands)
|
||||
challenge_class = create_challenge_from_spec_file_path(json_file)
|
||||
|
||||
# Convert the combined list to a set
|
||||
categories_set = set(categories)
|
||||
logger.debug(f"Generated test for {challenge_info.name}")
|
||||
_add_challenge_to_module(challenge_class)
|
||||
loaded += 1
|
||||
|
||||
# If there's no overlap with commands
|
||||
if not categories_set.intersection(commands_set):
|
||||
continue
|
||||
|
||||
# --test flag, only run the test if it's the exact one specified
|
||||
tests = []
|
||||
for command in commands:
|
||||
if command.startswith("--test="):
|
||||
tests.append(command.split("=")[1])
|
||||
|
||||
if tests and data["name"] not in tests:
|
||||
continue
|
||||
|
||||
# --maintain and --improve flag
|
||||
in_regression = regression_tests.get(data["name"], None)
|
||||
improve_flag = in_regression and "--improve" in commands
|
||||
maintain_flag = not in_regression and "--maintain" in commands
|
||||
if "--maintain" in commands and maintain_flag:
|
||||
continue
|
||||
elif "--improve" in commands and improve_flag:
|
||||
continue
|
||||
json_files, challenge_class = create_challenge(data, json_file, json_files)
|
||||
|
||||
print(f"Generated test for {data['name']}.")
|
||||
print("Test generation complete.")
|
||||
logger.info(f"Loading challenges complete: loaded {loaded}, ignored {ignored}.")
|
||||
|
||||
|
||||
def challenge_should_be_ignored(json_file):
|
||||
return "challenges/deprecated" in json_file or "challenges/library" in json_file
|
||||
def challenge_should_be_ignored(json_file_path: str):
|
||||
return (
|
||||
"challenges/deprecated" in json_file_path
|
||||
or "challenges/library" in json_file_path
|
||||
)
|
||||
|
||||
|
||||
generate_tests()
|
||||
def _add_challenge_to_module(challenge: type[Challenge]):
|
||||
# Attach the Challenge class to this module so it can be discovered by pytest
|
||||
module = importlib.import_module(__name__)
|
||||
setattr(module, f"{challenge.__name__}", challenge)
|
||||
|
||||
|
||||
load_challenges()
|
||||
|
||||
153
benchmark/agbenchmark/main.py
Normal file
@@ -0,0 +1,153 @@
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional, Sequence
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from agbenchmark.challenges import get_unique_categories
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
|
||||
load_dotenv()
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def run_benchmark(
|
||||
config: AgentBenchmarkConfig,
|
||||
maintain: bool = False,
|
||||
improve: bool = False,
|
||||
explore: bool = False,
|
||||
tests: tuple[str] = tuple(),
|
||||
categories: tuple[str] = tuple(),
|
||||
skip_categories: tuple[str] = tuple(),
|
||||
mock: bool = False,
|
||||
no_dep: bool = False,
|
||||
no_cutoff: bool = False,
|
||||
cutoff: Optional[int] = None,
|
||||
keep_answers: bool = False,
|
||||
server: bool = False,
|
||||
) -> int:
|
||||
"""
|
||||
Starts the benchmark. If a category flag is provided, only challenges with the
|
||||
corresponding mark will be run.
|
||||
"""
|
||||
import pytest
|
||||
|
||||
from agbenchmark.reports.ReportManager import SingletonReportManager
|
||||
|
||||
validate_args(
|
||||
maintain=maintain,
|
||||
improve=improve,
|
||||
explore=explore,
|
||||
tests=tests,
|
||||
categories=categories,
|
||||
skip_categories=skip_categories,
|
||||
no_cutoff=no_cutoff,
|
||||
cutoff=cutoff,
|
||||
)
|
||||
|
||||
SingletonReportManager()
|
||||
|
||||
for key, value in vars(config).items():
|
||||
logger.debug(f"config.{key} = {repr(value)}")
|
||||
|
||||
pytest_args = ["-vs"]
|
||||
|
||||
if tests:
|
||||
logger.info(f"Running specific test(s): {' '.join(tests)}")
|
||||
pytest_args += [f"--test={t}" for t in tests]
|
||||
else:
|
||||
all_categories = get_unique_categories()
|
||||
|
||||
if categories or skip_categories:
|
||||
categories_to_run = set(categories) or all_categories
|
||||
if skip_categories:
|
||||
categories_to_run = categories_to_run.difference(set(skip_categories))
|
||||
assert categories_to_run, "Error: You can't skip all categories"
|
||||
pytest_args += [f"--category={c}" for c in categories_to_run]
|
||||
logger.info(f"Running tests of category: {categories_to_run}")
|
||||
else:
|
||||
logger.info("Running all categories")
|
||||
|
||||
if maintain:
|
||||
logger.info("Running only regression tests")
|
||||
elif improve:
|
||||
logger.info("Running only non-regression tests")
|
||||
elif explore:
|
||||
logger.info("Only attempt challenges that have never been beaten")
|
||||
|
||||
if mock:
|
||||
# TODO: unhack
|
||||
os.environ[
|
||||
"IS_MOCK"
|
||||
] = "True" # ugly hack to make the mock work when calling from API
|
||||
|
||||
# Pass through flags
|
||||
for flag, active in {
|
||||
"--maintain": maintain,
|
||||
"--improve": improve,
|
||||
"--explore": explore,
|
||||
"--no-dep": no_dep,
|
||||
"--mock": mock,
|
||||
"--nc": no_cutoff,
|
||||
"--keep-answers": keep_answers,
|
||||
}.items():
|
||||
if active:
|
||||
pytest_args.append(flag)
|
||||
|
||||
if cutoff:
|
||||
pytest_args.append(f"--cutoff={cutoff}")
|
||||
logger.debug(f"Setting cuttoff override to {cutoff} seconds.")
|
||||
|
||||
current_dir = Path(__file__).resolve().parent
|
||||
pytest_args.append(str(current_dir / "generate_test.py"))
|
||||
|
||||
pytest_args.append("--cache-clear")
|
||||
exit_code = pytest.main(pytest_args)
|
||||
|
||||
SingletonReportManager.clear_instance()
|
||||
return exit_code
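A minimal sketch of driving the benchmark programmatically through the function above; the category value is invented, and the module path `agbenchmark.main` follows the file path shown in this diff.

from agbenchmark.config import AgentBenchmarkConfig
from agbenchmark.main import run_benchmark

config = AgentBenchmarkConfig.load()
exit_code = run_benchmark(config=config, categories=("coding",), mock=True)
raise SystemExit(exit_code)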
class InvalidInvocationError(ValueError):
|
||||
pass
|
||||
|
||||
|
||||
def validate_args(
|
||||
maintain: bool,
|
||||
improve: bool,
|
||||
explore: bool,
|
||||
tests: Sequence[str],
|
||||
categories: Sequence[str],
|
||||
skip_categories: Sequence[str],
|
||||
no_cutoff: bool,
|
||||
cutoff: Optional[int],
|
||||
) -> None:
|
||||
if categories:
|
||||
all_categories = get_unique_categories()
|
||||
invalid_categories = set(categories) - all_categories
|
||||
if invalid_categories:
|
||||
raise InvalidInvocationError(
|
||||
"One or more invalid categories were specified: "
|
||||
f"{', '.join(invalid_categories)}.\n"
|
||||
f"Valid categories are: {', '.join(all_categories)}."
|
||||
)
|
||||
|
||||
if (maintain + improve + explore) > 1:
|
||||
raise InvalidInvocationError(
|
||||
"You can't use --maintain, --improve or --explore at the same time. "
|
||||
"Please choose one."
|
||||
)
|
||||
|
||||
if tests and (categories or skip_categories or maintain or improve or explore):
|
||||
raise InvalidInvocationError(
|
||||
"If you're running a specific test make sure no other options are "
|
||||
"selected. Please just pass the --test."
|
||||
)
|
||||
|
||||
if no_cutoff and cutoff:
|
||||
raise InvalidInvocationError(
|
||||
"You can't use both --nc and --cutoff at the same time. "
|
||||
"Please choose one."
|
||||
)
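To make the rules above concrete, a small sketch (assuming the names above are in scope) of a combination that the validator rejects:

# Illustrative only: combining two run modes trips the mutual-exclusion check.
try:
    validate_args(
        maintain=True,
        improve=True,
        explore=False,
        tests=(),
        categories=(),
        skip_categories=(),
        no_cutoff=False,
        cutoff=None,
    )
except InvalidInvocationError as e:
    print(f"Invalid invocation: {e}")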
@@ -4,11 +4,12 @@ import os
|
||||
import sys
|
||||
import time
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.reports.processing.graphs import save_single_radar_chart
|
||||
from agbenchmark.reports.processing.process_report import get_agent_category
|
||||
from agbenchmark.reports.processing.report_types import Report
|
||||
from agbenchmark.utils.data_types import AgentBenchmarkConfig
|
||||
from agbenchmark.utils.utils import get_highest_success_difficulty
|
||||
|
||||
|
||||
@@ -16,32 +17,26 @@ class SingletonReportManager:
|
||||
instance = None
|
||||
|
||||
def __new__(cls):
|
||||
from agbenchmark.reports.agent_benchmark_config import (
|
||||
get_agent_benchmark_config,
|
||||
)
|
||||
|
||||
if not cls.instance:
|
||||
cls.instance = super(SingletonReportManager, cls).__new__(cls)
|
||||
|
||||
agent_benchmark_config = get_agent_benchmark_config()
|
||||
agent_benchmark_config = AgentBenchmarkConfig.load()
|
||||
benchmark_start_time_dt = datetime.now(
|
||||
timezone.utc
|
||||
) # or any logic to fetch the datetime
|
||||
|
||||
# Make the Managers class attributes
|
||||
cls.REGRESSION_MANAGER = ReportManager(
|
||||
agent_benchmark_config.get_regression_reports_path(),
|
||||
agent_benchmark_config.regression_tests_file,
|
||||
benchmark_start_time_dt,
|
||||
)
|
||||
cls.INFO_MANAGER = ReportManager(
|
||||
str(
|
||||
agent_benchmark_config.get_reports_path(benchmark_start_time_dt)
|
||||
/ "report.json"
|
||||
),
|
||||
agent_benchmark_config.get_report_dir(benchmark_start_time_dt)
|
||||
/ "report.json",
|
||||
benchmark_start_time_dt,
|
||||
)
|
||||
cls.INTERNAL_INFO_MANAGER = ReportManager(
|
||||
agent_benchmark_config.get_success_rate_path(), benchmark_start_time_dt
|
||||
agent_benchmark_config.success_rate_file, benchmark_start_time_dt
|
||||
)
|
||||
|
||||
return cls.instance
|
||||
@@ -57,21 +52,20 @@ class SingletonReportManager:
|
||||
class ReportManager:
|
||||
"""Abstracts interaction with the regression tests file"""
|
||||
|
||||
def __init__(self, filename: str, benchmark_start_time: str):
|
||||
self.filename = filename
|
||||
def __init__(self, report_file: Path, benchmark_start_time: datetime):
|
||||
self.report_file = report_file
|
||||
self.start_time = time.time()
|
||||
self.benchmark_start_time = benchmark_start_time
|
||||
|
||||
self.load()
|
||||
|
||||
def load(self) -> None:
|
||||
if not os.path.exists(self.filename):
|
||||
os.makedirs(os.path.dirname(self.filename), exist_ok=True)
|
||||
with open(self.filename, "w") as f:
|
||||
pass
|
||||
if not self.report_file.exists():
|
||||
self.report_file.parent.mkdir(exist_ok=True)
|
||||
self.report_file.touch()
|
||||
|
||||
try:
|
||||
with open(self.filename, "r") as f:
|
||||
with self.report_file.open("r") as f:
|
||||
file_content = (
|
||||
f.read().strip()
|
||||
) # read the content and remove any leading/trailing whitespace
|
||||
@@ -87,7 +81,7 @@ class ReportManager:
|
||||
self.save()
|
||||
|
||||
def save(self) -> None:
|
||||
with open(self.filename, "w") as f:
|
||||
with self.report_file.open("w") as f:
|
||||
json.dump(self.tests, f, indent=4)
|
||||
|
||||
def add_test(self, test_name: str, test_details: dict | list) -> None:
|
||||
@@ -137,7 +131,7 @@ class ReportManager:
|
||||
if len(agent_categories) > 1:
|
||||
save_single_radar_chart(
|
||||
agent_categories,
|
||||
config.get_reports_path(self.benchmark_start_time) / "radar_chart.png",
|
||||
config.get_report_dir(self.benchmark_start_time) / "radar_chart.png",
|
||||
)
|
||||
|
||||
self.save()
|
||||
|
||||
@@ -1,18 +0,0 @@
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
from agbenchmark.utils.data_types import AgentBenchmarkConfig
|
||||
|
||||
|
||||
def get_agent_benchmark_config() -> AgentBenchmarkConfig:
|
||||
agent_benchmark_config_path = str(Path.cwd() / "agbenchmark_config" / "config.json")
|
||||
try:
|
||||
with open(agent_benchmark_config_path, "r") as f:
|
||||
agent_benchmark_config = AgentBenchmarkConfig(**json.load(f))
|
||||
agent_benchmark_config.agent_benchmark_config_path = (
|
||||
agent_benchmark_config_path
|
||||
)
|
||||
return agent_benchmark_config
|
||||
except json.JSONDecodeError:
|
||||
print("Error: benchmark_config.json is not a valid JSON file.")
|
||||
raise
|
||||
@@ -1,4 +1,5 @@
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
@@ -9,6 +10,8 @@ from agbenchmark.reports.processing.get_files import (
|
||||
from agbenchmark.reports.processing.report_types import Report, Test
|
||||
from agbenchmark.utils.data_types import STRING_DIFFICULTY_MAP
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def get_reports_data(report_path: str) -> dict[str, Any]:
|
||||
latest_files = get_latest_report_from_agent_directories(report_path)
|
||||
@@ -60,7 +63,7 @@ def all_agent_categories(reports_data: dict[str, Any]) -> dict[str, Any]:
|
||||
for name, report in reports_data.items():
|
||||
categories = get_agent_category(report)
|
||||
if categories: # only add to all_categories if categories is not empty
|
||||
print(f"Adding {name}: {categories}")
|
||||
logger.debug(f"Adding {name}: {categories}")
|
||||
all_categories[name] = categories
|
||||
|
||||
return all_categories
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
from typing import Dict, List
|
||||
from pydantic import BaseModel, constr
|
||||
|
||||
datetime_format = r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+00:00$"
|
||||
from pydantic import BaseModel, constr
|
||||
|
||||
|
||||
class BaseModelBenchmark(BaseModel):
|
||||
@@ -14,32 +13,32 @@ class TaskInfo(BaseModelBenchmark):
|
||||
is_regression: bool | None
|
||||
answer: str
|
||||
description: str
|
||||
category: List[str]
|
||||
category: list[str]
|
||||
task: str
|
||||
|
||||
|
||||
class RepositoryInfo(BaseModelBenchmark):
|
||||
repo_url: str | None
|
||||
team_name: str | None
|
||||
benchmark_git_commit_sha: str | None
|
||||
agent_git_commit_sha: str | None
|
||||
repo_url: str | None = None
|
||||
team_name: str | None = None
|
||||
agent_git_commit_sha: str | None = None
|
||||
benchmark_git_commit_sha: str | None = None
|
||||
|
||||
|
||||
class Metrics(BaseModelBenchmark):
|
||||
difficulty: str | None
|
||||
cost: float | None = None
|
||||
success: bool
|
||||
success_percentage: float | None
|
||||
run_time: str | None
|
||||
fail_reason: str | None
|
||||
attempted: bool
|
||||
cost: float | None
|
||||
difficulty: str | None = None
|
||||
run_time: str | None = None
|
||||
fail_reason: str | None = None
|
||||
success_percentage: float | None = None
|
||||
|
||||
|
||||
class RunDetails(BaseModelBenchmark):
|
||||
test_name: str
|
||||
run_id: str | None
|
||||
run_id: str | None = None
|
||||
command: str
|
||||
completion_time: str | None
|
||||
completion_time: str | None = None
|
||||
benchmark_start_time: constr(regex=datetime_format)
|
||||
|
||||
|
||||
@@ -48,5 +47,5 @@ class BenchmarkRun(BaseModelBenchmark):
|
||||
run_details: RunDetails
|
||||
task_info: TaskInfo
|
||||
metrics: Metrics
|
||||
reached_cutoff: bool | None
|
||||
config: Dict[str, str | dict[str, str]]
|
||||
reached_cutoff: bool | None = None
|
||||
config: dict[str, str | dict[str, str]]
|
||||
|
||||
@@ -1,20 +1,24 @@
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict
|
||||
|
||||
from agbenchmark.__main__ import CHALLENGES_ALREADY_BEATEN
|
||||
from agbenchmark.reports.agent_benchmark_config import get_agent_benchmark_config
|
||||
import pytest
|
||||
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.reports.ReportManager import SingletonReportManager
|
||||
from agbenchmark.utils.data_types import DifficultyLevel
|
||||
from agbenchmark.utils.data_types import ChallengeData, DifficultyLevel
|
||||
from agbenchmark.utils.get_data_from_helicone import get_data_from_helicone
|
||||
from agbenchmark.utils.utils import calculate_success_percentage
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def get_previous_test_results(
|
||||
test_name: str, info_details: dict[str, Any]
|
||||
) -> list[bool]:
|
||||
agent_tests: dict[str, list[bool]] = {}
|
||||
mock = os.getenv("IS_MOCK") # Check if --mock is in sys.argv
|
||||
|
||||
prev_test_results = SingletonReportManager().INTERNAL_INFO_MANAGER.tests.get(
|
||||
@@ -49,17 +53,14 @@ def update_regression_tests(
|
||||
|
||||
|
||||
def generate_single_call_report(
|
||||
item: Any,
|
||||
call: Any,
|
||||
challenge_data: dict[str, Any],
|
||||
item: pytest.Item,
|
||||
call: pytest.CallInfo,
|
||||
challenge_data: ChallengeData,
|
||||
answers: dict[str, Any],
|
||||
challenge_location,
|
||||
test_name,
|
||||
challenge_location: str,
|
||||
test_name: str,
|
||||
) -> None:
|
||||
try:
|
||||
difficulty = challenge_data["info"]["difficulty"]
|
||||
except KeyError:
|
||||
return None
|
||||
difficulty = challenge_data.info.difficulty
|
||||
|
||||
if isinstance(difficulty, DifficultyLevel):
|
||||
difficulty = difficulty.value
|
||||
@@ -77,10 +78,10 @@ def generate_single_call_report(
|
||||
info_details: Any = {
|
||||
"data_path": challenge_location,
|
||||
"is_regression": False,
|
||||
"category": challenge_data["category"],
|
||||
"task": challenge_data["task"],
|
||||
"answer": challenge_data["ground"]["answer"],
|
||||
"description": challenge_data["info"]["description"],
|
||||
"category": challenge_data.category,
|
||||
"task": challenge_data.task,
|
||||
"answer": challenge_data.ground.answer,
|
||||
"description": challenge_data.info.description,
|
||||
"metrics": {
|
||||
"difficulty": difficulty,
|
||||
"success": False,
|
||||
@@ -91,8 +92,8 @@ def generate_single_call_report(
|
||||
if answers:
|
||||
info_details["answers"] = answers
|
||||
|
||||
if "metadata" in challenge_data:
|
||||
info_details["metadata"] = challenge_data["metadata"]
|
||||
if challenge_data.metadata:
|
||||
info_details["metadata"] = challenge_data.metadata
|
||||
|
||||
mock = os.getenv("IS_MOCK") # Check if --mock is in sys.argv
|
||||
if call:
|
||||
@@ -116,7 +117,9 @@ def generate_single_call_report(
|
||||
return info_details
|
||||
|
||||
|
||||
def finalize_reports(item: Any, challenge_data: dict[str, Any]) -> None:
|
||||
def finalize_reports(
|
||||
config: AgentBenchmarkConfig, item: pytest.Item, challenge_data: ChallengeData
|
||||
) -> None:
|
||||
run_time = dict(item.user_properties).get("run_time")
|
||||
|
||||
info_details = getattr(item, "info_details", {})
|
||||
@@ -126,8 +129,9 @@ def finalize_reports(item: Any, challenge_data: dict[str, Any]) -> None:
|
||||
if run_time is not None:
|
||||
cost = None
|
||||
if "--mock" not in sys.argv and os.environ.get("HELICONE_API_KEY"):
|
||||
print("Getting cost from Helicone")
|
||||
logger.debug("Getting cost from Helicone")
|
||||
cost = get_data_from_helicone(test_name)
|
||||
logger.debug(f"Cost: {cost}")
|
||||
|
||||
info_details["metrics"]["cost"] = cost
|
||||
|
||||
@@ -142,29 +146,33 @@ def finalize_reports(item: Any, challenge_data: dict[str, Any]) -> None:
|
||||
|
||||
info_details["metrics"]["run_time"] = f"{str(round(run_time, 3))} seconds"
|
||||
|
||||
info_details["reached_cutoff"] = float(run_time) > challenge_data["cutoff"]
|
||||
info_details["reached_cutoff"] = float(run_time) > challenge_data.cutoff
|
||||
|
||||
if "--mock" not in sys.argv:
|
||||
update_challenges_already_beaten(info_details, test_name)
|
||||
update_challenges_already_beaten(
|
||||
config.challenges_already_beaten_file, info_details, test_name
|
||||
)
|
||||
if info_details.get("tests") is not None:
|
||||
for nested_test_name, nested_test_info in info_details[
|
||||
"tests"
|
||||
].items():
|
||||
update_challenges_already_beaten(
|
||||
nested_test_info, nested_test_name
|
||||
config.challenges_already_beaten_file,
|
||||
nested_test_info,
|
||||
nested_test_name,
|
||||
)
|
||||
|
||||
SingletonReportManager().INFO_MANAGER.add_test(test_name, info_details)
|
||||
|
||||
|
||||
def update_challenges_already_beaten(
|
||||
info_details: Dict[str, Any], test_name: str
|
||||
challenges_already_beaten_file: Path, info_details: Dict[str, Any], test_name: str
|
||||
) -> None:
|
||||
current_run_successful = info_details["metrics"]["success"]
|
||||
try:
|
||||
with open(CHALLENGES_ALREADY_BEATEN, "r") as f:
|
||||
with open(challenges_already_beaten_file, "r") as f:
|
||||
challenge_data = json.load(f)
|
||||
except:
|
||||
except FileNotFoundError:
|
||||
challenge_data = {}
|
||||
challenge_beaten_in_the_past = challenge_data.get(test_name)
|
||||
|
||||
@@ -172,13 +180,13 @@ def update_challenges_already_beaten(
|
||||
if challenge_beaten_in_the_past is None and not current_run_successful:
|
||||
challenge_data[test_name] = False
|
||||
|
||||
with open(CHALLENGES_ALREADY_BEATEN, "w") as f:
|
||||
with open(challenges_already_beaten_file, "w") as f:
|
||||
json.dump(challenge_data, f, indent=4)
|
||||
|
||||
|
||||
def session_finish(suite_reports: dict) -> None:
|
||||
agent_benchmark_config = get_agent_benchmark_config()
|
||||
|
||||
def session_finish(
|
||||
agbenchmark_config: AgentBenchmarkConfig, suite_reports: dict
|
||||
) -> None:
|
||||
SingletonReportManager().INTERNAL_INFO_MANAGER.save()
|
||||
SingletonReportManager().INFO_MANAGER.end_info_report(agent_benchmark_config)
|
||||
SingletonReportManager().INFO_MANAGER.end_info_report(agbenchmark_config)
|
||||
SingletonReportManager().REGRESSION_MANAGER.save()
|
||||
|
||||
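The `update_challenges_already_beaten` change above threads the file path in as a parameter instead of relying on a module-level constant. A minimal, hedged sketch of the read-update-write pattern it uses (function and file names here are illustrative, not the benchmark's own):

import json
from pathlib import Path

def mark_beaten(path: Path, test_name: str, success: bool) -> None:
    # Load the existing record, tolerating a missing file on the first run
    try:
        beaten = json.loads(path.read_text())
    except FileNotFoundError:
        beaten = {}
    # Never downgrade a challenge that was already beaten in the past
    beaten[test_name] = beaten.get(test_name) or success
    path.write_text(json.dumps(beaten, indent=4))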
@@ -1,79 +1,14 @@
|
||||
# generated by fastapi-codegen:
|
||||
# filename: ../../postman/schemas/openapi.yaml
|
||||
# timestamp: 2023-08-25T10:36:11+00:00
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import datetime
|
||||
from enum import Enum
|
||||
from typing import List, Optional
|
||||
from typing import Optional
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
|
||||
class ArtifactUpload(BaseModel):
|
||||
file: str = Field(..., description="File to upload.", format="binary")
|
||||
relative_path: str = Field(
|
||||
...,
|
||||
description="Relative path of the artifact in the agent's workspace.",
|
||||
example="python/code",
|
||||
)
|
||||
|
||||
|
||||
class Pagination(BaseModel):
|
||||
total_items: int = Field(..., description="Total number of items.", example=42)
|
||||
total_pages: int = Field(..., description="Total number of pages.", example=97)
|
||||
current_page: int = Field(..., description="Current_page page number.", example=1)
|
||||
page_size: int = Field(..., description="Number of items per page.", example=25)
|
||||
|
||||
|
||||
class TaskInput(BaseModel):
|
||||
pass
|
||||
|
||||
|
||||
class Artifact(BaseModel):
|
||||
created_at: datetime = Field(
|
||||
...,
|
||||
description="The creation datetime of the task.",
|
||||
example="2023-01-01T00:00:00Z",
|
||||
json_encoders={datetime: lambda v: v.isoformat()},
|
||||
)
|
||||
modified_at: datetime = Field(
|
||||
...,
|
||||
description="The modification datetime of the task.",
|
||||
example="2023-01-01T00:00:00Z",
|
||||
json_encoders={datetime: lambda v: v.isoformat()},
|
||||
)
|
||||
artifact_id: str = Field(
|
||||
...,
|
||||
description="ID of the artifact.",
|
||||
example="b225e278-8b4c-4f99-a696-8facf19f0e56",
|
||||
)
|
||||
agent_created: bool = Field(
|
||||
...,
|
||||
description="Whether the artifact has been created by the agent.",
|
||||
example=False,
|
||||
)
|
||||
relative_path: str = Field(
|
||||
...,
|
||||
description="Relative path of the artifact in the agents workspace.",
|
||||
example="/my_folder/my_other_folder/",
|
||||
)
|
||||
file_name: str = Field(
|
||||
...,
|
||||
description="Filename of the artifact.",
|
||||
example="main.py",
|
||||
)
|
||||
|
||||
|
||||
class StepInput(BaseModel):
|
||||
pass
|
||||
|
||||
|
||||
class StepOutput(BaseModel):
|
||||
pass
|
||||
|
||||
|
||||
class TaskRequestBody(BaseModel):
|
||||
input: str = Field(
|
||||
...,
|
||||
@@ -86,108 +21,3 @@ class TaskRequestBody(BaseModel):
|
||||
|
||||
class TaskEvalRequestBody(TaskRequestBody):
|
||||
eval_id: str
|
||||
|
||||
|
||||
class Task(TaskRequestBody):
|
||||
created_at: datetime = Field(
|
||||
...,
|
||||
description="The creation datetime of the task.",
|
||||
example="2023-01-01T00:00:00Z",
|
||||
json_encoders={datetime: lambda v: v.isoformat()},
|
||||
)
|
||||
modified_at: datetime = Field(
|
||||
...,
|
||||
description="The modification datetime of the task.",
|
||||
example="2023-01-01T00:00:00Z",
|
||||
json_encoders={datetime: lambda v: v.isoformat()},
|
||||
)
|
||||
task_id: str = Field(
|
||||
...,
|
||||
description="The ID of the task.",
|
||||
example="50da533e-3904-4401-8a07-c49adf88b5eb",
|
||||
)
|
||||
artifacts: Optional[List[Artifact]] = Field(
|
||||
[],
|
||||
description="A list of artifacts that the task has produced.",
|
||||
example=[
|
||||
"7a49f31c-f9c6-4346-a22c-e32bc5af4d8e",
|
||||
"ab7b4091-2560-4692-a4fe-d831ea3ca7d6",
|
||||
],
|
||||
)
|
||||
|
||||
|
||||
class StepRequestBody(BaseModel):
|
||||
name: Optional[str] = Field(
|
||||
None, description="The name of the task step.", example="Write to file"
|
||||
)
|
||||
input: Optional[str] = Field(
|
||||
None,
|
||||
min_length=1,
|
||||
description="Input prompt for the step.",
|
||||
example="Washington",
|
||||
)
|
||||
additional_input: Optional[StepInput] = {}
|
||||
|
||||
|
||||
class Status(Enum):
|
||||
created = "created"
|
||||
running = "running"
|
||||
completed = "completed"
|
||||
|
||||
|
||||
class Step(StepRequestBody):
|
||||
created_at: datetime = Field(
|
||||
...,
|
||||
description="The creation datetime of the task.",
|
||||
example="2023-01-01T00:00:00Z",
|
||||
json_encoders={datetime: lambda v: v.isoformat()},
|
||||
)
|
||||
modified_at: datetime = Field(
|
||||
...,
|
||||
description="The modification datetime of the task.",
|
||||
example="2023-01-01T00:00:00Z",
|
||||
json_encoders={datetime: lambda v: v.isoformat()},
|
||||
)
|
||||
task_id: str = Field(
|
||||
...,
|
||||
description="The ID of the task this step belongs to.",
|
||||
example="50da533e-3904-4401-8a07-c49adf88b5eb",
|
||||
)
|
||||
step_id: str = Field(
|
||||
...,
|
||||
description="The ID of the task step.",
|
||||
example="6bb1801a-fd80-45e8-899a-4dd723cc602e",
|
||||
)
|
||||
name: Optional[str] = Field(
|
||||
None, description="The name of the task step.", example="Write to file"
|
||||
)
|
||||
status: Status = Field(
|
||||
..., description="The status of the task step.", example="created"
|
||||
)
|
||||
output: Optional[str] = Field(
|
||||
None,
|
||||
description="Output of the task step.",
|
||||
example="I am going to use the write_to_file command and write Washington to a file called output.txt <write_to_file('output.txt', 'Washington')",
|
||||
)
|
||||
additional_output: Optional[StepOutput] = {}
|
||||
artifacts: Optional[List[Artifact]] = Field(
|
||||
[], description="A list of artifacts that the step has produced."
|
||||
)
|
||||
is_last: bool = Field(
|
||||
..., description="Whether this is the last step in the task.", example=True
|
||||
)
|
||||
|
||||
|
||||
class TaskListResponse(BaseModel):
|
||||
tasks: Optional[List[Task]] = None
|
||||
pagination: Optional[Pagination] = None
|
||||
|
||||
|
||||
class TaskStepsListResponse(BaseModel):
|
||||
steps: Optional[List[Step]] = None
|
||||
pagination: Optional[Pagination] = None
|
||||
|
||||
|
||||
class TaskArtifactsListResponse(BaseModel):
|
||||
artifacts: Optional[List[Artifact]] = None
|
||||
pagination: Optional[Pagination] = None
|
||||
|
||||
@@ -1,17 +1,20 @@
|
||||
import glob
|
||||
import json
|
||||
import logging
|
||||
import math
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
from abc import ABC
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List
|
||||
from typing import Any, ClassVar, List
|
||||
|
||||
import openai
|
||||
import pytest
|
||||
from colorama import Fore, Style
|
||||
|
||||
from agbenchmark.__main__ import OPTIONAL_CATEGORIES, TEMP_FOLDER_ABS_PATH
|
||||
from agbenchmark.agent_api_interface import run_api_agent
|
||||
from agbenchmark.config import AgentBenchmarkConfig
|
||||
from agbenchmark.utils.data_types import ChallengeData, Ground
|
||||
from agbenchmark.utils.prompts import (
|
||||
END_PROMPT,
|
||||
@@ -19,43 +22,84 @@ from agbenchmark.utils.prompts import (
|
||||
PROMPT_MAP,
|
||||
SCORING_MAP,
|
||||
)
|
||||
from agbenchmark.utils.utils import agent_eligibible_for_optional_categories
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
with open(
|
||||
Path(__file__).parent.parent / "challenges" / "optional_categories.json"
|
||||
) as f:
|
||||
OPTIONAL_CATEGORIES: list[str] = json.load(f)["optional_categories"]
|
||||
|
||||
|
||||
class Challenge(ABC):
|
||||
"""The parent class to all specific challenges classes.
|
||||
Defines helper methods for running a challenge"""
|
||||
|
||||
_data_cache: Dict[str, ChallengeData] = {}
|
||||
CHALLENGE_LOCATION: str = ""
|
||||
scores: dict[str, Any] = {} # this is for suites
|
||||
data: ChallengeData
|
||||
CHALLENGE_LOCATION: ClassVar[str]
|
||||
ARTIFACTS_LOCATION: ClassVar[str]
|
||||
scores: ClassVar[dict[str, Any]] = {} # this is for suites
|
||||
|
||||
@property
|
||||
def data(self) -> ChallengeData:
|
||||
if self.CHALLENGE_LOCATION not in self._data_cache:
|
||||
self._data_cache[self.CHALLENGE_LOCATION] = ChallengeData.deserialize(
|
||||
self.CHALLENGE_LOCATION
|
||||
)
|
||||
return self._data_cache[self.CHALLENGE_LOCATION]
|
||||
@staticmethod
|
||||
def from_challenge_spec(spec_file: Path) -> type["Challenge"]:
|
||||
challenge_data = ChallengeData.parse_file(spec_file)
|
||||
|
||||
@property
|
||||
def task(self) -> str:
|
||||
return self.data.task
|
||||
challenge_class_name = f"Test{challenge_data.name}"
|
||||
logger.debug(f"Creating {challenge_class_name} from spec: {spec_file}")
|
||||
return type(
|
||||
challenge_class_name,
|
||||
(Challenge,),
|
||||
{
|
||||
"data": challenge_data,
|
||||
"CHALLENGE_LOCATION": str(spec_file),
|
||||
"ARTIFACTS_LOCATION": str(spec_file.resolve().parent),
|
||||
},
|
||||
)
|
||||
|
||||
@property
|
||||
def dependencies(self) -> list:
|
||||
return self.data.dependencies
|
||||
# Define test method within the dynamically created class
|
||||
@pytest.mark.asyncio
|
||||
async def test_method(
|
||||
self, config: AgentBenchmarkConfig, request: pytest.FixtureRequest
|
||||
) -> None:
|
||||
# skip optional categories
|
||||
self.skip_optional_categories(config)
|
||||
|
||||
async def setup_challenge(self, config: Dict[str, Any], cutoff: int) -> None:
|
||||
if os.environ.get("HELICONE_API_KEY"):
|
||||
from helicone.lock import HeliconeLockManager
|
||||
|
||||
HeliconeLockManager.write_custom_property("challenge", self.data.name)
|
||||
|
||||
timeout = self.data.cutoff or 60
|
||||
|
||||
if request.config.getoption("--nc"):
|
||||
timeout = 100000
|
||||
elif cutoff := request.config.getoption("--cutoff"):
|
||||
timeout = int(cutoff)
|
||||
|
||||
await self.run_challenge(config, timeout)
|
||||
|
||||
scores = self.get_scores(config.temp_folder)
|
||||
request.node.answers = (
|
||||
scores["answers"] if request.config.getoption("--keep-answers") else None
|
||||
)
|
||||
del scores["answers"] # remove answers from scores
|
||||
request.node.scores = scores # store scores in request.node
|
||||
is_score_100 = 1 in scores["values"]
|
||||
|
||||
assert is_score_100
|
||||
|
||||
async def run_challenge(self, config: AgentBenchmarkConfig, cutoff: int) -> None:
|
||||
from agbenchmark.agent_interface import copy_artifacts_into_temp_folder
|
||||
|
||||
if not self.task:
|
||||
if not self.data.task:
|
||||
return
|
||||
|
||||
print(
|
||||
f"\033[1;35m============Starting {self.data.name} challenge============\033[0m"
|
||||
f"{Fore.MAGENTA + Style.BRIGHT}{'='*24} "
|
||||
f"Starting {self.data.name} challenge"
|
||||
f" {'='*24}{Style.RESET_ALL}"
|
||||
)
|
||||
print(f"\033[1;30mTask: {self.task}\033[0m")
|
||||
print(f"{Fore.BLACK}Task: {self.data.task}{Fore.RESET}")
|
||||
|
||||
await run_api_agent(self.data, config, self.ARTIFACTS_LOCATION, cutoff)
|
||||
|
||||
@@ -66,13 +110,11 @@ class Challenge(ABC):
|
||||
str(Path(self.CHALLENGE_LOCATION).parent),
|
||||
]
|
||||
for path in artifact_paths:
|
||||
copy_artifacts_into_temp_folder(TEMP_FOLDER_ABS_PATH, "custom_python", path)
|
||||
|
||||
def test_method(self, config: Dict[str, Any]) -> None:
|
||||
raise NotImplementedError
|
||||
copy_artifacts_into_temp_folder(config.temp_folder, "custom_python", path)
|
||||
|
||||
@staticmethod
|
||||
def get_artifacts_out(
|
||||
self, workspace: str | dict[str, str], ground: Ground
|
||||
workspace: str | Path | dict[str, str], ground: Ground
|
||||
) -> List[str]:
|
||||
if isinstance(workspace, dict):
|
||||
workspace = workspace["output"]
|
||||
@@ -108,7 +150,7 @@ class Challenge(ABC):
|
||||
if ground.eval.type == "pytest":
|
||||
result = subprocess.run(
|
||||
[sys.executable, "-m", "pytest"],
|
||||
cwd=TEMP_FOLDER_ABS_PATH,
|
||||
cwd=os.path.abspath(workspace),
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
@@ -119,15 +161,17 @@ class Challenge(ABC):
|
||||
|
||||
return files_contents
|
||||
|
||||
def scoring(self, config: Dict[str, Any], content: str, ground: Ground) -> float:
|
||||
print("\033[1;34mScoring content:\033[0m", content)
|
||||
@staticmethod
|
||||
def scoring(content: str, ground: Ground) -> float:
|
||||
print(f"{Fore.BLUE}Scoring content:{Style.RESET_ALL}", content)
|
||||
if ground.should_contain:
|
||||
for should_contain_word in ground.should_contain:
|
||||
if not getattr(ground, "case_sensitive", True):
|
||||
should_contain_word = should_contain_word.lower()
|
||||
content = content.lower()
|
||||
print_content = (
|
||||
f"\033[1;34mWord that should exist\033[0m - {should_contain_word}:"
|
||||
f"{Fore.BLUE}Word that should exist{Style.RESET_ALL}"
|
||||
f" - {should_contain_word}:"
|
||||
)
|
||||
if should_contain_word not in content:
|
||||
print(print_content, "False")
|
||||
@@ -140,7 +184,10 @@ class Challenge(ABC):
|
||||
if not getattr(ground, "case_sensitive", True):
|
||||
should_not_contain_word = should_not_contain_word.lower()
|
||||
content = content.lower()
|
||||
print_content = f"\033[1;34mWord that should not exist\033[0m - {should_not_contain_word}:"
|
||||
print_content = (
|
||||
f"{Fore.BLUE}Word that should not exist{Style.RESET_ALL}"
|
||||
f" - {should_not_contain_word}:"
|
||||
)
|
||||
if should_not_contain_word in content:
|
||||
print(print_content, "False")
|
||||
return 0.0
|
||||
@@ -149,14 +196,17 @@ class Challenge(ABC):
|
||||
|
||||
return 1.0
|
||||
|
||||
def llm_eval(self, config: Dict[str, Any], content: str, ground: Ground) -> float:
|
||||
@classmethod
|
||||
def llm_eval(cls, content: str, ground: Ground) -> float:
|
||||
openai.api_key = os.getenv("OPENAI_API_KEY")
|
||||
if os.getenv("IS_MOCK"):
|
||||
return 1.0
|
||||
|
||||
# the validation for this is done in the Eval BaseModel
|
||||
scoring = SCORING_MAP[ground.eval.scoring] # type: ignore
|
||||
prompt = PROMPT_MAP[ground.eval.template].format(task=self.data.task, scoring=scoring, answer=ground.answer, response=content) # type: ignore
|
||||
prompt = PROMPT_MAP[ground.eval.template].format( # type: ignore
|
||||
task=cls.data.task, scoring=scoring, answer=ground.answer, response=content
|
||||
)
|
||||
|
||||
if ground.eval.examples:
|
||||
prompt += FEW_SHOT_EXAMPLES.format(examples=ground.eval.examples)
|
||||
@@ -172,34 +222,31 @@ class Challenge(ABC):
|
||||
|
||||
return float(answer["choices"][0]["message"]["content"]) # type: ignore
|
||||
|
||||
def get_scores(self, config: Dict[str, Any]) -> dict[str, Any]:
|
||||
@classmethod
|
||||
def get_scores(cls, workspace: Path) -> dict[str, Any]:
|
||||
scores = []
|
||||
scores_dict: Any = {}
|
||||
percentage = None
|
||||
answers = {}
|
||||
try:
|
||||
if self.data.task == "" and os.getenv("IS_MOCK"):
|
||||
if cls.data.task == "" and os.getenv("IS_MOCK"):
|
||||
scores = [1.0]
|
||||
answers = {"mock": "This is a mock answer"}
|
||||
elif isinstance(self.data.ground, Ground):
|
||||
files_contents = self.get_artifacts_out(
|
||||
TEMP_FOLDER_ABS_PATH, self.data.ground
|
||||
)
|
||||
elif isinstance(cls.data.ground, Ground):
|
||||
files_contents = cls.get_artifacts_out(workspace, cls.data.ground)
|
||||
answers = {"answer": files_contents}
|
||||
for file_content in files_contents:
|
||||
score = self.scoring(config, file_content, self.data.ground)
|
||||
print("\033[1;32mYour score is:\033[0m", score)
|
||||
score = cls.scoring(file_content, cls.data.ground)
|
||||
print(f"{Fore.GREEN}Your score is:{Style.RESET_ALL}", score)
|
||||
scores.append(score)
|
||||
|
||||
if self.data.ground.eval.type == "llm":
|
||||
llm_eval = self.llm_eval(
|
||||
config, "\n".join(files_contents), self.data.ground
|
||||
)
|
||||
if self.data.ground.eval.scoring == "percentage":
|
||||
if cls.data.ground.eval.type == "llm":
|
||||
llm_eval = cls.llm_eval("\n".join(files_contents), cls.data.ground)
|
||||
if cls.data.ground.eval.scoring == "percentage":
|
||||
scores.append(math.ceil(llm_eval / 100))
|
||||
elif self.data.ground.eval.scoring == "scale":
|
||||
elif cls.data.ground.eval.scoring == "scale":
|
||||
scores.append(math.ceil(llm_eval / 10))
|
||||
print("\033[1;32mYour score is:\033[0m", llm_eval)
|
||||
print(f"{Fore.GREEN}Your score is:{Style.RESET_ALL}", llm_eval)
|
||||
|
||||
scores.append(llm_eval)
|
||||
except Exception as e:
|
||||
@@ -212,7 +259,7 @@ class Challenge(ABC):
|
||||
"answers": answers,
|
||||
}
|
||||
|
||||
self.scores[self.__class__.__name__] = scores_data
|
||||
cls.scores[cls.__name__] = scores_data
|
||||
|
||||
return scores_data
|
||||
|
||||
@@ -223,14 +270,15 @@ class Challenge(ABC):
|
||||
|
||||
return None
|
||||
|
||||
def skip_optional_categories(self, config: Dict[str, Any]) -> None:
|
||||
challenge_category = self.data.category
|
||||
categories = [
|
||||
category
|
||||
for category in OPTIONAL_CATEGORIES
|
||||
if category in challenge_category
|
||||
]
|
||||
if not agent_eligibible_for_optional_categories(
|
||||
categories, config.get("category", [])
|
||||
@classmethod
|
||||
def skip_optional_categories(cls, config: AgentBenchmarkConfig) -> None:
|
||||
challenge_categories = set(c.value for c in cls.data.category)
|
||||
challenge_optional_categories = challenge_categories & set(OPTIONAL_CATEGORIES)
|
||||
if challenge_optional_categories and not (
|
||||
config.categories
|
||||
and set(challenge_optional_categories).issubset(set(config.categories))
|
||||
):
|
||||
pytest.skip("Agent is not eligible for this category")
|
||||
pytest.skip(
|
||||
f"Category {', '.join(challenge_optional_categories)} is optional, "
|
||||
"and not explicitly selected in the benchmark config."
|
||||
)
|
||||
|
||||
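The new `from_challenge_spec` above builds one `Test*` class per challenge spec at run time via `type()`. A stripped-down sketch of that mechanism, using placeholder names rather than the benchmark's own classes:

class Challenge:
    CHALLENGE_LOCATION: str = ""

def make_challenge_class(name: str, location: str) -> type:
    # type(name, bases, namespace) creates a new class object dynamically,
    # which pytest can then collect like any hand-written Test* class.
    return type(f"Test{name}", (Challenge,), {"CHALLENGE_LOCATION": location})

TestWriteFile = make_challenge_class("WriteFile", "challenges/write_file/data.json")
assert issubclass(TestWriteFile, Challenge)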
@@ -1,12 +1,8 @@
|
||||
import datetime
|
||||
import json
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from enum import Enum
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from pydantic import BaseModel, constr, validator
|
||||
from pydantic import BaseModel, Field, constr, validator
|
||||
|
||||
|
||||
class DifficultyLevel(Enum):
|
||||
@@ -33,80 +29,6 @@ DIFFICULTY_MAP = {
|
||||
STRING_DIFFICULTY_MAP = {e.value: DIFFICULTY_MAP[e] for e in DifficultyLevel}
|
||||
|
||||
|
||||
def calculate_info_test_path(base_path: Path, benchmark_start_time: datetime) -> Path:
|
||||
"""
|
||||
Calculates the path to the directory where the test report will be saved.
|
||||
"""
|
||||
# Ensure the reports path exists
|
||||
base_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Get current UTC date-time stamp
|
||||
date_stamp = benchmark_start_time.strftime("%Y%m%dT%H%M%S")
|
||||
|
||||
# Default run name
|
||||
run_name = "full_run"
|
||||
|
||||
# Map command-line arguments to their respective labels
|
||||
arg_labels = {
|
||||
"--test": None,
|
||||
"--category": None,
|
||||
"--maintain": "maintain",
|
||||
"--improve": "improve",
|
||||
"--explore": "explore",
|
||||
}
|
||||
|
||||
# Identify the relevant command-line argument
|
||||
for arg, label in arg_labels.items():
|
||||
if arg in sys.argv:
|
||||
test_arg = sys.argv[sys.argv.index(arg) + 1] if label is None else None
|
||||
run_name = arg.strip("--")
|
||||
if test_arg:
|
||||
run_name = f"{run_name}_{test_arg}"
|
||||
break
|
||||
|
||||
# Create the full new directory path with ISO standard UTC date-time stamp
|
||||
report_path = base_path / f"{date_stamp}_{run_name}"
|
||||
|
||||
# Ensure the new directory is created
|
||||
report_path.mkdir(exist_ok=True)
|
||||
return report_path
|
||||
|
||||
|
||||
class AgentBenchmarkConfig(BaseModel):
|
||||
"""
|
||||
This class represents the configuration for the Agent agbenchmark.
|
||||
It includes the following attributes:
|
||||
- agent_benchmark_config_path: The path to the agent benchmark config that this object was created from.
|
||||
- reports_folder: The path to the folder where the benchmark reports will be stored.
|
||||
- host: The host where the benchmark is run.
|
||||
"""
|
||||
|
||||
agent_benchmark_config_path: Path | None = None
|
||||
reports_folder: Path | None = None
|
||||
host: str | None
|
||||
|
||||
def get_reports_location(self) -> Path:
|
||||
# if not self.reports_folder:
|
||||
# self.reports_folder = (
|
||||
# Path(self.agent_benchmark_config_path).parent / "reports"
|
||||
# ).resolve()
|
||||
return Path.cwd() / "agbenchmark_config" / "reports"
|
||||
|
||||
def get_reports_path(self, benchmark_start_time: datetime) -> Path:
|
||||
return calculate_info_test_path(
|
||||
self.get_reports_location(), benchmark_start_time
|
||||
)
|
||||
|
||||
def get_regression_reports_path(self) -> Path:
|
||||
return self.get_reports_location() / "regression_tests.json"
|
||||
|
||||
def get_success_rate_path(self) -> Path:
|
||||
return self.get_reports_location() / "success_rate.json"
|
||||
|
||||
def get_agent_home_directory(self) -> Path:
|
||||
return Path(self.agent_benchmark_config_path).resolve().parent
|
||||
|
||||
|
||||
class Info(BaseModel):
|
||||
difficulty: DifficultyLevel
|
||||
description: constr(regex=r"^Tests if the agent can.*")
|
||||
@@ -180,6 +102,7 @@ class Category(str, Enum):
|
||||
|
||||
|
||||
class ChallengeData(BaseModel):
|
||||
eval_id: str = ""
|
||||
name: str
|
||||
category: List[Category]
|
||||
task: str
|
||||
@@ -189,73 +112,4 @@ class ChallengeData(BaseModel):
|
||||
info: Info | Dict[str, Info]
|
||||
metadata: Optional[Dict[str, Any]] = None
|
||||
|
||||
def serialize(self, path: str) -> None:
|
||||
with open(path, "w") as file:
|
||||
file.write(self.json())
|
||||
|
||||
def get_data(self) -> dict:
|
||||
return self.dict()
|
||||
|
||||
@staticmethod
|
||||
def get_json_from_path(json_path: Path | str) -> dict:
|
||||
path = Path(json_path).resolve()
|
||||
with open(path, "r") as file:
|
||||
data = json.load(file)
|
||||
return data
|
||||
|
||||
@staticmethod
|
||||
def deserialize(path: str) -> "ChallengeData":
|
||||
# this script is in root/agbenchmark/utils/define_task_types.py
|
||||
script_dir = Path(__file__).resolve().parent.parent.parent
|
||||
json_path = script_dir / Path(path)
|
||||
|
||||
with open(json_path, "r") as file:
|
||||
data = json.load(file)
|
||||
try:
|
||||
return ChallengeData(**data)
|
||||
except:
|
||||
test = "ok"
|
||||
|
||||
def challenge_from_datum(self, file_datum: list[dict[str, Any]]) -> "ChallengeData":
|
||||
same_task_data = {
|
||||
"name": self.prefix,
|
||||
"dependencies": self.dependencies,
|
||||
"category": self.shared_category,
|
||||
"task": self.task,
|
||||
"cutoff": self.cutoff,
|
||||
}
|
||||
|
||||
if not self.info:
|
||||
same_task_data["info"] = {
|
||||
datum["name"]: datum["info"] for datum in file_datum
|
||||
}
|
||||
else:
|
||||
same_task_data["info"] = self.info
|
||||
|
||||
if not self.ground:
|
||||
same_task_data["ground"] = {
|
||||
datum["name"]: datum["ground"] for datum in file_datum
|
||||
}
|
||||
else:
|
||||
same_task_data["ground"] = self.ground
|
||||
|
||||
return ChallengeData(**same_task_data)
|
||||
|
||||
def challenge_from_test_data(self, data: dict[str, Any]) -> "ChallengeData":
|
||||
same_task_data = {
|
||||
"name": data["name"],
|
||||
"dependencies": data["dependencies"],
|
||||
"category": data["category"],
|
||||
"info": data["info"],
|
||||
"ground": data["ground"],
|
||||
}
|
||||
|
||||
if self.same_task:
|
||||
same_task_data["category"].extend(self.shared_category)
|
||||
same_task_data["task"] = self.task
|
||||
same_task_data["cutoff"] = self.cutoff
|
||||
else:
|
||||
same_task_data["task"] = data["task"]
|
||||
same_task_data["cutoff"] = data["cutoff"]
|
||||
|
||||
return ChallengeData(**same_task_data)
|
||||
spec_file: Path | None = Field(None, exclude=True)
|
||||
|
||||
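The added `spec_file` field is excluded from serialization, so the on-disk location of a challenge spec never ends up in reports. A reduced pydantic v1 sketch of what `Field(..., exclude=True)` does (model trimmed to two fields for illustration):

from pathlib import Path
from pydantic import BaseModel, Field

class ChallengeData(BaseModel):
    name: str
    spec_file: Path | None = Field(None, exclude=True)

data = ChallengeData(name="WriteFile", spec_file=Path("challenges/write_file/data.json"))
print(data.dict())  # {'name': 'WriteFile'} -- spec_file is left out of exports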
@@ -1,3 +1,5 @@
import json
import logging
import math
from pathlib import Path
from typing import Any, Dict, List, Tuple
@@ -11,6 +13,8 @@ from pyvis.network import Network
from agbenchmark.generate_test import DATA_CATEGORY
from agbenchmark.utils.utils import write_pretty_json

logger = logging.getLogger(__name__)


def bezier_curve(
src: np.ndarray, ctrl: List[float], dst: np.ndarray
@@ -221,8 +225,8 @@ def graph_interactive_network(
f"{source_id_str}_to_{target_id_str}" # Construct a unique edge id
)
if not (source_id_str in nt.get_nodes() and target_id_str in nt.get_nodes()):
print(
f"Skipping edge {source_id_str} -> {target_id_str} due to missing nodes."
logger.warning(
f"Skipping edge {source_id_str} -> {target_id_str} due to missing nodes"
)
continue
nt.add_edge(source_id_str, target_id_str, id=edge_id_str)
@@ -271,9 +275,12 @@ def graph_interactive_network(
"layout": {"hierarchical": hierarchical_options},
}

# Serialize the graph to JSON
# Serialize the graph to JSON and save in appropriate locations
graph_data = {"nodes": nt.nodes, "edges": nt.edges}
logger.debug(f"Generated graph data:\n{json.dumps(graph_data, indent=4)}")

# FIXME: use more reliable method to find the right location for these files.
# This will fail in all cases except if run from the root of our repo.
home_path = Path.cwd()
write_pretty_json(graph_data, home_path / "frontend" / "public" / "graph.json")

@@ -284,7 +291,6 @@ def graph_interactive_network(
# this literally only works in the AutoGPT repo, but this part of the code is not reached if BUILD_SKILL_TREE is false
write_pretty_json(graph_data, flutter_app_path / "tree_structure.json")
validate_skill_tree(graph_data, "")
import json

# Extract node IDs with category "coding"

@@ -317,9 +323,6 @@ def graph_interactive_network(
scrape_synthesize_tree,
flutter_app_path / "scrape_synthesize_tree_structure.json",
)
# If you want to convert back to JSON
filtered_json = json.dumps(graph_data, indent=4)
print(filtered_json)

if html_graph_path:
file_path = str(Path(html_graph_path).resolve())

@@ -1,4 +1,5 @@
import json
import logging
import os
from typing import Optional

@@ -7,6 +8,8 @@ import requests
from agbenchmark.__main__ import BENCHMARK_START_TIME
from agbenchmark.agent_interface import HELICONE_GRAPHQL_LOGS

logger = logging.getLogger(__name__)


def get_data_from_helicone(challenge: str) -> Optional[float]:
# Define the endpoint of your GraphQL server
@@ -38,8 +41,8 @@ query ExampleQuery($properties: [PropertyFilter!]){
]
}
if HELICONE_GRAPHQL_LOGS:
print(query)
print(json.dumps(variables, indent=4))
logger.debug(f"Executing Helicone query:\n{query.strip()}")
logger.debug(f"Query variables:\n{json.dumps(variables, indent=4)}")

operation_name = "ExampleQuery"

@@ -59,24 +62,22 @@ query ExampleQuery($properties: [PropertyFilter!]){

data = response.json()
except requests.HTTPError as http_err:
print(f"HTTP error occurred: {http_err}")
return None # Re-raise the exception to stop execution
logger.error(f"Helicone returned an HTTP error: {http_err}")
return None
except json.JSONDecodeError:
print(f"Invalid JSON response: {response.text if response else 'No response'}")
raw_response = response.text # type: ignore
logger.error(
f"Helicone returned an invalid JSON response: '''{raw_response}'''"
)
return None
except Exception as err:
print(f"Other error occurred: {err}")
logger.error(f"Error while trying to get data from Helicone: {err}")
return None

try:
if data is None or data.get("data") is None:
print("Invalid response received from server: no data")
return None
return (
data.get("data", {})
.get("aggregatedHeliconeRequest", {})
.get("costUSD", None)
)
except Exception as err:
print(f"Error occurred while parsing response: {err}")
if data is None or data.get("data") is None:
logger.error("Invalid response received from Helicone: no data")
logger.error(f"Offending response: {response}")
return None
return (
data.get("data", {}).get("aggregatedHeliconeRequest", {}).get("costUSD", None)
)

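For reference, `get_data_from_helicone` above issues a GraphQL query over plain HTTP. A minimal, hedged sketch of that request pattern with `requests` (the endpoint URL, query text, and property structure below are placeholders, not Helicone's actual schema):

import requests

# Placeholder endpoint and query -- illustrative only.
url = "https://example.com/graphql"
query = "query Costs($properties: [PropertyFilter!]) { aggregatedHeliconeRequest(properties: $properties) { costUSD } }"
variables = {"properties": [{"name": "challenge", "value": "TestWriteFile"}]}

response = requests.post(
    url,
    json={"query": query, "variables": variables, "operationName": "Costs"},
)
response.raise_for_status()  # surface HTTP errors, as the code above does
cost = response.json().get("data", {}).get("aggregatedHeliconeRequest", {}).get("costUSD")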
benchmark/agbenchmark/utils/logging.py (new file, 74 lines)
@@ -0,0 +1,74 @@
from __future__ import annotations

import logging

from colorama import Fore, Style

SIMPLE_LOG_FORMAT = "[%(asctime)s] %(levelname)s %(message)s"
DEBUG_LOG_FORMAT = "[%(asctime)s] %(levelname)s %(filename)s:%(lineno)03d %(message)s"


def configure_logging(
level: int = logging.INFO,
) -> None:
"""Configure the native logging module."""

# Auto-adjust default log format based on log level
log_format = DEBUG_LOG_FORMAT if level == logging.DEBUG else SIMPLE_LOG_FORMAT

console_handler = logging.StreamHandler()
console_handler.setFormatter(FancyConsoleFormatter(log_format))

# Configure the root logger
logging.basicConfig(
level=level,
format=log_format,
handlers=[console_handler],
)


class FancyConsoleFormatter(logging.Formatter):
"""
A custom logging formatter designed for console output.

This formatter enhances the standard logging output with color coding. The color
coding is based on the level of the log message, making it easier to distinguish
between different types of messages in the console output.

The color for each level is defined in the LEVEL_COLOR_MAP class attribute.
"""

# level -> (level & text color, title color)
LEVEL_COLOR_MAP = {
logging.DEBUG: Fore.LIGHTBLACK_EX,
logging.INFO: Fore.BLUE,
logging.WARNING: Fore.YELLOW,
logging.ERROR: Fore.RED,
logging.CRITICAL: Fore.RED + Style.BRIGHT,
}

def format(self, record: logging.LogRecord) -> str:
# Make sure `msg` is a string
if not hasattr(record, "msg"):
record.msg = ""
elif not type(record.msg) is str:
record.msg = str(record.msg)

# Justify the level name to 5 characters minimum
record.levelname = record.levelname.ljust(5)

# Determine default color based on error level
level_color = ""
if record.levelno in self.LEVEL_COLOR_MAP:
level_color = self.LEVEL_COLOR_MAP[record.levelno]
record.levelname = f"{level_color}{record.levelname}{Style.RESET_ALL}"

# Determine color for message
color = getattr(record, "color", level_color)
color_is_specified = hasattr(record, "color")

# Don't color INFO messages unless the color is explicitly specified.
if color and (record.levelno != logging.INFO or color_is_specified):
record.msg = f"{color}{record.msg}{Style.RESET_ALL}"

return super().format(record)

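A short usage sketch for the new module (module path as in the file header above; the log message itself is illustrative):

import logging

from agbenchmark.utils.logging import configure_logging

# Set up colorized console logging; pass logging.DEBUG to get the verbose format.
configure_logging(logging.DEBUG)
logging.getLogger(__name__).info("Benchmark run starting")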
@@ -1,18 +1,22 @@
# radio charts, logs, helper functions for tests, anything else relevant.
import json
import logging
import os
import re
from pathlib import Path
from typing import Any, List, Optional
from typing import Any, Optional

from dotenv import load_dotenv

load_dotenv()
from agbenchmark.utils.data_types import DIFFICULTY_MAP, DifficultyLevel

load_dotenv()

AGENT_NAME = os.getenv("AGENT_NAME")
REPORT_LOCATION = os.getenv("REPORT_LOCATION", None)

logger = logging.getLogger(__name__)


def replace_backslash(value: Any) -> Any:
if isinstance(value, str):
@@ -72,8 +76,9 @@ def get_highest_success_difficulty(
highest_difficulty = DifficultyLevel[highest_difficulty_str]
highest_difficulty_level = DIFFICULTY_MAP[highest_difficulty]
except KeyError:
print(
f"Unexpected difficulty level '{highest_difficulty_str}' in test '{test_name}'"
logger.warning(
f"Unexpected difficulty level '{highest_difficulty_str}' "
f"in test '{test_name}'"
)
continue
else:
@@ -88,12 +93,21 @@ def get_highest_success_difficulty(
highest_difficulty = difficulty_enum
highest_difficulty_level = difficulty_level
except KeyError:
print(
f"Unexpected difficulty level '{difficulty_str}' in test '{test_name}'"
logger.warning(
f"Unexpected difficulty level '{difficulty_str}' "
f"in test '{test_name}'"
)
continue
except Exception:
print(f"Make sure you selected the right test, no reports were generated.")
except Exception as e:
logger.warning(
"An unexpected error [1] occurred while analyzing report [2]."
"Please notify a maintainer.\n"
f"Report data [1]: {data}\n"
f"Error [2]: {e}"
)
logger.warning(
"Make sure you selected the right test, no reports were generated."
)
break

if highest_difficulty is not None:
@@ -116,22 +130,13 @@ def get_highest_success_difficulty(
# remote_url = remote_url[:-4]
# git_commit_sha = f"{remote_url}/tree/{repo.head.commit.hexsha}"

# # print(f"GIT_COMMIT_SHA: {git_commit_sha}")
# # logger.debug(f"GIT_COMMIT_SHA: {git_commit_sha}")
# return git_commit_sha
# except Exception:
# # print(f"{directory} is not a git repository!")
# # logger.error(f"{directory} is not a git repository!")
# return None


def agent_eligibible_for_optional_categories(
optional_challenge_categories: List, agent_categories: List
) -> bool:
for element in optional_challenge_categories:
if element not in agent_categories:
return False
return True


def write_pretty_json(data, json_file):
sorted_data = deep_sort(data)
json_graph = json.dumps(sorted_data, indent=4)

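The list-based `agent_eligibible_for_optional_categories` helper above is equivalent to a subset check on sets, which is how the updated `skip_optional_categories` in challenge.py expresses it. A tiny illustration (category names hypothetical):

optional_challenge_categories = {"coding", "data"}
agent_categories = {"coding", "data", "scrape_synthesize"}

# The agent is eligible only if it covers every optional category of the challenge.
eligible = optional_challenge_categories.issubset(agent_categories)
print(eligible)  # True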
benchmark/poetry.lock (generated, 1787 lines changed; diff suppressed because it is too large)
@@ -32,6 +32,8 @@ python-multipart = "^0.0.6"
toml = "^0.10.2"
helicone = "^1.0.9"
httpx = "^0.24.0"
agent-protocol-client = "^1.1.0"
click-default-group = "^1.2.4"

[tool.poetry.group.dev.dependencies]
flake8 = "^3.9.2"

@@ -1,121 +0,0 @@
|
||||
import io
|
||||
import json
|
||||
import logging
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
from random import randint
|
||||
from typing import Annotated, Any, Dict, List
|
||||
|
||||
from fastapi import FastAPI, File, Form, HTTPException, UploadFile
|
||||
from fastapi.responses import StreamingResponse
|
||||
from pydantic import BaseModel
|
||||
|
||||
# Set up logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
app = FastAPI()
|
||||
artifacts: List[Dict[str, Any]] = []
|
||||
|
||||
|
||||
class Task(BaseModel):
|
||||
input: str
|
||||
|
||||
|
||||
@app.post("/agent/tasks/{task_id}/artifacts")
|
||||
async def upload_file(
|
||||
task_id: str, file: Annotated[UploadFile, File()], relative_path: str = Form("")
|
||||
) -> Dict[str, Any]:
|
||||
logger.info(
|
||||
"Uploading file for task_id: %s with relative path: %s", task_id, relative_path
|
||||
)
|
||||
absolute_directory_path = Path(__file__).parent.absolute()
|
||||
save_path = (
|
||||
absolute_directory_path
|
||||
/ "agent/gpt-engineer"
|
||||
/ "projects/my-new-project/workspace"
|
||||
)
|
||||
|
||||
random_string = str(randint(0, 100000))
|
||||
while random_string in artifacts:
|
||||
random_string = str(randint(0, 100000))
|
||||
|
||||
artifact_data = await file.read()
|
||||
artifacts.append(
|
||||
{
|
||||
"binary": artifact_data,
|
||||
"relative_path": relative_path,
|
||||
"file_name": file.filename,
|
||||
"artifact_id": random_string,
|
||||
}
|
||||
)
|
||||
|
||||
print(artifacts)
|
||||
return {
|
||||
"artifact_id": random_string,
|
||||
"file_name": "file_name",
|
||||
"relative_path": "relative_path",
|
||||
}
|
||||
|
||||
|
||||
@app.get("/agent/tasks/{task_id}/artifacts")
|
||||
async def get_files() -> List[Dict[str, Any]]:
|
||||
logger.info("Fetching list of files for task")
|
||||
return artifacts
|
||||
|
||||
|
||||
@app.get("/agent/tasks/{task_id}/artifacts/{artifact_id}")
|
||||
async def get_file(artifact_id: str):
|
||||
for artifact in artifacts:
|
||||
if artifact["artifact_id"] == artifact_id:
|
||||
break
|
||||
else:
|
||||
logger.error("Attempt to access nonexistent artifact with ID: %s", artifact_id)
|
||||
raise HTTPException(status_code=404, detail="Artifact not found")
|
||||
|
||||
logger.info("Fetching artifact with ID: %s", artifact_id)
|
||||
# find aritifact where artifact_id = artifact_id
|
||||
|
||||
for artifact in artifacts:
|
||||
if artifact["artifact_id"] == artifact_id:
|
||||
return StreamingResponse(
|
||||
io.BytesIO(artifact["binary"]),
|
||||
media_type="application/octet-stream",
|
||||
headers={"Content-Disposition": f"attachment; filename=test.txt"},
|
||||
)
|
||||
# return 404
|
||||
return HTTPException(status_code=404, detail="Artifact not found")
|
||||
|
||||
|
||||
@app.post("/agent/tasks/{task_id}/steps")
|
||||
async def create_steps(task_id: str):
|
||||
logger.info("Creating step for task_id: %s", task_id)
|
||||
return {
|
||||
"input": "random",
|
||||
"additional_input": {},
|
||||
"task_id": task_id,
|
||||
"step_id": "random_step",
|
||||
"name": "random",
|
||||
"status": "created",
|
||||
"output": "random",
|
||||
"additional_output": {},
|
||||
"artifacts": [],
|
||||
"is_last": True,
|
||||
}
|
||||
|
||||
|
||||
@app.post("/agent/tasks")
|
||||
async def create_tasks(task: Task):
|
||||
artifacts.clear()
|
||||
return {
|
||||
"input": "random",
|
||||
"additional_input": {},
|
||||
"task_id": "static_task_id",
|
||||
"artifacts": [],
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
|
||||
uvicorn.run(app, host="0.0.0.0", port=8000)
|
||||