mirror of
https://github.com/aljazceru/goose.git
synced 2025-12-18 14:44:21 +01:00
move config details further into doc (#2092)
---
sidebar_position: 7
---

# Benchmarking with Goose

The Goose benchmarking system allows you to evaluate goose performance on complex tasks with one or more system configurations.<br></br>
This guide covers how to use the `goose bench` command to run benchmarks and analyze results.
## Running Benchmarks

### Quick Start

1. The benchmarking system includes several evaluation suites.<br></br>
```bash
cat bench-config.json
goose bench run -c bench-config.json
```
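Before launching a run, it can help to confirm that the config file at least parses as JSON. One quick way, assuming a Python interpreter is available on the system (this check is not part of `goose bench` itself):

```shell
# validate that bench-config.json parses as JSON before running the benchmark
python3 -m json.tool bench-config.json > /dev/null && echo "config OK"
```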
## Configuration File

The benchmark configuration is specified in a JSON file with the following structure:

```json
{
  "models": [
    {
      "provider": "databricks",
      "name": "goose",
      "parallel_safe": true,
      "tool_shim": {
        "use_tool_shim": false,
        "tool_shim_model": null
      }
    }
  ],
  "evals": [
    {
      "selector": "core",
      "post_process_cmd": null,
      "parallel_safe": true
    }
  ],
  "include_dirs": [],
  "repeat": 2,
  "run_id": null,
  "eval_result_filename": "eval-results.json",
  "run_summary_filename": "run-results-summary.json",
  "env_file": null
}
```
### Configuration Options

#### Models Section

Each model entry in the `models` array specifies:

- `provider`: The model provider (e.g., "databricks")
- `name`: Model identifier
- `parallel_safe`: Whether the model can be run in parallel
- `tool_shim`: Optional configuration for tool shimming
  - `use_tool_shim`: Enable/disable tool shimming
  - `tool_shim_model`: Optional model to use for tool shimming
#### Evals Section

Each evaluation entry in the `evals` array specifies:

- `selector`: The evaluation suite to run (e.g., "core")
- `post_process_cmd`: Optional path to a post-processing script
- `parallel_safe`: Whether the evaluation can run in parallel
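For example, an `evals` entry that triggers a post-processing script after the suite finishes might look like the following sketch (the script path is a placeholder for illustration, not a file shipped with goose):

```json
{
  "evals": [
    {
      "selector": "core",
      "post_process_cmd": "/path/to/post-process.sh",
      "parallel_safe": true
    }
  ]
}
```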
#### General Options

- `include_dirs`: Additional directories to include in the evaluation
- `repeat`: Number of times to repeat each evaluation
- `run_id`: Optional identifier for the benchmark run
- `eval_result_filename`: Name of the evaluation results file
- `run_summary_filename`: Name of the summary results file
- `env_file`: Optional path to an environment file
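As an illustration, the file referenced by `env_file` is a plain list of `KEY=value` pairs made available to the run; the variable names below are examples only, not a required set:

```bash
# illustrative env file: any KEY=value pairs your provider setup needs
RUST_LOG=info
DATABRICKS_HOST=https://example.cloud.databricks.com
```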
##### Mechanics of the include_dirs option

The `include_dirs` config parameter makes the items at every listed path available to all evaluations.<br></br>
It accomplishes this by:

* copying each included asset into the top-level directory created for each model/provider pair
* at evaluation run time:
  * any asset explicitly required by an evaluation is copied into the eval-specific directory
    * only if the evaluation code specifically pulls it in
    * and only if the evaluation is actually covered by one of the configured selectors and therefore runs
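The two copy steps above can be pictured with a small sketch. This is an illustration of the staging behavior, not goose's actual implementation; the function names are invented:

```python
import shutil
from pathlib import Path


def stage_include_dirs(include_dirs, model_work_dir):
    """Copy each included asset into the top-level directory
    created for a model/provider pair (step one above)."""
    model_work_dir = Path(model_work_dir)
    model_work_dir.mkdir(parents=True, exist_ok=True)
    for src in map(Path, include_dirs):
        dest = model_work_dir / src.name
        if src.is_dir():
            shutil.copytree(src, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dest)


def stage_for_eval(asset_name, model_work_dir, eval_dir):
    """At evaluation run time, copy one asset into the eval-specific
    directory -- only invoked when the evaluation code asks for it."""
    src = Path(model_work_dir) / asset_name
    dest = Path(eval_dir) / asset_name
    dest.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(src, dest)
```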
### Customizing Evaluations

You can customize runs in several ways:

1. Using Post-Processing Commands after evaluation:

```json
{
  "evals": [
```
2. Including Additional Data:

```json
{
  "include_dirs": [
```
3. Setting Environment Variables:

```json
{
  "env_file": "/path/to/env-file"
}
```
The benchmark generates two main output files within a file hierarchy similar to the following.<br></br>
Results from running each model/provider pair are stored within their own directory:

```bash
benchmark-${datetime}/
  ${model}-${provider}[-tool-shim[-${shim-model}]]/
```
```bash
RUST_LOG=debug goose bench bench-config.json
```
### Tool Shimming

Tool shimming allows you to use non-tool-capable models with Goose, provided Ollama is installed on the system.<br></br>
See this guide for important details on [tool shimming](experimental-features).
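Tying this back to the Configuration File section, a model entry with shimming enabled might look like the sketch below; the provider and model names are placeholders, so substitute whatever models your local Ollama install actually serves:

```json
{
  "models": [
    {
      "provider": "ollama",
      "name": "some-non-tool-model",
      "parallel_safe": false,
      "tool_shim": {
        "use_tool_shim": true,
        "tool_shim_model": "llama3.2"
      }
    }
  ]
}
```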