docs: move topics to tutorials section (#3297)

documentation/docs/tutorials/benchmarking.md (new file, 199 lines)
@@ -0,0 +1,199 @@
---
title: Benchmarking with Goose
sidebar_label: Benchmark with Goose
---

The Goose benchmarking system allows you to evaluate Goose's performance on complex tasks with one or more system configurations.<br></br>
This guide covers how to use the `goose bench` command to run benchmarks and analyze results.

## Quick Start

1. The benchmarking system includes several evaluation suites.<br></br>
   Run the following to list every valid selector:

   ```bash
   goose bench selectors
   ```

2. Create a basic configuration file:

   ```bash
   goose bench init-config -n bench-config.json
   cat bench-config.json
   {
     "models": [
       {
         "provider": "databricks",
         "name": "goose",
         "parallel_safe": true
       }
     ],
     "evals": [
       {
         "selector": "core",
         "parallel_safe": true
       }
     ],
     "repeat": 1
   }
   ...etc.
   ```

3. Run the benchmark:

   ```bash
   goose bench run -c bench-config.json
   ```

## Configuration File

The benchmark configuration is specified in a JSON file with the following structure:

```json
{
  "models": [
    {
      "provider": "databricks",
      "name": "goose",
      "parallel_safe": true,
      "tool_shim": {
        "use_tool_shim": false,
        "tool_shim_model": null
      }
    }
  ],
  "evals": [
    {
      "selector": "core",
      "post_process_cmd": null,
      "parallel_safe": true
    }
  ],
  "include_dirs": [],
  "repeat": 2,
  "run_id": null,
  "eval_result_filename": "eval-results.json",
  "run_summary_filename": "run-results-summary.json",
  "env_file": null
}
```

### Configuration Options

#### Models Section

Each model entry in the `models` array specifies:

- `provider`: The model provider (e.g., "databricks")
- `name`: The model identifier
- `parallel_safe`: Whether the model can be run in parallel
- `tool_shim`: Optional configuration for tool shimming
  - `use_tool_shim`: Enable/disable tool shimming
  - `tool_shim_model`: Optional model to use for tool shimming
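
For example, you can compare several models in a single run by listing multiple entries. This is a sketch only: the second entry's provider and model name are illustrative, so substitute ones you actually use. A model with `"parallel_safe": false` will not be run in parallel.

```json
{
  "models": [
    { "provider": "databricks", "name": "goose", "parallel_safe": true },
    { "provider": "openai", "name": "gpt-4o", "parallel_safe": false }
  ]
}
```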

#### Evals Section

Each evaluation entry in the `evals` array specifies:

- `selector`: The evaluation suite to run (e.g., "core")
- `post_process_cmd`: Optional path to a post-processing script
- `parallel_safe`: Whether the evaluation can run in parallel

#### General Options

- `include_dirs`: Additional directories to include in the evaluation
- `repeat`: Number of times to repeat each evaluation
- `run_id`: Optional identifier for the benchmark run
- `eval_result_filename`: Name of the evaluation results file
- `run_summary_filename`: Name of the summary results file
- `env_file`: Optional path to an environment file

##### Mechanics of the include_dirs option

The `include_dirs` config parameter makes the items at all paths listed in the option available to all evaluations.<br></br>
It accomplishes this by:

* copying each included asset into the top-level directory created for each model/provider pair
* at evaluation run-time, copying into the eval-specific directory whichever asset the evaluation explicitly requires
  * only if the evaluation code specifically pulls it in
  * and only if the evaluation is actually covered by one of the configured selectors and therefore runs
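
As a rough sketch of where an included asset ends up (directory names follow the layout shown under Output and Results below; the specific paths here are illustrative):

```bash
# hypothetical config: "include_dirs": ["/path/to/custom/eval/data"]
# the asset is copied under the directory for each model/provider pair:
benchmark-${datetime}/goose-databricks/run-0/data/
# and into an eval-specific directory, but only when that eval runs
# and its code actually pulls the asset in:
benchmark-${datetime}/goose-databricks/run-0/core/developer/list_files/data/
```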

### Customizing Evaluations

You can customize runs in several ways:

1. Using post-processing commands after evaluation (a hypothetical script sketch follows this list):

   ```json
   {
     "evals": [
       {
         "selector": "core",
         "post_process_cmd": "/path/to/process-script.sh",
         "parallel_safe": true
       }
     ]
   }
   ```

2. Including additional data:

   ```json
   {
     "include_dirs": [
       "/path/to/custom/eval/data"
     ]
   }
   ```

3. Setting environment variables (a sample env file follows this list):

   ```json
   {
     "env_file": "/path/to/env-file"
   }
   ```
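
The exact contract for `post_process_cmd` isn't documented here, so treat the following as a hypothetical sketch only: a script that condenses an evaluation's `eval-results.json` into a smaller file, assuming the script runs where that file is reachable and that a `metrics` key exists in the results.

```bash
#!/usr/bin/env bash
# hypothetical post-processing sketch -- verify the real invocation
# contract and results schema against the benchmark source
set -euo pipefail

# extract just the scoring metrics from the detailed results
jq '.metrics' eval-results.json > metrics-only.json
```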
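
And a minimal sketch of an env file, assuming the conventional `KEY=value` format; the variable names are illustrative and should match whatever your provider expects:

```bash
# hypothetical env-file contents
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token-here
```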

## Output and Results

The benchmark generates two main output files within a file hierarchy similar to the following.<br></br>
Results from running each model/provider pair are stored within their own directory:

```bash
benchmark-${datetime}/
    ${model}-${provider}[-tool-shim[-${shim-model}]]/
        run-${i}/
            ${an-include_dir-asset}
            run-results-summary.json
            core/developer/list_files/
                ${an-include_dir-asset}
                run-results-summary.json
```

1. `eval-results.json`: Contains detailed results from each evaluation, including:
   - Individual test case results
   - Model responses
   - Scoring metrics
   - Error logs

2. `run-results-summary.json`: A collection of all eval results across all suites.

### Debug Mode

For detailed logging, you can enable debug mode:

```bash
RUST_LOG=debug goose bench run -c bench-config.json
```

## Advanced Usage

### Tool Shimming

Tool shimming allows you to use non-tool-capable models with Goose, provided Ollama is installed on the system.
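
To enable it, set the `tool_shim` fields on a model entry in your benchmark config. A minimal sketch, where the shim model name is illustrative (any model available to your local Ollama should work):

```json
{
  "provider": "databricks",
  "name": "goose",
  "parallel_safe": true,
  "tool_shim": {
    "use_tool_shim": true,
    "tool_shim_model": "llama3.2"
  }
}
```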

See this guide for important details on [tool shimming](/docs/guides/experimental-features).

documentation/docs/tutorials/goose-in-docker.md (new file, 51 lines)
@@ -0,0 +1,51 @@
---
title: Building Goose in Docker
sidebar_label: Goose in Docker
---

:::info Tell Us What You Need
There are various scenarios where you might want to build Goose in Docker. If the instructions below do not meet your needs, please contact us by replying to our [discussion topic](https://github.com/block/goose/discussions/1496).
:::

You can build Goose from source within a Docker container. This approach not only provides security benefits by creating an isolated environment but also enhances consistency and portability. For example, if you need to troubleshoot an error on a platform you don't usually work with (such as Ubuntu), you can easily debug it using Docker.

To begin, you will need to modify the `Dockerfile` and `docker-compose.yml` files to suit your requirements. Some changes you might consider include:

- **Required:** Setting your API key, provider, and model in the `docker-compose.yml` file as environment variables (see the sketch after this list), because the keyring settings do not work on Ubuntu in Docker. This example uses the Google API key and its corresponding settings, but you can [find your own list of API keys](https://github.com/block/goose/blob/main/ui/desktop/src/components/settings/models/hardcoded_stuff.tsx) and the [corresponding settings](https://github.com/block/goose/blob/main/ui/desktop/src/components/settings/models/hardcoded_stuff.tsx).

- **Optional:** Changing the base image to a different Linux distribution in the `Dockerfile`. This example uses Ubuntu, but you can switch to another distribution such as CentOS, Fedora, or Alpine.

- **Optional:** Mounting your personal Goose settings and hints files in the `docker-compose.yml` file. This allows you to use your personal settings and hints files within the Docker container.
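
As a rough sketch of the **Required** change above, the relevant portion of `docker-compose.yml` might look like the following. The `goose-cli` service name matches the run command below, but the environment variable names and model are illustrative, so check them against the actual file and your provider's settings:

```yaml
services:
  goose-cli:
    build: .
    environment:
      # required: provider credentials and model selection
      # (names are illustrative -- match your provider's settings)
      - GOOSE_PROVIDER=google
      - GOOSE_MODEL=gemini-2.0-flash
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
    volumes:
      # optional: mount personal Goose settings and hints files
      - ~/.config/goose:/root/.config/goose
```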

After setting the credentials, you can build the Docker image using the following command:

```bash
docker-compose -f documentation/docs/docker/docker-compose.yml build
```

Next, run the container and connect to it using the following command:

```bash
docker-compose -f documentation/docs/docker/docker-compose.yml run --rm goose-cli
```

Inside the container, run the following command to configure Goose:

```bash
goose configure
```

When prompted to save the API key to the keyring, select `No`, as you are already passing the API key as an environment variable.

Configure Goose a second time, and this time, you can [add any extensions](/docs/getting-started/using-extensions) you need.

After that, you can start a session:

```bash
goose session
```

You should now be able to connect to Goose with your configured extensions enabled.