docs: move topics to tutorials section (#3297)

This commit is contained in:
dianed-square
2025-07-08 07:36:21 -07:00
committed by GitHub
parent 08147d9bab
commit f02be09fbb
3 changed files with 4 additions and 5 deletions

@@ -0,0 +1,199 @@
---
title: Benchmarking with Goose
sidebar_label: Benchmark with Goose
---
The Goose benchmarking system lets you evaluate Goose's performance on complex tasks across one or more system
configurations.<br></br>
This guide covers how to use the `goose bench` command to run benchmarks and analyze results.
### Quick Start
1. The benchmarking system includes several evaluation suites.<br></br>
Run the following to list every valid selector:
```bash
goose bench selectors
```
2. Create a basic configuration file:
```bash
goose bench init-config -n bench-config.json
cat bench-config.json
{
"models": [
{
"provider": "databricks",
"name": "goose",
"parallel_safe": true
}
],
"evals": [
{
"selector": "core",
"parallel_safe": true
}
],
"repeat": 1
}
...etc.
```
3. Run the benchmark:
```bash
goose bench run -c bench-config.json
```
## Configuration File
The benchmark configuration is specified in a JSON file with the following structure:
```json
{
"models": [
{
"provider": "databricks",
"name": "goose",
"parallel_safe": true,
"tool_shim": {
"use_tool_shim": false,
"tool_shim_model": null
}
}
],
"evals": [
{
"selector": "core",
"post_process_cmd": null,
"parallel_safe": true
}
],
"include_dirs": [],
"repeat": 2,
"run_id": null,
"eval_result_filename": "eval-results.json",
"run_summary_filename": "run-results-summary.json",
"env_file": null
}
```
### Configuration Options
#### Models Section
Each model entry in the `models` array specifies:
- `provider`: The model provider (e.g., "databricks")
- `name`: Model identifier
- `parallel_safe`: Whether the model can be run in parallel
- `tool_shim`: Optional configuration for tool shimming
- `use_tool_shim`: Enable/disable tool shimming
- `tool_shim_model`: Optional model to use for tool shimming
#### Evals Section
Each evaluation entry in the `evals` array specifies:
- `selector`: The evaluation suite to run (e.g., "core")
- `post_process_cmd`: Optional path to a post-processing script
- `parallel_safe`: Whether the evaluation can run in parallel
#### General Options
- `include_dirs`: Additional directories to include in the evaluation
- `repeat`: Number of times to repeat each evaluation
- `run_id`: Optional identifier for the benchmark run
- `eval_result_filename`: Name of the evaluation results file
- `run_summary_filename`: Name of the summary results file
- `env_file`: Optional path to an environment file
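For instance, `env_file` can point at a plain `KEY=VALUE` file that is loaded before each run. The variable names below are placeholders for illustration, not keys Goose necessarily reads:
```bash
# Illustrative env-file contents; substitute your provider's real variables
DATABRICKS_HOST=https://example.cloud.databricks.com
DATABRICKS_TOKEN=replace-me
```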
##### Mechanics of include_dirs option
The `include_dirs` config parameter makes the items at every listed path available to all
evaluations.<br></br>
It accomplishes this by:
* copying each included asset into the top-level directory created for each model/provider pair
* then, at evaluation run time, copying an asset into the eval-specific directory only if:
  * the evaluation code explicitly pulls it in, and
  * the evaluation is covered by one of the configured selectors and therefore actually runs
### Customizing Evaluations
You can customize runs in several ways:
1. Using Post-Processing Commands after evaluation:
```json
{
"evals": [
{
"selector": "core",
"post_process_cmd": "/path/to/process-script.sh",
"parallel_safe": true
}
]
}
```
2. Including Additional Data:
```json
{
"include_dirs": [
"/path/to/custom/eval/data"
]
}
```
3. Setting Environment Variables:
```json
{
"env_file": "/path/to/env-file"
}
```
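The schema of `eval-results.json` is not fixed by this guide, so any post-processing command has to be written against whatever your evaluations emit. As a minimal sketch of the kind of work such a script might do, the hypothetical example below counts passes over an inline sample object; the field names are assumptions, not the documented schema:

```python
import json

# Hypothetical shape of an eval-results.json payload; the real schema
# produced by `goose bench` may differ.
sample_results = {
    "evaluations": [
        {"name": "list_files", "passed": True},
        {"name": "read_file", "passed": False},
    ]
}

def summarize(results: dict) -> str:
    """Count how many evaluations in a results payload passed."""
    evals = results.get("evaluations", [])
    passed = sum(1 for e in evals if e.get("passed"))
    return f"{passed}/{len(evals)} evaluations passed"

print(summarize(sample_results))
```

A real `post_process_cmd` would instead read the results file written into each run directory.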
## Output and Results
The benchmark generates two main output files within a file hierarchy similar to the following.<br></br>
Results from running each model/provider pair are stored within their own directory:
```bash
benchmark-${datetime}/
    ${model}-${provider}[-tool-shim[-${shim-model}]]/
        run-${i}/
            ${an-include_dir-asset}
            run-results-summary.json
            core/developer/list_files/
                ${an-include_dir-asset}
                run-results-summary.json
```
1. `eval-results.json`: Contains detailed results from each evaluation, including:
- Individual test case results
- Model responses
- Scoring metrics
- Error logs
2. `run-results-summary.json`: A collection of all eval results across all suites.
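Because results are spread across per-run directories, it can be convenient to gather them programmatically. A short sketch, assuming only the directory layout shown above:

```python
import json
from pathlib import Path

def collect_summaries(root: str) -> list:
    """Gather every run-results-summary.json found under a benchmark directory."""
    summaries = []
    for path in sorted(Path(root).rglob("run-results-summary.json")):
        with open(path) as f:
            summaries.append({"path": str(path), "data": json.load(f)})
    return summaries
```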
### Debug Mode
For detailed logging, you can enable debug mode:
```bash
RUST_LOG=debug goose bench run -c bench-config.json
```
## Advanced Usage
### Tool Shimming
Tool shimming allows you to use non-tool-capable models with Goose, provided Ollama is installed on the
system.
See this guide for important details on [tool shimming](/docs/guides/experimental-features).
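For example, a model entry with shimming enabled might look like the following; the Ollama model name is illustrative, not a requirement:
```json
{
  "models": [
    {
      "provider": "databricks",
      "name": "goose",
      "parallel_safe": true,
      "tool_shim": {
        "use_tool_shim": true,
        "tool_shim_model": "llama3.2"
      }
    }
  ]
}
```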

@@ -0,0 +1,51 @@
---
title: Building Goose in Docker
sidebar_label: Goose in Docker
---
:::info Tell Us What You Need
There are various scenarios where you might want to build Goose in Docker. If the instructions below do not meet your needs, please contact us by replying to our [discussion topic](https://github.com/block/goose/discussions/1496).
:::
You can build Goose from source within a Docker container. This approach provides security benefits by creating an isolated environment, and it also improves consistency and portability. For example, if you need to troubleshoot an error on a platform you don't usually work with (such as Ubuntu), you can easily debug it using Docker.
To begin, you will need to modify the `Dockerfile` and `docker-compose.yml` files to suit your requirements. Some changes you might consider include:
- **Required:** Setting your API key, provider, and model as environment variables in the `docker-compose.yml` file, because the keyring settings do not work on Ubuntu in Docker. This example uses the Google API key and its corresponding settings, but you can [find your own list of API keys](https://github.com/block/goose/blob/main/ui/desktop/src/components/settings/models/hardcoded_stuff.tsx) and the [corresponding settings](https://github.com/block/goose/blob/main/ui/desktop/src/components/settings/models/hardcoded_stuff.tsx).
- **Optional:** Changing the base image to a different Linux distribution in the `Dockerfile`. This example uses Ubuntu, but you can switch to another distribution such as CentOS, Fedora, or Alpine.
- **Optional:** Mounting your personal Goose settings and hints files in the `docker-compose.yml` file. This allows you to use your personal settings and hints files within the Docker container.
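A sketch of the relevant parts of `docker-compose.yml` covering the changes above; the environment variable names and mount paths are placeholders to adapt, not values the project necessarily expects:
```yaml
services:
  goose-cli:
    build: .
    environment:
      # Placeholder variable names; match them to your provider's settings
      GOOSE_PROVIDER: google
      GOOSE_MODEL: gemini-2.0-flash
      GOOGLE_API_KEY: ${GOOGLE_API_KEY}
    volumes:
      # Optional: mount personal Goose settings and hints files (paths illustrative)
      - ~/.config/goose:/root/.config/goose
```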
After setting the credentials, you can build the Docker image using the following command:
```bash
docker-compose -f documentation/docs/docker/docker-compose.yml build
```
Next, run the container and connect to it using the following command:
```bash
docker-compose -f documentation/docs/docker/docker-compose.yml run --rm goose-cli
```
Inside the container, run the following command to configure Goose:
```bash
goose configure
```
When prompted to save the API key to the keyring, select `No`, as you are already passing the API key as an environment variable.
Configure Goose a second time, and this time, you can [add any extensions](/docs/getting-started/using-extensions) you need.
After that, you can start a session:
```bash
goose session
```
You should now be able to connect to Goose with your configured extensions enabled.