docs: move topics to tutorials section (#3297)

documentation/docs/tutorials/benchmarking.md (new file, 199 lines)
@@ -0,0 +1,199 @@
---
title: Benchmarking with Goose
sidebar_label: Benchmark with Goose
---

The Goose benchmarking system allows you to evaluate Goose's performance on complex tasks with one or more system configurations.<br></br>
This guide covers how to use the `goose bench` command to run benchmarks and analyze results.

## Quick Start

1. The benchmarking system includes several evaluation suites.<br></br>
   Run the following to list every valid selector:

   ```bash
   goose bench selectors
   ```

2. Create a basic configuration file:

   ```bash
   goose bench init-config -n bench-config.json
   cat bench-config.json
   {
     "models": [
       {
         "provider": "databricks",
         "name": "goose",
         "parallel_safe": true
       }
     ],
     "evals": [
       {
         "selector": "core",
         "parallel_safe": true
       }
     ],
     "repeat": 1
   }
   ...etc.
   ```

3. Run the benchmark:

   ```bash
   goose bench run -c bench-config.json
   ```

## Configuration File

The benchmark configuration is specified in a JSON file with the following structure:

```json
{
  "models": [
    {
      "provider": "databricks",
      "name": "goose",
      "parallel_safe": true,
      "tool_shim": {
        "use_tool_shim": false,
        "tool_shim_model": null
      }
    }
  ],
  "evals": [
    {
      "selector": "core",
      "post_process_cmd": null,
      "parallel_safe": true
    }
  ],
  "include_dirs": [],
  "repeat": 2,
  "run_id": null,
  "eval_result_filename": "eval-results.json",
  "run_summary_filename": "run-results-summary.json",
  "env_file": null
}
```

### Configuration Options

#### Models Section

Each model entry in the `models` array specifies:

- `provider`: The model provider (e.g., "databricks")
- `name`: The model identifier
- `parallel_safe`: Whether the model can be run in parallel
- `tool_shim`: Optional configuration for tool shimming
  - `use_tool_shim`: Enable/disable tool shimming
  - `tool_shim_model`: Optional model to use for tool shimming
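
For example, you can compare several models in a single run by listing multiple entries. This is a sketch only: the second entry's provider and model name are illustrative, so substitute ones you actually use. A model with `"parallel_safe": false` will not be run in parallel.

```json
{
  "models": [
    { "provider": "databricks", "name": "goose", "parallel_safe": true },
    { "provider": "openai", "name": "gpt-4o", "parallel_safe": false }
  ]
}
```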

#### Evals Section

Each evaluation entry in the `evals` array specifies:

- `selector`: The evaluation suite to run (e.g., "core")
- `post_process_cmd`: Optional path to a post-processing script
- `parallel_safe`: Whether the evaluation can run in parallel

#### General Options

- `include_dirs`: Additional directories to include in the evaluation
- `repeat`: Number of times to repeat each evaluation
- `run_id`: Optional identifier for the benchmark run
- `eval_result_filename`: Name of the evaluation results file
- `run_summary_filename`: Name of the summary results file
- `env_file`: Optional path to an environment file

##### Mechanics of the include_dirs option

The `include_dirs` config parameter makes the items at all paths listed in the option available to all evaluations.<br></br>
It accomplishes this by:

* copying each included asset into the top-level directory created for each model/provider pair
* at evaluation run-time, copying into the eval-specific directory whichever asset the evaluation explicitly requires
  * only if the evaluation code specifically pulls it in
  * and only if the evaluation is actually covered by one of the configured selectors and therefore runs
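
As a rough sketch of where an included asset ends up (directory names follow the layout shown under Output and Results below; the specific paths here are illustrative):

```bash
# hypothetical config: "include_dirs": ["/path/to/custom/eval/data"]
# the asset is copied under the directory for each model/provider pair:
benchmark-${datetime}/goose-databricks/run-0/data/
# and into an eval-specific directory, but only when that eval runs
# and its code actually pulls the asset in:
benchmark-${datetime}/goose-databricks/run-0/core/developer/list_files/data/
```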

### Customizing Evaluations

You can customize runs in several ways:

1. Using post-processing commands after evaluation (a hypothetical script sketch follows this list):

   ```json
   {
     "evals": [
       {
         "selector": "core",
         "post_process_cmd": "/path/to/process-script.sh",
         "parallel_safe": true
       }
     ]
   }
   ```

2. Including additional data:

   ```json
   {
     "include_dirs": [
       "/path/to/custom/eval/data"
     ]
   }
   ```

3. Setting environment variables (a sample env file follows this list):

   ```json
   {
     "env_file": "/path/to/env-file"
   }
   ```
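
The exact contract for `post_process_cmd` isn't documented here, so treat the following as a hypothetical sketch only: a script that condenses an evaluation's `eval-results.json` into a smaller file, assuming the script runs where that file is reachable and that a `metrics` key exists in the results.

```bash
#!/usr/bin/env bash
# hypothetical post-processing sketch -- verify the real invocation
# contract and results schema against the benchmark source
set -euo pipefail

# extract just the scoring metrics from the detailed results
jq '.metrics' eval-results.json > metrics-only.json
```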
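
And a minimal sketch of an env file, assuming the conventional `KEY=value` format; the variable names are illustrative and should match whatever your provider expects:

```bash
# hypothetical env-file contents
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token-here
```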

## Output and Results

The benchmark generates two main output files within a file hierarchy similar to the following.<br></br>
Results from running each model/provider pair are stored within their own directory:

```bash
benchmark-${datetime}/
    ${model}-${provider}[-tool-shim[-${shim-model}]]/
        run-${i}/
            ${an-include_dir-asset}
            run-results-summary.json
            core/developer/list_files/
                ${an-include_dir-asset}
                run-results-summary.json
```

1. `eval-results.json`: Contains detailed results from each evaluation, including:
   - Individual test case results
   - Model responses
   - Scoring metrics
   - Error logs

2. `run-results-summary.json`: A collection of all eval results across all suites.

### Debug Mode

For detailed logging, you can enable debug mode:

```bash
RUST_LOG=debug goose bench run -c bench-config.json
```

## Advanced Usage

### Tool Shimming

Tool shimming allows you to use non-tool-capable models with Goose, provided Ollama is installed on the system.
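
To enable it, set the `tool_shim` fields on a model entry in your benchmark config. A minimal sketch, where the shim model name is illustrative (any model available to your local Ollama should work):

```json
{
  "provider": "databricks",
  "name": "goose",
  "parallel_safe": true,
  "tool_shim": {
    "use_tool_shim": true,
    "tool_shim_model": "llama3.2"
  }
}
```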

See this guide for important details on [tool shimming](/docs/guides/experimental-features).

documentation/docs/tutorials/goose-in-docker.md (new file, 51 lines)
@@ -0,0 +1,51 @@
---
title: Building Goose in Docker
sidebar_label: Goose in Docker
---

:::info Tell Us What You Need
There are various scenarios where you might want to build Goose in Docker. If the instructions below do not meet your needs, please contact us by replying to our [discussion topic](https://github.com/block/goose/discussions/1496).
:::

You can build Goose from source within a Docker container. This approach not only provides security benefits by creating an isolated environment but also enhances consistency and portability. For example, if you need to troubleshoot an error on a platform you don't usually work with (such as Ubuntu), you can easily debug it using Docker.

To begin, you will need to modify the `Dockerfile` and `docker-compose.yml` files to suit your requirements. Some changes you might consider include:

- **Required:** Setting your API key, provider, and model in the `docker-compose.yml` file as environment variables (see the sketch after this list), because the keyring settings do not work on Ubuntu in Docker. This example uses the Google API key and its corresponding settings, but you can [find your own list of API keys](https://github.com/block/goose/blob/main/ui/desktop/src/components/settings/models/hardcoded_stuff.tsx) and the [corresponding settings](https://github.com/block/goose/blob/main/ui/desktop/src/components/settings/models/hardcoded_stuff.tsx).

- **Optional:** Changing the base image to a different Linux distribution in the `Dockerfile`. This example uses Ubuntu, but you can switch to another distribution such as CentOS, Fedora, or Alpine.

- **Optional:** Mounting your personal Goose settings and hints files in the `docker-compose.yml` file. This allows you to use your personal settings and hints files within the Docker container.
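
As a rough sketch of the **Required** change above, the relevant portion of `docker-compose.yml` might look like the following. The `goose-cli` service name matches the run command below, but the environment variable names and model are illustrative, so check them against the actual file and your provider's settings:

```yaml
services:
  goose-cli:
    build: .
    environment:
      # required: provider credentials and model selection
      # (names are illustrative -- match your provider's settings)
      - GOOSE_PROVIDER=google
      - GOOSE_MODEL=gemini-2.0-flash
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
    volumes:
      # optional: mount personal Goose settings and hints files
      - ~/.config/goose:/root/.config/goose
```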

After setting the credentials, you can build the Docker image using the following command:

```bash
docker-compose -f documentation/docs/docker/docker-compose.yml build
```

Next, run the container and connect to it using the following command:

```bash
docker-compose -f documentation/docs/docker/docker-compose.yml run --rm goose-cli
```

Inside the container, run the following command to configure Goose:

```bash
goose configure
```

When prompted to save the API key to the keyring, select `No`, as you are already passing the API key as an environment variable.

Configure Goose a second time, and this time, you can [add any extensions](/docs/getting-started/using-extensions) you need.

After that, you can start a session:

```bash
goose session
```

You should now be able to connect to Goose with your configured extensions enabled.