docs: local LLMs context size tip (#3454)
Signed-off-by: jjjuk <gmodhl67@gmail.com>
Co-authored-by: angiejones <jones.angie@gmail.com>
@@ -306,7 +306,7 @@ Ollama and Ramalama are both options to provide local LLMs, each of which requires
2. Run any [model supporting tool-calling](https://ollama.com/search?c=tools):

:::warning Limited Support for models without tool calling
-Goose extensively uses tool calling, so models without it (e.g. `DeepSeek-r1`) can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions). As an alternative, you can use a [custom DeepSeek-r1 model](/docs/getting-started/providers#deepseek-r1) we've made specifically for Goose.
+Goose extensively uses tool calling, so models without it can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions).
:::

Example:
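
A minimal sketch, using `qwen2.5` as one arbitrary pick from the linked list of tool-calling models:

```sh
# Pull and run a tool-calling model locally (qwen2.5 is an example choice).
ollama run qwen2.5
```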
@@ -397,20 +397,24 @@ If you're running Ollama on a different server, you'll have to set `OLLAMA_HOST=
└ Configuration saved successfully
```

+:::tip Context Length
+If you notice that Goose is having trouble using extensions or is ignoring [.goosehints](/docs/guides/using-goosehints), it is likely that the model's default context length of 4096 tokens is too low. Set the `OLLAMA_CONTEXT_LENGTH` environment variable to a [higher value](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size).
+:::
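
As a hedged sketch, one way to apply the tip (8192 is an arbitrary example value; pick what your hardware allows):

```sh
# Start the Ollama server with a larger context window (example value).
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
```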

#### Ramalama

1. [Download Ramalama](https://github.com/containers/ramalama?tab=readme-ov-file#install).
2. Run any Ollama [model supporting tool-calling](https://ollama.com/search?c=tools) or [GGUF-format HuggingFace model](https://huggingface.co/search/full-text?q=%22tools+support%22+%2B+%22gguf%22&type=model):

:::warning Limited Support for models without tool calling
-Goose extensively uses tool calling, so models without it (e.g. `DeepSeek-r1`) can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions). As an alternative, you can use a [custom DeepSeek-r1 model](/docs/getting-started/providers#deepseek-r1) we've made specifically for Goose.
+Goose extensively uses tool calling, so models without it can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions).
:::

Example:

```sh
# NOTE: the --runtime-args="--jinja" flag is required for Ramalama to work with the Goose Ollama provider.
ramalama serve --runtime-args="--jinja" ollama://qwen2.5

# Optionally raise the context window at the same time:
ramalama serve --runtime-args="--jinja" --ctx-size=8192 ollama://qwen2.5
```

3. In a separate terminal window, configure with Goose:
@@ -493,6 +497,11 @@ For the Ollama provider, if you don't provide a host, we set it to `localhost:11
└ Configuration saved successfully
```

+:::tip Context Length
+If you notice that Goose is having trouble using extensions or is ignoring [.goosehints](/docs/guides/using-goosehints), it is likely that the model's default context length of 2048 tokens is too low. Set the `--ctx-size` (`-c`) option on `ramalama serve` to a [higher value](https://github.com/containers/ramalama/blob/main/docs/ramalama-serve.1.md#--ctx-size--c).
+:::
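
Mirroring the serve example above, a minimal sketch using the short `-c` form (8192 is an arbitrary example value):

```sh
# Serve with an explicit context window; -c is shorthand for --ctx-size.
ramalama serve --runtime-args="--jinja" -c 8192 ollama://qwen2.5
```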

### DeepSeek-R1

Ollama provides open source LLMs, such as `DeepSeek-r1`, that you can install and run locally.
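
For instance, a hedged sketch of pulling it down (assuming the default `deepseek-r1` tag in the Ollama library):

```sh
# Download and chat with DeepSeek-r1 locally.
# Note: no tool calling, so Goose extensions must be disabled (see warning above).
ollama run deepseek-r1
```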