docs: local LLMs context size tip (#3454)
Signed-off-by: jjjuk <gmodhl67@gmail.com>
Co-authored-by: angiejones <jones.angie@gmail.com>
@@ -306,7 +306,7 @@ Ollama and Ramalama are both options to provide local LLMs, each of which requires
2. Run any [model supporting tool-calling](https://ollama.com/search?c=tools):

:::warning Limited Support for models without tool calling
-Goose extensively uses tool calling, so models without it (e.g. `DeepSeek-r1`) can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions). As an alternative, you can use a [custom DeepSeek-r1 model](/docs/getting-started/providers#deepseek-r1) we've made specifically for Goose.
+Goose extensively uses tool calling, so models without it can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions).
:::

Example:
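
A minimal sketch, using `qwen2.5` as one arbitrary pick from the linked list of tool-calling models:

```sh
# Pull and run a tool-calling model locally (qwen2.5 is an example choice).
ollama run qwen2.5
```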
@@ -397,20 +397,24 @@ If you're running Ollama on a different server, you'll have to set `OLLAMA_HOST=
└ Configuration saved successfully
```

+:::tip Context Length
+If you notice that Goose is having trouble using extensions or is ignoring [.goosehints](/docs/guides/using-goosehints), it is likely that the model's default context length of 4096 tokens is too low. Set the `OLLAMA_CONTEXT_LENGTH` environment variable to a [higher value](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size).
+:::
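
As a hedged sketch, one way to apply the tip (8192 is an arbitrary example value; pick what your hardware allows):

```sh
# Start the Ollama server with a larger context window (example value).
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
```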

#### Ramalama

1. [Download Ramalama](https://github.com/containers/ramalama?tab=readme-ov-file#install).
2. Run any Ollama [model supporting tool-calling](https://ollama.com/search?c=tools) or [GGUF-format HuggingFace model](https://huggingface.co/search/full-text?q=%22tools+support%22+%2B+%22gguf%22&type=model):

:::warning Limited Support for models without tool calling
-Goose extensively uses tool calling, so models without it (e.g. `DeepSeek-r1`) can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions). As an alternative, you can use a [custom DeepSeek-r1 model](/docs/getting-started/providers#deepseek-r1) we've made specifically for Goose.
+Goose extensively uses tool calling, so models without it can only do chat completion. If using models without tool calling, all Goose [extensions must be disabled](/docs/getting-started/using-extensions#enablingdisabling-extensions).
:::

Example:

```sh
# NOTE: the --runtime-args="--jinja" flag is required for Ramalama to work with the Goose Ollama provider.
ramalama serve --runtime-args="--jinja" ollama://qwen2.5

# Optionally raise the context window at the same time:
ramalama serve --runtime-args="--jinja" --ctx-size=8192 ollama://qwen2.5
```

3. In a separate terminal window, configure with Goose:
@@ -493,6 +497,11 @@ For the Ollama provider, if you don't provide a host, we set it to `localhost:11
└ Configuration saved successfully
```

+:::tip Context Length
+If you notice that Goose is having trouble using extensions or is ignoring [.goosehints](/docs/guides/using-goosehints), it is likely that the model's default context length of 2048 tokens is too low. Set the `--ctx-size` (`-c`) option on `ramalama serve` to a [higher value](https://github.com/containers/ramalama/blob/main/docs/ramalama-serve.1.md#--ctx-size--c).
+:::
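
Mirroring the serve example above, a minimal sketch using the short `-c` form (8192 is an arbitrary example value):

```sh
# Serve with an explicit context window; -c is shorthand for --ctx-size.
ramalama serve --runtime-args="--jinja" -c 8192 ollama://qwen2.5
```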

### DeepSeek-R1

Ollama provides open source LLMs, such as `DeepSeek-r1`, that you can install and run locally.
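
For instance, a hedged sketch of pulling it down (assuming the default `deepseek-r1` tag in the Ollama library):

```sh
# Download and chat with DeepSeek-r1 locally.
# Note: no tool calling, so Goose extensions must be disabled (see warning above).
ollama run deepseek-r1
```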