# Local models (llamacpp)
The llamacpp component connects to any server that exposes an OpenAI-compatible HTTP API. This includes Ollama, LM Studio, and llama.cpp's own server, among others.
## Configuration
```yaml
components:
  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1  # required
      default_model: llama3.2:3b           # required
      api_key: local                       # optional; most local servers ignore it
    defaults:
      temperature: 0.7
      max_tokens: 2048
      system: "You are a helpful assistant."
```
## Metadata reference
| Key | Required | Description |
|---|---|---|
| `base_url` | yes | Full base URL, including the `/v1` path segment |
| `default_model` | yes | Model name as recognised by your server |
| `api_key` | no | Bearer token. Defaults to `"local"` (most local servers ignore it). |
## Default base URLs
| Server | Default base URL |
|---|---|
| Ollama | `http://localhost:11434/v1` |
| LM Studio | `http://localhost:1234/v1` |
| llama.cpp | `http://localhost:8080/v1` |
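To confirm a server is reachable and speaks the OpenAI-compatible API, you can list its models. The example below assumes Ollama's default URL; substitute the base URL for your server:

```bash
# Lists the models the server exposes; any OpenAI-compatible
# server should answer this endpoint.
curl http://localhost:11434/v1/models
```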
## Supported parameters
| Parameter | Supported |
|---|---|
| `temperature` | ✓ |
| `max_tokens` | ✓ |
| `top_p` | ✓ |
| `top_k` | — (ignored) |
| `stop` | ✓ |
| `frequency_penalty` | ✓ |
| `presence_penalty` | ✓ |
| `seed` | ✓ |
| `system` | ✓ |
| `tools` | ✓ (model-dependent) |
> **Tool call support**
>
> Tool calls require a model that has been fine-tuned for function calling. Small models (under 1B parameters) typically do not support them reliably. Models such as `llama3.2:3b`, `qwen2.5:7b`, or `mistral:7b` work well.
## Setup guides
### Ollama

- Install Ollama for your OS.
- Pull a model (command shown below).
- Ollama starts automatically (or run `ollama serve`). Configure daimon as shown after this list.
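The pull command and config below use `llama3.2:3b` from the configuration example above; any model available in the Ollama library works the same way.

```bash
# Download the model referenced in the config below.
ollama pull llama3.2:3b
```

A minimal config for this setup, following the schema from the Configuration section:

```yaml
components:
  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1
      default_model: llama3.2:3b
```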
### LM Studio

- Download and install LM Studio.
- Download a model from the Discover tab.
- Go to Local Server, select your model, and click Start Server.
- Configure daimon as shown below.
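A minimal config for LM Studio's default port, following the schema from the Configuration section. The `default_model` value here is a placeholder and must match the name LM Studio reports for the loaded model:

```yaml
components:
  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:1234/v1
      default_model: your-model-name  # placeholder; use the name LM Studio shows for the loaded model
```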
### llama.cpp

- Build the llama.cpp server.
- Download a GGUF model file.
- Start the server (command shown below).
- Configure daimon as shown below.
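A sketch of the last two steps, assuming a recent llama.cpp build where the server binary is named `llama-server`, with `model.gguf` standing in for your downloaded model file:

```bash
# Serve the model on the default port from the table above.
llama-server -m model.gguf --port 8080
```

The server then exposes its API at `http://localhost:8080/v1`. Since llama.cpp serves whichever model was loaded at startup, the `default_model` value is typically informational:

```yaml
components:
  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:8080/v1
      default_model: model.gguf  # llama.cpp typically serves the loaded model regardless of this name
```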