Quick Start
Get daimon running and make your first streaming request in under five minutes.
1. Install
Download a prebuilt binary from the latest release.
2. Set up a model
You can point daimon at a hosted provider (export your API key, then save a config.yaml for that provider) or run fully locally. For the local route, start Ollama in Docker and pull a model:
```shell
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull qwen2.5:1.5b
```
Save a config.yaml pointing at it:
config.yaml

```yaml
port: 3500
components:
  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1
      default_model: qwen2.5:1.5b
```
Tip
Swap qwen2.5:1.5b for any model on ollama.com/library. Larger models are slower but more capable.
3. Start daimon
4. Make a request
Note
Examples below use claude. If you used the Docker setup, replace it with local.
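A minimal streaming-request sketch in Python, assuming daimon exposes an OpenAI-compatible chat endpoint at `/v1/chat/completions` on the configured port and streams Server-Sent Events (the endpoint path, payload shape, and SSE framing are assumptions here, not daimon's documented API):

```python
import json
import urllib.request

# Port from config.yaml; the endpoint path is an assumption.
DAIMON_URL = "http://localhost:3500/v1/chat/completions"

def parse_sse_data(line: str):
    """Return the decoded JSON payload of one SSE 'data:' line, or None
    for blank lines, comments, and the terminal [DONE] sentinel."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def stream_chat(prompt: str, model: str = "claude") -> str:
    """POST a streaming chat request and join the content deltas."""
    # model is a component name from your config; use "local" for the
    # Docker setup above.
    body = json.dumps({
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        DAIMON_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    parts = []
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            chunk = parse_sse_data(raw.decode("utf-8"))
            if chunk is not None:
                parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)
```

The SDKs linked under Next steps wrap this same request flow with sessions and typed responses.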
Next steps
- Configuration: components, inference defaults, MCP servers, telemetry — all in one YAML.
- Python SDK: multi-turn conversations, sessions, tool calls, async.
- TypeScript SDK: native fetch, async generators, full type safety.
- Tool Calls (MCP): wire up filesystem, GitHub, search, and custom tools.