Mic → faster-whisper (local STT) → Any LLM (Ollama / OpenAI / OpenRouter) → TTS → Robot Speaker + Emotions
Run Qwen3 32B, Llama 3.1, Gemma, or any of 300+ models through a single app. Switch between local Ollama, OpenAI, and OpenRouter with one API call.
curl -X POST http://localhost:8042/config \
-d '{"provider":"ollama","model":"qwen3:32b"}'
Powered by the same universal model routing from AnyModel.
Everything runs on your machine. No audio leaves your network. No API keys needed for local models.
Switch between Qwen3, Llama, Gemma, GPT-4o, or any OpenRouter model via the web UI at :8042.
Metal GPU acceleration. Qwen3 32B at ~30 tok/s on M4 Max. Sub-3s responses with Llama 3.1 8B.
Say "Reachy" to activate. Ignores background noise, TV, music, and ambient speech.
The LLM triggers expressive movements — happy, curious, surprised, thinking — automatically.
Ask about news, weather, facts — the robot searches the internet and speaks results back.
| Model | Provider | Latency | Quality | Cost |
|---|---|---|---|---|
| Qwen3 32B | Ollama (local) | ~5s* | Excellent | Free |
| Llama 3.1 8B | Ollama (local) | ~3s | Good | Free |
| GPT-4o Mini | OpenAI | ~2s | Great | $0.15/1M tok |
| Llama 3.1 70B | OpenRouter | ~2s | Excellent | $0.06/1M tok |
| Qwen3 Coder 30B | Ollama (local) | ~5s* | Great | Free |
| Gemma 3 27B | Ollama (local) | ~4s | Great | Free |
* With num_ctx: 4096 optimization. Benchmarked on Apple M4 Max 128GB. Pull any model: ollama pull MODEL
brew install ollama && ollama pull llama3.1:8bFind reachy_local_voice in the marketplace and click Install. Or install via the daemon API.
Click Start. Say "Reachy, hello!" — it runs entirely on your machine.
OLLAMA_MODEL=qwen3:32b # Any Ollama model
LLM_PROVIDER=ollama # ollama | openai | openrouter
OLLAMA_URL=http://localhost:11434
WHISPER_MODEL=base # tiny | base | small | medium | large-v3
MACOS_VOICE=Samantha # macOS TTS voice name
WAKE_WORD=reachy # Activation word
ENABLE_WAKE_WORD=true # true | false
SILENCE_THRESHOLD=0.015 # Mic sensitivity