After two years of bouncing between Claude Desktop, ChatGPT voice mode, Gemini, and a half-dozen Ollama frontends, I got tired of the wake-word thrash. Every assistant assumes you’ve picked their team forever.
So I built BRAGI — a voice layer that runs locally, listens locally, and routes to whichever AI I tell it to. Including the one running on the same machine.
This post is the architecture, not a sales pitch. If you’ve been thinking about building something similar, here’s what I learned shipping v0.2.
The pipeline
Mic input
↓
openwakeword (local) — “Hey Jarvis”
↓
faster-whisper medium (local, GPU optional)
↓
Provider router (settings UI picks destination)
↓
[Cloud: Claude / OpenAI / Gemini / Grok / Groq / Together / HuggingFace]
[Local: Ollama / LM Studio / FREYA / Echo]
↓
TTS (eSpeak free, OpenAI Nova BYOK)
↓
Speaker output
Audio never leaves the machine. Only transcribed text goes to whichever cloud you picked, if any.
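The loop above, stripped to structure. Every stage here is a stub standing in for the real component, and the names are illustrative, not BRAGI's actual API:

```python
# Structure-only sketch of the pipeline; the injected callables stand in for
# openwakeword, faster-whisper, the provider router, and the TTS engine.
def run_once(frame, audio, *, detect, transcribe, route, speak) -> bool:
    """One pass of the loop: returns True if the wake word fired."""
    if not detect(frame):        # openwakeword, on-device
        return False
    text = transcribe(audio)     # faster-whisper, on-device
    reply = route(text)          # cloud or local provider: text only leaves here
    speak(reply)                 # eSpeak / Nova
    return True
```

Injecting the stages as callables is what makes the provider swap trivial: only `route` changes when the user picks a different backend.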
Wake word
openwakeword is the right call for a sovereign product. Picovoice is better quality but locks you into a paid commercial license. openwakeword is Apache 2.0 and runs on CPU.
The catch: training your own custom model requires matching the feature dimensions to whichever preprocessor version you’re targeting. I burned half a day on a model that had 96×103 features when openwakeword expected 32×147. v0.2 ships with the stock “Hey Jarvis” model and includes the custom “Hey BRAGI” model for users with compatible hardware.
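A load-time guard would have saved me that half day. A sketch, where the expected dims are the ones from my build and will differ by preprocessor version:

```python
# Hypothetical guard: reject a custom wake-word model whose feature shape
# doesn't match the preprocessor. (32, 147) is what my openwakeword build
# expected; my mistrained model emitted (96, 103).
EXPECTED_FEATURE_SHAPE = (32, 147)

def validate_model_features(shape: tuple[int, int]) -> None:
    """Fail fast at load time instead of silently never detecting."""
    if shape != EXPECTED_FEATURE_SHAPE:
        raise ValueError(
            f"wake-word model features {shape} != preprocessor "
            f"{EXPECTED_FEATURE_SHAPE}; retrain against the matching "
            "openwakeword preprocessor version"
        )
```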
STT
faster-whisper medium on CUDA is the sweet spot. Tiny is too inaccurate for real conversation, large is overkill for short voice commands. Medium gets ~1 second latency on a midrange GPU and handles bilingual input out of the box.
Critical detail: instantiate Whisper once at startup, never per-request. First inference call takes 5-10 seconds to warm CUDA. Users won’t tolerate that on every wake.
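A sketch of the once-at-startup pattern. Model size and device come from the setup above; the lazy import and CPU fallback are my conventions, not anything faster-whisper requires:

```python
# Load faster-whisper once, reuse on every wake. The lru_cache makes the
# 5-10 s CUDA warm-up a one-time cost instead of a per-request one.
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    from faster_whisper import WhisperModel  # lazy: keeps startup order flexible
    try:
        return WhisperModel("medium", device="cuda", compute_type="float16")
    except Exception:
        # Assumption: fall back to CPU int8 when no usable GPU is present.
        return WhisperModel("medium", device="cpu", compute_type="int8")

def transcribe(wav_path: str) -> str:
    segments, _info = get_model().transcribe(wav_path, beam_size=5)
    return " ".join(seg.text.strip() for seg in segments)
```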
The router
This was the hardest part. Each provider has a different SDK, different streaming format, different auth pattern. The router abstracts that into one interface:
from typing import AsyncIterator, Protocol

class Provider(Protocol):
    def name(self) -> str: ...
    def is_ready(self) -> bool: ...
    async def respond(self, prompt: str, history: list[Message]) -> AsyncIterator[str]: ...
Each provider implementation handles its own SDK quirks. The router just picks one based on user settings or voice command (“BRAGI, switch to Claude”) and calls respond().
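In miniature, using a toy echo provider for illustration (the real ones wrap SDKs, but the router logic is this simple):

```python
# Router sketch: one active provider, switchable by name. EchoProvider is a
# toy stand-in that streams the prompt back word by word.
import asyncio
from typing import AsyncIterator

class EchoProvider:
    def name(self) -> str:
        return "echo"
    def is_ready(self) -> bool:
        return True
    async def respond(self, prompt: str, history: list) -> AsyncIterator[str]:
        for word in prompt.split():
            yield word + " "

class Router:
    def __init__(self, providers):
        self._providers = {p.name(): p for p in providers}
        self.active = next(iter(self._providers))

    def switch(self, name: str) -> None:
        # e.g. triggered by the voice command "BRAGI, switch to claude"
        if name in self._providers and self._providers[name].is_ready():
            self.active = name

    async def ask(self, prompt: str, history=()) -> str:
        provider = self._providers[self.active]
        return "".join([c async for c in provider.respond(prompt, list(history))])

router = Router([EchoProvider()])
```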
For local models I support both Ollama (HTTP API) and LM Studio (OpenAI-compatible HTTP API). Both run on the user’s machine. Both look identical to the router.
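Since LM Studio speaks the OpenAI chat-completions dialect and Ollama exposes an OpenAI-compatible endpoint too, one client shape covers both, with only the base URL differing. The ports below are each tool's defaults, and `stream_local` is a sketch, not BRAGI's actual code:

```python
# Both local backends look like OpenAI servers; only the base_url differs.
# Ports are the stock defaults (Ollama 11434, LM Studio 1234) -- adjust per install.
LOCAL_BACKENDS = {
    "ollama":   "http://127.0.0.1:11434/v1",
    "lmstudio": "http://127.0.0.1:1234/v1",
}

async def stream_local(backend: str, model: str, prompt: str):
    from openai import AsyncOpenAI  # lazy: only needed when a local backend is active
    client = AsyncOpenAI(base_url=LOCAL_BACKENDS[backend], api_key="not-needed")
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```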
TTS
eSpeak ships with the installer because it’s free, offline, and covers 100+ languages. It sounds robotic. That’s fine. People who want premium voice can paste an OpenAI API key and use Nova.
I tried Kokoro for higher-quality offline TTS. Worked great in dev. Production builds kept hitting a 404 fetching the default voice file from HuggingFace. Shipped with eSpeak as the default and Kokoro as best-effort.
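"Best-effort" in practice is just a fallback chain: try the engines in preference order, land on eSpeak when the fancier ones fail. A sketch with hypothetical engine callables:

```python
# TTS fallback chain sketch: engines is an ordered list of (name, speak_fn)
# pairs, e.g. [("kokoro", ...), ("espeak", ...)]. Returns the engine that
# actually spoke, so the UI can report which voice is live.
def synthesize(text: str, engines: list) -> str:
    for name, speak_fn in engines:
        try:
            speak_fn(text)
            return name
        except Exception:
            continue  # e.g. Kokoro's 404 on the voice file: fall through
    raise RuntimeError("no TTS engine available")
```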
The settings UI
Local web UI on http://127.0.0.1:7777. Configure providers, paste API keys, pick voices, manage license. Page lives on the user’s machine. No account, no login, no cloud dashboard.
API keys live in a local vault. They never leave the machine. The product is sovereignty — that has to be true at every layer.
Stack
- Python 3.11
- openwakeword for wake detection
- faster-whisper for STT
- eSpeak / OpenAI Nova for TTS
- FastAPI for the local settings server
- pythonw.exe in tray mode for daily use
- PyInstaller for bundling
- NSIS for the Windows installer
- ~169MB installer, Win10/11
What I’d do differently
- Custom wake word training is harder than the docs admit. openwakeword’s preprocessor is versioned, and the feature dimensions have to match exactly. Document this for users who want to train their own.
- PyInstaller + 4GB CUDA torch builds blow past NSIS’s 2GB single-file limit. I had to move torch and Kokoro to a first-run download instead of bundling them.
- Don’t trust the embedded Python’s python311._pth defaults. User-site contamination from %APPDATA%\Python (Roaming) will silently break your install. Always launch with the -s -E flags.
What’s next
v0.3 will likely add: better Kokoro fallback, custom wake word training UI, multi-room concurrency. The architecture supports it — I just need to ship v0.2 first and see what users actually ask for.
If you want to see it: clintwave84.gumroad.com/l/leetkd
If you’ve built something similar and want to compare notes — drop a comment. Especially curious how others have handled the provider abstraction across cloud + local.
— Built by one guy in Idaho. Snake River AI.
