Parlor
fikrikarim
On-device, real-time voice and vision AI - powered by Gemma and Kokoro, no cloud.
What is Parlor?
Parlor is a local assistant combining a multimodal Gemma model with Kokoro TTS for real-time voice-and-camera conversations with no cloud dependency. It runs on Apple Silicon (MLX) or Linux GPU, uses Silero VAD for hands-free use, supports barge-in, and streams TTS at the sentence level.
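To make "streams TTS at the sentence level" concrete, here is a minimal sketch of the general technique: buffer the LLM's streamed text and hand each sentence to the speech engine as soon as it is complete, rather than waiting for the full reply. This is not Parlor's actual code. The function name `sentences_from_tokens` and the punctuation-based boundary rule are assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumed heuristic: a sentence ends at ., ! or ? followed by whitespace.
# Real systems may use a smarter splitter (abbreviations, decimals, etc.).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hel", "lo the", "re. How", " are", " you? ", "Fine"]
    for sentence in sentences_from_tokens(stream):
        # In a voice assistant, each sentence would be sent to TTS here.
        print(sentence)
```

Because each sentence is yielded the moment its boundary arrives, speech can start while the model is still generating. A barge-in check would sit naturally in the consuming loop, which can simply stop pulling sentences when the user starts talking.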
Pros & Cons
Pros
- Truly on-device - voice, vision and LLM all local, strong privacy story
- Barge-in and sentence-level streaming give a natural conversational feel
- Apache-2.0 throughout, actively maintained
Cons
- English-only and Apple Silicon / Linux GPU only - no Windows or CPU path
- Thin layer over Gemma + Kokoro - voice quality bound by Kokoro
- Alpha-stage solo project with no versioned releases
License
Apache-2.0 (OSI-open)
When it is interesting
You want a privacy-first, fully local voice assistant with camera awareness and zero API keys, especially on Apple Silicon.
When it is too early
You need multilingual support, a stable SDK, or production reliability.
Commercial alternative & related
- Commercial counterpart: ElevenLabs
This repo was featured in the 2026-07 edition of the Open-Source AI Radar.
voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.