voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
What is voicebox?
A private, on-device voice toolkit for text-to-speech, voice cloning and dictation, with built-in integration for AI agents. The project bills itself as a free, open-source alternative to ElevenLabs.
Pros & Cons
Pros
- MIT-licensed code, with model weights mostly under MIT or Apache licenses - genuinely OSI-open and fully local
- Covers both halves of the voice loop: TTS output and dictation/STT input, with native MCP (Model Context Protocol) integration for agents
- Broad hardware support (Apple Silicon MLX, CUDA, ROCm, DirectML, Intel Arc, CPU)
Cons
- Very young: repo created January 2026, v0.5.0, 433 open issues, several core features still on the roadmap
- Voice-cloning abuse risk with no consent framework - the homepage promotes presets cloned from celebrities who have not consented (Freeman, Johansson, Obama)
- Performance and privacy claims ('150x realtime on CPU', 'nothing leaves your device') come from the project itself and have not been independently verified
License
MIT (OSI-open)
When it is interesting
Private, on-device TTS, cloning and dictation with agent integration.
When it is too early
Production use, or any setting where the cloning ethics or a four-month-old codebase are a concern.
Commercial alternative & related
- Commercial counterpart: ElevenLabs
This repo featured in the 2026-06 edition of the Open-Source AI Radar.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.
supertonic
supertone-inc
Fast on-device TTS via ONNX with 31-language support, running on CPU, browser and mobile.