OSI-open

Open voice and text-to-speech

voicebox

jamiepine

A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.

29.5k stars (as of 2026-06-07)

Overview

What is voicebox?

voicebox is a private, on-device voice toolkit for text-to-speech, voice cloning and dictation, with agent integration. It bills itself as a free, open-source alternative to ElevenLabs.

Analysis

Pros & Cons

Pros

MIT code with mostly MIT/Apache model weights - genuinely OSI-open and fully local
Covers both halves of the voice loop: TTS output and dictation/STT input, with native MCP integration for agents
Broad hardware support (Apple Silicon MLX, CUDA, ROCm, DirectML, Intel Arc, CPU)

Cons

Very young: repo created January 2026, v0.5.0, 433 open issues, several core features still on the roadmap
Voice-cloning abuse risk with no consent framework - the homepage promotes non-consenting celebrity presets (Freeman, Johansson, Obama)
Performance and privacy claims ('150x realtime on CPU', 'nothing leaves your device') are the project's own and remain unverified

License

MIT (OSI-open)