AI Tool Radar
OSI-open

Open voice and text-to-speech

voicebox

jamiepine

A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.

29.5k stars (as of 2026-06-07)

What is voicebox?

A private, on-device voice toolkit for text-to-speech, cloning and dictation with agent integration. It bills itself as a free, open-source alternative to ElevenLabs.

Pros & Cons

Pros

  • MIT-licensed code with mostly MIT- or Apache-licensed model weights: genuinely OSI-open and fully local
  • Covers both halves of the voice loop: TTS output and dictation/STT input, with native MCP integration for agents
  • Broad hardware support (Apple Silicon MLX, CUDA, ROCm, DirectML, Intel Arc, CPU)

Cons

  • Very young: repo created January 2026, v0.5.0, 433 open issues, several core features still on the roadmap
  • Voice-cloning abuse risk with no consent framework: the homepage promotes presets of celebrities who have not consented (Freeman, Johansson, Obama)
  • Performance and privacy claims ('150x realtime on CPU', 'nothing leaves your device') are the project's own and have not been independently verified

License

MIT (OSI-open)

When it is interesting

Private, on-device TTS, cloning and dictation with agent integration.

When it is too early

Production use, or anywhere the cloning ethics and a four-month-old codebase are a concern.

Commercial alternative & related

This repo featured in the 2026-06 edition of the Open-Source AI Radar.