VieNeu-TTS
pnnbao97
Open Vietnamese (plus English) TTS with instant voice cloning, trained from scratch and runnable on CPU.
What is VieNeu-TTS?
A Vietnamese text-to-speech system with English code-switching and instant zero-shot voice cloning from a few seconds of reference audio. The current v3 Turbo is a 0.1B-parameter model trained from scratch on around 10,000 hours of Vietnamese-English speech. It outputs 48 kHz audio and uses the MOSS-Audio-Tokenizer-Nano codec, and it ships with a Python package, CPU (ONNX/GGUF) and GPU paths, and Docker serving.
Pros & Cons
Pros
- Fills a genuine niche: dedicated, open, voice-cloning Vietnamese TTS with code-switching, which mainstream open models cover poorly
- Apache-2.0 on code and weights, commercial-safe with no inherited restrictions
- A real on-device CPU path (torch-free ONNX/GGUF) plus pip and Docker tooling
Cons
- The flagship v3 Turbo is 'early access', not a final release, so the strongest claims sit on a pre-stable build
- Language scope is narrow (Vietnamese and English only)
- Largely single-maintainer; long-term support and the from-scratch training claims rest on the author's statements
License
Apache-2.0 (OSI-open) - model license: Apache-2.0
Both the code and the model weights are Apache-2.0; v3 Turbo is trained from scratch, so there is no inherited base-model restriction.
When it is interesting
You specifically need offline, commercially licensed Vietnamese voice cloning or Vietnamese-English code-switching on CPU or a modest GPU.
When it is too early
If you need a frozen, production-stable release today, use the stable v1/v2 rather than the v3 Turbo early access.
Commercial alternative & related
- Commercial counterpart: ElevenLabs
This repo featured in the 2026-07 edition of the Open-Source AI Radar.
voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.