MOSS-TTS-Nano
OpenMOSS
0.1B-parameter multilingual TTS with zero-shot voice cloning that runs in real time on a CPU, with fully open weights.
What is MOSS-TTS-Nano?
MOSS-TTS-Nano is a 0.1B-parameter multilingual text-to-speech model (an audio tokenizer plus a small LLM) from the OpenMOSS team (Fudan/SII). It performs zero-shot voice cloning across 20 languages, including German, with native 48 kHz output. It is built for low-latency, CPU-only real-time synthesis and ships with open weights, full inference code, an ONNX CPU build, an Android example and a browser-extension reader.
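As a rough illustration of what "real time" means for a TTS model, a common measure is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1.0 means faster than real time. The sketch below is not taken from the MOSS-TTS-Nano code; the function name and the timing numbers are hypothetical, and only the 48 kHz sample rate comes from the description above.

```python
def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = 48_000) -> float:
    """Return synthesis time divided by audio duration.

    A result below 1.0 means the audio was generated faster than it plays back.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate  # playback length of the output
    return synthesis_seconds / audio_seconds


# Hypothetical measurement: 5 s of 48 kHz audio (240,000 samples) produced in 2 s.
rtf = real_time_factor(2.0, 240_000)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.40"
```

To evaluate a model this way on your own hardware, time the synthesis call yourself and pass in the length of the returned waveform.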
Pros & Cons
Pros
- Genuinely tiny at 0.1B parameters and runs in real time on a CPU, with no GPU needed
- Apache-2.0 (OSI-open) for both code and weights, so safe for commercial use
- 20-language coverage, deployment paths for ONNX, Android and the browser, and released fine-tuning code
Cons
- 0.1B trades fidelity for size; the 8B MOSS-TTS flagship is the quality tier
- The README licence section still shows contradictory stale wording despite the Apache LICENSE file
- Very young (April 2026), so long-term maintenance and quality at scale are unproven
License
Apache-2.0 (OSI-open) - model license: Apache-2.0
Both the code and the model weights are Apache-2.0 (verified against the published LICENSE file and the Hugging Face card); the README's licence section still carries stale conditional wording that the Apache-2.0 LICENSE supersedes.
When it is interesting
On-device, offline, low-latency multilingual TTS and voice cloning on commodity CPUs (mobile, edge, browser).
When it is too early
If you need top-tier studio fidelity or production stability, the larger MOSS-TTS or a managed API is a better fit.
Commercial alternative & related
- Commercial counterpart: ElevenLabs
This repo featured in the 2026-07 edition of the Open-Source AI Radar.
voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.