OSI-open · Open voice and text-to-speech

MOSS-TTS-Nano

OpenMOSS

0.1B multilingual TTS with zero-shot voice cloning that runs realtime on a CPU, fully open weights.

3.8k stars (as of 2026-06-26)

Overview

What is MOSS-TTS-Nano?

MOSS-TTS-Nano is a 0.1B-parameter multilingual text-to-speech model (an audio tokenizer plus a small LLM) that performs zero-shot voice cloning across 20 languages, including German, with native 48 kHz output. It is built for low-latency, CPU-only realtime synthesis. The OpenMOSS team (Fudan/SII) ships open weights, full inference code, an ONNX CPU build, an Android example and a browser-extension reader.
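To make the "0.1B" and "realtime on a CPU" claims concrete, the sketch below shows two back-of-envelope checks: the approximate size of the raw weights at a given precision, and the real-time factor (synthesis time divided by audio duration, where a value below 1.0 means faster than realtime). The function names and the example timing are hypothetical and are not part of the MOSS-TTS-Nano code; only the 0.1B parameter count and the 48 kHz output rate come from the description above, and the footprint ignores activations and runtime overhead.

```python
SAMPLE_RATE_HZ = 48_000  # native output rate stated in the overview


def weight_footprint_mb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than realtime."""
    if num_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples and sample_rate_hz must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    # 0.1B is the rounded figure from the description, so these are estimates.
    print(weight_footprint_mb(0.1e9, 4))  # 32-bit floats
    print(weight_footprint_mb(0.1e9, 2))  # 16-bit floats

    # Hypothetical timing, not a measurement: 2.5 s to produce 10 s of audio.
    print(real_time_factor(2.5, 10 * SAMPLE_RATE_HZ))
```

In practice, `synthesis_seconds` would come from timing an actual synthesis call on the target CPU (for example with `time.perf_counter()`), and `num_samples` from the length of the returned waveform.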

Analysis

Pros & Cons

Pros

Tiny at 0.1B parameters and runs in realtime on a CPU; no GPU needed
Apache-2.0 (OSI-open) on both code and weights, so commercial use is permitted
20-language coverage plus ONNX, Android and browser deployment paths and released finetuning code

Cons

0.1B trades fidelity for size; the 8B MOSS-TTS flagship is the quality tier
The README's license section still carries stale wording that contradicts the Apache-2.0 LICENSE file
Very young (April 2026), so long-term maintenance and quality at scale are unproven

License

Apache-2.0 (OSI-open) - model license: Apache-2.0

Both the code and the model weights are Apache-2.0 (verified against the published LICENSE file and the Hugging Face card). The README's license section still carries stale conditional wording, which the Apache-2.0 LICENSE file supersedes.

When it is interesting

On-device, offline, low-latency multilingual TTS and voice cloning on commodity CPUs (mobile, edge, browser).

When it is too early

If you need top-tier studio fidelity or production stability, the larger MOSS-TTS or a managed API fits better.

Context

Commercial alternative & related

Commercial counterpart: ElevenLabs

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

voicebox

jamiepine

29.5k

A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.

OSI-open · Open voice and text-to-speech

VoxCPM

OpenBMB

26.1k

Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.

OSI-open · Open voice and text-to-speech

Chatterbox

resemble-ai

25.1k

MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.

OSI-open · Open voice and text-to-speech