Higgs Audio
boson-ai
Text-audio foundation model - conversational TTS in 100+ languages with zero-shot cloning, 4B params.
What is Higgs Audio?
Higgs Audio is a text-audio foundation model family from Boson AI. v3 is a 4B-parameter conversational TTS model covering 100+ languages with zero-shot voice cloning, inline emotion/style/prosody control and an OpenAI-compatible streaming API. Self-hosting is via SGLang-Omni.
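Because the API is described as OpenAI-compatible, a client would plausibly send the same request shape as OpenAI's text-to-speech call (a POST to `/v1/audio/speech` with `model`, `input`, `voice` and `response_format` fields) and read the audio back in chunks. The sketch below uses only the Python standard library. It is not taken from the Higgs Audio or SGLang-Omni docs. The base URL, port, model id `higgs-audio-v3` and voice name are hypothetical placeholders, and whether SGLang-Omni exposes exactly this route is an assumption to check against its documentation.

```python
import json
import urllib.request

# ASSUMPTION: a locally running SGLang-Omni server exposing an OpenAI-style
# /v1/audio/speech route. Host, port, model id and voice are hypothetical.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "higgs-audio-v3"  # hypothetical model identifier


def build_speech_request(text: str, voice: str = "default",
                         response_format: str = "wav") -> urllib.request.Request:
    """Build a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {
        "model": MODEL_ID,
        "input": text,
        "voice": voice,  # hypothetical voice name
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_speech(text: str, out_path: str, chunk_size: int = 4096) -> int:
    """Send the request and write audio bytes to disk as they arrive.

    Returns the total number of bytes written.
    """
    req = build_speech_request(text)
    written = 0
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        # Read the response body incrementally instead of buffering it all.
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
            written += len(chunk)
    return written
```

Against a running server, `stream_speech("Hello from Higgs Audio.", "hello.wav")` would save the streamed audio. Writing chunks as they arrive is what lets a caller start playback before synthesis finishes.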
Pros & Cons
Pros
- 100+ languages with zero-shot cloning and inline prosody control in one 4B model
- Pretrained on 10M+ hours of audio (project's own claim), a large training corpus for an open-weight model
- OpenAI-compatible streaming API eases drop-in integration
Cons
- Weights are non-commercial - commercial self-hosting needs a paid agreement
- The 4B parameter count plus the SGLang-Omni serving stack adds meaningful infra overhead
- Research-only weight license limits appeal for open-source production projects
License
Apache-2.0 (code); open weight, with conditions. Model license: Boson Higgs Audio v3 Research and Non-Commercial License
Code is Apache-2.0, but the v3 model weights are under a Research and Non-Commercial License - production/revenue-generating deployments require a separate commercial agreement with Boson AI.
When it is interesting
Research or non-commercial products needing the broadest multilingual coverage and richest prosody control in open weights.
When it is too early
You need a fully open commercial self-hosting license.
Commercial alternative & related
- Commercial counterpart: ElevenLabs
This repo was featured in the 2026-07 edition of the Open-Source AI Radar.
voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.