Open weight, with conditions

Open voice and text-to-speech

Higgs Audio

boson-ai

Text-audio foundation model - conversational TTS in 100+ languages with zero-shot cloning, 4B params.

8.2k stars (as of 2026-06-14)

Overview

What is Higgs Audio?

Higgs Audio is a text-audio foundation model family from Boson AI. v3 is a 4B-parameter conversational TTS model covering 100+ languages with zero-shot voice cloning, inline emotion/style/prosody control and an OpenAI-compatible streaming API. Self-hosting is via SGLang-Omni.
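As a rough illustration of what "OpenAI-compatible" could mean in practice, the sketch below builds a text-to-speech request in the shape of OpenAI's speech endpoint (a POST to /v1/audio/speech with model, input, voice and response_format fields) and streams the response to a file in chunks. The endpoint path on a Higgs Audio server, the base URL, the model name "higgs-audio-v3" and the voice name "default" are all assumptions made for illustration, not values confirmed by the project; check the project's own documentation for the real ones.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model="higgs-audio-v3",
                         voice="default", response_format="wav"):
    """Build (but do not send) an OpenAI-style speech request.

    The model and voice defaults are hypothetical placeholders.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    # Assumed path: mirrors OpenAI's speech endpoint.
    url = base_url.rstrip("/") + "/v1/audio/speech"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_speech_to_file(request, out_path, chunk_size=4096):
    """Send the request and write audio bytes to disk as they arrive."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)


# Example usage against a hypothetical local server:
# req = build_speech_request("http://localhost:8000", "Hello from Higgs Audio.")
# stream_speech_to_file(req, "hello.wav")
```

Keeping request construction separate from the network call makes the payload easy to inspect or adapt if the self-hosted server expects different field names.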

Analysis

Pros & Cons

Pros

100+ languages with zero-shot cloning and inline prosody control in one 4B model
Pretrained on 10M+ hours of audio (the project's own claim) - a very large training corpus for an open-weight model
OpenAI-compatible streaming API eases drop-in integration

Cons

Weights are non-commercial - commercial self-hosting needs a paid agreement
4B params plus SGLang-Omni adds meaningful infra overhead
Research-licensed weights limit production open-source appeal

License

Code: Apache-2.0. Model weights: Boson Higgs Audio v3 Research and Non-Commercial License (open weight, with conditions).

Code is Apache-2.0, but the v3 model weights are under a Research and Non-Commercial License - production/revenue-generating deployments require a separate commercial agreement with Boson AI.

When it is interesting

Research or non-commercial products needing the broadest multilingual coverage and richest prosody control in open weights.

When it is too early

You need a fully open commercial self-hosting license.

Context

Commercial alternative & related

Commercial counterpart: ElevenLabs

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

voicebox

jamiepine

29.5k

A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.

OSI-open

Open voice and text-to-speech

VoxCPM

OpenBMB

26.1k

Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.

OSI-open

Open voice and text-to-speech

Chatterbox

resemble-ai

25.1k

MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.

OSI-open

Open voice and text-to-speech