OSI-open

Open voice and text-to-speech

Irodori-TTS

Aratako

Japanese flow-matching TTS with zero-shot cloning and emoji-driven style control, MIT on code and weights.

975 stars (as of 2026-06-26)

Overview

What is Irodori-TTS?

A Japanese flow-matching text-to-speech model (a rectified-flow diffusion transformer over continuous latents) with zero-shot voice cloning and emoji-driven style control: emoji in the input text steer delivery and non-verbal expression. A VoiceDesign variant adds caption-text conditioning for emotion and tone and can synthesise without reference audio. The project ships weights, a CLI, Gradio UIs, and training and LoRA finetuning code.
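
For readers unfamiliar with the term, the rectified-flow sampling mentioned above can be sketched in a few lines. This is a toy illustration only, not Irodori-TTS code: `toy_velocity` and `euler_sample` are hypothetical names, the velocity field is a hand-written stand-in for a trained network, and a real model would also condition on text, emoji and reference audio. The sketch assumes the common convention that noise sits at t = 0 and the target latent at t = 1.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a trained velocity network.

    For a single target point, the straight-line (rectified) flow has
    velocity (target - x) / (1 - t) at time t in [0, 1).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(velocity, x0, num_steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1.0, so the toy field stays finite
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Start from Gaussian noise and flow towards a made-up 3-dimensional latent.
rng = random.Random(0)
target = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in target]
latent = euler_sample(lambda x, t: toy_velocity(x, t, target), noise)
```

Each Euler step costs one evaluation of the velocity function, which in a real model is a full forward pass of the network.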

Analysis

Pros & Cons

Pros

Permissive MIT on both code and weights, among the cleanest licensing terms for an open TTS model
Novel, genuinely useful emoji-driven style and caption-based VoiceDesign control, not just plain cloning
Broad backend support (CUDA, ROCm, Intel XPU, CPU, Apple MPS) with full training and LoRA finetuning code

Cons

Japanese only; no value outside Japanese-language use cases
Flow-matching inference is heavier than autoregressive, CPU-first models; a GPU is the practical path
Quality depends on assembled components whose own licences must be checked before commercial redistribution

License

MIT (OSI-open) - model license: MIT

Both code and weights are MIT (per the v3 model cards); the cards add advisory ethical-use guidelines that are not licence restrictions, and the VoiceDesign variant builds on components (an llm-jp encoder, a DACVAE codec) whose own licences should be checked before commercial redistribution.

When it is interesting

Open, MIT-licensed Japanese TTS with expressive, controllable delivery (emoji or caption style steering) and finetuning flexibility.

When it is too early

If you need non-Japanese languages or lightweight CPU-only realtime synthesis on commodity hardware.

Context

Commercial alternative & related

Commercial counterpart: ElevenLabs

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

voicebox

jamiepine

29.5k

A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.

OSI-open

Open voice and text-to-speech

VoxCPM

OpenBMB

26.1k

Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.

OSI-open

Open voice and text-to-speech

Chatterbox

resemble-ai

25.1k

MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.

OSI-open

Open voice and text-to-speech