Irodori-TTS
Aratako
Japanese flow-matching TTS with zero-shot cloning and emoji-driven style control, MIT on code and weights.
What is Irodori-TTS?
A Japanese flow-matching text-to-speech model (a rectified-flow diffusion transformer over continuous latents) with zero-shot voice cloning and emoji-driven style control: emoji in the input text steer delivery and non-verbal expression. A VoiceDesign variant adds caption-text conditioning for emotion and tone and can synthesise without reference audio. The project ships weights, a CLI, Gradio UIs, and training and LoRA finetuning code.
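To make the "rectified-flow over continuous latents" idea concrete, here is a minimal toy sketch of flow-matching sampling in general. It is not Irodori-TTS code or its API. The names `toy_velocity` and `euler_sample` are hypothetical, and the velocity function is an analytic stand-in for what would be a learned, text- and speaker-conditioned transformer in a real model.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    On a straight (rectified) path from noise to `target`, the velocity is
    the remaining displacement divided by the remaining time.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(target, steps=8, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy "latent" the flow should reach; a real system would decode it to audio.
latent = euler_sample(target=[0.5, -1.0, 2.0], steps=8)
```

Each sample requires several evaluations of the velocity model (one per step), which is one reason this family of models tends to be more compute-hungry at inference than a single forward pass.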
Pros & Cons
Pros
- Permissive MIT on both code and weights, among the cleanest licensing for an open TTS model
- Emoji-driven style control and caption-based VoiceDesign conditioning add novel, genuinely useful expressiveness beyond plain cloning
- Broad backend support (CUDA, ROCm, Intel XPU, CPU, Apple MPS) with full training and LoRA finetuning code
Cons
- Japanese only; no value outside Japanese use cases
- Flow-matching inference is heavier than autoregressive, CPU-first models; a GPU is the practical path
- Quality depends on assembled components whose own licences must be checked before commercial redistribution
License
MIT (OSI-open) - model license: MIT
Both code and weights are MIT (per the v3 model cards). The cards add advisory ethical-use guidelines that are not licence restrictions. The VoiceDesign variant builds on components (an llm-jp encoder, a DACVAE codec) whose own licences should be checked before commercial redistribution.
When it is interesting
Open, MIT-licensed Japanese TTS with expressive, controllable delivery (emoji or caption style steering) and finetuning flexibility.
When it is too early
If you need non-Japanese languages or lightweight CPU-only realtime synthesis on commodity hardware.
Commercial alternative & related
- Commercial counterpart: ElevenLabs
This repo featured in the 2026-07 edition of the Open-Source AI Radar.
voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning - 500M params, 23+ languages.