speech-swift
soniqo
On-device Apple Silicon speech toolkit (ASR, TTS, diarization, VAD) wiring 40+ open models via MLX.
What is speech-swift?
An on-device speech toolkit for Apple Silicon (Mac and iOS) that bundles ASR, TTS, speech-to-speech, voice activity detection (VAD), speaker diarization, speech enhancement and source separation. Models run locally via MLX and CoreML, with no cloud APIs involved. It wires up 40+ open models (Qwen3-ASR/TTS, Parakeet, Kokoro, CosyVoice and more) and ships as a Swift package, a command-line tool (CLI) and an OpenAI-compatible server.
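The entry describes the server only as "OpenAI-compatible". The sketch below assumes that this includes OpenAI's `POST /v1/audio/transcriptions` endpoint, which takes a multipart form with `file` and `model` fields. The helper `makeTranscriptionRequest`, the host, the port and the model name are all hypothetical placeholders and are not part of speech-swift's documented API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Builds a multipart/form-data request shaped like OpenAI's
/// `POST /v1/audio/transcriptions` call.
/// ASSUMPTION: the target server accepts this OpenAI request shape; the
/// base URL and model name passed in are hypothetical placeholders.
func makeTranscriptionRequest(
    baseURL: URL,
    audio: Data,
    fileName: String,
    model: String
) -> URLRequest {
    let boundary = "Boundary-\(UUID().uuidString)"
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/audio/transcriptions"))
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)",
                     forHTTPHeaderField: "Content-Type")

    var body = Data()
    // Text part: the model identifier.
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"model\"\r\n\r\n".utf8))
    body.append(Data("\(model)\r\n".utf8))
    // File part: the raw audio bytes.
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"\(fileName)\"\r\n".utf8))
    body.append(Data("Content-Type: application/octet-stream\r\n\r\n".utf8))
    body.append(audio)
    // Closing boundary terminates the multipart body.
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))

    request.httpBody = body
    return request
}
```

A request built with, for example, `makeTranscriptionRequest(baseURL: URL(string: "http://localhost:8080")!, audio: wavData, fileName: "clip.wav", model: "hypothetical-asr-model")` could then be sent with `URLSession.shared.dataTask(with:)`. Consult the project's own server documentation for the real port, routes and model identifiers.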
Pros & Cons
Pros
- Fully on-device and offline, no API keys or per-minute cost
- Broad capability set (ASR, TTS, speech-to-speech, VAD, diarization) under one Apache-2.0 package
- Multiple distribution forms including an OpenAI-compatible server
Cons
- Apple Silicon only (macOS 15+/iOS 18+), so not portable; the project website's cross-platform claim is not reflected in this repo
- Pre-1.0 (0.0.x), so the API surface is unstable
- Performance and quality figures (e.g. '32x realtime') are unverified project claims
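To make a figure such as "32x realtime" concrete, the sketch below reads it as a speed-up factor, so that factor = audio duration / processing time. This is an interpretation, not a measurement. The common "real-time factor" (RTF) metric is usually defined the other way round, as processing time / audio duration, where lower is better. The number itself remains an unverified project claim. `processingSeconds` is a hypothetical helper.

```swift
/// Wall-clock seconds needed to process `audioSeconds` of audio,
/// assuming `speedUp` = audio duration / processing time.
func processingSeconds(audioSeconds: Double, speedUp: Double) -> Double {
    precondition(speedUp > 0, "speed-up factor must be positive")
    return audioSeconds / speedUp
}

// One hour of audio at the claimed 32x: 3600 / 32 = 112.5 s (under 2 minutes).
let oneHour = processingSeconds(audioSeconds: 3600, speedUp: 32)
print(oneHour) // 112.5
```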
License
Apache-2.0 (OSI-open)
When it is interesting
Private, cloud-free ASR, TTS and diarization on Mac or iOS, for apps built on a Swift/SwiftPM stack.
When it is too early
If you need cross-platform support or a stable, versioned API: it is Apple-only and still at 0.0.x.
Commercial alternative & related
- Commercial counterpart: Deepgram
This repo was featured in the 2026-07 edition of the Open-Source AI Radar.
voicebox
jamiepine
A free, on-device alternative to ElevenLabs for TTS, voice cloning and dictation.
VoxCPM
OpenBMB
Tokenizer-free TTS from OpenBMB covering 30 languages with voice design and real-time streaming.
Chatterbox
resemble-ai
MIT-licensed open TTS with zero-shot voice cloning: 500M parameters, 23+ languages.