AI Tool Radar
OSI-open | Open voice and text-to-speech

speech-swift

soniqo

On-device Apple Silicon speech toolkit (ASR, TTS, diarization, VAD) wiring 40+ open models via MLX.

933 stars (as of 2026-06-26)

What is speech-swift?

An on-device speech toolkit for Apple Silicon (Mac and iOS) that bundles ASR, TTS, speech-to-speech, voice activity detection, speaker diarization, enhancement and source separation via MLX and CoreML, running locally without cloud APIs. It wires up 40+ open models (Qwen3-ASR/TTS, Parakeet, Kokoro, CosyVoice and more) and ships as a Swift package, a CLI and an OpenAI-compatible server.
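To illustrate what "OpenAI-compatible server" can mean in practice, here is a minimal client sketch. It assumes the local server mirrors the request shape of OpenAI's `/v1/audio/speech` endpoint; the base URL, port, model name, voice name and output format below are hypothetical placeholders, not values taken from the speech-swift documentation, so check the project's README before relying on any of them.

```python
import json
import urllib.request

# Assumption: hypothetical local address; the real host, port and path
# prefix depend on how the speech-swift server is started.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(text, model="placeholder-tts-model",
                         voice="placeholder-voice", base_url=BASE_URL):
    """Build (but do not send) a POST request shaped like OpenAI's
    /audio/speech call. The model and voice names are placeholders."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav"):
    """Send the request and write the returned audio bytes to disk.
    Assumption: the server answers with raw audio bytes; the actual
    container format depends on the server's configuration."""
    req = build_speech_request(text)
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Because the request is built separately from the network call, the payload can be inspected without a running server. If the server does follow the OpenAI request shape, an existing OpenAI client could in principle be pointed at the local base URL instead of using hand-built requests.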

Pros & Cons

Pros

  • Fully on-device and offline, no API keys or per-minute cost
  • Broad capability set (ASR, TTS, speech-to-speech, VAD, diarization) under one Apache-2.0 package
  • Multiple distribution forms including an OpenAI-compatible server

Cons

  • Apple Silicon only (macOS 15+/iOS 18+), so it is not portable to other platforms; the site's cross-platform claim is not reflected in this repo
  • Pre-1.0 (0.0.x), so the API surface is unstable
  • Performance and quality figures (e.g. '32x realtime') are unverified project claims

License

Apache-2.0 (OSI-open)

When it is interesting

Private, cloud-free ASR, TTS and diarization on Mac or iOS, built against a Swift/SPM stack.

When it is too early

If you need cross-platform support or a stable, versioned API, it is too early: the project is Apple-only and still at 0.0.x.

Commercial alternative & related

  • Commercial counterpart: Deepgram

This repo was featured in the 2026-07 edition of the Open-Source AI Radar.