Rapid-MLX
raullenchai
Local OpenAI-compatible inference server for Apple Silicon built on MLX, designed for coding agents.
What is Rapid-MLX?
A local OpenAI-compatible inference server for Apple Silicon built on MLX, designed to plug into coding agents like Cursor and Claude Code. It ships with tool-calling, prompt caching and 3,300+ tests.
Pros & Cons
Pros
- Serious engineering signals: 3,300+ tests, a doctor diagnostic, broad model support
- Clean Ollama/llama.cpp replacement on Apple Silicon
- Apache-2.0, fully OSI-open
Cons
- macOS / Apple Silicon only - no Linux, Windows or NVIDIA
- Officially Beta (PyPI development status 4) despite a high version number
- The '4.2x faster than Ollama' headline has no disclosed benchmark conditions - and PyPI states a more modest '2-4x'
License
Apache-2.0 (OSI-open)
When it is interesting
Apple Silicon users running local inference for coding agents.
When it is too early
Any non-Apple hardware, or if you need reproducible speed guarantees rather than a marketing headline.
This repo featured in the 2026-06 edition of the Open-Source AI Radar.
oMLX
jundot
macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.
apfel
Arthur-Ficial
Expose the on-device Apple Intelligence model on macOS 26 as a zero-setup OpenAI-compatible local API.
shimmy
Michael-A-Kuykendall
Pure-Rust local inference engine with an OpenAI-compatible API, shipped as one binary.