oMLX
jundot
macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.
What is oMLX?
oMLX is a macOS-native LLM inference server optimized for Apple Silicon. It ships a SwiftUI menubar app and admin dashboard, continuous batching, tiered KV caching that spills to SSD, multi-model serving with LRU eviction, and OpenAI/Anthropic-compatible APIs, plus built-in benchmarking and vision-language model support.
Pros & Cons
Pros
- Native SwiftUI menubar app and admin dashboard - polished Mac-first UX
- Tiered KV cache spills to SSD to extend effective context beyond RAM (project's own claim)
- OpenAI and Anthropic API compatibility makes it a drop-in local backend
Cons
- Apple Silicon only - no Linux or Windows
- Large open-issue backlog suggests rough edges
- Differentiates from MLX-LM and llama.cpp mainly via the GUI layer
License
Apache-2.0 (OSI-open)
When it is interesting
Apple Silicon users who want a GUI-managed local inference server without Docker or command-line daemons.
When it is too early
If you need Linux/Windows server deployments or multi-GPU cluster inference.
Commercial alternative & related
- Commercial counterpart: LM Studio
This repo featured in the 2026-07 edition of the Open-Source AI Radar.
apfel
Arthur-Ficial
Expose the on-device Apple Intelligence model on macOS 26 as a zero-setup OpenAI-compatible local API.
shimmy
Michael-A-Kuykendall
Pure-Rust local inference engine with an OpenAI-compatible API, shipped as one binary.
whichllm
Andyyyy64
CLI that detects your hardware and ranks local LLMs that will run well on it, scored against real benchmarks.