Skip to main content
AI Tool Radar
OSI-openLocal inference and "what runs on my machine"

oMLX

jundot

macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.

16.6k stars(as of 2026-06-14)View on GitHub

What is oMLX?

oMLX is a macOS-native LLM inference server optimized for Apple Silicon. It ships a SwiftUI menubar app and admin dashboard, continuous batching, tiered KV caching that spills to SSD, multi-model serving with LRU eviction, and OpenAI/Anthropic-compatible APIs, plus built-in benchmarking and vision-language model support.

Pros & Cons

Pros

  • Native SwiftUI menubar app and admin dashboard - polished Mac-first UX
  • Tiered KV cache spills to SSD to extend effective context beyond RAM (project's own claim)
  • OpenAI and Anthropic API compatibility makes it a drop-in local backend

Cons

  • Apple Silicon only - no Linux or Windows
  • Large open-issue backlog suggests rough edges
  • Differentiates from MLX-LM and llama.cpp mainly via the GUI layer

License

Apache-2.0 (OSI-open)

When it is interesting

Apple Silicon users who want a GUI-managed local inference server without Docker or command-line daemons.

When it is too early

If you need Linux/Windows server deployments or multi-GPU cluster inference.

Commercial alternative & related

  • Commercial counterpart: LM Studio

This repo featured in the 2026-07 edition of the Open-Source AI Radar.