OSI-openLocal inference and "what runs on my machine"

Rapid-MLX

raullenchai

Local OpenAI-compatible inference server for Apple Silicon built on MLX, designed for coding agents.

2.7k stars(as of 2026-06-05)View on GitHub

Overview

What is Rapid-MLX?

A local OpenAI-compatible inference server for Apple Silicon built on MLX, designed to plug into coding agents like Cursor and Claude Code. It ships with tool-calling, prompt caching and 3,300+ tests.

Analysis

Pros & Cons

Pros

Serious engineering signals: 3,300+ tests, a doctor diagnostic, broad model support
Clean Ollama/llama.cpp replacement on Apple Silicon
Apache-2.0, fully OSI-open

Cons

macOS / Apple Silicon only - no Linux, Windows or NVIDIA
Officially Beta (PyPI development status 4) despite a high version number
The '4.2x faster than Ollama' headline has no disclosed benchmark conditions - and PyPI states a more modest '2-4x'

License

Apache-2.0 (OSI-open)

When it is interesting

Apple Silicon users running local inference for coding agents.

When it is too early

Any non-Apple hardware, or if you need reproducible speed guarantees rather than a marketing headline.

This repo featured in the 2026-06 edition of the Open-Source AI Radar.

Similar repositories

oMLX

jundot

16.6k

macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.

OSI-openLocal inference and "what runs on my machine"

apfel

Arthur-Ficial

5.8k

Expose the on-device Apple Intelligence model on macOS 26 as a zero-setup OpenAI-compatible local API.

OSI-openLocal inference and "what runs on my machine"

shimmy

Michael-A-Kuykendall

5.3k

Pure-Rust local inference engine with an OpenAI-compatible API, shipped as one binary.

OSI-openLocal inference and "what runs on my machine"