needle
cactus-compute
26M-parameter open-weights model for single-shot function calling on phones, watches and glasses.
What is needle?
A 26-million-parameter 'Simple Attention Network' for single-shot function and tool calling on resource-constrained devices like phones, watches and glasses. It takes a user query plus JSON tool schemas and emits the matching function call, and ships with weights, a dataset-generation pipeline, a CLI, a Python library and a web playground.
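To make the input/output contract concrete, here is a minimal Python sketch of what "user query plus JSON tool schemas in, one function call out" can look like. It does not use needle's own library. The schema layout, the tool names (`set_timer`, `get_weather`), the output shape and the `validate_call` helper are all illustrative assumptions, so treat it as a picture of the data involved rather than the project's actual API.

```python
import json

# Hypothetical tool schemas in a JSON-Schema-like style (an assumption;
# the exact format needle expects may differ).
TOOLS = [
    {
        "name": "set_timer",
        "description": "Start a countdown timer.",
        "parameters": {
            "type": "object",
            "properties": {"minutes": {"type": "integer"}},
            "required": ["minutes"],
        },
    },
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]


def validate_call(call, tools):
    """Return True if the call names a known tool, supplies every
    required argument, and passes no argument the schema does not declare."""
    by_name = {tool["name"]: tool for tool in tools}
    tool = by_name.get(call.get("name"))
    if tool is None:
        return False
    params = tool["parameters"]
    args = call.get("arguments", {})
    if any(key not in args for key in params.get("required", [])):
        return False
    return all(key in params["properties"] for key in args)


query = "Set a timer for 10 minutes"

# Illustrative output only: the kind of single call a function-calling
# model is meant to emit for the query above. Not produced by needle here.
model_output = '{"name": "set_timer", "arguments": {"minutes": 10}}'
call = json.loads(model_output)

print(validate_call(call, TOOLS))
```

Checking the emitted call against the schema before dispatching it is a cheap safeguard on the host device, since a model this small can pick an unknown tool or omit a required argument.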
Pros & Cons
Pros
- Code and weights are both MIT-licensed with no extra conditions, which is rare for an on-device model
- Tiny (26M params), so it can run on phones, watches and glasses, and both the weights and the dataset-generation pipeline are open
- Complete tooling out of the box: CLI, Python API, web playground and local finetuning on consumer Mac/PC
Cons
- At 26M params it is a narrow single-shot function-caller, not conversational or general-purpose
- Headline speed and benchmark-win numbers are unverified vendor claims, some measured on Cactus's own hardware
- No formal releases or versioning, and explicitly described as an 'experimental run'
License
MIT (OSI-open) - model license: MIT
Both the code and the model weights are MIT (verified against the LICENSE file and the Hugging Face model card), with no extra use conditions - unusually clean for an on-device model.
When it is interesting
Ultra-cheap, fully open, finetunable on-device tool calling on constrained hardware like wearables.
When it is too early
If you need conversation, multi-turn reasoning, or a stable versioned release.
This repo featured in the 2026-07 edition of the Open-Source AI Radar.
oMLX
jundot
macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.
apfel
Arthur-Ficial
Expose the on-device Apple Intelligence model on macOS 26 as a zero-setup OpenAI-compatible local API.
shimmy
Michael-A-Kuykendall
Pure-Rust local inference engine with an OpenAI-compatible API, shipped as one binary.