OSI-open

Local inference and "what runs on my machine"

needle

cactus-compute

26M-parameter open-weights model for single-shot function calling on phones, watches and glasses.

2.6k stars (as of 2026-06-26)

Overview

What is needle?

A 26-million-parameter 'Simple Attention Network' for single-shot function and tool calling on resource-constrained devices like phones, watches and glasses. It takes a user query plus JSON tool schemas and emits the matching function call, and ships with weights, a dataset-generation pipeline, a CLI, a Python library and a web playground.
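The input/output contract described above (a user query plus JSON tool schemas in, one function call out) can be sketched as follows. This is a minimal illustration, not needle's actual API: the schema layout, the `name`/`arguments` call shape, the `TOOLS` list and the `validate_call` helper are all hypothetical, and the emitted call is a hand-written stand-in for model output.

```python
import json

# Hypothetical tool schemas in a common JSON-Schema-like layout (an assumption,
# not needle's documented format).
TOOLS = [
    {
        "name": "set_timer",
        "description": "Start a countdown timer.",
        "parameters": {
            "type": "object",
            "properties": {"minutes": {"type": "integer"}},
            "required": ["minutes"],
        },
    },
    {
        "name": "send_message",
        "description": "Send a text message to a contact.",
        "parameters": {
            "type": "object",
            "properties": {
                "contact": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["contact", "body"],
        },
    },
]

# Map JSON-Schema type names to Python types for a shallow type check.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_call(raw_call: str, tools: list[dict]) -> dict:
    """Parse an emitted call and check it against the tool schemas (hypothetical helper)."""
    call = json.loads(raw_call)
    by_name = {tool["name"]: tool for tool in tools}
    if call.get("name") not in by_name:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    params = by_name[call["name"]]["parameters"]
    args = call.get("arguments", {})
    missing = [key for key in params.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key!r}")
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        stray_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected is not None and (stray_bool or not isinstance(value, expected)):
            raise ValueError(f"argument {key!r} should be {spec['type']}")
    return call


# Stand-in for what a single-shot function caller might emit for the query
# "set a timer for 10 minutes".
emitted = '{"name": "set_timer", "arguments": {"minutes": 10}}'
call = validate_call(emitted, TOOLS)
print(call["name"], call["arguments"])
```

Checking the emitted call before dispatching it is a design choice for this sketch: a small single-shot model has no follow-up turn in which to correct a malformed call, so the caller has to catch it.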

Analysis

Pros & Cons

Pros

MIT-licensed code and weights with no extra conditions, which is rare for an on-device model
Tiny (26M params), so it can run on phones, watches and glasses, with weights and dataset generation open
Complete tooling out of the box: CLI, Python API, web playground and local finetuning on consumer Mac/PC
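To put the 26M-parameter figure in context, the raw weight storage can be estimated with simple arithmetic. This is a back-of-the-envelope sketch: the storage precisions listed are assumptions (the listing does not say which precision the released weights use), and the estimate ignores activations, tokenizer files and runtime overhead.

```python
PARAMS = 26_000_000  # parameter count quoted in the listing

# Bytes per parameter for common storage precisions (assumed, not confirmed
# for needle's released weights).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_megabytes(params: int, bytes_per_param: float) -> float:
    """Raw weight storage in megabytes (1 MB = 10**6 bytes), excluding runtime overhead."""
    return params * bytes_per_param / 1_000_000


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_megabytes(PARAMS, width):.0f} MB")
# prints:
# fp32: 104 MB
# fp16: 52 MB
# int8: 26 MB
# int4: 13 MB
```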

Cons

At 26M params it is a narrow single-shot function-caller, not conversational or general-purpose
Headline speed and benchmark-win numbers are unverified vendor claims, some measured on Cactus's own hardware
No formal releases or versioning, and explicitly described as an 'experimental run'

License

MIT (OSI-open) - model license: MIT

Both the code and the model weights are MIT (verified against the LICENSE file and the Hugging Face model card), with no extra use conditions - unusually clean for an on-device model.

When it is interesting

Ultra-cheap, fully open, finetunable on-device tool calling on constrained hardware like wearables.

When it is too early

If you need conversation, multi-turn reasoning, or a stable versioned release.

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

oMLX

jundot

16.6k stars

macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.

OSI-open

Local inference and "what runs on my machine"

apfel

Arthur-Ficial

5.8k stars

Expose the on-device Apple Intelligence model on macOS 26 as a zero-setup OpenAI-compatible local API.

OSI-open

Local inference and "what runs on my machine"

shimmy

Michael-A-Kuykendall

5.3k stars

Pure-Rust local inference engine with an OpenAI-compatible API, shipped as one binary.

OSI-open

Local inference and "what runs on my machine"