OSI-open

Vectors, documents and extraction

semble

MinishLab

CPU-only semantic code search for agents: query in natural language, get back only the relevant snippets.

5.4k stars (as of 2026-06-26)

Overview

What is semble?

A code-search tool built for AI coding agents: you query in natural language and get back only the relevant snippets instead of reading whole files or grepping. It runs entirely on CPU with tree-sitter parsing, Model2Vec static embeddings and BM25, needs no API keys or GPU, and ships as a Python library, a CLI and an MCP server.
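The combination of a lexical signal (BM25) and a static-embedding signal can be illustrated with a small, self-contained sketch. This is not semble's API or implementation. The function names are hypothetical, the snippets are assumed to be already split (the tree-sitter chunking step is omitted), hashed per-token vectors stand in for Model2Vec embeddings, and reciprocal rank fusion is an assumed way of merging the two rankings, since the description above does not say how the signals are combined.

```python
import hashlib
import math
import re
from collections import Counter

DIM = 32  # a SHA-256 digest is 32 bytes, one byte per toy vector dimension


def tokenize(text):
    # Lowercase and split on non-alphanumerics; a crude stand-in for code tokenization.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Plain Okapi BM25 over a small in-memory corpus.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def token_vector(token):
    # Assumption: a deterministic hashed vector replaces a learned static embedding.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 127.5 - 1.0 for byte in digest[:DIM]]


def embed(text):
    # Static embeddings need no model forward pass: average the per-token vectors.
    vectors = [token_vector(t) for t in tokenize(text)]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rrf(rankings, k=60):
    # Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per item.
    fused = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def search(query, snippets, top_k=2):
    # Rank snippets by each signal separately, then fuse the two rankings.
    lexical = bm25_scores(query, snippets)
    query_vec = embed(query)
    dense = [cosine(query_vec, embed(s)) for s in snippets]
    order = range(len(snippets))
    by_lexical = sorted(order, key=lambda i: lexical[i], reverse=True)
    by_dense = sorted(order, key=lambda i: dense[i], reverse=True)
    return [snippets[i] for i in rrf([by_lexical, by_dense])[:top_k]]


if __name__ == "__main__":
    snippets = [
        "def parse_config(path): return json.load(open(path))",
        "def send_email(to, body): smtp.sendmail(sender, to, body)",
        "def retry(fn, attempts): fn()",
    ]
    for hit in search("parse config", snippets, top_k=1):
        print(hit)
```

Because the hashed vectors carry no semantics, this sketch only shows the shape of the pipeline: returning a handful of ranked snippets rather than whole files is what keeps the agent's context small.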

Analysis

Pros & Cons

Pros

Zero-setup local operation with no API keys, GPU or cloud required; built by the Model2Vec/Potion team
Triple distribution (library, CLI and MCP server) fits both agent and human workflows
MIT-licensed with releases, tests and docs

Cons

Pre-1.0 (0.4.x), so interfaces may break
Headline efficiency numbers (98% fewer tokens, 218x faster indexing) are the project's own benchmarks and have not been independently verified
Static-embedding retrieval quality on very large or unusual codebases is not independently validated

License

MIT (OSI-open)

When it is interesting

Coding-agent users who want to cut context-token spend on code retrieval with a local CPU tool.

When it is too early

If you need a stable 1.0 API, or proven retrieval quality on your specific monorepo first.

Context

Commercial alternative & related

Commercial counterpart: Greptile

This repo was featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

langextract

google

36.8k stars

Python library from Google for LLM-powered structured extraction with source grounding.

OSI-open

Vectors, documents and extraction

LEANN

StarTrail-org

11.9k stars

RAG on everything: a graph-based vector index claiming 97% storage savings for private on-device search.

OSI-open

Vectors, documents and extraction

turbovec

RyanCodrai

11.5k stars

Rust vector index with TurboQuant compression (ICLR 2026), SIMD kernels and online ingest.

OSI-open

Vectors, documents and extraction