OSI-open

Vectors, documents and extraction

turbovec

RyanCodrai

Rust vector index with TurboQuant compression (ICLR 2026) - SIMD kernels, online ingest.

11.5k stars (as of 2026-06-14)

Overview

What is turbovec?

turbovec implements Google Research's TurboQuant algorithm (ICLR 2026) in Rust, with Python bindings and hand-written SIMD kernels (NEON, AVX-512). By the project's own account, it compresses a 10M-document corpus from 31GB to 4GB and searches faster than FAISS on 4-bit configurations. It supports online ingest with no training phase and integrates with LangChain, LlamaIndex, Haystack and Agno.
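The 31GB and 4GB figures are consistent with simple bits-per-dimension arithmetic. The sketch below is a back-of-the-envelope check, not turbovec code: the 768-dimension embedding size is an assumption (the source does not state it), the helper name `index_size_gb` is hypothetical, and any per-vector metadata or index overhead is ignored.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_dim: float) -> float:
    """Raw vector payload in decimal gigabytes (no metadata, no index overhead)."""
    return num_vectors * dim * bits_per_dim / 8 / 1e9


NUM_VECTORS = 10_000_000  # corpus size quoted in the overview
DIM = 768                 # assumption: embedding dimension is not given in the source

float32_gb = index_size_gb(NUM_VECTORS, DIM, 32)  # uncompressed float32 embeddings
four_bit_gb = index_size_gb(NUM_VECTORS, DIM, 4)  # 4 bits per dimension

print(f"float32: {float32_gb:.2f} GB")
print(f"4-bit:   {four_bit_gb:.2f} GB")
print(f"ratio:   {float32_gb / four_bit_gb:.1f}x")
```

Under these assumptions the float32 payload comes to roughly 31GB and the 4-bit payload to roughly 4GB, an 8x reduction. A real index may store extra per-vector data, so treat this as an order-of-magnitude check only.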

Analysis

Pros & Cons

Pros

Grounded in a peer-reviewed ICLR 2026 paper
SIMD-optimized Rust core with ergonomic Python bindings
No training phase - online ingest suits dynamic collections

Cons

Single developer - no visible team or org backing
Beta maturity and a young repo - production reliability unproven at scale
Compression-vs-recall trade-offs not independently benchmarked

License

MIT (OSI-open)

When it is interesting

Fast semantic search over large corpora (10M+) with storage budgets too tight for full float32 embeddings.

When it is too early

Use cases needing maximum recall at any storage cost, or a commercially backed vector DB with an SLA.

Context

Commercial alternative & related

Commercial counterpart: Pinecone / Zilliz Cloud

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

langextract

google

Python library from Google for LLM-powered structured extraction with source grounding.

OSI-open

Vectors, documents and extraction

LEANN

StarTrail-org

RAG on everything - graph-based vector index claiming 97% storage savings for private on-device search.

OSI-open

Vectors, documents and extraction

chandra

datalab-to

High-accuracy document digitization (OCR/layout) with code and an open model.

Open weight, with conditions

Vectors, documents and extraction