OSI-open | Vectors, documents and extraction

knowhere

Ontos-AI

Self-hostable document-to-chunks layer for agentic RAG: parses PDFs and Office files into structured chunks with citations.

1.8k stars (as of 2026-06-26)

Overview

What is knowhere?

A self-hostable document-extraction layer for agentic RAG that parses unstructured documents (PDF, Word, PowerPoint, Excel, CSV, images, Markdown) into structured, hierarchy-preserving chunks with source citations; the project positions it as a memory layer for agents. It ships as an API plus a worker, deployed via Docker Compose, alongside a managed cloud option and official Python and Node SDKs.
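To make "hierarchy-preserving chunks with source citations" concrete, here is a minimal Python sketch of the idea for Markdown input only. It is not knowhere's API or its parsing method: the `Chunk` class, the `chunk_markdown` function and the `source:start-end [heading path]` citation format are hypothetical names chosen for illustration. Each chunk keeps the chain of headings above it plus the line range it came from, which is what lets an agent cite evidence.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk type: text plus where it came from."""
    text: str
    heading_path: list[str]  # ancestor headings, outermost first
    source: str              # document identifier used in citations
    start_line: int          # 1-based, inclusive
    end_line: int            # 1-based, inclusive

    def citation(self) -> str:
        path = " > ".join(self.heading_path) or "(root)"
        return f"{self.source}:{self.start_line}-{self.end_line} [{path}]"


def chunk_markdown(text: str, source: str) -> list[Chunk]:
    """Split Markdown at ATX headings (# .. ######), keeping each chunk's heading path.

    Simplification: fenced code blocks are not recognised, so a '# comment'
    line inside a fence would be treated as a heading.
    """
    lines = text.splitlines()
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (level, title)
    body: list[str] = []
    body_start = 1

    def flush(end_line: int) -> None:
        # Emit the accumulated body (if any) under the current heading path.
        content = "\n".join(body).strip()
        if content:
            titles = [title for _, title in path]
            chunks.append(Chunk(content, titles, source, body_start, end_line))

    for lineno, line in enumerate(lines, start=1):
        stripped = line.lstrip()
        level = len(stripped) - len(stripped.lstrip("#"))
        is_heading = 1 <= level <= 6 and stripped[level:level + 1] in (" ", "")
        if is_heading:
            flush(lineno - 1)
            body.clear()
            body_start = lineno + 1
            # Pop headings at the same or deeper level, then push this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, stripped[level:].strip()))
        else:
            body.append(line)
    flush(len(lines))
    return chunks


if __name__ == "__main__":
    doc = "# Guide\nIntro text.\n## Install\nRun docker compose up.\n## Usage\nCall the API.\n"
    for chunk in chunk_markdown(doc, "guide.md"):
        print(chunk.citation(), "->", chunk.text)
```

A stack of `(level, title)` pairs is used instead of indexing by heading level so that skipped levels (for example `#` followed directly by `###`) still yield a sensible path. A real extractor handling PDFs or Office files would derive the hierarchy from layout or document structure rather than `#` markers.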

Analysis

Pros & Cons

Pros

Apache-2.0 and genuinely self-hostable as a full stack
Strong multi-format parsing that preserves structure and returns traceable citations
Actively maintained, with recent releases and official Python and Node SDKs

Cons

Open-core: the homepage promotes a paid hosted API, so the best developer experience may favour the cloud offering
Heavy self-host dependencies (Postgres, Redis, S3, an LLM key, Docker), not plug-and-play
Accuracy and recall figures are unverified vendor benchmarks

License

Apache-2.0 (OSI-open)

When it is interesting

You need an open, self-hostable document-to-structured-chunks layer for agentic RAG with evidence citations.

When it is too early

If you want a single pip-installable library or a zero-infrastructure setup: the stack is service-heavy.

Context

Commercial alternative & related

Commercial counterpart: LlamaParse

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

langextract

google

36.8k stars

Python library from Google for LLM-powered structured extraction with source grounding.

OSI-open | Vectors, documents and extraction

LEANN

StarTrail-org

11.9k stars

RAG on everything: a graph-based vector index claiming 97% storage savings for private on-device search.

OSI-open | Vectors, documents and extraction

turbovec

RyanCodrai

11.5k stars

Rust vector index with TurboQuant compression (ICLR 2026), SIMD kernels and online ingestion.

OSI-open | Vectors, documents and extraction