AI Tool Radar
OSI-open
Vectors, documents and extraction

knowhere

Ontos-AI

Self-hostable document-to-chunks layer for agentic RAG: parses PDFs and Office files into structured chunks with citations.

1.8k stars (as of 2026-06-26)

What is knowhere?

A self-hostable document-extraction layer for agentic RAG that parses unstructured documents (PDF, Word, PowerPoint, Excel, CSV, images, Markdown) into structured, hierarchy-preserving chunks with source citations, positioned as a memory layer for agents. It ships as an API service plus a worker, deployed via Docker Compose, and also offers a managed cloud option and official Python and Node SDKs.
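This listing does not document knowhere's chunk schema or SDK calls. The following is therefore only a hypothetical, minimal Python sketch of what "hierarchy-preserving chunks with source citations" can mean, restricted to Markdown headings. The `Chunk` class, the `chunk_markdown` function and the `source:Lstart-Lend` citation format are illustrative assumptions, not knowhere's API.

```python
import re
from dataclasses import dataclass

# ATX heading: 1-6 '#' characters followed by whitespace and a title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


@dataclass
class Chunk:
    """Hypothetical chunk record: text plus where it sits and where it came from."""
    text: str
    heading_path: list[str]  # enclosing headings, outermost first
    source: str              # document name used in the citation
    start_line: int          # 1-based, inclusive
    end_line: int            # 1-based, inclusive

    def citation(self) -> str:
        return f"{self.source}:L{self.start_line}-L{self.end_line}"


def chunk_markdown(text: str, source: str) -> list[Chunk]:
    """Emit one chunk per section body, tagged with its heading path and line range."""
    chunks: list[Chunk] = []
    path: list[str] = []              # current heading hierarchy
    body: list[tuple[int, str]] = []  # (line number, line) for the open section

    def flush() -> None:
        # Trim leading/trailing blank lines so the citation covers real content.
        idx = [i for i, (_, ln) in enumerate(body) if ln.strip()]
        if idx:
            span = body[idx[0]: idx[-1] + 1]
            chunks.append(Chunk(
                text="\n".join(ln for _, ln in span),
                heading_path=list(path),
                source=source,
                start_line=span[0][0],
                end_line=span[-1][0],
            ))
        body.clear()

    for lineno, line in enumerate(text.splitlines(), start=1):
        m = HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(m.group(2))
        else:
            body.append((lineno, line))
    flush()
    return chunks
```

For example, running `chunk_markdown` on a small guide with `# Guide`, `## Install` and `## Usage` sections yields three chunks, each carrying its heading path (such as `["Guide", "Install"]`) and a line-range citation an agent can quote back. A real extraction layer would also need to handle tables, page coordinates in PDFs and non-Markdown formats, which this sketch deliberately ignores.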

Pros & Cons

Pros

  • Apache-2.0 and genuinely self-hostable as a full stack
  • Strong multi-format parsing that preserves structure and returns traceable citations
  • Active recent releases plus official Python and Node SDKs

Cons

  • Open-core: the homepage promotes a paid hosted API, so the smoothest developer experience may be in the cloud offering
  • Heavy self-host dependencies (Postgres, Redis, S3, an LLM key, Docker), not plug-and-play
  • Published accuracy and recall figures are vendor benchmarks that have not been independently verified

License

Apache-2.0 (OSI-open)

When it is interesting

You need an open, self-hostable document-to-structured-chunks layer for agentic RAG with evidence citations.

When it is too early

You want a single pip-installable library or a zero-infrastructure setup; the self-hosted stack is service-heavy.

Commercial alternative & related

  • Commercial counterpart: LlamaParse

This repo featured in the 2026-07 edition of the Open-Source AI Radar.