OpenKB
VectifyAI
CLI that compiles documents into a cross-linked, Obsidian-friendly wiki instead of re-querying like classic RAG.
What is OpenKB?
An open-source CLI that uses LLMs to turn raw documents (PDF, Word, Markdown, PowerPoint, HTML, Excel, CSV, URLs) into a structured, interlinked, wiki-style knowledge base. Rather than re-deriving knowledge on every query as classic RAG does, it compiles documents once into persistent wiki pages that follow Google's Open Knowledge Format (OKF). Features include automatic cross-links, Obsidian-compatible Markdown, querying, chat with citations, and graph visualisation.
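To make "auto cross-links in Obsidian-compatible Markdown" concrete, here is a minimal sketch of what cross-linking a set of compiled pages can look like. It is not OpenKB's implementation: OpenKB relies on LLMs, whereas this uses naive title matching, and the function name `add_wikilinks` and the sample pages are hypothetical. The only thing borrowed from Obsidian is its `[[Page]]` and `[[Page|alias]]` link syntax.

```python
import re


def add_wikilinks(pages: dict[str, str]) -> dict[str, str]:
    """Link the first mention of every other page title in each page body.

    `pages` maps a page title to its Markdown body. Hypothetical helper,
    for illustration only; it does not reflect how OpenKB builds links.
    """
    linked = {}
    for title, body in pages.items():
        # Longest titles first so "Tree Index" wins over a shorter "Index".
        others = sorted((t for t in pages if t != title), key=len, reverse=True)
        if not others:
            linked[title] = body
            continue

        # One alternation, one pass: text inside a new link is never re-scanned.
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(t) for t in others) + r")\b",
            re.IGNORECASE,
        )
        canonical = {t.lower(): t for t in others}
        seen = set()

        def link(match):
            text = match.group(0)
            target = canonical[text.lower()]
            if target in seen:
                return text  # only the first mention becomes a link
            seen.add(target)
            # Use an alias when the mention's casing differs from the title.
            return f"[[{target}]]" if text == target else f"[[{target}|{text}]]"

        linked[title] = pattern.sub(link, body)
    return linked


pages = {
    "PageIndex": "PageIndex builds a tree index over a document.",
    "Tree Index": "A tree index is used by PageIndex instead of vectors.",
}
for title, body in add_wikilinks(pages).items():
    print(f"# {title}\n{body}\n")
```

Because the links are written into the page bodies once, at compile time, any Markdown tool that understands wikilinks can follow them later without re-running the linking step.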
Pros & Cons
Pros
- Apache-2.0 Python tool from the credible PageIndex/Vectify team, pip-installable and local
- Novel approach: builds a persistent, cross-linked, OKF-compliant wiki instead of re-querying, and stays vectorless by using PageIndex tree indexing
- Broad input formats plus multimodal handling and extras like graph visualisation and deck generation
Cons
- Pre-1.0 (v0.4.2-rc1), so formats and APIs may still shift
- Advanced features (OCR, faster indexing) need a PAGEINDEX_API_KEY for the vendor's commercial cloud
- Quality and cost depend on the external LLM you bring, with no formal retrieval benchmarks
License
Apache-2.0 (OSI-open)
When it is interesting
Turning a pile of documents into a navigable, cross-linked, Obsidian-friendly knowledge base locally.
When it is too early
If you need a stable 1.0, want to avoid the PageIndex dependency, or require proven retrieval benchmarks.
Commercial alternative & related
- Commercial counterpart: PageIndex
This repo was featured in the 2026-07 edition of the Open-Source AI Radar.
langextract
Python library from Google for LLM-powered structured extraction with source grounding.
LEANN
StarTrail-org
RAG on everything - graph-based vector index claiming 97% storage savings for private on-device search.
turbovec
RyanCodrai
Rust vector index with TurboQuant compression (ICLR 2026) - SIMD kernels, online ingest.