Hyper-Extract
yifanfeng97
LLM CLI that turns unstructured text into typed knowledge (lists, tables, graphs, hypergraphs) via YAML templates.
What is Hyper-Extract?
An LLM-powered CLI and Python library that transforms unstructured text into structured 'knowledge abstracts' (lists, tables, graphs, hypergraphs and spatio-temporal graphs) with one command. It ships 80+ YAML extraction templates (finance, legal, medical, general) and 10+ extraction engines, including GraphRAG and LightRAG. It also supports incremental extraction, search and visualisation, Obsidian/Markdown export and an MCP server, and works with OpenAI, Anthropic and local models.
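To make the difference between a graph and a hypergraph concrete, here is a minimal Python sketch of the two shapes. It is illustrative only and is not Hyper-Extract's API or output schema: the class names (`Edge`, `HyperEdge`, `KnowledgeHypergraph`), their fields and the sample entities are all made up for this example. The point it shows is that an ordinary edge links exactly two entities, while a hyperedge can tie any number of entities into a single fact.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """Ordinary graph edge: exactly two endpoints (hypothetical type)."""
    source: str
    relation: str
    target: str


@dataclass(frozen=True)
class HyperEdge:
    """Hyperedge: one relation joining any number of entities (hypothetical type)."""
    relation: str
    members: frozenset[str]


@dataclass
class KnowledgeHypergraph:
    """Tiny in-memory container, for illustration only."""
    nodes: set[str] = field(default_factory=set)
    hyperedges: list[HyperEdge] = field(default_factory=list)

    def add(self, relation: str, *members: str) -> HyperEdge:
        # Register the entities and store the n-ary relation as one unit.
        edge = HyperEdge(relation, frozenset(members))
        self.nodes.update(members)
        self.hyperedges.append(edge)
        return edge

    def incident(self, node: str) -> list[HyperEdge]:
        # All hyperedges that mention the given entity.
        return [e for e in self.hyperedges if node in e.members]


# Made-up sample data: a four-way fact that a pairwise edge cannot hold in one piece.
kg = KnowledgeHypergraph()
kg.add("acquisition", "AcmeCorp", "WidgetCo", "2021", "regulator approval")
kg.add("employment", "Alice", "AcmeCorp")

# The pairwise equivalent of the second fact, for comparison.
pairwise = Edge("Alice", "employment", "AcmeCorp")
```

Storing the acquisition as one hyperedge keeps the buyer, target, date and condition together, whereas a plain graph would have to split it into several pairwise edges or introduce an artificial event node.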
Pros & Cons
Pros
- An unusually broad range of structured outputs (graphs, hypergraphs, spatio-temporal graphs) from one tool, driven by zero-code YAML templates
- Genuinely usable now: a PyPI install, multiple providers, an MCP server and Obsidian export
- Truly OSI-open (Apache-2.0) despite GitHub's misleading NOASSERTION label
Cons
- Pre-1.0 (v0.3.0), so interfaces and extraction quality may change
- Requires paid or local LLM access; extraction quality and cost depend on the chosen model
- GitHub's NOASSERTION badge may scare off adopters until the template brackets in the LICENSE appendix copyright line are filled in
License
Apache-2.0 (OSI-open)
The LICENSE file is verbatim Apache-2.0 (OSI-open, commercial use permitted); GitHub mislabels it as 'NOASSERTION' only because the appendix copyright line keeps the template brackets, which defeats GitHub's hash-based classifier.
When it is interesting
Turning document corpora into typed knowledge graphs or hypergraphs for RAG or analysis without building extraction pipelines yourself.
When it is too early
If you need a stable, frozen API or guaranteed extraction accuracy for production.
Commercial alternative & related
- Commercial counterpart: Diffbot
This repo featured in the 2026-07 edition of the Open-Source AI Radar.
langextract
Python library from Google for LLM-powered structured extraction with source grounding.
LEANN
StarTrail-org
RAG on everything - graph-based vector index claiming 97% storage savings for private on-device search.
turbovec
RyanCodrai
Rust vector index with TurboQuant compression (ICLR 2026) - SIMD kernels, online ingest.