AI Tool Radar
OSI-open | Vectors, documents and extraction

Hyper-Extract

yifanfeng97

LLM CLI that turns unstructured text into typed knowledge (lists, tables, graphs, hypergraphs) via YAML templates.

2.5k stars (as of 2026-06-26)

What is Hyper-Extract?

An LLM-powered CLI and Python library that transforms unstructured text into structured 'knowledge abstracts' (lists, tables, graphs, hypergraphs and spatio-temporal graphs) with one command. It ships 80+ YAML extraction templates (finance, legal, medical, general) and 10+ extraction engines, including GraphRAG and LightRAG. It also offers incremental extraction, search and visualisation, Obsidian/Markdown export and an MCP server, and works with OpenAI, Anthropic and local models.
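To make the difference between a graph and a hypergraph concrete, here is a minimal Python sketch of a typed hypergraph, where one edge can link any number of entities. The class names, field names and sample entities are hypothetical and chosen for illustration only; this is not Hyper-Extract's API or output schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    """One relation linking any number of entities (hypothetical shape)."""
    relation: str
    members: frozenset[str]


@dataclass
class Hypergraph:
    """A set of entity names plus the hyperedges that connect them."""
    nodes: set[str] = field(default_factory=set)
    edges: list[Hyperedge] = field(default_factory=list)

    def add(self, relation: str, *members: str) -> Hyperedge:
        # Register every member as a node and store the relation as one edge.
        edge = Hyperedge(relation, frozenset(members))
        self.nodes.update(members)
        self.edges.append(edge)
        return edge

    def incident(self, node: str) -> list[Hyperedge]:
        # All hyperedges that the given entity takes part in.
        return [e for e in self.edges if node in e.members]


# Illustrative data only: a three-party relation needs a single hyperedge,
# whereas an ordinary graph would have to split it into pairwise edges.
hg = Hypergraph()
hg.add("acquisition", "Acme Corp", "Beta Ltd", "Regulator X")
hg.add("employment", "Alice", "Acme Corp")

for edge in hg.incident("Acme Corp"):
    print(edge.relation, sorted(edge.members))
```

An ordinary graph is the special case in which every edge has exactly two members, so the same structure also covers the tool's simpler graph output type in principle.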

Pros & Cons

Pros

  • An unusually broad range of structured outputs (graphs, hypergraphs, spatio-temporal graphs) from one tool, driven by zero-code YAML templates
  • Genuinely usable now: a PyPI install, multiple providers, an MCP server and Obsidian export
  • Truly OSI-open (Apache-2.0) despite GitHub's misleading NOASSERTION label

Cons

  • Pre-1.0 (v0.3.0), so interfaces and extraction quality may change
  • Requires paid or local LLM access; extraction quality and cost depend on the chosen model
  • GitHub's NOASSERTION badge may scare off adopters until the copyright line in the LICENSE appendix is normalised

License

Apache-2.0 (OSI-open)

The LICENSE file is verbatim Apache-2.0 (OSI-open, commercial use permitted); GitHub mislabels it as 'NOASSERTION' only because the appendix copyright line keeps the template brackets, which defeats GitHub's hash-based classifier.

When it is interesting

Turning document corpora into typed knowledge graphs or hypergraphs for RAG or analysis without building extraction pipelines yourself.

When it is too early

If you need a stable, frozen API or guaranteed extraction accuracy for production.

Commercial alternative & related

  • Commercial counterpart: Diffbot

This repo featured in the 2026-07 edition of the Open-Source AI Radar.