OSI-open

Vectors, documents and extraction

Hyper-Extract

yifanfeng97

LLM CLI that turns unstructured text into typed knowledge (lists, tables, graphs, hypergraphs) via YAML templates.

2.5k stars (as of 2026-06-26)

Overview

What is Hyper-Extract?

An LLM-powered CLI and Python library that turns unstructured text into structured 'knowledge abstracts' (lists, tables, graphs, hypergraphs and spatio-temporal graphs) with one command. It ships 80+ YAML extraction templates (finance, legal, medical, general) and 10+ extraction engines, including GraphRAG and LightRAG. It also offers incremental extraction, search and visualisation, Obsidian/Markdown export and an MCP server, and works with OpenAI, Anthropic and local models.
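For readers unfamiliar with the difference between a graph and a hypergraph, here is a minimal Python sketch. It is not Hyper-Extract's schema or API: the `Hyperedge` and `Hypergraph` classes, the relation names and the entities are all hypothetical, chosen only to show that a hyperedge can link more than two nodes in a single typed relation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    """A typed relation over any number of nodes (hypothetical structure)."""
    relation: str
    members: frozenset[str]


@dataclass
class Hypergraph:
    nodes: set[str] = field(default_factory=set)
    edges: list[Hyperedge] = field(default_factory=list)

    def add_edge(self, relation: str, *members: str) -> Hyperedge:
        # Register the members as nodes and store the relation that joins them.
        edge = Hyperedge(relation, frozenset(members))
        self.nodes.update(members)
        self.edges.append(edge)
        return edge

    def incident(self, node: str) -> list[Hyperedge]:
        # All relations in which the given node takes part.
        return [e for e in self.edges if node in e.members]


hg = Hypergraph()
# One hyperedge ties three entities together; an ordinary graph edge
# could only connect two of them at a time.
hg.add_edge("acquisition", "AcmeCorp", "BetaLabs", "2024-Q3")
hg.add_edge("employment", "Alice", "AcmeCorp")

for edge in hg.incident("AcmeCorp"):
    print(edge.relation, sorted(edge.members))
```

An ordinary graph is the special case in which every hyperedge has exactly two members.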

Analysis

Pros & Cons

Pros

An unusually broad structured-output range (graphs, hypergraphs, spatio-temporal) from one tool with zero-code YAML templates
Genuinely usable now: a PyPI install, multiple providers, an MCP server and Obsidian export
Truly OSI-open (Apache-2.0) despite GitHub's misleading NOASSERTION label

Cons

Pre-1.0 (v0.3.0), so interfaces and extraction quality may change
Requires paid or local LLM access; extraction quality and cost depend on the chosen model
GitHub's NOASSERTION badge may scare off adopters until the copyright line in the LICENSE appendix is normalised

License

Apache-2.0 (OSI-open)

The LICENSE file is verbatim Apache-2.0 (OSI-open, commercial use permitted). GitHub labels it 'NOASSERTION' only because the copyright line in the appendix keeps the template brackets, which defeats GitHub's hash-based classifier.

When it is interesting

Turning document corpora into typed knowledge graphs or hypergraphs for RAG or analysis without building extraction pipelines yourself.

When it is too early

If you need a stable, frozen API or guaranteed extraction accuracy for production.

Context

Commercial alternative & related

Commercial counterpart: Diffbot

This repo featured in the 2026-07 edition of the Open-Source AI Radar.

Similar repositories

langextract

google

36.8k stars

Python library from Google for LLM-powered structured extraction with source grounding.

OSI-open

Vectors, documents and extraction

LEANN

StarTrail-org

11.9k stars

RAG on everything - graph-based vector index claiming 97% storage savings for private on-device search.

OSI-open

Vectors, documents and extraction

turbovec

RyanCodrai

11.5k stars

Rust vector index with TurboQuant compression (ICLR 2026) - SIMD kernels, online ingest.

OSI-open

Vectors, documents and extraction