AI Tool Radar
OSI-open

Vectors, documents and extraction

langextract

google

Python library from Google for LLM-powered structured extraction with source grounding.

36.8k stars (as of 2026-06-07)

What is langextract?

A Python library from Google that uses an LLM to pull structured information out of unstructured text, maps every extraction back to its exact location in the source ('source grounding'), and renders the result as an interactive HTML view. It ships no model of its own; you bring a provider: Gemini (default), OpenAI, or local models via Ollama (no API key needed).
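To make 'source grounding' concrete, here is a minimal standard-library sketch of the idea: given the source text and a list of candidate extractions (as an LLM might return them), record the character span where each one occurs, and flag any that cannot be located. This is not langextract's API or its alignment algorithm; the names `GroundedExtraction` and `ground` are hypothetical, and exact substring matching is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedExtraction:
    """Hypothetical record: one extraction plus its location in the source."""
    extraction_class: str
    extraction_text: str
    start: Optional[int]  # character offset in the source, None if not found
    end: Optional[int]    # exclusive end offset, None if not found


def ground(source: str, candidates: list[tuple[str, str]]) -> list[GroundedExtraction]:
    """Attach character offsets to (class, text) pairs by exact substring search."""
    grounded = []
    cursor = 0  # prefer matches after the previous one, so repeats resolve in order
    for extraction_class, text in candidates:
        pos = source.find(text, cursor)
        if pos == -1:
            pos = source.find(text)  # fall back to searching the whole source
        if pos == -1:
            # Not present verbatim: keep the item but mark it as ungrounded.
            grounded.append(GroundedExtraction(extraction_class, text, None, None))
        else:
            end = pos + len(text)
            grounded.append(GroundedExtraction(extraction_class, text, pos, end))
            cursor = end
    return grounded


if __name__ == "__main__":
    source = "Patient was given 250 mg of amoxicillin twice daily."
    candidates = [("dosage", "250 mg"), ("medication", "amoxicillin"), ("route", "intravenous")]
    for item in ground(source, candidates):
        print(item)
```

An extraction whose offsets are `None` was not found verbatim in the text, which is exactly the kind of item a reviewer would want to inspect first.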

Pros & Cons

Pros

  • Apache-2.0, permissive and OSI-open, no copyleft
  • Provider-agnostic: cloud (Gemini/OpenAI/Vertex) or fully local via Ollama with no API key
  • Source-grounding and an out-of-the-box HTML visualization are a genuine differentiator
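As a rough illustration of why grounded spans make visualization straightforward, the sketch below turns a list of character spans into highlighted HTML using only the standard library. It is not how langextract builds its interactive view; the function name `render_highlights` and the `(start, end, label)` span format are assumptions made for this example.

```python
import html


def render_highlights(source: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span of the source in a <mark> element."""
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already rendered
        # Escape everything so source text cannot inject markup.
        parts.append(html.escape(source[cursor:start]))
        parts.append(
            f'<mark title="{html.escape(label, quote=True)}">'
            f"{html.escape(source[start:end])}</mark>"
        )
        cursor = end
    parts.append(html.escape(source[cursor:]))
    return "".join(parts)


if __name__ == "__main__":
    print(render_highlights("Take 250 mg <daily>", [(5, 11, "dosage")]))
```

Because every extraction carries its offsets, the renderer never has to search the text again; it only slices and escapes.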

Cons

  • Cloud models require an external LLM API, which means ongoing token costs and text leaving your machine; the only fully local route is Ollama
  • The README states plainly that 'this is not an officially supported Google product', so there is no SLA
  • Accuracy claims come from the project itself, and results depend on the chosen model, prompt and examples

License

Apache-2.0 (OSI-open)

When it is interesting

Turning documents, reports or notes into structured data with traceable provenance.

When it is too early

If you need a supported product with guarantees, or cannot send text to a cloud model and do not want to run Ollama locally.

This repo featured in the 2026-06 edition of the Open-Source AI Radar.