AI Tool Radar
Open weight, with conditions
Vectors, documents and extraction

chandra

datalab-to

High-accuracy document digitization (OCR/layout) with code and an open model.

11.1k stars (as of 2026-06-05)

What is chandra?

A document digitization model for demanding OCR and layout extraction, usable with a GPU locally or through Datalab's managed API.

Pros & Cons

Pros

  • Broad coverage: tables, forms, handwriting, 90+ languages
  • Usable both locally (HuggingFace) and as a hosted API
  • Backed by an established team (Marker/Surya)

Cons

  • The model is Modified OpenRAIL-M: free only for research, personal use, and startups under $2M - not unrestricted OSI-open
  • A GPU is effectively required for local use
  • Benchmark claims are self-reported

License

Code: Apache-2.0. Model: Modified OpenRAIL-M (open weight, with conditions).

The code is Apache-2.0; the model weights are under Modified OpenRAIL-M (open weight, with a revenue/use condition). Check the model license carefully before commercial use, especially commercial self-use above the $2M threshold.
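
The condition above can be written down as a small pre-check. This is a minimal sketch that encodes only this page's summary of the model license, not the license text itself. The names `Use`, `THRESHOLD_USD` and `summary_suggests_free_use` are hypothetical, and treating the $2M figure as a revenue number with a strict "under" comparison is an assumption. A `True` result means "the summary suggests free use", not a legal clearance.

```python
from enum import Enum


class Use(Enum):
    """Kinds of use named in this page's license summary (hypothetical enum)."""
    RESEARCH = "research"
    PERSONAL = "personal"
    COMMERCIAL = "commercial"


# Threshold as quoted on this page; the license text is the authority.
THRESHOLD_USD = 2_000_000


def summary_suggests_free_use(use: Use, revenue_usd: float = 0.0) -> bool:
    """Return True if this page's summary suggests the model is free to use.

    Assumption: the $2M condition is measured in revenue and means
    strictly under the threshold.
    """
    if use in (Use.RESEARCH, Use.PERSONAL):
        return True
    # Commercial use: only below the threshold, per the summary.
    return revenue_usd < THRESHOLD_USD


if __name__ == "__main__":
    for use, revenue in [(Use.RESEARCH, 0), (Use.COMMERCIAL, 500_000),
                         (Use.COMMERCIAL, 5_000_000)]:
        print(use.value, revenue, summary_suggests_free_use(use, revenue))
```

Any commercial case that returns `False`, or that sits close to the threshold, is the situation where reading the actual model license matters most.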

When it is interesting

Demanding document digitization with a GPU or via the API.

When it is too early

Commercial self-use above the $2M threshold (check the model license carefully).

Commercial alternative & related

  • Commercial counterpart: Datalab API

This repo featured in the 2026-06 edition of the Open-Source AI Radar.