See the page. Not just the text.

Velix is an open-source research project on visual-first retrieval and structured extraction for legal and real-asset documents. The retrieval and extraction layers are built. A hosted demo and frontend integration are next.

Example: MineralDeed schema

grantor: Smith Family Trust
grantee: ABC Minerals LLC
fraction: 1/64
section: 14 · T2N R3W

Approach

Four ideas the whole project rests on.

Visual retrieval

Pages are indexed as images using ColQwen2 multi-vector embeddings. No OCR step on the retrieval path.
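Multi-vector embeddings are usually compared with late-interaction (MaxSim) scoring: each query vector is matched to its best page vector and the matches are summed. The sketch below shows that scoring in plain Python. The function names, the dict-based page index, and the toy two-dimensional vectors are assumptions for illustration, not Velix's actual API, and a real index would hold ColQwen2 outputs rather than hand-written lists.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take the best
    dot product against any page vector, then sum over query vectors."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],  # hypothetical in-memory index
) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs, best match first."""
    scored = [(pid, maxsim_score(query_vecs, vecs)) for pid, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: two query vectors, two pages with different vector counts.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page-a": [[1.0, 0.0], [0.0, 1.0]],
    "page-b": [[0.5, 0.5]],
}
ranking = rank_pages(query, pages)
```

Because each page keeps many vectors, a query term can match one small region of the page image without being averaged away by the rest.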

Typed extraction

Output is constrained by Pydantic schemas for six oil and gas document types. Invalid output is rejected, not coerced.
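A minimal sketch of "rejected, not coerced", using a stdlib dataclass as a dependency-free stand-in for a Pydantic model. The class name `MineralDeedSketch`, its three fields, and the `ExtractionError` type are assumptions for illustration; the project's real schemas are Pydantic models and may carry more fields and different rules.

```python
from dataclasses import dataclass
from fractions import Fraction


class ExtractionError(ValueError):
    """Raised when model output does not match the schema."""


@dataclass(frozen=True)
class MineralDeedSketch:
    # Hypothetical subset of a mineral deed schema.
    grantor: str
    grantee: str
    fraction: Fraction

    @classmethod
    def from_raw(cls, raw: dict) -> "MineralDeedSketch":
        expected = {"grantor", "grantee", "fraction"}
        if set(raw) != expected:
            diff = sorted(set(raw) ^ expected)
            raise ExtractionError(f"unexpected or missing fields: {diff}")
        # Every field must already be a non-empty string; a float such as
        # 0.015625 is rejected rather than silently converted.
        for key in sorted(expected):
            if not isinstance(raw[key], str) or not raw[key].strip():
                raise ExtractionError(f"{key} must be a non-empty string")
        try:
            frac = Fraction(raw["fraction"])
        except (ValueError, ZeroDivisionError) as exc:
            raise ExtractionError(f"unparseable fraction: {raw['fraction']!r}") from exc
        if not 0 < frac <= 1:
            raise ExtractionError("fraction must be in (0, 1]")
        return cls(raw["grantor"], raw["grantee"], frac)


deed = MineralDeedSketch.from_raw(
    {"grantor": "Smith Family Trust", "grantee": "ABC Minerals LLC", "fraction": "1/64"}
)
```

In Pydantic, comparable behaviour is typically obtained with strict types and validators, so that a wrong type raises a validation error instead of being converted.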

Composable layers

Embedder, schema, and store are independent. Mock implementations let the whole pipeline run on CPU for tests.
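One way to keep the layers independent is to code against small interfaces and swap in mocks for tests. The sketch below assumes hypothetical `Embedder` and `Store` protocols, a hash-based `MockEmbedder`, and an `index_page` helper; none of these names come from the Velix codebase.

```python
import hashlib
from typing import Dict, List, Protocol

Vectors = List[List[float]]


class Embedder(Protocol):
    def embed_page(self, page_bytes: bytes) -> Vectors: ...


class Store(Protocol):
    def put(self, page_id: str, vectors: Vectors) -> None: ...
    def get(self, page_id: str) -> Vectors: ...


class MockEmbedder:
    """CPU-only stand-in: derives deterministic vectors from SHA-256 digests.
    Requires dim <= 32 (digest length) and n_vectors <= 256."""

    def __init__(self, n_vectors: int = 4, dim: int = 8) -> None:
        self.n_vectors = n_vectors
        self.dim = dim

    def embed_page(self, page_bytes: bytes) -> Vectors:
        vectors = []
        for i in range(self.n_vectors):
            digest = hashlib.sha256(page_bytes + bytes([i])).digest()
            vectors.append([b / 255.0 for b in digest[: self.dim]])
        return vectors


class InMemoryStore:
    """Dictionary-backed store for tests."""

    def __init__(self) -> None:
        self._data: Dict[str, Vectors] = {}

    def put(self, page_id: str, vectors: Vectors) -> None:
        self._data[page_id] = vectors

    def get(self, page_id: str) -> Vectors:
        return self._data[page_id]


def index_page(page_id: str, page_bytes: bytes, embedder: Embedder, store: Store) -> None:
    """Embed one page and persist its vectors; knows nothing about the concrete classes."""
    store.put(page_id, embedder.embed_page(page_bytes))


store = InMemoryStore()
index_page("p1", b"fake page image bytes", MockEmbedder(), store)
```

A GPU-backed embedder or a persistent vector store could replace the mocks without touching `index_page`.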

On-demand by design

Indexing runs once per page. Field extraction runs only when a page is queried. Compute follows usage, not corpus size.
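The on-demand idea can be sketched as a thin wrapper that defers extraction until a page is first requested. The `OnDemandExtractor` class and the `fake_extract` function are hypothetical, and caching the result after the first query is an assumption; the source only states that extraction runs when a page is queried.

```python
from typing import Callable, Dict


class OnDemandExtractor:
    """Runs the (expensive) extraction function only for pages that are queried."""

    def __init__(self, extract_fn: Callable[[str], dict]) -> None:
        self._extract_fn = extract_fn
        self._cache: Dict[str, dict] = {}
        self.calls = 0  # number of real extraction runs, for inspection

    def fields_for(self, page_id: str) -> dict:
        if page_id not in self._cache:
            self.calls += 1
            self._cache[page_id] = self._extract_fn(page_id)
        return self._cache[page_id]


def fake_extract(page_id: str) -> dict:
    # Stand-in for a model call; returns a fixed payload per page.
    return {"page_id": page_id, "grantor": "Smith Family Trust"}


extractor = OnDemandExtractor(fake_extract)
first = extractor.fields_for("doc-9113/page-14")
second = extractor.fields_for("doc-9113/page-14")  # served from cache
```

Pages that are never queried never reach `extract_fn`, which is why cost tracks usage rather than the number of indexed pages.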

Example: velix/document/9113, page 14 (MineralDeed)

grantor: Smith Family Trust
grantee: ABC Minerals LLC
fraction: 1/64
section: 14
township: T2N
range: R3W
county: Reeves, TX

Validators

PLSS parses: pass
Power-of-two denom.: pass
Chain consistency: pass

The Workspace

One screen for the page, the fields, and the checks.

The PDF on the left, the typed extraction on the right, validators showing their work. Click into any indexed document and try it.

Source PDF rendered inline, scrollable, zoomable.
Pydantic-typed fields mirror the document's schema; what you see is what feeds the LLM.
Domain validators (PLSS, mineral fractions, party chains) show pass or fail per field.
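The three validator families named above could look roughly like this. The function names, the accepted `T2N R3W` string format, and the rule that each deed's grantor must equal the previous deed's grantee are assumptions for illustration; the project's real validators may be stricter or shaped differently. The 1 to 36 section range is standard for PLSS townships.

```python
import re
from typing import List, Tuple

# Township/range in the compact form used on this page, e.g. "T2N R3W".
_PLSS = re.compile(r"^T(\d+)([NS])\s+R(\d+)([EW])$")


def parse_plss(section: int, township_range: str) -> dict:
    """Parse a section number plus a 'T<n><N|S> R<n><E|W>' string, or raise ValueError."""
    match = _PLSS.match(township_range.strip())
    if match is None:
        raise ValueError(f"not a township/range string: {township_range!r}")
    if not 1 <= section <= 36:
        raise ValueError(f"section out of range 1-36: {section}")
    t_num, t_dir, r_num, r_dir = match.groups()
    return {
        "section": section,
        "township": int(t_num),
        "township_dir": t_dir,
        "range": int(r_num),
        "range_dir": r_dir,
    }


def has_power_of_two_denominator(fraction_text: str) -> bool:
    """True if the denominator as written (not reduced) is a power of two."""
    numerator, sep, denominator = fraction_text.partition("/")
    if not sep or not numerator.strip().isdigit() or not denominator.strip().isdigit():
        return False
    d = int(denominator)
    return d > 0 and (d & (d - 1)) == 0


def chain_is_consistent(deeds: List[Tuple[str, str]]) -> bool:
    """Each (grantor, grantee) pair must start from the previous pair's grantee."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(deeds, deeds[1:]))


plss = parse_plss(14, "T2N R3W")
```

Returning a plain pass or fail per check keeps the results easy to show next to each field.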

The repo is the demo.

Code, tests, schemas, and the build scripts that produced the corpus are all open. Read the README, clone, run the test suite. The hosted demo follows once the backend is live.
