See the page. Not just the text.

Velix is an open-source research project on visual-first retrieval and structured extraction for legal and real-asset documents. The retrieval and extraction layers are built. A hosted demo and frontend integration are next.

Example extraction (MineralDeed schema): grantor Smith Family Trust · grantee ABC Minerals LLC · fraction 1/64 · section 14 · T2N R3W

Approach

Four ideas the whole project rests on.

Visual retrieval

Pages are indexed as images using ColQwen2 multi-vector embeddings. No OCR step on the retrieval path.
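
Multi-vector models in the ColQwen2 family are scored with ColBERT-style late interaction: every query vector is matched against its best page vector, and the matches are summed. A minimal pure-Python sketch of that scoring step follows. The toy 2-dimensional vectors and the function names are illustrative assumptions, not Velix's API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the highest dot product with any page vector."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return page ids sorted by descending MaxSim score."""
    return sorted(pages, key=lambda pid: maxsim_score(query_vecs, pages[pid]), reverse=True)


# Toy data: two query vectors, two pages with two patch vectors each.
query = [(1.0, 0.0), (0.0, 1.0)]
pages = {
    "page-14": [(0.9, 0.1), (0.1, 0.8)],
    "page-15": [(0.5, 0.5), (0.4, 0.4)],
}
print(rank_pages(query, pages))  # page-14 scores 1.7, page-15 scores 1.0
```

A real index would hold hundreds of patch vectors per page and run this on a GPU or through a vector store; the arithmetic is the same.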

Typed extraction

Output is constrained by Pydantic schemas for six oil and gas document types. Invalid output is rejected, not coerced.
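
"Rejected, not coerced" can be sketched with a strict Pydantic model. This assumes Pydantic v2. The field set is borrowed from the example card on this page and is hypothetical; the project's real `MineralDeed` schema may differ, and `parse_or_reject` is an illustrative helper.

```python
from fractions import Fraction
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class MineralDeed(BaseModel):
    # strict=True disables type coercion (the string "14" is not accepted as an int);
    # extra="forbid" rejects fields the schema does not declare.
    model_config = ConfigDict(strict=True, extra="forbid")

    grantor: str
    grantee: str
    fraction: str
    section: int

    @field_validator("fraction")
    @classmethod
    def fraction_must_parse(cls, value: str) -> str:
        try:
            parsed = Fraction(value)
        except (ValueError, ZeroDivisionError) as exc:
            raise ValueError(f"not a fraction: {value!r}") from exc
        if not 0 < parsed <= 1:
            raise ValueError("fraction must be in (0, 1]")
        return value


def parse_or_reject(raw: dict) -> Optional[MineralDeed]:
    """Return a validated deed, or None when the raw output does not fit the schema."""
    try:
        return MineralDeed.model_validate(raw)
    except ValidationError:
        return None


good = {
    "grantor": "Smith Family Trust",
    "grantee": "ABC Minerals LLC",
    "fraction": "1/64",
    "section": 14,
}
bad = {**good, "section": "14"}  # a string where an int is required

print(parse_or_reject(good))
print(parse_or_reject(bad))
```

In this sketch the second call returns `None` because strict mode refuses to turn `"14"` into `14`.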

Composable layers

Embedder, schema, and store are independent. Mock implementations let the whole pipeline run on CPU for tests.
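
One way to keep the layers independent is structural typing: each layer is a small interface, and a mock satisfies it with no GPU. The sketch below uses only the standard library. The names `Embedder`, `Store`, `MockEmbedder`, `InMemoryStore`, and `index_page` are assumptions made for illustration and may not match the repo.

```python
import hashlib
from typing import Dict, List, Protocol

Vectors = List[List[float]]


class Embedder(Protocol):
    def embed(self, page_bytes: bytes) -> Vectors: ...


class Store(Protocol):
    def put(self, page_id: str, vectors: Vectors) -> None: ...
    def get(self, page_id: str) -> Vectors: ...


class MockEmbedder:
    """Deterministic CPU-only stand-in: derives vectors from SHA-256 digests."""

    def __init__(self, n_vectors: int = 4, dim: int = 8) -> None:
        self.n_vectors = n_vectors
        self.dim = dim  # must be <= 32, the length of a SHA-256 digest

    def embed(self, page_bytes: bytes) -> Vectors:
        vectors = []
        for i in range(self.n_vectors):
            digest = hashlib.sha256(page_bytes + bytes([i])).digest()
            vectors.append([b / 255.0 for b in digest[: self.dim]])
        return vectors


class InMemoryStore:
    """Dictionary-backed store for tests."""

    def __init__(self) -> None:
        self._data: Dict[str, Vectors] = {}

    def put(self, page_id: str, vectors: Vectors) -> None:
        self._data[page_id] = vectors

    def get(self, page_id: str) -> Vectors:
        return self._data[page_id]


def index_page(page_id: str, page_bytes: bytes, embedder: Embedder, store: Store) -> None:
    """Depends only on the two interfaces, so real and mock layers are interchangeable."""
    store.put(page_id, embedder.embed(page_bytes))


store = InMemoryStore()
index_page("page-14", b"fake page image bytes", MockEmbedder(), store)
print(len(store.get("page-14")), "vectors stored")
```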

On-demand by design

Indexing runs once per page. Field extraction runs only when a page is queried. Compute follows usage, not corpus size.
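
The split between eager indexing and lazy extraction can be sketched in a few lines. `OnDemandPipeline` and the fake embed and extract callables are hypothetical, and caching extracted fields after the first query is an assumption of this sketch, not something the project states.

```python
from typing import Callable, Dict, List


class OnDemandPipeline:
    def __init__(
        self,
        embed: Callable[[bytes], List[List[float]]],
        extract: Callable[[bytes], dict],
    ) -> None:
        self._embed = embed
        self._extract = extract
        self._pages: Dict[str, bytes] = {}
        self._index: Dict[str, List[List[float]]] = {}
        self._fields: Dict[str, dict] = {}  # filled lazily, one page at a time

    def index(self, page_id: str, page_bytes: bytes) -> None:
        """Embed a page once; re-indexing the same id is a no-op."""
        if page_id in self._index:
            return
        self._pages[page_id] = page_bytes
        self._index[page_id] = self._embed(page_bytes)

    def fields(self, page_id: str) -> dict:
        """Run extraction only when a page is actually requested."""
        if page_id not in self._fields:
            self._fields[page_id] = self._extract(self._pages[page_id])
        return self._fields[page_id]


calls = {"embed": 0, "extract": 0}


def fake_embed(page_bytes: bytes) -> List[List[float]]:
    calls["embed"] += 1
    return [[0.0]]


def fake_extract(page_bytes: bytes) -> dict:
    calls["extract"] += 1
    return {"grantor": "Smith Family Trust"}


pipeline = OnDemandPipeline(fake_embed, fake_extract)
for pid in ("page-13", "page-14", "page-15"):
    pipeline.index(pid, b"fake page")
pipeline.index("page-14", b"fake page")  # already indexed, skipped
pipeline.fields("page-14")
pipeline.fields("page-14")  # served from the cache
print(calls)
```

Three pages are embedded, but only the one that was queried pays for extraction.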

Example workspace view (page 14, MineralDeed schema):

  • grantor: Smith Family Trust
  • grantee: ABC Minerals LLC
  • fraction: 1/64
  • section: 14
  • township: T2N
  • range: R3W
  • county: Reeves, TX

Validators:

  • PLSS parses: pass
  • Power-of-two denominator: pass
  • Chain consistency: pass

The Workspace

One screen for the page, the fields, and the checks.

The PDF on the left, the typed extraction on the right, validators showing their work. Click into any indexed document and try it.

  • Source PDF rendered inline, scrollable, zoomable.
  • Pydantic-typed fields mirror the document's schema; what you see is what feeds the LLM.
  • Domain validators (PLSS, mineral fractions, party chains) show pass or fail per field.
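
The three checks named above can be sketched with the standard library alone. Everything here is an assumption for illustration: the function names, the `T2N R3W` shorthand the regex accepts, and the reading of "chain consistency" as "each deed's grantor is the previous deed's grantee". The repo's validators may be stricter or defined differently.

```python
import re
from fractions import Fraction
from typing import List, Optional

# Assumed shorthand such as "T2N R3W": township number + N/S, range number + E/W.
_PLSS = re.compile(r"^T(\d+)([NS])\s+R(\d+)([EW])$")


def parse_plss(section: int, township_range: str) -> Optional[dict]:
    """Return the parsed parts, or None if the description does not parse."""
    match = _PLSS.match(township_range.strip())
    # A standard PLSS township is divided into 36 numbered sections.
    if match is None or not 1 <= section <= 36:
        return None
    t_num, t_dir, r_num, r_dir = match.groups()
    return {"section": section, "township": f"T{t_num}{t_dir}", "range": f"R{r_num}{r_dir}"}


def has_power_of_two_denominator(fraction: str) -> bool:
    """True for 1/2, 1/64, 3/16...; Fraction reduces first, so 2/6 is checked as 1/3."""
    try:
        denominator = Fraction(fraction).denominator
    except (ValueError, ZeroDivisionError):
        return False
    return (denominator & (denominator - 1)) == 0


def chain_is_consistent(deeds: List[dict]) -> bool:
    """Assumed meaning: every deed's grantor equals the preceding deed's grantee."""
    return all(prev["grantee"] == nxt["grantor"] for prev, nxt in zip(deeds, deeds[1:]))


chain = [
    {"grantor": "Smith Family Trust", "grantee": "ABC Minerals LLC"},
    {"grantor": "ABC Minerals LLC", "grantee": "XYZ Royalty LP"},  # made-up second deed
]
print(parse_plss(14, "T2N R3W"))
print(has_power_of_two_denominator("1/64"))
print(chain_is_consistent(chain))
```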

The repo is the demo.

Code, tests, schemas, and the build scripts that produced the corpus are all open. Read the README, clone, run the test suite. The hosted demo follows once the backend is live.