#Setting up extract templates
An extract template is a pre-published ExtractionSchema your
callers reference by slug on POST /api/v1/extract. Author the
schema once, publish, and every caller sending
{ "template": "invoice_v1" } extracts against the same shape.
Edits create a new immutable version; publishing swaps the active
pointer atomically.
Templates are tenant-scoped: there are no platform-default templates, and each tenant owns its own slugs.
Inline schemas and templates both work; pick by use case. An inline
schema (extraction_schema on the request body) is right for one-off
shapes, dynamic per-request schemas, or testing. Templates are right
for repeated shapes (invoices, receipts, KYC forms) where the schema
is stable and you want central control and version pinning.
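As a rough illustration of the two request styles, the sketch below builds a body for each. The helper name build_extract_body is hypothetical, the rule that a request carries exactly one of the two keys is an assumption (this guide does not say what happens if both are sent), and the inline schema content is a placeholder rather than the real ExtractionSchema format.

```python
from typing import Any, Optional


def build_extract_body(
    template: Optional[str] = None,
    extraction_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a request body that uses a template slug or an inline schema."""
    # Assumption: exactly one of the two is supplied per request.
    if (template is None) == (extraction_schema is None):
        raise ValueError("pass exactly one of template or extraction_schema")
    if template is not None:
        return {"template": template}
    return {"extraction_schema": extraction_schema}


# Repeated shape: reference the published template by slug.
templated = build_extract_body(template="invoice_v1")

# One-off shape: placeholder content, not the real schema format.
inline = build_extract_body(
    extraction_schema={"placeholder": "see the Data Extractor reference"}
)
```

The document payload itself is left out here because this guide does not describe those fields.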
#Prerequisites
Sign in to admin.velgent.com as:
- org_admin — manages your own tenant's templates.
- root — Velgent platform staff; pick the target tenant from the sidebar org picker first.
#Create a template
The composer is playground-first: pick an output format, drop in a sample, generate (or paste) a schema, run the extraction, and save once the result looks right. The four panels stack top-to-bottom.
#Step 1 — Pick the output format
| Mode | Use when |
|---|---|
| JSON (default) | Downstream automation. The caller gets structured fields plus per-field confidence and anchors. |
| HTML fields | The caller wants the same JSON and a ready-to-embed HTML render (<dl> for scalars, <table> for line items). |
| HTML document | Re-flowing a scanned PDF or image into semantic HTML — headings, paragraphs, tables, lists. No schema needed; image / PDF input only. |
HTML document mode is fundamentally different: no schema is required and Steps 2 and 4 disappear from the composer. The output is a single sanitised HTML string the caller renders directly.
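The differences between the three modes can be captured as data, which may help if you script against the composer's behaviour. This is only a sketch: the key "json" matches the output_format value used in this guide's request example, while "html_fields" and "html_document" are hypothetical placeholders for the other two modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputMode:
    label: str
    needs_schema: bool   # False means Steps 2 and 4 are hidden
    accepts_text: bool   # False means image / PDF input only


# "json" is taken from this guide; the other two keys are assumed names.
OUTPUT_MODES = {
    "json": OutputMode("JSON", needs_schema=True, accepts_text=True),
    "html_fields": OutputMode("HTML fields", needs_schema=True, accepts_text=True),
    "html_document": OutputMode("HTML document", needs_schema=False, accepts_text=False),
}


def composer_steps(mode_key: str) -> list[int]:
    """Return the composer step numbers shown for a given mode."""
    if OUTPUT_MODES[mode_key].needs_schema:
        return [1, 2, 3, 4]
    return [1, 3]
```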
#Step 2 — Generate a schema from a sample (schema modes only)
If you have a sample document, Velgent can propose the schema for you. The proposed schema is server-side validated before it reaches the form — you'll never see a half-baked JSON blob.
- Paste a sample document into the Sample text box, or upload a sample PDF / image via Sample file.
- (Optional) Add a purpose / hint describing what fields you want: "extract invoice number, total, currency, line items".
- Click Generate schema →. Velgent calls your tenant's extract-category LLM (same provider, BYOK, residency as production) and proposes an ExtractionSchema.
- The JSON editor opens with the proposed schema populated and the suggested display name pre-filled. An AI note under the form calls out any ambiguity the model encountered.
- Review and edit the JSON. AI-generated schemas may miss fields, get a type wrong, or omit constraints — treat them as a strong starter, not the final word. Toggle Show JSON editor if you'd rather hand-author the schema directly.
#Step 3 — Run the extraction (playground)
Run the schema against the same orchestrator that serves
/api/v1/extract. The call hits your tenant's quota — what you see
here is what a customer call would cost.
- Paste a sample text or upload a file in the Run extraction panel.
- Click Run extraction.
- The result panel renders:
  - Model used, latency, token count, and PII redaction count.
  - HTML document mode → an iframe preview of the reconstructed HTML.
  - HTML fields mode → an iframe preview of the deterministic field render, plus the underlying JSON in a collapsible section.
  - JSON mode → the extracted JSON, expanded by default.
  - Any warnings / schema drift surfaced inline.
Re-run as many times as you like — each run is independently billed. Tweak the schema in the editor and re-run to compare.
#Step 4 — Save as template (schema modes only)
Once the run looks right, save the schema so callers can reference it by slug.
- Fill in the slug (lowercase letters, digits, hyphens, or underscores).
- Confirm the display name — pre-filled from the AI suggestion.
- (Optional) Add a prompt addendum — operator instructions appended to the system prompt at extract time. Use it for document-specific nudges: "Treat dollar amounts as USD", "Vendor name is usually top-left".
- Tick Publish immediately (default) to make the slug live, or leave it unticked to save as a draft.
- Click Save as template.
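If you generate slugs in a script before entering them, a local check along these lines can catch bad input early. It is a minimal sketch of the character rule stated in the list above; the function name is hypothetical, and any length limits or reserved names the server may enforce are not covered because this guide does not state them.

```python
import re

# Lowercase letters, digits, hyphens, and underscores only; at least one character.
SLUG_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug uses only the allowed characters."""
    return SLUG_PATTERN.fullmatch(slug) is not None
```

fullmatch is used rather than match so that a trailing newline or stray suffix is rejected instead of silently accepted.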
The template is now live (or saved as a draft). Callers reach it on the API as soon as a version is published:
{ "template": "invoice_v1", "output_format": "json", ... }
See the Data Extractor reference for the full field-type catalogue.
#Edit a template (new version)
Templates are immutable per version. Every edit creates a new version row; publishing it atomically swaps the active pointer. Older versions are kept indefinitely so historical extract calls can be replayed against the exact schema they ran against.
- From the Extract templates list, click the template's slug.
- The detail page shows the current published version (read-only JSON) and a New version composer underneath.
- Edit the JSON in the composer.
- (Optional) Update the prompt addendum or leave it inherited from the previous version.
- Add a change summary — surfaced in the version-history pane so anyone reading the audit trail later sees what changed and why.
- Decide whether this version publishes immediately:
  - Save and publish: atomically swaps the active pointer to this version. New API calls hit the new schema instantly.
  - Save as draft: the version is created but not active. Useful for staging a schema change with reviewer sign-off.
- Click the button for the option you chose. The version appears at the top of the history pane.
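The versioning rules in this section (immutable versions, an active pointer that publishing swaps, older versions retained) can be modelled in a few lines. This is a toy in-memory sketch for building intuition, not Velgent's implementation; the class names are hypothetical and sequential version numbers starting at 1 are an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)  # frozen: a saved version cannot be modified
class TemplateVersion:
    number: int
    schema_json: str
    change_summary: str


@dataclass
class Template:
    slug: str
    versions: list[TemplateVersion] = field(default_factory=list)
    active: Optional[int] = None  # version number that new calls resolve to

    def add_version(self, schema_json: str, change_summary: str, publish: bool) -> int:
        """Append a new version; optionally make it the active one."""
        number = len(self.versions) + 1  # assumed numbering scheme
        self.versions.append(TemplateVersion(number, schema_json, change_summary))
        if publish:
            self.publish(number)
        return number

    def publish(self, number: int) -> None:
        """Point the template at an existing version (also used to roll back)."""
        if not 1 <= number <= len(self.versions):
            raise KeyError(number)
        self.active = number

    def active_version(self) -> TemplateVersion:
        """Return the published version, or fail if only drafts exist."""
        if self.active is None:
            raise LookupError(f"{self.slug} has no published version")
        return self.versions[self.active - 1]
```

Saving as a draft corresponds to add_version(..., publish=False): the version exists in the history, but active_version() keeps returning the previous one until publish() is called.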
#Pin or roll back to a previous version
Need to roll back? Or pin an experiment?
- Open the template's detail page.
- In the Version history pane, find the version you want.
- Click Publish on that row. The pointer swaps atomically; new API calls hit that version.
To pin a specific historical version from the API (replay / canary
without changing the active pointer), pass template_version:
{ "template": "invoice_v1", "template_version": 3, ... }
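For callers scripting this, the sketch below builds (but does not send) a request with an optional pinned version, using only the Python standard library. The base URL, the API key placeholder, and the bearer-token header are assumptions, since this guide names only the path /api/v1/extract; the document payload fields elided as "..." above are omitted for the same reason.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical values: replace with your real API host and credentials.
BASE_URL = "https://api.example.invalid"
API_KEY = "YOUR_API_KEY"


def build_extract_request(
    template: str, template_version: Optional[int] = None
) -> urllib.request.Request:
    """Build a POST /api/v1/extract request, pinning a version if one is given."""
    body: dict[str, object] = {"template": template}
    if template_version is not None:
        # Pinning leaves the template's active pointer untouched.
        body["template_version"] = template_version
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


# A canary call against version 3 while the active pointer stays where it is.
pinned_request = build_extract_request("invoice_v1", template_version=3)
```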
#Deactivate (archive) a template
Soft-delete. We never hard-delete — audit references survive.
- Open the template's detail page.
- Scroll to the Archive template panel at the bottom.
- Click Archive template.
The slug stops resolving on POST /api/v1/extract (callers receive
a 404). The template and all its versions stay in the database so
old audit-log rows referencing them remain resolvable.
To bring an archived template back, toggle Show archived templates on the list page; the row is still there with its full history.
#What gets audited
Every privileged action emits an admin.audit event with the
actor's WorkOS subject, role, and tenant:
| Action | When |
|---|---|
| extract_template.create | New template created (any tab) |
| extract_template.create_version | New version appended |
| extract_template.publish | Published a version (active swap) |
| extract_template.schema_inferred | "From sample" generation run |
| extract_template.deactivate | Template archived |
The schema-inference event captures the input kind (text /
image / pdf_text_layer / pdf_rasterised), pages processed,
and which model proposed the schema — so AI-assisted authoring
shows up in your compliance audit alongside the human edits.
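If you export audit events for review, a filter like the one below could separate AI-assisted authoring from the other template actions. The action names come from the table above; the event shape (a dict with an "action" key, plus the "actor" and "input_kind" fields in the sample data) is an assumption, as this guide does not document the event payload.

```python
from typing import Any, Iterable

# Action names as listed in the audit table.
TEMPLATE_ACTIONS = frozenset({
    "extract_template.create",
    "extract_template.create_version",
    "extract_template.publish",
    "extract_template.schema_inferred",
    "extract_template.deactivate",
})


def ai_assisted_events(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the schema-inference events from an audit export."""
    return [e for e in events if e.get("action") == "extract_template.schema_inferred"]


# Hypothetical sample data; real events will carry different fields.
sample_events = [
    {"action": "extract_template.create", "actor": "user_a"},
    {"action": "extract_template.schema_inferred", "input_kind": "pdf_text_layer"},
    {"action": "extract_template.publish", "actor": "user_a"},
]
ai_events = ai_assisted_events(sample_events)
```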
#Routing & costs
The schema-inference path uses your tenant's extract LLM
routing row (under Engine settings → Model routing). Same
provider, same BYOK key, same residency as the production
/api/v1/extract flow — schemas are proposed by the model you
already trust to do the extraction.
One inference = one LLM call. Sample documents over the 10-page PDF cap surface a 413 the same way the production extract does; use a smaller representative sample.
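A client-side check can avoid spending a request on a sample that will be rejected. This sketch encodes only the 10-page cap stated above; the function name is hypothetical, and the page count is passed in because counting PDF pages needs a PDF library of your choice.

```python
MAX_SAMPLE_PDF_PAGES = 10  # cap stated in this guide


def within_page_cap(page_count: int) -> bool:
    """Return True if a PDF sample is small enough to avoid a 413."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return page_count <= MAX_SAMPLE_PDF_PAGES
```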