#Setting up extract templates

An extract template is a pre-published ExtractionSchema your callers reference by slug on POST /api/v1/extract. Author the schema once, publish, and every caller sending { "template": "invoice_v1" } extracts against the same shape. Edits create a new immutable version; publishing swaps the active pointer atomically.

Templates are tenant-scoped: there are no platform-default templates, and each tenant owns its own slugs.

Schema or template?

Both work; pick by use case. An inline schema (extraction_schema on the request body) suits one-off shapes, dynamic per-request schemas, and testing. Templates suit repeated shapes (invoices, receipts, KYC forms) where the schema is stable and you want central control and version pinning.
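A minimal Python sketch of that choice, assuming the caller assembles the request body itself. Only the `template`, `extraction_schema`, and `output_format` field names come from this guide. The helper name is hypothetical, and refusing a body that sets both fields (or neither) is an assumption, because this page does not say how the API treats that case. The fields that carry the document itself are omitted because this page does not describe them.

```python
from typing import Any, Optional


def build_extract_body(
    *,
    template: Optional[str] = None,
    extraction_schema: Optional[dict[str, Any]] = None,
    output_format: str = "json",
) -> dict[str, Any]:
    """Build a POST /api/v1/extract body from either a template slug or an inline schema."""
    # Assumption: exactly one of the two is sent per request.
    if (template is None) == (extraction_schema is None):
        raise ValueError("pass exactly one of template= or extraction_schema=")
    body: dict[str, Any] = {"output_format": output_format}
    if template is not None:
        body["template"] = template  # stable, repeated shape: reference it by slug
    else:
        body["extraction_schema"] = extraction_schema  # one-off or per-request shape
    return body
```

For example, `build_extract_body(template="invoice_v1")` produces `{"output_format": "json", "template": "invoice_v1"}`. For the inline case, pass a real ExtractionSchema; its field types are listed in the Data Extractor reference.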

#Prerequisites

Sign in to admin.velgent.com as:

  • org_admin — manages your own tenant's templates.
  • root — Velgent platform staff; pick the target tenant from the sidebar org picker first.

#Create a template

The composer is playground-first: pick an output format, drop in a sample, generate (or paste) a schema, run the extraction, and save once the result looks right. The four panels stack top-to-bottom.

#Step 1 — Pick the output format

| Mode | Use when |
|---|---|
| JSON (default) | Downstream automation. The caller gets structured fields plus per-field confidence and anchors. |
| HTML fields | The caller wants the same JSON and a ready-to-embed HTML render (`<dl>` for scalars, `<table>` for line items). |
| HTML document | Re-flowing a scanned PDF or image into semantic HTML: headings, paragraphs, tables, lists. No schema needed; image / PDF input only. |

HTML document mode is fundamentally different: no schema is required and Steps 2 and 4 disappear from the composer. The output is a single sanitised HTML string the caller renders directly.
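The mode rules in the table and paragraph above can be written down as a small Python sketch. The enum values are the composer's display labels, not request values: this guide only shows `"json"` as an `output_format` value, so the API strings for the two HTML modes are deliberately not guessed here. The input-kind labels (`"text"`, `"image"`, `"pdf"`) and all function names are hypothetical.

```python
from enum import Enum


class OutputMode(Enum):
    # Composer display labels, not API values.
    JSON = "JSON"
    HTML_FIELDS = "HTML fields"
    HTML_DOCUMENT = "HTML document"


def needs_schema(mode: OutputMode) -> bool:
    # JSON and HTML fields are the "schema modes"; HTML document re-flows the page without one.
    return mode is not OutputMode.HTML_DOCUMENT


def visible_steps(mode: OutputMode) -> list[int]:
    # Step 2 (generate a schema) and Step 4 (save as template) exist only in schema modes.
    return [1, 2, 3, 4] if needs_schema(mode) else [1, 3]


def check_input(mode: OutputMode, input_kind: str) -> None:
    # HTML document mode takes image or PDF input only; plain text is rejected.
    if mode is OutputMode.HTML_DOCUMENT and input_kind not in {"image", "pdf"}:
        raise ValueError(f"{mode.value} mode needs an image or PDF, got {input_kind!r}")
```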

#Step 2 — Generate a schema from a sample (schema modes only)

If you have a sample document, Velgent can propose the schema for you. The proposed schema is validated server-side before it reaches the form, so you never see a malformed or half-finished JSON blob.

  1. Paste a sample document into the Sample text box, or upload a sample PDF / image via Sample file.
  2. (Optional) Add a purpose / hint describing what fields you want: "extract invoice number, total, currency, line items".
  3. Click Generate schema →. Velgent calls your tenant's extract-category LLM (the same provider, BYOK key, and data residency as production) and proposes an ExtractionSchema.
  4. The JSON editor opens with the proposed schema populated and the suggested display name pre-filled. An AI note under the form calls out any ambiguity the model encountered.
  5. Review and edit the JSON. AI-generated schemas may miss fields, get a type wrong, or omit constraints, so treat them as a strong starting point rather than the final word. Toggle Show JSON editor if you'd rather hand-author the schema directly.

#Step 3 — Run the extraction (playground)

Run the schema against the same orchestrator that serves /api/v1/extract. The call counts against your tenant's quota, so what you see here is what a customer call would cost.

  1. Paste a sample text or upload a file in the Run extraction panel.
  2. Click Run extraction.
  3. The result panel renders:
    • Model used, latency, token count, and PII redaction count.
    • HTML document mode → an iframe preview of the reconstructed HTML.
    • HTML fields mode → an iframe preview of the deterministic field render, plus the underlying JSON in a collapsible section.
    • JSON mode → the extracted JSON, expanded by default.
    • Any warnings / schema drift surfaced inline.

Re-run as many times as you like; each run is billed separately. Tweak the schema in the editor and re-run to compare results.

#Step 4 — Save as template (schema modes only)

Once the run looks right, save the schema so callers can reference it by slug.

  1. Fill in the slug (lowercase letters, digits, hyphens, or underscores).
  2. Confirm the display name — pre-filled from the AI suggestion.
  3. (Optional) Add a prompt addendum — operator instructions appended to the system prompt at extract time. Use it for document-specific nudges: "Treat dollar amounts as USD", "Vendor name is usually top-left".
  4. Tick Publish immediately (default) to make the slug live, or leave it unticked to save as a draft.
  5. Click Save as template.
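If your team generates slugs in scripts before entering them, a pre-check can catch bad ones early. A minimal Python sketch, assuming the rule above means every character is a lowercase ASCII letter, a digit, a hyphen, or an underscore. Length limits and leading-character rules are not stated on this page, so the check enforces none; the names are hypothetical.

```python
import re

# Assumed reading of the slug rule: only a-z, 0-9, "_" and "-", at least one character.
SLUG_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug uses only lowercase letters, digits, hyphens, or underscores."""
    # fullmatch must consume the whole string, unlike "^...$", where "$" also
    # matches just before a trailing newline.
    return SLUG_RE.fullmatch(slug) is not None
```

So `is_valid_slug("invoice_v1")` passes, while `"Invoice_v1"` and `"invoice v1"` are rejected.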

If you ticked Publish immediately, the template is now live; otherwise it is saved as a draft. Callers can reach the slug on the API as soon as a version is published:

{ "template": "invoice_v1", "output_format": "json", ... }

See the Data Extractor reference for the full field-type catalogue.
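Building the request body shown above with Python's standard library might look like the sketch below. Only the path and the `template` / `output_format` fields come from this guide. The base URL is a placeholder, authentication headers are passed in by the caller because this page does not describe the auth scheme, and `document_fields` stands in for the fields elided as `...` above. The function name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: use your Velgent API host


def extract_request(
    template: str,
    document_fields: dict,
    auth_headers: dict[str, str],
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/extract call that references a template by slug."""
    body = {"template": template, "output_format": "json", **document_fields}
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/extract",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen(req)` to send it. Keeping construction separate from sending makes the body easy to inspect in tests.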

#Edit a template (new version)

Templates are immutable per version. Every edit creates a new version row; publishing it atomically swaps the active pointer. Older versions are kept indefinitely so historical extract calls can be replayed against the exact schema they ran against.

  1. From the Extract templates list, click the template's slug.
  2. The detail page shows the current published version (read-only JSON) and a New version composer underneath.
  3. Edit the JSON in the composer.
  4. (Optional) Update the prompt addendum, or leave it to inherit from the previous version.
  5. Add a change summary — surfaced in the version-history pane so anyone reading the audit trail later sees what changed and why.
  6. Decide whether this version publishes immediately:
    • Save and publish — atomically swaps the active pointer to this version. New API calls hit the new schema instantly.
    • Save as draft — version is created but not active. Useful for staging a schema change with reviewer sign-off.
  7. Click the button. The version appears at the top of the history pane.
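To make the versioning rules concrete, here is an in-memory Python model of them: versions are append-only, publishing or rolling back is a single assignment of the active pointer, and an explicit pin bypasses the pointer. This illustrates the behaviour described above; it is not Velgent's storage layer. All class and method names are hypothetical, and server-side atomicity is provided by the platform, not demonstrated by this toy code.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class TemplateVersion:
    number: int
    schema: dict[str, Any]
    change_summary: str


@dataclass
class Template:
    slug: str
    versions: list[TemplateVersion] = field(default_factory=list)
    active: Optional[int] = None  # version served when the caller does not pin one

    def _get(self, number: int) -> TemplateVersion:
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"{self.slug} has no version {number}")
        return self.versions[number - 1]

    def add_version(self, schema: dict[str, Any], change_summary: str, publish: bool) -> TemplateVersion:
        # Append-only: a new row is created and earlier versions are left untouched.
        version = TemplateVersion(len(self.versions) + 1, schema, change_summary)
        self.versions.append(version)
        if publish:
            self.publish(version.number)
        return version

    def publish(self, number: int) -> None:
        # Publishing (or rolling back) only moves the pointer; no version is copied or edited.
        self._get(number)
        self.active = number

    def resolve(self, pinned: Optional[int] = None) -> TemplateVersion:
        # A pinned version (like template_version on the API) wins over the active pointer.
        number = pinned if pinned is not None else self.active
        if number is None:
            raise LookupError(f"{self.slug} has no published version")
        return self._get(number)
```

Saving a draft is `add_version(..., publish=False)`: the version exists and can be pinned, but `resolve()` keeps returning the previously published one until `publish()` is called.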

#Pin or roll back to a previous version

Need to roll back? Or pin an experiment?

  1. Open the template's detail page.
  2. In the Version history pane, find the version you want.
  3. Click Publish on that row. The pointer swaps atomically; new API calls hit that version.

To pin a specific historical version from the API (replay / canary without changing the active pointer), pass template_version:

{ "template": "invoice_v1", "template_version": 3, ... }
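One way to run a canary from the client side is to pin a stable fraction of requests to the candidate version and let the rest follow the active pointer. A Python sketch, assuming you have a stable per-request identifier to hash. The helper name and the percentage scheme are hypothetical; only `template` and `template_version` come from this guide, and other body fields are omitted.

```python
import hashlib
from typing import Any


def canary_body(template: str, request_id: str, canary_version: int, percent: int) -> dict[str, Any]:
    """Pin roughly `percent`% of requests to canary_version; the rest use the active version."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # stable bucket 0..99 per request_id
    body: dict[str, Any] = {"template": template}
    if bucket < percent:
        body["template_version"] = canary_version  # pinned: ignores the active pointer
    return body
```

Hashing the request ID, rather than drawing a random number, means a replayed request always lands on the same version.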

#Deactivate (archive) a template

Archiving is a soft delete: we never hard-delete, so audit references survive.

  1. Open the template's detail page.
  2. Scroll to the Archive template panel at the bottom.
  3. Click Archive template.

The slug stops resolving on POST /api/v1/extract (callers receive a 404). The template and all its versions stay in the database so old audit-log rows referencing them remain resolvable.
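Callers can surface the archived-slug case explicitly instead of treating it as a generic failure. A Python standard-library sketch; the function name is hypothetical, and treating every 404 from this endpoint as "slug did not resolve" is an assumption, since this guide documents the 404 only for archived templates.

```python
import urllib.error
import urllib.request


def send_extract(req: urllib.request.Request) -> bytes:
    """Send a prepared extract request and return the raw response body."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Archived templates stop resolving and the endpoint answers 404.
            raise LookupError("template slug did not resolve; it may have been archived") from err
        raise  # other HTTP errors propagate unchanged
```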

To bring an archived template back, toggle Show archived templates on the list page; the row is still there with its full history.

#What gets audited

Every privileged action emits an admin.audit event with the actor's WorkOS subject, role, and tenant:

| Action | When |
|---|---|
| extract_template.create | New template created (any tab) |
| extract_template.create_version | New version appended |
| extract_template.publish | Version published (active pointer swapped) |
| extract_template.schema_inferred | "From sample" schema generation run |
| extract_template.deactivate | Template archived |

The schema-inference event captures the input kind (text / image / pdf_text_layer / pdf_rasterised), pages processed, and which model proposed the schema — so AI-assisted authoring shows up in your compliance audit alongside the human edits.
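If you export these events for a compliance review, a summary of AI-assisted versus manual template activity might look like the Python sketch below. The action names come from the table above; the event shape (a dict with an `"action"` key) is a hypothetical export format, since this page does not describe how events are delivered.

```python
from collections import Counter
from typing import Any, Iterable

AI_ASSISTED = {"extract_template.schema_inferred"}
MANUAL = {
    "extract_template.create",
    "extract_template.create_version",
    "extract_template.publish",
    "extract_template.deactivate",
}


def summarise(events: Iterable[dict[str, Any]]) -> Counter:
    """Count template audit events as AI-assisted or manual; other actions are ignored."""
    counts: Counter = Counter()
    for event in events:
        action = event.get("action")
        if action in AI_ASSISTED:
            counts["ai_assisted"] += 1
        elif action in MANUAL:
            counts["manual"] += 1
    return counts
```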

#Routing & costs

The schema-inference path uses your tenant's extract LLM routing row (under Engine settings → Model routing). Same provider, same BYOK key, same residency as the production /api/v1/extract flow — schemas are proposed by the model you already trust to do the extraction.

Each schema inference makes one LLM call. Sample PDFs over the 10-page cap return a 413, just as production extract calls do; use a smaller representative sample instead.
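When a sample is too long, one simple way to build a smaller representative sample is to keep pages spread evenly across the document, always including the first and last page. A Python sketch of the page selection only: the 10-page cap comes from this guide, while the function name and the evenly-spaced heuristic are assumptions. Cutting the chosen pages out of the PDF is left to whatever PDF tool you already use.

```python
MAX_SAMPLE_PDF_PAGES = 10  # PDF page cap stated in this guide


def representative_pages(total: int, cap: int = MAX_SAMPLE_PDF_PAGES) -> list[int]:
    """Return up to `cap` zero-based page indices spread evenly across `total` pages."""
    if cap < 2:
        raise ValueError("cap must be at least 2 to keep both the first and last page")
    if total <= cap:
        return list(range(total))  # already under the cap: keep every page
    step = (total - 1) / (cap - 1)  # greater than 1 here, so rounded indices never collide
    return [round(i * step) for i in range(cap)]
```

Whether evenly spaced pages are actually representative depends on the document; for forms whose key fields sit on specific pages, pick those pages by hand.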
