We’re building the world’s most advanced structured document extraction and OCR models — purpose-built for forms.

Form-native models that output structured fields with schema awareness, field-level confidence, and line-item reasoning. Built for reliability and privacy.


Structured outputs
JSON keyed to your entities with confidence and provenance.
Layout + reasoning
Understands grids, checkboxes, tables and totals.
Private by design
VPC/on-prem options. Data not used to train shared models.
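Because tables and totals come back as structured fields, a downstream system can reconcile them before accepting a document. The sketch below is a minimal client-side check under stated assumptions. It uses a hypothetical response shape where line items and the total arrive as decimal strings wrapped in the same `{"value": ...}` objects as the example response. The key names `line_items`, `amount`, and `total` are illustrative, not a documented schema.

```python
from decimal import Decimal


def totals_reconcile(fields: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True if the line-item amounts sum to the stated total.

    Assumes a HYPOTHETICAL shape (not a documented schema):
      {"line_items": [{"amount": {"value": "10.00"}}, ...],
       "total": {"value": "25.50"}}
    """
    items = fields.get("line_items", [])
    # Decimal avoids binary floating-point rounding on currency values
    computed = sum((Decimal(item["amount"]["value"]) for item in items), Decimal("0"))
    stated = Decimal(fields["total"]["value"])
    return abs(computed - stated) <= tolerance


example = {
    "line_items": [{"amount": {"value": "10.00"}}, {"amount": {"value": "15.50"}}],
    "total": {"value": "25.50"},
}
print(totals_reconcile(example))  # True
```

A failed reconciliation is a natural signal to route the document to human review rather than reject it outright.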

Purpose-built for forms

General-purpose OCR struggles with form semantics and nested tables. Our models are trained on form structure, infer field relationships, and normalize outputs to your schema.

Reliability you can measure

Field-level confidence, bounding boxes, and provenance let you triage edge cases. Integrate via REST + webhooks with predictable schemas.
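One common way to use field-level confidence for triage is to auto-accept fields above a threshold and queue the rest for review. The sketch below assumes each field follows the `{"value", "confidence", "source"}` shape from the example response. The function name `triage`, the 0.95 default, and the `insured_name` field are illustrative assumptions, not recommended settings.

```python
from typing import Dict, List, Tuple


def triage(fields: Dict[str, dict], threshold: float = 0.95) -> Tuple[Dict[str, object], List[str]]:
    """Split extracted fields into auto-accepted values and names needing review.

    Each field is assumed to look like the example response:
    {"value": ..., "confidence": float, "source": {...}}.
    The 0.95 threshold is an illustrative default, not a recommended value.
    """
    accepted: Dict[str, object] = {}
    needs_review: List[str] = []
    for name, field in fields.items():
        # A missing confidence is treated as 0.0, so the field goes to review
        if field.get("confidence", 0.0) >= threshold:
            accepted[name] = field["value"]
        else:
            needs_review.append(name)
    return accepted, needs_review


fields = {
    "policy_number": {"value": "ABC-123456", "confidence": 0.997},
    "insured_name": {"value": "Acme Co", "confidence": 0.81},  # hypothetical field
}
accepted, review = triage(fields)
print(accepted)  # {'policy_number': 'ABC-123456'}
print(review)    # ['insured_name']
```

In practice the threshold would be tuned per field on your own documents, since the cost of a wrong value usually differs across fields.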

How it works

1) Ingest

Upload PDFs, scans, or images. Batch uploads and multi-page documents are supported.

2) Extract

Form-native models detect fields, tables, and selections with per-field confidence.

3) Deliver

Receive clean JSON in your systems via webhooks or polling.
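For the polling path, a client typically re-checks job status with a growing delay until the job finishes. The sketch below is a minimal example under stated assumptions. The status values `"processing"`, `"completed"`, and `"failed"` are hypothetical, and the status lookup is injected as a plain function, so no endpoint or response format is assumed beyond a `status` key.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status until the job reports "completed" or "failed".

    fetch_status is any function returning a dict with a "status" key;
    the status names used here are HYPOTHETICAL.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"extraction failed: {result}")
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Exponential backoff, capped so long jobs are still checked regularly
        delay = min(delay * 2, max_delay)


# Usage with a fake status source instead of a real HTTP call
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "document_id": "doc_123"},
])
done = poll_until_done(lambda: next(responses), sleep=lambda s: None)
print(done["document_id"])  # doc_123
```

Webhooks avoid this loop entirely. Polling with capped backoff is mainly useful when your system cannot expose a public callback URL.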

A simple, predictable API

• REST + webhooks
• Field-level confidence and bounding boxes
• Schema anchoring and normalization

# Submit a document
curl -X POST https://api.darmis.ai/v1/extract \
  -H "Authorization: Bearer <API_KEY>" \
  -F "file=@<PATH_TO_DOCUMENT>" \
  -F "schema=insurance.acord_25"

Example response (truncated):
{
  "document_id": "doc_123",
  "schema": "insurance.acord_25",
  "fields": {
    "policy_number": { 
      "value": "ABC-123456", 
      "confidence": 0.997,
      "source": { "page": 1, "bbox": [120, 92, 240, 108] }
    },
    ...
  }
}
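On the client side, the response can be flattened into rows for storage or a review UI. The sketch below uses only the keys shown in the example response (`fields`, `value`, `confidence`, `source.page`, `source.bbox`). It reads `bbox` as `[x0, y0, x1, y1]` page coordinates, which is an assumption, since the example does not state the box convention or units. The names `ExtractedField` and `parse_fields` are illustrative.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractedField:
    name: str
    value: object
    confidence: float
    page: Optional[int]
    bbox: Optional[Tuple[float, ...]]


def parse_fields(response_json: str) -> List[ExtractedField]:
    """Flatten the "fields" object of an extraction response into records."""
    data = json.loads(response_json)
    records = []
    for name, field in data.get("fields", {}).items():
        source = field.get("source") or {}
        bbox = source.get("bbox")
        records.append(
            ExtractedField(
                name=name,
                value=field.get("value"),
                confidence=float(field.get("confidence", 0.0)),
                page=source.get("page"),
                # Assumed [x0, y0, x1, y1]; stored as a tuple for immutability
                bbox=tuple(bbox) if bbox else None,
            )
        )
    return records


# The example response, with the "..." truncation removed so it parses as JSON
sample = '''{
  "document_id": "doc_123",
  "schema": "insurance.acord_25",
  "fields": {
    "policy_number": {
      "value": "ABC-123456",
      "confidence": 0.997,
      "source": { "page": 1, "bbox": [120, 92, 240, 108] }
    }
  }
}'''
for f in parse_fields(sample):
    print(f.name, f.value, f.confidence, f.page, f.bbox)
# policy_number ABC-123456 0.997 1 (120, 92, 240, 108)
```

Keeping page and box alongside each value preserves provenance, so a reviewer can jump straight to the region of the page a value came from.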

See it on your documents

Bring two or three representative forms. We’ll run them and return structured JSON.