Inferra reads PDFs, scans, and emails against a schema you define — returning structured fields with a confidence score and a pixel-level citation for every value. No templates, no brittle regex.
{
  "invoice_no": "NW-4471",
  "issued": "2026-04-18",
  "due": "2026-05-18",
  "currency": "USD",
  "line_items": 2,
  "total": 9035.50,
  "_confidence": 0.991
}
Define the shape you want once. Inferra handles the OCR, layout, and reasoning — and refuses to guess when a field isn't there.
A Pydantic model or plain JSON Schema. Inferra infers types, required fields, and enums from it.
from datetime import date
from decimal import Decimal
from pydantic import BaseModel

class Invoice(BaseModel):
    invoice_no: str
    due: date
    total: Decimal
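How Inferra turns a model into a schema is not documented here. As a rough illustration of the idea, the sketch below derives a JSON Schema from a stdlib dataclass instead of a Pydantic model: annotations become types, fields without defaults become required, and Enum annotations become enums. The type mapping (including Decimal as a JSON number), the Currency enum, and to_json_schema are all hypothetical.

```python
import dataclasses
import typing
from datetime import date
from decimal import Decimal
from enum import Enum

# Assumed mapping from Python types to JSON Schema fragments (illustrative only).
_PRIMITIVES = {
    str: {"type": "string"},
    int: {"type": "integer"},
    float: {"type": "number"},
    Decimal: {"type": "number"},
    date: {"type": "string", "format": "date"},
}


def to_json_schema(cls: type) -> dict:
    """Derive a JSON Schema object from a dataclass.

    Handles only the primitive types above and Enum subclasses; Optional,
    lists, and nested models are out of scope for this sketch.
    """
    hints = typing.get_type_hints(cls)
    properties: dict[str, dict] = {}
    required: list[str] = []
    for f in dataclasses.fields(cls):
        tp = hints[f.name]
        if isinstance(tp, type) and issubclass(tp, Enum):
            # Enum members become the allowed values for the field.
            properties[f.name] = {"enum": [member.value for member in tp]}
        else:
            properties[f.name] = dict(_PRIMITIVES[tp])
        # No default and no default_factory means the field must always be present.
        if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING:
            required.append(f.name)
    return {"type": "object", "properties": properties, "required": required}


class Currency(Enum):
    USD = "USD"
    EUR = "EUR"


@dataclasses.dataclass
class Invoice:
    invoice_no: str
    due: date
    total: Decimal
    currency: Currency = Currency.USD


schema = to_json_schema(Invoice)
```

Here schema["required"] is ["invoice_no", "due", "total"]: currency has a default, so it is optional, and its allowed values are ["USD", "EUR"].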
PDF, PNG, DOCX, or an email URL. One call streams structured fields back as the model reads.
res = inferra.extract(
    file="inv.pdf",
    schema=Invoice,
)
Each field carries a confidence score and a bounding box. Route low-confidence rows to review automatically.
res.confidence      # 0.991
res.cite("total")   # p1 · box(412,..)
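Routing on confidence can be as simple as a per-field cutoff. This sketch assumes a hypothetical FieldResult shape (name, value, confidence) rather than the SDK's real result type, and a 0.95 threshold picked only for illustration; abstentions (a None value) always go to review regardless of score.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldResult:
    # Hypothetical stand-in for one extracted field; not the real SDK type.
    name: str
    value: Optional[Any]
    confidence: float


def route(fields: list[FieldResult], threshold: float = 0.95) -> tuple[dict[str, Any], list[str]]:
    """Split fields into auto-accepted values and names that need human review."""
    accepted: dict[str, Any] = {}
    review: list[str] = []
    for f in fields:
        # Abstentions and low-confidence values both go to the review queue.
        if f.value is None or f.confidence < threshold:
            review.append(f.name)
        else:
            accepted[f.name] = f.value
    return accepted, review


fields = [
    FieldResult("invoice_no", "NW-4471", 0.998),
    FieldResult("total", 9035.50, 0.991),
    FieldResult("po_number", None, 0.12),
    FieldResult("due", "2026-05-18", 0.81),
]
accepted, review = route(fields)
```

With these inputs, invoice_no and total are accepted, while po_number (an abstention) and due (0.81) land in review.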
We test on 14,200 held-out real-world documents — crumpled receipts, multi-column statements, handwriting, and forms in nine languages. Every number below is field-level exact match, not "looks about right."
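Field-level exact match scores each field separately and gives no partial credit. The sketch below compares raw values with no normalization; whether Inferra's eval harness normalizes formats (dates, decimals, whitespace) before comparing is not stated, and field_exact_match is a hypothetical helper.

```python
from typing import Any


def field_exact_match(predictions: list[dict[str, Any]], gold: list[dict[str, Any]]) -> dict[str, float]:
    """Per-field exact-match accuracy over a set of documents.

    A field is correct only if the predicted value equals the gold value
    exactly; a missing prediction counts as wrong.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must cover the same documents")
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for pred, ref in zip(predictions, gold):
        for name, expected in ref.items():
            total[name] = total.get(name, 0) + 1
            if name in pred and pred[name] == expected:
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in total.items()}


gold = [
    {"invoice_no": "NW-4471", "total": "9035.50"},
    {"invoice_no": "NW-4472", "total": "120.00"},
]
preds = [
    {"invoice_no": "NW-4471", "total": "9035.50"},
    {"invoice_no": "NW-4472", "total": "120.0"},  # "looks about right", but not exact
]
scores = field_exact_match(preds, gold)
```

Under this strict rule, "120.0" against a gold "120.00" counts as a miss, so total scores 0.5 while invoice_no scores 1.0.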
Native sinks for warehouses, queues, and storage — plus a typed SDK for Python, TypeScript, and Go.
One page = one side of a document. Failed extractions and abstentions are never billed. Volume bundles roll over for 90 days.
For a prototype or a single internal workflow.
For teams running extraction in production at volume.
For regulated data that can't leave your VPC.
"We were paying a BPO vendor to key 9,000 freight invoices a month. Inferra does it at 98.6% accuracy and the review queue catches the rest. We cut that line item by $41,000 a quarter."
"The fact that it returns null instead of guessing is the whole reason I trust it. The citations let our auditors trace every number back to the pixel it came from."
"Swapped out 1,800 lines of bespoke parsing logic for a Pydantic model and one extract() call. Onboarding a new document type went from a two-week project to an afternoon."
Everything else is in the docs — including the OpenAPI spec, retry semantics, and the eval harness.
For every release we fit an isotonic regression on a 14,200-document holdout, so the reported confidence matches observed accuracy: a field scored 0.95 is correct 95% of the time, within ±1.3 percentage points. CI verifies this before any model ships.
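Isotonic regression is usually fitted with the pool-adjacent-violators (PAV) algorithm: sort held-out fields by raw score, label each 1 if it was correct and 0 if not, then merge neighbouring groups until accuracy never decreases as the score rises. The sketch below is a plain-Python version of that technique, not Inferra's calibration code; scikit-learn's IsotonicRegression does the same job and interpolates linearly between steps, where this version uses a step function.

```python
def isotonic_fit(scores: list[float], correct: list[int]) -> list[tuple[float, float, float]]:
    """Fit a non-decreasing map from raw score to observed accuracy (PAV).

    Returns blocks as (lowest_score, highest_score, calibrated_accuracy),
    ordered by score.
    """
    # Pool duplicate scores first so equal scores share one calibrated value.
    pooled: dict[float, list[float]] = {}
    for s, y in zip(scores, correct):
        acc = pooled.setdefault(s, [0.0, 0.0])
        acc[0] += y  # correct fields at this score
        acc[1] += 1  # all fields at this score

    blocks: list[list[float]] = []  # each block: [hits, count, lo, hi]
    for s in sorted(pooled):
        hits, count = pooled[s]
        blocks.append([hits, count, s, s])
        # While the previous block is more accurate than the newest one,
        # monotonicity is violated: merge them into their weighted mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            merged_hits, merged_count, _, hi = blocks.pop()
            blocks[-1][0] += merged_hits
            blocks[-1][1] += merged_count
            blocks[-1][3] = hi
    return [(lo, hi, h / c) for h, c, lo, hi in blocks]


def calibrate(blocks: list[tuple[float, float, float]], score: float) -> float:
    """Map a raw score to calibrated accuracy as a step function.

    Scores in a gap between blocks take the next block's value; scores past
    either end are clipped to the first or last block.
    """
    for _, hi, value in blocks:
        if score <= hi:
            return value
    return blocks[-1][2]


blocks = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1])
```

In this toy holdout the field at 0.2 was correct and the one at 0.3 was not, which violates monotonicity, so PAV pools them into one block with calibrated accuracy 0.5.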
Inferra returns null with a low confidence and an empty citation rather than fabricating a value. You can set a per-field required=True to raise a validation error instead, or route the document straight to the review queue.
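The two behaviours for an absent field, a null value versus a validation error, can be sketched on the client side. Everything here is hypothetical: the Extracted shape (value, confidence, optional citation), the MissingRequiredField exception, and check_required stand in for however the SDK actually enforces required=True.

```python
from dataclasses import dataclass
from typing import Any, Optional


class MissingRequiredField(ValueError):
    """Raised when a field marked required comes back as an abstention."""


@dataclass
class Extracted:
    # Hypothetical field result: value is None on abstention, and the citation
    # is empty because there is nothing on the page to point at.
    value: Optional[Any]
    confidence: float
    citation: Optional[tuple[int, int, int, int]] = None


def check_required(result: dict[str, Extracted], required: set[str]) -> dict[str, Any]:
    """Return plain values, raising if any required field abstained."""
    missing = sorted(n for n in required if n not in result or result[n].value is None)
    if missing:
        raise MissingRequiredField(f"abstained on required fields: {', '.join(missing)}")
    return {name: r.value for name, r in result.items()}


result = {
    "invoice_no": Extracted("NW-4471", 0.998, (1, 40, 60, 200)),
    "po_number": Extracted(None, 0.04),  # abstention: null value, no citation
}
```

Calling check_required(result, {"invoice_no"}) returns po_number as None; adding "po_number" to the required set raises instead.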
Yes — model a field as list[LineItem] and Inferra returns one object per row with its own confidence and bounding box. Multi-page tables are stitched automatically using layout continuity.
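How Inferra detects layout continuity is not specified. One simple heuristic, assumed here purely for illustration: a table fragment on page n+1 continues the table on page n when its column left edges line up within a pixel tolerance, and a repeated header row on the continuation page is dropped. TableFragment, the 8 px tolerance, and stitch are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableFragment:
    # Hypothetical per-page table fragment.
    page: int
    column_x: list[float]  # left edge of each column, in pixels
    rows: list[list[str]]  # first row may be a (repeated) header


def same_layout(a: TableFragment, b: TableFragment, tol: float = 8.0) -> bool:
    """Treat two fragments as one table if their column edges line up."""
    return len(a.column_x) == len(b.column_x) and all(
        abs(x - y) <= tol for x, y in zip(a.column_x, b.column_x)
    )


def stitch(fragments: list[TableFragment]) -> list[list[list[str]]]:
    """Merge fragments on consecutive pages with matching layout into single tables."""
    merged: list[TableFragment] = []
    for frag in sorted(fragments, key=lambda f: f.page):
        last = merged[-1] if merged else None
        if last is not None and frag.page == last.page + 1 and same_layout(last, frag):
            rows = frag.rows
            # Continuation pages often repeat the header row; drop the repeat.
            if rows and last.rows and rows[0] == last.rows[0]:
                rows = rows[1:]
            last.rows.extend(list(r) for r in rows)
            last.page = frag.page  # the merged table now ends on this page
        else:
            # Copy so stitching never mutates the caller's fragments.
            merged.append(TableFragment(frag.page, list(frag.column_x), [list(r) for r in frag.rows]))
    return [t.rows for t in merged]


p1 = TableFragment(1, [50, 300, 450], [["Item", "Qty", "Amount"], ["Freight", "1", "8000.00"]])
p2 = TableFragment(2, [52, 298, 451], [["Item", "Qty", "Amount"], ["Fuel surcharge", "1", "1035.50"]])
p3 = TableFragment(3, [40, 200], [["Note", "Text"]])
tables = stitch([p2, p1, p3])
```

Pages 1 and 2 merge into one three-row table (header plus two line items), while page 3's two-column layout starts a new table.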
Never by default. Documents are processed in memory and dropped within 24h unless you opt into retention for the review queue. On Enterprise, processing runs entirely inside your VPC and nothing leaves your network.
P50 0.74s, P95 1.18s, P99 2.4s for a typical 3-page document with a 12-field schema. Fields stream back as they resolve, so you can render partial results before the call completes.
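P95 1.18s means 95% of calls on that document profile finished in 1.18s or less. The source does not say which percentile definition it uses; the sketch below assumes the common nearest-rank definition, and latency_summary is a hypothetical helper.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples: list[float]) -> dict[str, float]:
    """P50/P95/P99 of a list of per-call latencies, in seconds."""
    return {f"P{p}": percentile(samples, p) for p in (50, 95, 99)}


samples = [i / 100 for i in range(1, 101)]  # 0.01s .. 1.00s
summary = latency_summary(samples)
```

For these 100 evenly spaced samples, P50 is 0.5s, P95 is 0.95s, and P99 is 0.99s.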
1,000 free pages a month, no card. Drop in your schema, point it at a file, ship the JSON.