POST /extract

Document schema structure

The document_schema object should conform to the following structure.

{
  "name": "Invoice",
  "description": "Schema for extracting invoice details",
  "fields": [
    {
      "name": "invoice_number",
      "description": "The unique identifier for the invoice",
      "type": "string"
    },
    {
      "name": "issue_date",
      "description": "The date when the invoice was issued",
      "type": "date"
    },
    {
      "name": "customer",
      "description": "Customer information",
      "type": "object",
      "fields": [
        {
          "name": "name",
          "description": "Customer's full name",
          "type": "string"
        },
        {
          "name": "email",
          "description": "Customer's email address",
          "type": "email"
        }
      ]
    },
    {
      "name": "line_items",
      "description": "List of items in the invoice",
      "type": "array",
      "fields": [
        {
          "name": "description",
          "description": "Item description",
          "type": "string"
        },
        {
          "name": "amount",
          "description": "Item cost",
          "type": "number"
        }
      ]
    }
  ]
}

Primitive fields

  • string: Generic text data
  • number: Numeric values
  • email: Email addresses
  • phone: Phone numbers
  • date: Date values

Objects and arrays

  • object: A nested object containing additional fields, each of which is a primitive field.
  • array: A list of items in which each element is an object. As with objects, the fields within each element must be primitive fields.
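If you build schemas in code rather than by hand, one option is to model each field as a small Python dataclass and serialize it to the JSON shape shown above. This is a minimal sketch; `SchemaField` and `build_schema` are hypothetical helper names, not part of the API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Literal

# Field types listed in this section (primitives plus object and array).
FieldType = Literal["string", "number", "email", "phone", "date", "object", "array"]


@dataclass
class SchemaField:
    """Hypothetical helper mirroring one entry in a document_schema "fields" list."""
    name: str
    description: str
    type: FieldType
    fields: List["SchemaField"] = field(default_factory=list)

    def to_dict(self) -> dict:
        out = {"name": self.name, "description": self.description, "type": self.type}
        # Only object and array fields carry a nested "fields" list.
        if self.type in ("object", "array"):
            out["fields"] = [child.to_dict() for child in self.fields]
        return out


def build_schema(name: str, description: str, fields: List[SchemaField]) -> dict:
    """Assemble the top-level document_schema object."""
    return {"name": name, "description": description, "fields": [f.to_dict() for f in fields]}


invoice = build_schema(
    "Invoice",
    "Schema for extracting invoice details",
    [
        SchemaField("invoice_number", "The unique identifier for the invoice", "string"),
        SchemaField("customer", "Customer information", "object", [
            SchemaField("name", "Customer's full name", "string"),
            SchemaField("email", "Customer's email address", "email"),
        ]),
    ],
)
print(json.dumps(invoice, indent=2))
```

Typing `type` as a `Literal` lets a static type checker flag misspelled field types before the request is sent.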

Best practices

Field names

Use clear, descriptive names and avoid special characters. We recommend snake_case (for example, invoice_number).
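A quick way to check names before submitting a schema is a regular expression. This sketch assumes "snake_case" means lowercase letters and digits, starting with a letter, with words separated by single underscores; the pattern is our interpretation of the recommendation, not a rule this page says the API enforces.

```python
import re

# Lowercase letters and digits, starting with a letter, words separated by
# single underscores (e.g. "invoice_number", "line_items").
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if `name` follows the recommended snake_case style."""
    return SNAKE_CASE.fullmatch(name) is not None


for candidate in ["invoice_number", "issueDate", "line-items", "customer"]:
    print(candidate, is_snake_case(candidate))
```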

Descriptions

Descriptions are critical for accuracy. If you need values returned in a consistent format, state that format in the description.

Nested structures

Objects can be nested up to 3 levels deep. Where possible, avoid deeply nested objects, as deeper nesting reduces accuracy.
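To check a schema against the nesting limit before sending it, you can compute how deep its object fields go. This is a sketch under an assumption: that a top-level "object" field counts as level 1 and each object inside it adds one level. The page does not define how levels are counted, and the "address" and "city" fields below are hypothetical.

```python
from typing import List


def object_depth(fields: List[dict], level: int = 1) -> int:
    """Return the deepest object nesting level reached in a list of field definitions.

    Assumption: a top-level "object" field is level 1, an object inside it is
    level 2, and so on. Fields of other types (including arrays) add no level.
    """
    deepest = 0
    for f in fields:
        if f.get("type") == "object":
            deepest = max(deepest, level, object_depth(f.get("fields", []), level + 1))
    return deepest


MAX_OBJECT_DEPTH = 3  # the documented limit of 3 levels

schema_fields = [
    {"name": "customer", "description": "Customer information", "type": "object", "fields": [
        {"name": "address", "description": "Billing address", "type": "object", "fields": [
            {"name": "city", "description": "City", "type": "string"},
        ]},
    ]},
]
depth = object_depth(schema_fields)
print(depth, depth <= MAX_OBJECT_DEPTH)  # 2 True
```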

Nested objects within arrays

We do not currently support nested objects within arrays. This is on our roadmap.
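Because array elements may only contain primitive fields, a pre-flight check can list any array member that is not primitive. This is a minimal sketch; the function name and the "supplier" field in the sample are hypothetical, and the check also looks inside object fields for arrays.

```python
from typing import List

PRIMITIVE_TYPES = {"string", "number", "email", "phone", "date"}


def array_violations(fields: List[dict], path: str = "") -> List[str]:
    """List array member fields that are not primitive, as dotted paths.

    Nested objects within arrays are not currently supported, so every field
    inside an array element should be one of PRIMITIVE_TYPES.
    Object fields are searched recursively for arrays.
    """
    problems = []
    for f in fields:
        name = f"{path}{f.get('name', '?')}"
        children = f.get("fields", [])
        if f.get("type") == "array":
            for child in children:
                if child.get("type") not in PRIMITIVE_TYPES:
                    problems.append(f"{name}.{child.get('name', '?')}")
        elif f.get("type") == "object":
            problems.extend(array_violations(children, path=name + "."))
    return problems


line_items = {
    "name": "line_items", "description": "List of items in the invoice", "type": "array",
    "fields": [
        {"name": "description", "description": "Item description", "type": "string"},
        {"name": "supplier", "description": "Supplier details", "type": "object", "fields": []},
    ],
}
print(array_violations([line_items]))  # ['line_items.supplier']
```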

Headers

  • Authorization (string, required): Bearer token for authentication

Body (application/json)

  • file_id (string, required): ID of the uploaded file to process
  • document_schema (object, required): Schema definition for extraction
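The header and body parameters above combine into a single JSON POST. The sketch below prepares that request with Python's standard library without sending it. The base URL, the EXTRACT_API_TOKEN environment variable, and the "file_123" ID are placeholders; this page does not state the API host or how tokens and file IDs are obtained.

```python
import json
import os
import urllib.request

# Hypothetical base URL: this page does not state the API host.
BASE_URL = "https://api.example.com"


def build_extract_request(token: str, file_id: str, document_schema: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /extract request."""
    body = json.dumps({"file_id": file_id, "document_schema": document_schema}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


schema = {
    "name": "Invoice",
    "description": "Schema for extracting invoice details",
    "fields": [
        {"name": "invoice_number", "description": "The unique identifier for the invoice", "type": "string"},
    ],
}
# EXTRACT_API_TOKEN and "file_123" are placeholders for illustration.
req = build_extract_request(os.environ.get("EXTRACT_API_TOKEN", "test-token"), "file_123", schema)
# To send it: with urllib.request.urlopen(req) as resp: print(resp.status, json.load(resp))
```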

Response

201 (application/json)

  • extraction_id (string, required): ID of the created extraction job
  • status (string, required): Current status of the job
  • created_at (string, required): When the job was created
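A client can map the three documented response fields onto a typed object. This is a sketch: `ExtractionJob` and `parse_extraction_response` are hypothetical names, the sample values (including the "pending" status) are made up, and created_at is kept as a plain string because this page does not specify its format or the set of possible status values.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractionJob:
    """Fields documented for the 201 response of POST /extract."""
    extraction_id: str
    status: str
    created_at: str  # returned as a string; its exact format is not specified here


def parse_extraction_response(status_code: int, body: str) -> ExtractionJob:
    """Parse a POST /extract response body, expecting HTTP 201 Created."""
    if status_code != 201:
        raise ValueError(f"unexpected status {status_code}: {body}")
    data = json.loads(body)
    return ExtractionJob(
        extraction_id=data["extraction_id"],
        status=data["status"],
        created_at=data["created_at"],
    )


# Sample body with made-up values, for illustration only.
sample = '{"extraction_id": "ext_123", "status": "pending", "created_at": "2024-01-01T00:00:00Z"}'
job = parse_extraction_response(201, sample)
print(job.extraction_id, job.status)
```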