Endpoints
Extract data from a file
Create a new schema extraction job for a file based on a predefined schema.
POST /extract
Document schema structure
The document_schema object should conform to the following structure.
Primitive fields
string: Generic text data
number: Numeric values
email: Email addresses
phone: Phone numbers
date: Date values
Objects and arrays
object: Nested object containing additional fields, where each field is a primitive field.
array: List of items where each element is an object. As above, the fields within each object can be any of the primitive fields.
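As an illustration of how the field types above might combine, here is a minimal sketch of a schema written as a Python dictionary. The layout is an assumption: the keys "type", "description", "fields", and "items", as well as all field names, are hypothetical and are not confirmed by this reference. Only the seven type names come from the list above.

```python
# Hypothetical layout: each field name maps to a spec with a "type" and a
# "description". The keys "type", "description", "fields" and "items" are
# assumptions made for illustration only.
PRIMITIVE_TYPES = {"string", "number", "email", "phone", "date"}

document_schema = {
    "invoice_number": {"type": "string", "description": "Invoice identifier"},
    "issue_date": {"type": "date", "description": "Date the invoice was issued"},
    "total_amount": {"type": "number", "description": "Total amount due"},
    "vendor": {
        "type": "object",
        "description": "Vendor contact details",
        "fields": {
            "contact_email": {"type": "email", "description": "Vendor email"},
            "contact_phone": {"type": "phone", "description": "Vendor phone"},
        },
    },
    "line_items": {
        "type": "array",
        "description": "One entry per invoice line",
        "items": {
            "item_name": {"type": "string", "description": "Name of the item"},
            "unit_price": {"type": "number", "description": "Price per unit"},
        },
    },
}


def collect_types(schema):
    """Return the set of type names used anywhere in the schema."""
    found = set()
    for spec in schema.values():
        found.add(spec["type"])
        # Recurse into object fields and array element fields, if present.
        for key in ("fields", "items"):
            if key in spec:
                found |= collect_types(spec[key])
    return found
```

Calling collect_types(document_schema) gives a quick way to confirm that a schema only uses the documented type names before submitting it.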
Best Practices
Field names
Use clear, descriptive names and avoid special characters. We recommend using snake case.
Descriptions
Descriptions are critical for accuracy. If you want consistent formatting, specify the required format in the description.
Nested structures
We support 3 levels of nesting for objects. Where possible, we recommend avoiding deeply nested objects as this reduces accuracy.
Nested objects within arrays
We do not currently support nested objects within arrays. This is on our roadmap.
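The practices above can be checked locally before a schema is submitted. The following is a rough sketch, not an official validator. It assumes a hypothetical schema layout in which each field name maps to a dictionary with "type", "description", and (for objects and arrays) "fields" or "items" keys. It also assumes that a top-level object counts as nesting level 1, which this reference does not state.

```python
import re

# snake_case: lowercase letters and digits separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
MAX_OBJECT_DEPTH = 3  # "We support 3 levels of nesting for objects."


def lint_schema(schema, depth=0, inside_array=False, path=""):
    """Return a list of human-readable problems found in the schema.

    The "type", "description", "fields" and "items" keys are assumed,
    not taken from the API reference.
    """
    problems = []
    for name, spec in schema.items():
        where = f"{path}{name}"
        if not SNAKE_CASE.match(name):
            problems.append(f"{where}: name is not snake_case")
        if not spec.get("description"):
            problems.append(f"{where}: missing description")
        kind = spec.get("type")
        if kind == "object":
            if inside_array:
                problems.append(f"{where}: nested object inside an array")
            if depth + 1 > MAX_OBJECT_DEPTH:
                problems.append(
                    f"{where}: more than {MAX_OBJECT_DEPTH} levels of nesting"
                )
            problems += lint_schema(
                spec.get("fields", {}), depth + 1, inside_array, where + "."
            )
        elif kind == "array":
            # Array elements are objects; anything nested below them is flagged.
            problems += lint_schema(spec.get("items", {}), depth, True, where + ".")
    return problems
```

An empty list means no problems were found under these assumptions; the server may still apply rules that this sketch does not cover.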
Headers
Authorization (string, required): Bearer token for authentication
Body
application/json
file_id (string, required): ID of the uploaded file to process
document_schema (object, required): Schema definition for extraction
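To show how the header and body fields fit together, here is a minimal sketch that builds the POST /extract request with Python's standard library. The base URL https://api.example.com, the helper name build_extract_request, and the sample values are hypothetical. The "Bearer <token>" form follows the usual bearer-token convention, and the Content-Type header follows from the application/json body type.

```python
import json
import urllib.request


def build_extract_request(base_url, token, file_id, document_schema):
    """Build (but do not send) a POST /extract request.

    base_url is a placeholder; substitute the real API host.
    """
    body = json.dumps(
        {"file_id": file_id, "document_schema": document_schema}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/extract",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request), which raises urllib.error.HTTPError for 4xx and 5xx responses.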
Response
201 - application/json
extraction_id (string, required): ID of the created extraction job
status (string, required): Current status of the job
created_at (string, required): When the job was created
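A 201 response body could be read into a small typed record as sketched below. The names ExtractionJob and parse_extraction_response are hypothetical. Because this reference does not specify the timestamp format or the set of possible status values, created_at and status are kept as plain strings.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("extraction_id", "status", "created_at")


@dataclass
class ExtractionJob:
    extraction_id: str
    status: str
    created_at: str  # left as a string; the timestamp format is not documented


def parse_extraction_response(body):
    """Parse a 201 response body (a JSON string) into an ExtractionJob."""
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return ExtractionJob(
        extraction_id=payload["extraction_id"],
        status=payload["status"],
        created_at=payload["created_at"],
    )
```

Failing early on a missing field makes it obvious when the response does not match the three required fields listed above.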