Introduction

The standalone APIs let you process PDF and image files without consuming a project. Three operations are currently available:

  1. Extract structured data: Extract structured data from the file following a JSON Schema.
  2. Extract markdown: Extract markdown from the file, including headers, paragraphs, lists and tables.
  3. Extract tables: Extract tables from the file in JSON format.

There are two ways to consume these APIs: synchronously and asynchronously. In synchronous calls, the result is returned in the same call. In asynchronous calls, the result is returned in a separate call using an operationId obtained in the initial call.

Sync vs Async APIs

Each of the operations listed above is available via a Sync API and an Async API. In Sync APIs, you make a single call which triggers the operation and returns the result. In Async APIs, you make two calls: the first call triggers the operation and returns an operationId, and the second call uses the operationId to check the status and get the result.

Depending on the input, a call can take a long time to complete, especially if the file is large or the operation is complex. For this reason, we recommend the Async API for most use cases.
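The two-call asynchronous flow described above can be sketched as a polling loop. This is a minimal sketch under stated assumptions, not the actual client: the `fetch_status` callable stands in for the second API call, and the `status` and `result` field names and the `"succeeded"` / `"failed"` values are hypothetical. Check the API reference for the real endpoints and response shape.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_status: Callable[[str], Dict[str, Any]],
    operation_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> Any:
    """Poll a status call until the operation finishes or times out.

    `fetch_status` is a placeholder for the second API call: it takes the
    operationId and returns a dict. The "status" and "result" keys and the
    status values checked below are assumptions, not the documented API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status(operation_id)
        status = response.get("status")
        if status == "succeeded":
            return response.get("result")
        if status == "failed":
            raise RuntimeError(f"operation {operation_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} is still pending")
        # Wait before asking again so the service is not flooded with calls.
        time.sleep(interval_s)
```

Passing the status call in as a function keeps the loop independent of any particular HTTP library, and the timeout prevents a stuck operation from blocking the caller indefinitely.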

Supported file formats

We currently support PDF files and image files. We plan to support other formats soon. Contact us if you have specific requirements.

For PDF files, you can specify which pages to process. If no page numbers are specified, the operation runs on all pages. See the API reference for more information.

Extract structured data API

This API allows you to extract data from a file following a JSON Schema. This is useful when you have a document with a known data structure that you want to extract, such as a contract.
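As an illustration, a JSON Schema for a contract might look like the sketch below. The field names (`parties`, `effective_date`, `total_value`) are hypothetical and chosen only for this example; the API reference defines how the schema is actually passed in the request.

```python
import json

# Hypothetical schema for a contract document. The property names are
# illustrative only and are not prescribed by the API.
contract_schema = {
    "type": "object",
    "properties": {
        "parties": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Names of the contracting parties",
        },
        "effective_date": {
            "type": "string",
            "description": "Date the contract takes effect",
        },
        "total_value": {
            "type": "number",
            "description": "Total contract value",
        },
    },
    "required": ["parties", "effective_date"],
}

# Serialize the schema so it can be sent alongside the file.
schema_json = json.dumps(contract_schema, indent=2)
```

Adding a `description` to each property documents what the field is meant to hold, which also makes the schema easier to review.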

API reference

Extract markdown API

This API allows you to extract markdown from the file, including headers, paragraphs, lists, tables and links. The output is human-readable and can be used for further processing or display.

API reference

Extract tables API

This API allows you to extract tables from the file in JSON format. This is useful when you have a document with tabular data that you want to extract and process further. The result is an array of tables, each table including the page number, title (if any), and the table data.
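A result shaped like the description above could be post-processed as in the following sketch, which renders each table as CSV text. The key names `page`, `title`, and `rows`, and the assumption that table data is a list of rows, are hypothetical; the actual field names are given in the API reference.

```python
import csv
import io
from typing import Any, Dict, List, Tuple


def tables_to_csv(tables: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Render each table as CSV text, paired with a readable label.

    Assumes each table is a dict with "page", an optional "title", and
    "rows" (a list of rows, each a list of cell values). These key names
    are placeholders for illustration.
    """
    rendered = []
    for index, table in enumerate(tables, start=1):
        # Fall back to a positional name when the table has no title.
        title = table.get("title") or f"table {index}"
        label = f"page {table['page']}: {title}"
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(table["rows"])
        rendered.append((label, buffer.getvalue()))
    return rendered
```

Returning a list of pairs rather than a dict keeps tables distinct even when two tables on the same page share a title.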

API reference