You can also use file data extraction as a standalone API. Check out Standalone File APIs for more info.

Extracting data from files is a common operation when writing scrapers or building browser automations in general. Normally, this involves writing custom rules and regex to parse and extract data. This process can be error-prone and time-consuming.

At Intuned, we simplify this process by providing a utility that allows you to extract structured data from files. See the function reference for more info.

Examples

// Assumes extractStructuredDataFromFile is already imported from the Intuned SDK;
// see the function reference for the exact import path.
const specPdfs = [
  "https://intuned-docs-public-images.s3.amazonaws.com/27UP600_27UP650_ENG_US.pdf",
  "https://intuned-docs-public-images.s3.amazonaws.com/32UP83A_ENG_US.pdf",
];

for (const url of specPdfs) {
  const specs = await extractStructuredDataFromFile(
    {
      // The file to read: a PDF fetched from a URL.
      type: "pdf",
      source: {
        type: "url",
        data: url,
      },
    },
    {
      label: "spec files",
      // JSON-schema-style description of the data to extract.
      dataSchema: {
        type: "object",
        properties: {
          models: {
            description: "model numbers included in this spec sheet",
            type: "array",
            items: {
              type: "string",
            },
          },
          color_depth: {
            type: "string",
            description: "color depth of the monitor",
          },
          max_resolution: {
            type: "string",
            description: "max resolution of the screen and at what Hz",
          },
        },
        required: ["models", "color_depth", "max_resolution"],
      },
    }
  );
  // Print the extracted object; sample output for each PDF is shown below.
  console.log(specs);
}

// {
//  models: [ '27UP600', '27UP650' ],
//  color_depth: '8-bit / 10-bit color is supported.',
//  max_resolution: '3840 x 2160 @ 60 Hz'
// }
// {
//  models: [ '32UP83A' ],
//  color_depth: '8-bit / 10-bit color is supported.',
//  max_resolution: '3840 x 2160 @ 60 Hz'
// }

For more details, see extractStructuredDataFromFile.
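
Because the result comes from an extraction step, it can be worth checking its shape before using it. The sketch below is one way to do that for the schema in the example above. `MonitorSpecs` and `isMonitorSpecs` are hypothetical names introduced here for illustration; they are not part of the Intuned SDK.

```typescript
// Shape implied by the dataSchema in the example above.
interface MonitorSpecs {
  models: string[];
  color_depth: string;
  max_resolution: string;
}

// Hypothetical helper (not an Intuned API): narrows an unknown value to MonitorSpecs.
function isMonitorSpecs(value: unknown): value is MonitorSpecs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const models = v["models"];
  return (
    Array.isArray(models) &&
    models.every((m) => typeof m === "string") &&
    typeof v["color_depth"] === "string" &&
    typeof v["max_resolution"] === "string"
  );
}

// Sample value copied from the example output above.
const sample: unknown = {
  models: ["27UP600", "27UP650"],
  color_depth: "8-bit / 10-bit color is supported.",
  max_resolution: "3840 x 2160 @ 60 Hz",
};

if (isMonitorSpecs(sample)) {
  // Inside this branch, sample is typed as MonitorSpecs.
  console.log(sample.models.join(", "));
}
```

A guard like this keeps downstream code type-safe even if a field comes back in an unexpected shape.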

How does this work?

In summary, we do the following:

  • Convert the file (selected pages) to markdown.
  • Extract structured data from the markdown using the provided schema.
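
The two steps above can be pictured as a small pipeline. This is only a conceptual sketch, not Intuned's implementation: the `Stages` interface, `extractFromFile`, and the stub stages are all hypothetical, and the stub "extractor" is a trivial line matcher that exists only so the sketch runs end to end.

```typescript
// Illustrative only: every name here is hypothetical, not an Intuned internal.
type Schema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
};

interface Stages {
  // Step 1: turn the selected pages of a file into markdown.
  toMarkdown(pages: string[]): Promise<string>;
  // Step 2: pull schema-shaped data out of that markdown.
  extract(markdown: string, schema: Schema): Promise<Record<string, unknown>>;
}

async function extractFromFile(
  allPages: string[],
  selectedPages: number[], // 1-based page numbers
  schema: Schema,
  stages: Stages
): Promise<Record<string, unknown>> {
  // Keep only the selected pages that actually exist.
  const pages = selectedPages
    .map((n) => allPages[n - 1])
    .filter((p): p is string => p !== undefined);
  const markdown = await stages.toMarkdown(pages);
  const data = await stages.extract(markdown, schema);
  // Fail loudly if a required field is missing from the result.
  for (const key of schema.required ?? []) {
    if (!(key in data)) throw new Error(`missing required field: ${key}`);
  }
  return data;
}

// Stub stages: join pages as-is, then match "key: value" lines for each schema property.
const stubStages: Stages = {
  toMarkdown: async (pages) => pages.join("\n\n"),
  extract: async (markdown, schema) => {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(schema.properties)) {
      const value = markdown.match(new RegExp(`^${key}:\\s*(.+)$`, "m"))?.[1];
      if (value !== undefined) out[key] = value.trim();
    }
    return out;
  },
};

const demoPages = ["# Cover", "max_resolution: 3840 x 2160 @ 60 Hz", "# Warranty"];
const demoSchema: Schema = {
  type: "object",
  properties: { max_resolution: { type: "string" } },
  required: ["max_resolution"],
};

extractFromFile(demoPages, [2], demoSchema, stubStages).then((data) => console.log(data));
```

Selecting pages before conversion means the second step only sees the markdown for the pages you asked for.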

How is the cost for Data Extraction calculated?

  • The cost of converting the file to markdown is based on the number of pages in the file.
  • The cost of extracting structured data from the markdown is based on the size of the input data and the schema used.
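
As a rough mental model, the two cost components can be added together. The sketch below assumes a simple linear model with caller-supplied rates. The function, the `CostRates` fields, the size units, and the numbers in the usage line are all placeholders for illustration; they do not reflect Intuned's actual pricing or formula.

```typescript
// Hypothetical rates supplied by the caller; this document does not state real prices or units.
interface CostRates {
  perPage: number; // cost to convert one page to markdown
  perInputUnit: number; // cost per unit of extraction input (markdown plus schema)
}

// Assumed linear model: page-based conversion cost plus size-based extraction cost.
function estimateExtractionCost(
  pageCount: number,
  markdownSize: number,
  schemaSize: number,
  rates: CostRates
): number {
  const conversionCost = pageCount * rates.perPage;
  const extractionCost = (markdownSize + schemaSize) * rates.perInputUnit;
  return conversionCost + extractionCost;
}

// Placeholder numbers only: 10 pages, 400 units of markdown, 100 units of schema.
const estimate = estimateExtractionCost(10, 400, 100, { perPage: 3, perInputUnit: 2 });
console.log(estimate); // 10 * 3 + (400 + 100) * 2 = 1030
```

Under this model, limiting extraction to the pages you need and keeping the schema compact both reduce the estimate.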