It's very common in scraping workloads to need data that lives in files: contracts, financial statements, product specs, and so on. In this guide, we will show you how to use Intuned to extract data from webpages and files in a reliable and scalable way.
To do that, we will use the page https://sandbox.intuned.dev/pdfs as an example. The page contains a list of products along with a specs file for each of them. Our goal is to build an API that extracts data about each product and returns it.
This guide will not go into the details of setting up a job or sending the result data to a webhook; we cover those in other guides. The focus is the API logic that extracts the data using the @intuned/sdk helpers.
products.ts. This logic uses the extractArrayOptimized helper to extract the monitors info from the table into a monitors object.
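The rows produced by the list extraction usually need light normalization before the file step, since each row carries a link to a specs file. The following is a minimal sketch of that cleanup only; it does not call extractArrayOptimized. The field names `name` and `specsHref`, the helper `normalizeRows`, and the sample path `/files/example.pdf` are all hypothetical and chosen for illustration; the real fields depend on the schema used in the extraction.

```typescript
// Hypothetical shape of one row produced by the list extraction step.
// The field names are assumptions for illustration only.
interface RawProductRow {
  name: string;
  specsHref: string; // link to the specs file, possibly relative
}

interface ProductRow {
  name: string;
  specsUrl: string; // absolute URL, ready for the file extraction step
}

// Trim whitespace, drop rows without a specs link, and resolve relative
// links against the page the rows were extracted from.
function normalizeRows(rows: RawProductRow[], pageUrl: string): ProductRow[] {
  return rows
    .filter((row) => row.specsHref.trim().length > 0)
    .map((row) => ({
      name: row.name.trim(),
      specsUrl: new URL(row.specsHref.trim(), pageUrl).toString(),
    }));
}

// Sample input; the file path is made up for this example.
const normalizedRows = normalizeRows(
  [
    { name: "  Example monitor ", specsHref: "/files/example.pdf" },
    { name: "No specs", specsHref: "" },
  ],
  "https://sandbox.intuned.dev/pdfs",
);
console.log(normalizedRows);
```

Resolving links up front means the later file step only ever sees absolute URLs, regardless of how the page wrote its links.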
Run the API and make sure the extractor is reading the right data and working as expected. Create empty parameters when asked.
products.ts API. Intuned has a helper (extractStructuredDataFromFile) that extracts data from files. The extractStructuredDataFromFile helper takes a file URL and a JSON schema for the data you are trying to extract, and returns the extracted data as a JSON object. To learn more about file data extraction, check out File data extraction.
In this guide, we went over how to extract data from a list of items and then extract data from files. We used the extractArrayOptimized helper to extract the list of items and extractStructuredDataFromFile to extract data from files.
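The file step described above (a file URL and a JSON schema in, a JSON object out) can be sketched as follows. This is an illustration under stated assumptions, not the SDK's actual API: the `FileExtractor` type only mirrors the description in the text, the schema property names (`model`, `screenSize`) are hypothetical, and a stub stands in for extractStructuredDataFromFile so the example runs on its own.

```typescript
// JSON schema describing the fields to pull out of each specs file.
// The property names are hypothetical; adjust them to the real documents.
const specsSchema = {
  type: "object",
  properties: {
    model: { type: "string", description: "Model name or number" },
    screenSize: { type: "string", description: "Screen size as written in the file" },
  },
  required: ["model"],
};

// Assumed shape of a file extractor: file URL and JSON schema in, JSON object out.
// This mirrors the prose description, not the SDK's real signature.
type FileExtractor = (
  fileUrl: string,
  schema: object,
) => Promise<Record<string, unknown>>;

interface ListedProduct {
  name: string;
  specsUrl: string;
}

interface ProductWithSpecs extends ListedProduct {
  specs: Record<string, unknown> | null;
  error?: string;
}

// Run the extractor for each product and merge the result into its row.
// A failed file keeps its row, with specs set to null and the error recorded.
async function attachSpecs(
  products: ListedProduct[],
  extractFile: FileExtractor,
): Promise<ProductWithSpecs[]> {
  const results: ProductWithSpecs[] = [];
  for (const product of products) {
    try {
      const specs = await extractFile(product.specsUrl, specsSchema);
      results.push({ ...product, specs });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ ...product, specs: null, error: message });
    }
  }
  return results;
}

// Stub extractor used only to make the sketch runnable without the SDK.
const stubExtractor: FileExtractor = async (fileUrl) => {
  if (fileUrl.endsWith("broken.pdf")) {
    throw new Error("unreadable file");
  }
  return { model: "stub-model" };
};

const demoRun = attachSpecs(
  [
    { name: "Monitor A", specsUrl: "https://example.com/a.pdf" },
    { name: "Monitor B", specsUrl: "https://example.com/broken.pdf" },
  ],
  stubExtractor,
);
demoRun.then((rows) => console.log(JSON.stringify(rows, null, 2)));
```

Passing the extractor in as a parameter keeps the merge logic testable without network access; in a real API, the stub would be replaced by a thin wrapper around the SDK helper.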
For more info on Jobs and how to use them.