# extractMarkdownFromFile

```typescript
function extractMarkdownFromFile(file, options): Promise<string>
```

Converts a file to markdown (ImageFile or PdfFile).

## Examples

```typescript extractMarkdownFromFile
import { extractMarkdownFromFile } from "@intuned/sdk/ai-extractors";

const markdown = await extractMarkdownFromFile({
  source: {
    type: "url",
    data: ""
  },
  type: "pdf",
  // pages array is optional, do not pass it if you want to include all pages in the process
  pages: [1, 2]
}, { label: "extract_markdown" });

console.log(markdown);
```

## Parameters

• **file**: [`ImageFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageFile) | [`PdfFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/PdfFile) | [`SpreadsheetFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/SpreadsheetFile)

The file you want to extract the markdown content from.

• **options**

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

## Returns

`Promise`\<`string`>

A promise that resolves to the extracted markdown content as a string.

# extractMarkdownFromLocator

```typescript
function extractMarkdownFromLocator(locator): Promise<string>
```

Extracts markdown content from a specific locator within a web page.

## Examples

```typescript extractMarkdownFromLocator
import { extractMarkdownFromLocator } from "@intuned/sdk/ai-extractors";

await page.goto('https://example.com');
const locator = page.locator('.article');
const markdown = await extractMarkdownFromLocator(locator);

console.log(markdown);
```

## Parameters

• **locator**: `Locator`

The Playwright Locator object from which to extract the markdown content.

## Returns

`Promise`\<`string`>

A promise that resolves to the extracted markdown content.

# extractMarkdownFromPage

```typescript
function extractMarkdownFromPage(page): Promise<string>
```

Extracts markdown content from a web page.
## Examples

```typescript extractMarkdownFromPage
import { extractMarkdownFromPage } from "@intuned/sdk/ai-extractors";

await page.goto('https://example.com');
const markdown = await extractMarkdownFromPage(page);

console.log(markdown);
```

## Parameters

• **page**: `Page`

The Playwright Page object from which to extract the markdown content.

## Returns

`Promise`\<`string`>

A promise that resolves to the extracted markdown content.

# extractStructuredDataFromContent

```typescript
function extractStructuredDataFromContent(content, options): Promise<any>
```

Extracts structured data from content items (text or images).

## Examples

```typescript extractStructuredDataFromContent
import { extractStructuredDataFromContent } from "@intuned/sdk/ai-extractors";

const content = [
  { type: "text", data: "Sample text data" },
  { type: "image-url", image_type: "jpeg", data: "https://example.com/image.jpg" }
];

const options = {
  label: "extract_contact_info",
  dataSchema: {
    type: "object",
    properties: {
      name: { type: "string", description: "contact name" },
      phone: { type: "string", description: "contact info" }
    }
  },
  model: "gpt4-turbo"
};

const data = await extractStructuredDataFromContent(content, options);

console.log(data);
```

## Parameters

• **content**: [`TextContentItem`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/TextContentItem) | [`ImageBufferContentItem`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageBufferContentItem) | [`ImageUrlContentItem`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageUrlContentItem) | ([`TextContentItem`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/TextContentItem) | [`ImageBufferContentItem`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageBufferContentItem) | [`ImageUrlContentItem`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageUrlContentItem))\[]

The content items from which to extract the structured data.

• **options**

• **options.dataSchema**: `ObjectSchema`

The JSON schema of the data you're trying to extract.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.model**: \| `"claude-3-opus"` \| `"claude-3-sonnet"` \| `"claude-3.5-sonnet"` \| `"claude-3-haiku"` \| `"gpt4-turbo"` \| `"gpt-4o"` \| `"gpt3.5-turbo"`

The model to use for extraction.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process.

## Returns

`Promise`\<`any`>

A promise that resolves to the extracted structured data.

# extractStructuredDataFromFile

```typescript
function extractStructuredDataFromFile(file, options): Promise<any>
```

## Examples

```typescript extractStructuredDataFromFile
import { extractStructuredDataFromFile } from "@intuned/sdk/ai-extractors";

const movie = await extractStructuredDataFromFile({
  source: {
    type: "url",
    data: ""
  },
  type: "pdf",
  // pages array is optional, do not pass it if you want to include all pages in the process
  pages: [1, 2]
}, {
  label: "extract_movie",
  dataSchema: {
    type: "object",
    properties: {
      "name": {
        type: "string",
        description: "movie name"
      },
      revenue: {
        type: "string",
        description: "movie revenue"
      }
    }
  }
})
```

## Parameters

• **file**: [`ImageFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageFile) | [`PdfFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/PdfFile) | [`SpreadsheetFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/SpreadsheetFile)

The file you want to extract the data from.

• **options**

• **options.dataSchema**: `JsonSchema`

The JSON schema of the data you're trying to extract.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process and provide more context.

• **options.strategy?**: [`MarkdownFileStrategy`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/MarkdownFileStrategy) | [`ImageFileStrategy`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageFileStrategy)

Optional. The strategy to use for extraction. Use `IMAGE` if the information you're trying to extract is visual and cannot be converted to markdown. Defaults to the `MARKDOWN` strategy with the `gpt4-turbo` model.

## Returns

`Promise`\<`any`>

# extractStructuredDataFromLocator

```typescript
function extractStructuredDataFromLocator(locator, options): Promise<any>
```

Extracts structured data from a specific locator within a web page.

## Examples

```typescript extractStructuredDataFromLocator
import { extractStructuredDataFromLocator } from "@intuned/sdk/ai-extractors";

await page.goto('https://example.com');

const options = {
  label: "extract_locator_data",
  dataSchema: {
    type: "object",
    properties: {
      title: { type: "string", description: "The title of the page" },
      date: { type: "string", description: "The date of the content" }
    }
  },
};

const data = await extractStructuredDataFromLocator(page.locator(".section"), options);

console.log(data);
```

## Parameters

• **locator**: `Locator`

The Playwright locator from which to extract the structured data.

• **options**

• **options.dataSchema**: `JsonSchema`

The JSON schema of the data you're trying to extract.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process and provide more context.

• **options.strategy?**: [`ImageStrategy`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageStrategy) | [`HtmlStrategy`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/HtmlStrategy)

Optional.
The strategy to use for extraction. Use the `IMAGE` strategy if the information you're trying to extract is visual and does not exist in the HTML of the page.

## Returns

`Promise`\<`any`>

A promise that resolves to the extracted structured data.

# extractStructuredDataFromPage

```typescript
function extractStructuredDataFromPage(page, options): Promise<any>
```

Extracts structured data from a web page.

## Examples

```typescript extractStructuredDataFromPage
import { extractStructuredDataFromPage } from "@intuned/sdk/ai-extractors";

await page.goto('https://example.com');

const options = {
  label: "extract_page_data",
  dataSchema: {
    type: "object",
    properties: {
      title: { type: "string", description: "The title of the page" },
      date: { type: "string", description: "The date of the content" }
    }
  },
};

const data = await extractStructuredDataFromPage(page, options);

console.log(data);
```

## Parameters

• **page**: `Page`

The Playwright Page from which to extract the structured data.

• **options**

• **options.dataSchema**: `JsonSchema`

The JSON schema of the data you're trying to extract.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process and provide more context.

• **options.strategy?**: [`ImageStrategy`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageStrategy) | [`HtmlStrategy`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/HtmlStrategy)

Optional. The strategy to use for extraction. Use the `IMAGE` strategy if the information you're trying to extract is visual and does not exist in the HTML of the page.

## Returns

`Promise`\<`any`>

A promise that resolves to the extracted structured data.

# extractTablesFromFile

```typescript
function extractTablesFromFile(file, options): Promise<ExtractedTable[]>
```

Extracts tables from a file (ImageFile or PdfFile).
## Examples

```typescript extractTablesFromFile
import { extractTablesFromFile } from "@intuned/sdk/ai-extractors";

const tables = await extractTablesFromFile({
  source: {
    type: "url",
    data: ""
  },
  type: "pdf",
  // pages array is optional, do not pass it if you want to include all pages in the process
  pages: [1, 2]
}, { label: "extract_tables" });

console.log(tables);
```

## Parameters

• **file**: [`ImageFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ImageFile) | [`PdfFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/PdfFile) | [`SpreadsheetFile`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/SpreadsheetFile)

The file you want to extract the tables from.

• **options**

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

## Returns

`Promise`\<[`ExtractedTable`](/automation-sdks/intuned-sdk/ai-extractors/interfaces/ExtractedTable)\[]>

A promise that resolves to an array of extracted tables.

# ExtractedTable

Represents a table extracted from a PDF file.

## Properties

### content

```typescript
content: (null | string)[][];
```

A two-dimensional array containing the table values.

***

### title

```typescript
title: null | string;
```

The title of the table, if found.

# FileBase64Source

Represents a file source from a base64 string.

## Properties

### data

```typescript
data: string;
```

The base64 string of the file data.

***

### type

```typescript
type: "base64";
```

The type of the file source, which is always "base64".

# FileBufferSource

Represents a file source from a buffer.

## Properties

### data

```typescript
data: Buffer;
```

The buffer data of the file.

***

### type

```typescript
type: "buffer";
```

The type of the file source, which is always "buffer".

# FileUrlSource

Represents a file source from a URL.

## Properties

### data

```typescript
data: string;
```

The URL of the file.

***

### type

```typescript
type: "url";
```

The type of the file source, which is always "url".
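The three source shapes (`FileBase64Source`, `FileBufferSource`, `FileUrlSource`) share a literal `type` field, so they can be handled as a discriminated union. The sketch below is only an illustration of that pattern: the types are re-declared locally so the snippet stands alone (with `Uint8Array` standing in for `Buffer`), and `toFileSource` and `sourceByteLength` are hypothetical helpers, not part of the SDK.

```typescript
// Local mirrors of the documented source shapes, declared here so the sketch
// compiles without the SDK. `Uint8Array` stands in for Node's `Buffer`, which
// is a `Uint8Array` subclass.
type FileBase64Source = { type: "base64"; data: string };
type FileBufferSource = { type: "buffer"; data: Uint8Array };
type FileUrlSource = { type: "url"; data: string };
type FileSource = FileBufferSource | FileUrlSource | FileBase64Source;

// Hypothetical helper (not part of the SDK): wraps raw input in a source object.
function toFileSource(input: string | Uint8Array): FileSource {
  if (typeof input !== "string") {
    return { type: "buffer", data: input };
  }
  if (/^https?:\/\//i.test(input)) {
    return { type: "url", data: input };
  }
  // Assumption: any other string is already base64-encoded file data.
  return { type: "base64", data: input };
}

// Hypothetical helper: payload size in bytes, narrowing on the `type` field.
function sourceByteLength(source: FileSource): number | undefined {
  switch (source.type) {
    case "buffer":
      return source.data.length;
    case "base64": {
      // Assumes standard padded base64: 3 bytes per 4 characters, minus padding.
      const padding = source.data.endsWith("==") ? 2 : source.data.endsWith("=") ? 1 : 0;
      return (source.data.length / 4) * 3 - padding;
    }
    case "url":
      return undefined; // unknown until the file is fetched
  }
}

console.log(toFileSource("https://example.com/file.pdf").type); // "url"
console.log(sourceByteLength(toFileSource("aGVsbG8=")));        // 5
```

Switching on `type` lets the compiler narrow `data` to the right shape in each branch, which is why the buffer branch can read `length` directly.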
# HtmlStrategy

This strategy uses the HTML of the page/locator to extract the needed data. Some attributes are filtered out to reduce context; only the following attributes are included: `aria-label`, `data-name`, `name`, `type`, `placeholder`, `value`, `role`, `title`, `href`, `id`, `alt`.

## Properties

### model

```typescript
model:
  | "claude-3-opus"
  | "claude-3-sonnet"
  | "claude-3.5-sonnet"
  | "claude-3-haiku"
  | "gpt4-turbo"
  | "gpt-4o"
  | "gpt3.5-turbo";
```

The model to use in the extraction process.

***

### type

```typescript
type: "HTML";
```

The type of the strategy.

# ImageFile

Represents an image file source.

## Properties

### source

```typescript
source: FileBufferSource | FileUrlSource | FileBase64Source;
```

The source of the file data.

***

### type

```typescript
type: "image";
```

The type of the file, which is always "image".

# ImageStrategy

This strategy uses a screenshot of the page/locator, with some processing, to extract the needed data. Use it when the information you're trying to extract is not present in the DOM as text but can be identified visually.

## Properties

### model

```typescript
model:
  | "claude-3-opus"
  | "claude-3-sonnet"
  | "claude-3.5-sonnet"
  | "claude-3-haiku"
  | "gpt4-turbo"
  | "gpt-4o";
```

The model to use in the extraction process.

***

### type

```typescript
type: "IMAGE";
```

The type of the strategy.

# PdfFile

Represents a PDF file source.

## Properties

### pages?

```typescript
optional pages: number[];
```

Optional. The specific pages of the PDF to extract data from. If not provided, all pages are included.

***

### source

```typescript
source: FileBufferSource | FileUrlSource | FileBase64Source;
```

The source of the file data.

***

### type

```typescript
type: "pdf";
```

The type of the file, which is always "pdf".

# ExcelFile

Represents an Excel file and provides methods to interact with it.

## Examples

```typescript ExcelFile
import { ExcelFile } from "@intuned/sdk/files"

const excelFile = new ExcelFile(excelBuffer);
...
```

## Constructors

### new ExcelFile()

```typescript
new ExcelFile(data): ExcelFile
```

Creates an instance of ExcelFile.

#### Parameters

• **data**: `Buffer`

The binary data of the Excel file.

#### Returns

[`ExcelFile`](/automation-sdks/intuned-sdk/files/classes/ExcelFile)

## Methods

### getContent()

```typescript
getContent(sheetNames?): Promise<ExcelFileSheet[]>
```

Gets the content of specified sheets in the Excel file.

#### Parameters

• **sheetNames?**: `string`\[]

Optional. An array of sheet names to get content from.

#### Returns

`Promise`\<[`ExcelFileSheet`](/automation-sdks/intuned-sdk/files/interfaces/ExcelFileSheet)\[]>

A promise that resolves to the content of the specified sheets.

#### Examples

```typescript getContent
import { ExcelFile } from "@intuned/sdk/files"

const content = await excel.getContent(['Sheet1', 'Sheet2']);
console.log(content);
```

***

### fromUrl()

```typescript
static fromUrl(url): Promise<ExcelFile>
```

Creates an ExcelFile instance from a URL.

#### Parameters

• **url**: `string`

The URL of the Excel file.

#### Returns

`Promise`\<[`ExcelFile`](/automation-sdks/intuned-sdk/files/classes/ExcelFile)>

A promise that resolves to an ExcelFile instance.

#### Examples

```typescript fromUrl
import { ExcelFile } from "@intuned/sdk/files"

const excel = await ExcelFile.fromUrl('https://example.com/file.xlsx');
console.log(excel);
```

# PdfFile

Represents a PDF file and provides methods to interact with it.

## Examples

```typescript PdfFile
import { PdfFile } from "@intuned/sdk/files"

const pdf = new PdfFile(pdfBuffer);
```

## Constructors

### new PdfFile()

```typescript
new PdfFile(data): PdfFile
```

Creates an instance of PdfFile.

#### Parameters

• **data**: `Buffer`

The binary data of the PDF file.

#### Returns

[`PdfFile`](/automation-sdks/intuned-sdk/files/classes/PdfFile)

## Methods

### getContent()

```typescript
getContent(pageNumbers?): Promise<PdfFileContentItem[]>
```

Gets the text content of specified pages in the PDF file. Does not support links.
#### Parameters

• **pageNumbers?**: `number`\[]

Optional. An array of page numbers to get content from.

#### Returns

`Promise`\<[`PdfFileContentItem`](/automation-sdks/intuned-sdk/files/interfaces/PdfFileContentItem)\[]>

A promise that resolves to the content of the specified pages.

#### Examples

```typescript getContent
import { PdfFile } from "@intuned/sdk/files"

const pdf = await PdfFile.fromUrl('https://example.com/file.pdf');
const content = await pdf.getContent([1, 2, 3]);
console.log(content);
```

***

### pagesCount()

```typescript
pagesCount(): Promise<number>
```

Gets the total number of pages in the PDF file.

#### Returns

`Promise`\<`number`>

A promise that resolves to the number of pages.

#### Examples

```typescript pagesCount
import { PdfFile } from "@intuned/sdk/files"

const pdf = await PdfFile.fromUrl('https://example.com/file.pdf');
const pageCount = await pdf.pagesCount();
console.log(pageCount);
```

***

### search()

```typescript
search(search, options?): Promise<SearchPdfResult[]>
```

Searches for a string within the PDF file.

#### Parameters

• **search**: `string`

The string to search for.

• **options?**: [`SearchPdfConfigs`](/automation-sdks/intuned-sdk/files/interfaces/SearchPdfConfigs)

Optional. Search configuration options.

#### Returns

`Promise`\<[`SearchPdfResult`](/automation-sdks/intuned-sdk/files/interfaces/SearchPdfResult)\[]>

A promise that resolves to an array of search results.

#### Examples

```typescript Without options
import { PdfFile } from "@intuned/sdk/files"

const pdf = await PdfFile.fromUrl('https://example.com/file.pdf');
const results = await pdf.search('keyword');
console.log(results);
```

```typescript With options
import { PdfFile } from "@intuned/sdk/files"

const pdf = await PdfFile.fromUrl('https://example.com/file.pdf');
const results = await pdf.search('keyword', { matchCase: true });
console.log(results);
```

***

### fromUrl()

```typescript
static fromUrl(url): Promise<PdfFile>
```

Creates a PdfFile instance from a URL.
#### Parameters

• **url**: `string`

The URL of the PDF file.

#### Returns

`Promise`\<[`PdfFile`](/automation-sdks/intuned-sdk/files/classes/PdfFile)>

A promise that resolves to a PdfFile instance.

#### Examples

```typescript fromUrl
import { PdfFile } from "@intuned/sdk/files"

const pdf = await PdfFile.fromUrl('https://example.com/file.pdf');
```

# downloadFile

```typescript
function downloadFile(page, strategy): Promise<Download>
```

Downloads a file using the specified strategy.

## Examples

```typescript DirectLink
import { downloadFile } from "@intuned/sdk/files";

// use DirectLink strategy when you have the url of the pdf.
const download = await downloadFile(page, {
  type: "DirectLink",
  link: "https://www.gemini.com/documents/credit/Test_PDF.pdf"
});

console.log(await download.path()); // Outputs the file path
```

```typescript DownloadByOpeningNewTab
import { downloadFile } from "@intuned/sdk/files";

await page.goto("https://sandbox.intuned.dev/pdfs")
// use DownloadByOpeningNewTab strategy when you have to click on a button to open the pdf in a new tab in the browser viewer.
const downloadedFile = await downloadFile(page, {
  type: "DownloadByOpeningNewTab",
  downloadTrigger: (page) => page.locator("table > tbody > tr:nth-child(1) > td:nth-child(4) > a").click()
})

console.log(await downloadedFile.path()); // Outputs the file path
```

```typescript DownloadFromDirectLink
import { downloadFile } from "@intuned/sdk/files";

await page.goto("https://freetestdata.com/document-files/pdf/")
// use DownloadFromDirectLink strategy when the file gets downloaded immediately after you trigger an action on the page.
const downloadedFile = await downloadFile(page, {
  type: "DownloadFromDirectLink",
  downloadTrigger: page.locator('.elementor-button').first()
})

console.log(await downloadedFile.path());
```

```typescript NavigateAndDownloadFromThirdPartyFileViewer
import { downloadFile } from "@intuned/sdk/files";

// use NavigateAndDownloadFromThirdPartyFileViewer strategy when the file is viewed in a custom (non-standard) viewer.
const downloadedFile = await downloadFile(page, {
  type: "NavigateAndDownloadFromThirdPartyFileViewer",
  linkToGoTo: "https://txdir.widen.net/view/pdf/ki9p5mluhv/DIR-CPO-4582-RFO-DIR-CPO-TMP-445.pdf?t.download=true&u=tmwul0",
  downloadActionOnFileViewerPage: (page) => page.locator("#download").click()
})

console.log(await downloadedFile.path());
```

```typescript PrintPageAsPdf
import { downloadFile } from "@intuned/sdk/files";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
// use PrintPageAsPdf strategy when you download a pdf version of the open webpage.
const downloadedFile = await downloadFile(page, {
  type: "PrintPageAsPdf",
})

console.log(await downloadedFile.path());
```

## Parameters

• **page**: `Page`

The Playwright Page object.

• **strategy**: [`PersistFileStrategy`](/automation-sdks/intuned-sdk/files/type-aliases/PersistFileStrategy)

The strategy to use for downloading the file.

## Returns

`Promise`\<[`Download`](/automation-sdks/intuned-sdk/files/interfaces/Download)>

A promise that resolves to a Download object.

# uploadFileToS3

```typescript
function uploadFileToS3(file, options): Promise<File>
```

Uploads a file to an S3 bucket.
## Examples

```typescript Download
import { downloadFile, uploadFileToS3, S3Configs } from "@intuned/sdk/files";

const download = await downloadFile(page, {
  type: "DownloadByOpeningNewTab",
  downloadTrigger: (page) => page.locator(".download_button").click(),
});

const s3Configs: S3Configs = {
  bucket: 'my-bucket',
  region: 'us-west-1',
  accessKeyId: '....',
  secretAccessKey: '....'
};

const uploadedFile = await uploadFileToS3(download, { s3Configs });
console.log(uploadedFile.urlDescriptor());
```

```typescript ReadStream
import { uploadFileToS3, S3Configs } from "@intuned/sdk/files";
import { ReadStream } from "node:fs";

const file: ReadStream = ...; // Assume ReadStream is initialized

const s3Configs: S3Configs = {
  bucket: 'my-bucket',
  region: 'us-west-1',
  accessKeyId: '....',
  secretAccessKey: '....'
};

const uploadedFile = await uploadFileToS3(file, { s3Configs });
console.log(uploadedFile.urlDescriptor());
```

## Parameters

• **file**: \| `string` \| `Uint8Array` \| `Buffer` \| `ReadStream` \| [`Download`](/automation-sdks/intuned-sdk/files/interfaces/Download)

The file to upload. It can be a file downloaded by the `downloadFile` function or other content of type `Download | string | Uint8Array | Buffer | ReadStream`.

• **options**

The options for uploading the file.

• **options.fileNameOverride?**: `string`

Optional. Override for the file name.

• **options.s3Configs?**: [`S3Configs`](/automation-sdks/intuned-sdk/files/interfaces/S3Configs)

Optional. S3 configuration options.

## Returns

`Promise`\<[`File`](/automation-sdks/intuned-sdk/files/interfaces/File)>

A promise that resolves to a File object.

# Download

Represents a downloaded file.

## Properties

### delete()

```typescript
delete: () => Promise<void>;
```

Deletes the downloaded file.

#### Returns

`Promise`\<`void`>

***

### path()

```typescript
path: () => Promise<null | string>;
```

Gets the path of the downloaded file.
#### Returns

`Promise`\<`null` | `string`>

***

### suggestedFilename()

```typescript
suggestedFilename: () => undefined | string;
```

Returns the suggested filename for this download. It is typically computed by the browser from the `Content-Disposition` response header or the download attribute. See the spec on [whatwg](https://html.spec.whatwg.org/#downloading-resources). Different browsers can use different logic for computing it. When the file is downloaded using `DirectLink` or `PrintPageAsPdf`, this always returns `undefined`.

#### Returns

`undefined` | `string`

# ExcelFileSheet

Represents the content of a sheet in an Excel file.

## Properties

### content

```typescript
content: (undefined | string | number | Date)[][];
```

The content of the sheet.

***

### name

```typescript
name: undefined | string;
```

The name of the sheet.

# File

Represents an S3 file and provides functions to operate on it.

## Methods

### generateSignedUrl()

```typescript
generateSignedUrl(options?): Promise<string>
```

Generates a signed URL for the file.

#### Examples

```typescript generateSignedUrl
import { File } from "@intuned/sdk/files";

const signedUrl = await file.generateSignedUrl({ expiresIn: 1000 });
console.log(signedUrl);
```

#### Parameters

• **options?**

Optional. Options for generating the signed URL.

• **options.expiresIn?**: `number`

The expiration time for the signed URL in seconds.

#### Returns

`Promise`\<`string`>

A promise that resolves to the signed URL.

***

### urlDescriptor()

```typescript
urlDescriptor(): string
```

Gets the S3 URL descriptor of the file.

#### Returns

`string`

The URL descriptor of the file.

# PdfFileContentItem

Represents the content of a PDF file.

## Properties

### content

```typescript
content: string;
```

The content of the page.

***

### pageNumber

```typescript
pageNumber: number;
```

The page number of the content.
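As a rough illustration of working with `PdfFileContentItem` values, the sketch below joins per-page content into a single string ordered by `pageNumber`. The interface is re-declared locally so the snippet stands alone, and `joinPages` is a hypothetical helper, not part of the SDK. It assumes the items may arrive in any page order.

```typescript
// Local mirror of the documented interface, so the sketch compiles on its own.
interface PdfFileContentItem {
  content: string;
  pageNumber: number;
}

// Hypothetical helper (not part of the SDK): concatenates page text in page
// order, skipping pages whose content is empty or whitespace only.
function joinPages(items: PdfFileContentItem[], separator = "\n\n"): string {
  return [...items]
    .sort((a, b) => a.pageNumber - b.pageNumber)
    .map((item) => item.content.trim())
    .filter((text) => text.length > 0)
    .join(separator);
}

const sample: PdfFileContentItem[] = [
  { pageNumber: 2, content: "Second page " },
  { pageNumber: 1, content: "First page" },
  { pageNumber: 3, content: "   " },
];

console.log(joinPages(sample, " | ")); // prints "First page | Second page"
```

Copying the array before sorting keeps the caller's array untouched.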
# S3Configs

Configuration for connecting to an S3 bucket.

## Properties

### accessKeyId

```typescript
accessKeyId: string;
```

The AWS access key ID.

***

### bucket

```typescript
bucket: string;
```

The S3 bucket name.

***

### region

```typescript
region: string;
```

The AWS region.

***

### secretAccessKey

```typescript
secretAccessKey: string;
```

The AWS secret access key.

# SearchPdfConfigs

Configuration options for searching within a PDF file.

## Properties

### contextWindow?

```typescript
optional contextWindow: number;
```

Optional. Number of context letters around the search term to return.

***

### matchCase?

```typescript
optional matchCase: boolean;
```

Optional. Whether to match case during the search.

***

### wholeWord?

```typescript
optional wholeWord: boolean;
```

Optional. Whether to match whole words only.

# SearchPdfResult

Represents a search result within a PDF file.

## Properties

### context

```typescript
context: string;
```

The context around the search term.

***

### page

```typescript
page: number;
```

The page number where the search term was found.

# PersistFileStrategy

```typescript
type PersistFileStrategy: 
  | object
  | object
  | object
  | object
  | object;
```

## Strategies:

* `DownloadByOpeningNewTab`: use this strategy when the file you want to download opens in a new tab after you perform an action on the page.
* `DownloadFromDirectLink`: use this strategy when a button or other action on the page causes the file to download automatically in the same tab.
* `NavigateAndDownloadFromThirdPartyFileViewer`: use this strategy when the file is displayed in a custom (non-standard) viewer.
* `DirectLink`: use this strategy when you have the file URL.
* `PrintPageAsPdf`: use this strategy when you want to download a PDF version of the open web page.

# extractArrayFromLocator

```typescript
function extractArrayFromLocator(locator, options): Promise<Record<string, string>[]>
```

Extracts an array of structured data from a locator.
## Examples

```typescript extractArrayFromLocator
import { extractArrayFromLocator } from "@intuned/sdk/optimized-extractors";

await page.goto("https://books.toscrape.com/")
const books = await extractArrayFromLocator(page.locator("section"), {
  itemEntityName: "book",
  label: "books-extraction",
  itemEntitySchema: {
    type: "object",
    required: ["name"],
    properties: {
      name: {
        type: "string",
        description: "book name",
        primary: true
      }
    }
  }
},
)

console.log(books)

// output:
// [
//   ...
//   { name: 'Olio' },
//   { name: 'Mesaerion: The Best Science Fiction Stories 1800-1849' },
//   { name: 'Libertarianism for Beginners' },
//   { name: "It's Only the Himalayas" }
//   ...
// ]
```

## Parameters

• **locator**: `Locator`

The Playwright Locator object from which to extract the data.

• **options**

• **options.itemEntityName**: `string`

The name of the entity items being extracted. It must be between 1 and 50 characters long and can only contain letters, digits, periods, underscores, and hyphens.

• **options.itemEntitySchema**: [`SimpleArrayItemSchema`](/automation-sdks/intuned-sdk/optimized-extractors/interfaces/SimpleArrayItemSchema)

The schema of the entity items being extracted.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.optionalPropertiesInvalidator?**

Optional. A function to invalidate optional properties.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process.

• **options.strategy?**: `ImageStrategy` | `HtmlStrategy`

Optional. The strategy to use for extraction. If not provided, the HTML strategy with Claude Haiku is used.

• **options.variantKey?**: `string`

Optional. A variant key for the extraction process.

## Returns

`Promise`\<`Record`\<`string`, `string`>\[]>

A promise that resolves to a list of extracted data.
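The naming rule for `itemEntityName` (1 to 50 characters; letters, digits, periods, underscores, and hyphens only) can be checked before calling the extractor. The sketch below is one possible client-side check, not the SDK's own validation: `isValidEntityName` is a hypothetical helper, and it assumes "letters" and "digits" mean ASCII characters.

```typescript
// Assumption: "letters" and "digits" mean ASCII letters and digits; the SDK's
// actual validation may differ.
const ENTITY_NAME_PATTERN = /^[A-Za-z0-9._-]{1,50}$/;

// Hypothetical helper (not part of the SDK): true when the name satisfies the
// documented length and character constraints.
function isValidEntityName(name: string): boolean {
  return ENTITY_NAME_PATTERN.test(name);
}

console.log(isValidEntityName("book"));         // true
console.log(isValidEntityName("book listing")); // false: contains a space
console.log(isValidEntityName(""));             // false: shorter than 1 character
console.log(isValidEntityName("a".repeat(51))); // false: longer than 50 characters
```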
# extractArrayFromPage

```typescript
function extractArrayFromPage(page, options): Promise<Record<string, string>[]>
```

Extracts an array of structured data from a web page in an optimized way. This function uses AI for the first n calls, until it has collected multiple examples; it then builds reliable selectors in the background to make the process more efficient.

## Examples

```typescript extractArrayFromPage
import { extractArrayFromPage } from "@intuned/sdk/optimized-extractors";

await page.goto("https://books.toscrape.com/")
const books = await extractArrayFromPage(page, {
  strategy: {
    model: "gpt4-turbo",
    type: "HTML"
  },
  itemEntityName: "book",
  label: "books-extraction",
  itemEntitySchema: {
    type: "object",
    required: ["name"],
    properties: {
      name: {
        type: "string",
        description: "book name",
        primary: true
      }
    }
  }
},
)

console.log(books)

// output:
// [
//   ...
//   { name: 'Olio' },
//   { name: 'Mesaerion: The Best Science Fiction Stories 1800-1849' },
//   { name: 'Libertarianism for Beginners' },
//   { name: "It's Only the Himalayas" }
//   ...
// ]
```

## Parameters

• **page**: `Page`

The Playwright Page object from which to extract the data.

• **options**

• **options.itemEntityName**: `string`

The name of the entity items being extracted. It must be between 1 and 50 characters long and can only contain letters, digits, periods, underscores, and hyphens.

• **options.itemEntitySchema**: [`SimpleArrayItemSchema`](/automation-sdks/intuned-sdk/optimized-extractors/interfaces/SimpleArrayItemSchema)

The schema of the entity items being extracted.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.optionalPropertiesInvalidator?**

Optional. A function to invalidate optional properties.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process.

• **options.strategy?**: `ImageStrategy` | `HtmlStrategy`

Optional. The strategy to use for extraction. If not provided, the HTML strategy with Claude Haiku is used.

• **options.variantKey?**: `string`

Optional. A variant key for the extraction process. Use this when the page has multiple variants/shapes.

## Returns

`Promise`\<`Record`\<`string`, `string`>\[]>

A promise that resolves to a list of extracted data.

# extractObjectFromLocator

```typescript
function extractObjectFromLocator(locator, options): Promise<Record<string, string | null> | null>
```

Extracts a structured object from a locator.

## Examples

```typescript extractObjectFromLocator
import { extractObjectFromLocator } from "@intuned/sdk/optimized-extractors";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
const book = await extractObjectFromLocator(page.locator(".page_inner"), {
  entityName: "book",
  label: "book-extraction",
  entitySchema: {
    type: "object",
    required: ["name","price","reviews"],
    properties: {
      name: {
        type: "string",
        description: "book name",
      },
      price: {
        type: "string",
        description: "book price"
      },
      reviews: {
        type: "string",
        description: "Number of reviews"
      }
    }
  }
},
)

console.log(book)

// output:
// { name: 'A Light in the Attic', price: '£51.77', reviews: '0' }
```

## Parameters

• **locator**: `Locator`

The Playwright Locator object from which to extract the data.

• **options**

• **options.entityName**: `string`

The name of the entity being extracted. It must be between 1 and 50 characters long and can only contain letters, digits, periods, underscores, and hyphens.

• **options.entitySchema**: [`SimpleObjectSchema`](/automation-sdks/intuned-sdk/optimized-extractors/interfaces/SimpleObjectSchema)

The schema of the entity being extracted.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.optionalPropertiesInvalidator?**

Optional. A function to invalidate optional properties.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process.

• **options.strategy?**: `ImageStrategy` | `HtmlStrategy`

Optional.
The strategy to use for extraction. If not provided, the HTML strategy with Claude Haiku is used.

• **options.variantKey?**: `string`

Optional. A variant key for the extraction process.

## Returns

`Promise`\<`Record`\<`string`, `string` | `null`> | `null`>

A promise that resolves to the extracted object.

# extractObjectFromPage

```typescript
function extractObjectFromPage(page, options): Promise<Record<string, string | null> | null>
```

Extracts a structured object from a web page.

## Examples

```typescript extractObjectFromPage
import { extractObjectFromPage } from "@intuned/sdk/optimized-extractors";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
const book = await extractObjectFromPage(page, {
  entityName: "book",
  label: "book-extraction",
  entitySchema: {
    type: "object",
    required: ["name","price","reviews"],
    properties: {
      name: {
        type: "string",
        description: "book name",
      },
      price: {
        type: "string",
        description: "book price"
      },
      reviews: {
        type: "string",
        description: "Number of reviews"
      }
    }
  }
},
)

console.log(book)

// output:
// { name: 'A Light in the Attic', price: '£51.77', reviews: '0' }
```

## Parameters

• **page**: `Page`

The Playwright Page object from which to extract the data.

• **options**

• **options.entityName**: `string`

The name of the entity being extracted. It must be between 1 and 50 characters long and can only contain letters, digits, periods, underscores, and hyphens.

• **options.entitySchema**: [`SimpleObjectSchema`](/automation-sdks/intuned-sdk/optimized-extractors/interfaces/SimpleObjectSchema)

The schema of the entity being extracted.

• **options.label**: `string`

A label for this extraction process, used for billing and monitoring.

• **options.optionalPropertiesInvalidator?**

Optional. A function to invalidate optional properties.

• **options.prompt?**: `string`

Optional. A prompt to guide the extraction process.

• **options.strategy?**: `ImageStrategy` | `HtmlStrategy`

Optional.
The strategy to use for extraction, if not provided, the html strategy with claude haiku will be used. • **options.variantKey?**: `string` Optional. A variant key for the extraction process. ## Returns `Promise`\<`Record`\<`string`, `string` | `null`> | `null`> A promise that resolves to the extracted object. # SimpleArrayItemSchema A simple array item schema with properties. SimpleArrayItemSchema ## Extends * `BasicSchema` ## Properties ### description? ```typescript optional description: string; ``` #### Inherited from `BasicSchema.description` *** ### properties ```typescript properties: Record; ``` The properties of the array item. *** ### required ```typescript required: string[]; ``` The required properties of the array item. *** ### type ```typescript type: "object"; ``` The type of the schema, which is always "object". #### Overrides `BasicSchema.type` # SimpleArrayStringSchema A simple array schema with string properties. SimpleArrayStringSchema ## Extends * `BasicSchema` ## Properties ### description? ```typescript optional description: string; ``` #### Inherited from `BasicSchema.description` *** ### primary? ```typescript optional primary: boolean; ``` Optional. Indicates whether this is a primary property. *** ### type ```typescript type: "string"; ``` The type of the schema, which is always "string". #### Overrides `BasicSchema.type` # SimpleObjectSchema A simple object schema with properties. SimpleObjectSchema ## Extends * `BasicSchema` ## Properties ### description? ```typescript optional description: string; ``` #### Inherited from `BasicSchema.description` *** ### properties ```typescript properties: Record; ``` The properties of the object. *** ### required ```typescript required: string[]; ``` The required properties of the object. *** ### type ```typescript type: "object"; ``` The type of the schema, which is always "object". #### Overrides `BasicSchema.type` # SimpleObjectStringSchema A simple object schema with string properties. 
## Extends

* `BasicSchema`

## Properties

### description?

```typescript
optional description: string;
```

#### Inherited from

`BasicSchema.description`

***

### type

```typescript
type: "string";
```

The type of the schema, which is always "string".

#### Overrides

`BasicSchema.type`

# Overview

## Introduction

The @intuned/sdk is automatically installed in each new Intuned project and provides a set of tools for browser automation.

## Namespaces and Functions

The library is organized into namespaces, each exposing functions that address a different aspect of browser automation:

* `@intuned/sdk/ai-extractors`: Utilities for data and markdown extraction using AI.
* `@intuned/sdk/optimized-extractors`: Tools for building and running web extractors reliably at scale.
* `@intuned/sdk/playwright`: Additional helpers built on top of Playwright that simplify common automation patterns.
* `@intuned/sdk/runtime`: Functions related to the Intuned runtime environment.
* `@intuned/sdk/files`: Utilities for file handling within automation projects.

## AI Credits Usage

Some functions within the @intuned/sdk consume AI credits. This usage is expected, and users should plan accordingly. To manage credit expenditure, use labels to control and monitor usage.

## Use outside Intuned

The @intuned/sdk is designed to be used within the Intuned platform. [Let us know](/docs/support/contact-us) if you have a use case for using it outside of Intuned.

# extendPlaywrightPage

```typescript
function extendPlaywrightPage(page): ExtendedPlaywrightPage
```

Extends a Playwright Page with additional functionality from Intuned, such as AI-powered data extraction and action helpers like `fillForm`.
## Examples ```typescript extendPlaywrightPage import { BrowserContext, Page } from "@intuned/playwright-core"; import { extendPlaywrightPage } from "@intuned/sdk/playwright"; interface Params { // Add your params here } export default async function handler( params: Params, _playwrightPage: Page, context: BrowserContext ) { const page = extendPlaywrightPage(_playwrightPage); const pageMarkdown = await page.extractMarkdown() return pageMarkdown } ``` ## Parameters • **page**: `Page` The Playwright Page to extend. ## Returns [`ExtendedPlaywrightPage`](/automation-sdks/intuned-sdk/playwright/interfaces/ExtendedPlaywrightPage) An extended Page with additional functionalities. # extractArrayFromPageUsingSelectors ```typescript function extractArrayFromPageUsingSelectors(page, listExtractor): Promise> ``` Extracts a list of objects from a web page using the specified static selectors. ## Type parameters • **T** *extends* [`ListStaticExtractor`](/automation-sdks/intuned-sdk/playwright/interfaces/ListStaticExtractor) ## Examples ```typescript extractArrayFromPageUsingSelectors import { extractArrayFromPageUsingSelectors, goto } from "@intuned/sdk/playwright"; await goto(page, 'https://books.toscrape.com/index.html'); const books = await extractArrayFromPageUsingSelectors(page, { containerSelector: { selector: '//*[@id="default"]/div/div/div/div/section/div[2]/ol', type: "xpath" }, propertySelectors: { name: { selector: "h3", }, inStock: { selector: ".price_color", }, imgUrl: { selector: "article > div.image_container > a > img", selectionMethod: { propertyName: "src" } } } }) console.log(books) // output: // [ // { // name: 'A Light in the ...', // inStock: '£51.77', // imgUrl: 'media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg' // }, // { // name: 'Tipping the Velvet', // inStock: '£53.74', // imgUrl: 'media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg' // }, // { // name: 'Soumission', // inStock: '£50.10', // imgUrl: 
'media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg' // }, // { // name: 'Sharp Objects', // inStock: '£47.82', // imgUrl: 'media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg' // }, // ... // ] ``` ## Parameters • **page**: `Page` The Playwright Page object from which to extract the data. • **listExtractor**: `T` The list static extractor with the selectors to use. ## Returns `Promise`\<[`ExtractListObjectsUsingStaticSelectorsReturnType`](/automation-sdks/intuned-sdk/playwright/type-aliases/ExtractListObjectsUsingStaticSelectorsReturnType)\<`T`>> A promise that resolves to the extracted list of objects. # extractObjectFromPageUsingSelectors ```typescript function extractObjectFromPageUsingSelectors(page, extractor): Promise> ``` Extracts an object from a web page using the specified selectors. ## Type parameters • **T** *extends* [`ObjectExtractor`](/automation-sdks/intuned-sdk/playwright/type-aliases/ObjectExtractor) ## Examples ```typescript extractObjectFromPageUsingSelectors import { extractObjectFromPageUsingSelectors, goto } from "@intuned/sdk/playwright"; await goto(page, 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html'); const book = await extractObjectFromPageUsingSelectors(page, { name: { selector: "h1", selectionMethod: "all-text" }, inStock: { selector: ".price_color", }, imgUrl: { selector: "#product_gallery > div > div > div > img", selectionMethod: { propertyName: "src" } } }) console.log(book) // output: // { // name: 'A Light in the Attic', // inStock: '£51.77', // imgUrl: '../../media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg' // } ``` ## Parameters • **page**: `Page` The Playwright Page object from which to extract the data. • **extractor**: `T` The object extractor with the selectors to use. 
## Returns `Promise`\<[`ExtractObjectFromPageUsingSelectorsReturnType`](/automation-sdks/intuned-sdk/playwright/type-aliases/ExtractObjectFromPageUsingSelectorsReturnType)\<`T`>> A promise that resolves to the extracted object. # fillForm ```typescript function fillForm(page, options): Promise ``` Fills a form on a web page with specified inputs and submits the form. the function handles static data, and can derive data using ai using your input. the function has the ability to detect form submission errors and use ai to recover from these errors. ## Examples ```typescript fillForm import { BrowserContext, Locator, Page } from "@intuned/playwright-core"; import { FormInputItem, extendPlaywrightPage } from "@intuned/sdk/playwright"; export interface Input { firstName: string; lastName: string; address1: string; address2: string; city: string; state: string; zip: string; country: string; nameOnCard: string; cardNumber: string; expiration: string; cvv: string; saveAddress: boolean; } export default async function handler( params: Input, _playwrightPage: Page, context: BrowserContext ) { const page = extendPlaywrightPage(_playwrightPage); await page.goto("https://demo-site-eta.vercel.app/steps-form/ShippingAddress"); const fields: FormInputItem[] = [ { fieldSelector: { selector: "[name='firstName']", type: "css", }, value: { type: "static", value: params.firstName }, fieldType: "text-input", }, { fieldSelector: { selector: "[name='lastName']", type: "css", }, value: { type: "static", value: params.lastName }, fieldType: "text-input", }, { fieldSelector: { selector: "[name='addressLine1']", type: "css", }, value: { type: "static", value: params.address1 }, fieldType: "text-input", }, { fieldSelector: { selector: "[name='addressLine2']", type: "css", }, value: { type: "static", value: params.address2 }, fieldType: "text-input", }, { fieldSelector: { selector: "[name='city']", type: "css", }, value: { type: "static", value: params.city }, fieldType: "text-input", }, { 
fieldSelector: { selector: "[name='state']", type: "css", }, value: { type: "static", value: params.state }, fieldType: "text-input", }, { fieldSelector: { selector: "[name='zipCode']", type: "css", }, value: { type: "static", value: params.zip }, fieldType: "text-input", }, { fieldSelector: { selector: "[name='country']", type: "css", }, value: { type: "dynamic", source: { country: params.country } }, fieldType: "select", }, { fieldSelector: { selector: "[name='futurePurchase']", type: "css", }, fieldType: "checkbox", value: { type: "static", value: true }, }, ]; const didFormSucceed = async (locator: Locator): Promise => { return (await locator.page().locator(".error-message").count()) === 0 }; async function formSubmit(locator: Locator) { const nextButtonLocator = locator.page().getByRole("button", { name: "Next" }); await nextButtonLocator.waitFor({ state: "visible" }); await nextButtonLocator.click(); } await page.fillForm({ formLocator: page.locator("main"), formInput: fields, isSubmitSuccessful: didFormSucceed, submitForm: formSubmit, autoRecoveryOptions: { enabled: true, recoveryData: params } }); return {}; } ``` ## Parameters • **page**: `Page` The Playwright Page where the form is located. • **options** • **options.autoRecoveryOptions?** Optional. Options for auto-recovery in case of form submission failure. • **options.autoRecoveryOptions.enabled**: `boolean` Whether auto-recovery is enabled • **options.autoRecoveryOptions.fieldsToMask?**: [`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector)\[] Fields to mask during auto-recovery, use this if you do not want to send your form values to ai. • **options.autoRecoveryOptions.generateDataToUnblockForm?** • **options.autoRecoveryOptions.generateDataToUnblockForm.enabled**: `boolean` Whether generating data to unblock the form is enabled. • **options.autoRecoveryOptions.generateDataToUnblockForm.prompt**: `string` The prompt to use for generating data. 
• **options.autoRecoveryOptions.maxRetries?**: `number` Maximum number of retries for auto-recovery • **options.autoRecoveryOptions.recoveryData**: `object` Data to use for auto-recovery • **options.fillFieldTimeout?**: `number` Optional. Timeout for filling each individual field. • **options.formInput**: ([`DynamicFormInputItem`](/automation-sdks/intuned-sdk/playwright/interfaces/DynamicFormInputItem) | [`StaticFormInputItem`](/automation-sdks/intuned-sdk/playwright/interfaces/StaticFormInputItem))\[] An array of form input items (dynamic or static). • **options.formLocator**: `Locator` | [`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector) The locator for the form element. • **options.isSubmitSuccessful** A function to check if the form submission was successful. • **options.submitForm** A function to submit the form. • **options.timeout?**: `number` Optional. Timeout for the entire form filling process. • **options.waitTimeBetweenFill?**: `number` Optional. Wait time between filling each field. ## Returns `Promise`\<`boolean`> A promise that resolves to a boolean indicating whether the form submission was successful. # goto ```typescript function goto( page, url, options?): ReturnType ``` Navigates to a specified URL on the provided playwright page. ## Examples ```typescript without options import { goto } from "@intuned/sdk/playwright"; await goto(page, 'https://example.com'); ``` ```typescript with options import { goto } from "@intuned/sdk/playwright"; await goto(page, 'https://example.com', { waitUntil: "load", throwOnTimeout: true, timeout: 10_000 }); ``` ## Parameters • **page**: `Page` The Playwright page object to navigate. • **url**: `string` The URL to navigate to. • **options?** • **options.referer?**: `string` Referer header value. If provided, it will take preference over the referer header value set by `page.setExtraHTTPHeaders(headers)`. 
• **options.throwOnTimeout?**: `boolean`

Whether to throw if `page.goto` times out. By default, the error is ignored.

• **options.timeout?**: `number`

Maximum operation time in milliseconds. Defaults to `0` (no timeout). This can be configured via various timeout settings on the page or browser context.

• **options.waitUntil?**: `"load"` | `"domcontentloaded"` | `"networkidle"` | `"commit"`

When to consider the operation succeeded. Defaults to `networkidle` (Playwright itself defaults to `load`).

## Returns

`ReturnType`\<`Page`\[`"goto"`]>

A promise that resolves to the response of the navigation, or null if no response was received.

# DynamicFormInputItem

## Properties

### fieldSelector

```typescript
fieldSelector: ElementSelector;
```

The selector for the form field.

***

### fieldType

```typescript
fieldType: InputFieldType;
```

The type of the input field. Supported types are: `text-input`, `select`, `checkbox`, `radiogroup`, `submit-button`, `auto-complete`.

***

### value

```typescript
value: object;
```

#### source

```typescript
source: string | object;
```

#### type

```typescript
type: "dynamic";
```

# ElementSelector

## Extended by

* [`ValueSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ValueSelector)

## Properties

### selector

```typescript
selector: string;
```

The selector string for the element.

***

### type?

```typescript
optional type: "css" | "xpath";
```

Optional.
The type of the selector (xpath or css) default is `css` # ExtendedPlaywrightPage ## Extends * `Page` ## Properties ### extractArrayOptimized() ```typescript extractArrayOptimized: (options) => Promise[]>; ``` an alias for [extractArrayFromPage](/automation-sdks/runtime-sdk/optimized-extractors/functions/extractArrayFromPage) function #### Parameters • **options** • **options.itemEntityName**: `string` • **options.itemEntitySchema**: `SimpleArrayItemSchema` • **options.label**: `string` • **options.optionalPropertiesInvalidator?** • **options.prompt?**: `string` • **options.strategy?**: `ImageStrategy` | `HtmlStrategy` • **options.variantKey?**: `string` #### Returns `Promise`\<`Record`\<`string`, `string`>\[]> *** ### extractArrayUsingSelectors() ```typescript extractArrayUsingSelectors: (extractor) => Promise>; ``` an alias for [extractArrayFromPageUsingSelectors](/automation-sdks/runtime-sdk/playwright/functions/extractArrayFromPageUsingSelectors) function #### Type parameters • **T** *extends* [`ListStaticExtractor`](/automation-sdks/intuned-sdk/playwright/interfaces/ListStaticExtractor) #### Parameters • **extractor**: `T` #### Returns `Promise`\<[`ExtractListObjectsUsingStaticSelectorsReturnType`](/automation-sdks/intuned-sdk/playwright/type-aliases/ExtractListObjectsUsingStaticSelectorsReturnType)\<`T`>> *** ### extractMarkdown() ```typescript extractMarkdown: () => Promise; ``` an alias for [extractMarkdownFromPage](/automation-sdks/runtime-sdk/ai-extractors/functions/extractMarkdownFromPage) function #### Returns `Promise`\<`string`> *** ### extractObjectOptimized() ```typescript extractObjectOptimized: (options) => Promise>; ``` an alias for [extractObjectFromPage](/automation-sdks/runtime-sdk/optimized-extractors/functions/extractObjectFromPage) function #### Parameters • **options** • **options.entityName**: `string` • **options.entitySchema**: `SimpleObjectSchema` • **options.label**: `string` • **options.optionalPropertiesInvalidator?** • 
**options.prompt?**: `string` • **options.strategy?**: `ImageStrategy` | `HtmlStrategy` • **options.variantKey?**: `string` #### Returns `Promise`\<`null` | `Record`\<`string`, `null` | `string`>> *** ### extractObjectUsingSelectors() ```typescript extractObjectUsingSelectors: (extractor) => Promise>; ``` an alias for [extractObjectFromPageUsingSelectors](/automation-sdks/runtime-sdk/playwright/functions/extractObjectFromPageUsingSelectors) function #### Type parameters • **T** *extends* [`ObjectExtractor`](/automation-sdks/intuned-sdk/playwright/type-aliases/ObjectExtractor) #### Parameters • **extractor**: `T` #### Returns `Promise`\<[`ExtractObjectFromPageUsingSelectorsReturnType`](/automation-sdks/intuned-sdk/playwright/type-aliases/ExtractObjectFromPageUsingSelectorsReturnType)\<`T`>> *** ### extractStructuredData() ```typescript extractStructuredData: (options) => Promise; ``` an alias for [extractStructuredDataFromPage](/automation-sdks/runtime-sdk/ai-extractors/functions/extractStructuredDataFromPage) function #### Parameters • **options** • **options.dataSchema**: `JsonSchema` • **options.label**: `string` • **options.prompt?**: `string` • **options.strategy?**: `ImageStrategy` | `HtmlStrategy` #### Returns `Promise`\<`any`> *** ### fillForm() ```typescript fillForm: (options) => Promise; ``` an alias for [fillForm](/automation-sdks/runtime-sdk/playwright/functions/fillForm) function #### Parameters • **options** • **options.autoRecoveryOptions?** • **options.autoRecoveryOptions.enabled**: `boolean` Whether auto-recovery is enabled • **options.autoRecoveryOptions.fieldsToMask?**: [`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector)\[] Fields to mask during auto-recovery, use this if you do not want to send your form values to ai. • **options.autoRecoveryOptions.generateDataToUnblockForm?** • **options.autoRecoveryOptions.generateDataToUnblockForm.enabled**: `boolean` Whether generating data to unblock the form is enabled. 
• **options.autoRecoveryOptions.generateDataToUnblockForm.prompt**: `string`

The prompt to use for generating data.

• **options.autoRecoveryOptions.maxRetries?**: `number`

Maximum number of retries for auto-recovery.

• **options.autoRecoveryOptions.recoveryData**: `object`

Data to use for auto-recovery.

• **options.fillFieldTimeout?**: `number`

• **options.formInput**: ([`DynamicFormInputItem`](/automation-sdks/intuned-sdk/playwright/interfaces/DynamicFormInputItem) | [`StaticFormInputItem`](/automation-sdks/intuned-sdk/playwright/interfaces/StaticFormInputItem))\[]

• **options.formLocator**: `Locator` | [`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector)

• **options.isSubmitSuccessful**

• **options.submitForm**

• **options.timeout?**: `number`

• **options.waitTimeBetweenFill?**: `number`

#### Returns

`Promise`\<`boolean`>

***

### goto()

```typescript
goto: (url, options?) => Promise<null | Response>;
```

an alias for [goto](/automation-sdks/runtime-sdk/playwright/functions/goto) function

#### Parameters

• **url**: `string`

• **options?**

• **options.referer?**: `string`

• **options.throwOnTimeout?**: `boolean`

• **options.timeout?**: `number`

• **options.waitUntil?**: `"load"` | `"domcontentloaded"` | `"networkidle"` | `"commit"`

#### Returns

`Promise`\<`null` | `Response`>

#### Overrides

`Page.goto`

# ListStaticExtractor

## Properties

### containerSelector

```typescript
containerSelector: ElementSelector | ElementSelector[];
```

The selector(s) for the container elements of the list. All list items should be direct children of this container.

***

### propertySelectors

```typescript
propertySelectors: Record;
```

The selectors for the properties to extract. The selector values should be relative to the list item.

**Example:** if the list was:

```html
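<ul>
  <li>
    <div class="title">title 1</div>
    <div class="price">price 1</div>
  </li>
  <li>
    <div class="title">title 2</div>
    <div class="price">price 2</div>
  </li>
</ul>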
``` the css relative selectors should be: title -> `.title` price -> `.price` # StaticFormInputItem \-- ## Properties ### fieldSelector ```typescript fieldSelector: ElementSelector; ``` The selector for the form field. *** ### fieldType ```typescript fieldType: InputFieldType; ``` The type of the input field, supported types are: `text-input` `select` `checkbox` `radiogroup` `submit-button` `auto-complete` *** ### value ```typescript value: object; ``` #### type ```typescript type: "static"; ``` #### value ```typescript value: string | number | boolean; ``` # ValueSelector represents a dom element selector and the method to extract the value from the element. ## Extends * [`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector) ## Properties ### multiValue? ```typescript optional multiValue: boolean; ``` Optional. Whether the selector extracts multiple values, if set to true the returned value will be array of strings *** ### regex? ```typescript optional regex: object; ``` Optional. A regex pattern and match index for extracting the value. #### matchIndex? ```typescript optional matchIndex: number; ``` #### pattern ```typescript pattern: string; ``` *** ### selectionMethod? ```typescript optional selectionMethod: object | "direct-text" | "all-text"; ``` Optional. The method for selecting the value. `all-text` selects all text content, `direct-text` selects the direct text content(does not include the text inside nested elements), and `propertyName` selects the value of a property. *** ### selector ```typescript selector: string; ``` The selector string for the element. #### Inherited from [`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector).[`selector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector#selector) *** ### type? ```typescript optional type: "css" | "xpath"; ``` Optional. 
The type of the selector (`xpath` or `css`). Defaults to `css`.

#### Inherited from

[`ElementSelector`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector).[`type`](/automation-sdks/intuned-sdk/playwright/interfaces/ElementSelector#type)

# ExtractListObjectsUsingStaticSelectorsReturnType

```typescript
type ExtractListObjectsUsingStaticSelectorsReturnType<T>: {
  [K in keyof T["propertySelectors"]]: T["propertySelectors"][K] extends Object
    ? string[] | null
    : string | null
}[];
```

## Type parameters

• **T** *extends* [`ListStaticExtractor`](/automation-sdks/intuned-sdk/playwright/interfaces/ListStaticExtractor)

# ExtractObjectFromPageUsingSelectorsReturnType

```typescript
type ExtractObjectFromPageUsingSelectorsReturnType<T>: {
  [K in keyof T]: T[K] extends Object ? string[] | null : string | null
};
```

## Type parameters

• **T** *extends* [`ObjectExtractor`](/automation-sdks/intuned-sdk/playwright/type-aliases/ObjectExtractor)

# InputFieldType

```typescript
type InputFieldType:
  | "text-input"
  | "select"
  | "checkbox"
  | "radiogroup"
  | "submit-button"
  | "auto-complete";
```

# ObjectExtractor

```typescript
type ObjectExtractor: Record;
```

A record that maps each property name to the value selector used to extract its value from the page. You can provide a list of `ValueSelector` items to supply backup selectors in case the first one fails. The primary selector is the first one in the list.

# RunError

Represents an error that occurs during a run.

## Param

The error message.

## Param

Optional. Additional options for the error.
## Examples

```typescript RunError
import { RunError } from "@intuned/sdk/runtime"

throw new RunError('An error occurred', {
  retryable: true,
  status_code: 500,
  error_code: 'SERVER_ERROR'
});
```

## Extends

* `Error`

## Constructors

### new RunError()

```typescript
new RunError(message, options?): RunError
```

#### Parameters

• **message**: `string`

• **options?**: [`RunErrorOptions`](/automation-sdks/intuned-sdk/runtime/interfaces/RunErrorOptions)

#### Returns

[`RunError`](/automation-sdks/intuned-sdk/runtime/classes/RunError)

#### Overrides

`Error.constructor`

## Properties

### options

```typescript
options: RunErrorOptions;
```

The options associated with the error.

# extendPayload

```typescript
function extendPayload(payload): void
```

## Description

In the context of a job or queue execution, extendPayload appends new payloads to the end of the queue or job.

## Examples

```typescript Single payload
import { extendPayload } from "@intuned/sdk/runtime"

// Appends exampleApi to the end of the queue or job this API is executing in.
extendPayload({ api: 'exampleApi', parameters: { key: 'value' } });
```

```typescript Array of payloads
// Payload is imported for the type annotation below.
import { extendPayload, Payload } from "@intuned/sdk/runtime"

const payloadArray: Payload[] = [
  { api: 'exampleApi1', parameters: { key1: 'value1' } },
  { api: 'exampleApi2', parameters: { key2: 'value2' } }
];

// Appends two APIs to the end of the queue or job this API is executing in.
extendPayload(payloadArray);
```

## Parameters

• **payload**: [`Payload`](/automation-sdks/intuned-sdk/runtime/interfaces/Payload) | [`Payload`](/automation-sdks/intuned-sdk/runtime/interfaces/Payload)\[]

The payload or array of payloads to extend.
You can specify the API name and the parameters to pass to it. The newly added APIs use the same proxy and auth-session settings as the API that extended them.

## Returns

`void`

# requestMultipleChoice

```typescript
function requestMultipleChoice(message, choices): unknown
```

In the create auth session flow, you might need a multiple-choice answer from the user. requestMultipleChoice prompts the user with the question and the possible options, and returns their selection.

**Note:** This function is currently in beta and may be subject to changes.

## Examples

```typescript requestMultipleChoice
// in auth-sessions/create.ts
import { requestMultipleChoice } from "@intuned/sdk/runtime"

const message = "What is your favorite color?";
const choices = ["Red", "Blue", "Green", "Yellow"];
const selectedChoice = yield requestMultipleChoice(message, choices);
console.log(selectedChoice);
```

## Parameters

• **message**: `string`

The message to display to the user.

• **choices**: `string`\[]

An array of choices to present to the user.

## Returns

`unknown`

# requestOTP

```typescript
function requestOTP(message): unknown
```

**Note:** This function is currently in beta and may be subject to changes.

requestOTP helps you ask the user for an OTP in the create auth-session flow.

## Examples

```typescript requestOTP
// in auth-sessions/create.ts
import { requestOTP } from "@intuned/sdk/runtime"

const message = "Please submit an OTP from your authenticator app";
const otp = yield requestOTP(message);
console.log(otp);
```

## Parameters

• **message**: `string`

The message to display to the user.

## Returns

`unknown`

# runInfo

```typescript
function runInfo(): RunInfo
```

Retrieves information about the current run environment.
## Examples ```typescript runInfo import { runInfo } from "@intuned/sdk/runtime" const info = runInfo(); console.log(info.runEnvironment); // Outputs the run environment, IDE or DEPLOYED console.log(info.runId); // Outputs the run ID, if available, in IDE run id will be undefined ``` ## Returns [`RunInfo`](/automation-sdks/intuned-sdk/runtime/interfaces/RunInfo) An object containing details about the run environment and the run ID. # Payload Payload ## Examples ```typescript payload import { Payload } from "@intuned/sdk/runtime" const payload: Payload = { api: 'exampleApi', parameters: { key1: 'value1', key2: 'value2' } }; ``` ## Properties ### api ```typescript api: string; ``` The API path you want to extend. *** ### parameters ```typescript parameters: Record; ``` A record of key-value pairs representing the parameters to be sent to the API # RunErrorOptions RunErrorOptions ## Examples ```typescript RunErrorOptions import { RunErrorOptions } from "@intuned/sdk/runtime" const options: RunErrorOptions = { retryable: true, status_code: 500, error_code: 'SERVER_ERROR' }; ``` ## Properties ### error\_code? ```typescript optional error_code: string; ``` Optional. A specific error code to identify the type of error. *** ### retryable? ```typescript optional retryable: boolean; ``` Optional. Indicates whether the error is retryable. *** ### status\_code? ```typescript optional status_code: number; ``` Optional. The HTTP status code associated with the error. # RunInfo Represents information about the current run. RunInfo ## Properties ### runEnvironment ```typescript runEnvironment: RunEnvironment; ``` the run environment `IDE` or `DEPLOYED` *** ### runId? ```typescript optional runId: string; ``` Optional. The ID of the current run, in IDE environment, run id will be undefined # Overview ## Introduction Intuned's browser automations are code-based, utilizing the [Playwright](https://playwright.dev/) framework as the underlying technology. 
Playwright is a Node.js library that enables browser automation through a single API. If you are looking to perform browser automation using Playwright, Intuned is an excellent platform to facilitate that. ## `@intuned/sdk` Intuned also provides the @intuned/sdk, a library built on top of Playwright, extending its capabilities by offering powerful helpers designed for common browser automation tasks. The @intuned/sdk enhances Playwright by making automation tasks more reliable and easier to implement. ### Namespaces of `@intuned/sdk` The [@intuned/sdk](./intuned-sdk) is organized into several namespaces, each tailored for specific functionalities: * `@intuned/sdk/ai-extractors`: Provides powerful utilities for data and markdown extractions using AI. * `@intuned/sdk/optimized-extractors`: Enables the creation and execution of web extractors that are reliable and scalable. * `@intuned/sdk/playwright`: Offers additional helpers on top of Playwright to simplify common automation patterns. * `@intuned/sdk/runtime`: Includes helpers related to the Intuned runtime environment. * `@intuned/sdk/files`: Contains utilities to facilitate working with files. ### Leveraging AI for Enhanced Reliability The Intuned SDK heavily utilizes AI, with many functions aimed at improving the reliability of browser automation tasks. This focus on AI helps make automation processes more robust and easier to develop. ## Learning Resources ### Using Playwright If you are new to Playwright or want to deepen your understanding of how to use it for browser automation, we provide an overview in the [Playwright section](./playwright/overview). ### Detailed Documentation for @intuned/sdk For comprehensive information on using the @intuned/sdk, including detailed documentation on the various namespaces and their helpers, refer to the section [@intuned/sdk](./intuned-sdk). This section covers everything you need to know about the SDK and how to leverage it for your automation tasks. 
# Playwright Overview

Learn about using Playwright for browser automation.

We are improving our docs! This document will be updated soon.

Here are some resources to help you get started with Playwright:

* Learn about Input API: [Playwright Input](https://playwright.dev/docs/input)
* Learn about Locators: [Playwright Locators](https://playwright.dev/docs/locators) and [Playwright Other Locators](https://playwright.dev/docs/other-locators)
* Learn about AutoWaiting: [Playwright AutoWaiting](https://playwright.dev/docs/actionability)
* Learn about Evaluating JavaScript: [Playwright Evaluating](https://playwright.dev/docs/evaluating)

# Auth Sessions Overview

Learn how to create and manage auth sessions for your projects.

## Introduction

To learn more about auth sessions, check out the in-depth explanation on [Authentication Sessions](/docs/auth-sessions/overview).

### API reference

* [Create cred based auth session](./projectauthsessionscreate/create-auth-session--start.mdx)
* [Record auth session](./projectauthsessionsrecorder/start-recorder-session-for-an-auth-session.mdx)
* [Update auth session](./projectauthsessionsupdate/update-auth-session--start.mdx)
* [Get auth session](./projectauthsessions/get-auth-session.mdx)
* [Get auth sessions](./projectauthsessions/get-auth-sessions.mdx)
* [Delete auth session](./projectauthsessions/delete-auth-session.mdx)

# Standalone File API Overview

Learn how to use the Standalone File API to process files without a project.

## Introduction

The standalone APIs allow you to process PDF and image files without a project. We currently provide 3 operations:

1. **Extract structured data**: Extract structured data from the file following a [JSONSchema](https://json-schema.org/).
2. **Extract markdown**: Extract markdown from the file, including headers, paragraphs, lists and tables.
3. **Extract tables**: Extract tables from the file in JSON format.

There are two ways to consume these APIs: synchronously and asynchronously.
In synchronous calls, the result is returned in the same call. In asynchronous calls, the result is returned in a separate call using an `operationId` obtained in the initial call. ## Sync vs Async APIs Each of the operations listed above is available via a Sync API and an Async API. In Sync APIs, you make a single call which triggers the operation and returns the result. In Async APIs, you make two calls: the first call triggers the operation and returns an `operationId`, and the second call uses the `operationId` to check the status and get the result. Depending on the input, the call might take a long time to complete, especially if the file is large or the operation is complex. If the API is taking too long, the request might time out before the file processing is finished. For this reason, we recommend using the Asynchronous API for most use cases. The Sync API is limited to 10 requests per minute per operation. If you need a higher rate limit, [contact us](/docs/support/contact-us). ## Supported file formats We currently support pdf files and image files. We will be working on supporting other formats soon. [Contact us](/docs/support/contact-us) if you have any specific requirements. In PDF files, you can specify the page numbers to run processing on. If no page numbers are specified, the operation will run on all pages. [Check out the API reference](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--sync) for more information. ## Extract structured data API This API allows you to extract data from a file following a JSONSchema. This is useful when you have a document with a known data structure that you want to extract, such as a contract document. 
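To make the contract use case concrete, here is a minimal sketch of what such a schema could look like, written in the same `type` / `properties` / `description` shape as the `dataSchema` option shown in the SDK reference. The property names (`contractTitle`, `effectiveDate`, `totalValue`) and the `missingRequiredFields` helper are hypothetical and only illustrate the idea; they are not defined by Intuned.

```typescript
// Minimal subset of JSON Schema used in this sketch.
interface JsonSchemaProperty {
  type: "string" | "number" | "boolean";
  description?: string;
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, JsonSchemaProperty>;
  required?: string[];
}

// Hypothetical schema for a contract document (field names are assumptions).
const contractSchema: ObjectSchema = {
  type: "object",
  properties: {
    contractTitle: { type: "string", description: "title of the contract" },
    effectiveDate: { type: "string", description: "date the contract takes effect" },
    totalValue: { type: "number", description: "total contract value" },
  },
  required: ["contractTitle", "effectiveDate"],
};

// Hypothetical helper: lists required fields that an extracted object lacks.
function missingRequiredFields(
  schema: ObjectSchema,
  data: Record<string, unknown>
): string[] {
  return (schema.required ?? []).filter(
    (key) => data[key] === undefined || data[key] === null
  );
}
```

A check like `missingRequiredFields(contractSchema, extracted)` can be run on the extraction result before passing it downstream, since the document may not contain every field the schema asks for.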
### API reference * [Sync API](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--sync) * [Async Start API](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--async-start) * [Async Result API](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--async-result) ## Extract markdown API This API allows you to extract markdown from the file, including headers, paragraphs, lists, tables and links. The output is human-readable and can be used for further processing or display. ### API reference * [Sync API](/client-apis/api-reference/filesextractstructureddata/extract-markdown--sync) * [Async Start API](/client-apis/api-reference/filesextractstructureddata/extract-markdown--async-start) * [Async Result API](/client-apis/api-reference/filesextractstructureddata/extract-markdown--async-result) ## Extract tables API This API allows you to extract tables from the file in JSON format. This is useful when you have a document with tabular data that you want to extract and process further. The result is an array of tables, each table including the page number, title (if any), and the table data. ### API reference * [Sync API](/client-apis/api-reference/filesextractstructureddata/extract-tables--sync) * [Async Start API](/client-apis/api-reference/filesextractstructureddata/extract-tables--async-start) * [Async Result API](/client-apis/api-reference/filesextractstructureddata/extract-tables--async-result) # Extract Markdown - Async Result get /{workspaceId}/files/extract/markdown/{operationId}/result Gets the result of the markdown extraction operation using the operation ID. # Extract Markdown - Async Start post /{workspaceId}/files/extract/markdown/start Starts an asynchronous operation to extract markdown from a file. Supported file types are image and PDF (more coming soon!). This method accepts the file. The API responds with an ID to track the operation status and retrieve the result.
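The start-then-result flow used by the async endpoints can be sketched as a small polling helper. This is only an illustration under stated assumptions: the `start` and `getResult` callbacks stand in for your own HTTP calls to the Async Start and Async Result endpoints, and the `pending` / `completed` / `failed` status values are borrowed from the Run API documentation and assumed to apply here as well. The helper name `startAndWait` and its options are hypothetical.

```typescript
// Assumed status values (documented for the Run API; assumed for file APIs).
type OperationStatus = "pending" | "completed" | "failed";

interface OperationResult<T> {
  status: OperationStatus;
  result?: T;
}

// Starts an operation, then polls for its result until it is no longer pending.
// `start` should resolve to the operation ID returned by the start call.
async function startAndWait<T>(
  start: () => Promise<string>,
  getResult: (operationId: string) => Promise<OperationResult<T>>,
  options: { intervalMs: number; maxAttempts: number }
): Promise<OperationResult<T>> {
  const operationId = await start();
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const current = await getResult(operationId);
    if (current.status !== "pending") {
      return current; // finished, either completed or failed
    }
    if (attempt < options.maxAttempts) {
      // Wait before the next poll.
      await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
  throw new Error(
    `Operation ${operationId} still pending after ${options.maxAttempts} attempts`
  );
}
```

Injecting the two calls keeps the sketch independent of any particular HTTP client or base URL; choose the interval and attempt limit to suit the file sizes you expect.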
# Extract Markdown - Sync post /{workspaceId}/files/extract/markdown Extracts markdown from a file. Supported file types are image and PDF (more coming soon!). It accepts the file. # Extract Structured Data - Async Result get /{workspaceId}/files/extract/structuredData/{operationId}/result Gets the result of the structured data extraction operation using the operation ID. # Extract Structured Data - Async Start post /{workspaceId}/files/extract/structuredData/start Starts an asynchronous operation to extract structured data from a file. Supported file types are image and PDF (more coming soon!). It accepts the file and requested schema for the data to be extracted. It responds with an ID to track the operation status and retrieve the result. # Extract Structured Data - Sync post /{workspaceId}/files/extract/structuredData Extracts structured data from a file. Supported file types are image and PDF (more coming soon!). It accepts the file and requested schema for the data to be extracted. # Extract Tables - Async Result get /{workspaceId}/files/extract/tables/{operationId}/result Gets the result of the tables extraction operation using the operation ID. # Extract Tables - Async Start post /{workspaceId}/files/extract/tables/start Starts an asynchronous operation to extract tables from a file. Supported file types are image and PDF (more coming soon!). This method accepts the file. The API responds with an ID to track the operation status and retrieve the result. # Extract Tables - Sync post /{workspaceId}/files/extract/tables Extracts tables from a file. Supported file types are image and PDF (more coming soon!). It accepts the file. # Project Consumption APIs Overview ## Introduction To use the browser automations, scrapers and integrations that are created on the Intuned Platform, we expose a set of APIs that allow you to consume your projects in different ways. ## Run API The Run API is the simplest way to consume your projects.
It allows you to trigger single APIs in your projects and get the result back. There are two ways to use the run API: 1. Synchronously: Call the API and wait for the result in the same request. 2. Asynchronously: Start the API call and receive a run ID which you can use to check the status of the run and get the result. More information on the Run API can be found [here](./run-overview). ## Job API The Job API is a more advanced way to consume your projects. It allows you to create jobs that can run multiple APIs in your project. Jobs allow you to: * Configure advanced retry strategies such as exponential backoff. * Run APIs concurrently. * Sink the results of the APIs to a destination such as a webhook or an S3 bucket. * Extend the API payloads to dynamically run other APIs. * Schedule the job to run periodically. * Pause and resume job execution and schedule. More information on the Job API can be found [here](./projectjobs). ## Queue API The Queue API allows you to consume your projects in an order-based manner. It allows you to queue up APIs to run on demand. Queues allow you to: * Queue up APIs to run on demand with guaranteed order. * Configure APIs to queue up periodically. * Sink the results of the APIs to a destination such as a webhook. * Extend the API payloads to dynamically queue up other APIs. * Impose rate limits for the queued up API runs. * Configure periods which the queue can pause execution. * Add random delays between API runs. More information on the Queue API can be found [here](./queues-overview). # Delete Auth Session delete /{workspaceId}/projects/{projectName}/auth-sessions/{authSessionId} Deletes an authentication session by ID. 
# Get Auth Session get /{workspaceId}/projects/{projectName}/auth-sessions/{authSessionId} Gets authentication session of project by ID # Get Auth Sessions get /{workspaceId}/projects/{projectName}/auth-sessions Gets all authentication sessions of project # Create Auth Session - Result get /{workspaceId}/projects/{projectName}/auth-sessions/create/{operationId}/result Gets authentication session creation operation result. # Create Auth Session - Resume post /{workspaceId}/projects/{projectName}/auth-sessions/create/{operationId}/resume Resume authentication session creation operation. This is needed if the operation requests more info. # Create Auth Session - Start post /{workspaceId}/projects/{projectName}/auth-sessions/create Starts creation process of an authentication session for a project with the authentication session creation setting enabled. # Start recorder session for an auth session post /{workspaceId}/projects/{projectName}/auth-sessions/recorder/start create a recording session for a specific auth session # Update Auth Session - Result get /{workspaceId}/projects/{projectName}/auth-sessions/{authSessionId}/update/{operationId}/result Gets authentication session creation operation result. # Update Auth Session - Resume post /{workspaceId}/projects/{projectName}/auth-sessions/{authSessionId}/update/{operationId}/resume Resume authentication session creation operation. This is needed if the operation requests more info. # Update Auth Session - Start post /{workspaceId}/projects/{projectName}/auth-sessions/{authSessionId}/update Starts updating process of an authentication session. # Create Job post /{workspaceId}/projects/{projectName}/jobs Creates a new job for a project. # Delete Job delete /{workspaceId}/projects/{projectName}/jobs/{jobId} Deletes a job by ID. # Get Job get /{workspaceId}/projects/{projectName}/jobs/{jobId} Gets a job in a project by ID. # Get Jobs get /{workspaceId}/projects/{projectName}/jobs Gets all jobs in a project. 
# Jobs API Overview ## Overview To learn more about jobs, checkout the in depth explanation on [Jobs](/docs/platform/consume/jobs). ## API reference * [Create job](./create-job). * [Get jobs](./get-jobs). * [Get job](./get-job). * [Delete job](./delete-job). * [Pause job](./pause-job). * [Resume job](./resume-job). * [Trigger job](./trigger-job). # Pause Job post /{workspaceId}/projects/{projectName}/jobs/{jobId}/pause Pauses a job. Will pause any job runs and the job schedule if applicable. # Resume Job post /{workspaceId}/projects/{projectName}/jobs/{jobId}/resume Resumes a paused job. Will resume any paused job runs and the job schedule if applicable. # Trigger Job post /{workspaceId}/projects/{projectName}/jobs/{jobId}/trigger Manually triggers a job run for a job. If the job is paused, the trigger fails. # Get Job Runs get /{workspaceId}/projects/{projectName}/jobs/{jobId}/runs Get all job runs of a job. # Terminate Job Run post /{workspaceId}/projects/{projectName}/jobs/{jobId}/runs/{runId}/terminate Terminate a job run by ID. # Create Queue post /{workspaceId}/projects/{projectName}/queues Creates a new queue. # Delete Queue delete /{workspaceId}/projects/{projectName}/queues/{queueId} Deletes a queue by ID. # Get Queue get /{workspaceId}/projects/{projectName}/queues/{queueId} Gets a queue in a project by ID. # Get Queues get /{workspaceId}/projects/{projectName}/queues Gets all queues in a project. # Append Queue Item post /{workspaceId}/projects/{projectName}/queues/{queueId}/items Appends an item to the queue. # Delete Queue item delete /{workspaceId}/projects/{projectName}/queues/{queueId}/items/{itemRunId} Delete queued item. If the item is currently processing, the delete will fail. # Get Queue Item result get /{workspaceId}/projects/{projectName}/queues/{queueId}/items/{itemRunId} Get queue item result. 
# Append Queue Repeat Item post /{workspaceId}/projects/{projectName}/queues/{queueId}/repeatItems Creates and appends a repeatable item to the queue. Repeatable items will automatically re-append to the queue according to the repeat settings. # Delete Queue Repeat Item delete /{workspaceId}/projects/{projectName}/queues/{queueId}/repeatItems/{itemId} Deletes a repeatable item by ID. The item will no longer be re-appended to the queue. # Get Queue Repeat Item get /{workspaceId}/projects/{projectName}/queues/{queueId}/repeatItems/{itemId} Gets a repeatable item from a queue by ID. The last execution result of the item is also returned. # Get Queue Repeat Items get /{workspaceId}/projects/{projectName}/queues/{queueId}/repeatItems Gets all repeatable items of a queue. # Update Queue Repeat Item put /{workspaceId}/projects/{projectName}/queues/{queueId}/repeatItems/{itemId} Updates the configurations of a repeatable item by ID. # Run API - Async Result get /{workspaceId}/projects/{projectName}/run/{runId}/result Retrieves the result of a started project API run operation. # Run API - Async Start post /{workspaceId}/projects/{projectName}/run/start Starts a project API run operation # Run API - Sync post /{workspaceId}/projects/{projectName}/run Runs a project API synchronously. # Queues API Overview ## Overview To learn more about queues, checkout the in depth explanation on [Queues](/docs/platform/consume/queues). ### API reference * [Create queue](./projectqueues/create-queue). * [Get queues](./projectqueues/get-queues). * [Get queue](./projectqueues/get-queue). * [Delete queue](./projectqueues/delete-queue). * [Append item](./projectqueuesitems/append-queue-item). * [Delete item](./projectqueuesitems/delete-queue-item). * [Get item result](./projectqueuesitems/get-queue-item-result). * [Create repeatable item](./projectqueuesrepeatitems/append-queue-repeat-item). * [Get repeatable items](./projectqueuesrepeatitems/get-queue-repeat-items). 
* [Get repeatable item](./projectqueuesrepeatitems/get-queue-repeat-item). * [Update repeatable item](./projectqueuesrepeatitems/update-queue-repeat-item). * [Delete repeatable item](./projectqueuesrepeatitems/delete-queue-repeat-item). # Run API ## Introduction The Run API is the simplest way to consume your project APIs. You trigger a single API from your project, and then retrieve the results directly. There are two ways to consume these APIs: synchronously and asynchronously. In synchronous calls, the result is returned in the same call. In asynchronous calls, the result is returned in a separate call using a `runId` obtained in the initial call. ## Synchronous API The synchronous API (Sync API) is the simplest way to consume your project APIs. It allows you to trigger single APIs in your projects and get the result back in the same call. To use it, call [**sync**](./projectrun/run-api--sync) with the API name, parameters and any other configurations. Keep in mind that the project API run can take a long time to complete, especially (but not necessarily) if the implementation is complex and involves many steps. If the API is taking too long, the request might time out before the project API finishes running. For this reason, we recommend using the [Asynchronous API](#asynchronous-api) for most use cases. The synchronous API is limited to 10 requests per minute per project. If you need a higher rate limit, [contact us](/docs/support/contact-us). ### API reference * [Run API - Sync](./projectrun/run-api--sync). ## Asynchronous API The asynchronous API (Async API) allows you to start the API call and receive a `runId` which you can use to check the status of the run and get the result. In more detail, consuming the Async API involves the following steps: 1. **Start the API run**: Start the API run by calling [**start**](./projectrun/run-api--async-start) with the API name, parameters and any other configurations. 2.
**Get the `runId`**: The **start** response will include a `runId`, which is used to get the result. For example: ```json { "runId": "abcdegf" } ``` 3. **Check the result**: Use the `runId` to check the result of the run by calling [**result**](./projectrun/run-api--async-result). * The response will include a `status` field to indicate the status of the run. `pending` means the run is still in progress; `completed` or `failed` means the run finished, with success or failure respectively. The `runId` can also be used to monitor your runs in the **Runs** tab of your project. ### API reference * [Run API - Async start](./projectrun/run-api--async-start). * [Run API - Result](./projectrun/run-api--async-result). # Sink Body ## Introduction When a sink is configured, the output of the job or queue is written to the sink. The output is a JSON object that contains the result of the API run with additional information. The structure of the output is mostly similar between jobs and queues with some minor differences. ## Format Info about the API that was run. Includes the API name and result of run. The API name. The parameters passed to the API for this run. The status of the API run. Available options: `completed` The result of the API run. The status code of the API run. The status of the API run. Available options: `failed` Error code. Error message. The status code of the API run. The run ID for the API run. The workspace ID of the project that the run belongs to. Details of the project that the run belongs to. The project UUID. The project name. Details of the auth used in this run. The auth session ID. Job ID. Job run ID. Info about the API that was run. Includes the API name and result of run. The API name. The parameters passed to the API for this run. The status of the API run. Available options: `completed` The result of the API run. The status code of the API run. The status of the API run. Available options: `failed` Error code. Error message.
The status code of the API run. The run ID for the API run. The workspace ID of the project that the run belongs to. Details of the project that the run belongs to. The project UUID. The project name. Details of the auth used in this run. The auth session ID. Queue ID. ## Typescript SDK Our Typescript SDK supports the sink body by providing the `SinkResult` type. You can read the sink results and cast them to this type to access the properties in a type-safe manner. ```typescript import { SinkResult } from "./src/models/components/sinkresult" const sinkResult: SinkResult = JSON.parse(sinkBody); console.log(sinkResult.apiInfo.result); ``` # null Learn how sinks function within Intuned ## Introduction If you are using the job or queue APIs to consume your project APIs, you have the option to sink the output of the job or queue. This allows you to manage the results of your project API runs in a more flexible way. At Intuned, we currently support 2 types of sinks: * **Webhook**: Send the output of the job or queue to a webhook URL. * **AWS S3**: Store the output of the job or queue in an AWS S3 bucket. ## Webhook The Webhook sink allows you to send the output of the job or queue to a webhook URL. This is useful if you want to send the output to a third-party service or to your own service. Both jobs and queues support the Webhook sink. For more information on how to configure the Webhook sink, see the [Webhook sink documentation](./webhook). ## AWS S3 The AWS S3 sink allows you to store the output of the job or queue in an AWS S3 bucket as a file. This is useful if you want to easily persist the output of your project API runs and easily access them later. Only jobs support the AWS S3 sink. For more information on how to configure the AWS S3 sink, see the [AWS S3 sink documentation](./s3). # AWS S3 Sink ## Introduction The S3 sink allows you to write the output of the job to an AWS S3 bucket. 
This is useful if you want to easily persist the output of your project API runs and easily access them later. Only jobs support the AWS S3 sink. ## Configuration The S3 sink requires the following configurations: * **Region**: The S3 bucket region. * **Bucket**: The S3 bucket name. * **Access Key ID**: Access key ID for the IAM user to use the bucket. The IAM user must have write permissions to the bucket. * **Secret Access Key**: Secret access key of the IAM user to use the bucket. * **Prefix**: A prefix added to the key of the file to be written. This can be used to define a folder where all results are stored. * **Skip On Fail**: If enabled, failed payload runs will ***not*** be written to the bucket. * **APIs to Send**: Specify which API results should be sent to the sink. If not specified, results from all APIs will be sent. ## Output File Content The output file is a `.json` file that contains the result of the API run with additional information. Check out the [sink body page](./body) for more information on the output file content of the S3 sink. # Webhook Sink ## Introduction The Webhook sink allows you to send the output of the job or queue to a webhook URL. This is useful if you want to send the output to a third-party service or to your own service. Jobs and queues support the Webhook sink. ## Configuration The Webhook sink requires configuring the URL to send the output to. This URL must be accessible from the Intuned platform. You can optionally add additional headers to be sent with the webhook request. ```json { "type": "webhook", "url": "https://example.com/webhook", "headers": { "Authorization": "Bearer " } } ``` ## API ### Request The Webhook sink will send a POST request to the configured URL with a JSON payload containing the API run result and additional data. Check out the [sink body page](./body) for more information on the body of the Webhook sink request.
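On the receiving side, a webhook endpoint mostly needs to parse the JSON body and answer with a 2xx status so the delivery counts as successful. The sketch below is framework-agnostic and hypothetical: `handleSinkDelivery` is an illustrative name, and the only payload field it relies on is `apiInfo`, which is assumed from the `sinkResult.apiInfo.result` access shown in the TypeScript SDK snippet on the sink body page.

```typescript
interface WebhookResponse {
  statusCode: number;
  body: string;
}

// Parses a raw webhook body and decides which status code to answer with.
// A non-2xx answer means the delivery is treated as a failure.
function handleSinkDelivery(rawBody: string): WebhookResponse {
  let payload: unknown;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return { statusCode: 400, body: "invalid JSON" };
  }

  // The sink body is expected to be a JSON object.
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) {
    return { statusCode: 400, body: "expected a JSON object" };
  }

  // `apiInfo` is an assumed field name (see the lead-in above).
  const apiInfo = (payload as { apiInfo?: unknown }).apiInfo;
  if (typeof apiInfo !== "object" || apiInfo === null) {
    return { statusCode: 400, body: "missing apiInfo" };
  }

  // Hand the payload to your own processing here, then acknowledge.
  return { statusCode: 200, body: "ok" };
}
```

Wire the function into whichever HTTP server you use by passing it the request body and writing back the returned status code and body.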
### Response The webhook call will be considered successful if the response status code is [an ok response](https://developer.mozilla.org/en-US/docs/Web/API/Response/ok). If the response status code is not an ok response, the webhook call will be considered a failure. # Overview Learn how to consume the Intuned Client APIs. ## Introduction At Intuned, we provide different kinds of APIs. Project consumption APIs allow you to consume the browser automations, scrapers and integrations that are created on the Intuned Platform. Auth Session APIs allow you to create and manage auth sessions for your projects. Standalone file APIs allow you to process files without creating a project. We offer both REST APIs that you can call directly and a TypeScript SDK ([`@intuned/client`](https://www.npmjs.com/package/@intuned/client)) that you can use in your project. Intuned APIs are consumed using an API key in the context of your workspace. [How to generate an API key](/docs/getting-started/quick-start#create-an-api-key). [How to get your workspace Id](/docs/getting-started/quick-start#get-your-workspace-id). ## Project Consumption To use the browser automations, scrapers and integrations that are created on the Intuned Platform, we expose a set of APIs that allow you to consume your projects in different ways. More information on the Project Consumption APIs can be found [here](./api-reference/project-consumption-overview/). ## Auth Session Management If your projects require you to authenticate to a website, you can use the Auth Session APIs to create and manage auth sessions. Auth sessions allow you to authenticate to a website and maintain the session for project API runs. More information on the Auth Session APIs can be found [here](./api-reference/auth-sessions-overview/). ## Standalone File Processing The standalone APIs allow you to process PDF and image files without consuming a project. We currently provide 3 operations: 1.
**Extract structured data from file**: Extract structured data from the file following a [JSONSchema](https://json-schema.org/). 2. **Extract markdown from file**: Extract markdown from the file, including headers, paragraphs, lists and tables. 3. **Extract tables from file**: Extract tables from the file in JSON format. More information on the Standalone File APIs can be found [here](./api-reference/files-overview/). ## Rate Limits Intuned APIs are rate limited to 20 requests per minute per URI path. Some specific types of APIs are more strictly limited: * Sync project consumption APIs are limited to 10 requests per minute per project. * Sync standalone file APIs are limited to 10 requests per minute per operation. If you need a higher rate limit, [contact us](/docs/support/contact-us). ## Typescript SDK ### Installation ```bash npm install @intuned/client ``` ```bash yarn add @intuned/client ``` ### Example Usage ```typescript import { IntunedClient } from "@intuned/client"; const intunedClient = new IntunedClient({ apiKey: "", workspaceId: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", }); async function run() { const result = await intunedClient.project.run.sync("my-project", { api: "get-contracts", parameters: { "page": 1, "isLastPage": false, }, retry: { maximumAttempts: 3, }, }); // Handle the result console.log(result) } run(); ``` ## Other SDKs [Contact Us](/docs/support/contact-us) if you would like to request an SDK for a different language. ## Postman Collection For feedback on the Postman collection, [contact us](/docs/support/contact-us). # Actions in Intuned `@intuned/sdk` provides utilities to help execute common browser automation tasks. ## File download The [`downloadFile` utility](/automation-sdks/intuned-sdk/files/functions/downloadFile) helps with file download tasks. Check out the reference for example usage and more info.
## Upload file to S3 The [`uploadFileToS3` utility](/automation-sdks/intuned-sdk/files/functions/uploadFileToS3) helps with uploading files to S3. Check out the reference for example usage and more info. ## FillForm The [`fillForm` utility](/automation-sdks/intuned-sdk/playwright/functions/fillForm) helps with filling standard forms on webpages. Check out the reference for example usage and more info. # Consume APIs with auth sessions ## Introduction Learn how to authenticate your users and take actions on their behalf. Before you start this document, the following is recommended: * Understand [core Intuned concepts](/docs/getting-started/concepts) * Understand [Auth session types](/docs/auth-sessions/overview#auth-session-types). Specifically, understand the difference between credentials-based auth sessions and recorder-based auth sessions and when to use each. * Understand the [development of a project with auth sessions enabled](/docs/auth-sessions/develop-auth-sessions) ## Creating/managing auth sessions To create an auth session, you must have the following: * \[Required] Access to user credentials, either because they are stored or because the user is present * \[Optional] Proxy to the location you wish to have the user authenticate from Auth sessions can be managed via the Intuned UI or via the API. * The API provides a way to manage auth sessions programmatically. This is typically used when you are building an integration with a service that has no APIs and you want to automate actions on behalf of your users (you need to provide credentials). Building a LinkedIn integration is a good example here. For API docs, you can look at the [Auth Sessions API reference](/client-apis/api-reference/auth-sessions-overview). * The UI provides a simple way to create, update, and delete auth sessions. This is typically used when you are building an authenticated scraper or doing back-office automation with a limited set of auth sessions.
![Auth Session UI Management](https://intuned-docs-public-images.s3.amazonaws.com/auth/auth-sessions-ui-management.png) To learn more and dig deeper: * [How to create Credentials-based auth session](/docs/guides/platform/how-to-manage-auth-sessions#create-credentials-based-auth-session-for-deployed-projects) * [How to create Recorder-based auth session](/docs/guides/platform/how-to-manage-auth-sessions#create-recorder-based-auth-session-for-deployed-projects) ## Utilizing auth sessions Calling an API in a project with auth sessions enabled requires an auth session to be present. Prior to executing any APIs, Intuned will check if the auth session is valid by attempting to call the `auth-sessions/check` API that is defined in the project. After creating an auth session, it must be included in the input payload of the direct API calls. If jobs or queues are being configured on a project with auth sessions enabled, then the configuration must include the auth session as well. To learn more and dig deeper: * [Use authSession in the body to make an async call](/client-apis/api-reference/projectrun/run-api--async-start) * [How to use auth sessions in job](/docs/guides/auth/how-to-automate-linkedin#4-create-a-job-to-be-updated-of-all-pending-requests-weekly) * [How to use auth sessions in a queue](/docs/guides/auth/how-to-authenticate-with-credentials#5-create-a-queue-to-add-new-users) ## Other aspects of auth session ### Proxies An auth session can be configured with a proxy to ensure all actions and traffic from the browser come from the same IP address. Proxies are not required to use auth sessions but are commonly used depending on the target service. Once an auth session is configured with a proxy, all API calls with this auth session will be routed via the proxy, so if the proxy fails the API calls will fail as well.
If there are failures in the proxy, you can [update the proxy configured on an auth session](/docs/auth-sessions/consume-auth-sessions#updating-auth-sessions). ### Updating auth sessions Once an auth session has been created, it should serve as the only auth session instance for that user/service pairing. For example, if Sam created an auth session for Service A with the id `sam-serviceA-session`, then this auth session should be maintained and updated by you. This means that you must update and maintain the auth session for the duration you need it. [Learn how to update auth sessions](/docs/guides/platform/how-to-manage-auth-sessions#update-an-expired-auth-session) # Develop projects with auth sessions ## Introduction Learn how to develop projects that use auth sessions. Before you start this document, the following is recommended: * Understand [core Intuned concepts](/docs/getting-started/concepts) * Understand [Auth session types](/docs/auth-sessions/overview#auth-session-types). Specifically, understand the difference between credentials-based auth sessions and recorder-based auth sessions and when to use each. ## Using auth sessions in Intuned projects Intuned projects can be developed either with or without auth sessions. If auth sessions are enabled, then all the project APIs will require a valid auth session to be passed in the request for the API to be executed. We do not support projects in which only some APIs require auth sessions and others do not. ### Enabling auth sessions in a project * To enable auth sessions in a project that was created without auth sessions, you can enable it in the project settings. In the `Intuned.json` settings you will find the auth session configuration pane. ![ide-settings](https://intuned-docs-public-images.s3.amazonaws.com/auth/ide-settings-pane.png) * Once auth sessions are enabled, you need to pick the auth session type to use from the dropdown that will appear.
This is set to Credentials-based auth session by default. ![auth-session-type-dropdown](https://intuned-docs-public-images.s3.amazonaws.com/auth/auth-session-type-dropdown.png) ### After enabling auth sessions Upon enabling auth sessions and selecting the strategy, new capabilities will be enabled in the IDE. ![ide-auth-session-settings](https://intuned-docs-public-images.s3.amazonaws.com/auth/ide-auth-sessions-controls.gif) 1. A folder named 'auth-sessions' defines the auth workflow. The strategy will determine how much of the auth workflow will need to be defined. 2. The auth sessions dropdown manages auth sessions in the IDE, and is only made available when running APIs defined in the `api` folder. 3. Run settings contain new controls to define auth session behavior when running APIs in the IDE * `Check/Refresh as part of the API - match deployed behavior`:\ This option allows you to emulate production behavior and ensure that the auth session check is made before executing the API, and if the check fails it will attempt to refresh as well. It is recommended to use this to test failures caught in production. * `Load auth-session and run API - skip check/refresh`:\ This option skips the check and refresh behavior that is standard in production. It is recommended to use this as you are developing the project. * `Reuse session if open`:\ This option should be selected if you've already authenticated in the browser session and just want to continue from where the browser is currently. #### Credentials-based auth session If Credentials-based auth is selected, then the `auth-sessions` folder will contain the following APIs that need to be defined: * `create.ts` - needs to contain the end-to-end automation workflow that can navigate to the target service, enter the user's credentials, resolve any challenges, and complete authentication. * `check.ts` - needs to contain a simple workflow that is able to validate that the auth session is valid.
* `refresh.ts` - needs to contain the workflow required to update or refresh the auth session.

[Learn how to create a Credentials-based auth session in the IDE](/docs/guides/platform/how-to-manage-auth-sessions#create-credentials-based-auth-session-in-the-ide).

#### Recorder-based auth session

If Recorder-based auth is selected, then the `auth-sessions` folder will only contain the following API that needs to be defined:

* `check.ts` - needs to contain a simple workflow that is able to validate that the auth session is valid.
* The `create.ts` API **doesn't** need to be defined because the recording experience will prompt the end user to enter their credentials on a streaming browser. Upon completion of the sign-in flow, the auth session will be captured and the pop-up browser will be closed.
* The `refresh.ts` API **doesn't** need to be defined because Recorder-based auth sessions do not support refreshing expired auth sessions.

[Learn how to create a Recorder-based auth session in the IDE](/docs/guides/platform/how-to-manage-auth-sessions#create-recorder-based-auth-session-in-the-ide).

## Running APIs in a project with auth sessions

Before deploying the project, you must validate the auth sessions and APIs in the IDE. Follow the steps below:

1. Create a [Credentials-based](/docs/guides/platform/how-to-manage-auth-sessions#create-credentials-based-auth-session-in-the-ide) or [Recorder-based](/docs/guides/platform/how-to-manage-auth-sessions#create-recorder-based-auth-session-in-the-ide) auth session in the IDE.
2. Run the APIs in the IDE and validate the output.

## Deploying and validating auth sessions

After validating the auth session in the IDE, you can [deploy the project](/docs/guides/platform/how-to-deploy-a-project). Once the project is deployed, it is recommended to create an auth session in production and validate the APIs. Follow the steps below:

1.
   [Create auth session for deployed project](/docs/guides/platform/how-to-manage-auth-sessions#create-auth-session-for-deployed-projects)
2. Run an [async API](/client-apis/api-reference/projectrun/run-api--async-start) to validate the authenticated APIs.

# Auth Sessions

## Introduction

Auth sessions in Intuned allow you to add authentication to any browser automation task. This includes creating authenticated scrapers, automated RPA processes, or building authenticated integrations when APIs are not available. Intuned's auth sessions come with features that handle common authentication challenges, such as securing credentials, solving challenges, proxy support, and more.

### Use cases

* Build authenticated scrapers that capture data from authenticated websites.
* Build a LinkedIn integration to automate GTM tasks on behalf of your users.
* Automate back-office workflows like order fulfillment, adjustments, and tracking.
* Automate reservation processes for flights, hotels, restaurants, and more.

## Auth session types

Intuned offers two types of auth sessions. It's important to understand the differences between the two types to pick the right one for your use case.

### Credentials-based auth sessions

These auth sessions are created when users provide credentials to browser automation code that completes the authentication workflow. The credentials are also stored, and when the session expires Intuned will attempt to re-authenticate with the saved credentials. The developer needs to write the authentication code, but everything else is handled by Intuned. The auth session will continue to be active as long as the credentials are valid. This should be the default choice for using auth sessions in Intuned.
**Unique characteristics:**

* Requires providing credentials to create the auth session
* Requires Intuned to persist the credentials
* Allows for longer-lasting sessions, as they can be refreshed programmatically
* Developer needs to write the authentication code

### Recorder-based auth sessions

Recorder-based auth sessions are created by asking the user to authenticate on a browser and then capturing the authenticated session (cookies, sessions, local storage). Recorder-based auth sessions allow sharing access without providing credentials. The developer has to provide some info about how the authentication works on the targeted website and Intuned will handle the rest. Creating an auth session here requires the user to be present and to authenticate in a browser session. The auth session will expire when the user's session on the targeted website expires. When the session expires, the user must be present to re-authenticate. This type should be used when the user is not comfortable sharing their credentials or with services that have a long session time; a typical example here is LinkedIn auth.
**Unique characteristics:**

* Does not require users to share their credentials
* Does not require Intuned to persist the credentials
* Has a session lifetime that is dependent on the target service and typically bound to the user's original session duration
* Developer needs to provide some info about how the authentication works on the targeted website (start url, auth final url)

### Common features

Credentials-based and recorder-based auth sessions share these key features:

* lifecycle management via API or UI
* proxy support: ensure actions are coming from the same IP
* monitoring support
* Jobs and Queues support

### Picking the right auth session type to use

Before starting with auth sessions, you must decide which type fits your use case. Some useful questions to ask yourself:

* Are you authenticating into a target service to complete actions on behalf of your users?
* Are you authenticating into a target service on behalf of your own company to automate back-office tasks?
* Are your users comfortable sharing their credentials to access the target service?
* How frequently would your users require actions to be completed on their behalf on the target service? Are they daily/weekly scheduled tasks or on-demand tasks?
* How long can you stay signed in to the target service before being prompted to enter your credentials again?
* Does the target service prompt for two-factor authentication when requests are coming from a new device, a new location, or a new browser? Do you need to use proxies with auth sessions?

To determine which auth session to use, you can refer to the flow chart below: ```mermaid flowchart TD A{Need to build an integration to act on behalf of a user?} A --> |No| B([Auth session not required]) A --> |Yes| C{User requires actions to be completed on their behalf very frequently? e.g. pull data daily} C -->|Yes| D{Can user remain signed in for a long time on the same device/browser once signed in?
e.g. LinkedIn} C -->|No| E{Is user comfortable sharing credentials?} E -->|Yes| F([Utilize Credentials-based auth session]) E -->|No| G([Utilize Recorder-based auth session]) D --> |Yes| E D --> |No| F ```

## Auth session lifecycle

One of the important aspects of dealing with auth sessions is understanding the session lifecycle: knowing when the session has expired and how to recover from that. Intuned asks developers to implement a `check` function (inside a `check.ts` file). This function should return a boolean (true/false) that indicates whether the session is still valid. It is called before executing any authenticated action (API). If the function returns false, Intuned will attempt to re-authenticate the session using the saved credentials (for Credentials-based auth sessions) or mark the session as expired (for Recorder-based auth sessions).

### Credentials-based auth sessions lifecycle

* When created, the session is marked as active.
* Before executing any API call, the `check.ts` function is called. If the function returns false, Intuned will attempt to re-authenticate the session by calling the `refresh.ts` API with the saved credentials.
* If the `refresh.ts` API fails, the session is marked as expired. Any further APIs that use this session will fail with a 401 status code.
* The session state can be checked using the [get session API](/client-apis/api-reference/projectauthsessions/get-auth-session.mdx).
* The session can be updated using the update session API, which requires the credentials to be passed in the body. For more info, check [update auth session - start](/client-apis/api-reference/projectauthsessionsupdate/update-auth-session--start), [update auth session - resume](/client-apis/api-reference/projectauthsessionsupdate/update-auth-session--resume), and [update auth session - result](/client-apis/api-reference/projectauthsessionsupdate/update-auth-session--result).
* Any Job or Queue that uses the auth session will be paused when the auth session is marked as expired.

### Recorder-based auth sessions lifecycle

* When created, the session is marked as active.
* Before executing any API call, the `check.ts` function is called. If the function returns false, the session is marked as expired.
* The session state can be checked using the [get session API](/client-apis/api-reference/projectauthsessions/get-auth-session.mdx).
* The session can be updated, but this requires the user to be present to re-authenticate.
* Any Job or Queue that uses the auth session will be paused when the auth session is marked as expired.

## Managing auth sessions

Auth sessions can be managed via the Intuned UI or via the API.

* The UI provides a simple way to create, update, and delete auth sessions. This is typically used when you are building an authenticated scraper or doing back-office automation with a limited set of auth sessions. ![Auth Session UI Management](https://intuned-docs-public-images.s3.amazonaws.com/auth/auth-sessions-ui-management.png)
* The API provides a way to manage auth sessions programmatically. This is typically used when you are building an integration with a service that has no APIs and you want to automate actions on behalf of your users (you need to provide credentials). Building a LinkedIn integration is a good example here. For API docs, see the [Auth Sessions API reference](/client-apis/api-reference/auth-sessions-overview).

## Learn more

* Learn how to develop auth sessions
* Learn how to consume auth sessions at scale

# File data extraction

You can also use file data extraction as a standalone API. Check out [Standalone File APIs](./standalone-file-apis) for more info.

Extracting data from files is a common operation when writing scrapers or building browser automations in general. Normally, this involves writing custom rules and regex to parse and extract data.
This process can be error-prone and time-consuming. At Intuned, we simplify this process by providing a utility that allows you to extract structured data from files. See the [function reference](/automation-sdks/intuned-sdk/ai-extractors/functions/extractStructuredDataFromFile) for more info.

## Examples

```typescript AI Extraction From File
import { extractStructuredDataFromFile } from "@intuned/sdk/ai-extractors";

const specPdfs = [
  "https://intuned-docs-public-images.s3.amazonaws.com/27UP600_27UP650_ENG_US.pdf",
  "https://intuned-docs-public-images.s3.amazonaws.com/32UP83A_ENG_US.pdf"
];

for (const url of specPdfs) {
  const specs = await extractStructuredDataFromFile({
    type: "pdf",
    source: {
      type: "url",
      "data": url,
    },
  }, {
    label: "spec files",
    dataSchema: {
      type: "object",
      properties: {
        "models": {
          description: "model numbers included in this spec sheet",
          type: "array",
          items: { type: "string" }
        },
        "color_depth": { type: "string", description: "color depth of the monitor" },
        "max_resolution": { type: "string", description: "max resolution of the screen and at what Hz" },
      },
      required: ["models", "color_depth", "max_resolution"],
    }
  });
  // Print the structured result for each spec sheet
  console.log(specs);
}

// {
//   models: [ '27UP600', '27UP650' ],
//   color_depth: '8-bit / 10-bit color is supported.',
//   max_resolution: '3840 x 2160 @ 60 Hz'
// }
// {
//   models: [ '32UP83A' ],
//   color_depth: '8-bit / 10-bit color is supported.',
//   max_resolution: '3840 x 2160 @ 60 Hz'
// }
```

For more details, see [extractStructuredDataFromFile](/automation-sdks/intuned-sdk/ai-extractors/functions/extractStructuredDataFromFile).

## How does this work?

In summary, we do the following:

* Convert the file (selected pages) to markdown.
* Extract structured data from the markdown using the provided schema.

## How is the cost for Data Extraction calculated?

* Cost for converting the file to markdown is calculated based on the number of pages in the file.
* Cost for extracting structured data from the markdown is calculated based on the size of input data and the schema used.
{/** ## What is the Strategy option and what does it do? For now, in the strategy, you can control the model that is used to extract the data. By default we use Gpt-4. We will be adding more strategies in the future. */}

# Markdown and tables

## Converting web pages/HTML to markdown

Intuned provides utilities to convert web pages to markdown. Markdown is a particularly good format for working with LLMs. For more info, check out the [extractMarkdownFromPage reference](/automation-sdks/intuned-sdk/ai-extractors/functions/extractMarkdownFromPage) and the [extractMarkdownFromLocator reference](/automation-sdks/intuned-sdk/ai-extractors/functions/extractMarkdownFromLocator).

```typescript Convert web page to markdown
import { extractMarkdownFromPage } from "@intuned/sdk/ai-extractors";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html");
// extractMarkdownFromPage returns a Promise, so it must be awaited
const siteMarkdown = await extractMarkdownFromPage(page);

// [Books to Scrape](../../index.html) We love being scraped!
// - [Home](../../index.html)
// - [Books](../category/books_1/index.html)
// - [Poetry](../category/books/poetry_23/index.html)
// - A Light in the Attic
// ![A Light in the Attic](../../media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg)
// # A Light in the Attic
// £51.77
// \_\_ In stock (22 available)
```

```typescript Convert web page to markdown using locator
import { extractMarkdownFromLocator } from "@intuned/sdk/ai-extractors";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html");
// extractMarkdownFromLocator also returns a Promise
const siteMarkdown = await extractMarkdownFromLocator(page.locator("#content_inner > article > div.row > div.col-sm-6.product_main"));

// # A Light in the Attic
//
// £51.77
//
// \_\_ In stock (22 available)
```

## Converting files to markdown

You can also use File Markdown Conversion as a standalone API. Check out [Standalone File APIs](./standalone-file-apis) for more info.

Intuned provides utilities to convert files to markdown. Markdown is a particularly good format for working with LLMs.
For more info checkout: [extractMarkdownFromFile reference](/automation-sdks/intuned-sdk/ai-extractors/functions/extractMarkdownFromFile). ```typescript Convert file to markdown const specMarkdown = await extractMarkdownFromFile({ type: "pdf", source: { type: "url", data: "https://intuned-docs-public-images.s3.amazonaws.com/27UP600_27UP650_ENG_US.pdf" }, }, { label: "pdf_markdown" }); // LG // Life's Good // # OWNER'S MANUAL // LED LCD MONITOR // \(LED Monitor\*\) // \* LG LED Monitor applies LCD screen with LED backlights. Please read this manual carefully before operating your set and retain it for future reference. // 27UP600 // 27UP650 // .... ``` ## Extracting tables from files You can also use Table Extraction as a standalone API. Checkout [Standalone File APIs](./standalone-file-apis) for more info. Intuned provides utilities to extract tables from files. Tables are some of the common elements in data-rich files. For more info on how to use this, checkout [extractTablesFromFile reference](/automation-sdks/intuned-sdk/ai-extractors/functions/extractTablesFromFile). ```typescript Extract tables from file const fileTables = await extractTablesFromFile({ type: "pdf", source: { type: "url", data: "https://intuned-docs-public-images.s3.amazonaws.com/27UP600_27UP650_ENG_US.pdf" }, }, { label: "pdf_markdown" }) // [ // { // pageNumber: 2, // title: 'PRODUCT SPECIFICATION 27UP600', // content: [ // [Array], [Array], [Array], // [Array], [Array], [Array], // [Array], [Array], [Array], // [Array], [Array], [Array], // [Array], [Array], [Array] // ] // } // ] ``` # Data extraction in Intuned Data extraction is a fundamental task in browser automation and web scraping. In some cases, data also lives in files of different formats. Traditionally, data extraction is unreliable and error-prone, requiring custom code to parse, clean, and transform data into a usable format. This process is labor-intensive, error-prone and time-consuming. 
At Intuned, we streamline data extraction to be easy and reliable by leveraging LLMs. We offer a suite of utilities that simplify the extraction of data from both websites and files. This section focuses on these capabilities. The following is a summary of the available utilities:

## Intuned Automation Projects

The Intuned SDK includes several helper methods designed for data extraction, available under the following namespaces:

### `@intuned/sdk/ai-extractors`

* [**Web Data Extraction**](./web-data-extraction): Utilities to extract structured data from webpages. These methods incur a cost that depends on the webpage size and the requested data schema.
* [**File Data Extraction**](./file-data-extraction): Utilities for extracting structured data from files. These methods incur a cost that varies with the number of pages, the file contents, and the requested data schema.
* [**Web Markdown Conversion**](./markdown-and-tables): Convert webpages to markdown.
* [**File to Markdown Conversion**](./markdown-and-tables): Convert files to markdown. This uses our file processing pipeline, with costs based on the number of file pages processed.
* [**Table Extraction from Files**](./markdown-and-tables): Extract tables from files. This uses our file processing pipeline, with costs based on the number of file pages processed.

### `@intuned/sdk/optimized-extractors`

Utilities for optimized data extraction from web pages, focusing on cost-efficiency. These utilities aim to minimize the reliance on LLMs. They support a limited set of schemas and are restricted in the type of data they can extract. Further details on these utilities are discussed [here](./web-data-extraction#optimized-extractors).

### `@intuned/sdk/playwright`

**Static Extraction Utilities** to extract data from webpages with selectors. These utilities require manual configuration of selectors and incur no cost when used.
Checkout [extractArrayFromPageUsingSelectors](/automation-sdks/intuned-sdk/playwright/functions/extractArrayFromPageUsingSelectors) and [extractObjectFromPageUsingSelectors](/automation-sdks/intuned-sdk/playwright/functions/extractObjectFromPageUsingSelectors) for more info. ### `playwright` [playwright](/automation-sdks/playwright/overview) can directly be used to interact with webpages and extract data. ## Standalone file APIs In addition to the `@intuned/sdk` utilities, we offer standalone APIs for file data extraction. These APIs can be utilized without creating projects or writing any browser automation logic, with costs varying based on the operation used and file size. More details are available [here](./standalone-file-apis). # Standalone file APIs In addition to the `@intuned/sdk` utilities, we offer standalone APIs for file related operations (extract structured data, extract/convert-to markdown, extract tables). These APIs can be utilized without creating projects or writing any browser automation logic, with costs varying based on the operation used and file size. ## Sync vs. Async APIs Each of the operations listed above is available via a Sync API and an Async API. In Sync APIs, the result is returned in the same HTTP call that was made to invoke API. In Async APIs, the result is returned in a separate HTTP call (Async Result call), using an operationId that is returned in the first call (Async Start call). ## Extract structured data from files For more info, checkout the reference for the [Sync API](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--sync) and Async APIs: [Async Start](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--async-start) and [Async Result](/client-apis/api-reference/filesextractstructureddata/extract-structured-data--async-result). 
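
The Async Start / Async Result flow described under "Sync vs. Async APIs" can be sketched as a small polling helper. This is a minimal sketch and not Intuned's client: `startOperation`, `fetchResult`, the `status` values, and the response shapes are hypothetical placeholders, so check the API reference for the actual endpoints and payloads.

```typescript
// Hypothetical shapes: the real Async Start / Async Result payloads may differ.
type AsyncStart = { operationId: string };
type AsyncResult<T> =
  | { status: "pending" }
  | { status: "completed"; result: T }
  | { status: "failed"; error: string };

// Calls the start function once, then polls the result function with the
// returned operationId until the operation completes, fails, or attempts run out.
async function runAsyncOperation<T>(
  startOperation: () => Promise<AsyncStart>,
  fetchResult: (operationId: string) => Promise<AsyncResult<T>>,
  options: { intervalMs: number; maxAttempts: number }
): Promise<T> {
  const { operationId } = await startOperation();
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const response = await fetchResult(operationId);
    if (response.status === "completed") return response.result;
    if (response.status === "failed") throw new Error(response.error);
    // Still pending: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
  throw new Error(`Operation ${operationId} did not finish after ${options.maxAttempts} attempts`);
}
```

In practice, `startOperation` would wrap the Async Start HTTP call and `fetchResult` the Async Result call. Passing them in as functions keeps the polling logic independent of any particular HTTP client.
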
## Convert files to markdown

For more info, check out the reference for the [Sync API](/client-apis/api-reference/filesextractmarkdown/extract-markdown--sync) and Async APIs: [Async Start](/client-apis/api-reference/filesextractmarkdown/extract-markdown--async-start) and [Async Result](/client-apis/api-reference/filesextractmarkdown/extract-markdown--async-result).

## Extract tables from files

For more info, check out the reference for the [Sync API](/client-apis/api-reference/filesextractmarkdown/extract-markdown--sync) and Async APIs: [Async Start](/client-apis/api-reference/filesextractmarkdown/extract-markdown--async-start) and [Async Result](/client-apis/api-reference/filesextractmarkdown/extract-markdown--async-result).

## Supported file formats

We currently support PDF files and image files. We will be working on supporting other formats soon. [Contact us](/docs/support/contact-us) if you have any specific requirements.

## How are costs calculated?

* For Markdown conversion and Table extraction APIs, cost is based on the number of pages in the files processed.
* For Structured data extraction APIs, cost is based on the number of pages in the files processed, the size of the input data, and the schema used. This is because structured data extraction is a two-step process: convert to markdown, then run extraction.

# Web data extraction

Extracting data from webpages is a core operation when writing scrapers. Normally, this involves writing custom code to parse and extract data from the HTML. This process is error-prone and time-consuming. At Intuned, we streamline data extraction to be easy and reliable by leveraging LLMs. Our utilities, available in `@intuned/sdk`, allow you to extract data by providing a schema that describes the desired output.

## AI extractors

`extractStructuredDataFromPage` and `extractStructuredDataFromLocator` extract structured data from a full page or a specific section (using a locator) of a page.
These methods retrieve the content, pass it to the LLM for extraction, and incur costs based on the input size, schema complexity, and selected strategy.

```typescript AI extraction from page
import { extractStructuredDataFromPage } from "@intuned/sdk/ai-extractors";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html");
const result = await extractStructuredDataFromPage(page, {
  label: "books_to_scrape",
  dataSchema: {
    type: "object",
    properties: {
      title: {
        type: "string",
        description: "title of the book",
      },
      in_stock: { type: "boolean" },
      UPC: { type: "string" },
      product_type: { type: "string" },
      availableBooks: { type: "number", description: "number of available books" },
      price: {
        type: "object",
        properties: {
          price_include_tax: { type: "number" },
          price_execluding_tax: { type: "number" },
          tax_amount: { type: "number" },
          currency: {
            type: "string",
            enum: ["pound", "dollar"],
          },
        }
      }
    },
    required: ["title"],
  },
});

// {
//   UPC: 'a897fe39b1053632',
//   availableBooks: 22,
//   in_stock: true,
//   price: {
//     currency: 'pound',
//     price_execluding_tax: 51.77,
//     price_include_tax: 51.77,
//     tax_amount: 0
//   },
//   product_type: 'Books',
//   title: 'A Light in the Attic'
// }
```

```typescript AI extraction from locator (page section)
import { extractStructuredDataFromLocator } from "@intuned/sdk/ai-extractors";

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html");
const result = await extractStructuredDataFromLocator(page.locator("#content_inner > article > table"), {
  label: "price",
  dataSchema: {
    type: "object",
    properties: {
      price_include_tax: { type: "number" },
      price_execluding_tax: { type: "number" },
      tax_amount: { type: "number" },
      currency: {
        type: "string",
        enum: ["pound", "dollar"],
      },
    },
    required: ["price_include_tax"],
  },
});

// {
//   price_include_tax: 51.77,
//   price_execluding_tax: 51.77,
//   tax_amount: 0,
//   currency: 'pound'
// }
```

For complete function reference, see [extractStructuredDataFromPage](/automation-sdks/intuned-sdk/ai-extractors/functions/extractStructuredDataFromPage) and
[extractStructuredDataFromLocator](/automation-sdks/intuned-sdk/ai-extractors/functions/extractStructuredDataFromLocator).

## Optimized extractors

Optimized extractors deliver the reliability and convenience of AI extractors in a cost-optimized manner. This is done by using AI extraction only in limited scenarios and creating/using selectors otherwise (more on this later). There are four optimized extractor methods:

* `extractArrayFromPage`
* `extractArrayFromLocator`
* `extractObjectFromPage`
* `extractObjectFromLocator`

Here is how these can be used in your Intuned project APIs: ```typescript Optimized extraction - Object await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"); const result = await extractObjectFromPage(page, { label: "books_to_scrape", entityName: "book_info", entitySchema: { type: "object", properties: { title: { type: "string", description: "title of the book", }, price: { type: "string", }, in_stock: { type: "string" }, UPC: { type: "string" }, product_type: { type: "string" }, }, required: ["title", "price", "in_stock", "UPC", "product_type"], }, }); // { // UPC: 'a897fe39b1053632', // price: '£51.77', // title: 'A Light in the Attic', // in_stock: 'In stock (22 available)', // product_type: 'Books' // } ``` ```typescript Optimized extraction - Locator await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"); const result = await extractObjectFromLocator(page.locator("#content_inner > article > div.row > div.col-sm-6.product_main"), { label: "books_to_scrape", entityName: "book_info", entitySchema: { type: "object", properties: { title: { type: "string", description: "title of the book", }, price: { type: "string", }, in_stock: { type: "string" }, }, required: ["title", "price", "in_stock"], }, }); // { // title: 'A Light in the Attic', // price: '£51.77', // in_stock: 'In stock (22 available)' // } ``` ```typescript Optimized extraction - Array await
page.goto("https://books.toscrape.com/") const books = await extractArrayFromPage(page, { label: "books_list", itemEntityName: "book", prompt: "scrape the books list from the page.", strategy: { model: "claude-3-opus", type: "HTML", }, itemEntitySchema: { type: "object", properties: { title: { type: "string", primary: true, description: "book title" }, price: { type: "string", description: "book price" }, in_stock: { type: "string", description: "book in stock or out of stock" } }, required: ["title", "price", "in_stock"], } }); // [ // { // title: 'A Light in the ...', // price: '£51.77', // in_stock: 'In stock' // }, // { // title: 'Tipping the Velvet', // price: '£53.74', // in_stock: 'In stock' // }, // ... // ] ``` ```typescript Optimized extraction - Array from locator await page.goto("https://books.toscrape.com/") const books = await extractArrayFromLocator(page.locator("#default > div > div > div > div > section"), { label: "books_list", itemEntityName: "book", prompt: "scrape the books list from the page.", strategy: { model: "claude-3-opus", type: "HTML", }, itemEntitySchema: { type: "object", properties: { title: { type: "string", primary: true, description: "book title" }, price: { type: "string", description: "book price" }, in_stock: { type: "string", description: "book in stock or out of stock" } }, required: ["title", "price", "in_stock"], } }); // [ // { // title: 'A Light in the ...', // price: '£51.77', // in_stock: 'In stock' // }, // { // title: 'Tipping the Velvet', // price: '£53.74', // in_stock: 'In stock' // }, // ... // ] ``` ### How do they work? How are they saving cost? Optimized extractors operate in two modes: AI Extraction and Static Extraction. * **AI extraction**: In this mode, the extractor leverages LLMs to extract data directly from the webpage. This is the initial mode used when the extractor is first invoked. 
* **Static extraction**: After collecting a sufficient number of examples via AI Extraction, the Intuned platform runs background workflows to automatically generate selectors. Once the selectors are correctly generated, the optimized extractors switch to Static Extraction mode, using these cached/auto-generated selectors to extract data from the page. This saves cost by avoiding the need for LLM calls on every extraction.

The platform also handles scenarios where the static extractors do not return valid data. This is taken as a signal that the static extractors may have become invalid or the page structure has changed. In such cases, the extractor automatically falls back to AI Extraction mode. After collecting new examples via AI Extraction, the platform recreates the static extractors and returns to the optimized state.

It's important to note that this entire process is managed seamlessly by the Intuned platform. As a user, you simply need to provide the necessary extractor parameters, and the platform takes care of optimizing the extraction process to save costs while maintaining accuracy.

### What are the scenarios where optimized extractors perform AI extraction and incur cost?

Optimized extractors perform AI extraction in the following scenarios:

* **Initial extraction**: When used for the first time on a new page or locator.
* **Insufficient examples**: When collected examples are insufficient to generate reliable selectors.
* **Page Structure Changes/Invalid Extracted Data**: When the page structure changes or expected data is not returned by static selectors.

### What are the limitations of optimized extractors?

While optimized extractors offer a cost-effective solution for extracting structured data from webpages, they do have certain limitations:

1. **Limited JSONSchema Support**: Currently, optimized extractors have limited support for complex JSONSchema structures.
They can handle basic objects (objects with string properties) using `extractObjectFromPage` and `extractObjectFromLocator`, and arrays of basic objects using `extractArrayFromPage` and `extractArrayFromLocator`. More complex schemas with nested objects or arrays, or properties with non-string types, are not yet supported.
2. **Exact String Extraction**: Optimized extractors rely on the ability to create static selectors for optimization. To achieve this, the data being extracted must be exact strings that exist in the webpage.
3. In rare cases, the platform may not be able to generate reliable static selectors, even after collecting multiple examples. In such scenarios, the optimized extractor will continue to operate in AI Extraction mode, incurring costs on each extraction. To avoid incurring unexpected costs, you can set limits on AI spend using labels. It's also worth mentioning that the Intuned team closely monitors these cases and works on continuously improving the selector generation algorithms.

### What is a `variantKey` and how to use it?

In advanced scenarios, you may want to apply the same optimized extraction logic to different websites with varying page structures. To enable the Intuned platform to group examples effectively and create static extractors per group, we use the concept of variants.

By default, the variant is determined by the origin of the webpage on which the extraction is performed. This means that examples collected from pages with the same origin will be grouped together to generate static extractors specific to that website.

However, there may be cases where you need more fine-grained control over example grouping to make static extraction creation feasible. For instance, consider a situation where you want to extract data from two different pages that have the same origin but different structures. In such cases, you can manually provide a `variantKey` to differentiate between the two pages and ensure accurate example grouping.
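
To make the grouping rule concrete, here is a small conceptual sketch. It is not Intuned's implementation: `ExtractionExample`, `resolveVariant`, and `groupExamples` are hypothetical names, used only to illustrate that examples are grouped by page origin unless a `variantKey` is supplied.

```typescript
// Conceptual illustration only; the names and types here are hypothetical.
type ExtractionExample = { pageUrl: string; variantKey?: string };

// An explicit variantKey wins; otherwise fall back to the page origin.
function resolveVariant(example: ExtractionExample): string {
  return example.variantKey ?? new URL(example.pageUrl).origin;
}

// Buckets examples by their resolved variant.
function groupExamples(examples: ExtractionExample[]): Map<string, ExtractionExample[]> {
  const groups = new Map<string, ExtractionExample[]>();
  for (const example of examples) {
    const variant = resolveVariant(example);
    const bucket = groups.get(variant) ?? [];
    bucket.push(example);
    groups.set(variant, bucket);
  }
  return groups;
}

const exampleGroups = groupExamples([
  { pageUrl: "https://shop.example.com/books/1" },
  { pageUrl: "https://shop.example.com/books/2" },
  { pageUrl: "https://shop.example.com/authors/7", variantKey: "author-page" },
]);
console.log(Array.from(exampleGroups.keys()));
// [ 'https://shop.example.com', 'author-page' ]
```

The two book pages share an origin and land in one group, while the author page, which has the same origin but a different structure, gets its own group through the explicit key.
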
Here's how you can use the `variantKey`: 1. Identify the webpages or sections that require different example grouping, even though they have the same origin. 2. Assign a unique `variantKey` to each distinct webpage or section. The `variantKey` should be a string that meaningfully identifies the specific variation of the page structure. 3. When calling the optimized Extractor functions (`extractObjectFromPage`, `extractObjectFromLocator`, `extractArrayFromPage`, or `extractArrayFromLocator`), pass the corresponding `variantKey` as an optional parameter. 4. The Intuned platform will use the provided `variantKey` to group examples separately for each variant, enabling the creation of static extractors tailored to each specific page structure. By utilizing the `variantKey`, you can effectively handle situations where the same optimized extraction logic needs to be applied to pages with different structures, even if they share the same origin. This allows for more precise example grouping and enables the generation of static extractors that are specific to each variant of the page structure. It's important to note that the `variantKey` should be used only when necessary. In most cases, the default behavior of grouping examples by the page origin is sufficient. However, when dealing with complex websites or when you require more granular control over example grouping, the `variantKey` provides a powerful mechanism to optimize the extraction process and ensure accurate results. ## When to use optimized extractors vs AI extractors * Use AI extractors when: * Extracting non-exact strings (e.g., booleans, summaries). * Dealing with complex schemas. * Expecting a small number of executions. * Use optimized extractors when: * Cost is a significant factor. * Expecting a high number of runs. * Page structure is very similar across executions. 
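
The "complex schemas" criterion above can be checked mechanically before choosing an extractor. The following is a minimal sketch based only on the limitations described earlier (basic objects with string properties). `SchemaNode` and `isBasicObjectSchema` are hypothetical helpers, not part of `@intuned/sdk`, and the SDK's own validation may differ.

```typescript
// Hypothetical helper for illustration; it is not part of @intuned/sdk.
type SchemaNode = {
  type: string;
  properties?: { [name: string]: SchemaNode };
  items?: SchemaNode;
  description?: string;
  required?: string[];
};

// True when the schema is an object whose properties are all plain strings,
// the shape the limitations section describes as supported by optimized extractors.
function isBasicObjectSchema(schema: SchemaNode): boolean {
  if (schema.type !== "object" || !schema.properties) return false;
  const properties = schema.properties;
  return Object.keys(properties).every((name) => properties[name].type === "string");
}

const stringOnly: SchemaNode = {
  type: "object",
  properties: { title: { type: "string" }, price: { type: "string" } },
  required: ["title", "price"],
};
const withBoolean: SchemaNode = {
  type: "object",
  properties: { title: { type: "string" }, in_stock: { type: "boolean" } },
};

console.log(isBasicObjectSchema(stringOnly)); // true
console.log(isBasicObjectSchema(withBoolean)); // false
```

A schema that fails this check (for example one with booleans, numbers, or nested objects) points toward the AI extractors. One that passes is a candidate for the optimized extractors, subject to the cost and run-frequency considerations listed above.
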
Choose the appropriate extractor based on your specific requirements, considering factors such as data complexity, execution frequency, and cost optimization.

## What are labels and how to use them?

Labels are used to identify and differentiate extractors for billing and monitoring purposes. Assign a unique label to each extractor to track its usage and costs effectively. You can also use labels to set limits on AI spend per extractor. More on this later.

## What is `strategy` and how should it be used?

The `strategy` parameter in the extractor functions allows you to control two key aspects of the extraction process:

1. **Web extraction method:** It determines how data is extracted from the webpage before passing it to the LLM. Currently supported strategies are:
   * `"HTML"`: *This is the default option*. Uses the HTML source of the page or locator for extraction. This strategy is suitable when the desired data is present within the HTML elements and is best extracted based on the DOM structure.
   * `"IMAGE"`: Uses screenshots of the page or locator for extraction. This strategy is useful when the information you want to extract is primarily visual and not easily identifiable in the HTML structure.
2. **LLM selection:** The strategy also influences the choice of the LLM to use for extraction, which directly impacts the cost. The `model` property within the strategy allows you to specify the desired model. Options are: `"claude-3-opus"`, `"claude-3-sonnet"`, `"claude-3-haiku"`, `"gpt4-turbo"`, or `"gpt3.5-turbo"`. By default, the `"claude-3-haiku"` model is used.

When deciding on the strategy to use, consider the following factors:

* Nature of the page: If the information you want to extract is mainly visual or not easily accessible through the HTML structure, use the `"IMAGE"` strategy. If the data is well-structured within the HTML elements, the `"HTML"` strategy is more suitable.
* Cost considerations: The AI model used for extraction directly affects the cost incurred.

Overall, we suggest that you start with the default strategy (method and model) and iterate based on the results.

For more details, see [extractArrayFromPage](/automation-sdks/intuned-sdk/optimized-extractors/functions/extractArrayFromPage), [extractArrayFromLocator](/automation-sdks/intuned-sdk/optimized-extractors/functions/extractArrayFromLocator), [extractObjectFromPage](/automation-sdks/intuned-sdk/optimized-extractors/functions/extractObjectFromPage), and [extractObjectFromLocator](/automation-sdks/intuned-sdk/optimized-extractors/functions/extractObjectFromLocator).

## extractStructuredDataFromContent

`extractStructuredDataFromContent` extracts structured data from arbitrary content, which is useful when the source is a piece of text or an image rather than a web page.

```typescript extractStructuredDataFromContent - Text
import { extractStructuredDataFromContent } from "@intuned/sdk/ai-extractors";

// The function returns a Promise, so the call is awaited.
const result = await extractStructuredDataFromContent({
  type: "text",
  data: `"To Kill a Mockingbird" is a fiction novel written by Harper Lee. Published in 1960, this classic book delves into the themes of racial injustice and moral growth. The story is set in the American South during the 1930s. The book's ISBN is 978-0-06-112008-4.`
}, {
  label: "book",
  model: "claude-3-haiku",
  dataSchema: {
    "type": "object",
    "properties": {
      "title": { "type": "string", "description": "The title of the book" },
      "author": { "type": "string", "description": "The author of the book" },
      "published_year": { "type": "integer", "description": "The year the book was published" },
      "genre": { "type": "string", "description": "The genre of the book" },
      "ISBN": { "type": "string", "description": "The International Standard Book Number of the book" }
    },
    "required": ["title", "author", "published_year", "genre", "ISBN"]
  }
})

// {
//   title: 'To Kill a Mockingbird',
//   author: 'Harper Lee',
//   published_year: 1960,
//   genre: 'fiction',
//   ISBN: '978-0-06-112008-4'
// }
```

```typescript extractStructuredDataFromContent - Image
import { extractStructuredDataFromContent } from "@intuned/sdk/ai-extractors";

// The function returns a Promise, so the call is awaited.
const result2 = await extractStructuredDataFromContent({
  type: "image-url",
  image_type: "png",
  data: "https://intuned-docs-public-images.s3.amazonaws.com/guides/book-details.png"
}, {
  label: "image",
  model: "claude-3-haiku",
  dataSchema: {
    "type": "object",
    properties: {
      title: {
        type: "string",
      },
      in_stock: {
        type: "boolean"
      }
    },
    "required": ["title", "in_stock"]
  }
});

// { title: 'A Light in the Attic', in_stock: true }
```

For more details, see [extractStructuredDataFromContent](/automation-sdks/intuned-sdk/ai-extractors/functions/extractStructuredDataFromContent).

# Concepts and terminology

## What is Intuned?

Intuned is the browser automation platform for developers and product teams. Our mission is to bridge the API gap when official APIs are not available. Developers use Intuned to develop, deploy, and monitor reliable browser automations.

## Projects in Intuned

### What is an Intuned project?

A Project is a set of APIs and settings that are encapsulated as a single entity. A project can be created, edited, and deployed.

* Projects have names, and each project's name should be unique within the workspace.
* Projects need to be deployed to be consumed and used.
### What is an API in Intuned?

Intuned projects consist of a set of APIs. An API is a function that can be called to execute a specific action or extract specific data. You define APIs within Intuned projects as code.

## Authenticated projects in Intuned

To extract data or take actions on behalf of a user, APIs need to work in the context of that user. To obtain user context, you must be authorized to log into the target service with the user's identity (Auth Session). Intuned streamlines creating and maintaining authenticated integrations.

### Can a project have Authenticated and Non-Authenticated APIs?

Authentication adds overhead to API executions. For this reason, we recommend that a project's APIs are all authenticated or all unauthenticated.

## Workspace in Intuned

A workspace is the top-level logical entity that allows you to govern access control over your Intuned resources. Each workspace can have more than one project, and multiple users can be in the same workspace. In general, we recommend that a single company or team share a workspace.

### Can a user have access to more than one workspace?

Yes, a user can have access to more than one workspace. Once logged in, you can switch between the workspaces you are a member of.

## Consuming a project

Once a project is deployed, it can be consumed by calling the APIs within the project directly, or by using Jobs and Queues to orchestrate calls to these APIs.

### Sync API

HTTP API that can be called synchronously. A synchronous API returns the result of the run in the same HTTP call that triggered the run.

### Async API

HTTP API that can be called asynchronously. This means that the API returns a run id that can be used to check the status and result of the run.

### Jobs API

Jobs are a higher-level abstraction on top of the APIs. One of their main use cases is scrapers that need to run regularly. Jobs can be created, deleted, and triggered via the Jobs API or Intuned's UX.
Each job has a schedule (when to run), a sink (where to send data to), a configuration (how to run), and a payload (what to run).

## Monitoring in Intuned

Intuned provides monitoring functionality that lets you track the usage and reliability of your projects.

# Welcome to Intuned

The browser automation platform for developers and product teams.

[Join us on Slack!](https://join.slack.com/t/intuned-users/shared_invite/zt-2k6bjpzyo-~6ez73_z8cR8I87H~qYDTQ)

## Introduction

Building integrations and scrapers when official APIs are not available is not easy. Intuned is a platform to develop, deploy, and monitor reliable browser automation when official APIs are not available - *Reliability Powered by AI.*

Intuned's automation projects are code-based functions, written in TypeScript, that work by interacting with a browser via a powerful runtime ([playwright](/automation-sdks/playwright/overview) + [@intuned/sdk](/automation-sdks/intuned-sdk/overview)). We have an [automation IDE](/docs/platform/develop/ide) to make the development of those projects easier. We also handle [deploying](/docs/platform/deploy) them and exposing them as APIs for [easy consumption](/docs/platform/consume). You also get [monitoring](/docs/platform/monitor) out of the box! Last but not least, we have a deep focus on reliability, and we use AI to achieve it.

## How does it work

The steps below describe how the Intuned platform works at a high level.

The Intuned platform has an IDE built to speed up the process of building browser automations. The Intuned IDE comes with custom features like action recording, a selector creator, and more! To learn more, check out [Develop](/docs/platform/develop/overview).

Intuned's projects can be deployed with a single click and within a minute. To learn more, check out [Deploy](/docs/platform/deploy).

Start consuming your projects by making API calls directly or scheduling jobs for scraping tasks. To learn more, check out [Consume](/docs/platform/consume).

Get proactive alerts on failures, with access to previous runs' traces, recordings, and results. To learn more, check out [Monitor](/docs/platform/monitor).

## Why Intuned?

Ready to get started? Check out our [Quick start](./quick-start) guide.

# Quick start

Intuned is the browser automation platform for developers and product teams. Follow this step-by-step tutorial to build, deploy, and call your first browser automation project on Intuned.

### 1. Create a workspace

[Contact us](https://cal.com/forms/d01e34f3-5ef7-4057-8a3c-701dfa2d4f28) to create a workspace. It's free to try!

A workspace is the top-level logical entity that allows you to govern access control over your Intuned resources. To read more, check out [Workspace](/docs/platform/manage).

### 2. Create a project

Projects are the core building block in Intuned. Each workspace can have one or more projects. Use a project to build a scraper, an automation, or an integration with a website that lacks APIs.

Open your workspace and create a new project with the name `new-project`.

![New Project](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/new-project-full.gif)

### 3. Create Books API

APIs are the building blocks of your project. They are the functions that you will call to interact with the browser.

Create a new API (`api/books.ts`).

![New API](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/new-api-full.gif)

Copy the following code into the newly created `api/books.ts` file.

```typescript
import { BrowserContext, Page } from "@intuned/playwright-core";
import { extendPlaywrightPage } from "@intuned/sdk/playwright";

interface Params {
  // Add your params here
  category: string;
}

export default async function handler(
  params: Params,
  _playwrightPage: Page,
  context: BrowserContext
) {
  const page = extendPlaywrightPage(_playwrightPage);
  await page.goto("https://books.toscrape.com/");

  // playwright logic
  await page.getByRole("link", { name: params.category }).click();

  // @intuned/sdk helper!
  const result = await page.extractArrayOptimized({
    itemEntityName: "book",
    label: "books-scraper",
    itemEntitySchema: {
      type: "object",
      properties: {
        name: {
          type: "string",
          description: "name of the book",
          primary: true,
        },
        price: {
          type: "string",
          description: "price of the book. An example is £26.80",
        },
      },
      required: ["name", "price"],
    },
  });

  return result;
}
```

### 4. Run the API

Test your API in the Intuned IDE. Pick the `books` API from the dropdown.

![Pick api](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/pick-api.gif)

Click the Run button.

![Run Button](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/run-button-full.png)

Create a new parameter set for the API you just created. Intuned enables you to create multiple parameter sets for the same API, which helps you test and iterate on the API. Create a param set named "Novels" and copy the following:

```json
{ "category": "Novels" }
```

![New Parameters](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/new-params-full.png)

Click the Run button again now that the first param set is created.

![Run Button](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/run-button-full.png)

After the API run is complete, you can look at the returned result in the terminal.

![Result](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/api-results-full.png)

### 5. Deploy it

Intuned enables you to deploy your project with the click of a button. Let's do it!

![Deploy](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/deployment-full.gif)

To learn more about deployments, check out [Deployments](/docs/platform/deploy).

### 6. Call your API

Now that your project API is deployed, you have [multiple ways](/docs/platform/consume) to call it. You can call the API directly or schedule a job to run it at a specific time. For now, we will call the API directly.
You can create an API key by going to [https://app.intuned.io/api-keys](https://app.intuned.io/api-keys)

![API Key](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/create-api-key.gif)

You can find your workspace Id by going to [https://app.intuned.io/settings/workspace](https://app.intuned.io/settings/workspace)

![Workspace Id](https://intuned-docs-public-images.s3.amazonaws.com/quick-start/create-workspace-id.gif)

Now that you have your API key and your workspace Id, you are ready to call the API. Intuned exposes a REST API that you can either call directly or call through the [`@intuned/client`](/client-apis) SDK.

```bash REST API
# Replace <workspaceId> and <apiKey> with your workspace id and api key from the step above.
curl --request POST \
  --url 'https://app.intuned.io/api/v1/workspace/<workspaceId>/projects/new-project/run' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <apiKey>' \
  --data '{ "api": "books", "parameters": { "category": "Novels" } }'
```

```typescript Typescript Client SDK
// Replace <workspaceId> and <apiKey> with your workspace id and api key from the step above.
// You can install the client sdk by running `npm install @intuned/client` or `yarn add @intuned/client`
import { IntunedClient } from "@intuned/client";

const intunedClient = new IntunedClient({
  apiKey: "<apiKey>",
  workspaceId: "<workspaceId>",
});

async function run() {
  const result = await intunedClient.project.run.sync("new-project", {
    api: "books",
    parameters: { "category": "Novels" },
  });

  // Handle the result
  console.log(result)
}

run();
```

# How to use Credentials-based auth sessions

## Goal

In this how-to guide, we will go over how to use Credentials-based auth sessions. For this example, we will use [OrangeHRM (demo site)](https://opensource-demo.orangehrmlive.com/web/index.php/auth/login) as the target service.

Follow this guide step by step, or use the `Credentials based auth sessions` project template to get a jump start.

You can also watch a walkthrough of this guide below: