export declare function extractArrayFromPage(
  page: Page,
  options: {
    label: string;
    itemEntityName: string;
    itemEntitySchema: SimpleArrayItemSchema;
    strategy?: ImageStrategy | HtmlStrategy;
    prompt?: string;
    optionalPropertiesInvalidator?: (
      result: Record<string, string>[]
    ) => string[];
    variantKey?: string;
    apiKey?: string;
  }
): Promise<Record<string, string>[]>;
Deprecated: This function is deprecated and will be removed in a future release.
Extracts an array of structured data from a web page in an optimized way. The function uses AI for the first n calls, until it has collected multiple examples; it then builds reliable selectors in the background to make subsequent extractions more efficient.
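The "AI first, selectors later" behaviour described above can be pictured with a small sketch. This is only an illustration of the general pattern, not the SDK's implementation: every name here (makeOptimizedExtractor, aiExtract, buildFromExamples, examplesNeeded) is hypothetical, and the fast extractor is built inline rather than in the background to keep the example short.

```typescript
// Hypothetical illustration only: none of these names come from @intuned/sdk.
type Row = Record<string, string>;
type Extractor = (html: string) => Promise<Row[]>;
type Example = { html: string; rows: Row[] };

function makeOptimizedExtractor(
  aiExtract: Extractor, // slow, AI-backed extraction (assumed)
  buildFromExamples: (examples: Example[]) => Extractor, // derives a cheap extractor (assumed)
  examplesNeeded: number // the "n" from the description
): Extractor {
  const examples: Example[] = [];
  let fast: Extractor | undefined;

  return async (html) => {
    // Once a cheap extractor exists, skip the AI call entirely.
    if (fast) return fast(html);

    // Otherwise use the AI path and remember the result as an example.
    const rows = await aiExtract(html);
    examples.push({ html, rows });

    // After enough examples, switch to the cheap extractor for later calls.
    if (examples.length >= examplesNeeded) {
      fast = buildFromExamples(examples);
    }
    return rows;
  };
}
```

In this sketch the AI-backed extractor is called exactly examplesNeeded times; every later call goes through the extractor derived from the collected examples.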

Examples

 import { extractArrayFromPage } from "@intuned/sdk/optimized-extractors";

 await page.goto("https://books.toscrape.com/")
 const books = await extractArrayFromPage(page,
   {
     strategy: {
       model: "gpt4-turbo",
       type: "HTML"
     },
     itemEntityName: "book",
     label: "books-extraction",
     itemEntitySchema: {
       type: "object",
       required: ["name"],
       properties: {
         name: {
           type: "string",
           description: "book name",
           primary: true
         }
       }
     }
   }
 );

 console.log(books);

 // output:
 // [
 // ...
 // { name: 'Olio' },
 // { name: 'Mesaerion: The Best Science Fiction Stories 1800-1849' },
 // { name: 'Libertarianism for Beginners' },
 // { name: "It's Only the Himalayas" }
 // ...
 // ]

Arguments

page
Page
required
The Playwright Page object from which to extract the data.
options
object
required
The extraction options. label, itemEntityName, and itemEntitySchema are required; strategy, prompt, optionalPropertiesInvalidator, variantKey, and apiKey are optional.

Returns: Promise<Record<string, string>[]>

A promise that resolves to a list of extracted data.
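
Because the resolved value is a plain array of string records, it can be post-processed with ordinary array operations. The following is a minimal sketch, assuming the caller wants to drop rows with an empty key property and remove duplicates; dedupeByKey is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper: not provided by @intuned/sdk.
// Keeps the first row for each distinct, non-empty value of `key`.
function dedupeByKey(
  rows: Record<string, string>[],
  key: string
): Record<string, string>[] {
  const seen = new Set<string>();
  const out: Record<string, string>[] = [];
  for (const row of rows) {
    const value = row[key]?.trim();
    // Skip rows where the key is missing, blank, or already seen.
    if (!value || seen.has(value)) continue;
    seen.add(value);
    out.push(row);
  }
  return out;
}
```

With the example above, this could be applied as dedupeByKey(books, "name").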