Deprecated: This function will be removed in a future release.
Extracts an array of structured data from a web page. The function uses AI for the first few extractions, until it has collected multiple examples, and then builds reliable selectors in the background for improved efficiency.
export declare function extractArrayFromPage(
  page: Page,
  options: {
    label: string;
    itemEntityName: string;
    itemEntitySchema: SimpleArrayItemSchema;
    strategy?: ImageStrategy | HtmlStrategy;
    prompt?: string;
    optionalPropertiesInvalidator?: (
      result: Record<string, string>[]
    ) => string[];
    variantKey?: string;
    apiKey?: string;
  }
): Promise<Record<string, string>[]>;
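
The `optionalPropertiesInvalidator` option receives the extracted items and returns a list of strings. The signature alone does not say how that list is interpreted. The sketch below assumes the strings name optional properties whose extracted values look unusable, which is an assumption and not documented behavior. The function name `flagEmptyProperties` and the property names `price` and `rating` are hypothetical.

```typescript
// Hypothetical callback matching the optionalPropertiesInvalidator signature.
// Assumption: the returned strings are names of optional properties to treat as invalid.
function flagEmptyProperties(result: Record<string, string>[]): string[] {
  // Hypothetical optional properties; they are not part of the schema in this page's example.
  const candidates = ["price", "rating"];

  // With no items there is nothing to judge, so flag nothing.
  if (result.length === 0) return [];

  // Flag a property when it is missing or blank in every extracted item.
  return candidates.filter((prop) =>
    result.every((item) => !item[prop] || item[prop].trim() === "")
  );
}
```

A callback like this could then be passed as `optionalPropertiesInvalidator: flagEmptyProperties` in the options object.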

Examples

import { extractArrayFromPage } from "@intuned/browser/optimized-extractors";

// `page` is an existing Playwright Page object.
await page.goto("https://books.toscrape.com/");
const books = await extractArrayFromPage(page, {
  strategy: {
    model: "gpt4-turbo",
    type: "HTML",
  },
  itemEntityName: "book",
  label: "books-extraction",
  itemEntitySchema: {
    type: "object",
    required: ["name"],
    properties: {
      name: {
        type: "string",
        description: "book name",
        primary: true,
      },
    },
  },
});

console.log(books);

// output:
// [
// ...
// { name: 'Olio' },
// { name: 'Mesaerion: The Best Science Fiction Stories 1800-1849' },
// { name: 'Libertarianism for Beginners' },
// { name: "It's Only the Himalayas" }
// ...
// ]

Arguments

page (any, required): The Playwright Page object from which to extract the data.
options (object, required)

Returns: Promise<Record<string, string>[]>

A promise that resolves to an array of extracted items, each an object whose property values are strings.
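
Because the declared return type is `Record<string, string>[]`, every extracted value arrives as a string, and any cleanup is left to the caller. The following is a minimal sketch of one possible post-processing step. The helper `uniqueByKey` and the sample data are hypothetical and not part of `@intuned/browser`.

```typescript
// Hypothetical helper, not part of @intuned/browser.
// Trims the key field and drops items whose key is empty or already seen.
type ExtractedItem = Record<string, string>;

function uniqueByKey(items: ExtractedItem[], key: string): ExtractedItem[] {
  const seen = new Set<string>();
  const result: ExtractedItem[] = [];
  for (const item of items) {
    const value = (item[key] ?? "").trim();
    if (value === "" || seen.has(value)) continue; // skip blanks and duplicates
    seen.add(value);
    result.push({ ...item, [key]: value });
  }
  return result;
}

// Made-up sample data with the same shape as the example output on this page.
const sample: ExtractedItem[] = [
  { name: "Olio" },
  { name: " Olio " },
  { name: "" },
  { name: "Libertarianism for Beginners" },
];

console.log(uniqueByKey(sample, "name"));
```

In real use, the array passed to `uniqueByKey` would be the value the returned promise resolves to, for example `books` in the example above.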