Recipe

This recipe shows how to extract structured data from web pages using AI without writing CSS or XPath selectors. Use extractStructuredData (TypeScript) or extract_structured_data (Python).
TypeScript
import { BrowserContext, Page } from "playwright-core";
import { extractStructuredData } from "@intuned/browser/ai";
import { z } from "zod";

// Define the schema for a single product
const ProductSchema = z.object({
  name: z.string().describe("Product name"),
  price: z.string().describe("Product price"),
  stock: z.string().describe("Stock status"),
  category: z.string().describe("Product category"),
});

// Define the schema for the list of products
const ProductsSchema = z.object({
  products: z.array(ProductSchema).describe("List of products from the table"),
});

export default async function handler(
  params: any,
  page: Page,
  context: BrowserContext
) {
  await page.goto("https://www.scrapingcourse.com/table-parsing");

  // Extract products using AI - no selectors needed
  const result = await extractStructuredData({
    source: page,
    dataSchema: ProductsSchema,
    prompt: "Extract all products from the table",
  });

  console.log(`Extracted ${result.products.length} products`);
  return result.products;
}

How it works

  1. Define a schema - Use Zod (TypeScript) or Pydantic (Python) to describe the data structure you want to extract
  2. Call the AI extractor - Pass the page and schema to extractStructuredData / extract_structured_data
  3. Get structured data - The AI analyzes the page and returns data matching your schema
No need to inspect the DOM, write selectors, or hand-code layout edge cases: the AI reads the page and maps its content to your schema.
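
Because the schema above declares every field as a string, a light post-processing pass on the returned rows can be useful before storing them. The following is a minimal sketch and not part of the Intuned API: `Product` mirrors `ProductSchema`, while `parsePrice` and `cleanProducts` are hypothetical helper names introduced only for illustration. It assumes prices use a US-style format such as "$1,299.99".

```typescript
// Mirrors the fields of ProductSchema from the recipe (all strings).
interface Product {
  name: string;
  price: string;
  stock: string;
  category: string;
}

// Hypothetical helper: parse a price string such as "$1,299.99" into a number.
// Assumes "." is the decimal separator; returns null if no number is found.
function parsePrice(raw: string): number | null {
  const cleaned = raw.replace(/[^0-9.]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: keep rows that have a non-empty name and a parseable
// price, and attach the numeric price as `priceValue`.
function cleanProducts(
  products: Product[]
): Array<Product & { priceValue: number }> {
  const cleaned: Array<Product & { priceValue: number }> = [];
  for (const product of products) {
    const priceValue = parsePrice(product.price);
    if (product.name.trim() !== "" && priceValue !== null) {
      cleaned.push({ ...product, priceValue });
    }
  }
  return cleaned;
}
```

In the handler, this could be applied as `return cleanProducts(result.products);`, assuming `result.products` has the shape described by `ProductsSchema`. Dropping rows silently is a design choice for the sketch; logging the rejected rows may be preferable when debugging an extraction.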