This function has multiple overloads:
- Extract From Page or Locator
- Extract From Content
Extracts structured data from web pages using AI-powered content analysis. The function supports three extraction strategies: HTML parsing, image analysis (IMAGE), and Markdown conversion. It can also extract from text or image content that you supply directly.
Extraction can target an entire page or specific elements, with optional DOM matching and built-in caching and retry mechanisms.
Features and limitations
Features:
- Smart caching: Hashes inputs and uses KV Cache for persistent storage
- DOM matching: With enableDomMatching=true, values match DOM elements for smart caching
- Multiple strategies: HTML, IMAGE, or MARKDOWN based on content type
- Flexible models: Use any up-to-date model from Anthropic, OpenAI, or Google based on your needs
Limitations:
- Model variability: Quality varies by model; experiment to find the best fit
- DOM complexity: Dynamic structures can affect caching and matching
- IMAGE strategy constraints: Can’t capture truncated or off-screen content
- Schema design: Complex schemas may reduce accuracy
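The smart caching feature above is described only as hashing the inputs and storing results in a KV cache. The sketch below illustrates that general idea; it is not the library's implementation. The names `ExtractionInput`, `stableStringify`, `cacheKey`, and `extractWithCache` are hypothetical, and an in-memory `Map` stands in for a persistent KV store.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the inputs that determine an extraction result.
interface ExtractionInput {
  content: string; // page HTML, Markdown, or an image reference
  strategy: "HTML" | "IMAGE" | "MARKDOWN";
  schema: Record<string, unknown>; // schema describing the desired output
}

// Serialize with sorted object keys so key order does not change the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Derive a deterministic cache key from the full set of inputs.
function cacheKey(input: ExtractionInput): string {
  return createHash("sha256").update(stableStringify(input)).digest("hex");
}

// Assumption: an in-memory Map stands in for a persistent KV store.
const kv = new Map<string, unknown>();

// Run the (expensive) extractor only when no cached result exists for the key.
async function extractWithCache<T>(
  input: ExtractionInput,
  extract: (input: ExtractionInput) => Promise<T>,
): Promise<T> {
  const key = cacheKey(input);
  if (kv.has(key)) {
    return kv.get(key) as T;
  }
  const result = await extract(input);
  kv.set(key, result);
  return result;
}
```

Under this scheme, any change to the content, strategy, or schema produces a different key and therefore a fresh extraction. That is one plausible reason dynamic DOM structures can reduce cache hits, as noted in the limitations.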
Examples
Arguments
Configuration object containing extraction parameters