- From Content
- From Page or Locator
Examples
Arguments
Content to extract data from. Can be a single content item or an array of content items. See ContentItem for more details.
Schema defining the expected structure of the extracted data. Can be either a Pydantic BaseModel class or a JSON Schema dictionary.
Optional prompt to guide the extraction process and provide more context. Defaults to None.
Maximum number of retry attempts on failure. Failures include validation errors, API errors, and output errors. Defaults to 3.
Whether to enable caching of the extracted data. Defaults to True.
AI model to use for extraction. See SUPPORTED_MODELS for all supported models. Defaults to “claude-3-5-haiku-latest”.
Optional API key for AI extraction. If provided, the extraction is not billed to your account. Defaults to None.
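The schema and retry arguments above can be illustrated with a small standalone sketch. It does not call the real extraction function: `INVOICE_SCHEMA`, `validate`, and `extract_with_retries` are hypothetical names introduced here, the validator checks only required keys and a few primitive types, and the retry limit is assumed to count total attempts. It shows roughly how a JSON Schema dictionary can drive validation, and how a validation failure can trigger another attempt.

```python
from typing import Any, Callable

# A JSON Schema dictionary describing the expected output (illustrative fields).
INVOICE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

# Subset of JSON Schema primitive types mapped to Python types (illustration only).
_JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate(data: Any, schema: dict[str, Any]) -> None:
    """Raise ValueError if a required key is missing or a primitive type is wrong."""
    if not isinstance(data, dict):
        raise ValueError("expected an object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if key in data and expected and not isinstance(data[key], expected):
            raise ValueError(f"wrong type for field: {key}")


def extract_with_retries(
    call_model: Callable[[], Any],
    schema: dict[str, Any],
    max_retries: int = 3,
) -> Any:
    """Call `call_model` until its output validates.

    Assumption: `max_retries` is treated as the total number of attempts.
    """
    last_error: Exception | None = None
    for _ in range(max_retries):
        try:
            result = call_model()
            validate(result, schema)
            return result
        except ValueError as err:
            last_error = err  # validation failed, try again
    raise RuntimeError(f"extraction failed after {max_retries} attempts") from last_error


# Usage: the first (hypothetical) model output is missing "total", the second is valid.
attempts = iter([
    {"invoice_number": "INV-1"},
    {"invoice_number": "INV-1", "total": 42.5},
])
result = extract_with_retries(lambda: next(attempts), INVOICE_SCHEMA)
```

A Pydantic BaseModel class would play the same role as `INVOICE_SCHEMA` here, with Pydantic performing the validation instead of the hand-written `validate` helper.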
Returns: Any
The extracted structured data conforming to the provided schema.
Key Features & Limitations
- No DOM Matching: This overload does not support DOM matching since it doesn’t operate on web pages.
- Smart Caching: Caching is based on content hash to avoid redundant API calls.
- Automatic Image Fetching: Image URLs are automatically fetched and converted to image buffers for processing.
- Batch Processing: Multiple content items can be processed together for comprehensive extraction.