
Crawl websites with Crawl4AI

This recipe shows how to crawl websites and extract content as markdown using Crawl4AI with Intuned’s browser infrastructure.

Project structure

api/
  └── simple.py          # Simple crawling example
hooks/
  └── setup_context.py   # Browser context setup
utils/
  └── config.py          # Browser configuration
pyproject.toml         # Dependencies

Setup

pyproject.toml

[build-system]
requires = ["hatchling>=1.18.0"]
build-backend = "hatchling.build"

[project]
name = "default"
version = "0.0.1"
description = "Crawl websites with Crawl4AI on Intuned"
readme = "README.md"
requires-python = ">=3.12,<3.13"
authors = [
  { name = "Intuned", email = "[email protected]" }
]
keywords = ["Python", "intuned-browser-sdk"]

dependencies = [
  "playwright==1.55.0",
  "intuned-runtime==1.3.10",
  "intuned-browser==0.1.9",
  "crawl4ai==0.7.7",
]
[tool.uv]
package = false

hooks/setup_context.py

Store the CDP URL so Crawl4AI can connect to Intuned’s browser:
from intuned_runtime import attempt_store


async def setup_context(*, api_name: str, api_parameters: str, cdp_url: str):
    # Persist the CDP endpoint for this attempt so later code can connect
    # to the same Intuned-managed browser.
    attempt_store.set("cdp_url", cdp_url)

utils/config.py

Create the browser configuration for Crawl4AI using the CDP URL:
from crawl4ai import BrowserConfig
from intuned_runtime import attempt_store


def get_browser_config() -> BrowserConfig:
    # CDP URL stored by the setup_context hook
    cdp_url = attempt_store.get("cdp_url")

    return BrowserConfig(
        verbose=True,
        cdp_url=cdp_url,  # attach to Intuned's browser instead of launching a new one
        headless=False,
        accept_downloads=True,
    )

Crawl a single page

Crawl a single page and extract its content as markdown:
from playwright.async_api import Page
from typing import TypedDict
from crawl4ai import (
    AsyncWebCrawler,
    CrawlerRunConfig,
    DefaultMarkdownGenerator,
    PruningContentFilter,
    CrawlResult,
)
from utils.config import get_browser_config


class Params(TypedDict):
    pass


async def automation(page: Page, params: Params | None = None, **_kwargs):
    browser_config = get_browser_config()
    # AsyncWebCrawler attaches to Intuned's browser over CDP via the config
    async with AsyncWebCrawler(config=browser_config) as crawler:
        crawler_config = CrawlerRunConfig(
            markdown_generator=DefaultMarkdownGenerator(
                # PruningContentFilter strips low-value blocks (navigation,
                # footers, boilerplate) before the markdown is generated
                content_filter=PruningContentFilter(),
            ),
        )
        result: CrawlResult = await crawler.arun(
            url="https://www.helloworld.org", config=crawler_config
        )
        return result.markdown.raw_markdown
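To crawl several pages in one session, Crawl4AI also provides `arun_many`, which takes a list of URLs and returns one `CrawlResult` per URL. The sketch below extends the single-page example under that assumption; the second URL and the `collect_markdown` helper are illustrative, not part of the recipe above:

```python
from playwright.async_api import Page
from typing import TypedDict
from crawl4ai import (
    AsyncWebCrawler,
    CrawlerRunConfig,
    DefaultMarkdownGenerator,
    PruningContentFilter,
)
from utils.config import get_browser_config


class Params(TypedDict):
    pass


def collect_markdown(results) -> dict[str, str]:
    # Map each successfully crawled URL to its raw markdown,
    # skipping pages that failed to load.
    return {r.url: r.markdown.raw_markdown for r in results if r.success}


async def automation(page: Page, params: Params | None = None, **_kwargs):
    browser_config = get_browser_config()
    async with AsyncWebCrawler(config=browser_config) as crawler:
        crawler_config = CrawlerRunConfig(
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter(),
            ),
        )
        # arun_many crawls the URLs concurrently and returns a list of CrawlResult
        results = await crawler.arun_many(
            urls=[
                "https://www.helloworld.org",
                "https://www.helloworld.org/about",  # illustrative second URL
            ],
            config=crawler_config,
        )
        return collect_markdown(results)
```

Filtering on `result.success` matters more here than in the single-page case: one unreachable URL should not discard the markdown from the pages that did load.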