Converting web pages/HTML to markdown

Intuned provides utilities to convert web pages to markdown, a format that works particularly well with LLMs. For more info, check out the extractMarkdownFromPage and extractMarkdownFromLocator references.

await page.goto("https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html");
const siteMarkdown = await extractMarkdownFromPage(page);

// [Books to Scrape](../../index.html) We love being scraped!
// - [Home](../../index.html)
// - [Books](../category/books_1/index.html)
// - [Poetry](../category/books/poetry_23/index.html)
// - A Light in the Attic
// ![A Light in the Attic](../../media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg)
// # A Light in the Attic
// £51.77
// \_\_ In stock (22 available)
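The markdown above keeps the page's relative links (for example `../../index.html`), which lose their meaning once the text leaves the page. A minimal sketch of one way to resolve them against the page URL follows. `absolutizeMarkdownLinks` is a hypothetical helper, not part of the Intuned SDK, and it assumes simple inline links whose targets contain no spaces or parentheses.

```typescript
// Hypothetical helper (not an Intuned API): rewrites relative link and image
// targets in a markdown string to absolute URLs, resolved against baseUrl.
function absolutizeMarkdownLinks(markdown: string, baseUrl: string): string {
  // Matches [text](target) and ![alt](target); assumes the target has no
  // whitespace or parentheses.
  const linkPattern = /(!?\[[^\]]*\]\()([^)\s]+)(\))/g;
  return markdown.replace(
    linkPattern,
    (match: string, open: string, target: string, close: string) => {
      try {
        return open + new URL(target, baseUrl).href + close;
      } catch {
        // Leave targets that cannot be parsed as URLs untouched.
        return match;
      }
    }
  );
}

const pageUrl =
  "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html";
const sample = "- [Home](../../index.html)";
console.log(absolutizeMarkdownLinks(sample, pageUrl));
// prints "- [Home](https://books.toscrape.com/index.html)"
```

In a real scraper, the string returned by extractMarkdownFromPage could be passed through a helper like this before it is stored or sent to an LLM.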

Converting files to markdown

You can also use File Markdown Conversion as a standalone API. Check out Standalone File APIs for more info.

Intuned provides utilities to convert files to markdown, a format that works particularly well with LLMs. For more info, check out the extractMarkdownFromFile reference.

const specMarkdown = await extractMarkdownFromFile({
  type: "pdf",
  source: {
    type: "url",
    data: "https://intuned-docs-public-images.s3.amazonaws.com/27UP600_27UP650_ENG_US.pdf"
  },
}, { label: "pdf_markdown" });

// LG
// Life's Good
// # OWNER'S MANUAL
// LED LCD MONITOR
// \(LED Monitor\*\)
// \* LG LED Monitor applies LCD screen with LED backlights. Please read this manual carefully before operating your set and retain it for future reference.
// 27UP600
// 27UP650
// ....
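A long manual like this one is often easier to feed to an LLM in pieces. The sketch below splits a markdown string into sections at ATX headings (`#` through `######`). `splitMarkdownByHeading` and `MarkdownSection` are illustrative names, not Intuned APIs, and the sketch does not treat `#` lines inside fenced code blocks specially.

```typescript
// Illustrative types and helper (not part of the Intuned SDK).
interface MarkdownSection {
  heading: string | null; // null for text that appears before the first heading
  body: string;
}

function splitMarkdownByHeading(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = [];
  let heading: string | null = null;
  let bodyLines: string[] = [];

  // Stores the section collected so far, skipping an empty leading section.
  const flush = (): void => {
    const body = bodyLines.join("\n").trim();
    if (heading !== null || body.length > 0) {
      sections.push({ heading, body });
    }
  };

  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // a new heading closes the previous section
      heading = match[1].trim();
      bodyLines = [];
    } else {
      bodyLines.push(line);
    }
  }
  flush();
  return sections;
}

const manualSample = "LG\nLife's Good\n# OWNER'S MANUAL\nLED LCD MONITOR";
console.log(splitMarkdownByHeading(manualSample));
```

For the sample string, the helper yields two sections: the text before the first heading (with `heading` set to `null`), then the `OWNER'S MANUAL` section.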

Extracting tables from files

You can also use Table Extraction as a standalone API. Check out Standalone File APIs for more info.

Intuned provides utilities to extract tables from files. Tables are a common element of data-rich files. For more info on how to use this, check out the extractTablesFromFile reference.

const fileTables = await extractTablesFromFile({
  type: "pdf",
  source: {
    type: "url",
    data: "https://intuned-docs-public-images.s3.amazonaws.com/27UP600_27UP650_ENG_US.pdf"
  },
}, { label: "pdf_markdown" });

// [
//  {
//    pageNumber: 2,
//    title: 'PRODUCT SPECIFICATION 27UP600',
//    content: [
//      [Array], [Array], [Array],
//      [Array], [Array], [Array],
//      [Array], [Array], [Array],
//      [Array], [Array], [Array],
//      [Array], [Array], [Array]
//    ]
//  }
// ]
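The `content` field in the output above is an array of row arrays. As a rough sketch, rows like these can be turned into keyed records for easier downstream use. The sketch assumes each cell is a string and that the first row holds the column names. Neither assumption is confirmed by the output shown, so check the extractTablesFromFile reference first. `ExtractedTable` and `tableToRecords` are illustrative names, and the sample values are placeholders rather than data from the PDF.

```typescript
// Assumed shape, modeled on the logged output above. The cell type (string)
// is an assumption.
interface ExtractedTable {
  pageNumber: number;
  title: string;
  content: string[][];
}

// Illustrative helper (not an Intuned API): treats the first row as the
// header and maps every later row to an object keyed by column name.
function tableToRecords(table: ExtractedTable): Record<string, string>[] {
  const [header, ...rows] = table.content;
  if (header === undefined) return []; // table has no rows at all
  return rows.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((columnName, index) => {
      record[columnName] = row[index] ?? ""; // pad short rows with ""
    });
    return record;
  });
}

// Placeholder data for illustration only.
const placeholderTable: ExtractedTable = {
  pageNumber: 1,
  title: "Placeholder table",
  content: [
    ["Field", "Value"],
    ["example field", "example value"],
  ],
};
console.log(tableToRecords(placeholderTable));
```

If a table has no header row, the first data row would be consumed as column names, so a real integration would likely need a per-table check.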