html

html(options: HtmlOptions): HtmlAdapter

Extract data from HTML using CSS selectors (powered by cheerio), or read/write HTML files.

Transformer mode (in-memory HTML parsing):

// Extract text from title
.transform(html({ selector: 'title', extract: 'text' }))

// Extract multiple elements (returns array)
.transform(html({ selector: 'h2', extract: 'text' }))
// Result: ['First Heading', 'Second Heading', ...]

// Extract HTML content
.transform(html({ selector: '.content', extract: 'html' }))

// Extract attribute value
.transform(html({ selector: 'a', extract: 'attr', attr: 'href' }))

// Extract outer HTML (including element tag)
.transform(html({ selector: 'article', extract: 'outerHtml' }))

// Custom parsing from sub-field
.transform(html({
  selector: 'p',
  extract: 'text',
  from: (body) => body.htmlContent,
  to: (body, result) => ({ ...body, paragraphs: result })
}))
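
The from and to options in the last example can be read as a pair of hooks around the extraction step: from picks the HTML string out of the body, and to decides where the result goes. Below is a minimal sketch of that flow. The `applyExtraction` helper and the regex-based `paragraphs` extractor are hypothetical stand-ins for illustration only; the adapter itself uses cheerio and its internals are not shown here.

```typescript
// Hypothetical helper: how `from` and `to` might wrap an extraction step.
// None of these names are part of the adapter's API.
type From<B> = (body: B) => string;
type To<B, R> = (body: B, result: string | string[]) => R;

function applyExtraction<B, R>(
  body: B,
  extract: (htmlSource: string) => string | string[],
  from: From<B>,
  to: To<B, R>,
): R {
  const htmlSource = from(body);      // pick the HTML string out of the body
  const result = extract(htmlSource); // stand-in for selector-based extraction
  return to(body, result);            // place the result back into the body
}

// Naive <p> matcher used only to keep the sketch dependency-free.
const paragraphs = (source: string): string[] => {
  const out: string[] = [];
  const re = /<p>(.*?)<\/p>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) out.push(m[1]);
  return out;
};

const inputBody = { id: 7, htmlContent: "<p>One</p><p>Two</p>" };
const updated = applyExtraction(
  inputBody,
  paragraphs,
  (b) => b.htmlContent,
  (b, result) => ({ ...b, paragraphs: result }),
);
console.log(updated.paragraphs); // [ 'One', 'Two' ]
```

Because to receives the original body, the other fields (here `id`) survive alongside the extracted result.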

Source mode (read HTML files and extract):

// Read HTML file and extract title
.from(html({
  path: './page.html',
  selector: 'title',
  extract: 'text'
}))

// Extract multiple links from file
.from(html({
  path: './page.html',
  selector: 'a',
  extract: 'attr',
  attr: 'href'
}))
// Emits array: ['https://example.com', '/about', ...]

Destination mode (write HTML files):

// Write HTML string to file
.to(html({ path: './output.html' }))

// Dynamic paths with directory creation
.to(html({
  path: (exchange) => `./pages/${exchange.body.slug}.html`,
  createDirs: true
}))

// Append to HTML file
.to(html({
  path: './log.html',
  mode: 'append'
}))
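
The path option accepts either a fixed string or a function of the exchange. A minimal sketch of how such an option could be resolved is shown below. The `Exchange` interface and the `resolvePath` helper are simplified assumptions for illustration, not the library's real types or functions.

```typescript
// Hypothetical sketch: resolving a static or dynamic `path` option.
// `Exchange` is a simplified stand-in, not the library's real type.
interface Exchange<B> {
  body: B;
}

type PathOption<B> = string | ((exchange: Exchange<B>) => string);

function resolvePath<B>(path: PathOption<B>, exchange: Exchange<B>): string {
  // A function is called once per exchange; a string is used as-is.
  return typeof path === "function" ? path(exchange) : path;
}

const sampleExchange: Exchange<{ slug: string }> = { body: { slug: "about-us" } };

const staticPath = resolvePath("./output.html", sampleExchange);
const dynamicPath = resolvePath(
  (ex) => `./pages/${ex.body.slug}.html`,
  sampleExchange,
);

console.log(staticPath);  // ./output.html
console.log(dynamicPath); // ./pages/about-us.html
```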

Transformer Options (when no path is provided):

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| selector | string | Required | CSS selector to match elements |
| extract | 'text' \| 'html' \| 'attr' \| 'outerHtml' \| 'innerText' \| 'textContent' | 'text' | What to extract from matched elements |
| attr | string | -- | Attribute name (required when extract: 'attr') |
| from | (body) => string | Uses body or body.body | Extract HTML string from exchange |
| to | (body, result) => R | Replaces body | Where to put extracted result |
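
The dependency between extract and attr can be checked before any HTML is parsed. The sketch below is one way to do that. The `TransformerOptions` and `NormalizedOptions` shapes and the `normalizeOptions` helper are hypothetical simplifications of the table above. How the adapter itself reacts to a missing attr is not stated here, so throwing an error is an assumption.

```typescript
// Simplified, hypothetical mirror of the transformer options table.
type ExtractKind =
  | "text" | "html" | "attr" | "outerHtml" | "innerText" | "textContent";

interface TransformerOptions {
  selector: string;
  extract?: ExtractKind; // documented default: 'text'
  attr?: string;         // required when extract is 'attr'
}

interface NormalizedOptions {
  selector: string;
  extract: ExtractKind;
  attr?: string;
}

// Apply the documented default and reject `extract: 'attr'` without `attr`.
// Throwing here is an assumption, not documented adapter behavior.
function normalizeOptions(options: TransformerOptions): NormalizedOptions {
  const extract: ExtractKind = options.extract ?? "text";
  if (extract === "attr" && !options.attr) {
    throw new Error("`attr` is required when extract is 'attr'");
  }
  return { selector: options.selector, extract, attr: options.attr };
}

console.log(normalizeOptions({ selector: "title" }).extract); // text
console.log(
  normalizeOptions({ selector: "a", extract: "attr", attr: "href" }).attr,
); // href
```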

File Options (when path is provided):

All transformer options above, plus:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| path | string \| (exchange) => string | Required | File path (static or dynamic) |
| mode | 'read' \| 'write' \| 'append' | 'read' for source, 'write' for destination | File operation mode |
| encoding | BufferEncoding | 'utf-8' | Text encoding |
| createDirs | boolean | false | Create parent directories (destination only) |
| onParseError | 'fail' \| 'abort' \| 'drop' | 'fail' | How to handle an extraction failure (source only). See parse error handling. |

Extract types:

  • text / innerText / textContent: Plain text content, with HTML tags stripped and <style> and <script> content removed
  • html: Inner HTML content
  • outerHtml: Element including its tag
  • attr: Attribute value (requires attr option)

Behavior:

  • Single match: Returns string
  • Multiple matches: Returns array of strings
  • No matches: Returns empty string
  • Source mode: Reads HTML file and extracts data using selector
  • Destination mode: Writes HTML string (from exchange.body or exchange.body.body) to file
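
Since the result is a string for zero or one match and an array for several, downstream steps often need to handle both shapes. The sketch below mirrors the match rules above and shows one way a caller might normalize the union back to an array. Both `shapeResult` and `toArray` are hypothetical helpers, not adapter exports.

```typescript
// Hypothetical helper mirroring the documented return shapes.
function shapeResult(matches: string[]): string | string[] {
  if (matches.length === 0) return "";         // no matches: empty string
  if (matches.length === 1) return matches[0]; // single match: string
  return matches;                              // multiple matches: array
}

// Caller-side normalization for code that always wants an array.
// Note: a single match with empty text cannot be told apart from
// "no matches" once it has been collapsed to "".
function toArray(result: string | string[]): string[] {
  if (Array.isArray(result)) return result;
  return result === "" ? [] : [result];
}

const none = shapeResult([]);         // ""
const one = shapeResult(["Home"]);    // "Home"
const many = shapeResult(["A", "B"]); // ["A", "B"]

console.log(toArray(none), toArray(one), toArray(many)); // [] [ 'Home' ] [ 'A', 'B' ]
```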

Exported types: HtmlAdapter, HtmlOptions, HtmlResult