DOMHarvestPlaywright-powered web scraping

Extract DOM elements with precision and speed

Get Started

View on GitHub

🎯

Precise Extraction

Extract DOM elements using CSS selectors or custom functions with Playwright's powerful automation.

🚀

Fast & Reliable

Built on Playwright for consistent, cross-browser web scraping with excellent performance.

📦

Simple API

Clean, intuitive API that makes web scraping straightforward and enjoyable.

🔧

Highly Configurable

Customize timeouts, headless mode, and extraction logic to fit your needs.

✨

Modern JavaScript

Uses ES modules and modern JavaScript features for a clean development experience.

🧪

Well Tested

Comprehensive test suite ensuring reliability and stability.

Quick Example

javascript

import { harvest } from 'domharvest-playwright'

// Extract quotes from quotes.toscrape.com (a site designed for scraping practice)
const quotes = await harvest(
  'https://quotes.toscrape.com/',
  '.quote',
  (el) => ({
    text: el.querySelector('.text')?.textContent?.trim(),
    author: el.querySelector('.author')?.textContent?.trim(),
    tags: Array.from(el.querySelectorAll('.tag')).map(tag => tag.textContent?.trim())
  })
)

console.log(quotes)
// Output: Array of 10 quotes with authors and tags

Why DOMHarvest?

DOMHarvest makes web scraping simple and reliable by leveraging Playwright's battle-tested browser automation. Whether you're building a data pipeline, monitoring websites, or extracting content for analysis, DOMHarvest provides the tools you need with minimal setup.

Features at a Glance

Easy to use: Simple API for common scraping tasks
Powerful: Access to full Playwright capabilities when needed
Flexible: Support for both simple selectors and custom extraction logic
Standard compliant: Follows JavaScript Standard Style
Well documented: Comprehensive guides and API documentation