Getting Started

Welcome to DOMHarvest! This guide will help you get up and running with DOM harvesting using Playwright.

What is DOMHarvest?

DOMHarvest is a lightweight tool for extracting data from web pages with Playwright's browser automation. It provides a simple API for common web scraping tasks and still gives you access to Playwright's full capabilities when you need them.

Prerequisites

Before you begin, ensure you have:

Node.js 18 or higher installed
Basic knowledge of JavaScript and CSS selectors
Familiarity with async/await syntax
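
If you are unsure which Node.js version is active, a quick check like the one below can confirm it before you install anything. This is only a sketch: the helper name `nodeMajor` is made up for illustration and is not part of DOMHarvest.

```javascript
// Hypothetical helper: parse the major version from a string such as "18.17.1".
const nodeMajor = (version) => Number.parseInt(version.split('.')[0], 10)

// process.versions.node holds the running Node.js version without the leading "v".
const major = nodeMajor(process.versions.node)
const supported = major >= 18

console.log(
  supported
    ? `Node.js ${major} meets the requirement`
    : `Node.js ${major} is too old; install 18 or higher`
)
```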

Installation

See the Installation Guide for detailed setup instructions.

Your First Harvest

Let's extract the text of every paragraph on a web page:

javascript

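// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json).
import { harvest } from 'domharvest-playwright'

// Collect the trimmed text of every <p> element on the page.
const paragraphs = await harvest(
  'https://example.com',                      // URL of the page to scrape
  'p',                                        // CSS selector for the target elements
  (el) => ({ text: el.textContent?.trim() })  // extractor applied to each match
)

console.log(paragraphs)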

That's it! You've just extracted data from a web page by passing three arguments:

URL: The page to scrape
Selector: CSS selector for the elements you want
Extractor: Function to transform each element into the data you need
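
Because the extractor is an ordinary function, you can try one out on its own before pointing it at a live page. The sketch below assumes the extractor receives a standard DOM element, as in the example above. The name `extractLink`, the returned fields, and the stand-in object `fakeAnchor` are illustrative only and are not part of DOMHarvest.

```javascript
// Hypothetical extractor for <a> elements: returns the link text and its href.
const extractLink = (el) => ({
  text: el.textContent?.trim() ?? '',
  href: el.getAttribute('href')
})

// Minimal stand-in for a DOM element, used only to exercise the extractor.
const fakeAnchor = {
  textContent: '  More information...  ',
  getAttribute: (name) =>
    name === 'href' ? 'https://www.iana.org/domains/example' : null
}

const link = extractLink(fakeAnchor)
console.log(link)
```

Under the same assumption, an extractor like this could be passed as the third argument together with the selector 'a' to collect the links on a page.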

Next Steps

Learn more in the Quick Start Guide
Explore Examples for common use cases
Check the API Reference for detailed documentation
