Installation
npm Installation
Install DOMHarvest from npm:
bash
npm install domharvest-playwright

Install Playwright Browsers
After installing the package, you need to install the Playwright browsers:
bash
npx playwright install

This downloads the necessary browser binaries (Chromium, Firefox, and WebKit) that Playwright uses for automation.
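To confirm where those binaries ended up, it helps to know which directory Playwright uses as its browser cache. The sketch below computes that location from the platform and home directory, based on the defaults described in the Playwright documentation, and honors the PLAYWRIGHT_BROWSERS_PATH override. The helper name browsersPath is hypothetical and belongs to neither DOMHarvest nor Playwright, and the special value 0 for the variable is not handled.

```javascript
// Hypothetical helper (not part of DOMHarvest or Playwright): returns the
// directory where `npx playwright install` is expected to store browsers.
function browsersPath (platform, homeDir, env = {}) {
  // An explicit override wins over the per-platform default.
  if (env.PLAYWRIGHT_BROWSERS_PATH) return env.PLAYWRIGHT_BROWSERS_PATH
  if (platform === 'win32') return `${homeDir}\\AppData\\Local\\ms-playwright`
  if (platform === 'darwin') return `${homeDir}/Library/Caches/ms-playwright`
  return `${homeDir}/.cache/ms-playwright` // Linux and other Unix-like systems
}

// Use the current process as input; HOME is unset on Windows, so fall back.
const home = process.env.HOME || process.env.USERPROFILE || ''
console.log(browsersPath(process.platform, home, process.env))
```

If the printed directory exists and contains folders for the downloaded browsers, the download step most likely succeeded.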
Verify Installation
Create a test file to verify everything is working:
javascript
// test.js
import { harvest } from 'domharvest-playwright'
const result = await harvest(
'https://example.com',
'h1',
(el) => ({ heading: el.textContent?.trim() })
)
console.log('Success!', result)

Run it:
bash
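node test.js

If the output includes the heading from example.com, you're all set! Because test.js uses ES module syntax (import and top-level await), Node.js may reject it with a "Cannot use import statement outside a module" error. In that case, rename the file to test.mjs or add "type": "module" to your package.json.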
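To make the third argument of the test script less opaque, here is a minimal sketch of what the extractor callback does on its own. It runs the same arrow function against plain objects that stand in for DOM elements. The stand-in objects are assumptions for illustration, and how harvest collects and returns the extracted values is not shown here.

```javascript
// The same callback that test.js passes to harvest()
const extractHeading = (el) => ({ heading: el.textContent?.trim() })

// Stand-ins for DOM elements; real elements come from the browser page.
const withText = { textContent: '  Example Domain\n' }
const withoutText = { textContent: null }

// Surrounding whitespace is trimmed away.
console.log(extractHeading(withText)) // { heading: 'Example Domain' }
// Optional chaining avoids a TypeError when textContent is null.
console.log(extractHeading(withoutText)) // { heading: undefined }
```

The optional chaining operator (?.) is why the callback stays safe for nodes without text content.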
Development Installation
If you want to contribute to DOMHarvest or run it from source:
bash
# Clone the repository
git clone https://github.com/domharvest/domharvest-playwright.git
cd domharvest-playwright
# Install dependencies
npm install
# Install Playwright browsers
npm run playwright:install
# Run tests to verify
npm test
# Run examples
node examples/basic-example.js

System Requirements
- Node.js: Version 18 or higher
- Operating Systems:
  - macOS (Intel and Apple Silicon)
  - Linux (Ubuntu, Debian, Fedora, CentOS)
  - Windows 10+
- Disk Space: ~500MB for browser binaries
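
The Node.js version and operating system requirements above can be checked from a script before installing. The following is a rough sketch, not part of DOMHarvest: the function name checkEnvironment and the constants are hypothetical, and it does not inspect the Linux distribution, the Windows version, or free disk space.

```javascript
// Hypothetical pre-flight check; not part of DOMHarvest.
const MIN_NODE_MAJOR = 18
// process.platform values for macOS, Linux, and Windows
const SUPPORTED_PLATFORMS = ['darwin', 'linux', 'win32']

function checkEnvironment (nodeVersion, platform) {
  // nodeVersion looks like "18.19.0"; only the major part matters here.
  const major = Number.parseInt(nodeVersion.split('.')[0], 10)
  const problems = []
  if (major < MIN_NODE_MAJOR) {
    problems.push(`Node.js ${nodeVersion} is older than ${MIN_NODE_MAJOR}`)
  }
  if (!SUPPORTED_PLATFORMS.includes(platform)) {
    problems.push(`Platform "${platform}" is not in the supported list`)
  }
  return problems
}

const problems = checkEnvironment(process.versions.node, process.platform)
console.log(problems.length === 0 ? 'Environment looks OK' : problems.join('\n'))
```

Returning a list of problems rather than throwing on the first one lets the script report every unmet requirement in a single run.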
Troubleshooting
Playwright Installation Issues
If you encounter issues installing Playwright browsers:
bash
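# Try with sudo on Linux/macOS
sudo npx playwright install
# On Linux, missing system libraries can also cause failures; install them with
npx playwright install --with-deps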
# Or set a custom download directory
PLAYWRIGHT_BROWSERS_PATH=~/pw-browsers npx playwright install

Headless Mode Issues
If headless mode misbehaves on your system, you can run the browser in headed mode (with a visible window) instead:
javascript
const harvester = new DOMHarvester({ headless: false })

Timeout Errors
If you're scraping slow-loading pages, increase the timeout:
javascript
const harvester = new DOMHarvester({ timeout: 60000 }) // 60 seconds

Next Steps
Now that you have DOMHarvest installed, check out the Quick Start Guide to begin scraping!