ayakashi vs puppeteer-extra
Ayakashi is a web scraping library for Node.js that allows developers to easily extract structured data from websites. It is built on top of the popular "puppeteer" library and provides a simple and intuitive API for defining and querying the structure of a website.
Features:
- Powerful querying and data models
Ayakashi's way of finding things in the page and using them is done with props and domQL. Directly inspired by the relational database world (and SQL), domQL makes DOM access easy and readable no matter how obscure the page's structure is. Props are the way to package domQL expressions as re-usable structures which can then be passed around to actions or used as models for data extraction.
- High level builtin actions
Ready made actions so you can focus on what matters. Easily handle infinite scrolling, single page navigation, events and more. Plus, you can always build your own actions, either from scratch or by composing other actions.
- Preload code on pages
Need to include a bunch of code, a library you made or a 3rd party module and make it available on a page? Preloaders have you covered.
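To make the idea of domQL and props more concrete, here is a minimal sketch in plain JavaScript of an SQL-like `where` clause evaluated against element attributes and packaged as a reusable, named query. It is not Ayakashi's implementation and does not use its API: the `elements` array, the `operators` table, and the `matches` and `defineProp` helpers are all hypothetical names chosen for illustration.

```javascript
// Hypothetical stand-ins for DOM elements: plain objects holding attributes.
const elements = [
  { tag: "div", class: "product-item", id: "p1" },
  { tag: "div", class: "banner", id: "ad" },
  { tag: "li", class: "product-item", id: "p2" },
];

// Operators supported by this sketch only (an assumption, not Ayakashi's full set).
const operators = {
  eq: (actual, expected) => actual === expected,
  neq: (actual, expected) => actual !== expected,
  like: (actual, expected) => String(actual).includes(expected),
};

// Returns true when the element satisfies every attribute condition in the query.
function matches(element, query) {
  return Object.entries(query).every(([attribute, condition]) =>
    Object.entries(condition).every(([op, expected]) => {
      const test = operators[op];
      if (!test) throw new Error(`Unsupported operator: ${op}`);
      return test(element[attribute], expected);
    })
  );
}

// In this sketch a "prop" is just a named query that can be reused and passed around.
function defineProp(name, query) {
  return {
    name,
    query,
    run: (source) => source.filter((el) => matches(el, query)),
  };
}

const productList = defineProp("productList", { class: { eq: "product-item" } });
const productIds = productList.run(elements).map((el) => el.id);
console.log(productIds);
```

Keeping the query as data (an object of attribute conditions) rather than a hand-written selector string is what lets it be stored under a name and handed to other code, which is the role the text assigns to props.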
Puppeteer-extra is a modular plugin framework that wraps Puppeteer (and Playwright) to add extra functionality through a plugin system. It acts as a drop-in replacement for Puppeteer while enabling powerful extensions for stealth, captcha solving, ad blocking, and more.
The most popular plugins include:
- puppeteer-extra-plugin-stealth: Applies various evasion techniques to make the automated browser harder to detect. Patches common detection vectors like navigator.webdriver, chrome.runtime, WebGL renderer strings, and more. This is the most widely used Puppeteer stealth solution.
- puppeteer-extra-plugin-recaptcha: Automatically detects and solves reCAPTCHA and hCaptcha challenges using third-party solving services (2captcha, anti-captcha).
- puppeteer-extra-plugin-adblocker: Blocks ads and trackers to speed up page loading and reduce bandwidth usage during scraping.
- puppeteer-extra-plugin-anonymize-ua: Anonymizes the User-Agent string (for example, by removing the "Headless" marker) to make fingerprinting harder.
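The kind of patching these plugins perform can be illustrated with two small functions. This is a sketch of the general technique, not the plugins' actual code: `anonymizeUserAgent`, `hideWebdriver`, the sample User-Agent string, and the `fakeNavigator` object are hypothetical, and a plain object stands in for the browser's real `navigator` so the example runs in Node.js.

```javascript
// Replace the headless marker so the User-Agent resembles a regular Chrome build.
function anonymizeUserAgent(userAgent) {
  return userAgent.replace("HeadlessChrome/", "Chrome/");
}

// Hide the automation flag on a navigator-like object by redefining the property.
// One common approach; real evasions may work differently.
function hideWebdriver(navigatorLike) {
  Object.defineProperty(navigatorLike, "webdriver", {
    get: () => undefined,
    configurable: true,
  });
  return navigatorLike;
}

// Hypothetical values standing in for what a headless browser would report.
const headlessUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) " +
  "HeadlessChrome/120.0.0.0 Safari/537.36";
const fakeNavigator = { webdriver: true };

const cleanedUa = anonymizeUserAgent(headlessUa);
hideWebdriver(fakeNavigator);

console.log(cleanedUa);
console.log(fakeNavigator.webdriver);
```

In a real browser such patches would have to run inside the page before any site script executes, which is the part the plugins take care of.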
Key features of the framework:
- Drop-in replacement
Use puppeteer-extra instead of puppeteer in your imports; existing code works without changes.
- Plugin composition
Multiple plugins can be stacked and they work together without conflicts.
- Playwright support
The same plugin system works with Playwright via playwright-extra.
- Community plugins
Active community creating and maintaining plugins for various use cases.
Puppeteer-extra is the go-to solution for adding stealth capabilities to Puppeteer-based scrapers without rewriting existing code.
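The drop-in and plugin-composition ideas can be sketched with a small wrapper that exposes the same `launch()` method as the library it wraps and adds a `use()` method for registering plugins. This is a simplified illustration under stated assumptions, not puppeteer-extra's implementation: `baseLibrary`, `withPlugins`, `stealthLike`, and `loggerLike` are hypothetical, and the two hook names are chosen for this sketch.

```javascript
// Hypothetical base "browser library" exposing a single launch() method.
const baseLibrary = {
  async launch(options = {}) {
    return { options, log: [] };
  },
};

// Wraps a library so plugins can be registered with use() and applied on launch().
function withPlugins(library) {
  const plugins = [];
  return {
    use(plugin) {
      plugins.push(plugin);
      return this; // allow chaining: wrapper.use(a).use(b)
    },
    async launch(options = {}) {
      // Each plugin may adjust the launch options, in registration order.
      let finalOptions = { ...options };
      for (const plugin of plugins) {
        if (plugin.beforeLaunch) finalOptions = await plugin.beforeLaunch(finalOptions);
      }
      const browser = await library.launch(finalOptions);
      // Each plugin may then decorate the launched browser.
      for (const plugin of plugins) {
        if (plugin.onBrowser) await plugin.onBrowser(browser);
      }
      return browser;
    },
  };
}

// Two illustrative plugins that touch different parts of the lifecycle.
const stealthLike = {
  name: "stealth-like",
  beforeLaunch: (options) => ({
    ...options,
    args: [...(options.args || []), "--disable-blink-features=AutomationControlled"],
  }),
};
const loggerLike = {
  name: "logger-like",
  onBrowser: (browser) => {
    browser.log.push("logger attached");
  },
};

const extra = withPlugins(baseLibrary).use(stealthLike).use(loggerLike);
const launched = extra.launch({ headless: true });
launched.then((browser) => console.log(browser.options, browser.log));
```

Because the wrapper keeps the wrapped library's `launch()` signature, calling code does not need to change, and each plugin only implements the hooks it cares about.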
Highlights
plugin-system, extendible, community-tools, stealth
Example Use
```javascript
const ayakashi = require("ayakashi");

(async () => {
  const myAyakashi = ayakashi.init();
  // navigate the browser
  await myAyakashi.goTo("https://example.com/product");
  // parse the HTML, first by defining a selector
  myAyakashi
    .select("productList")
    .where({class: {eq: "product-item"}});
  // then by executing the selector on the current HTML
  const productList = await myAyakashi.extract("productList");
  console.log(productList);
})();
```
```javascript
const puppeteer = require('puppeteer-extra');

// Add stealth plugin to avoid bot detection
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

// Add recaptcha solving plugin
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
puppeteer.use(RecaptchaPlugin({
  provider: { id: '2captcha', token: 'YOUR_API_KEY' },
}));

(async () => {
  // Launch browser - stealth is applied automatically
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com/login');

  // Solve any captchas found on the page
  const { solved } = await page.solveRecaptchas();
  console.log(`Solved ${solved.length} captchas`);

  // Regular Puppeteer API works as normal
  await page.type('#username', 'user@example.com');
  await page.type('#password', 'password');
  // Start waiting for navigation before clicking so the navigation is not missed
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-button'),
  ]);
  console.log('Logged in:', page.url());

  await browser.close();
})();
```
Alternatives / Similar
- katana
- puppeteer-extra
- crawl4ai
- scrapling
- crawlee
- mechanize
- scrapegraphai
- botasaurus
- goutte
- kimurai
- firecrawl