Skip to content

ayakashivscrawlee

AGPL-3.0-only 8 1 217
166 (month) Apr 18 2019 1.0.0-beta8.4(2023-06-29 12:37:12 ago)
22,720 26 175 Apache-2.0
Apr 22 2022 341.9 thousand (month) 3.16.0(2026-04-09 07:36:53 ago)

Ayakashi is a web scraping library for Node.js that allows developers to easily extract structured data from websites. It is built on top of the popular "puppeteer" library and provides a simple and intuitive API for defining and querying the structure of a website.

Features:

  • Powerful querying and data models
    Ayakashi's way of finding things in the page and using them is done with props and domQL. Directly inspired by the relational database world (and SQL), domQL makes DOM access easy and readable no matter how obscure the page's structure is. Props are the way to package domQL expressions as re-usable structures which can then be passed around to actions or to be used as models for data extraction.
  • High level builtin actions
    Ready made actions so you can focus on what matters. Easily handle infinite scrolling, single page navigation, events and more. Plus, you can always build your own actions, either from scratch or by composing other actions.
  • Preload code on pages
    Need to include a bunch of code, a library you made or a 3rd party module and make it available on a page? Preloaders have you covered.

Crawlee is a modern web scraping and browser automation framework for JavaScript and TypeScript, built by Apify. It is the successor to the Apify SDK and provides a unified interface for building reliable web scrapers and crawlers that can scale from simple scripts to large-scale data extraction projects.

Crawlee supports multiple crawling strategies through different crawler classes:

  • CheerioCrawler For fast, lightweight HTML scraping using Cheerio (no browser needed). Best for static pages.
  • PlaywrightCrawler Uses Playwright for full browser automation. Handles JavaScript-rendered pages, SPAs, and complex interactions.
  • PuppeteerCrawler Similar to PlaywrightCrawler but uses Puppeteer as the browser automation backend.
  • HttpCrawler Minimal crawler for raw HTTP requests without HTML parsing.

Key features include:

  • Automatic request queue management with configurable concurrency and rate limiting
  • Built-in proxy rotation with session management
  • Persistent request queue and dataset storage (local or cloud via Apify)
  • Automatic retry and error handling with configurable strategies
  • TypeScript-first design with full type safety
  • Middleware-like request/response hooks (preNavigationHooks, postNavigationHooks)
  • Output pipelines for storing extracted data
  • Easy deployment to Apify cloud platform

Crawlee is considered the most feature-complete web scraping framework in the JavaScript/TypeScript ecosystem, comparable to Python's Scrapy but with native browser automation support.

Highlights


populartypescriptextendiblemiddlewaresoutput-pipelineslarge-scaleproxy

Example Use


```javascript const ayakashi = require("ayakashi"); const myAyakashi = ayakashi.init(); // navigate the browser await myAyakashi.goTo("https://example.com/product"); // parsing HTML // first by defnining a selector myAyakashi .select("productList") .where({class: {eq: "product-item"}}); // then executing selector on current HTML: const productList = await myAyakashi.extract("productList"); console.log(productList); ```
```javascript import { PlaywrightCrawler, Dataset } from 'crawlee'; // Create a crawler with Playwright for JS rendering const crawler = new PlaywrightCrawler({ // Limit concurrency to avoid overwhelming the target maxConcurrency: 5, // This function is called for each URL async requestHandler({ request, page, enqueueLinks }) { const title = await page.title(); // Extract data from the page const products = await page.$$eval('.product', (els) => els.map((el) => ({ name: el.querySelector('.name')?.textContent, price: el.querySelector('.price')?.textContent, })) ); // Store extracted data await Dataset.pushData({ url: request.url, title, products, }); // Follow links to crawl more pages await enqueueLinks({ globs: ['https://example.com/products/**'], }); }, }); // Start crawling await crawler.run(['https://example.com/products']); ```

Alternatives / Similar


Was this page helpful?