Skip to content

crawleevsspidr

Apache-2.0 175 26 22,720
341.9 thousand (month) Apr 22 2022 3.16.0(2026-04-09 07:36:53 ago)
835 2 16 MIT
Jul 25 2009 2.4 thousand (month) 0.7.2(2025-02-03 07:58:27 ago)

Crawlee is a modern web scraping and browser automation framework for JavaScript and TypeScript, built by Apify. It is the successor to the Apify SDK and provides a unified interface for building reliable web scrapers and crawlers that can scale from simple scripts to large-scale data extraction projects.

Crawlee supports multiple crawling strategies through different crawler classes:

  • CheerioCrawler For fast, lightweight HTML scraping using Cheerio (no browser needed). Best for static pages.
  • PlaywrightCrawler Uses Playwright for full browser automation. Handles JavaScript-rendered pages, SPAs, and complex interactions.
  • PuppeteerCrawler Similar to PlaywrightCrawler but uses Puppeteer as the browser automation backend.
  • HttpCrawler Minimal crawler for raw HTTP requests without HTML parsing.

Key features include:

  • Automatic request queue management with configurable concurrency and rate limiting
  • Built-in proxy rotation with session management
  • Persistent request queue and dataset storage (local or cloud via Apify)
  • Automatic retry and error handling with configurable strategies
  • TypeScript-first design with full type safety
  • Middleware-like request/response hooks (preNavigationHooks, postNavigationHooks)
  • Output pipelines for storing extracted data
  • Easy deployment to Apify cloud platform

Crawlee is considered the most feature-complete web scraping framework in the JavaScript/TypeScript ecosystem, comparable to Python's Scrapy but with native browser automation support.

Spidr is a Ruby gem that provides a simple and flexible way to spider and scrape websites. It allows you to easily navigate through a website, following links and scraping data as you go. It is built on top of Nokogiri, a popular Ruby gem for parsing and searching HTML and XML documents, and it provides a simple and intuitive API for defining and running web scraping operations.

One of the main features of Spidr is its ability to spider a website, following all the links on a page and visiting all the pages of a website. This allows you to easily and quickly scrape large amounts of data from a website, without having to manually specify which pages to visit.

In addition to its spidering capabilities, Spidr also provides a variety of other features that can simplify the web scraping process. It can automatically filter which links to follow and which pages to visit, it can handle cookies and authentication, and it can automatically store the scraped data in a database or file. It also provides a built-in support for parallelism and queueing to speed up the scraping process.

Highlights


populartypescriptextendiblemiddlewaresoutput-pipelineslarge-scaleproxy

Example Use


```javascript import { PlaywrightCrawler, Dataset } from 'crawlee'; // Create a crawler with Playwright for JS rendering const crawler = new PlaywrightCrawler({ // Limit concurrency to avoid overwhelming the target maxConcurrency: 5, // This function is called for each URL async requestHandler({ request, page, enqueueLinks }) { const title = await page.title(); // Extract data from the page const products = await page.$$eval('.product', (els) => els.map((el) => ({ name: el.querySelector('.name')?.textContent, price: el.querySelector('.price')?.textContent, })) ); // Store extracted data await Dataset.pushData({ url: request.url, title, products, }); // Follow links to crawl more pages await enqueueLinks({ globs: ['https://example.com/products/**'], }); }, }); // Start crawling await crawler.run(['https://example.com/products']); ```
```ruby require 'spidr' Spidr.start_at("http://example.com") do |spider| spider.every_page do |page| puts "Visiting: #{page.url}" # Extract data from the page using Nokogiri doc = Nokogiri::HTML(page.body) title = doc.css("title").text puts "Title: #{title}" end end ```

Alternatives / Similar


Was this page helpful?