
crawlee vs botasaurus

crawlee: Apache-2.0 175 26 22,720 341.9 thousand (month) Apr 22 2022 3.16.0 (2026-04-09 07:36:53)
botasaurus: 4,321 5 52 MIT Oct 01 2023 35.5 thousand (month) 4.0.97 (2026-01-06 07:45:54)

Crawlee is a modern web scraping and browser automation framework for JavaScript and TypeScript, built by Apify. It is the successor to the Apify SDK and provides a unified interface for building reliable web scrapers and crawlers that can scale from simple scripts to large-scale data extraction projects.

Crawlee supports multiple crawling strategies through different crawler classes:

  • CheerioCrawler: Fast, lightweight HTML scraping using Cheerio (no browser needed). Best for static pages.
  • PlaywrightCrawler: Uses Playwright for full browser automation. Handles JavaScript-rendered pages, SPAs, and complex interactions.
  • PuppeteerCrawler: Similar to PlaywrightCrawler, but uses Puppeteer as the browser automation backend.
  • HttpCrawler: Minimal crawler for raw HTTP requests without HTML parsing.

Key features include:

  • Automatic request queue management with configurable concurrency and rate limiting
  • Built-in proxy rotation with session management
  • Persistent request queue and dataset storage (local or cloud via Apify)
  • Automatic retry and error handling with configurable strategies
  • TypeScript-first design with full type safety
  • Middleware-like request/response hooks (preNavigationHooks, postNavigationHooks)
  • Output pipelines for storing extracted data
  • Easy deployment to Apify cloud platform
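
The queue, deduplication, and retry behavior listed above can be sketched in a few lines. This is a language-agnostic illustration written in plain Python with only the standard library; it is not Crawlee's API or implementation. The names `crawl`, `fetch`, and `max_retries` are hypothetical, and `fetch` is assumed to return the links found on a page and to raise an exception on failure.

```python
from collections import deque
from typing import Callable, Dict, List


def crawl(start_urls: List[str],
          fetch: Callable[[str], List[str]],
          max_retries: int = 2) -> Dict[str, List[str]]:
    """Process URLs first-in, first-out, skipping duplicates and retrying failures.

    `fetch` is a hypothetical callable: it returns the links found on a page
    and raises an exception when the request fails.
    """
    queue = deque()          # pending (url, attempts so far) pairs
    seen = set()             # every URL ever enqueued, for deduplication
    handled: List[str] = []  # URLs fetched successfully, in order
    failed: List[str] = []   # URLs that exhausted their retries

    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    while queue:
        url, attempts = queue.popleft()
        try:
            links = fetch(url)
        except Exception:
            if attempts < max_retries:
                # Put the URL back at the end of the queue for another try.
                queue.append((url, attempts + 1))
            else:
                failed.append(url)
            continue

        handled.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, 0))

    return {"handled": handled, "failed": failed}
```

A production framework layers concurrency limits, persistence, and proxy or session rotation on top of this loop; the sketch only shows why a shared `seen` set and a per-request attempt counter are enough to avoid duplicate work and bound retries.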

Crawlee is among the most feature-complete web scraping frameworks in the JavaScript/TypeScript ecosystem. It is often compared to Python's Scrapy, with the addition of native browser automation support.

Botasaurus is an all-in-one Python web scraping framework that combines browser automation, anti-detection, and scaling features into a single package. It aims to simplify the entire web scraping workflow from development to deployment.

Key features include:

  • Anti-detect browser: Ships with a stealth-patched browser that passes common bot detection tests. Automatically handles fingerprinting, user agent rotation, and other anti-detection measures.
  • Decorator-based API: Uses Python decorators (@browser, @request) to define scraping tasks, making code clean and easy to organize.
  • Built-in parallelism: Easy parallel execution of scraping tasks across multiple browser instances with configurable concurrency.
  • Caching: Built-in caching layer to avoid re-scraping pages during development and debugging.
  • Profile persistence: Can save and reuse browser profiles (cookies, localStorage) across scraping sessions to maintain login state.
  • Output handling: Automatic output to JSON, CSV, or custom formats with built-in data filtering.
  • Web dashboard: Includes a web UI for monitoring scraping progress, viewing results, and managing tasks.
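
To make the decorator, parallelism, and caching features more concrete, here is a minimal sketch of how a decorator can combine them using only the Python standard library. It is an illustration of the pattern, not Botasaurus's implementation: the decorator name `task`, its in-memory cache, and the example function `page_length` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from typing import Any, Callable, Dict, List, Union


def task(parallel: int = 1, cache: bool = False) -> Callable:
    """Hypothetical decorator: run `func` once per input, optionally in
    parallel, and optionally reuse results for inputs seen before."""

    def decorator(func: Callable[[str], Any]) -> Callable:
        results_cache: Dict[str, Any] = {}  # in-memory only, lost on exit

        def run_one(item: str) -> Any:
            if cache and item in results_cache:
                return results_cache[item]
            result = func(item)
            if cache:
                results_cache[item] = result
            return result

        @wraps(func)
        def wrapper(data: Union[str, List[str]]) -> Any:
            # A single input runs directly; a list fans out across threads.
            if isinstance(data, str):
                return run_one(data)
            # pool.map returns results in the same order as the inputs.
            with ThreadPoolExecutor(max_workers=parallel) as pool:
                return list(pool.map(run_one, data))

        return wrapper

    return decorator


@task(parallel=3, cache=True)
def page_length(url: str) -> int:
    # Stand-in for real scraping work; a real task would fetch and parse.
    return len(url)
```

Calling `page_length` with a list of URLs returns one result per URL, and a repeated call with the same URL is served from the cache. A real framework would typically persist the cache to disk and drive browser instances rather than threads running a pure function.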

Botasaurus is designed for developers who want a batteries-included framework that handles anti-detection automatically, without needing to manually configure stealth settings or manage browser fingerprints.

Highlights


crawlee: popular, typescript, extendible, middlewares, output-pipelines, large-scale, proxy
botasaurus: anti-detect, stealth, large-scale

Example Use


```javascript
import { PlaywrightCrawler, Dataset } from 'crawlee';

// Create a crawler with Playwright for JS rendering
const crawler = new PlaywrightCrawler({
    // Limit concurrency to avoid overwhelming the target
    maxConcurrency: 5,

    // This function is called for each URL
    async requestHandler({ request, page, enqueueLinks }) {
        const title = await page.title();

        // Extract data from the page
        const products = await page.$$eval('.product', (els) =>
            els.map((el) => ({
                name: el.querySelector('.name')?.textContent,
                price: el.querySelector('.price')?.textContent,
            }))
        );

        // Store extracted data
        await Dataset.pushData({
            url: request.url,
            title,
            products,
        });

        // Follow links to crawl more pages
        await enqueueLinks({
            globs: ['https://example.com/products/**'],
        });
    },
});

// Start crawling
await crawler.run(['https://example.com/products']);
```

```python
from botasaurus.browser import browser, Driver
from botasaurus.request import request, Request


# Browser-based scraping with anti-detection
@browser(parallel=3, cache=True)
def scrape_products(driver: Driver, url: str):
    driver.get(url)

    # Wait for content to load
    driver.wait_for_element(".product-list")

    # Extract product data
    products = []
    for el in driver.select_all(".product-card"):
        products.append({
            "name": el.select(".product-name").text,
            "price": el.select(".product-price").text,
            "url": el.select("a").get_attribute("href"),
        })
    return products


# HTTP-based scraping (no browser needed)
@request(parallel=5, cache=True)
def scrape_api(req: Request, url: str):
    response = req.get(url)
    return response.json()


# Run the scraper
results = scrape_products(
    ["https://example.com/page/1", "https://example.com/page/2"]
)
```
