crawleevsruia
Crawlee is a modern web scraping and browser automation framework for JavaScript and TypeScript, built by Apify. It is the successor to the Apify SDK and provides a unified interface for building reliable web scrapers and crawlers that can scale from simple scripts to large-scale data extraction projects.
Crawlee supports multiple crawling strategies through different crawler classes:
- CheerioCrawler For fast, lightweight HTML scraping using Cheerio (no browser needed). Best for static pages.
- PlaywrightCrawler Uses Playwright for full browser automation. Handles JavaScript-rendered pages, SPAs, and complex interactions.
- PuppeteerCrawler Similar to PlaywrightCrawler but uses Puppeteer as the browser automation backend.
- HttpCrawler Minimal crawler for raw HTTP requests without HTML parsing.
Key features include:
- Automatic request queue management with configurable concurrency and rate limiting
- Built-in proxy rotation with session management
- Persistent request queue and dataset storage (local or cloud via Apify)
- Automatic retry and error handling with configurable strategies
- TypeScript-first design with full type safety
- Middleware-like request/response hooks (preNavigationHooks, postNavigationHooks)
- Output pipelines for storing extracted data
- Easy deployment to Apify cloud platform
Crawlee is considered the most feature-complete web scraping framework in the JavaScript/TypeScript ecosystem, comparable to Python's Scrapy but with native browser automation support.
Ruia is an async web scraping micro-framework, written with asyncio and aiohttp, aims to make crawling url as convenient as possible.
Ruia is inspired by scrapy however instead of Twisted it's based entirely on asyncio and aiohttp.
It also supports various features like cookies, headers, and proxy, which makes it very useful in dealing with complex web scraping tasks.