Skip to content

botasaurusvsgerapy

MIT 52 5 4,321
35.5 thousand (month) Oct 01 2023 4.0.97(2026-01-06 07:45:54 ago)
3,495 4 74 MIT
Jul 04 2017 514 (month) 0.9.13(2023-07-19 18:53:46 ago)

Botasaurus is an all-in-one Python web scraping framework that combines browser automation, anti-detection, and scaling features into a single package. It aims to simplify the entire web scraping workflow from development to deployment.

Key features include:

  • Anti-detect browser Ships with a stealth-patched browser that passes common bot detection tests. Automatically handles fingerprinting, user agent rotation, and other anti-detection measures.
  • Decorator-based API Uses Python decorators (@browser, @request) to define scraping tasks, making code clean and easy to organize.
  • Built-in parallelism Easy parallel execution of scraping tasks across multiple browser instances with configurable concurrency.
  • Caching Built-in caching layer to avoid re-scraping pages during development and debugging.
  • Profile persistence Can save and reuse browser profiles (cookies, localStorage) across scraping sessions for maintaining login state.
  • Output handling Automatic output to JSON, CSV, or custom formats with built-in data filtering.
  • Web dashboard Includes a web UI for monitoring scraping progress, viewing results, and managing tasks.

Botasaurus is designed for developers who want a batteries-included framework that handles anti-detection automatically, without needing to manually configure stealth settings or manage browser fingerprints.

Gerapy is a Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js.

It is built on top of the Scrapy framework and provides a simple and easy-to-use interface for performing web scraping tasks. Gerapy also includes features such as support for scheduling and distributed crawling, as well as a built-in web-based dashboard for monitoring and managing scraping tasks. Additionally, Gerapy is designed to be highly extensible, allowing users to easily create custom plugins and integrations.

Overall, Gerapy is a useful tool for those looking to automate web scraping tasks and extract data from websites.

Highlights


anti-detectstealthlarge-scale

Example Use


```python from botasaurus.browser import browser, Driver from botasaurus.request import request, Request # Browser-based scraping with anti-detection @browser(parallel=3, cache=True) def scrape_products(driver: Driver, url: str): driver.get(url) # Wait for content to load driver.wait_for_element(".product-list") # Extract product data products = [] for el in driver.select_all(".product-card"): products.append({ "name": el.select(".product-name").text, "price": el.select(".product-price").text, "url": el.select("a").get_attribute("href"), }) return products # HTTP-based scraping (no browser needed) @request(parallel=5, cache=True) def scrape_api(req: Request, url: str): response = req.get(url) return response.json() # Run the scraper results = scrape_products( ["https://example.com/page/1", "https://example.com/page/2"] ) ```

Alternatives / Similar


Was this page helpful?