Skip to content

botasaurusvsgracy

MIT 52 5 4,321
35.5 thousand (month) Oct 01 2023 4.0.97(2026-01-06 07:45:54 ago)
248 2 - MIT
Feb 05 2023 6.8 thousand (month) 1.34.0(2024-11-27 14:57:34 ago)

Botasaurus is an all-in-one Python web scraping framework that combines browser automation, anti-detection, and scaling features into a single package. It aims to simplify the entire web scraping workflow from development to deployment.

Key features include:

  • Anti-detect browser Ships with a stealth-patched browser that passes common bot detection tests. Automatically handles fingerprinting, user agent rotation, and other anti-detection measures.
  • Decorator-based API Uses Python decorators (@browser, @request) to define scraping tasks, making code clean and easy to organize.
  • Built-in parallelism Easy parallel execution of scraping tasks across multiple browser instances with configurable concurrency.
  • Caching Built-in caching layer to avoid re-scraping pages during development and debugging.
  • Profile persistence Can save and reuse browser profiles (cookies, localStorage) across scraping sessions for maintaining login state.
  • Output handling Automatic output to JSON, CSV, or custom formats with built-in data filtering.
  • Web dashboard Includes a web UI for monitoring scraping progress, viewing results, and managing tasks.

Botasaurus is designed for developers who want a batteries-included framework that handles anti-detection automatically, without needing to manually configure stealth settings or manage browser fingerprints.

Gracy is an API client library based on httpx that provides an extra stability layer with:

  • Retry logic
  • Logging
  • Connection throttling
  • Tracking/Middleware

In web scraping, Gracy can be a convenient tool for creating scraper based API clients.

Highlights


anti-detectstealthlarge-scale

Example Use


```python from botasaurus.browser import browser, Driver from botasaurus.request import request, Request # Browser-based scraping with anti-detection @browser(parallel=3, cache=True) def scrape_products(driver: Driver, url: str): driver.get(url) # Wait for content to load driver.wait_for_element(".product-list") # Extract product data products = [] for el in driver.select_all(".product-card"): products.append({ "name": el.select(".product-name").text, "price": el.select(".product-price").text, "url": el.select("a").get_attribute("href"), }) return products # HTTP-based scraping (no browser needed) @request(parallel=5, cache=True) def scrape_api(req: Request, url: str): response = req.get(url) return response.json() # Run the scraper results = scrape_products( ["https://example.com/page/1", "https://example.com/page/2"] ) ```
```python # 0. Import import asyncio from typing import Awaitable from gracy import BaseEndpoint, Gracy, GracyConfig, LogEvent, LogLevel # 1. Define your endpoints class PokeApiEndpoint(BaseEndpoint): GET_POKEMON = "/pokemon/{NAME}" # 👈 Put placeholders as needed # 2. Define your Graceful API class GracefulPokeAPI(Gracy[str]): class Config: # type: ignore BASE_URL = "https://pokeapi.co/api/v2/" # 👈 Optional BASE_URL # 👇 Define settings to apply for every request SETTINGS = GracyConfig( log_request=LogEvent(LogLevel.DEBUG), log_response=LogEvent(LogLevel.INFO, "{URL} took {ELAPSED}"), parser={ "default": lambda r: r.json() } ) async def get_pokemon(self, name: str) -> Awaitable[dict]: return await self.get(PokeApiEndpoint.GET_POKEMON, {"NAME": name}) # Note: since Gracy is based on httpx we can customized the used client with custom headers etc" def _create_client(self) -> httpx.AsyncClient: client = super()._create_client() client.headers = {"User-Agent": f"My Scraper"} return client pokeapi = GracefulPokeAPI() async def main(): try: pokemon = await pokeapi.get_pokemon("pikachu") print(pokemon) finally: pokeapi.report_status("rich") asyncio.run(main()) ```

Alternatives / Similar


Was this page helpful?