Skip to content

botasaurusvsgeziyor

MIT 52 5 4,321
35.5 thousand (month) Oct 01 2023 4.0.97(2026-01-06 07:45:54 ago)
2,772 1 30 MPL-2.0
Jun 06 2019 2026-04-11(2026-04-11 21:30:25 ago)

Botasaurus is an all-in-one Python web scraping framework that combines browser automation, anti-detection, and scaling features into a single package. It aims to simplify the entire web scraping workflow from development to deployment.

Key features include:

  • Anti-detect browser Ships with a stealth-patched browser that passes common bot detection tests. Automatically handles fingerprinting, user agent rotation, and other anti-detection measures.
  • Decorator-based API Uses Python decorators (@browser, @request) to define scraping tasks, making code clean and easy to organize.
  • Built-in parallelism Easy parallel execution of scraping tasks across multiple browser instances with configurable concurrency.
  • Caching Built-in caching layer to avoid re-scraping pages during development and debugging.
  • Profile persistence Can save and reuse browser profiles (cookies, localStorage) across scraping sessions for maintaining login state.
  • Output handling Automatic output to JSON, CSV, or custom formats with built-in data filtering.
  • Web dashboard Includes a web UI for monitoring scraping progress, viewing results, and managing tasks.

Botasaurus is designed for developers who want a batteries-included framework that handles anti-detection automatically, without needing to manually configure stealth settings or manage browser fingerprints.

Geziyor is a blazing fast web crawling and web scraping framework. It can be used to crawl websites and extract structured data from them. Geziyor is useful for a wide range of purposes such as data mining, monitoring and automated testing.

Features:

  • JS Rendering
  • 5.000+ Requests/Sec
  • Caching (Memory/Disk/LevelDB)
  • Automatic Data Exporting (JSON, CSV, or custom)
  • Metrics (Prometheus, Expvar, or custom)
  • Limit Concurrency (Global/Per Domain)
  • Request Delays (Constant/Randomized)
  • Cookies, Middlewares, robots.txt
  • Automatic response decoding to UTF-8
  • Proxy management (Single, Round-Robin, Custom)

Highlights


anti-detectstealthlarge-scale

Example Use


```python from botasaurus.browser import browser, Driver from botasaurus.request import request, Request # Browser-based scraping with anti-detection @browser(parallel=3, cache=True) def scrape_products(driver: Driver, url: str): driver.get(url) # Wait for content to load driver.wait_for_element(".product-list") # Extract product data products = [] for el in driver.select_all(".product-card"): products.append({ "name": el.select(".product-name").text, "price": el.select(".product-price").text, "url": el.select("a").get_attribute("href"), }) return products # HTTP-based scraping (no browser needed) @request(parallel=5, cache=True) def scrape_api(req: Request, url: str): response = req.get(url) return response.json() # Run the scraper results = scrape_products( ["https://example.com/page/1", "https://example.com/page/2"] ) ```
```go // This example extracts all quotes from quotes.toscrape.com and exports to JSON file. func main() { geziyor.NewGeziyor(&geziyor.Options{ StartURLs: []string{"http://quotes.toscrape.com/"}, ParseFunc: quotesParse, Exporters: []export.Exporter{&export.JSON{}}, }).Start() } func quotesParse(g *geziyor.Geziyor, r *client.Response) { r.HTMLDoc.Find("div.quote").Each(func(i int, s *goquery.Selection) { g.Exports <- map[string]interface{}{ "text": s.Find("span.text").Text(), "author": s.Find("small.author").Text(), } }) if href, ok := r.HTMLDoc.Find("li.next > a").Attr("href"); ok { g.Get(r.JoinURL(href), quotesParse) } } ```

Alternatives / Similar


Was this page helpful?