
nodriver vs Botasaurus

|                      | nodriver              | Botasaurus            |
|----------------------|-----------------------|-----------------------|
| License              | AGPL-3.0              | MIT                   |
| GitHub stars         | 4,003                 | 4,321                 |
| Downloads (month)    | 321.9 thousand        | 35.5 thousand         |
| First release        | Jan 15 2024           | Oct 01 2023           |
| Latest version       | 0.48.1 (2025-11-09)   | 4.0.97 (2026-01-06)   |

nodriver is a Python library for browser automation that communicates directly with the browser via the Chrome DevTools Protocol (CDP), without relying on Selenium or chromedriver. It is the successor to undetected-chromedriver, created by the same author, and is designed from the ground up to be undetectable by anti-bot systems.

Key advantages over traditional browser automation:

  • No chromedriver dependency: Communicates directly with Chrome/Chromium over the CDP websocket, eliminating the most common detection vector (the chromedriver fingerprint).
  • Undetectable by default: Does not set the navigator.webdriver flag, does not inject automation-related JavaScript, and avoids the CDP detection patterns that anti-bot systems look for.
  • Fast and lightweight: Without the Selenium/WebDriver protocol overhead, nodriver launches browsers and executes commands significantly faster.
  • Async-first: Built entirely on Python's asyncio, enabling efficient concurrent browser automation.
  • Simple API: A clean, Pythonic API that is easier to use than raw CDP or Selenium.
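The async-first point is worth making concrete: because every operation is a coroutine, many pages can be driven concurrently from a single event loop. Below is a minimal sketch of that pattern using plain asyncio, with a stand-in coroutine (`fetch_title`) simulating browser work; the function name and URLs are illustrative, not part of nodriver's API.

```python
import asyncio


async def fetch_title(url: str) -> str:
    # Stand-in for real browser work such as `tab = await browser.get(url)`;
    # here we just simulate I/O latency with a short sleep.
    await asyncio.sleep(0.01)
    return f"title of {url}"


async def main() -> list[str]:
    urls = [f"https://example.com/page/{n}" for n in range(5)]
    # asyncio.gather drives all "tabs" concurrently on one event loop,
    # which is how an async-first API overlaps waiting time across pages.
    return await asyncio.gather(*(fetch_title(u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The same `gather` pattern applies directly to nodriver coroutines, since its tab and element operations are all awaitable.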

nodriver is particularly useful for scraping websites protected by advanced anti-bot systems like Cloudflare, DataDome, or PerimeterX, where standard Selenium or Playwright setups get detected and blocked.

Botasaurus is an all-in-one Python web scraping framework that combines browser automation, anti-detection, and scaling features into a single package. It aims to simplify the entire web scraping workflow from development to deployment.

Key features include:

  • Anti-detect browser: Ships with a stealth-patched browser that passes common bot detection tests. Automatically handles fingerprinting, user agent rotation, and other anti-detection measures.
  • Decorator-based API: Uses Python decorators (@browser, @request) to define scraping tasks, making code clean and easy to organize.
  • Built-in parallelism: Easy parallel execution of scraping tasks across multiple browser instances with configurable concurrency.
  • Caching: Built-in caching layer to avoid re-scraping pages during development and debugging.
  • Profile persistence: Can save and reuse browser profiles (cookies, localStorage) across scraping sessions for maintaining login state.
  • Output handling: Automatic output to JSON, CSV, or custom formats with built-in data filtering.
  • Web dashboard: Includes a web UI for monitoring scraping progress, viewing results, and managing tasks.
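To make the decorator-based model with `parallel` and `cache` concrete, here is a toy re-implementation of the pattern: a decorator that lets a function written for one input accept a list, fanning items out across a thread pool and memoizing repeated inputs. This is an illustration of the idea, not Botasaurus's actual internals; `task`, `scrape`, and the URLs are all made up for the sketch.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def task(parallel: int = 1, cache: bool = True):
    """Toy @browser/@request-style decorator: the wrapped function handles
    ONE item, but calling it with a list fans the items out across a
    thread pool; results for previously seen items come from a cache."""
    def decorator(fn):
        memo = {}

        @functools.wraps(fn)
        def wrapper(items):
            def run_one(item):
                if cache and item in memo:
                    return memo[item]
                result = fn(item)
                if cache:
                    memo[item] = result
                return result

            with ThreadPoolExecutor(max_workers=parallel) as pool:
                return list(pool.map(run_one, items))
        return wrapper
    return decorator


calls = []  # track which URLs actually ran, to show the cache working


@task(parallel=3, cache=True)
def scrape(url: str) -> str:
    calls.append(url)
    return f"scraped {url}"


if __name__ == "__main__":
    print(scrape(["https://example.com/a", "https://example.com/b"]))
    print(scrape(["https://example.com/a"]))  # served from the cache, no re-scrape
```

The real framework layers browser lifecycle management, anti-detection, and output handling on top of this dispatch pattern, but the calling convention is the same: decorate a per-item function, then invoke it with a list.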

Botasaurus is designed for developers who want a batteries-included framework that handles anti-detection automatically, without needing to manually configure stealth settings or manage browser fingerprints.

Highlights


nodriver: anti-detect · cdp · async · fast
Botasaurus: anti-detect · stealth · large-scale

Example Use


```python
import nodriver as uc


async def main():
    # Launch browser - undetected by default
    browser = await uc.start()

    # Open a new tab and navigate
    tab = await browser.get("https://example.com")

    # Wait for an element and interact with it
    # (tab.select takes a CSS selector; tab.find matches by text)
    search_box = await tab.select("input[name='q']")
    await search_box.send_keys("web scraping")

    # Click a button
    button = await tab.select("button[type='submit']")
    await button.click()

    # Wait for navigation and extract content
    await tab.wait_for("div.results")
    results = await tab.query_selector_all("div.result")
    for result in results:
        title = await result.query_selector("h3")
        print(title.text)

    # Take a screenshot
    await tab.save_screenshot("results.png")
    browser.stop()


if __name__ == "__main__":
    uc.loop().run_until_complete(main())
```
```python
from botasaurus.browser import browser, Driver
from botasaurus.request import request, Request


# Browser-based scraping with anti-detection
@browser(parallel=3, cache=True)
def scrape_products(driver: Driver, url: str):
    driver.get(url)

    # Wait for content to load
    driver.wait_for_element(".product-list")

    # Extract product data
    products = []
    for el in driver.select_all(".product-card"):
        products.append({
            "name": el.select(".product-name").text,
            "price": el.select(".product-price").text,
            "url": el.select("a").get_attribute("href"),
        })
    return products


# HTTP-based scraping (no browser needed)
@request(parallel=5, cache=True)
def scrape_api(req: Request, url: str):
    response = req.get(url)
    return response.json()


# Run the scraper on a list of URLs
results = scrape_products(
    ["https://example.com/page/1", "https://example.com/page/2"]
)
```
