
pydoll vs hrequests

None - - -
Jun 01 2024 0.0.0(2025-02-01 00:00:00 ago)
1,001 1 51 MIT
Feb 23 2022 33.3 thousand (month) 0.9.2(2024-12-01 02:55:27 ago)

Pydoll is a Python library for browser automation that uses the Chrome DevTools Protocol (CDP) directly, designed to be undetectable by anti-bot systems. Unlike Selenium-based tools, Pydoll does not use WebDriver and avoids the common detection vectors that anti-bot systems look for.

Key features include:

  • Native CDP communication: Connects directly to Chrome/Chromium via CDP websocket without intermediary drivers, avoiding the automation flags and fingerprints that WebDriver-based tools leave behind.
  • Event-driven architecture: Built around an async event system that can listen for and react to browser events like network requests, console messages, and DOM changes.
  • Network interception: Can intercept, modify, and mock network requests and responses, useful for blocking unnecessary resources or modifying API responses during scraping.
  • Async-first design: Fully asynchronous API built on Python's asyncio for efficient concurrent automation.
  • Clean API: Provides a high-level, Pythonic API for common browser automation tasks while still allowing direct CDP command execution for advanced use cases.
  • Multi-browser support: Can manage multiple browser instances and pages concurrently.
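
To make the CDP and event-driven points above concrete, here is a minimal sketch of how CDP traffic is shaped: commands are JSON objects with an `id`, a `method`, and `params`; replies echo the `id`; events arrive with a `method` and no `id`. The `CdpSession` class, its `feed` method, and the simulated messages are hypothetical illustrations, not Pydoll's API, and no real WebSocket is opened.

```python
import asyncio
import itertools
import json


class CdpSession:
    """Toy stand-in for a CDP connection (hypothetical, not Pydoll's class).

    It frames commands as JSON and dispatches incoming messages. `feed`
    simulates a message arriving from the browser over the WebSocket.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}   # command id -> Future awaiting the reply
        self._handlers = {}  # event name -> list of callbacks
        self.outbox = []     # JSON strings that would be sent to the browser

    def send(self, method, params=None):
        """Frame a command and return a Future that resolves with its result."""
        msg_id = next(self._ids)
        frame = {"id": msg_id, "method": method, "params": params or {}}
        self.outbox.append(json.dumps(frame))
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        return future

    def on(self, event, callback):
        """Register a callback for an event name such as Network.requestWillBeSent."""
        self._handlers.setdefault(event, []).append(callback)

    def feed(self, raw):
        """Route one incoming message: replies carry an id, events do not."""
        message = json.loads(raw)
        if "id" in message:
            self._pending.pop(message["id"]).set_result(message.get("result", {}))
        else:
            for callback in self._handlers.get(message["method"], []):
                callback(message.get("params", {}))


async def demo():
    session = CdpSession()
    seen_urls = []
    session.on(
        "Network.requestWillBeSent",
        lambda params: seen_urls.append(params["request"]["url"]),
    )

    reply = session.send("Page.navigate", {"url": "https://example.com"})

    # Simulated browser traffic: one network event, then the reply to command 1.
    session.feed(json.dumps({
        "method": "Network.requestWillBeSent",
        "params": {"request": {"url": "https://example.com/"}},
    }))
    session.feed(json.dumps({"id": 1, "result": {"frameId": "frame-1"}}))

    return session.outbox, seen_urls, await reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real client would also handle `error` replies and would send the frames in `outbox` over the browser's debugging WebSocket; both are left out here to keep the message flow visible.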

Pydoll fills a similar niche to nodriver and camoufox — browser automation with a focus on avoiding detection — but takes a different approach by providing more granular control over CDP communication and network interception.

hrequests is a feature-rich, modern replacement for the well-known requests library for Python. It provides an HTTP client designed to resist popular scraper identification techniques:

  • Seamless transition between headless browser and HTTP client based requests
  • Integrated HTML parser
  • Mimicking of real browser TLS fingerprints
  • JavaScript rendering
  • HTTP/2 support
  • Realistic browser headers
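
To illustrate why TLS fingerprints matter, the sketch below computes a JA3-style digest. JA3 is one widely used fingerprinting scheme and is an assumption here; the page does not say which schemes hrequests targets. The field values are illustrative and were not captured from a real browser. The point is only that two clients offering the same cipher suites in a different order produce different fingerprints.

```python
import hashlib


def ja3_digest(tls_version, ciphers, extensions, curves, point_formats):
    """Return the JA3 string and its MD5 digest for a set of ClientHello fields.

    JA3 joins the values inside each field with '-' and the five fields with ','.
    """
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative ClientHello values (hypothetical, not taken from a real client).
client_a = ja3_digest(771, [4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
# Same cipher suites, different order: a different fingerprint.
client_b = ja3_digest(771, [4867, 4866, 4865], [0, 10, 11], [29, 23, 24], [0])

if __name__ == "__main__":
    print(client_a)
    print(client_b)
```

Because the digest depends on ordering as well as content, a client has to reproduce a browser's ClientHello closely to present the same fingerprint, which is the kind of mimicry the feature list describes.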

Highlights


pydoll: anti-detect, cdp, async
hrequests: bypass, http2, tls-fingerprint, http-fingerprint, sync, async

Example Use


```python
import asyncio
from pydoll.browser import Chrome
from pydoll.constants import By


async def main():
    async with Chrome() as browser:
        # Open a new page
        page = await browser.new_page()
        await page.go_to("https://example.com")

        # Find and interact with elements
        search_input = await page.find_element(By.CSS, "input[name='q']")
        await search_input.type_text("web scraping")

        submit_btn = await page.find_element(By.CSS, "button[type='submit']")
        await submit_btn.click()

        # Wait for results and extract content
        await page.wait_element(By.CSS, ".results")
        results = await page.find_elements(By.CSS, ".result-item")

        for result in results:
            title = await result.get_text()
            print(title)

        # Network interception example
        await page.enable_network_interception()
        # intercept and analyze API calls made by the page


asyncio.run(main())
```
hrequests has an almost identical API and UX to requests. Here is a quick overview:

```python
import hrequests

# perform HTTP client requests
resp = hrequests.get('https://httpbin.org/html')
print(resp.status_code)
# 200

# use headless browsers and sessions:
session = hrequests.Session('chrome', version=122, os="mac")

# supports asyncio and easy concurrency
requests = [
    hrequests.async_get('https://www.google.com/', browser='firefox'),
    hrequests.async_get('https://www.duckduckgo.com/'),
    hrequests.async_get('https://www.yahoo.com/'),
    hrequests.async_get('https://www.httpbin.org/'),
]
responses = hrequests.map(requests, size=3)  # max 3 concurrent requests
```
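
The `size=3` argument above caps how many requests run at once. As a rough illustration of that idea using only the standard library, the sketch below bounds concurrency with an `asyncio.Semaphore`. It is not how hrequests implements `map`; `bounded_map` and `fake_fetch` are hypothetical names, and `fake_fetch` sleeps instead of touching the network.

```python
import asyncio


async def bounded_map(coro_factories, size):
    """Run zero-argument coroutine factories with at most `size` in flight.

    Results come back in input order, as asyncio.gather preserves it.
    """
    semaphore = asyncio.Semaphore(size)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(factory) for factory in coro_factories))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_fetch(url):
        # Track how many "requests" overlap so the cap can be observed.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for network latency
        in_flight -= 1
        return f"fetched {url}"

    urls = [
        "https://www.google.com/",
        "https://www.duckduckgo.com/",
        "https://www.yahoo.com/",
        "https://www.httpbin.org/",
    ]
    factories = [lambda url=url: fake_fetch(url) for url in urls]
    results = await bounded_map(factories, size=3)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing factories rather than coroutine objects keeps each fetch from being created until a slot is free, which is one reasonable design choice among several.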


