Skip to content

hrequestsvsselenium-driverless

Apache-2.0 13 1 541
2.5 thousand (month) Feb 23 2022 0.8.2(3 months ago)
386 1 8 NOASSERTION
Jul 22 2022 4.4 thousand (month) 1.9.3.1(a month ago)

hrequests is a feature rich modern replacement for a famous requests library for Python. It provides a feature rich HTTP client capable of resisting popular scraper identification techniques: - Seamless transition between headless browser and http client based requests - Integrated HTML parser - Mimicking of real browser TLS fingerprints - Javascript rendering - HTTP2 support - Realistic browser headers

Selenium Driverless is a Selenium inspired browser automation library with focus on web scraping detection bypass. It shares most of Selenium API and UX but implements several extensions that make the scraper more difficult to detect and extra usability features like: - Bypass Cloudflare - Multiple Tab scraping - Multiple context support - Proxy auth - Network interception

Highlights


bypasshttp2tls-fingerprinthttp-fingerprintsyncasync

Example Use


hrequests has almost identical API and UX as requests and here's a quick overview:
import hrequests

# perform HTTP client requests
resp = hrequests.get('https://httpbin.org/html')
print(resp.status_code)
# 200

# use headless browsers and sessions:
session = hrequests.Session('chrome', version=122, os="mac")

# supports asyncio and easy concurrency
requests = [
    hrequests.async_get('https://www.google.com/', browser='firefox'),
    hrequests.async_get('https://www.duckduckgo.com/'),
    hrequests.async_get('https://www.yahoo.com/'),
    hrequests.async_get('https://www.httpbin.org/'),
]
responses = hrequests.map(requests, size=3)  # max 3 conccurency
# It works the same as Selenium just with a different import.
import undetected_chromedriver as uc
driver = uc.Chrome(headless=True, use_subprocess=False)
driver.get('https://nowsecure.nl')
driver.save_screenshot('screenshot.png')
driver.close()

Alternatives / Similar


Was this page helpful?