hrequestsvsselenium-driverless
hrequests is a feature rich modern replacement for a famous requests library for Python. It provides a feature rich HTTP client capable of resisting popular scraper identification techniques: - Seamless transition between headless browser and http client based requests - Integrated HTML parser - Mimicking of real browser TLS fingerprints - Javascript rendering - HTTP2 support - Realistic browser headers
Selenium Driverless is a Selenium inspired browser automation library with focus on web scraping detection bypass. It shares most of Selenium API and UX but implements several extensions that make the scraper more difficult to detect and extra usability features like: - Bypass Cloudflare - Multiple Tab scraping - Multiple context support - Proxy auth - Network interception
Highlights
Example Use
import hrequests
# perform HTTP client requests
resp = hrequests.get('https://httpbin.org/html')
print(resp.status_code)
# 200
# use headless browsers and sessions:
session = hrequests.Session('chrome', version=122, os="mac")
# supports asyncio and easy concurrency
requests = [
hrequests.async_get('https://www.google.com/', browser='firefox'),
hrequests.async_get('https://www.duckduckgo.com/'),
hrequests.async_get('https://www.yahoo.com/'),
hrequests.async_get('https://www.httpbin.org/'),
]
responses = hrequests.map(requests, size=3) # max 3 conccurency
# It works the same as Selenium just with a different import.
import undetected_chromedriver as uc
driver = uc.Chrome(headless=True, use_subprocess=False)
driver.get('https://nowsecure.nl')
driver.save_screenshot('screenshot.png')
driver.close()