Skip to content

curl-cffivsselenium-driverless

MIT 28 2 1,603
282.6 thousand (month) Feb 23 2022 0.6.4(a month ago)
386 1 8 NOASSERTION
Jul 22 2022 4.4 thousand (month) 1.9.3.1(a month ago)

Curl-cffi is a Python library for implementing curl-impersonate which is a HTTP client that appears as one of popular web browsers like: - Google Chrome - Microsoft Edge - Safari - Firefox Unlike requests and httpx which are native Python libraries, curl-cffi uses cURL and inherits it's powerful features like extensive HTTP protocol support and detection patches for TLS and HTTP fingerprinting.

Using curl-cffi web scrapers can bypass TLS and HTTP fingerprinting.

Selenium Driverless is a Selenium inspired browser automation library with focus on web scraping detection bypass. It shares most of Selenium API and UX but implements several extensions that make the scraper more difficult to detect and extra usability features like: - Bypass Cloudflare - Multiple Tab scraping - Multiple context support - Proxy auth - Network interception

Highlights


bypasshttp2tls-fingerprinthttp-fingerprintsyncasync

Example Use


curl-cffi can be accessed as low-level curl client as well as an easy high-level HTTP client:
from curl_cffi import requests

response = requests.get('https://httpbin.org/json')
print(response.json())

# or using sessions
session = requests.Session()
response = session.get('https://httpbin.org/json')

# also supports async requests using asyncio
import asyncio
from curl_cffi.requests import AsyncSession

urls = [
  "http://httpbin.org/html",
  "http://httpbin.org/html",
  "http://httpbin.org/html",
]

async with AsyncSession() as s:
    tasks = []
    for url in urls:
        task = s.get(url)
        tasks.append(task)
    # scrape concurrently:
    responses = await asyncio.gather(*tasks)

# also supports websocket connections
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
# It works the same as Selenium just with a different import.
import undetected_chromedriver as uc
driver = uc.Chrome(headless=True, use_subprocess=False)
driver.get('https://nowsecure.nl')
driver.save_screenshot('screenshot.png')
driver.close()

Alternatives / Similar


Was this page helpful?