
curl-cffi vs httr

curl-cffi: MIT 28 2 1,603
282.6 thousand (month) Feb 23 2022 0.6.4 (a month ago)
httr: 983 9 2 MIT
May 06 2012 738.3 thousand (month) 1.4.7 (1 year, 1 month ago)

curl-cffi is a Python binding for curl-impersonate, an HTTP client that can present itself as one of several popular web browsers:

  • Google Chrome
  • Microsoft Edge
  • Safari
  • Firefox

Unlike requests and httpx, which are native Python libraries, curl-cffi uses cURL and inherits its powerful features, such as extensive HTTP protocol support, along with patches that defeat detection through TLS and HTTP fingerprinting.

Using curl-cffi, web scrapers can bypass TLS and HTTP fingerprinting.
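
As a minimal sketch of how impersonation is requested, the high-level API accepts an `impersonate` argument naming a browser build. The helper names (`impersonate_kwargs`, `fetch_as_browser`), the short target list, and the timeout value are illustrative assumptions; the set of valid targets depends on the installed curl-cffi version.

```python
# Targets assumed to be available in the installed curl-cffi; check your version.
KNOWN_TARGETS = ("chrome110", "edge101", "safari15_5")


def impersonate_kwargs(browser="chrome110", timeout=10):
    """Build keyword arguments for a request that mimics `browser`."""
    if browser not in KNOWN_TARGETS:
        raise ValueError(f"unknown impersonation target: {browser!r}")
    return {"impersonate": browser, "timeout": timeout}


def fetch_as_browser(url, browser="chrome110"):
    """GET `url` while presenting the fingerprint of `browser`."""
    # Imported lazily so the helper above works without curl-cffi installed.
    from curl_cffi import requests

    return requests.get(url, **impersonate_kwargs(browser))
```

Calling `fetch_as_browser("https://httpbin.org/headers")` would then send the request with the chosen browser's fingerprint instead of curl's default one.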

The aim of httr is to provide a wrapper for the curl package, customised to the demands of modern web APIs.

Key features:

  • Functions for the most important http verbs: GET(), HEAD(), PATCH(), PUT(), DELETE() and POST().
  • Automatic connection sharing across requests to the same website (by default, curl handles are managed automatically), cookies are maintained across requests, and an up-to-date root-level SSL certificate store is used.
  • Requests return a standard response object that captures the http status line, headers and body, along with other useful information.
  • Response content is available with content() as a raw vector (as = "raw"), a character vector (as = "text"), or parsed into an R object (as = "parsed"), currently for html, xml, json, png and jpeg.
  • You can convert http errors into R errors with stop_for_status().
  • Config functions make it easier to modify the request in common ways: set_cookies(), add_headers(), authenticate(), use_proxy(), verbose(), timeout(), content_type(), accept(), progress().
  • Support for OAuth 1.0 and 2.0 with oauth1.0_token() and oauth2.0_token(). The demo directory has eight OAuth demos: four for 1.0 (twitter, vimeo, withings and yahoo) and four for 2.0 (facebook, github, google, linkedin). OAuth credentials are automatically cached within a project.
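
For comparison, the three `content()` modes map onto ordinary Python operations, and curl-cffi responses expose similar views as `.content`, `.text`, and `.json()`. The sketch below uses only the standard library on a hard-coded payload, so the payload and the `content_views` name are illustrative assumptions rather than part of either library.

```python
import json


def content_views(raw: bytes, encoding: str = "utf-8"):
    """Return the raw, text, and parsed views of a JSON response body."""
    text = raw.decode(encoding)  # comparable to content(resp, as = "text")
    parsed = json.loads(text)    # comparable to content(resp, as = "parsed") for JSON
    return raw, text, parsed


raw, text, parsed = content_views(b'{"cookies": {"MeWant": "cookies"}}')
print(type(raw).__name__, type(text).__name__, parsed["cookies"]["MeWant"])
```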

Highlights


bypass, http2, tls-fingerprint, http-fingerprint, sync, async

Example Use


curl-cffi can be used as a low-level curl client as well as an easy high-level HTTP client:
from curl_cffi import requests

response = requests.get('https://httpbin.org/json')
print(response.json())

# or using sessions
session = requests.Session()
response = session.get('https://httpbin.org/json')

# also supports async requests using asyncio
import asyncio
from curl_cffi.requests import AsyncSession

urls = [
  "http://httpbin.org/html",
  "http://httpbin.org/html",
  "http://httpbin.org/html",
]

# `async with` and `await` must run inside a coroutine
async def main():
    async with AsyncSession() as s:
        # scrape concurrently:
        tasks = [s.get(url) for url in urls]
        return await asyncio.gather(*tasks)

responses = asyncio.run(main())

# also supports websocket connections
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
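
The snippets above only exercise the high-level client. As a hedged sketch of the low-level interface, the following drives a `Curl` handle directly, with option names that mirror libcurl's. The function name `low_level_get` and the `chrome110` default are assumptions made for illustration.

```python
from io import BytesIO


def low_level_get(url: str, browser: str = "chrome110") -> bytes:
    """Fetch `url` through a raw curl handle and return the response body."""
    # Imported lazily so this module loads even where curl-cffi is missing.
    from curl_cffi import Curl, CurlOpt

    buffer = BytesIO()
    curl = Curl()
    try:
        curl.setopt(CurlOpt.URL, url.encode())  # the URL is passed as bytes
        curl.setopt(CurlOpt.WRITEDATA, buffer)  # collect the body in memory
        curl.impersonate(browser)               # apply the browser fingerprint
        curl.perform()
    finally:
        curl.close()
    return buffer.getvalue()
```

This trades the convenience of response objects for direct control over curl options, which is mainly useful when a setting is not exposed by the high-level client.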

# httr (R):
library(httr)

# GET requests:
resp <- GET("http://httpbin.org/get")
status_code(resp)  # status code
headers(resp)  # headers
str(content(resp))  # body

# POST requests:
url <- "http://httpbin.org/post"
body <- list(a = 1, b = 2, c = 3)
# Form encoded
resp <- POST(url, body = body, encode = "form")
# Multipart encoded
resp <- POST(url, body = body, encode = "multipart")
# JSON encoded
resp <- POST(url, body = body, encode = "json")

# setting cookies:
resp <- GET("http://httpbin.org/cookies", set_cookies("MeWant" = "cookies"))
content(resp)$cookies  # get response cookies
