curl-cffivsfaraday
Curl-cffi is a Python library for implementing curl-impersonate which is a
HTTP client that appears as one of popular web browsers like:
- Google Chrome
- Microsoft Edge
- Safari
- Firefox
Unlike requests
and httpx
which are native Python libraries, curl-cffi
uses cURL and inherits it's powerful features
like extensive HTTP protocol support and detection patches for TLS and HTTP fingerprinting.
Using curl-cffi web scrapers can bypass TLS and HTTP fingerprinting.
Faraday is a Ruby gem that provides a simple and flexible interface for making HTTP requests. It allows you to create a Faraday connection object, which you can use to send requests and receive responses.
Faraday abstracts away the details of the underlying HTTP client library, so you can use it with different libraries such as Net::HTTP, HTTPClient, typhoeus and others.
Since Faraday can adapt many other HTTP clients it's very popular choice in web scraping.
Highlights
Example Use
from curl_cffi import requests
response = requests.get('https://httpbin.org/json')
print(response.json())
# or using sessions
session = requests.Session()
response = session.get('https://httpbin.org/json')
# also supports async requests using asyncio
import asyncio
from curl_cffi.requests import AsyncSession
urls = [
"http://httpbin.org/html",
"http://httpbin.org/html",
"http://httpbin.org/html",
]
async with AsyncSession() as s:
tasks = []
for url in urls:
task = s.get(url)
tasks.append(task)
# scrape concurrently:
responses = await asyncio.gather(*tasks)
# also supports websocket connections
from curl_cffi.requests import Session, WebSocket
def on_message(ws: WebSocket, message):
print(message)
with Session() as s:
ws = s.ws_connect(
"wss://api.gemini.com/v1/marketdata/BTCUSD",
on_message=on_message,
)
ws.run_forever()
# GET requests
response = Faraday.get('http://httpbingo.org')
put response.status
put response.headers
put response.body
# or use a persistent client session:
conn = Faraday.new(
url: 'http://httpbin.org/get',
params: {param: '1'},
headers: {'Content-Type' => 'application/json'}
)
# POST requests
response = conn.post('/post') do |req|
req.params['limit'] = 100
req.body = {query: 'chunky bacon'}.to_json
end