needle vs curl-cffi
needle is an HTTP client library for Node.js with a simple, flexible, and powerful API. It supports all major HTTP methods and offers a clean, easy-to-use interface for handling responses and errors.
curl-cffi is a Python binding for curl-impersonate, an HTTP client that disguises itself as one of the popular web browsers, such as:
- Google Chrome
- Microsoft Edge
- Safari
- Firefox
Unlike requests and httpx, which are native Python libraries, curl-cffi uses cURL and inherits its powerful features, such as extensive HTTP protocol support and detection patches for TLS and HTTP fingerprinting. Using curl-cffi, web scrapers can bypass TLS and HTTP fingerprinting.
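To illustrate the bypass, here is a minimal sketch that fetches a TLS-fingerprint echo service (the Scrapfly JA3 endpoint referenced in curl-cffi's docs) with and without impersonation; treat the exact impersonate target names (e.g. "chrome", "chrome110") as version-dependent:
from curl_cffi import requests

# default TLS fingerprint vs. an impersonated Chrome fingerprint
plain = requests.get("https://tools.scrapfly.io/api/fp/ja3")
spoofed = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome")
print(plain.text)
print(spoofed.text)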
Highlights
bypass, http2, tls-fingerprint, http-fingerprint, sync, async
Example Use
const needle = require('needle');

// callback style
needle.get('https://httpbin.org/get', (err, res) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(res.body);
});

// needle also returns Promises, so it works with async/await
// (wrapped in an async IIFE since CommonJS has no top-level await)
(async () => {
  const response = await needle.get('https://httpbin.org/get');

  // concurrent requests can be sent using Promise.all
  const results = await Promise.all([
    needle.get('http://httpbin.org/html'),
    needle.get('http://httpbin.org/html'),
    needle.get('http://httpbin.org/html'),
  ]);

  // POST requests
  const data = { name: 'John Doe' };
  await needle.post('https://api.example.com', data);

  // proxy
  const proxyOptions = {
    proxy: 'http://proxy.example.com:8080'
  };
  await needle.get('https://httpbin.org/ip', proxyOptions);

  // headers and cookies
  const headerOptions = {
    headers: {
      'Cookie': 'myCookie=123',
      'X-My-Header': 'myValue'
    }
  };
  await needle.get('https://httpbin.org/headers', headerOptions);
})();
curl-cffi can be used as a low-level curl client as well as an easy high-level HTTP client (a sketch of the low-level interface follows the examples below):
from curl_cffi import requests
response = requests.get('https://httpbin.org/json')
print(response.json())
# or using sessions
session = requests.Session()
response = session.get('https://httpbin.org/json')
# also supports async requests using asyncio
import asyncio
from curl_cffi.requests import AsyncSession

urls = [
    "http://httpbin.org/html",
    "http://httpbin.org/html",
    "http://httpbin.org/html",
]

# wrapped in a coroutine so the example can run with asyncio.run()
async def main():
    async with AsyncSession() as s:
        tasks = []
        for url in urls:
            tasks.append(s.get(url))
        # scrape concurrently:
        responses = await asyncio.gather(*tasks)

asyncio.run(main())
# also supports websocket connections
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
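The examples above all use the high-level, requests-like API. For the low-level side mentioned earlier, here is a minimal sketch assuming the pycurl-style Curl/CurlOpt interface from curl-cffi's documentation:
from io import BytesIO
from curl_cffi import Curl, CurlOpt

# libcurl-style workflow: create a handle, set options, perform, read the buffer
buffer = BytesIO()
curl = Curl()
curl.setopt(CurlOpt.URL, b"https://httpbin.org/get")
curl.setopt(CurlOpt.WRITEDATA, buffer)
curl.perform()
curl.close()
print(buffer.getvalue().decode())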