axios vs curl-cffi
axios is a popular promise-based JavaScript library for making HTTP requests. It works in both the browser and Node.js. It is similar to the Fetch API, but with a more powerful feature set and better browser compatibility.
One of the main benefits of using axios is that it automatically parses JSON response bodies into JavaScript objects, making the data easy to work with.
axios is known for its user-friendly API and its support for async/await syntax, which makes it very accessible for web scraping.
curl-cffi is a Python library that wraps curl-impersonate, an HTTP client that presents itself as one of the popular web browsers, such as:
- Google Chrome
- Microsoft Edge
- Safari
- Firefox
Unlike requests and httpx, which are native Python libraries, curl-cffi uses cURL and inherits its powerful features, such as extensive HTTP protocol support and patches that counter detection through TLS and HTTP fingerprinting.
Using curl-cffi, web scrapers can bypass TLS and HTTP fingerprinting checks.
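As a minimal sketch of the fingerprinting point above: curl-cffi's request functions take an `impersonate` argument that names the browser to mimic. The helper name `fetch_as_browser` and the `chrome110` default are illustrative assumptions, and the set of available targets depends on the installed curl-cffi version.

```python
def fetch_as_browser(url, browser="chrome110", timeout=30):
    """GET `url` while presenting the TLS/HTTP fingerprint of `browser`.

    `browser` is an impersonation target name; "chrome110" is only an
    example and may need adjusting for the installed curl-cffi version.
    """
    # Imported lazily so this helper can be defined without curl_cffi installed.
    from curl_cffi import requests

    # Without `impersonate`, the request would carry curl's own fingerprint.
    return requests.get(url, impersonate=browser, timeout=timeout)
```

Calling `fetch_as_browser("https://httpbin.org/json").json()` should return the same data as a plain `requests.get` call; only the fingerprint seen by the server differs.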
Example Use
// ES module imports (ES modules also allow the top-level await used below):
import axios from 'axios';
import querystring from 'node:querystring';

// axios can be used with promises:
axios.get('http://httpbin.org/json')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });

// or with async/await syntax:
const resp = await axios.get('http://httpbin.org/json');
console.log(resp.data);

// to make requests concurrently, the Promise.all function can be used:
const results = await Promise.all([
  axios.get('http://httpbin.org/html'),
  axios.get('http://httpbin.org/html'),
  axios.get('http://httpbin.org/html'),
]);

// axios also supports other request types like POST and automatically serializes the body as JSON:
await axios.post('http://httpbin.org/post', {'query': 'hello world'});

// or as form data:
const data = {name: 'John Doe', email: 'johndoe@example.com'};
await axios.post('https://jsonplaceholder.typicode.com/users',
  querystring.stringify(data),
  {
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded'
    }
  }
);

// default values like headers can be configured globally:
axios.defaults.headers.common['User-Agent'] = 'webscraping.fyi';
// or for a session instance:
const instance = axios.create({
  headers: {"User-Agent": "webscraping.fyi"},
});
# curl-cffi exposes a requests-like API:
from curl_cffi import requests

response = requests.get('https://httpbin.org/json')
print(response.json())

# or using sessions
session = requests.Session()
response = session.get('https://httpbin.org/json')

# also supports async requests using asyncio
import asyncio
from curl_cffi.requests import AsyncSession

urls = [
    "http://httpbin.org/html",
    "http://httpbin.org/html",
    "http://httpbin.org/html",
]

async def main():
    # "async with" and "await" are only valid inside a coroutine
    async with AsyncSession() as s:
        tasks = []
        for url in urls:
            task = s.get(url)
            tasks.append(task)
        # scrape concurrently:
        return await asyncio.gather(*tasks)

responses = asyncio.run(main())

# also supports websocket connections
# (callback-style API; the WebSocket interface may differ between curl-cffi versions)
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
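The session and asyncio examples above do not set an impersonation target. A hedged sketch of configuring it once per session is shown here, assuming `AsyncSession` accepts the same `impersonate` argument as the one-off request functions. The helper name `fetch_all` and the `chrome110` target are illustrative, not part of the library.

```python
import asyncio


async def fetch_all(urls, browser="chrome110"):
    """Fetch all `urls` concurrently through one impersonating session."""
    # Imported lazily so this helper can be defined without curl_cffi installed.
    from curl_cffi.requests import AsyncSession

    # Every request made through this session mimics `browser`.
    async with AsyncSession(impersonate=browser) as session:
        # gather() must run while the session is still open.
        return await asyncio.gather(*(session.get(url) for url in urls))
```

`asyncio.run(fetch_all(urls))` returns the responses in the same order as the input list, because `asyncio.gather` preserves ordering.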