axiosvshrequests
axios is a popular JavaScript library that allows you to make HTTP requests from a Node.js environment. It is a promise-based library that works in both the browser and Node.js. It is similar to the Fetch API, but with a more powerful feature set and better browser compatibility.
One of the main benefits of using axios is that it automatically transforms the response data into a JSON object, making it easy to work with.
Axios is known for user-friendly API and support for asynchronous async/await syntax making it very accessible in web scraping.
hrequests is a feature rich modern replacement for a famous requests library for Python. It provides a feature rich HTTP client capable of resisting popular scraper identification techniques: - Seamless transition between headless browser and http client based requests - Integrated HTML parser - Mimicking of real browser TLS fingerprints - Javascript rendering - HTTP2 support - Realistic browser headers
Highlights
Example Use
// axios can be used with promises:
axios.get('http://httpbin.org/json')
.then(response => {
console.log(response.data);
})
.catch(error => {
console.log(error);
});
// or async await syntax:
var resp = await axios.get('http://httpbin.org/json');
console.log(resp.data);
// to make requests concurrently Promise.all function can be used:
const results = await Promise.all([
axios.get('http://httpbin.org/html'),
axios.get('http://httpbin.org/html'),
axios.get('http://httpbin.org/html'),
])
// axios also supports other type of requests like POST and even automatically serialize them:
await axios.post('http://httpbin.org/post', {'query': 'hello world'});
// or formdata
const data = {name: 'John Doe', email: 'johndoe@example.com'};
await axios.post('https://jsonplaceholder.typicode.com/users',
querystring.stringify(data),
{
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
}
}
);
// default values like headers can be configured globally
axios.defaults.headers.common['User-Agent'] = 'webscraping.fyi';
// or for session instance:
const instance = axios.create({
headers: {"User-Agent": "webscraping.fyi"},
})
import hrequests
# perform HTTP client requests
resp = hrequests.get('https://httpbin.org/html')
print(resp.status_code)
# 200
# use headless browsers and sessions:
session = hrequests.Session('chrome', version=122, os="mac")
# supports asyncio and easy concurrency
requests = [
hrequests.async_get('https://www.google.com/', browser='firefox'),
hrequests.async_get('https://www.duckduckgo.com/'),
hrequests.async_get('https://www.yahoo.com/'),
hrequests.async_get('https://www.httpbin.org/'),
]
responses = hrequests.map(requests, size=3) # max 3 conccurency