axiosvsralger
axios is a popular JavaScript library that allows you to make HTTP requests from a Node.js environment. It is a promise-based library that works in both the browser and Node.js. It is similar to the Fetch API, but with a more powerful feature set and better browser compatibility.
One of the main benefits of using axios is that it automatically transforms the response data into a JSON object, making it easy to work with.
Axios is known for user-friendly API and support for asynchronous async/await syntax making it very accessible in web scraping.
ralger is a small web scraping framework for R based on rvest and xml2.
It's goal to simplify basic web scraping and it provides a convenient and easy to use API.
It offers functions for retrieving pages, parsing HTML using CSS selectors, automatic table parsing and auto link, title, image and paragraph extraction.
Example Use
// axios can be used with promises:
axios.get('http://httpbin.org/json')
.then(response => {
console.log(response.data);
})
.catch(error => {
console.log(error);
});
// or async await syntax:
var resp = await axios.get('http://httpbin.org/json');
console.log(resp.data);
// to make requests concurrently Promise.all function can be used:
const results = await Promise.all([
axios.get('http://httpbin.org/html'),
axios.get('http://httpbin.org/html'),
axios.get('http://httpbin.org/html'),
])
// axios also supports other type of requests like POST and even automatically serialize them:
await axios.post('http://httpbin.org/post', {'query': 'hello world'});
// or formdata
const data = {name: 'John Doe', email: 'johndoe@example.com'};
await axios.post('https://jsonplaceholder.typicode.com/users',
querystring.stringify(data),
{
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
}
}
);
// default values like headers can be configured globally
axios.defaults.headers.common['User-Agent'] = 'webscraping.fyi';
// or for session instance:
const instance = axios.create({
headers: {"User-Agent": "webscraping.fyi"},
})
library("ralger")
url <- "http://www.shanghairanking.com/rankings/arwu/2021"
# retrieve HTML and select elements using CSS selectors:
best_uni <- scrap(link = url, node = "a span", clean = TRUE)
head(best_uni, 5)
#> [1] "Harvard University"
#> [2] "Stanford University"
#> [3] "University of Cambridge"
#> [4] "Massachusetts Institute of Technology (MIT)"
#> [5] "University of California, Berkeley"
# ralger can also parse HTML attributes
attributes <- attribute_scrap(
link = "https://ropensci.org/",
node = "a", # the a tag
attr = "class" # getting the class attribute
)
head(attributes, 10) # NA values are a tags without a class attribute
#> [1] "navbar-brand logo" "nav-link" NA
#> [4] NA NA "nav-link"
#> [7] NA "nav-link" NA
#> [10] NA
#
# ralger can automatically scrape tables:
data <- table_scrap(link ="https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW")
head(data)
#> # A tibble: 6 × 4
#> Rank Title `Lifetime Gross` Year
#> <int> <chr> <chr> <int>
#> 1 1 Avatar $2,847,397,339 2009
#> 2 2 Avengers: Endgame $2,797,501,328 2019
#> 3 3 Titanic $2,201,647,264 1997
#> 4 4 Star Wars: Episode VII - The Force Awakens $2,069,521,700 2015
#> 5 5 Avengers: Infinity War $2,048,359,754 2018
#> 6 6 Spider-Man: No Way Home $1,901,216,740 2021