# ralger
ralger is a small web scraping framework for R based on rvest and xml2.
Its goal is to simplify basic web scraping, and it provides a convenient, easy-to-use API.
It offers functions for retrieving pages, parsing HTML with CSS selectors, parsing tables automatically, and extracting links, titles, images and paragraphs.
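Since ralger is built on rvest and xml2, its automatic extractors can be pictured as plain rvest calls. The sketch below uses rvest (1.0 or later) directly and is not ralger's own code. The inline HTML is made up for illustration, and treating "titles" as h1 to h3 headings is an assumption.

```r
# Hedged sketch: plain rvest, not ralger's implementation.
library("rvest")

# Inline HTML stands in for a live URL, so the sketch runs offline.
page <- minimal_html('
  <h1>Main heading</h1>
  <p>First paragraph.</p>
  <a href="https://ropensci.org/">rOpenSci</a>
  <img src="logo.png" alt="logo">
  <h2>Sub heading</h2>
  <p>Second paragraph.</p>
')

# Links: the href attribute of every <a> element
links <- html_attr(html_elements(page, "a"), "href")

# "Titles", taken here to mean h1 to h3 headings (an assumption)
titles <- html_text(html_elements(page, "h1, h2, h3"), trim = TRUE)

# Images: the src attribute of every <img> element
images <- html_attr(html_elements(page, "img"), "src")

# Paragraphs: the trimmed text of every <p> element
paragraphs <- html_text(html_elements(page, "p"), trim = TRUE)
```

Replacing `minimal_html(...)` with `read_html(url)` points the same calls at a live page.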
## Example Use
```r
library("ralger")
url <- "http://www.shanghairanking.com/rankings/arwu/2021"
# retrieve HTML and select elements using CSS selectors:
best_uni <- scrap(link = url, node = "a span", clean = TRUE)
head(best_uni, 5)
#> [1] "Harvard University"
#> [2] "Stanford University"
#> [3] "University of Cambridge"
#> [4] "Massachusetts Institute of Technology (MIT)"
#> [5] "University of California, Berkeley"
# ralger can also parse HTML attributes:
attributes <- attribute_scrap(
  link = "https://ropensci.org/",
  node = "a",    # the <a> tag
  attr = "class" # get the class attribute
)
head(attributes, 10) # NA marks <a> tags without a class attribute
#>  [1] "navbar-brand logo" "nav-link"          NA
#>  [4] NA                  NA                  "nav-link"
#>  [7] NA                  "nav-link"          NA
#> [10] NA
# ralger can automatically scrape tables:
data <- table_scrap(link = "https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW")
head(data)
#> # A tibble: 6 × 4
#>   Rank Title                                      Lifetime Gross Year
#>
#> 1    1 Avatar                                     $2,847,397,339 2009
#> 2    2 Avengers: Endgame                          $2,797,501,328 2019
#> 3    3 Titanic                                    $2,201,647,264 1997
#> 4    4 Star Wars: Episode VII - The Force Awakens $2,069,521,700 2015
#> 5    5 Avengers: Infinity War                     $2,048,359,754 2018
#> 6    6 Spider-Man: No Way Home                    $1,901,216,740 2021
```
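The examples above need network access, and their output depends on live pages that change over time. Because ralger is built on rvest and xml2, a rough offline counterpart of the three calls can be written with rvest (1.0 or later) against an inline HTML snippet. This is a sketch under those assumptions, not ralger's actual implementation, and the HTML is an invented stand-in for the real pages.

```r
# Hedged sketch: plain rvest, not ralger's source code.
library("rvest")

# Invented inline page so the sketch is reproducible without a network.
page <- minimal_html('
  <a class="nav-link" href="/a"><span>Harvard University</span></a>
  <a href="/b"><span>Stanford University</span></a>
  <table>
    <tr><th>Rank</th><th>Title</th><th>Year</th></tr>
    <tr><td>1</td><td>Avatar</td><td>2009</td></tr>
    <tr><td>2</td><td>Avengers: Endgame</td><td>2019</td></tr>
  </table>
')

# Roughly what scrap() returns: trimmed text of the nodes matching a CSS selector
unis <- html_text(html_elements(page, "a span"), trim = TRUE)

# Roughly what attribute_scrap() returns: one value per node, NA if the attribute is absent
classes <- html_attr(html_elements(page, "a"), "class")

# Roughly what table_scrap() returns: a <table> parsed into a tibble
# (the <th> row becomes the header; numeric-looking columns are converted)
movies <- html_table(html_element(page, "table"))
```

Keeping the page inline makes the results deterministic; swapping in `read_html()` on the real URLs gives live results instead.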