Skip to content

ralgervshttpful

MIT 3 1 152
1.2 thousand (month) Dec 22 2019 2.2.4(3 years ago)
1,734 4 91 MIT
0.3.2(4 years ago) Apr 14 2012 4.8 thousand (month)

ralger is a small web scraping framework for R based on rvest and xml2.

It's goal to simplify basic web scraping and it provides a convenient and easy to use API.

It offers functions for retrieving pages, parsing HTML using CSS selectors, automatic table parsing and auto link, title, image and paragraph extraction.

Httpful is a simple Http Client library for PHP 7.2+. There is an emphasis of readability, simplicity, and flexibility – basically provide the features and flexibility to get the job done and make those features really easy to use.

Features

  • Readable HTTP Method Support (GET, PUT, POST, DELETE, HEAD, PATCH and OPTIONS)
  • Custom Headers
  • Automatic "Smart" Parsing
  • Automatic Payload Serialization
  • Basic Auth
  • Client Side Certificate Auth
  • Request "Templates"

Example Use


library("ralger")

url <- "http://www.shanghairanking.com/rankings/arwu/2021"

# retrieve HTML and select elements using CSS selectors:
best_uni <- scrap(link = url, node = "a span", clean = TRUE)
head(best_uni, 5)
#>  [1] "Harvard University"
#>  [2] "Stanford University"
#>  [3] "University of Cambridge"
#>  [4] "Massachusetts Institute of Technology (MIT)"
#>  [5] "University of California, Berkeley"

# ralger can also parse HTML attributes
attributes <- attribute_scrap(
  link = "https://ropensci.org/",
  node = "a", # the a tag
  attr = "class" # getting the class attribute
)

head(attributes, 10) # NA values are a tags without a class attribute
#>  [1] "navbar-brand logo" "nav-link"          NA
#>  [4] NA                  NA                  "nav-link"
#>  [7] NA                  "nav-link"          NA
#> [10] NA
#

# ralger can automatically scrape tables:
data <- table_scrap(link ="https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW")

head(data)
#> # A tibble: 6 × 4
#>    Rank Title                                      `Lifetime Gross`  Year
#>   <int> <chr>                                      <chr>            <int>
#> 1     1 Avatar                                     $2,847,397,339    2009
#> 2     2 Avengers: Endgame                          $2,797,501,328    2019
#> 3     3 Titanic                                    $2,201,647,264    1997
#> 4     4 Star Wars: Episode VII - The Force Awakens $2,069,521,700    2015
#> 5     5 Avengers: Infinity War                     $2,048,359,754    2018
#> 6     6 Spider-Man: No Way Home                    $1,901,216,740    2021
require 'vendor/autoload.php';

use Httpful\Request;

// make GET request
$response = \Httpful\Request::get("http://httpbin.org/get")
    ->send();
echo $response->body;

// make POST request
$data = array('name' => 'Bob', 'age' => 35);
$response = \Httpful\Request::post("http://httpbin.org/post")
    ->sendsJson()
    ->body(json_encode($data))
    ->send();
echo $response->body;

// add headers or cookies
$response = \Httpful\Request::get("http://httpbin.org/headers")
    ->addHeader("API-KEY", "mykey")
    ->addHeader("Cookie", "foo=bar")
    ->send();
echo $response->body;

Alternatives / Similar