url vs requests-cache
url is a PHP package for working with, modifying and parsing web URLs. This package is for you when PHP's parse_url() is not enough.
Key Features:
- Parse a URL and access or modify all its components separately.
- Resolve any relative reference you may find in an HTML document to an absolute URL, based on the document's URL.
- Get not only the full host of a URL, but also the registrable domain, the domain suffix and the subdomain parts of the host separately (Thanks to the Mozilla Public Suffix List).
- An advanced API to access and manipulate the URL query component.
- Compare URLs or components of URLs (e.g. checking if different URLs point to the same host or domain).
- Thanks to symfony/polyfill-intl-idn, parsing internationalized domain names (IDN) is also no problem.
- Includes an adapter class which implements the PSR-7 UriInterface.
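To make the relative-reference resolution feature above concrete, here is a conceptual sketch of that operation (RFC 3986 reference resolution) using Python's standard library. This is only an illustration of the concept, not crwlr/url's PHP API; the URLs are made-up examples.

```python
# Resolving relative references found in an HTML document against the
# document's base URL, shown with Python's stdlib urljoin for brevity.
from urllib.parse import urljoin

base = 'https://www.example.com/articles/page.html'
print(urljoin(base, 'images/photo.jpg'))  # path relative to the current directory
print(urljoin(base, '/about'))            # root-relative path
print(urljoin(base, '../index.html'))     # parent-relative path
```

The three cases above cover the forms of relative reference most commonly encountered when scraping links out of an HTML page.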
requests-cache is an extension to the popular requests package that provides easy request/response caching.
This can be very useful in web scraper development, as it speeds up repeated requests. requests-cache can also be used in programs that integrate web scrapers, since it is an easy caching layer for the most time-consuming part of web scraping: HTTP connections.
Some features:
- Ease of use: Keep using the requests library you're already familiar with. Add caching with a drop-in replacement for requests.Session, or install globally to add transparent caching to all requests functions.
- Performance: Get sub-millisecond response times for cached responses. When they expire, you still save time with conditional requests.
- Persistence: Works with several storage backends including SQLite, Redis, MongoDB, and DynamoDB; or save responses as plain JSON files, YAML, and more.
- Expiration: Use Cache-Control and other standard HTTP headers, define your own expiration schedule, keep your cache clutter-free with backends that natively support TTL, or any combination of strategies.
- Customization: Works out of the box with zero config, but with a robust set of features for configuring and extending the library to suit your needs.
- Compatibility: Can be combined with other popular libraries based on requests.
Example Use
<?php
use Crwlr\Url\Url;
$url = new Url('https://www.example.com/path?query=value#fragment');
// Parse a URL into its parts:
echo $url->scheme();
echo $url->host();
echo $url->path();
echo $url->query();
echo $url->fragment();
// Update a component:
echo $url->path('/some/new/path');
// Create url from parts:
$url = new Url();
$url->scheme('https');
$url->host('www.example.com');
$url->path('/path');
$url->query('query=value');
$url->fragment('fragment');
echo $url->toString();
import requests
import requests_cache

# To use requests-cache, just replace requests.Session with requests_cache.CachedSession
session = requests_cache.CachedSession('demo_cache')
for i in range(60):
    session.get('https://httpbin.org/delay/1')

# Or patch requests globally:
requests_cache.install_cache('demo_cache')
requests.get('https://httpbin.org/delay/1')
# There are various configuration options:
from datetime import timedelta
from requests_cache import CachedSession

session = CachedSession(
    'demo_cache',
    use_cache_dir=True,                # Save files in the default user cache dir
    cache_control=True,                # Use Cache-Control response headers for expiration, if available
    expire_after=timedelta(days=1),    # Otherwise expire responses after one day
    allowable_codes=[200, 400],        # Cache 400 responses as a solemn reminder of your failures
    allowable_methods=['GET', 'POST'], # Cache whatever HTTP methods you want
    ignored_parameters=['api_key'],    # Don't match this request param, and redact it from the cache
    match_headers=['Accept-Language'], # Cache a different response per language
    stale_if_error=True,               # In case of request errors, use stale cache data if possible
)
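The cache_control and expire_after options above combine into a precedence rule: a Cache-Control max-age header from the server wins when present, and the session-wide expire_after is the fallback. A minimal sketch of that rule, where choose_ttl is a hypothetical helper and not part of requests-cache's API:

```python
def choose_ttl(headers, expire_after):
    """Pick a TTL in seconds: Cache-Control max-age if present, else the fallback."""
    cache_control = headers.get('Cache-Control', '')
    for directive in cache_control.split(','):
        directive = directive.strip()
        if directive.startswith('max-age='):
            return int(directive.split('=', 1)[1])  # server header wins
    return expire_after                             # session-wide fallback

print(choose_ttl({'Cache-Control': 'public, max-age=300'}, 86400))  # -> 300
print(choose_ttl({}, 86400))                                        # -> 86400
```

This layering keeps well-behaved servers in control of freshness while still giving every response a sane default lifetime.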