
url vs requests-cache

url: MIT license, 49 downloads/month, first released Apr 15 2018, latest release v2.1.2 (2 days ago)
requests-cache: BSD-2-Clause license, 3.2 million downloads/month, first released Feb 14 2011, latest release 1.2.1 (4 months ago)

url is a package for working with, modifying, and parsing web URLs. This package is for you when PHP's parse_url() is not enough.

Key Features:

  • Parse a URL and access or modify all its components separately.
  • Resolve any relative reference you may find in an HTML document to an absolute URL, based on the document's URL.
  • Get not only the full host of a URL, but also the registrable domain, the domain suffix and the subdomain parts of the host separately, thanks to the Mozilla Public Suffix List (see the example after this list).
  • An advanced API to access and manipulate the URL query component.
  • Compare URLs or components of URLs (e.g. checking if different URLs point to the same host or domain).
  • Thanks to symfony/polyfill-intl-idn, internationalized domain names (IDNs) are parsed without any extra effort.
  • Includes an adapter class which implements the PSR-7 UriInterface.
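
For instance, the host-related helpers and the query API look roughly like this. This is a minimal sketch; the method names domain(), domainSuffix(), subdomain() and queryArray() follow the crwlr/url documentation, so double-check them against the version you install:

<?php
use Crwlr\Url\Url;

$url = Url::parse('https://blog.example.co.uk/articles?page=2&lang=en');

echo $url->host();         // blog.example.co.uk
echo $url->domain();       // example.co.uk  (registrable domain)
echo $url->domainSuffix(); // co.uk          (from the Public Suffix List)
echo $url->subdomain();    // blog

print_r($url->queryArray()); // ['page' => '2', 'lang' => 'en']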

requests-cache is an extension of the popular requests package that provides easy request/response caching.

This can be very useful in web scraper development, as it speeds up repeated requests. requests-cache can also be used in programs that integrate web scrapers, since it provides an easy caching layer for the most time-consuming part of web scraping: HTTP connections.

Some features:

  • Ease of use
    Keep using the requests library you're already familiar with. Add caching with a drop-in replacement for requests.Session, or install globally to add transparent caching to all requests functions.
  • Performance
    Get sub-millisecond response times for cached responses. When they expire, you still save time with conditional requests.
  • Persistence
    Works with several storage backends including SQLite, Redis, MongoDB, and DynamoDB; or save responses as plain JSON files, YAML, and more (see the sketch after this list)
  • Expiration
    Use Cache-Control and other standard HTTP headers, define your own expiration schedule, keep your cache clutter-free with backends that natively support TTL, or any combination of strategies
  • Customization
    Works out of the box with zero config, but with a robust set of features for configuring and extending the library to suit your needs
  • Compatibility
    Can be combined with other popular libraries based on requests
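
To make the persistence and expiration bullets concrete, here is a rough sketch combining a filesystem backend with per-URL expiration rules. The backend='filesystem', serializer='json' and urls_expire_after options are documented by requests-cache; the host patterns themselves are made up for illustration:

from datetime import timedelta

from requests_cache import CachedSession

session = CachedSession(
    'demo_cache',
    backend='filesystem',             # plain files on disk; 'sqlite', 'redis', 'mongodb', ... also work
    serializer='json',                # store responses as human-readable JSON
    expire_after=timedelta(hours=1),  # default TTL for everything else
    urls_expire_after={
        '*.fixtures.test': -1,                        # never expire (hypothetical host)
        'httpbin.org/delay/*': timedelta(minutes=5),  # short TTL for this pattern
    },
)
session.get('https://httpbin.org/delay/1')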

Example Use


<?php
use Crwlr\Url\Url;

$url = Url::parse('https://www.example.com/path?query=value#fragment');

// Access the parsed parts separately:
echo $url->scheme();   // https
echo $url->host();     // www.example.com
echo $url->path();     // /path
echo $url->query();    // query=value
echo $url->fragment(); // fragment

// Update a part (setters also return the Url instance):
$url->path('/some/new/path');

// Build a url from parts, starting from a base url:
$url = Url::parse('https://www.example.com');
$url->path('/path');
$url->query('query=value');
$url->fragment('fragment');
echo $url->toString(); // https://www.example.com/path?query=value#fragment

import requests
import requests_cache
from datetime import timedelta

# to use requests_cache just replace requests.Session with requests_cache.CachedSession
session = requests_cache.CachedSession('demo_cache')
for i in range(60):
    session.get('https://httpbin.org/delay/1')  # only the first request hits the network

# or patch globally so plain requests calls are cached too
requests_cache.install_cache('demo_cache')
requests.get('https://httpbin.org/delay/1')

# there are various configuration options:
session = requests_cache.CachedSession(
    'demo_cache',
    use_cache_dir=True,                # Save files in the default user cache dir
    cache_control=True,                # Use Cache-Control response headers for expiration, if available
    expire_after=timedelta(days=1),    # Otherwise expire responses after one day
    allowable_codes=[200, 400],        # Cache 400 responses as a solemn reminder of your failures
    allowable_methods=['GET', 'POST'], # Cache whatever HTTP methods you want
    ignored_parameters=['api_key'],    # Don't match this request param, and redact it from the cache
    match_headers=['Accept-Language'], # Cache a different response per language
    stale_if_error=True,               # In case of request errors, use stale cache data if possible
)
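
To verify the cache is doing its job, responses returned by a CachedSession expose a from_cache flag, and the underlying cache can be wiped; a brief sketch:

from requests_cache import CachedSession

session = CachedSession('demo_cache')

first = session.get('https://httpbin.org/delay/1')
second = session.get('https://httpbin.org/delay/1')
print(first.from_cache, second.from_cache)  # False True (second hit served from cache)

session.cache.clear()  # remove all cached responses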
