uri.js vs requests-cache
URI.js is a lightweight JavaScript library for working with URLs and URIs in Node.js and the browser. It provides a simple, consistent interface for parsing, manipulating, and building them.
requests-cache is an extension of the popular Python requests package that provides easy request/response caching.
This can be very useful in web scraper development, as it speeds up repeated requests. requests-cache can also be used in programs that integrate web scrapers, since it provides an easy caching layer for the most time-consuming part of web scraping: HTTP connections.
Some features:
- Ease of use: Keep using the requests library you're already familiar with. Add caching with a drop-in replacement for requests.Session, or install globally to add transparent caching to all requests functions.
- Performance: Get sub-millisecond response times for cached responses. When they expire, you still save time with conditional requests.
- Persistence: Works with several storage backends, including SQLite, Redis, MongoDB, and DynamoDB; or save responses as plain JSON files, YAML, and more (see the backend sketch at the end of the examples below).
- Expiration: Use Cache-Control and other standard HTTP headers, define your own expiration schedule, keep your cache clutter-free with backends that natively support TTL, or use any combination of strategies.
- Customization: Works out of the box with zero config, but offers a robust set of features for configuring and extending the library to suit your needs.
- Compatibility: Can be combined with other popular libraries based on requests.
Example Use
const URI = require('uri-js');
// Parse a URL into its component values
const parsedUrl = URI.parse("https://www.example.com/search?q=query+string#fragment");
console.log(parsedUrl);
/* Output:
{
  scheme: 'https',
  host: 'www.example.com',
  path: '/search',
  query: 'q=query+string',
  fragment: 'fragment'
}
*/
// Build a URL from component values
const urlComponents = {
  scheme: 'https',
  host: 'www.example.com',
  path: '/search',
  query: 'q=query+string',
  fragment: 'fragment'
};
const url = URI.serialize(urlComponents);
console.log(url);
// Output: 'https://www.example.com/search?q=query+string#fragment'
import requests
import requests_cache

# To use requests-cache, just replace requests.Session with requests_cache.CachedSession
session = requests_cache.CachedSession('demo_cache')
for i in range(60):
    # Only the first request hits the network; the other 59 are served from the cache
    session.get('https://httpbin.org/delay/1')

# Or patch requests globally to add transparent caching to all requests functions
requests_cache.install_cache('demo_cache')
requests.get('https://httpbin.org/delay/1')
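Responses returned by requests-cache expose a from_cache attribute, which is handy for verifying that caching is actually happening. A minimal sketch (using httpbin.org purely as a slow test endpoint):

import requests
import requests_cache

requests_cache.install_cache('verify_cache')
first = requests.get('https://httpbin.org/delay/1')   # Fetched over the network
second = requests.get('https://httpbin.org/delay/1')  # Served from the cache
print(first.from_cache, second.from_cache)  # False True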
# There are various configuration options:
from datetime import timedelta
from requests_cache import CachedSession

session = CachedSession(
    'demo_cache',
    use_cache_dir=True,                # Save files in the default user cache dir
    cache_control=True,                # Use Cache-Control response headers for expiration, if available
    expire_after=timedelta(days=1),    # Otherwise expire responses after one day
    allowable_codes=[200, 400],        # Cache 400 responses as a solemn reminder of your failures
    allowable_methods=['GET', 'POST'], # Cache whatever HTTP methods you want
    ignored_parameters=['api_key'],    # Don't match this request param, and redact it from the cache
    match_headers=['Accept-Language'], # Cache a different response per language
    stale_if_error=True,               # In case of request errors, use stale cache data if possible
)
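The Persistence feature above comes down to the backend argument. A minimal sketch of swapping storage backends, assuming a Redis server is running locally for the first session and PyYAML is installed for the second:

from requests_cache import CachedSession

# Store responses in Redis instead of the default SQLite database
redis_session = CachedSession('demo_cache', backend='redis')

# Store each response as a plain YAML file on disk
file_session = CachedSession('demo_cache', backend='filesystem', serializer='yaml')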