ua-parser vs cloudscraper

ua-parser: Apache-2.0; 6; 6; 596 stars; 7.0 million (month); Dec 29 2012; 1.0.1 (6 months ago)

cloudscraper: 4,683 stars; 1; 7; MIT; Dec 28 2012; 1.0 million (month); 1.2.71 (2 years ago)

ua-parser is a User-Agent header string parser for Python. It is inspired by the JavaScript package of the same name, ua-parser, and provides identical functionality in Python.

In web scraping, ua-parser is used to bypass anti-scraping measures. It helps with designing User-Agent string rotation for scraper fingerprinting by providing consistent and reliable information about user agents.
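As a rough sketch of how parsed output could support rotation, the snippet below groups a pool of User-Agent strings by browser family so that a scraper can pick one string per session. Only `user_agent_parser.Parse` comes from ua-parser. The helper names `parse_with_ua_parser`, `group_by_family`, and `pick_user_agent` are hypothetical and used for illustration only.

```python
import random
from collections import defaultdict


def parse_with_ua_parser(ua_string):
    """Parse a User-Agent string with ua-parser (pip install ua-parser)."""
    # Imported lazily so the grouping helpers below work with any parser
    # callable that returns a dict of the same shape.
    from ua_parser import user_agent_parser
    return user_agent_parser.Parse(ua_string)


def group_by_family(ua_strings, parse=parse_with_ua_parser):
    """Hypothetical helper: map browser family -> list of User-Agent strings."""
    groups = defaultdict(list)
    for ua in ua_strings:
        family = parse(ua)["user_agent"]["family"]
        groups[family].append(ua)
    return dict(groups)


def pick_user_agent(groups, family, rng=random):
    """Hypothetical helper: pick one string of the requested browser family."""
    return rng.choice(groups[family])
```

Grouping by family is only one possible design. The same parsed fields (`os`, `device`) could be used to keep the rest of a scraper's fingerprint consistent with the chosen string.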

cloudscraper is a simple Python module that bypasses Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Cloudflare changes its techniques periodically, so I will update this repo frequently.

This can be useful if you wish to scrape or crawl a website protected by Cloudflare. Cloudflare's anti-bot page currently just checks whether the client supports JavaScript, though additional techniques may be added in the future.

Because Cloudflare continually changes and hardens its protection page, cloudscraper requires a JavaScript engine or interpreter to solve the JavaScript challenges. This allows the script to impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's JavaScript.

For reference, this is the default message Cloudflare uses for these sorts of pages:

Checking your browser before accessing website.com.
This process is automatic. Your browser will redirect to your requested content shortly.

Please allow up to 5 seconds...
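The message quoted above can be used as a simple marker when checking whether a response is the anti-bot page rather than real content. The following is a minimal heuristic sketch, not how cloudscraper itself detects challenges. The function name `looks_like_iuam_page` is hypothetical, and the status codes it checks (503 and 429) are an assumption about how the page is served.

```python
# Phrases taken from the default message quoted above.
CHALLENGE_MARKERS = (
    "Checking your browser before accessing",
    "Please allow up to 5 seconds",
)


def looks_like_iuam_page(status_code, body):
    """Heuristic: True if a response resembles the quoted anti-bot page.

    Assumption: the page is served with a 503 or 429 status code.
    """
    has_marker = any(marker in body for marker in CHALLENGE_MARKERS)
    return status_code in (429, 503) and has_marker
```

A site owner can customise this page, so a text match like this would miss non-default wording.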

Any script using cloudscraper will sleep for ~5 seconds for the first visit to any site with Cloudflare anti-bots enabled, though no delay will occur after the first request.
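Since the delay applies only to the first visit, it generally pays to reuse one scraper object for all requests to a site instead of creating a new one each time. The sketch below times each request made through a shared scraper. `fetch_all` is a hypothetical helper, and it assumes only that the scraper exposes a Requests-style `get(url)` method returning an object with a `status_code` attribute.

```python
import time


def fetch_all(scraper, urls):
    """Fetch every URL with the same scraper and record the elapsed seconds.

    Returns a list of (url, status_code, elapsed_seconds) tuples.
    """
    results = []
    for url in urls:
        start = time.monotonic()
        response = scraper.get(url)  # same session object for every request
        elapsed = time.monotonic() - start
        results.append((url, response.status_code, elapsed))
    return results
```

With the library installed, passing `cloudscraper.create_scraper()` as `scraper` and several URLs from one protected site should show the longer wait on the first entry only, assuming the site serves the challenge described above.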

cloudscraper is a good introduction to how JavaScript fingerprinting and challenge pages block scrapers, and it remains a useful educational tool even though it does not always work.

Example Use

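# ua-parser: parse a User-Agent string into device, OS, and browser fields
from ua_parser import user_agent_parser

user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:106.1) Gecko/20100020 Firefox/109.0"
parsed = user_agent_parser.Parse(user_agent)
print(parsed)
# Output (shown pretty-printed):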
{
    "device": {
        "brand": "Apple", 
        "family": "Mac", 
        "model": "Mac",
    },
    "os": {
        "family": "Mac OS X",
        "major": "10",
        "minor": "15",
        "patch": None,
        "patch_minor": None,
    },
    "string": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:106.1) Gecko/20100020 Firefox/109.0",
    "user_agent": {
        "family": "Firefox", 
        "major": "109", 
        "minor": "0", 
        "patch": None,
    },
}
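To show how the fields of this result might be consumed, here is a small sketch that turns the parsed dictionary into a one-line summary. `version_string` and `summarize` are hypothetical helpers, not part of ua-parser, and the dictionary is copied from the output above.

```python
def version_string(parts, keys=("major", "minor", "patch")):
    """Join the version components present, stopping at the first None."""
    values = []
    for key in keys:
        value = parts.get(key)
        if value is None:
            break
        values.append(str(value))
    return ".".join(values)


def summarize(parsed):
    """Build a short 'browser on OS (device)' description."""
    browser = parsed["user_agent"]
    os_info = parsed["os"]
    device = parsed["device"]
    return "{} {} on {} {} ({})".format(
        browser["family"], version_string(browser),
        os_info["family"], version_string(os_info),
        device["family"],
    )


# Parsed result copied from the example output above.
parsed = {
    "device": {"brand": "Apple", "family": "Mac", "model": "Mac"},
    "os": {"family": "Mac OS X", "major": "10", "minor": "15",
           "patch": None, "patch_minor": None},
    "user_agent": {"family": "Firefox", "major": "109", "minor": "0",
                   "patch": None},
}

print(summarize(parsed))  # Firefox 109.0 on Mac OS X 10.15 (Mac)
```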

# cloudscraper: fetch a page through a scraper that handles the Cloudflare challenge
import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."

Alternatives / Similar

cloudscraper: 4,683 stars

ua-parser: 596 stars

youtube-dl: 134,254 stars

you-get: 54,934 stars