How Websites Block Scrapers
Websites use various techniques to detect and block automated scraping. Understanding these methods is the first step to building scrapers that work reliably.
Interactive lesson
This topic is covered in the Scrapfly Academy: Scraper Blocking lesson.
Unintentional Blocking
Before assuming a website is actively blocking you, check for configuration issues:
- Missing headers - not sending a User-Agent or Accept header
- Wrong HTTP version - using HTTP/1.1 when the site expects HTTP/2
- Missing cookies - not accepting or sending required session cookies
- Redirect handling - not following 301/302 redirects properly
Fixing these often solves the problem without any bypass techniques.
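As a minimal sketch of the first fix, here is a request built with browser-like headers using only the Python standard library. The header values are illustrative (they mirror a recent desktop Chrome); in practice you should copy them from a real browser session and keep them consistent with the rest of your client's fingerprint:

```python
import urllib.request

# Browser-like headers; values are illustrative, copied from a desktop Chrome
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_request(url: str) -> urllib.request.Request:
    """Attach browser-like headers so the request isn't trivially flagged."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = build_request("https://example.com")
print(req.get_header("User-agent"))
```

Many "blocks" disappear once the default `python-urllib` or `python-requests` User-Agent is replaced with headers like these.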
Intentional Detection Methods
IP-Based Detection
The simplest detection: blocking requests from known data center IP ranges, VPNs, or IPs that make too many requests.
Signals used:
- Request rate from a single IP
- IP reputation (data center vs residential)
- Geographic location anomalies
Mitigation: proxy rotation, residential proxies. See the Scrapfly Academy: Proxies lesson.
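A simple round-robin rotator is enough to spread requests across a proxy pool. The proxy URLs below are hypothetical placeholders; the returned mapping matches the `proxies` argument shape used by common Python HTTP clients such as `requests`:

```python
import itertools

# Hypothetical proxy pool; in practice these would be residential proxies
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

_pool = itertools.cycle(PROXIES)

def next_proxy_config() -> dict:
    """Return a requests-style proxies mapping, advancing the pool each call."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}

first = next_proxy_config()
second = next_proxy_config()
print(first["http"], second["http"])  # two different proxies
```

Real-world rotators also track per-proxy failures and cool-downs rather than cycling blindly, but the shape is the same.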
TLS Fingerprinting
When your client connects via HTTPS, the TLS handshake reveals a unique fingerprint (JA3/JA4 hash) based on cipher suites, extensions, and curves. Standard Python/Node.js HTTP libraries produce fingerprints that differ markedly from real browsers, making them easy to flag.
Mitigation: use TLS fingerprint impersonation libraries like curl-cffi or primp. See Browser Libraries.
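A sketch of the curl-cffi approach, assuming the third-party `curl_cffi` package is installed (the import is deferred so the snippet loads without it):

```python
def fetch_as_chrome(url: str) -> str:
    """Fetch a URL while impersonating Chrome's TLS fingerprint.

    Requires the third-party curl_cffi package (pip install curl_cffi).
    """
    from curl_cffi import requests  # drop-in, requests-like API

    # `impersonate` selects a browser profile whose TLS/JA3 fingerprint
    # is replayed on the wire instead of the default libcurl one
    resp = requests.get(url, impersonate="chrome")
    resp.raise_for_status()
    return resp.text

# usage (makes a network call, so not executed here):
# html = fetch_as_chrome("https://example.com")
```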
HTTP/2 Fingerprinting
Similar to TLS fingerprinting, but analyzes HTTP/2 connection settings (SETTINGS frame, WINDOW_UPDATE, header ordering, priority schemes).
Mitigation: same libraries that handle TLS fingerprinting.
JavaScript Challenges
The website injects obfuscated JavaScript that must execute correctly. This verifies that your client is a real browser with a JavaScript engine.
Mitigation: use a headless browser or a web scraping API.
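As a sketch, here is how a headless browser can render a page so that challenge scripts execute before you read the HTML. It assumes the third-party Playwright package and a Chromium install (`pip install playwright && playwright install chromium`); note that plain headless Chromium is itself fingerprintable, which is why anti-detect browsers exist:

```python
def render_page(url: str) -> str:
    """Load a page in headless Chromium so JavaScript challenges can run.

    Requires the third-party playwright package; import is deferred.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # wait for network to settle so challenge scripts have time to finish
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html

# usage (launches a real browser, so not executed here):
# html = render_page("https://example.com")
```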
Browser Fingerprinting
Collecting Canvas rendering, WebGL renderer, installed fonts, screen resolution, audio context, and dozens of other browser properties to build a unique device fingerprint.
Mitigation: use anti-detect browsers like nodriver or camoufox.
CAPTCHAs
Visual or interactive challenges (reCAPTCHA, hCaptcha, Turnstile, FunCaptcha) that require human-like solving.
Mitigation: CAPTCHA solving services, or Scrapfly which handles CAPTCHAs automatically.
Honeypots
Hidden links or elements that are invisible to human users but visible to scrapers that follow all links or parse all elements. Clicking a honeypot link flags you as a bot.
Mitigation: only follow visible links. Check element visibility (display, opacity, dimensions) before interacting.
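A minimal visibility filter can be built with the standard-library HTML parser. This only catches inline-style honeypots; links hidden via CSS classes or external stylesheets require a real browser's computed styles to detect:

```python
from html.parser import HTMLParser

# Inline-style values that commonly mark honeypot links
HIDDEN_MARKERS = ("display:none", "visibility:hidden", "opacity:0")

class VisibleLinkExtractor(HTMLParser):
    """Collect hrefs from <a> tags, skipping ones hidden via inline style."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if any(marker in style for marker in HIDDEN_MARKERS):
            return  # likely a honeypot, do not follow
        if "href" in attrs:
            self.links.append(attrs["href"])

html = '<a href="/real">shop</a><a href="/trap" style="display: none">x</a>'
parser = VisibleLinkExtractor()
parser.feed(html)
print(parser.links)  # → ['/real']
```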
Behavioral Analysis
Advanced systems track mouse movement patterns, scroll behavior, click timing, and navigation sequences to distinguish humans from bots.
Mitigation: very difficult to evade. A web scraping API is usually the practical solution.
Anti-Bot Protection Services
Most large websites use commercial anti-bot services. Each has its own detection methods and difficulty level:
| Service | Primary Detection | Difficulty |
|---|---|---|
| Cloudflare | TLS fingerprinting + Turnstile | Moderate-Hard |
| DataDome | Per-customer ML + behavioral | Very Hard |
| Akamai | TLS + sensor data | Hard-Very Hard |
| PerimeterX | Behavioral biometrics | Hard |
| Kasada | Proof-of-work | Hard |
| Imperva | JS fingerprinting (180+ signals) | Moderate-Hard |
| F5 Shape | VM obfuscation | Very Hard |
| AWS WAF | ML + token cookies | Moderate |
See the full Anti-Bot Protections guide for details on each system and how Scrapfly bypasses them.
You can identify which anti-bot system a website uses with Scrapfly's Anti-Bot Detector.
Bypass Strategy
A practical decision tree, in order of escalation:
1. Check your headers and HTTP version first - most "blocking" is just misconfiguration
2. Try TLS fingerprint impersonation (curl-cffi, primp) - solves Cloudflare and many WAFs
3. Try an anti-detect browser (nodriver, camoufox) - solves JS challenges
4. Use a web scraping API (Scrapfly) - solves everything with minimal code
Next Steps
- Anti-Bot Protections - detailed guide to each anti-bot system
- Browser Libraries - anti-detect and TLS fingerprint tools
- Scrapfly Academy: Scraper Blocking - interactive lesson
- Scrapfly Academy: Proxies - proxy rotation strategies