How Websites Block Scrapers
Websites use various techniques to detect and block automated scraping. Understanding these methods is the first step to building scrapers that work reliably.
Interactive lesson
This topic is covered in the Scrapfly Academy: Scraper Blocking lesson.
Unintentional Blocking
Before assuming a website is actively blocking you, check for configuration issues:
- Missing headers - not sending a User-Agent or Accept header
- Wrong HTTP version - using HTTP/1.1 when the site expects HTTP/2
- Missing cookies - not accepting or sending required session cookies
- Redirect handling - not following 301/302 redirects properly
Fixing these often solves the problem without any bypass techniques.
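As a minimal sketch of the first fix, here is a request built with browser-like headers using only the Python standard library. The header values are illustrative (they mirror a recent desktop Chrome); in practice you should copy them from a real browser session and keep them consistent with the rest of your client's fingerprint:

```python
import urllib.request

# Browser-like headers; values are illustrative, copied from a desktop Chrome
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_request(url: str) -> urllib.request.Request:
    """Attach browser-like headers so the request isn't trivially flagged."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = build_request("https://example.com")
print(req.get_header("User-agent"))
```

Many "blocks" disappear once the default `python-urllib` or `python-requests` User-Agent is replaced with headers like these.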
Intentional Detection Methods
IP-Based Detection
The simplest detection: blocking requests from known data center IP ranges, VPNs, or IPs that make too many requests.
Signals used:
- Request rate from a single IP
- IP reputation (data center vs residential)
- Geographic location anomalies
Mitigation: proxy rotation, residential proxies. See the Scrapfly Academy: Proxies lesson.
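A simple round-robin rotator is enough to spread requests across a proxy pool. The proxy URLs below are hypothetical placeholders; the returned mapping matches the `proxies` argument shape used by common Python HTTP clients such as `requests`:

```python
import itertools

# Hypothetical proxy pool; in practice these would be residential proxies
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

_pool = itertools.cycle(PROXIES)

def next_proxy_config() -> dict:
    """Return a requests-style proxies mapping, advancing the pool each call."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}

first = next_proxy_config()
second = next_proxy_config()
print(first["http"], second["http"])  # two different proxies
```

Real-world rotators also track per-proxy failures and cool-downs rather than cycling blindly, but the shape is the same.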
TLS Fingerprinting
When your client connects via HTTPS, the TLS handshake reveals a unique fingerprint (JA3/JA4 hash) based on cipher suites, extensions, and curves. Standard Python/Node.js HTTP libraries produce fingerprints that differ markedly from real browsers, making them easy to flag.
Mitigation: use TLS fingerprint impersonation libraries like curl-cffi or primp. See Browser Libraries.
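A sketch of the curl-cffi approach, assuming the third-party `curl_cffi` package is installed (the import is deferred so the snippet loads without it):

```python
def fetch_as_chrome(url: str) -> str:
    """Fetch a URL while impersonating Chrome's TLS fingerprint.

    Requires the third-party curl_cffi package (pip install curl_cffi).
    """
    from curl_cffi import requests  # drop-in, requests-like API

    # `impersonate` selects a browser profile whose TLS/JA3 fingerprint
    # is replayed on the wire instead of the default libcurl one
    resp = requests.get(url, impersonate="chrome")
    resp.raise_for_status()
    return resp.text

# usage (makes a network call, so not executed here):
# html = fetch_as_chrome("https://example.com")
```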
HTTP/2 Fingerprinting
Similar to TLS fingerprinting, but analyzes HTTP/2 connection settings (SETTINGS frame, WINDOW_UPDATE, header ordering, priority schemes).
Mitigation: same libraries that handle TLS fingerprinting.
JavaScript Challenges
The website injects obfuscated JavaScript that must execute correctly. This verifies that your client is a real browser with a JavaScript engine.
Mitigation: use a headless browser or a web scraping API.
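As a sketch, here is how a headless browser can render a page so that challenge scripts execute before you read the HTML. It assumes the third-party Playwright package and a Chromium install (`pip install playwright && playwright install chromium`); note that plain headless Chromium is itself fingerprintable, which is why anti-detect browsers exist:

```python
def render_page(url: str) -> str:
    """Load a page in headless Chromium so JavaScript challenges can run.

    Requires the third-party playwright package; import is deferred.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # wait for network to settle so challenge scripts have time to finish
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html

# usage (launches a real browser, so not executed here):
# html = render_page("https://example.com")
```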
Browser Fingerprinting
Collecting Canvas rendering, WebGL renderer, installed fonts, screen resolution, audio context, and dozens of other browser properties to build a unique device fingerprint.
Mitigation: use anti-detect browsers like nodriver or camoufox.
CAPTCHAs
Visual or interactive challenges (reCAPTCHA, hCaptcha, Turnstile, FunCaptcha) that require human-like solving.
Mitigation: CAPTCHA solving services, or Scrapfly which handles CAPTCHAs automatically.
Honeypots
Hidden links or elements that are invisible to human users but visible to scrapers that follow all links or parse all elements. Clicking a honeypot link flags you as a bot.
Mitigation: only follow visible links. Check element visibility (display, opacity, dimensions) before interacting.
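A minimal visibility filter can be built with the standard-library HTML parser. This only catches inline-style honeypots; links hidden via CSS classes or external stylesheets require a real browser's computed styles to detect:

```python
from html.parser import HTMLParser

# Inline-style values that commonly mark honeypot links
HIDDEN_MARKERS = ("display:none", "visibility:hidden", "opacity:0")

class VisibleLinkExtractor(HTMLParser):
    """Collect hrefs from <a> tags, skipping ones hidden via inline style."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        if any(marker in style for marker in HIDDEN_MARKERS):
            return  # likely a honeypot, do not follow
        if "href" in attrs:
            self.links.append(attrs["href"])

html = '<a href="/real">shop</a><a href="/trap" style="display: none">x</a>'
parser = VisibleLinkExtractor()
parser.feed(html)
print(parser.links)  # → ['/real']
```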
Behavioral Analysis
Advanced systems track mouse movement patterns, scroll behavior, click timing, and navigation sequences to distinguish humans from bots.
Mitigation: very difficult to evade. A web scraping API is usually the practical solution.
Anti-Bot Protection Services
Most large websites use commercial anti-bot services. Each has its own detection methods and difficulty level:
| Service | Primary Detection | Difficulty |
|---|---|---|
| Cloudflare | TLS fingerprinting + Turnstile | Moderate-Hard |
| DataDome | Per-customer ML + behavioral | Very Hard |
| Akamai | TLS + sensor data | Hard-Very Hard |
| PerimeterX | Behavioral biometrics | Hard |
| Kasada | Proof-of-work | Hard |
| Imperva | JS fingerprinting (180+ signals) | Moderate-Hard |
| F5 Shape | VM obfuscation | Very Hard |
| AWS WAF | ML + token cookies | Moderate |
See the full Anti-Bot Protections guide for details on each system and how Scrapfly bypasses them.
You can identify which anti-bot system a website uses with Scrapfly's Anti-Bot Detector.
Bypass Strategy
A practical decision tree, in order of escalation:
1. Check your headers and HTTP version first - most "blocking" is just misconfiguration
2. Try TLS fingerprint impersonation (curl-cffi, primp) - solves Cloudflare and many WAFs
3. Try an anti-detect browser (nodriver, camoufox) - solves JS challenges
4. Use a web scraping API (Scrapfly) - solves everything with minimal code
Next Steps
- Anti-Bot Protections - detailed guide to each anti-bot system
- Browser Libraries - anti-detect and TLS fingerprint tools
- Scrapfly Academy: Scraper Blocking - interactive lesson
- Scrapfly Academy: Proxies - proxy rotation strategies