Anti-Bot Protections
Modern websites use anti-bot protection systems to detect and block automated traffic, including web scrapers. Understanding these systems is essential for building reliable scrapers.
Need to bypass anti-bot protections?
Scrapfly handles anti-bot bypass automatically for all major protection systems with a single API parameter. See the bypass hub for details, or check Scrapeway benchmarks for independent success rate data.
How Anti-Bot Systems Work
Most anti-bot systems use a combination of these detection methods:
| Detection Method | What It Does | Evasion Difficulty |
|---|---|---|
| TLS Fingerprinting | Analyzes the TLS handshake (cipher suites, extensions, curves) to identify the client. Each browser family and version produces a characteristic TLS signature (JA3/JA4 hash), and standard HTTP libraries produce signatures that differ from those of real browsers. | Hard |
| HTTP/2 Fingerprinting | Examines HTTP/2 frame settings, header ordering, and priority schemes that differ between browsers and HTTP libraries. | Hard |
| JavaScript Challenges | Injects obfuscated JavaScript that must be executed correctly. Verifies the browser environment is genuine. | Moderate |
| Browser Fingerprinting | Collects Canvas, WebGL, Audio context, fonts, screen resolution, and other browser properties to build a device fingerprint. | Hard |
| Behavioral Analysis | Monitors mouse movements, scroll patterns, click timing, and navigation flow to distinguish humans from bots. | Very Hard |
| CAPTCHA Challenges | Presents visual or interactive challenges (Turnstile, reCAPTCHA, hCaptcha, FunCaptcha) that require human solving or CAPTCHA API services. | Hard |
| IP Reputation | Checks the request IP against databases of known data centers, VPNs, and proxy services. Residential IPs are trusted more. | Moderate |
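
To make the TLS fingerprinting row more concrete, the sketch below shows how a JA3 hash is commonly derived: five fields from the ClientHello (TLS version, cipher suites, extensions, elliptic curves, point formats) are joined into one string and hashed with MD5. The function names and the numeric values in the usage note are illustrative assumptions, not values captured from a real browser, and the sketch skips the GREASE filtering that real implementations perform.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields, values joined by dashes."""
    fields = [[tls_version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(value) for value in field) for field in fields)


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [4865, 4866], [0, 23, 65281], [29, 23], [0])` yields `771,4865-4866,0-23-65281,29-23,0`. Swapping the order of the two cipher suites produces a different hash, which is why a client that offers the same ciphers in a different order than a browser can still be told apart.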
Major Anti-Bot Systems
Cloudflare
The most widely deployed anti-bot system, protecting millions of websites. Cloudflare uses a layered approach combining TLS fingerprinting, JavaScript challenges, the Turnstile CAPTCHA, and behavioral analysis.
| Aspect | Details |
|---|---|
| Detection | TLS fingerprinting (JA3/JA4), Turnstile CAPTCHA, JS challenges, HTTP header validation |
| Difficulty | Moderate to Hard |
| Used By | Indeed, and hundreds of thousands of other websites |
| Bypass | Bypass Cloudflare with Scrapfly (98% success rate) |
Cloudflare is the most common anti-bot system you will encounter. Standard HTTP libraries are usually blocked on Cloudflare-protected sites because their TLS fingerprints do not match those of real browsers. You need one of the following: a TLS fingerprint library such as curl-cffi or primp, an anti-detect browser, or a web scraping API.
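
As a rough sketch of the TLS fingerprint library option, the snippet below uses curl-cffi's `impersonate` option so that the TLS handshake resembles a browser's. The helper names `impersonation_kwargs` and `fetch` are hypothetical, the `"chrome"` target is an assumption that depends on the installed curl-cffi version, and impersonation alone may not be enough when a JavaScript or Turnstile challenge is served.

```python
def impersonation_kwargs(browser="chrome", timeout=30):
    """Keyword arguments passed to curl_cffi's requests.get()."""
    return {"impersonate": browser, "timeout": timeout}


def fetch(url, browser="chrome"):
    """Fetch a URL with a browser-like TLS fingerprint (requires curl-cffi)."""
    # Imported lazily so this module still loads when curl-cffi is not installed.
    from curl_cffi import requests

    return requests.get(url, **impersonation_kwargs(browser))
```

A call such as `fetch("https://example.com").status_code` then shows whether the request was accepted. A 403 or a challenge page suggests that more than TLS fingerprinting is being checked.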
DataDome
One of the most sophisticated anti-bot systems, using per-customer ML models that learn from each website's unique traffic patterns. DataDome is very difficult to bypass at scale because its behavioral analysis continuously adapts.
| Aspect | Details |
|---|---|
| Detection | Real-time ML models, device fingerprinting, behavioral analysis (mouse, scroll, keyboard), slider CAPTCHA |
| Difficulty | Very Hard |
| Used By | Etsy, TripAdvisor, Foot Locker, SoundCloud |
| Bypass | Bypass DataDome with Scrapfly (96% success rate) |
DataDome's per-customer ML models make each website unique, so bypass techniques that work on one site may not work on another. For reliable scraping, a web scraping API is usually the best approach.
Akamai Bot Manager
Akamai's anti-bot solution is deployed at the CDN edge, making it fast and hard to circumvent. TLS fingerprinting is its primary detection vector, combined with sensor data validation and device fingerprinting.
| Aspect | Details |
|---|---|
| Detection | TLS fingerprinting, sensor data (_abck cookies), device fingerprinting, behavioral analysis |
| Difficulty | Hard to Very Hard |
| Used By | Major enterprise websites across finance, retail, and media |
| Bypass | Bypass Akamai with Scrapfly (97% success rate) |
Akamai's TLS fingerprinting is particularly effective because it blocks at the edge before requests even reach the origin server. Libraries like curl-cffi that impersonate browser TLS fingerprints are essential for direct bypass attempts.
PerimeterX (HUMAN Security)
PerimeterX (now HUMAN Security) uses sophisticated behavioral biometrics to detect automation. It tracks mouse movements, click patterns, keystroke timing, and navigation sequences to build a behavioral profile.
| Aspect | Details |
|---|---|
| Detection | Behavioral biometrics (_px cookies), Human Challenge, browser fingerprinting, IP reputation |
| Difficulty | Hard |
| Used By | Zillow, StockX, Wayfair, Booking.com, Craigslist |
| Bypass | Bypass PerimeterX with Scrapfly (95% success rate) |
PerimeterX is commonly found on e-commerce and real estate websites. For scraping targets like Zillow or StockX, you will need to handle PerimeterX challenges.
Kasada
Kasada uses proof-of-work challenges that require computational resources to solve, combined with behavioral analysis and threat intelligence. This makes automated bypass more expensive.
| Aspect | Details |
|---|---|
| Detection | Proof-of-work challenges, kas.js cookies, behavioral analysis, threat intelligence |
| Difficulty | Hard |
| Used By | Realtor.com and other high-value targets |
| Bypass | Bypass Kasada with Scrapfly (94% success rate) |
Kasada's proof-of-work approach means each request costs computational time, making high-volume bypass expensive. For scraping Realtor.com and similar Kasada-protected sites, a web scraping API is the practical choice.
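
To illustrate why proof-of-work raises the cost per request, here is a generic hashcash-style sketch. It is not Kasada's actual scheme, which is proprietary. The challenge format, the `bits` difficulty parameter, and both function names are assumptions made for this example.

```python
import hashlib
from itertools import count


def meets_difficulty(challenge: str, nonce: int, bits: int) -> bool:
    """Check that SHA-256(challenge:nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge: str, bits: int) -> int:
    """Brute-force the smallest nonce that satisfies the difficulty."""
    for nonce in count():
        if meets_difficulty(challenge, nonce, bits):
            return nonce
```

Solving takes about `2 ** bits` hash attempts on average, while verifying takes a single hash. That asymmetry is what makes the cost negligible for one visitor but significant for a scraper sending many requests.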
Imperva / Incapsula
One of the oldest WAF/anti-bot providers. Imperva collects 180+ encrypted values via client-side JavaScript to build a trust score for each visitor.
| Aspect | Details |
|---|---|
| Detection | reese84 challenges, incap_ses cookies, JS fingerprinting (180+ signals), behavioral analysis |
| Difficulty | Moderate to Hard |
| Used By | Enterprise websites across healthcare, finance, and government |
| Bypass | Bypass Incapsula with Scrapfly (96% success rate) |
Imperva typically returns 403 errors when it detects automated traffic. The reese84 challenge mechanism requires JavaScript execution, so basic HTTP clients will be blocked.
F5 Shape Security
F5's bot defense uses a randomized virtual machine with custom opcodes for client-side detection, making it one of the hardest anti-bot systems to reverse engineer.
| Aspect | Details |
|---|---|
| Detection | VM-based obfuscation, TS cookies, BIG-IP ASM, TLS fingerprinting, client-side protection (f5_cspm) |
| Difficulty | Very Hard |
| Used By | Major airlines, banks, and enterprise sites |
| Bypass | Bypass F5 with Scrapfly (95% success rate) |
Because the VM changes between deployments, standard headless browsers and basic HTTP clients are typically blocked, and a hand-written bypass tends to break quickly. This is one of the few anti-bot systems where a web scraping API is almost always the right approach.
AWS WAF Bot Control
Amazon's cloud-native WAF with two detection levels: Common (self-identifying bots) and Targeted (ML-based detection of sophisticated bots).
| Aspect | Details |
|---|---|
| Detection | ML analysis, aws-waf-token cookies, challenge.js scripts, Bot Control rules |
| Difficulty | Moderate |
| Used By | Amazon (CloudFront WAF), and websites hosted on AWS |
| Bypass | Bypass AWS WAF with Scrapfly (96% success rate) |
AWS WAF is one of the easier anti-bot systems to handle compared to specialized providers. For scraping Amazon specifically, the main challenge is CloudFront WAF, which uses AWS WAF Bot Control under the hood.
Arkose Labs / FunCaptcha
Arkose Labs specializes in interactive CAPTCHA challenges (gamified puzzles, 3D object manipulation) combined with behavioral deep scanning.
| Aspect | Details |
|---|---|
| Detection | Gamified CAPTCHA challenges, behavioral deep scan, risk profiling, device fingerprinting |
| Difficulty | Hard (requires CAPTCHA solving) |
| Used By | LinkedIn, Adobe, Roblox, Microsoft, OpenAI |
Arkose Labs challenges require manual solving, a CAPTCHA solving service (2captcha, anti-captcha), or specialized automation. When scraping LinkedIn, Arkose Labs (FunCaptcha) is the primary obstacle.
Difficulty Ranking
From easiest to hardest to bypass:
| Rank | System | Difficulty | Primary Detection |
|---|---|---|---|
| 1 | AWS WAF | Moderate | ML + token cookies |
| 2 | Imperva | Moderate-Hard | JS fingerprinting + reese84 |
| 3 | Cloudflare | Moderate-Hard | TLS + Turnstile |
| 4 | PerimeterX | Hard | Behavioral biometrics |
| 5 | Kasada | Hard | Proof-of-work |
| 6 | Arkose Labs | Hard | Interactive CAPTCHA |
| 7 | Akamai | Hard-Very Hard | TLS + sensor data |
| 8 | DataDome | Very Hard | Per-customer ML |
| 9 | F5 Shape | Very Hard | VM obfuscation |
Choosing the Right Approach
| Your Situation | Recommended Approach |
|---|---|
| No anti-bot detected | Standard HTTP client (httpx, requests) |
| TLS/HTTP fingerprinting only | curl-cffi, primp |
| JavaScript challenges | Anti-detect browser or headless browser |
| CAPTCHA challenges | CAPTCHA solving service + browser automation |
| Multiple protection layers | Web scraping API like Scrapfly |
| Production at scale | Web scraping API for reliability and lower maintenance overhead |
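
The table above can be read as a small decision procedure. The sketch below encodes it under stated assumptions: the signal names (`tls_fingerprinting`, `http2_fingerprinting`, `js_challenge`, `captcha`) and the function name are hypothetical labels for this example, and any combination of different protection types is treated as "multiple protection layers".

```python
FINGERPRINT_SIGNALS = {"tls_fingerprinting", "http2_fingerprinting"}


def recommend_approach(signals, at_scale=False):
    """Map a set of detected protection signals to a row of the table."""
    signals = set(signals)
    if at_scale:
        return "Web scraping API"
    if not signals:
        return "Standard HTTP client (httpx, requests)"
    if signals <= FINGERPRINT_SIGNALS:
        return "curl-cffi, primp"
    if signals == {"js_challenge"}:
        return "Anti-detect browser or headless browser"
    if signals == {"captcha"}:
        return "CAPTCHA solving service + browser automation"
    # Anything else mixes several protection types.
    return "Web scraping API like Scrapfly"
```

For instance, `recommend_approach({"tls_fingerprinting"})` returns `"curl-cffi, primp"`, while `recommend_approach({"js_challenge", "captcha"})` falls through to the web scraping API row.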
For an independent comparison of how well different scraping APIs handle anti-bot protections, see Scrapeway's benchmarks.
Identify Anti-Bot Protection
Not sure which anti-bot system a website uses? Try Scrapfly's Anti-Bot Detector to identify the protection system before you start building your scraper.
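
For a quick manual check, the cookie names mentioned on this page can serve as a first hint. The sketch below is a heuristic only: it covers just the cookie markers listed in the sections above (`_abck`, `_px`, `incap_ses`, `reese84`, `aws-waf-token`), the function and table names are hypothetical, and a site may use a protection system without setting any of these cookies on the first response.

```python
# Cookie name prefixes taken from the detection tables on this page.
COOKIE_MARKERS = {
    "Akamai Bot Manager": ("_abck",),
    "PerimeterX (HUMAN Security)": ("_px",),
    "Imperva / Incapsula": ("incap_ses", "reese84"),
    "AWS WAF Bot Control": ("aws-waf-token",),
}


def guess_systems(cookie_names):
    """Return the systems whose cookie prefixes appear in the given names."""
    found = []
    for system, prefixes in COOKIE_MARKERS.items():
        # str.startswith accepts a tuple and matches any of its entries.
        if any(name.lower().startswith(prefixes) for name in cookie_names):
            found.append(system)
    return found
```

Passing the cookie names from a response, for example `guess_systems(["_abck", "bm_sz"])`, returns `["Akamai Bot Manager"]`. An empty list means only that none of the listed markers were seen.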
Related
- Browser Libraries - anti-detect browsers and TLS fingerprint tools
- Browser Automation - Playwright, Puppeteer, Selenium
- Web Scrapers - ready-to-use scrapers for popular protected websites
- Frameworks - web scraping frameworks