
Anti-Bot Protections

Modern websites use anti-bot protection systems to detect and block automated traffic, including web scrapers. Understanding these systems is essential for building reliable scrapers.

Need to bypass anti-bot protections?

Scrapfly handles anti-bot bypass automatically for all major protection systems with a single API parameter. See the bypass hub for details, or check Scrapeway benchmarks for independent success rate data.

How Anti-Bot Systems Work

Most anti-bot systems use a combination of these detection methods:

| Detection Method | What It Does | Hard to Evade? |
|---|---|---|
| TLS Fingerprinting | Analyzes the TLS handshake (cipher suites, extensions, curves) to identify the client. Each browser has a unique TLS signature (JA3/JA4 hash). Standard HTTP libraries have different signatures than real browsers. | Yes |
| HTTP/2 Fingerprinting | Examines HTTP/2 frame settings, header ordering, and priority schemes that differ between browsers and HTTP libraries. | Yes |
| JavaScript Challenges | Injects obfuscated JavaScript that must be executed correctly. Verifies the browser environment is genuine. | Moderate |
| Browser Fingerprinting | Collects Canvas, WebGL, Audio context, fonts, screen resolution, and other browser properties to build a device fingerprint. | Hard |
| Behavioral Analysis | Monitors mouse movements, scroll patterns, click timing, and navigation flow to distinguish humans from bots. | Very Hard |
| CAPTCHA Challenges | Presents visual or interactive challenges (Turnstile, reCAPTCHA, hCaptcha, FunCaptcha) that require human solving or CAPTCHA API services. | Hard |
| IP Reputation | Checks the request IP against databases of known data centers, VPNs, and proxy services. Residential IPs are trusted more. | Moderate |
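To make the TLS fingerprinting row concrete, here is a minimal sketch of how a JA3 hash is derived from ClientHello fields: the decimal values are joined into a fixed-order string and hashed with MD5. The function names and the sample handshake values are illustrative assumptions, not output captured from a real browser.

```python
import hashlib

# GREASE values (RFC 8701) are randomized placeholders that browsers inject;
# JA3 ignores them so the fingerprint stays stable across connections.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields, values dash-joined."""
    fields = [ciphers, extensions, curves, point_formats]
    joined = ["-".join(str(v) for v in f if v not in GREASE) for f in fields]
    return ",".join([str(version)] + joined)


def ja3_hash(ja3: str) -> str:
    """JA3 fingerprints are the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Hypothetical handshake values, for illustration only.
browser_like = ja3_string(771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0])
library_like = ja3_string(771, [4866, 4865], [0, 23], [29, 23], [0])
print(browser_like)
print(ja3_hash(browser_like) == ja3_hash(library_like))
```

Because the field order is part of the string, two clients that offer the same cipher suites in a different order produce different hashes. This is why an HTTP library can be told apart from a browser even when it sends identical headers.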

Major Anti-Bot Systems

Cloudflare

The most widely deployed anti-bot system, protecting millions of websites. Cloudflare uses a layered approach combining TLS fingerprinting, JavaScript challenges, the Turnstile CAPTCHA, and behavioral analysis.

Detection: TLS fingerprinting (JA3/JA4), Turnstile CAPTCHA, JS challenges, HTTP header validation
Difficulty: Moderate to Hard
Used By: Indeed and hundreds of thousands of other websites
Bypass: Bypass Cloudflare with Scrapfly (98% success rate)

Cloudflare is the most common anti-bot system you will encounter. On Cloudflare-protected sites, standard HTTP libraries are usually blocked because their TLS fingerprints do not match a real browser. You need a TLS fingerprint library such as curl-cffi or primp, an anti-detect browser, or a web scraping API.
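As a rough sketch of the TLS-impersonation route, the snippet below pairs a curl-cffi request with a heuristic check for a Cloudflare challenge response. The helper names are hypothetical, the `impersonate="chrome"` target assumes a recent curl-cffi release, and the status-code fallback is only a heuristic, since an origin server behind Cloudflare can also return 403 or 503 on its own.

```python
def looks_like_cloudflare_challenge(status: int, headers: dict) -> bool:
    """Heuristic: did Cloudflare answer with a challenge instead of the page?"""
    h = {k.lower(): str(v).lower() for k, v in headers.items()}
    # Cloudflare marks challenge responses with this header.
    if h.get("cf-mitigated") == "challenge":
        return True
    # Fallback assumption: a block status served by Cloudflare itself.
    return status in (403, 503) and h.get("server") == "cloudflare"


def fetch_impersonating_chrome(url: str):
    """Fetch a URL with a Chrome-like TLS fingerprint via curl-cffi."""
    # Imported lazily so the helper above works without curl-cffi installed.
    from curl_cffi import requests

    return requests.get(url, impersonate="chrome", timeout=30)


if __name__ == "__main__":
    # Offline demonstration with hand-written response metadata.
    print(looks_like_cloudflare_challenge(403, {"CF-Mitigated": "challenge"}))
    print(looks_like_cloudflare_challenge(200, {"Server": "cloudflare"}))
```

A response flagged by the check is a signal to escalate to a browser-based approach or a scraping API rather than to retry with the same client.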


DataDome

One of the most sophisticated anti-bot systems, using per-customer ML models that learn from each website's unique traffic patterns. DataDome is very difficult to bypass at scale because its behavioral analysis continuously adapts.

Detection: Real-time ML models, device fingerprinting, behavioral analysis (mouse, scroll, keyboard), slider CAPTCHA
Difficulty: Very Hard
Used By: Etsy, TripAdvisor, Foot Locker, SoundCloud
Bypass: Bypass DataDome with Scrapfly (96% success rate)

Because DataDome trains its ML models per customer, each protected website behaves differently, so bypass techniques that work on one site may not work on another. For reliable scraping, a web scraping API is usually the best approach.


Akamai Bot Manager

Akamai's anti-bot solution is deployed at the CDN edge, making it fast and hard to circumvent. TLS fingerprinting is its primary detection vector, combined with sensor data validation and device fingerprinting.

Detection: TLS fingerprinting, sensor data (_abck cookies), device fingerprinting, behavioral analysis
Difficulty: Hard to Very Hard
Used By: Major enterprise websites across finance, retail, and media
Bypass: Bypass Akamai with Scrapfly (97% success rate)

Akamai's TLS fingerprinting is particularly effective because it blocks at the edge before requests even reach the origin server. Libraries like curl-cffi that impersonate browser TLS fingerprints are essential for direct bypass attempts.


PerimeterX (HUMAN Security)

PerimeterX (now HUMAN Security) uses sophisticated behavioral biometrics to detect automation. It tracks mouse movements, click patterns, keystroke timing, and navigation sequences to build a behavioral profile.

Detection: Behavioral biometrics (_px cookies), Human Challenge, browser fingerprinting, IP reputation
Difficulty: Hard
Used By: Zillow, StockX, Wayfair, Booking.com, Craigslist
Bypass: Bypass PerimeterX with Scrapfly (95% success rate)

PerimeterX is commonly found on e-commerce and real estate websites. For scraping targets like Zillow or StockX, you will need to handle PerimeterX challenges.


Kasada

Kasada uses proof-of-work challenges that require computational resources to solve, combined with behavioral analysis and threat intelligence. This makes automated bypass more expensive.

Detection: Proof-of-work challenges, kas.js cookies, behavioral analysis, threat intelligence
Difficulty: Hard
Used By: Realtor.com and other high-value targets
Bypass: Bypass Kasada with Scrapfly (94% success rate)

Kasada's proof-of-work approach means each request costs computational time, making high-volume bypass expensive. For scraping Realtor.com and similar Kasada-protected sites, a web scraping API is the practical choice.
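To illustrate why proof-of-work raises the cost per request, here is a generic hashcash-style sketch. It is not Kasada's actual scheme, which is proprietary; the challenge format, the `solve` and `verify` names, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def _digest_value(challenge: str, nonce: int) -> int:
    """Hash the challenge and nonce together and read the digest as an integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Brute-force a nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs a single hash, however long solving took."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


nonce = solve("demo-challenge", 12)  # about 2**12 hashes on average
print(nonce, verify("demo-challenge", nonce, 12))
```

The asymmetry is the point: the server verifies with one hash, while the client needs about 2^n attempts on average for n difficulty bits. Each extra bit doubles the expected cost, and that cost is multiplied by every request a scraper sends.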


Imperva / Incapsula

One of the oldest WAF/anti-bot providers. Imperva collects 180+ encrypted values via client-side JavaScript to build a trust score for each visitor.

Detection: reese84 challenges, incap_ses cookies, JS fingerprinting (180+ signals), behavioral analysis
Difficulty: Moderate to Hard
Used By: Enterprise websites across healthcare, finance, and government
Bypass: Bypass Incapsula with Scrapfly (96% success rate)

Imperva typically returns 403 errors when it detects automated traffic. The reese84 challenge mechanism requires JavaScript execution, so basic HTTP clients will be blocked.


F5 Shape Security

F5's bot defense uses a randomized virtual machine with custom opcodes for client-side detection, making it one of the hardest anti-bot systems to reverse engineer.

Detection: VM-based obfuscation, TS cookies, BIG-IP ASM, TLS fingerprinting, client-side protection (f5_cspm)
Difficulty: Very Hard
Used By: Major airlines, banks, and enterprise sites
Bypass: Bypass F5 with Scrapfly (95% success rate)

F5's VM-based obfuscation is extremely difficult to reverse engineer, and standard headless browsers and basic HTTP clients are typically detected. This is one of the few anti-bot systems where a web scraping API is almost always the right approach.


AWS WAF Bot Control

Amazon's cloud-native WAF with two detection levels: Common (self-identifying bots) and Targeted (ML-based detection of sophisticated bots).

Detection: ML analysis, aws-waf-token cookies, challenge.js scripts, Bot Control rules
Difficulty: Moderate
Used By: Amazon (CloudFront WAF) and websites hosted on AWS
Bypass: Bypass AWS WAF with Scrapfly (96% success rate)

AWS WAF is one of the easier anti-bot systems to handle compared to specialized providers. For scraping Amazon specifically, the main challenge is CloudFront WAF, which uses AWS WAF Bot Control under the hood.


Arkose Labs / FunCaptcha

Arkose Labs specializes in interactive CAPTCHA challenges (gamified puzzles, 3D object manipulation) combined with behavioral deep scanning.

Detection: Gamified CAPTCHA challenges, behavioral deep scan, risk profiling, device fingerprinting
Difficulty: Hard (requires CAPTCHA solving)
Used By: LinkedIn, Adobe, Roblox, Microsoft, OpenAI

Arkose Labs challenges require either manual solving, CAPTCHA solving services (2captcha, anti-captcha), or specialized automation. For scraping LinkedIn, Arkose Labs (FunCaptcha) is the primary challenge.


Difficulty Ranking

From easiest to hardest to bypass:

| Rank | System | Difficulty | Primary Detection |
|---|---|---|---|
| 1 | AWS WAF | Moderate | ML + token cookies |
| 2 | Imperva | Moderate-Hard | JS fingerprinting + reese84 |
| 3 | Cloudflare | Moderate-Hard | TLS + Turnstile |
| 4 | PerimeterX | Hard | Behavioral biometrics |
| 5 | Kasada | Hard | Proof-of-work |
| 6 | Arkose Labs | Hard | Interactive CAPTCHA |
| 7 | Akamai | Hard-Very Hard | TLS + sensor data |
| 8 | DataDome | Very Hard | Per-customer ML |
| 9 | F5 Shape | Very Hard | VM obfuscation |

Choosing the Right Approach

| Your Situation | Recommended Approach |
|---|---|
| No anti-bot detected | Standard HTTP client (httpx, requests) |
| TLS/HTTP fingerprinting only | curl-cffi, primp |
| JavaScript challenges | Anti-detect browser or headless browser |
| CAPTCHA challenges | CAPTCHA solving service + browser automation |
| Multiple protection layers | Web scraping API like Scrapfly |
| Production at scale | Web scraping API for reliability and maintenance-free operation |
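The table above can be read as a simple decision rule. The sketch below encodes it as a lookup. The layer labels ("fingerprinting", "javascript", "captcha") and the `recommend_approach` name are hypothetical, chosen only for this illustration, and the "production at scale" row is left out because it depends on operational needs rather than on the protections detected.

```python
# Recommendations for a single detected protection layer, taken from the table.
SINGLE_LAYER = {
    "fingerprinting": "curl-cffi, primp",
    "javascript": "Anti-detect browser or headless browser",
    "captcha": "CAPTCHA solving service + browser automation",
}


def recommend_approach(layers: set[str]) -> str:
    """Map the set of detected protection layers to the table's recommendation."""
    unknown = layers - SINGLE_LAYER.keys()
    if unknown:
        raise ValueError(f"unknown protection layers: {sorted(unknown)}")
    if not layers:
        return "Standard HTTP client (httpx, requests)"
    if len(layers) > 1:
        return "Web scraping API like Scrapfly"
    (layer,) = layers
    return SINGLE_LAYER[layer]


print(recommend_approach(set()))
print(recommend_approach({"fingerprinting"}))
print(recommend_approach({"javascript", "captcha"}))
```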

For an independent comparison of how well different scraping APIs handle anti-bot protections, see Scrapeway's benchmarks.

Identify Anti-Bot Protection

Not sure which anti-bot system a website uses? Try Scrapfly's Anti-Bot Detector to identify the protection system before you start building your scraper.
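For a quick manual check, the cookie names mentioned in the sections above can serve as rough markers. The following is a minimal sketch, assuming you have already collected the cookie names a site sets. The `guess_systems` helper and the prefix matching are illustrative assumptions. Matches are hints rather than proof, and systems without a distinctive cookie name on this page (Cloudflare, DataDome, Kasada, F5, Arkose Labs) are not covered.

```python
# Cookie-name prefixes mentioned on this page, mapped to the system they suggest.
COOKIE_MARKERS = {
    "_abck": "Akamai Bot Manager",
    "_px": "PerimeterX (HUMAN Security)",
    "incap_ses": "Imperva / Incapsula",
    "reese84": "Imperva / Incapsula",
    "aws-waf-token": "AWS WAF Bot Control",
}


def guess_systems(cookie_names) -> list[str]:
    """Return the sorted anti-bot systems hinted at by a site's cookie names."""
    found = set()
    for name in cookie_names:
        lowered = name.lower()
        for prefix, system in COOKIE_MARKERS.items():
            if lowered.startswith(prefix):
                found.add(system)
    return sorted(found)


# Hypothetical cookie names, for illustration only.
print(guess_systems(["_abck", "session_id"]))
print(guess_systems(["incap_ses_123_456", "reese84", "_pxvid"]))
```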
