Browser Libraries

Beyond the core browser automation tools (Playwright, Puppeteer, Selenium), a growing ecosystem of libraries focuses on specialized browser-based scraping needs: anti-detection, AI-powered control, and stealth enhancement.

Anti-Detect Browsers

These libraries are designed to make browser automation invisible to anti-bot systems. They address detection vectors like navigator.webdriver, CDP fingerprinting, TLS fingerprinting, and canvas/WebGL fingerprinting.
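To see why these patches matter, the simplest leak is easy to demonstrate: vanilla automation tools expose `navigator.webdriver`. A minimal sketch using plain Playwright (the URL is illustrative; requires `playwright install chromium`):

```python
# Demonstrate the navigator.webdriver leak in un-patched browser automation.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    # Vanilla Playwright/Chromium reports navigator.webdriver as true,
    # a trivial first-pass signal for anti-bot scripts. Anti-detect
    # libraries patch this (and subtler CDP/fingerprint leaks) away.
    print(page.evaluate("navigator.webdriver"))
    browser.close()
```

Launching Chromium with `--disable-blink-features=AutomationControlled` hides this particular flag, but modern anti-bot systems check many deeper vectors, which is what the libraries below address.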

| Library | Language | Base Browser | Approach |
| --- | --- | --- | --- |
| nodriver | Python | Chrome | Direct CDP, no WebDriver dependency |
| camoufox | Python | Firefox | C++-level patches, realistic fingerprints |
| pydoll | Python | Chrome | CDP-native, network interception |
| undetected-chromedriver | Python | Chrome | Patched chromedriver, Selenium-based |
| puppeteer-extra + stealth | NodeJS | Chrome | Plugin framework with stealth patches |
| selenium-driverless | Python | Chrome | Selenium API without chromedriver binary |

Choosing an anti-detect library:

  • nodriver is the recommended default for Python - fast, modern, and maintained by the undetected-chromedriver author.
  • camoufox is best when you need Firefox specifically (some anti-bot systems treat Firefox differently than Chrome).
  • puppeteer-extra with stealth plugin is the standard for JavaScript/NodeJS.
  • selenium-driverless is useful when you need Selenium's API compatibility without chromedriver.
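A minimal nodriver sketch, assuming a target URL (nodriver drives Chrome over raw CDP and manages its own event loop; exact method names may vary slightly between versions):

```python
# nodriver sketch: launch Chrome over CDP with no chromedriver binary.
import nodriver as uc

async def main():
    browser = await uc.start()                       # starts a patched Chrome session
    page = await browser.get("https://example.com")  # navigate and wait for load
    html = await page.get_content()                  # fully rendered HTML
    print(html[:200])
    browser.stop()                                   # tear the browser down

if __name__ == "__main__":
    # nodriver provides its own loop helper rather than plain asyncio.run()
    uc.loop().run_until_complete(main())
```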

AI Browser Agents

A new category of tools uses large language models to control browsers through natural language instructions. Instead of writing selectors, you describe what you want and the AI navigates, clicks, and extracts.

| Library | Language | Approach |
| --- | --- | --- |
| browser-use | Python | LLM agent + Playwright, multi-step task automation |
| stagehand | NodeJS | act/extract/observe primitives, TypeScript, Browserbase |
| skyvern | Python | LLM + computer vision, screenshot-based interaction |
| crawl4ai | Python | LLM extraction with markdown conversion |
| scrapegraphai | Python | Graph-based LLM pipelines with Pydantic schemas |

When to use AI browser agents:

  • 👍 Scraping diverse sites with varying layouts (no single CSS selector works)
  • 👍 Rapid prototyping without studying page structure
  • 👍 Complex multi-step workflows (login → navigate → fill form → extract)
  • 👎 High-volume production scraping (LLM API cost per page)
  • 👎 Sites with stable, simple HTML (traditional selectors are cheaper and faster)
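The agent workflow can be sketched with browser-use; the task string and model name are illustrative, the API has evolved across versions, and a valid `OPENAI_API_KEY` is assumed:

```python
# browser-use sketch: describe the task in natural language,
# the agent plans and executes the browser steps itself.
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        # No selectors: the LLM decides what to click and extract.
        task="Open example.com, find the first article headline, return its text",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```

Note that each step consumes LLM tokens, which is why the cost caveat above matters at volume.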

TLS/HTTP Fingerprint Libraries

A different approach to avoiding detection operates at the HTTP connection level rather than the browser level. These libraries impersonate real browsers' TLS handshakes, HTTP/2 settings, and header ordering.

| Library | Language | Approach |
| --- | --- | --- |
| curl-cffi | Python | cURL-based, Chrome/Firefox/Safari impersonation |
| primp | Python | Rust-powered, lightweight browser fingerprint matching |
| hrequests | Python | requests-like API with TLS fingerprinting |
| curl-impersonate | C (CLI) | Patched cURL builds with browser TLS handshakes |

These are useful when you don't need a full browser but standard HTTP clients (requests, httpx) get blocked due to TLS/HTTP fingerprinting. They are much faster and lighter than running a browser.
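A curl-cffi sketch illustrating the approach (the fingerprint-echo URL is an assumption; any endpoint that inspects TLS works):

```python
# curl-cffi: send requests with a real browser's TLS/HTTP2 fingerprint,
# no browser process required.
from curl_cffi import requests

resp = requests.get(
    "https://tls.browserleaks.com/json",  # echoes the TLS fingerprint it observed
    impersonate="chrome",                 # mimic a recent Chrome ClientHello
)
print(resp.status_code)
print(resp.json())
```

Swapping `impersonate="chrome"` for `"safari"` or `"firefox"` changes the presented handshake, which is often enough to pass fingerprint-based WAF checks that block `requests` or `httpx`.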

Comparison

| Need | Recommended Approach |
| --- | --- |
| JavaScript rendering required | Playwright, Puppeteer, or anti-detect browser |
| Blocked by TLS/HTTP fingerprinting | curl-cffi, primp |
| Blocked by browser fingerprinting | nodriver, camoufox, puppeteer-extra |
| Diverse sites, varying layouts | AI browser agent (browser-use, stagehand) |
| Rapid prototyping | crawl4ai, scrapegraphai |
| High-volume production | Playwright or curl-cffi + traditional selectors |

See also: Anti-Bot Protections for a guide to Cloudflare, DataDome, Akamai, and other WAFs; Browser Automation for Playwright/Puppeteer/Selenium basics; Frameworks for full scraping frameworks; and Web Scrapers for ready-to-use scrapers for popular websites.