
puppeteer-stealth vs curl-cffi

puppeteer-stealth: MIT, 301, 30, 94,086, 3.6 million (month), May 29 2018, version 2.11.2 (2023-04-11 04:13:00)
curl-cffi: 1,751, 2, 34, MIT, Feb 23 2022, 594.9 thousand (month), version 0.7.1 (2024-07-13 09:07:25)

Puppeteer Stealth is a Puppeteer plugin that hardens a headless browser for web scraping. It makes Puppeteer-based scrapers harder to detect, which allows scraping targets that use headless browser detection techniques.

Puppeteer-stealth does this by applying various JavaScript patches that cover up traces of headless operation in the scraping browser's environment.
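To make the idea of a "trace" concrete: headless Chrome has traditionally advertised itself with a `HeadlessChrome` token in its User-Agent string, which a server can check for. The sketch below illustrates, in Python, that kind of check and the kind of rewrite a patch performs. The function names and the sample User-Agent are hypothetical, and this is not the plugin's actual code, which runs as JavaScript inside the browser.

```python
import re

# Hypothetical sample of a headless Chrome User-Agent string.
HEADLESS_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36"
)


def looks_headless(user_agent: str) -> bool:
    """Naive detection check: flag the HeadlessChrome token."""
    return "HeadlessChrome" in user_agent


def mask_headless_token(user_agent: str) -> str:
    """Rewrite the token so the string resembles regular Chrome."""
    return re.sub(r"HeadlessChrome/", "Chrome/", user_agent)


patched = mask_headless_token(HEADLESS_UA)
print(looks_headless(HEADLESS_UA))  # True
print(looks_headless(patched))      # False
```

Real detection scripts combine many such signals, so masking a single one is rarely enough on its own.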

Curl-cffi is a Python binding for curl-impersonate, an HTTP client that can appear as one of the popular web browsers, such as:

- Google Chrome
- Microsoft Edge
- Safari
- Firefox

Unlike requests and httpx, which are native Python libraries, curl-cffi uses cURL and inherits its powerful features, such as extensive HTTP protocol support, along with curl-impersonate's patches against TLS and HTTP fingerprinting.

With curl-cffi, web scrapers can bypass TLS and HTTP fingerprinting.
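As a rough sketch of how a browser profile is selected, curl-cffi's requests-style functions accept an `impersonate` argument naming the browser to mimic. The helper names below (`build_request_options`, `fetch`) and the timeout value are hypothetical, and the `"chrome"` alias assumes a recent curl-cffi release; older versions expect versioned names such as `"chrome110"`.

```python
def build_request_options(browser: str = "chrome", timeout: float = 10.0) -> dict:
    """Collect keyword arguments for a curl-cffi request (hypothetical helper)."""
    return {"impersonate": browser, "timeout": timeout}


def fetch(url: str, browser: str = "chrome"):
    """Fetch a URL while presenting the chosen browser's TLS/HTTP fingerprint."""
    # Imported lazily so build_request_options works without curl-cffi installed.
    from curl_cffi import requests

    return requests.get(url, **build_request_options(browser))


# Example call (requires curl-cffi and network access):
# response = fetch("https://httpbin.org/json")
# print(response.status_code)
```

Keeping the options in one helper makes it easy to switch the impersonated browser for a whole scraper in a single place.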

Highlights


bypass, http2, tls-fingerprint, http-fingerprint, sync, async

Example Use


```javascript
const puppeteer = require('puppeteer-extra')

// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())

// puppeteer usage as normal
puppeteer.launch({ headless: true }).then(async browser => {
  console.log('Running tests..')
  const page = await browser.newPage()
  await page.goto('https://bot.sannysoft.com')
  // wait 5 seconds for the detection tests to finish
  // (page.waitForTimeout was removed in newer Puppeteer releases)
  await new Promise(resolve => setTimeout(resolve, 5000))
  await page.screenshot({ path: 'result.png', fullPage: true })
  await browser.close()
  console.log('success - check the result.png screenshot')
})
```
curl-cffi can be used as a low-level curl client as well as an easy high-level HTTP client:

```python
from curl_cffi import requests

response = requests.get('https://httpbin.org/json')
print(response.json())

# or using sessions
session = requests.Session()
response = session.get('https://httpbin.org/json')

# also supports async requests using asyncio
import asyncio
from curl_cffi.requests import AsyncSession

urls = [
    "http://httpbin.org/html",
    "http://httpbin.org/html",
    "http://httpbin.org/html",
]


async def scrape_all():
    # `async with` is only valid inside a coroutine, so wrap it in one
    async with AsyncSession() as s:
        tasks = []
        for url in urls:
            task = s.get(url)
            tasks.append(task)
        # scrape concurrently:
        return await asyncio.gather(*tasks)


responses = asyncio.run(scrape_all())

# also supports websocket connections
from curl_cffi.requests import Session, WebSocket


def on_message(ws: WebSocket, message):
    print(message)


with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
```
