

node-crawler — MIT license, 6,609 GitHub stars, ~39.9k downloads/month, latest release 1.5.0 (3 months ago), created Sep 10 2012.
dude — MIT license, 411 GitHub stars, 27 downloads/month, latest release 0.1.3 (8 months ago), created Feb 20 2022.

node-crawler is a popular web scraping library for Node.js that allows you to easily navigate and extract data from websites. It has a simple API and supports concurrency, making it efficient for scraping large numbers of pages.


  • Server-side DOM & automatic jQuery insertion with Cheerio (default) or JSDOM
  • Configurable pool size and retries
  • Configurable rate limiting
  • Priority queue of requests
  • forceUTF8 mode to let the crawler handle charset detection and conversion for you
  • Compatible with Node.js 4.x or newer
  • HTTP/2 support
  • Proxy support
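node-crawler implements pooling, throttling, and prioritization internally. As a rough stdlib-Python illustration of the priority-queue idea only (not node-crawler's actual code, and with placeholder URLs), requests with lower priority numbers are dequeued first, with a counter breaking ties so equal-priority requests stay in FIFO order:

```python
import heapq
import itertools

# Illustrative sketch only -- not node-crawler's implementation.
class RequestQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def queue(self, uri, priority=5):
        heapq.heappush(self._heap, (priority, next(self._counter), uri))

    def next_request(self):
        priority, _, uri = heapq.heappop(self._heap)
        return uri

q = RequestQueue()
q.queue("http://example.com/page", priority=5)
q.queue("http://example.com/sitemap", priority=0)  # jumps the line
q.queue("http://example.com/other", priority=5)

first = q.next_request()
print(first)  # http://example.com/sitemap -- the priority-0 request wins
```

A real crawler would combine this ordering with a rate limiter that spaces out when `next_request` is allowed to fire.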

Dude (dude uncomplicated data extraction) is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax.
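The decorator-driven design can be sketched in a few lines of plain Python. This is only an illustration of the Flask-style registration pattern, not Dude's actual internals: a decorator records each handler together with its CSS selector in a registry, and the scraper later dispatches matched elements to the registered handlers.

```python
# Illustrative sketch of Flask-style decorator registration
# (not Dude's actual internals).
registry = []

def select(css):
    def decorator(func):
        registry.append((css, func))  # remember selector -> handler
        return func                   # leave the function unchanged
    return decorator

@select(css="a")
def get_link(element):
    return {"url": element["href"]}

# The framework would match elements against each selector and dispatch:
link = get_link({"href": "https://example.com"})
print(link)  # {'url': 'https://example.com'}
```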

The simplest web scraper will look like this:

from dude import select

@select(css="a")
def get_link(element):
    return {"url": element.get_attribute("href")}

dude supports multiple parser backends:

  • playwright
  • lxml
  • parsel
  • beautifulsoup
  • pyppeteer
  • selenium

Example Use

const Crawler = require('crawler');

const c = new Crawler({
    maxConnections: 10,
    // This will be called for each crawled page
    callback: (error, res, done) => {
        if (error) {
            console.log(error);
        } else {
            const $ = res.$;
            // $ is Cheerio by default
            // a lean implementation of core jQuery designed specifically for the server
            console.log($('title').text());
        }
        done();
    }
});

// Queue just one URL, with default callback (placeholder URLs below)
c.queue('http://example.com');

// Queue a list of URLs
c.queue(['http://example.com/page1', 'http://example.com/page2']);

// Queue URLs with custom callbacks & parameters
c.queue([{
    uri: 'http://example.com',
    jQuery: false,

    // The global callback won't be called
    callback: (error, res, done) => {
        if (error) {
            console.log(error);
        } else {
            console.log('Grabbed', res.body.length, 'bytes');
        }
        done();
    }
}]);

// Queue some HTML code directly without grabbing (mostly for tests)
c.queue([{
    html: '<p>This is a <strong>test</strong></p>'
}]);
This example demonstrates how to use Parsel + async HTTPX.

To access an attribute, use selector.attrib["href"]. You can also access an attribute using the ::attr(name) pseudo-element, for example "a::attr(href)", then call selector.get(). To get the text, use the ::text pseudo-element, then call selector.get().

from dude import select

@select(css="a.url", priority=2)
async def result_url(selector):
    return {"url": selector.attrib["href"]}

# Option to get url using ::attr(name) pseudo-element
@select(css="a.url::attr(href)", priority=2)
async def result_url2(selector):
    return {"url2": selector.get()}

@select(css=".title::text", priority=1)
async def result_title(selector):
    return {"title": selector.get()}

@select(css=".description::text", priority=0)
async def result_description(selector):
    return {"description": selector.get()}

if __name__ == "__main__":
    import dude

    dude.run(urls=[""], parser="parsel")
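For comparison, here is a stdlib-only sketch of what the "a.url::attr(href)" and ".title::text" selectors above extract, using html.parser instead of Parsel. This is illustrative only; dude itself delegates this work to the chosen parser backend, and the class and sample HTML here are made up:

```python
from html.parser import HTMLParser

class SelectorSketch(HTMLParser):
    """Collects what 'a.url::attr(href)' and '.title::text' would return."""
    def __init__(self):
        super().__init__()
        self.urls = []
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = attrs.get("class", "").split()
        if tag == "a" and "url" in classes:
            self.urls.append(attrs.get("href"))   # ::attr(href)
        self._in_title = "title" in classes       # start of ::text capture

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data)

    def handle_endtag(self, tag):
        self._in_title = False

p = SelectorSketch()
p.feed('<div class="title">Hello</div><a class="url" href="/x">x</a>')
print(p.urls, p.titles)  # ['/x'] ['Hello']
```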

Alternatives / Similar