node-crawler: MIT | 29 | 6 | 6,733 | 16.0 thousand (month) | Sep 10 2012 | 2.0.2 (7 months ago)
PHPScraper: 554 | 2 | 27 | GPL-3.0-or-later | May 04 2020 | 113 (month) | 3.0.0 (10 months ago)

node-crawler is a popular web scraping library for Node.js that allows you to easily navigate and extract data from websites. It has a simple API and supports concurrency, making it efficient for scraping large numbers of pages.


  • Server-side DOM and automatic jQuery insertion with Cheerio (default) or JSDOM
  • Configurable pool size and retries
  • Rate limit control
  • Priority queue of requests
  • forceUTF8 mode, which lets the crawler handle charset detection and conversion for you
  • Compatible with Node.js 4.x or newer
  • HTTP/2 support
  • Proxy support
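
To make the pool size, retry, and priority queue features above more concrete, here is a minimal sketch of how such a scheduler could work. It is not node-crawler's implementation: PriorityQueue, drain, and their parameters are hypothetical names chosen for illustration, and the worker only simulates a request, so nothing touches the network. The sketch assumes that lower priority numbers are served first.

```javascript
// Hypothetical sketch; none of these names come from node-crawler.
class PriorityQueue {
  constructor(levels = 10) {
    // One FIFO bucket per priority level; bucket 0 is served first.
    this.buckets = Array.from({ length: levels }, () => []);
  }

  enqueue(item, priority = 5) {
    // Clamp out-of-range priorities into [0, levels - 1].
    const p = Math.min(Math.max(Math.trunc(priority), 0), this.buckets.length - 1);
    this.buckets[p].push(item);
  }

  dequeue() {
    // Return the oldest item from the first non-empty bucket.
    for (const bucket of this.buckets) {
      if (bucket.length > 0) return bucket.shift();
    }
    return undefined;
  }

  get size() {
    return this.buckets.reduce((n, b) => n + b.length, 0);
  }
}

// Process queued items with at most `maxConnections` workers in flight,
// retrying each failed item up to `retries` additional times.
async function drain(queue, worker, { maxConnections = 2, retries = 1 } = {}) {
  const results = [];

  async function runner() {
    let item;
    while ((item = queue.dequeue()) !== undefined) {
      for (let attempt = 0; ; attempt++) {
        try {
          results.push(await worker(item));
          break;
        } catch (err) {
          if (attempt >= retries) {
            // Out of retries: record the failure and move on.
            results.push({ item, error: err.message });
            break;
          }
        }
      }
    }
  }

  // Each runner pulls from the shared queue until it is empty.
  await Promise.all(Array.from({ length: maxConnections }, runner));
  return results;
}

// Usage: three hypothetical paths, handled by a simulated fetch.
const queue = new PriorityQueue();
queue.enqueue('/archive', 8);
queue.enqueue('/home', 0);
queue.enqueue('/news'); // default priority 5

const fakeFetch = async (path) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `fetched ${path}`;
};

drain(queue, fakeFetch, { maxConnections: 2 }).then((results) => {
  console.log(results);
});
```

Keeping one FIFO bucket per priority level keeps enqueue and dequeue simple and preserves insertion order within a level. A real crawler would add rate limiting between requests on top of this.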

PHPScraper is a universal web-util for PHP. The main goal is to get stuff done instead of getting distracted with selectors, preparing & converting data structures, etc. Instead, you can just go to a website and get the relevant information for your project.

PHPScraper is a minimalistic scraper framework that is built on top of other popular scraping tools.


  • Direct access to basic page features such as metadata, links, images, headings, content, and keywords.
  • File downloading.
  • RSS, Sitemap and other feed processing.
  • CSV, XML and JSON file processing.

Example Use

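// node-crawler (JavaScript)
const Crawler = require('crawler');

const c = new Crawler({
    maxConnections: 10,
    // This will be called for each crawled page
    callback: (error, res, done) => {
        if (error) {
            console.log(error);
        } else {
            const $ = res.$;
            // $ is Cheerio by default:
            // a lean implementation of core jQuery designed specifically for the server
            console.log($('title').text());
        }
        // Tell the crawler this request is finished
        done();
    }
});

// Queue just one URL, with default callback
// (the URLs are missing from this example; the example.com addresses are placeholders)
c.queue('https://example.com/');

// Queue a list of URLs
c.queue(['https://example.com/a', 'https://example.com/b']);

// Queue URLs with custom callbacks & parameters
c.queue([{
    uri: '',  // URL missing in the source; set it to the page to fetch
    jQuery: false,

    // The global callback won't be called
    callback: (error, res, done) => {
        if (error) {
            console.log(error);
        } else {
            console.log('Grabbed', res.body.length, 'bytes');
        }
        done();
    }
}]);

// Queue some HTML code directly without grabbing (mostly for tests)
c.queue([{
    html: '<p>This is a <strong>test</strong></p>'
}]);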
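
// PHPScraper (PHP)
// create scraper object
$web = new \Spekulatius\PHPScraper\PHPScraper;

// go to URL (the URL is missing from this example; https://example.com/ is a placeholder)
$web->go('https://example.com/');

// elements can be found using XPath:
echo $web->filter("//*[@id='by-id']")->text();   // "Content by ID" on the page used in the original example

// or pre-defined variables covering basic page data:
$web->links;  // for all links
$web->outline;  // basic page outline
$web->cleanOutlineWithParagraphs;  // page outline including paragraphs, with empty entries removed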
