
cheerio vs htmlparser2

cheerio: MIT · 51 · 13 · 28,045 · 34.5 million/month · Oct 08 2011 · 1.0.0-rc.12 (8 months ago)
htmlparser2: 4,345 · 4 · 12 · MIT · Aug 28 2011 · 124.8 million/month · 9.1.0 (6 months ago)

cheerio is a popular JavaScript library for parsing and manipulating HTML and XML documents much as you would with jQuery in a browser. It is a fast, flexible, and lean implementation of core jQuery designed specifically for the server.

One of cheerio's main benefits is its jQuery-like syntax for navigating and manipulating the Document Object Model (DOM) of an HTML or XML document, which puts familiar methods such as .find(), .attr() and .text() to work on the server.

cheerio supports CSS selectors but not XPath.

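htmlparser2 is a Node.js library for parsing HTML and XML documents. Its core Parser is a fast, streaming, callback-based parser that reports opening tags, text, and closing tags as it reads them. Paired with a DomHandler (the parseDocument helper does this for you), it builds a tree of nodes similar to the Document Object Model (DOM) in web browsers, which you can then traverse and manipulate.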

htmlparser2 is a low-level HTML parser with no jQuery-style query API, but it is still useful in web scraping: it is fast, forgiving of malformed markup, and well suited to restructuring and serializing HTML.

Example Use


const cheerio = require('cheerio');

// use CSS selectors (double quotes for the attribute so the JS string stays valid)
let $ = cheerio.load('<html><head><title>My title</title></head><body><h1 class="name">Hello World!</h1></body></html>');
console.log($('title').text()); // My title
console.log($('.name').text()); // Hello World!

// select multiple elements and iterate over them
$ = cheerio.load('<html><body><ul><li>item 1</li><li>item 2</li></ul></body></html>');
$('li').each((i, elem) => {
  console.log($(elem).text()); // item 1, then item 2
});

// modify elements, then serialize the document back to HTML
$ = cheerio.load('<html><body><h1>Hello World!</h1></body></html>');
$('h1').text('Hello, Cheerio!');
console.log($.html());
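
// htmlparser2: stream markup through a callback-based parser
const htmlparser = require('htmlparser2');
const parser = new htmlparser.Parser({
    // called for every opening tag, with its attributes as an object
    onopentag: (name, attribs) => {
        console.log(`Opening tag: ${name}`);
    },
    // called for text between tags (may fire more than once for one text run)
    ontext: (text) => {
        console.log(`Text: ${text}`);
    },
    onclosetag: (name) => {
        console.log(`Closing tag: ${name}`);
    }
}, { decodeEntities: true }); // decode entities such as &amp; in text

const html = '<p>Hello, <b>world</b>!</p>';
parser.write(html); // can be called repeatedly with successive chunks
parser.end();       // flush buffered input and close any open tags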
