Skip to content

htmlparser2vsparse5

MIT 12 4 4,263
127.1 million (month) Aug 28 2011 9.1.0(2 months ago)
3,539 6 27 MIT
7.1.2(8 months ago) Jul 03 2013 173.6 million (month)

htmlparser2 is a Node.js library for parsing HTML and XML documents. It works by building a tree of elements, similar to the Document Object Model (DOM) in web browsers. This allows you to easily traverse and manipulate the structure of the document.

htmlparser2 is a low-level html tree parser but it can still be useful in web scraping as it's a powerful tool for HTML restructuring and serialization.

parse5 is a Node.js library for parsing and manipulating HTML and XML documents. It is designed to be fast and flexible, and it is commonly used in web scraping and web development projects.

parse5 is used by popular libraries such as Angular, Lit, Cheerio and many more. Unlike Cheerio parse5 is a low level html parsing library that might be useful directly in web scraping without higher level abstraction.

Example Use


const htmlparser = require("htmlparser2");
const parser = new htmlparser.Parser({
    onopentag: (name, attribs) => {
        console.log(`Opening tag: ${name}`);
    },
    ontext: (text) => {
        console.log(`Text: ${text}`);
    },
    onclosetag: (name) => {
        console.log(`Closing tag: ${name}`);
    }
}, {decodeEntities: true});

const html = "<p>Hello, <b>world</b>!</p>";
parser.write(html);
parser.end();
const parse5 = require("parse5");

// parse string
const document = parse5.parse('<html><body>Hello World!</body></html>');
console.log(document);

// html tree can be traversed as javascript object:
const body = document.childNodes[1];
console.log(body.childNodes[0].value); // "Hello World!"

// and modified
const newElement = parse5.parseFragment('<p>New Element</p>');
body.appendChild(newElement.childNodes[0]);
console.log(parse5.serialize(document)); 

Alternatives / Similar