
sax-js vs htmlparser2

                     sax-js                  htmlparser2
License              ISC                     MIT
Downloads (month)    141.1 million           88.4 million
Created              Feb 09 2011             Aug 28 2011
Latest version       1.3.0 (5 months ago)    9.1.0 (a month ago)
Other stats          101 / 1 / 1,070         4 / 4 / 4,169

sax-js is a streaming, SAX-style XML parser written in pure JavaScript; it targets Node.js but also works in the browser. It is designed to be fast, low-memory, and easy to use. It is commonly used for parsing large XML files because it processes the data incrementally instead of loading the entire file into memory at once.

sax-js is a low-level, event-based parser: it does not build a tree and offers no query capabilities (such as CSS selectors), but its events can be used to assemble a tree and serialize it back to markup.

htmlparser2 is a fast, forgiving Node.js library for parsing HTML and XML documents. Its core `Parser` emits callbacks as it reads the input; combined with a DOM handler (for example through its `parseDocument` helper), it builds a tree of elements similar to the Document Object Model (DOM) in web browsers, which you can then traverse and manipulate.

htmlparser2 is also low-level and has no built-in query capabilities such as CSS selectors, but it is still useful in web scraping: it is fast, tolerates malformed HTML, and the tree it produces can be restructured and serialized back to HTML.

Example Use


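sax-js:

```javascript
const fs = require("fs");
const sax = require("sax");

const xmlStream = fs.createReadStream("example.xml");
// true = strict mode: well-formed XML is required and tag-name case is preserved.
const saxParser = sax.createStream(true, {});

// Fired for every start tag; node.name and node.attributes describe it.
saxParser.on("opentag", function (node) {
    console.log(`<${node.name}>`);
});

saxParser.on("closetag", function (nodeName) {
    console.log(`</${nodeName}>`);
});

// Character data between tags.
saxParser.on("text", function (text) {
    console.log(text);
});

// Without an "error" listener, malformed input would crash the process.
saxParser.on("error", function (err) {
    console.error("Parse error:", err.message);
});

// Stream the file through the parser chunk by chunk.
xmlStream.pipe(saxParser);
```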
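
htmlparser2:

```javascript
const htmlparser = require("htmlparser2");

// The Parser invokes these callbacks as it encounters each piece of markup.
const parser = new htmlparser.Parser({
    onopentag: (name, attribs) => {
        // attribs is a plain object of attribute names to values
        console.log(`Opening tag: ${name}`);
    },
    ontext: (text) => {
        console.log(`Text: ${text}`);
    },
    onclosetag: (name) => {
        console.log(`Closing tag: ${name}`);
    }
}, { decodeEntities: true }); // decode entities such as &amp; in text and attributes

const html = "<p>Hello, <b>world</b>!</p>";
parser.write(html); // may be called repeatedly with further chunks
parser.end();       // signal end of input and flush any pending callbacks
```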

Alternatives / Similar