Skip to content

sax-jsvsparse5

ISC 103 1 1,073
149.3 million (month) Feb 09 2011 1.3.0(6 months ago)
3,539 6 27 MIT
7.1.2(8 months ago) Jul 03 2013 173.6 million (month)

sax-js is a streaming XML parser for Node.js that is built on top of the sax C library. It is designed to be fast, low-memory, and easy to use. It is commonly used for parsing large XML files, as it allows you to process the XML data incrementally, rather than loading the entire file into memory at once.

sax-js is a low-level html tree parser and does not provide html query capabilities (like CSS selectors) though it can be useful in HTML tree parsing and serialization.

parse5 is a Node.js library for parsing and manipulating HTML and XML documents. It is designed to be fast and flexible, and it is commonly used in web scraping and web development projects.

parse5 is used by popular libraries such as Angular, Lit, Cheerio and many more. Unlike Cheerio parse5 is a low level html parsing library that might be useful directly in web scraping without higher level abstraction.

Example Use


const fs = require("fs");
const sax = require("sax");

const xmlStream = fs.createReadStream("example.xml");
const saxParser = sax.createStream(true, {});

saxParser.on("opentag", function(node) {
    console.log(`<${node.name}>`);
});

saxParser.on("closetag", function(nodeName) {
    console.log(`</${nodeName}>`);
});

saxParser.on("text", function(text) {
    console.log(text);
});

xmlStream.pipe(saxParser);
const parse5 = require("parse5");

// parse string
const document = parse5.parse('<html><body>Hello World!</body></html>');
console.log(document);

// html tree can be traversed as javascript object:
const body = document.childNodes[1];
console.log(body.childNodes[0].value); // "Hello World!"

// and modified
const newElement = parse5.parseFragment('<p>New Element</p>');
body.appendChild(newElement.childNodes[0]);
console.log(parse5.serialize(document)); 

Alternatives / Similar