Skip to content


MIT 4 4 4,169
88.4 million (month) Aug 28 2011 9.1.0(a month ago)
1,431 5 36 MIT
2.8.1(9 months ago) Jun 01 2013 147.7 thousand (month)

htmlparser2 is a Node.js library for parsing HTML and XML documents. It works by building a tree of elements, similar to the Document Object Model (DOM) in web browsers. This allows you to easily traverse and manipulate the structure of the document.

htmlparser2 is a low-level html tree parser but it can still be useful in web scraping as it's a powerful tool for HTML restructuring and serialization.

HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has well over five million downloads.

HTML5 provides the following features:

  • An HTML5 serializer
  • Support for PHP namespaces
  • Composer support
  • Event-based (SAX-like) parser
  • A DOM tree builder
  • Interoperability with QueryPath
  • Runs on PHP 5.3.0 or newer

Note that html5-php is a low-level HTML parser and does not feature any query features like CSS selectors.

Example Use

const htmlparser = require("htmlparser2");
const parser = new htmlparser.Parser({
    onopentag: (name, attribs) => {
        console.log(`Opening tag: ${name}`);
    ontext: (text) => {
        console.log(`Text: ${text}`);
    onclosetag: (name) => {
        console.log(`Closing tag: ${name}`);
}, {decodeEntities: true});

const html = "<p>Hello, <b>world</b>!</p>";
// Assuming you installed from Composer:
require "vendor/autoload.php";

use Masterminds\HTML5;

// An example HTML document:
$html = <<< 'HERE'
  <body id='foo'>
    <h1>Hello World</h1>
    <p>This is a test of the HTML5 parser.</p>

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// Render it as HTML5:
print $html5->saveHTML($dom);

// Or save it to a file:
$html5->save($dom, 'out.html');

Alternatives / Similar