Skip to content

htmlparser2vshtml5-php

MIT 14 4 4,304
120.7 million (month) Aug 28 2011 9.1.0(5 months ago)
1,492 5 32 MIT
Jun 01 2013 169.3 thousand (month) 2.9.0(2 months ago)

htmlparser2 is a Node.js library for parsing HTML and XML documents. It works by building a tree of elements, similar to the Document Object Model (DOM) in web browsers. This allows you to easily traverse and manipulate the structure of the document.

htmlparser2 is a low-level html tree parser but it can still be useful in web scraping as it's a powerful tool for HTML restructuring and serialization.

HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has well over five million downloads.

HTML5 provides the following features:

  • An HTML5 serializer
  • Support for PHP namespaces
  • Composer support
  • Event-based (SAX-like) parser
  • A DOM tree builder
  • Interoperability with QueryPath
  • Runs on PHP 5.3.0 or newer

Note that html5-php is a low-level HTML parser and does not feature any query features like CSS selectors.

Example Use


const htmlparser = require("htmlparser2");
const parser = new htmlparser.Parser({
    onopentag: (name, attribs) => {
        console.log(`Opening tag: ${name}`);
    },
    ontext: (text) => {
        console.log(`Text: ${text}`);
    },
    onclosetag: (name) => {
        console.log(`Closing tag: ${name}`);
    }
}, {decodeEntities: true});

const html = "<p>Hello, <b>world</b>!</p>";
parser.write(html);
parser.end();
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";

use Masterminds\HTML5;

// An example HTML document:
$html = <<< 'HERE'
  <html>
  <head>
    <title>TEST</title>
  </head>
  <body id='foo'>
    <h1>Hello World</h1>
    <p>This is a test of the HTML5 parser.</p>
  </body>
  </html>
HERE;

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// Render it as HTML5:
print $html5->saveHTML($dom);

// Or save it to a file:
$html5->save($dom, 'out.html');

Alternatives / Similar


Was this page helpful?