html5-phpvsxml2

MIT 34 5 1,638

200.0 thousand (month) Jun 01 2013 2.9.0(10 months ago)

221 3 61 MIT

Apr 20 2015 831.7 thousand (month) 1.3.6(1 year, 7 months ago)

HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has well over five million downloads.

HTML5 provides the following features:

An HTML5 serializer
Support for PHP namespaces
Composer support
Event-based (SAX-like) parser
A DOM tree builder
Interoperability with QueryPath
Runs on PHP 5.3.0 or newer

Note that html5-php is a low-level HTML parser and does not feature any query features like CSS selectors.

The xml2 package is a binding to libxml2, making it easy to work with HTML and XML from R. The API is somewhat inspired by jQuery.

xml2 can be used to parse HTML documents using XPath selectors and is a successor to R's XML package with a few improvements:

xml2 takes care of memory management for you. It will automatically free the memory used by an XML document as soon as the last reference to it goes away.
xml2 has a very simple class hierarchy so don't need to think about exactly what type of object you have, xml2 will just do the right thing.
More convenient handling of namespaces in Xpath expressions - see xml_ns() and xml_ns_strip() to get started.

Example Use

<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";

use Masterminds\HTML5;

// An example HTML document:
$html = <<< 'HERE'
  <html>
  <head>
    <title>TEST</title>
  </head>
  <body id='foo'>
    <h1>Hello World</h1>
    <p>This is a test of the HTML5 parser.</p>
  </body>
  </html>
HERE;

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// Render it as HTML5:
print $html5->saveHTML($dom);

// Or save it to a file:
$html5->save($dom, 'out.html');

library("xml2")
x <- read_xml("<foo> <bar> text <baz/> </bar> </foo>")
x

xml_name(x)
xml_children(x)
xml_text(x)
xml_find_all(x, ".//baz")

h <- read_html("<html><p>Hi <b>!")
h
xml_name(h)