choppervshtml5-php
Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules.
Compared to other HTML parsers Chopper is designed to retain original HTML tree but eliminate elements that do not match parsing rules. Meaning, we can parse HTML elements and keep thei structure for machine learning or other tasks where data structure is needed as well as the data value.
HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has well over five million downloads.
HTML5 provides the following features:
- An HTML5 serializer
- Support for PHP namespaces
- Composer support
- Event-based (SAX-like) parser
- A DOM tree builder
- Interoperability with QueryPath
- Runs on PHP 5.3.0 or newer
Note that html5-php is a low-level HTML parser and does not feature any query features like CSS selectors.