xml2 vs DomCrawler

MIT 62 3 213
675.4 thousand (month) Apr 20 2015 1.3.6(7 months ago)
3,881 8 - MIT
v7.0.3(28 days ago) Sep 26 2011 154.4 thousand (month)

The xml2 package is a binding to libxml2, making it easy to work with HTML and XML from R. The API is somewhat inspired by jQuery.

xml2 can be used to parse HTML documents using XPath selectors and is a successor to R's XML package with a few improvements:

  • xml2 takes care of memory management for you. It will automatically free the memory used by an XML document as soon as the last reference to it goes away.
  • xml2 has a very simple class hierarchy, so you don't need to think about exactly what type of object you have; xml2 will just do the right thing.
  • More convenient handling of namespaces in XPath expressions - see xml_ns() and xml_ns_strip() to get started.
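
The namespace handling mentioned in the last point can be sketched as follows. This is a minimal illustration rather than an excerpt from the xml2 documentation: the sample document and its namespace URI (http://example.com/ns) are made up for the example, and it assumes the xml2 package is installed.

```r
library(xml2)

# Hypothetical document whose elements live in a default namespace
doc <- read_xml(
  '<root xmlns="http://example.com/ns"><item>a</item><item>b</item></root>'
)

# An unprefixed XPath does not match elements in a default namespace
n_unprefixed <- length(xml_find_all(doc, ".//item"))

# xml_ns() lists the namespaces in the document; xml2 assigns the
# default namespace the prefix "d1", which can then be used in XPath
ns <- xml_ns(doc)
prefixed_text <- xml_text(xml_find_all(doc, ".//d1:item", ns))

# Alternatively, xml_ns_strip() removes the default namespace
# (it modifies doc in place), so plain element names match again
xml_ns_strip(doc)
stripped_text <- xml_text(xml_find_all(doc, ".//item"))
```

Stripping is the simpler route when the namespace carries no meaning for the task at hand; passing the prefix explicitly leaves the document unchanged.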

The DomCrawler library is part of the Symfony Components project and provides an easy way to traverse and manipulate HTML and XML documents using the Document Object Model (DOM) in PHP.

DomCrawler supports both CSS selectors and XPath for HTML document parsing and is one of the most popular HTML parsing tools used in web scraping with PHP.

Example Use


library("xml2")
x <- read_xml("<foo> <bar> text <baz/> </bar> </foo>")
x

xml_name(x)
xml_children(x)
xml_text(x)
xml_find_all(x, ".//baz")

h <- read_html("<html><p>Hi <b>!")
h
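xml_name(h)

<?php
// Assumes the symfony/dom-crawler and symfony/css-selector packages
// are installed via Composer
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;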

$html = '<html><body><h1 class="title">Hello World</h1></body></html>';
$crawler = new Crawler($html);

// Find all elements using CSS selectors
$elements = $crawler->filter('.title');
// or XPath
$elements = $crawler->filterXPath('//h1');

// Print the text content of the elements
foreach ($elements as $element) {
    echo $element->textContent;
}

Alternatives / Similar