
chopper vs domcrawler

MIT 1 3 23
1.7 thousand (month) Jul 24 2014 0.6.0(2023-04-26 10:16:25 ago)
4,038 9 - MIT
Sep 26 2011 209.2 thousand (month) v8.0.8(2026-03-30 15:14:47 ago)

Chopper is a tool that extracts elements from HTML while preserving their ancestors and the CSS rules that apply to them.

Unlike most HTML parsers, Chopper retains the original HTML tree and removes only the elements that do not match the extraction rules. This lets us extract HTML elements while keeping their structure, which is useful for machine learning and other tasks that need the document structure as well as the data values.
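To make the "keep the tree, drop what does not match" idea concrete, here is a minimal sketch that uses only Python's standard `xml.etree.ElementTree` module. It is not Chopper's implementation: the `prune` helper is a hypothetical name, it assumes well-formed markup, it relies on ElementTree's limited XPath subset, and it leaves out the CSS rule filtering entirely.

```python
import xml.etree.ElementTree as ET


def prune(root, keep_path, discard_path):
    """Hypothetical helper: keep elements matching keep_path together with
    their ancestors and descendants, then drop anything matching discard_path."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    kept = set()
    for match in root.findall(keep_path):
        # Keep the match and everything below it.
        kept.update(match.iter())
        # Walk up to the root so the original nesting survives.
        node = match
        while node in parent_of:
            node = parent_of[node]
            kept.add(node)

    discarded = set(root.findall(discard_path))

    # Snapshot the tree first, because we remove children while walking it.
    for parent in list(root.iter()):
        for child in list(parent):
            if child in discarded or child not in kept:
                parent.remove(child)
    return root


doc = ET.fromstring(
    '<html><head><title>Test</title></head><body>'
    '<div id="main"><div class="iwantthis">HELLO WORLD<a>Do not want</a></div></div>'
    '<div id="footer"></div></body></html>'
)
result = prune(doc, ".//div[@class='iwantthis']", ".//a")
print(ET.tostring(result, encoding="unicode"))
# <html><body><div id="main"><div class="iwantthis">HELLO WORLD</div></div></body></html>
```

The matched `div` keeps its `body` and `div#main` ancestors, while the unrelated `head` and `div#footer` branches and the discarded link are removed.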

The DomCrawler library is part of the Symfony Components project and provides an easy way to traverse and manipulate HTML and XML documents using the Document Object Model (DOM) in PHP.

DomCrawler supports both CSS selectors and XPath for parsing HTML documents and is one of the most popular HTML parsing tools for web scraping in PHP.

Example Use


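```python
from chopper.extractor import Extractor

# Sample document (nesting inferred from the CSS rules and XPath
# expressions used below).
HTML = """
<html>
  <head><title>Test</title></head>
  <body>
    <div id="main">
      <div class="iwantthis">
        HELLO WORLD
        <a>Do not want</a>
      </div>
    </div>
    <div id="footer"></div>
  </body>
</html>
"""

CSS = """
div { border: 1px solid black; }
div#main { color: blue; }
div.iwantthis { background-color: red; }
a { color: green; }
div#footer { border-top: 2px solid red; }
"""

# Keep the matching div (and its ancestors), but drop any links.
extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
html, css = extractor.extract(HTML, CSS)

# `html` now holds the kept element together with its ancestors:
#
# <html>
#   <body>
#     <div id="main">
#       <div class="iwantthis">
#         HELLO WORLD
#       </div>
#     </div>
#   </body>
# </html>
#
# `css` holds only the rules that still match an element:
#
# div{border:1px solid black;}
# div#main{color:blue;}
# div.iwantthis{background-color:red;}
```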

```php
use Symfony\Component\DomCrawler\Crawler;

// Sample markup (inferred from the selectors used below).
$html = '
<h1 class="title">Hello World</h1>
';
$crawler = new Crawler($html);

// Find all elements using CSS selectors
// (filter() requires the symfony/css-selector package)
$elements = $crawler->filter('.title');
// or XPath
$elements = $crawler->filterXPath('//h1');

// Print the text content of the elements
foreach ($elements as $element) {
    echo $element->textContent;
}
```
