htmlquery vs DomCrawler

htmlquery: MIT | 8 | 1 | 693 | 58.1 thousand (month) | Feb 07 2019 | v1.3.0 (1 year, 2 months ago)
DomCrawler: MIT | - | 8 | 3,892 | 169.8 thousand (month) | Sep 26 2011 | v7.0.4 (a month ago)

htmlquery is a Go library for parsing HTML documents and extracting data from them with XPath expressions. It provides a small API for traversing and querying the HTML tree, and it is built on the golang.org/x/net/html parser and the antchfx/xpath package.

The DomCrawler library is part of the Symfony Components project and provides an easy way to traverse and manipulate HTML and XML documents in PHP through the Document Object Model (DOM).

DomCrawler supports both CSS selectors and XPath for HTML document parsing and is one of the most popular HTML parsing tools used in web scraping with PHP.

Example Use


package main

import (
  "fmt"
  "log"
  "strings"

  "github.com/antchfx/htmlquery"
)

func main() {
  // Parse the HTML string; htmlquery.Parse expects an io.Reader
  doc, err := htmlquery.Parse(strings.NewReader(`
    <html>
      <body>
        <h1>Hello, World!</h1>
        <ul>
          <li>Item 1</li>
          <li>Item 2</li>
          <li>Item 3</li>
        </ul>
      </body>
    </html>
  `))
  if err != nil {
    log.Fatal(err)
  }

  // Extract the text of the first <h1> element
  h1 := htmlquery.FindOne(doc, "//h1")
  fmt.Println(htmlquery.InnerText(h1)) // "Hello, World!"

  // Extract the text of all <li> elements
  lis := htmlquery.Find(doc, "//li")
  for _, li := range lis {
    fmt.Println(htmlquery.InnerText(li))
  }
  // "Item 1"
  // "Item 2"
  // "Item 3"
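}

<?php
// Assumes Composer autoloading with symfony/dom-crawler installed
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;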

$html = '<html><body><h1 class="title">Hello World</h1></body></html>';
$crawler = new Crawler($html);

// Find all elements using CSS selectors (requires the symfony/css-selector package)
$elements = $crawler->filter('.title');
// or XPath
$elements = $crawler->filterXPath('//h1');

// Print the text content of the elements
foreach ($elements as $element) {
    echo $element->textContent;
}
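
For intuition about what a descendant query such as `//li` selects in the examples above, here is a minimal sketch that uses only Go's standard `encoding/xml` decoder instead of either library. It assumes well-formed markup, which real-world HTML often is not, and that is a large part of why one would reach for htmlquery or DomCrawler. The `textOf` helper is hypothetical and exists only for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textOf is a hypothetical helper (not part of htmlquery or DomCrawler).
// It returns the trimmed text content of every element named tag, at any
// depth, which is roughly what the XPath expression "//tag" selects.
// Assumptions: the input is well-formed, and an element nested inside
// another element of the same name is folded into the outer one.
func textOf(doc, tag string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	var buf strings.Builder
	depth := 0 // greater than zero while inside a matching element
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 || t.Name.Local == tag {
				depth++
			}
		case xml.CharData:
			if depth > 0 {
				buf.Write(t) // copy now; the token bytes are reused on the next call
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
				if depth == 0 {
					out = append(out, strings.TrimSpace(buf.String()))
					buf.Reset()
				}
			}
		}
	}
}

func main() {
	doc := `<html><body><h1>Hello, World!</h1><ul><li>Item 1</li><li>Item 2</li></ul></body></html>`
	items, err := textOf(doc, "li")
	if err != nil {
		panic(err)
	}
	fmt.Println(items)
}
```

Both libraries go well beyond this: they tolerate malformed HTML and support full XPath (and, for DomCrawler, CSS selectors), so the sketch is only meant to show the tree walk that such a query implies.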

Alternatives / Similar