cascadia vs domcrawler

BSD-2-Clause 1 1 683
58.1 thousand (month) Feb 20 2018 Start(6 years ago)
3,915 8 - MIT
Sep 26 2011 163.2 thousand (month) v7.1.0-RC1(a month ago)

cascadia is a library for Go that provides a CSS selector engine, allowing you to use CSS selectors to select elements from an HTML document.

It is built on top of the golang.org/x/net/html package (maintained by the Go project, but not part of the standard library) and matches selectors against the node tree produced by that package's parser.
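To make the idea of a selector engine more concrete, here is a toy sketch that uses only the Go standard library. It treats a selector as a predicate over nodes and collects matches with a depth-first walk. This is not cascadia's implementation: the `Node` type and the `Selector`, `ByTag`, `ByClass` and `MatchAll` names are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for a parsed HTML element.
type Node struct {
	Tag      string
	Class    string
	Text     string
	Children []*Node
}

// Selector is a predicate that reports whether a node matches.
type Selector func(*Node) bool

// ByTag matches elements with the given tag name, like the CSS selector "p".
func ByTag(tag string) Selector {
	return func(n *Node) bool { return n.Tag == tag }
}

// ByClass matches elements that carry the given class, like ".title".
func ByClass(class string) Selector {
	return func(n *Node) bool {
		for _, c := range strings.Fields(n.Class) {
			if c == class {
				return true
			}
		}
		return false
	}
}

// MatchAll walks the tree depth-first and returns the matching nodes
// in document order.
func MatchAll(root *Node, sel Selector) []*Node {
	var out []*Node
	var walk func(*Node)
	walk = func(n *Node) {
		if sel(n) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	// A hand-built tree standing in for a parsed document.
	doc := &Node{Tag: "html", Children: []*Node{
		{Tag: "body", Children: []*Node{
			{Tag: "div", Class: "content main", Children: []*Node{
				{Tag: "p", Text: "Hello, World!"},
				{Tag: "a", Class: "link", Text: "Example"},
			}},
		}},
	}}

	for _, n := range MatchAll(doc, ByTag("p")) {
		fmt.Println(n.Text)
	}
}
```

A real engine such as cascadia additionally parses the selector string and supports combinators, attributes and pseudo-classes; the sketch only shows the match-and-collect step.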

The DomCrawler library is part of the Symfony Components project and provides an easy way to traverse and manipulate HTML and XML documents using the Document Object Model (DOM) in PHP.

DomCrawler supports both CSS selectors and XPath for HTML document parsing and is one of the most popular HTML parsing tools used in web scraping with PHP.

Example Use


package main

import (
  "fmt"
  "strings"

  "github.com/andybalholm/cascadia"
  "golang.org/x/net/html"
)

func main() {
  // Create an HTML string (named page so it does not shadow the html package)
  page := `<html>
        <body>
          <div id="content">
            <p>Hello, World!</p>
            <a href="http://example.com">Example</a>
          </div>
        </body>
      </html>`

  // Parse the HTML string into a node tree
  doc, err := html.Parse(strings.NewReader(page))
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Compile the CSS selector
  sel, err := cascadia.Compile("p")
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Use Selector.MatchAll to collect every matching element in the document
  // (Selector.Match only reports whether a single node matches)
  matches := sel.MatchAll(doc)
  if len(matches) > 0 {
    // The first child of the <p> element is its text node
    fmt.Println(matches[0].FirstChild.Data)
    // > Hello, World!
  }
}

<?php
// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body><h1 class="title">Hello World</h1></body></html>';
$crawler = new Crawler($html);

// Find all elements using CSS selectors (needs the symfony/css-selector component)
$elements = $crawler->filter('.title');
// or XPath
$elements = $crawler->filterXPath('//h1');

// Print the text content of the elements
foreach ($elements as $element) {
    echo $element->textContent;
}
