
cascadia vs nokogiri

BSD-2-Clause 1 1 754
58.1 thousand (month) Feb 20 2018 Start(2018-02-20 18:47:44 ago)
6,248 23 108 MIT
Jul 25 2009 5.8 million (month) 1.19.2(2026-03-19 21:12:43 ago)

cascadia is a Go library that implements a CSS selector engine: you compile a CSS selector and use it to pick matching elements out of a parsed HTML document.

It works on the node tree produced by the golang.org/x/net/html package (one of the Go project's supplementary packages, not part of the standard library), so you can select elements without writing the tree traversal by hand.
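To make the idea of a selector engine concrete, one way to picture it is as a function that turns a selector string into a predicate over nodes and then applies that predicate during a depth-first walk. The sketch below is a toy model under stated assumptions: the `Node` type, `Compile`, and `MatchAll` here are hypothetical stand-ins written for illustration, they handle only tag and single-class selectors, and they are not cascadia's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified element node
// (not the Node type from golang.org/x/net/html).
type Node struct {
	Tag      string
	Classes  []string
	Children []*Node
}

// Selector is a predicate that reports whether a node matches.
type Selector func(*Node) bool

// Compile turns a simple selector such as "p", ".note" or "p.note"
// into a Selector. Anything more complex is rejected in this sketch.
func Compile(s string) (Selector, error) {
	if s == "" || strings.ContainsAny(s, " >+~[:#,*") {
		return nil, fmt.Errorf("unsupported selector %q", s)
	}
	tag, class, hasClass := strings.Cut(s, ".")
	if hasClass && (class == "" || strings.Contains(class, ".")) {
		return nil, fmt.Errorf("unsupported selector %q", s)
	}
	return func(n *Node) bool {
		if tag != "" && n.Tag != tag {
			return false
		}
		if !hasClass {
			return true
		}
		for _, c := range n.Classes {
			if c == class {
				return true
			}
		}
		return false
	}, nil
}

// MatchAll walks the tree depth-first and collects every matching node.
func (s Selector) MatchAll(root *Node) []*Node {
	var out []*Node
	var walk func(*Node)
	walk = func(n *Node) {
		if s(n) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	doc := &Node{Tag: "body", Children: []*Node{
		{Tag: "p", Classes: []string{"note"}},
		{Tag: "div", Children: []*Node{{Tag: "p"}}},
	}}
	sel, err := Compile("p")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(len(sel.MatchAll(doc))) // prints 2
}
```

A real engine has to deal with combinators, attribute selectors, and pseudo-classes on top of this, which is the work a library like cascadia takes off your hands.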

Nokogiri is a Ruby gem for parsing and searching XML and HTML documents. It is built on top of the C library libxml2, which is known for its speed and reliability.

Its API is simple and intuitive, and it is widely used in the Ruby ecosystem for web scraping and data extraction.

One of Nokogiri's main features is the ability to search and navigate XML and HTML documents using either CSS selectors or XPath expressions.

Nokogiri also provides a variety of other features that can simplify the process of working with XML and HTML documents. It can automatically handle character encodings and normalize documents, it can parse and search large documents with low memory usage, and it can validate documents against a DTD or schema.

Highlights


css-selectors, xpath, popular

Example Use


```go
package main

import (
	"fmt"
	"strings"

	"github.com/andybalholm/cascadia"
	"golang.org/x/net/html"
)

func main() {
	// Create an HTML string (named htmlStr so it does not shadow the html package)
	htmlStr := `
<p>Hello, World!</p>
<p>Example</p>
`
	// Parse the HTML string into a node tree
	doc, err := html.Parse(strings.NewReader(htmlStr))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Compile the CSS selector
	sel, err := cascadia.Compile("p")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Use the Selector.MatchAll method to collect every matching element
	// (Selector.Match only reports whether a single node matches)
	matches := sel.MatchAll(doc)
	if len(matches) > 0 {
		fmt.Println(matches[0].FirstChild.Data) // > Hello, World!
	}
}
```

```ruby
require 'nokogiri'

# Sample markup; the class name "title" is a placeholder
html_string = '<html>
<head><title>Page Title</title></head>
<body>
<h1 class="title">Hello World!</h1>
<p>This is a sample webpage.</p>
</body>
</html>'

# Parse the HTML string
doc = Nokogiri::HTML(html_string)

# Extract the class attribute of the h1 tag using a CSS selector
h1_class = doc.css("h1")[0]['class']

# or XPath
h1_class = doc.xpath("//h1")[0]['class']
puts "H1 class: #{h1_class}"
```
