
cascadia vs html5lib

cascadia: BSD-2-Clause license; start date Feb 20 2018 (6 years ago); 58.1 thousand downloads per month.
html5lib: MIT License; latest version 1.1 (3 years ago); start date Jul 30 2007; 18.9 million downloads per month.

cascadia is a Go library that provides a CSS selector engine, letting you use CSS selectors to find elements in an HTML document.

It is built on top of the golang.org/x/net/html package (part of the Go sub-repositories, not the standard library) and works directly on the node trees that package produces. This makes it a more concise and powerful way to select elements than walking the node tree by hand.

html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, which all major web browsers implement.
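The following minimal sketch shows what spec conformance means in practice. It assumes html5lib 1.x is installed, and the markup string is invented for illustration. The input omits the html, head and body tags and never closes its p elements. html5lib still builds the tree that the WHATWG parsing algorithm prescribes.

```python
import html5lib

# Malformed input: no <html>, <head> or <body> tags, and neither <p> is closed.
markup = "<title>T</title><p>one<p>two"

# With the default "etree" tree builder, parse() returns the <html> element.
# namespaceHTMLElements=False keeps tag names plain ("p" rather than
# "{http://www.w3.org/1999/xhtml}p").
root = html5lib.parse(markup, namespaceHTMLElements=False)

print(root.tag)                             # html
print([child.tag for child in root])        # ['head', 'body']
print(root.find("head/title").text)         # T
print([p.text for p in root.find("body")])  # ['one', 'two']
```

The implied head and body elements are created automatically. The title is placed in head, and the second p start tag closes the first p, giving two sibling paragraphs.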

Because html5lib is implemented in pure Python, it is significantly slower than alternatives powered by lxml (such as parsel, or Beautiful Soup with its lxml parser backend). However, html5lib follows the HTML5 parsing algorithm more faithfully, so it can represent the HTML tree, especially for malformed markup, more correctly than those alternatives.
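Misnested formatting tags are one case where faithful HTML5 parsing matters. The sketch below again assumes html5lib 1.x, and the input string is made up. It feeds html5lib the overlapping tags <b><i>x</b>y</i>. The WHATWG "adoption agency" algorithm resolves these into <b><i>x</i></b><i>y</i>, which is the structure the code prints.

```python
import html5lib

# "</b>" arrives while <i> is still open, so the tags are misnested.
root = html5lib.parse("<b><i>x</b>y</i>", namespaceHTMLElements=False)
body = root.find("body")

# The adoption agency algorithm closes <i> inside <b>, then reopens a fresh
# <i> for the text that follows: <b><i>x</i></b><i>y</i>
bold, italic = list(body)
print(bold.tag, [c.tag for c in bold], bold[0].text)  # b ['i'] x
print(italic.tag, italic.text)                        # i y
```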

Example Use


cascadia (Go):

package main

import (
	"fmt"
	"strings"

	"github.com/andybalholm/cascadia"
	"golang.org/x/net/html"
)

func main() {
	// Create an HTML string (named page so it does not shadow the html package)
	page := `<html>
	  <body>
	    <div id="content">
	      <p>Hello, World!</p>
	      <a href="http://example.com">Example</a>
	    </div>
	  </body>
	</html>`

	// Parse the HTML string into a node tree
	doc, err := html.Parse(strings.NewReader(page))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Compile the CSS selector
	sel, err := cascadia.Compile("p")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// MatchAll returns every node in the tree that matches the selector
	// (Match only reports whether one given node matches)
	matches := sel.MatchAll(doc)
	if len(matches) > 0 {
		// The first child of the <p> element is its text node
		fmt.Println(matches[0].FirstChild.Data)
		// > Hello, World!
	}
}
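
html5lib (Python):

from html5lib import parse

html_doc = "<html><head><title>My Title</title></head><body></body></html>"

# The default "etree" tree builder returns ElementTree objects, which have no
# getElementsByTagName(); the "dom" builder returns an xml.dom.minidom Document
parsed = parse(html_doc, treebuilder="dom")
title = parsed.getElementsByTagName("title")[0]
print(title.childNodes[0].nodeValue)
# > My Title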

Alternatives / Similar