Skip to content

html5libvscascadia

MIT 88 14 1,153
20.5 million (month) Jul 30 2007 1.1(5 years ago)
717 1 1 BSD-2-Clause
Feb 20 2018 58.1 thousand (month) Start(7 years ago)

html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.

As html5lib is implemented in pure-python it is significantly slower than alternatives powered by lxml (like parsel or beautifulsoup). However, html5lib implements a more true html5 parsing which can represent HTML tree more correctly than alternatives.

cascadia is a library for Go that provides a CSS selector engine, allowing you to use CSS selectors to select elements from an HTML document.

It is built on top of the html package in the Go standard library, and provides a more efficient and powerful way to select elements from an HTML document.

Example Use


import html5lib
from html5lib import parse

html_doc = "<html><head><title>My Title</title></head><body></body></html>"
parsed = parse(html_doc)
title = parsed.getElementsByTagName("title")[0]
print(title.childNodes[0].nodeValue)
package main

import (
  "fmt"
  "github.com/andybalholm/cascadia"
  "golang.org/x/net/html"
  "strings"
)

func main() {
  // Create an HTML string
  html := `<html>
        <body>
          <div id="content">
            <p>Hello, World!</p>
            <a href="http://example.com">Example</a>
          </div>
        </body>
      </html>`

  // Parse the HTML string into a node tree
  doc, err := html.Parse(strings.NewReader(html))
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Compile the CSS selector
  sel, err := cascadia.Compile("p")
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Use the Selector.Match method to select elements from the document
  matches := sel.Match(doc)
  if len(matches) > 0 {
    fmt.Println(matches[0].FirstChild.Data)
    // > Hello, World!
  }
}

Alternatives / Similar


Was this page helpful?