Skip to content

xpathvshtml5lib

MIT 13 2 646
58.1 thousand (month) Jun 08 2019 (a day ago)
1,092 14 83 MIT License
1.1(3 years ago) Jul 30 2007 18.9 million (month)

xpath is a library for Go that allows you to use XPath expressions to select elements from an HTML document. It is built on top of the html package in the Go standard library, and provides a way to select elements from an HTML document using XPath expressions, which are more powerful and expressive than CSS selectors.

html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.

As html5lib is implemented in pure-python it is significantly slower than alternatives powered by lxml (like parsel or beautifulsoup). However, html5lib implements a more true html5 parsing which can represent HTML tree more correctly than alternatives.

Example Use


package main

import (
  "fmt"
  "github.com/antchfx/xpath"
  "golang.org/x/net/html"
  "strings"
)

func main() {
  // Create an HTML string
  html := `<html>
        <body>
          <div id="content">
            <p>Hello, World!</p>
            <a href="http://example.com">Example</a>
          </div>
        </body>
      </html>`

  // Parse the HTML string into a node tree
  doc, err := html.Parse(strings.NewReader(html))
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Compile the XPath expression
  expr, err := xpath.Compile("//p")
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Use the Evaluate method to select elements from the document
  nodes, err := expr.Evaluate(xpath.NodeNavigator(doc))
  if err != nil {
    fmt.Println("Error:", err)
    return
  }
  if nodes.MoveNext() {
    fmt.Println(nodes.Current().Value())
    // > Hello, World!
  }
}
import html5lib
from html5lib import parse

html_doc = "<html><head><title>My Title</title></head><body></body></html>"
parsed = parse(html_doc)
title = parsed.getElementsByTagName("title")[0]
print(title.childNodes[0].nodeValue)

Alternatives / Similar