html5lib vs xpath

html5lib: MIT · 97 · 14 · 1,220

32.8 million (month) · Jul 30 2007 · 1.1 (2020-06-22)

xpath: 739 · 2 · 18 · MIT

Jun 08 2019 · 58.1 thousand (month) · v1.3.6 (2026-02-23)

html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers.
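The specification's error-recovery rules are visible on markup with unclosed tags. As a minimal sketch (assuming html5lib is installed, e.g. with `pip install html5lib`), an open `<p>` is closed implicitly when the next `<p>` starts, so the result is two sibling paragraphs inside an automatically created `<body>`, not nested ones:

```python
from html5lib import parse

# Two unclosed <p> tags; the parser also creates the missing <html>, <head> and <body>.
doc = parse("<p>One<p>Two", treebuilder="dom")
paragraphs = doc.getElementsByTagName("p")

print(len(paragraphs))                               # 2
print([p.firstChild.nodeValue for p in paragraphs])  # ['One', 'Two']
print(paragraphs[1].parentNode.tagName)              # body
```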

Because html5lib is implemented in pure Python, it is significantly slower than lxml-backed alternatives such as parsel, or BeautifulSoup with the lxml parser. In exchange, html5lib follows the HTML5 parsing algorithm faithfully, so the tree it builds from malformed markup matches the one a browser would build.
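One case where spec-conformant tree construction matters is text placed directly inside a `<table>` but outside any cell. The specification "foster-parents" such text, moving it in front of the table, which is where browsers render it. A minimal sketch, again assuming html5lib is installed and using its `dom` tree builder:

```python
from html5lib import parse

# "stray" sits inside <table> but not in a cell, so the parser moves it before the table.
doc = parse("<table><tr><td>cell</td></tr>stray</table>", treebuilder="dom")
body = doc.getElementsByTagName("body")[0]

print([n.nodeName for n in body.childNodes])   # ['#text', 'table']
print(body.firstChild.nodeValue)               # stray
print(len(doc.getElementsByTagName("tbody")))  # 1
```

The parser also inserts the implied `<tbody>` around the row, as the specification requires.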

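xpath (github.com/antchfx/xpath) is a Go package for selecting nodes from a document with XPath expressions. It does not parse HTML itself: it evaluates expressions against any tree that implements its NodeNavigator interface, and the companion package github.com/antchfx/htmlquery supplies that interface for trees parsed by golang.org/x/net/html (one of the Go extended libraries, not part of the standard library). XPath expressions are more expressive than CSS selectors; for example, `//p[contains(text(), 'World')]` selects paragraphs by their text content, which no CSS selector can do.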

Example Use

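```python
from html5lib import parse

html_doc = "<title>My Title</title>"

# treebuilder="dom" returns an xml.dom.minidom Document, which provides
# getElementsByTagName(); the default "etree" tree builder does not.
parsed = parse(html_doc, treebuilder="dom")
title = parsed.getElementsByTagName("title")[0]
print(title.childNodes[0].nodeValue)  # My Title
```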
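html5lib's default tree builder is `etree`: `parse()` then returns an `xml.etree.ElementTree` element for `<html>` rather than a DOM document, so DOM methods such as `getElementsByTagName()` are not available. HTML elements are also placed in the XHTML namespace unless `namespaceHTMLElements=False` is passed. A short sketch of both variants, assuming html5lib is installed:

```python
from html5lib import parse

html_doc = "<title>My Title</title>"
XHTML = "{http://www.w3.org/1999/xhtml}"

# Default "etree" builder: tag names carry the XHTML namespace.
root = parse(html_doc)
print(root.tag)                            # {http://www.w3.org/1999/xhtml}html
print(root.find(f".//{XHTML}title").text)  # My Title

# namespaceHTMLElements=False gives plain tag names instead.
plain = parse(html_doc, namespaceHTMLElements=False)
print(plain.find(".//title").text)         # My Title
```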

```go
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/htmlquery"
	"github.com/antchfx/xpath"
	"golang.org/x/net/html"
)

func main() {
	// Create an HTML string (named src so it does not shadow the html package)
	src := `<html><body>
<p>Hello, World!</p>
<div>Example</div>
</body></html>`

	// Parse the HTML string into a node tree
	doc, err := html.Parse(strings.NewReader(src))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Compile the XPath expression once; the compiled Expr can be reused
	expr, err := xpath.Compile("//p")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// xpath evaluates against a NodeNavigator; htmlquery provides one for *html.Node.
	// Select returns an iterator over the matching nodes.
	nodes := expr.Select(htmlquery.CreateXPathNavigator(doc))
	if nodes.MoveNext() {
		fmt.Println(nodes.Current().Value()) // > Hello, World!
	}
}
```