xpathvscssselect
xpath is a library for Go that allows you to use XPath expressions to select elements from an HTML document. It is built on top of the html package in the Go standard library, and provides a way to select elements from an HTML document using XPath expressions, which are more powerful and expressive than CSS selectors.
cssselect is a BSD-licensed Python library to parse CSS3 selectors and translate them to XPath 1.0 expressions.
XPath 1.0 expressions can be used in lxml or another XPath engine to find the matching elements in an XML or HTML document.
cssselect is used by other popular Python packages like parsel
and scrapy
but can also be used on it's own to generate
valid XPath 1.0 expressions for parsing HTML and XML documents in other tools.
Note that because XPath selectors are more powerful than CSS selectors this translation is only possible one way. Converting XPath to CSS selectors is impractical and not supported by cssselect.
Example Use
package main
import (
"fmt"
"github.com/antchfx/xpath"
"golang.org/x/net/html"
"strings"
)
func main() {
// Create an HTML string
html := `<html>
<body>
<div id="content">
<p>Hello, World!</p>
<a href="http://example.com">Example</a>
</div>
</body>
</html>`
// Parse the HTML string into a node tree
doc, err := html.Parse(strings.NewReader(html))
if err != nil {
fmt.Println("Error:", err)
return
}
// Compile the XPath expression
expr, err := xpath.Compile("//p")
if err != nil {
fmt.Println("Error:", err)
return
}
// Use the Evaluate method to select elements from the document
nodes, err := expr.Evaluate(xpath.NodeNavigator(doc))
if err != nil {
fmt.Println("Error:", err)
return
}
if nodes.MoveNext() {
fmt.Println(nodes.Current().Value())
// > Hello, World!
}
}
from cssselect import GenericTranslator, SelectorError
translator = GenericTranslator()
try:
expression = translator.css_to_xpath('div.content')
print(expression)
'descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]'
except SelectorError as e:
print(f'Invalid selector {e}')