xpath vs cssselect

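xpath is a Go library that compiles and evaluates XPath expressions. It is a standalone expression engine: to select elements from an HTML document, it needs a parsed node tree, such as the one produced by golang.org/x/net/html (an extension package, not part of the Go standard library), exposed to the engine through a node navigator. XPath expressions are more expressive than CSS selectors; for example, they can step to a parent or preceding sibling, which standard CSS3 selectors cannot express.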

cssselect is a BSD-licensed Python library to parse CSS3 selectors and translate them to XPath 1.0 expressions.

XPath 1.0 expressions can be used in lxml or another XPath engine to find the matching elements in an XML or HTML document.

cssselect is used by other popular Python packages such as parsel and scrapy, but it can also be used on its own to generate valid XPath 1.0 expressions for parsing HTML and XML documents in other tools.

Note that because XPath selectors are more powerful than CSS selectors, this translation is only possible one way. Converting XPath to CSS selectors is impractical and not supported by cssselect.
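One way to see the asymmetry is a query that steps back up the tree. The sketch below uses Python's standard-library `xml.etree.ElementTree`, swapped in here only to keep the example dependency-free; it supports just a limited XPath subset, and the sample markup is made up. It selects the parent of a link by the link's text, something standard CSS3 selectors have no syntax for.

```python
import xml.etree.ElementTree as ET

# Sample markup, made up for this illustration.
root = ET.fromstring(
    "<html><body>"
    '<div id="content">'
    "<p>Hello, World!</p>"
    '<a href="http://example.com">Example</a>'
    "</div>"
    "</body></html>"
)

# Find every <a> whose text is exactly "Example", then step up to its parent.
# The [.='text'] predicate requires Python 3.7 or later.
parents = root.findall(".//a[.='Example']/..")
parent_ids = [element.get("id") for element in parents]
print(parent_ids)
```

A full XPath 1.0 engine such as lxml goes further (other axes, string functions, attribute results), which is why the reverse translation is not attempted.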

Example Use


package main

import (
  "fmt"
  "strings"

  // htmlquery is the companion package that adapts parsed HTML node
  // trees (golang.org/x/net/html) to the xpath.NodeNavigator interface.
  "github.com/antchfx/htmlquery"
  "github.com/antchfx/xpath"
)

func main() {
  // HTML source to query (named "page" so it does not shadow a package name)
  page := `<html>
        <body>
          <div id="content">
            <p>Hello, World!</p>
            <a href="http://example.com">Example</a>
          </div>
        </body>
      </html>`

  // Parse the HTML string into a node tree
  doc, err := htmlquery.Parse(strings.NewReader(page))
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Compile the XPath expression
  expr, err := xpath.Compile("//p")
  if err != nil {
    fmt.Println("Error:", err)
    return
  }

  // Select returns an iterator over the matching nodes; the navigator
  // exposes the parsed tree to the XPath engine
  nodes := expr.Select(htmlquery.CreateXPathNavigator(doc))
  if nodes.MoveNext() {
    fmt.Println(nodes.Current().Value())
    // > Hello, World!
  }
}

from cssselect import GenericTranslator, SelectorError

translator = GenericTranslator()
try:
    expression = translator.css_to_xpath('div.content')
    print(expression)
    # > descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]
except SelectorError as e:
    print(f'Invalid selector: {e}')
