cascadia vs parse5
cascadia is a Go library that provides a CSS selector engine, allowing you to use CSS selectors to select elements from an HTML document.
It is built on top of the golang.org/x/net/html package (a supplementary Go package, not part of the standard library) and matches selectors against the node tree that package produces, so you do not have to walk the tree by hand.
parse5 is a Node.js library for parsing and serializing HTML documents. It follows the WHATWG HTML specification, is designed to be fast and flexible, and is commonly used in web scraping and web development projects.
parse5 is used by popular libraries such as Angular, Lit, Cheerio and many more. Unlike Cheerio, parse5 is a low-level HTML parsing library, so it can be used directly in web scraping when no higher-level abstraction is needed.
Example Use
// cascadia example (Go)
package main

import (
	"fmt"
	"strings"

	"github.com/andybalholm/cascadia"
	"golang.org/x/net/html"
)

func main() {
	// Create an HTML string (not named "html", which would shadow the html package)
	htmlStr := `<html>
<body>
<div id="content">
<p>Hello, World!</p>
<a href="http://example.com">Example</a>
</div>
</body>
</html>`
	// Parse the HTML string into a node tree
	doc, err := html.Parse(strings.NewReader(htmlStr))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	// Compile the CSS selector
	sel, err := cascadia.Compile("p")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	// Selector.MatchAll returns every matching node in the tree;
	// Selector.Match only reports whether a single node matches.
	matches := sel.MatchAll(doc)
	if len(matches) > 0 {
		fmt.Println(matches[0].FirstChild.Data)
		// > Hello, World!
	}
}
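// parse5 example (Node.js)
const parse5 = require("parse5");
// parse string
const document = parse5.parse('<html><body>Hello World!</body></html>');
console.log(document);
// the HTML tree can be traversed as plain JavaScript objects:
// document -> <html> -> [<head>, <body>] (the parser inserts the implied <head>)
const htmlElement = document.childNodes[0];
const body = htmlElement.childNodes[1];
console.log(body.childNodes[0].value); // "Hello World!"
// and modified: nodes are plain objects without DOM methods such as appendChild,
// so attach the new node by updating childNodes and parentNode directly
const fragment = parse5.parseFragment('<p>New Element</p>');
const newElement = fragment.childNodes[0];
newElement.parentNode = body;
body.childNodes.push(newElement);
console.log(parse5.serialize(document));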
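Because parse5 has no selector engine of its own, scraping with it usually means writing a small tree walker. The sketch below shows one way to do that. The helpers findAll and textOf are hypothetical names, not part of parse5, and they assume the node shape of parse5's default tree adapter (tagName and childNodes on elements, nodeName "#text" and value on text nodes). The sample tree is written by hand, trimmed to those fields, so the sketch runs without parse5 installed; in real use you would pass the result of parse5.parse instead.

```javascript
// Hypothetical helper: depth-first search for elements with a given tag name.
// Assumes parse5's default tree adapter shape (tagName, childNodes).
function findAll(node, tagName, found = []) {
  if (node.tagName === tagName) found.push(node);
  for (const child of node.childNodes || []) findAll(child, tagName, found);
  return found;
}

// Hypothetical helper: concatenate the text of all descendant text nodes.
function textOf(node) {
  if (node.nodeName === "#text") return node.value;
  return (node.childNodes || []).map(textOf).join("");
}

// Hand-written stand-in for the tree parse5.parse would build from
// '<html><body><p>Hello</p><p>World</p></body></html>',
// reduced to the fields the helpers above read.
const sampleDocument = {
  nodeName: "#document",
  childNodes: [{
    nodeName: "html", tagName: "html",
    childNodes: [
      { nodeName: "head", tagName: "head", childNodes: [] },
      { nodeName: "body", tagName: "body", childNodes: [
        { nodeName: "p", tagName: "p", childNodes: [{ nodeName: "#text", value: "Hello" }] },
        { nodeName: "p", tagName: "p", childNodes: [{ nodeName: "#text", value: "World" }] },
      ] },
    ],
  }],
};

// Collect the text of every <p> element.
const paragraphs = findAll(sampleDocument, "p").map(textOf);
console.log(paragraphs);
```

This is roughly the work that cascadia's selector matching does for you on the Go side, which is why parse5 is often paired with a higher-level library when selectors get more complex than a tag name.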