Skip to content

htmlqueryvsparse5

MIT 8 1 693
58.1 thousand (month) Feb 07 2019 v1.3.0(1 year, 2 months ago)
3,539 6 27 MIT
7.1.2(8 months ago) Jul 03 2013 173.6 million (month)

htmlquery is a Go library that allows you to parse and extract data from HTML documents using XPath expressions. It provides a simple and intuitive API for traversing and querying the HTML tree structure, and it is built on top of the popular Goquery library.

parse5 is a Node.js library for parsing and manipulating HTML and XML documents. It is designed to be fast and flexible, and it is commonly used in web scraping and web development projects.

parse5 is used by popular libraries such as Angular, Lit, Cheerio and many more. Unlike Cheerio parse5 is a low level html parsing library that might be useful directly in web scraping without higher level abstraction.

Example Use


package main

import (
  "fmt"
  "log"

  "github.com/antchfx/htmlquery"
)

func main() {
  // Parse the HTML string
  doc, err := htmlquery.Parse([]byte(`
    <html>
      <body>
        <h1>Hello, World!</h1>
        <ul>
          <li>Item 1</li>
          <li>Item 2</li>
          <li>Item 3</li>
        </ul>
      </body>
    </html>
  `))
  if err != nil {
    log.Fatal(err)
  }

  // Extract the text of the first <h1> element
  h1 := htmlquery.FindOne(doc, "//h1")
  fmt.Println(htmlquery.InnerText(h1)) // "Hello, World!"

  // Extract the text of all <li> elements
  lis := htmlquery.Find(doc, "//li")
  for _, li := range lis {
    fmt.Println(htmlquery.InnerText(li))
  }
  // "Item 1"
  // "Item 2"
  // "Item 3"
}
const parse5 = require("parse5");

// parse string
const document = parse5.parse('<html><body>Hello World!</body></html>');
console.log(document);

// html tree can be traversed as javascript object:
const body = document.childNodes[1];
console.log(body.childNodes[0].value); // "Hello World!"

// and modified
const newElement = parse5.parseFragment('<p>New Element</p>');
body.appendChild(newElement.childNodes[0]);
console.log(parse5.serialize(document)); 

Alternatives / Similar