
jsdom vs htmlquery

MIT 412 30 21,552
263.7 million (month) Nov 21 2011 29.0.2(2026-04-07 03:38:38 ago)
781 1 8 MIT
Feb 07 2019 58.1 thousand (month) v1.3.6(2026-03-06 04:46:15 ago)

jsdom is a pure JavaScript implementation of web standards, notably the WHATWG DOM and HTML standards, for use with Node.js. It simulates a browser environment in Node.js, allowing you to parse HTML, manipulate the DOM, and interact with web pages using the same APIs available in web browsers.

Key features for web scraping:

  • Full DOM implementation: Provides document.querySelector, document.querySelectorAll, and other standard DOM methods for traversing and manipulating parsed HTML.
  • Browser-like environment: Simulates window, document, navigator, and other browser globals, so code written for browsers can run in Node.js.
  • JavaScript execution: Can execute JavaScript embedded in HTML pages, including external scripts, when script execution is enabled through the runScripts option. This makes it possible to process pages that generate content dynamically, though much more slowly than a real browser.
  • Standards-compliant parsing: Uses the same HTML parsing algorithm as web browsers (the WHATWG HTML specification), so malformed HTML is handled the way a browser would handle it.
  • Cookie support: Uses the tough-cookie library to store and send cookies across requests.

For web scraping, jsdom is useful when you need more than simple CSS selector matching (what cheerio provides) but don't need a full browser. It's ideal for parsing complex HTML and running simple inline scripts without the overhead of Playwright or Puppeteer. However, for heavy JavaScript-rendered pages, a real browser automation tool is recommended.

htmlquery is a Go library for parsing HTML documents and extracting data from them with XPath expressions. It provides a simple API for traversing and querying the HTML tree, and it is built on the golang.org/x/net/html parser and the antchfx/xpath package.

Highlights


popular, css-selectors

Example Use


```javascript
const { JSDOM } = require('jsdom');

// Parse an HTML string
const html = `
<html>
<body>
  <div class="product">
    <h2>Product A</h2>
    <span class="price">$10.99</span>
  </div>
  <div class="product">
    <h2>Product B</h2>
    <span class="price">$24.99</span>
  </div>
</body>
</html>
`;

const dom = new JSDOM(html);
const document = dom.window.document;

// Use standard DOM APIs to extract data
const products = document.querySelectorAll('.product');
products.forEach(product => {
  const name = product.querySelector('h2').textContent;
  const price = product.querySelector('.price').textContent;
  console.log(`${name}: ${price}`);
});

// Fetch and parse a remote page
JSDOM.fromURL('https://example.com').then(dom => {
  const title = dom.window.document.title;
  console.log('Page title:', title);
});
```

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/antchfx/htmlquery"
)

func main() {
	// Parse the HTML string (htmlquery.Parse expects an io.Reader)
	doc, err := htmlquery.Parse(strings.NewReader(`<html>
<body>
	<h1>Hello, World!</h1>
	<ul>
		<li>Item 1</li>
		<li>Item 2</li>
		<li>Item 3</li>
	</ul>
</body>
</html>`))
	if err != nil {
		log.Fatal(err)
	}

	// Extract the text of the first <h1> element
	h1 := htmlquery.FindOne(doc, "//h1")
	fmt.Println(htmlquery.InnerText(h1)) // "Hello, World!"

	// Extract the text of all <li> elements
	lis := htmlquery.Find(doc, "//li")
	for _, li := range lis {
		fmt.Println(htmlquery.InnerText(li))
	}
	// "Item 1"
	// "Item 2"
	// "Item 3"
}
```
