
jsdom vs pyquery

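jsdom: MIT license; 21,552 GitHub stars; 263.7 million downloads per month; first release Nov 21 2011; latest version 29.0.2 (2026-04-07)

pyquery: license NOASSERTION; 2,381 GitHub stars; 2.0 million downloads per month; first release Dec 05 2008; latest version 2.0.1 (2024-08-30)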

jsdom is a pure JavaScript implementation of web standards, notably the WHATWG DOM and HTML standards, for use with Node.js. It simulates a browser environment in Node.js, allowing you to parse HTML, manipulate the DOM, and interact with web pages using the same APIs available in web browsers.

Key features for web scraping:

  • Full DOM implementation: provides document.querySelector, document.querySelectorAll, and other standard DOM methods for traversing and manipulating parsed HTML.
  • Browser-like environment: simulates window, document, navigator, and other browser globals, so code written for browsers can run in Node.js.
  • JavaScript execution: can execute JavaScript embedded in HTML pages, including external scripts, which makes it possible to process pages that generate content dynamically (though much more slowly than a real browser). Script execution is off by default: it is enabled with the runScripts: "dangerously" option, and external scripts are only loaded when resources: "usable" is also set.
  • Standards-compliant parsing: uses the same HTML parsing algorithm as web browsers (defined in the WHATWG HTML specification), so malformed HTML is parsed the way a browser would parse it.
  • Cookie support: uses the tough-cookie library to handle cookies across requests.

For web scraping, jsdom is useful when you need more than simple CSS selector matching (what cheerio provides) but don't need a full browser. It's ideal for parsing complex HTML and running simple inline scripts without the overhead of Playwright or Puppeteer. However, for heavy JavaScript-rendered pages, a real browser automation tool is recommended.

PyQuery is a Python library for working with XML and HTML documents. It fills a similar role to BeautifulSoup and is often used as an alternative to it, although its API is different, so it is not a drop-in replacement.

PyQuery is inspired by JavaScript's jQuery library and uses a similar API, selecting HTML nodes through CSS selectors. This makes PyQuery easy to pick up for developers who already know jQuery.

Like jQuery, PyQuery does not support XPath selectors and relies entirely on CSS selectors. It covers the usual HTML parsing tasks: selecting HTML elements, reading their attributes and text, and modifying the HTML tree.

PyQuery can also fetch web pages by itself: pass a url= argument and it downloads the page (using requests when it is installed, otherwise urllib) before parsing it.

Highlights

jsdom: popular, css-selectors
pyquery: css-selectors

Example Use


```javascript
const { JSDOM } = require('jsdom');

// Parse an HTML string
const html = `
<html>
  <body>
    <div class="product">
      <h2>Product A</h2>
      <span class="price">$10.99</span>
    </div>
    <div class="product">
      <h2>Product B</h2>
      <span class="price">$24.99</span>
    </div>
  </body>
</html>
`;

const dom = new JSDOM(html);
const document = dom.window.document;

// Use standard DOM APIs to extract data
const products = document.querySelectorAll('.product');
products.forEach(product => {
  const name = product.querySelector('h2').textContent;
  const price = product.querySelector('.price').textContent;
  console.log(`${name}: ${price}`);
});

// Fetch and parse a remote page
JSDOM.fromURL('https://example.com')
  .then(dom => {
    const title = dom.window.document.title;
    console.log('Page title:', title);
  })
  .catch(err => console.error('Request failed:', err));
```

```python
from pyquery import PyQuery as pq

# this is our HTML page:
html = """
<html>
  <head><title>Hello World!</title></head>
  <body>
    <div id="product">
      <h1>Product Title</h1>
      <p>paragraph 1</p>
      <p>paragraph2</p>
      <span class="price">$10</span>
    </div>
  </body>
</html>
"""

doc = pq(html)

# we can use CSS selectors:
print(doc('#product .price').text())
# "$10"

# it's also possible to modify the HTML tree in various ways:
# insert text into the selected element:
print(doc('h1').append('discounted'))
# <h1>Product Titlediscounted</h1>

# or remove elements:
doc('p').remove()
print(doc('#product').html())
# <h1>Product Titlediscounted</h1>
# <span class="price">$10</span>

# pyquery can also retrieve web documents using requests:
doc = pq(url='http://httpbin.org/html', headers={"User-Agent": "webscraping.fyi"})
print(doc('h1').html())
```
