Skip to content

jsdomvsembed

MIT 412 30 21,552
263.7 million (month) Nov 21 2011 29.0.2(2026-04-07 03:38:38 ago)
2,103 6 71 MIT
Oct 26 2013 5.2 thousand (month) v4.4.15(2025-01-02 16:53:09 ago)

jsdom is a pure JavaScript implementation of web standards, notably the WHATWG DOM and HTML standards, for use with Node.js. It simulates a browser environment in Node.js, allowing you to parse HTML, manipulate the DOM, and interact with web pages using the same APIs available in web browsers.

Key features for web scraping:

  • Full DOM implementation Provides document.querySelector, document.querySelectorAll, and other standard DOM methods for traversing and manipulating parsed HTML.
  • Browser-like environment Simulates window, document, navigator, and other browser globals, enabling code that was written for browsers to run in Node.js.
  • JavaScript execution Can execute JavaScript embedded in HTML pages, including external scripts, making it possible to process pages that generate content dynamically (though much slower than a real browser).
  • Standards-compliant parsing Uses the same HTML parsing algorithm as web browsers (the WHATWG HTML specification), ensuring accurate handling of malformed HTML.
  • Cookie support Implements the tough-cookie library for cookie handling across requests.

For web scraping, jsdom is useful when you need more than simple CSS selector matching (what cheerio provides) but don't need a full browser. It's ideal for parsing complex HTML and running simple inline scripts without the overhead of Playwright or Puppeteer. However, for heavy JavaScript-rendered pages, a real browser automation tool is recommended.

PHP library to get information from any web page (using oembed, opengraph, twitter-cards, scrapping the html, etc). It's compatible with any web service (youtube, vimeo, flickr, instagram, etc) and has adapters to some sites like (archive.org, github, facebook, etc).

Highlights


popularcss-selectors

Example Use


```javascript const { JSDOM } = require('jsdom'); // Parse an HTML string const html = `

Product A

$10.99

Product B

$24.99

</body>

`;

const dom = new JSDOM(html); const document = dom.window.document;

// Use standard DOM APIs to extract data const products = document.querySelectorAll('.product'); products.forEach(product => { const name = product.querySelector('h2').textContent; const price = product.querySelector('.price').textContent; console.log(${name}: ${price}); });

// Fetch and parse a remote page JSDOM.fromURL('https://example.com').then(dom => { const title = dom.window.document.title; console.log('Page title:', title); }); ```

```javascript use Embed\Embed;

$embed = new Embed();

//Load any url: $info = $embed->get('https://www.youtube.com/watch?v=PP1xn5wHtxE');

//Get content info

$info->title; //The page title $info->description; //The page description $info->url; //The canonical url $info->keywords; //The page keywords

$info->image; //The thumbnail or main image

$info->code->html; //The code to embed the image, video, etc $info->code->width; //The exact width of the embed code (if exists) $info->code->height; //The exact height of the embed code (if exists) $info->code->ratio; //The aspect ratio (width/height)

$info->authorName; //The resource author $info->authorUrl; //The author url

$info->cms; //The cms used $info->language; //The language of the page $info->languages; //The alternative languages

$info->providerName; //The provider name of the page (Youtube, Twitter, Instagram, etc) $info->providerUrl; //The provider url $info->icon; //The big icon of the site $info->favicon; //The favicon of the site (an .ico file or a png with up to 32x32px)

$info->publishedTime; //The published time of the resource $info->license; //The license url of the resource $info->feeds; //The RSS/Atom feeds ```

Alternatives / Similar


Was this page helpful?