Skip to content

parse5vsnokogiri

MIT 27 6 3,539
173.6 million (month) Jul 03 2013 7.1.2(8 months ago)
6,097 21 128 MIT
1.16.3(21 days ago) Jul 25 2009 4.4 million (month)

parse5 is a Node.js library for parsing and manipulating HTML and XML documents. It is designed to be fast and flexible, and it is commonly used in web scraping and web development projects.

parse5 is used by popular libraries such as Angular, Lit, Cheerio and many more. Unlike Cheerio parse5 is a low level html parsing library that might be useful directly in web scraping without higher level abstraction.

Nokogiri is a Ruby gem that provides a simple and powerful way to parse and search XML and HTML documents. It is built on top of the underlying C library libxml2, which is known for its speed and reliability.

Nokogiri provides a simple and intuitive API for parsing and searching XML and HTML documents, and it is widely used in the Ruby ecosystem for web scraping and data extraction.

One of the main features of Nokogiri is its ability to search and navigate through XML and HTML documents using a CSS or XPath selectors.

Nokogiri also provides a variety of other features that can simplify the process of working with XML and HTML documents. It can automatically handle character encodings and normalize documents, it can parse and search large documents with low memory usage, and it can validate documents against a DTD or schema.

Highlights


css-selectorsxpathpopular

Example Use


const parse5 = require("parse5");

// parse string
const document = parse5.parse('<html><body>Hello World!</body></html>');
console.log(document);

// html tree can be traversed as javascript object:
const body = document.childNodes[1];
console.log(body.childNodes[0].value); // "Hello World!"

// and modified
const newElement = parse5.parseFragment('<p>New Element</p>');
body.appendChild(newElement.childNodes[0]);
console.log(parse5.serialize(document)); 
require 'nokogiri'

html_string = '<html><head><title>Page Title</title></head><body><h1 class="header-class">Hello World!</h1><p>This is a sample webpage.</p></body></html>'

# Parse the HTML string
doc = Nokogiri::HTML(html_string)

# Extract the class attribute of h1 tag using CSS selector
h1_class = doc.css("h1")[0]['class']
# or XPath
h1_class = doc.xpath("//h1")[0]['class']
puts "H1 class: #{h1_class}"

Alternatives / Similar