cheerio vs Nokogiri
cheerio is a popular JavaScript library for parsing and manipulating HTML and XML documents, much as you would with jQuery in a browser. It is a fast, flexible, and lean implementation of core jQuery designed specifically for the server.
One of the main benefits of cheerio is its jQuery-like syntax for navigating and manipulating the Document Object Model (DOM) of an HTML or XML document, which makes it easy to pick up for anyone who already knows jQuery.
cheerio supports CSS selectors but does not support XPath.
Nokogiri is a Ruby gem that provides a simple and powerful way to parse and search XML and HTML documents. It is built on top of the underlying C library libxml2, which is known for its speed and reliability.
Its API is intuitive, and it is widely used in the Ruby ecosystem for web scraping and data extraction.
One of the main features of Nokogiri is its ability to search and navigate XML and HTML documents using either CSS selectors or XPath expressions.
Nokogiri also provides a variety of other features that can simplify the process of working with XML and HTML documents. It can automatically handle character encodings and normalize documents, it can parse and search large documents with low memory usage, and it can validate documents against a DTD or schema.
Example Use
const cheerio = require('cheerio');
const $ = cheerio.load('<html><head><title>My title</title></head><body><h1 class="name">Hello World!</h1></body></html>');
// use css selectors
console.log($('title').text()); // My title
console.log($('.name').text()); // Hello World!
// select multiple elements (a separate name avoids redeclaring $)
const $list = cheerio.load('<html><body><ul><li>item 1</li><li>item 2</li></ul></body></html>');
$list('li').each(function(i, elem) {
  console.log($list(elem).text());
});
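// modify elements (a separate name avoids redeclaring $)
const $page = cheerio.load('<html><body><h1>Hello World!</h1></body></html>');
$page('h1').text('Hello, Cheerio!');
// print the modified document as an HTML string
console.log($page.html());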
require 'nokogiri'
html_string = '<html><head><title>Page Title</title></head><body><h1 class="header-class">Hello World!</h1><p>This is a sample webpage.</p></body></html>'
# Parse the HTML string
doc = Nokogiri::HTML(html_string)
# Extract the class attribute of h1 tag using CSS selector
h1_class = doc.css("h1")[0]['class']
# or XPath
h1_class = doc.xpath("//h1")[0]['class']
puts "H1 class: #{h1_class}"