pyquery vs scrapling

NOASSERTION 55 5 2,381
2.0 million (month) Dec 05 2008 2.0.1(2024-08-30 08:12:22 ago)
36,206 2 7 BSD-3-Clause
Aug 01 2024 397.4 thousand (month) 0.4.5(2026-04-07 04:22:27 ago)

PyQuery is a Python library for working with XML and HTML documents. It is similar to BeautifulSoup and is often used as an alternative to it, although the two APIs differ.

PyQuery is inspired by JavaScript's jQuery and offers a similar API for selecting HTML nodes with CSS selectors. This makes it easy for developers who are already familiar with jQuery to use PyQuery in Python.

PyQuery doesn't support XPath selectors and relies entirely on CSS selectors. It still offers jQuery-like HTML parsing features: selecting HTML elements, reading their attributes and text, and modifying the HTML tree.

PyQuery also comes with an HTTP client (through requests), so it can load and parse web URLs by itself.

Scrapling is an adaptive web scraping framework for Python that introduces "self-healing" selectors: selectors that can track and find elements even when the website's DOM structure changes. This addresses one of the biggest maintenance headaches in web scraping, namely selectors that break after website updates.

Key features include:

  • Self-healing selectors: Scrapling uses smart element matching that can identify target elements even after the page structure changes. It builds a fingerprint of the element based on multiple attributes (text, position, siblings, attributes) and uses fuzzy matching to relocate it.
  • Multiple parsing backends: Supports different parsing engines, including lxml (fast) and a custom engine, allowing you to choose the right balance of speed and features.
  • Scrapy-like Spider API: Provides a familiar Spider class pattern for organizing crawling logic, similar to Scrapy but with the added benefit of adaptive selectors.
  • CSS and XPath selectors: Full support for CSS selectors and XPath, plus the adaptive matching system on top.
  • Type hints and modern Python: Built with full type annotations and 92% test coverage for reliability.
  • Async support: Supports asynchronous crawling for efficient concurrent scraping.
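
The async bullet above can be pictured with a small standard-library sketch. It is not Scrapling code: `fetch` is a hypothetical stand-in that only simulates a request, and `crawl` and `max_concurrency` are illustrative names. It shows the general pattern of running many fetches concurrently while capping how many are in flight at once.

```python
import asyncio


async def fetch(url: str) -> str:
    """Hypothetical stand-in for a real async HTTP request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><title>{url}</title></html>"


async def crawl(urls, max_concurrency=5):
    """Fetch all URLs concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        # The semaphore keeps the number of in-flight requests bounded.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


urls = [f"https://example.com/page/{i}" for i in range(1, 4)]
pages = asyncio.run(crawl(urls))
print(len(pages))
```

In a real crawler the stub would be replaced by the framework's own async fetcher; the concurrency cap is what keeps a large URL list from overwhelming the target site.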

Scrapling gained massive traction in 2025 as one of the most starred new Python scraping libraries. It is particularly useful for scraping targets that frequently update their HTML structure, where traditional selector-based scrapers would break.
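
To make the fingerprint-and-fuzzy-match idea more concrete, here is a minimal standard-library sketch. It is an illustration of the general technique only, not Scrapling's actual algorithm: the `Element` class, the `similarity` weights, and the `relocate` threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Hypothetical, simplified stand-in for a parsed HTML element."""
    tag: str
    text: str
    attrs: dict = field(default_factory=dict)
    position: int = 0  # index among siblings


def text_ratio(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved: Element, candidate: Element) -> float:
    """Combine tag, text, attribute and position similarity (weights are arbitrary)."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    text_score = text_ratio(saved.text, candidate.text)
    keys = set(saved.attrs) | set(candidate.attrs)
    if keys:
        attr_score = sum(
            text_ratio(saved.attrs.get(k, ""), candidate.attrs.get(k, ""))
            for k in keys
        ) / len(keys)
    else:
        attr_score = 1.0
    pos_score = 1.0 / (1 + abs(saved.position - candidate.position))
    return 0.2 * tag_score + 0.4 * text_score + 0.3 * attr_score + 0.1 * pos_score


def relocate(saved: Element, candidates: list, threshold: float = 0.6):
    """Return the best-scoring candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Fingerprint saved from an earlier version of the page.
saved = Element("span", "$10", {"class": "price"}, position=3)

# The page after a redesign: the class name, the price and the position changed.
new_page = [
    Element("h2", "Product Title", {"class": "title"}, position=0),
    Element("p", "paragraph 1", {}, position=1),
    Element("span", "$12", {"class": "product-price"}, position=2),
]

match = relocate(saved, new_page)
print(match.text if match else "no match")
```

A plain `.price` CSS selector would find nothing on the redesigned page, while the fingerprint still lands on the renamed price element because tag, text and class remain similar enough.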

Highlights


css-selectors
css-selectors, xpath, fast, popular

Example Use


```python
from pyquery import PyQuery as pq

# this is our HTML page:
html = """
<head>
  <title>Hello World!</title>
</head>
<body>
  <div id="product">
    <h1>Product Title</h1>
    <p>paragraph 1</p>
    <p>paragraph2</p>
    <span class="price">$10</span>
  </div>
</body>
"""
doc = pq(html)

# we can use CSS selectors:
print(doc('#product .price').text())
# prints: $10

# it's also possible to modify the HTML tree in various ways:
# insert text into the selected element:
print(doc('h1').append('discounted'))
# prints: <h1>Product Titlediscounted</h1>

# or remove elements:
doc('p').remove()
print(doc('#product').html())
# prints (whitespace aside):
#   <h1>Product Titlediscounted</h1>
#   <span class="price">$10</span>

# pyquery can also retrieve web documents using requests:
doc = pq(url='http://httpbin.org/html', headers={"User-Agent": "webscraping.fyi"})
print(doc('h1').html())
```
```python
# Names follow the older top-level API used in this example. Newer Scrapling
# releases import from scrapling.fetchers, rename PlayWrightFetcher to
# DynamicFetcher, and use adaptive=True in place of auto_match=True.
from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher

# Simple fetching with adaptive parsing
fetcher = Fetcher()
page = fetcher.get("https://example.com/products")

# CSS selectors work as expected
products = page.css(".product-card")
for product in products:
    # .text is a property, not a method
    name = product.css_first(".name").text
    price = product.css_first(".price").text
    print(f"{name}: {price}")

# Adaptive selector - finds the element even if the DOM changes
# Uses element fingerprinting for resilient matching
# (the fingerprint must have been stored earlier with auto_save=True)
elements = page.css(".product-card", auto_match=True)

# Stealth fetching with anti-bot bypass
stealth = StealthyFetcher()
page = stealth.fetch("https://protected-site.com")

# Playwright-based fetching for JS-rendered pages
pw = PlayWrightFetcher()
page = pw.fetch("https://spa-example.com", headless=True)
```
