
chopper vs scrapling

MIT 1 3 23
1.7 thousand (month) Jul 24 2014 0.6.0(2023-04-26 10:16:25 ago)
36,206 2 7 BSD-3-Clause
Aug 01 2024 397.4 thousand (month) 0.4.5(2026-04-07 04:22:27 ago)

Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules.

Compared to other HTML parsers, Chopper is designed to retain the original HTML tree but eliminate elements that do not match the parsing rules. This means we can extract HTML elements and keep their structure for machine learning or other tasks where the data structure is needed as well as the data values.
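To make the "keep the match, keep its ancestors, drop the rest" idea concrete, here is a minimal sketch using only Python's standard library. It is not Chopper's implementation: the `prune` helper and the sample markup are hypothetical, it assumes well-formed XML-style markup, and it ignores CSS rules entirely.

```python
import xml.etree.ElementTree as ET


def prune(root, keep_path, discard_path=None):
    """Prune `root` in place (hypothetical helper, not part of Chopper)."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    kept = {root}
    for match in root.findall(keep_path):
        kept.update(match.iter())      # the match and its whole subtree
        node = match
        while node in parent_of:       # every ancestor up to the root
            node = parent_of[node]
            kept.add(node)

    # Detach each element that is not kept but hangs off a kept parent.
    for child, parent in parent_of.items():
        if child not in kept and parent in kept:
            parent.remove(child)

    # Finally drop explicitly discarded elements, even inside kept subtrees.
    if discard_path:
        parent_of = {child: parent for parent in root.iter() for child in parent}
        for element in root.findall(discard_path):
            parent_of[element].remove(element)


DOC = (
    "<html><body>"
    "<div id='header'>nav</div>"
    "<div id='main'><div class='iwantthis'>HELLO WORLD "
    "<a href='/nope'>Do not want</a></div></div>"
    "<div id='footer'/>"
    "</body></html>"
)

root = ET.fromstring(DOC)
prune(root, ".//div[@class='iwantthis']", ".//a")
result = ET.tostring(root, encoding="unicode")
print(result)
```

The kept `div` stays nested inside `div#main`, `body` and `html`, so a downstream consumer still sees where the element sat in the page, while the header, footer and link are gone.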

Scrapling is an adaptive web scraping framework for Python that introduces "self-healing" selectors — selectors that can track and find elements even when the website's DOM structure changes. This solves one of the biggest maintenance headaches in web scraping: broken selectors after website updates.

Key features include:

  • Self-healing selectors: Scrapling uses smart element matching that can identify target elements even after the page structure changes. It builds a fingerprint of the element based on multiple attributes (text, position, siblings, attributes) and uses fuzzy matching to relocate it.
  • Multiple parsing backends: Supports different parsing engines including lxml (fast) and a custom engine, allowing you to choose the right balance of speed and features.
  • Scrapy-like Spider API: Provides a familiar Spider class pattern for organizing crawling logic, similar to Scrapy but with the added benefit of adaptive selectors.
  • CSS and XPath selectors: Full support for CSS selectors and XPath, plus the adaptive matching system on top.
  • Type hints and modern Python: Built with full type annotations and 92% test coverage for reliability.
  • Async support: Supports asynchronous crawling for efficient concurrent scraping.
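The fingerprint-plus-fuzzy-matching idea behind self-healing selectors can be sketched with the standard library alone. This is an illustration of the general technique, not Scrapling's actual algorithm: the `Fingerprint` fields, the equal weighting of the four scores, and the 0.6 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical record of what an element looked like when last seen."""
    tag: str
    text: str
    attrs: tuple   # sorted (name, value) pairs
    path: tuple    # tag names from the root down to the parent


def ratio(a, b):
    # SequenceMatcher works on any sequence of hashable items (str or tuple).
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved, candidate):
    saved_attrs = " ".join(f"{k}={v}" for k, v in saved.attrs)
    cand_attrs = " ".join(f"{k}={v}" for k, v in candidate.attrs)
    scores = [
        1.0 if saved.tag == candidate.tag else 0.0,
        ratio(saved.text, candidate.text),
        ratio(saved_attrs, cand_attrs),
        ratio(saved.path, candidate.path),
    ]
    return sum(scores) / len(scores)   # equal weights: an assumption


def relocate(saved, candidates, threshold=0.6):
    """Return the most similar candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Fingerprint saved before the site was redesigned.
saved = Fingerprint("h1", "Product Title", (("class", "title"),),
                    ("html", "body", "div"))

# Elements found on the redesigned page: class names and nesting changed.
candidates = [
    Fingerprint("h1", "Product Title", (("class", "product-title"),),
                ("html", "body", "main", "section")),
    Fingerprint("p", "Free shipping on all orders", (("class", "banner"),),
                ("html", "body", "header")),
    Fingerprint("a", "Home", (("href", "/"),), ("html", "body", "nav")),
]

match = relocate(saved, candidates)
print(match)
```

A plain `.title` selector would miss the renamed heading, whereas the combined score still ranks it first because its tag and text are unchanged. The threshold keeps unrelated elements from being returned when the target has really disappeared.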

Scrapling gained massive traction in 2025 as one of the most starred new Python scraping libraries. It is particularly useful for scraping targets that frequently update their HTML structure, where traditional selector-based scrapers would break.

Highlights


css-selectors, xpath, fast, popular

Example Use


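```python
from chopper.extractor import Extractor

HTML = """
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <div id="main">
      <div class="iwantthis">
        HELLO WORLD
        <a href="/nope">Do not want</a>
      </div>
    </div>
    <div id="footer"></div>
  </body>
</html>
"""

CSS = """
div { border: 1px solid black; }
div#main { color: blue; }
div.iwantthis { background-color: red; }
a { color: green; }
div#footer { border-top: 2px solid red; }
"""

# Keep the matching div (with its ancestors) and drop any links inside it.
extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
html, css = extractor.extract(HTML, CSS)

# will result in:
#
# html
# <html>
#   <body>
#     <div id="main">
#       <div class="iwantthis">
#         HELLO WORLD
#       </div>
#     </div>
#   </body>
# </html>
#
# css
# div{border:1px solid black;}
# div#main{color:blue;}
# div.iwantthis{background-color:red;}
```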

```python
# Names follow the Scrapling 0.2.x API; later releases renamed some of them
# (for example, auto_match became adaptive), so check the docs for your version.
from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher

# Simple fetching with adaptive parsing
fetcher = Fetcher()
page = fetcher.get("https://example.com/products")

# CSS selectors work as expected
products = page.css(".product-card")
for product in products:
    name = product.css_first(".name").text    # .text is a property, not a method
    price = product.css_first(".price").text
    print(f"{name}: {price}")

# Adaptive selector: finds the element even if the DOM changes.
# Uses element fingerprinting for resilient matching; this assumes the element
# was saved on an earlier run with auto_save=True.
# ".product-title" is a placeholder selector for the "Product Title" element.
element = page.css_first(".product-title", auto_match=True)

# Stealth fetching with anti-bot bypass
stealth = StealthyFetcher()
page = stealth.fetch("https://protected-site.com")

# Playwright-based fetching for JS-rendered pages
pw = PlayWrightFetcher()
page = pw.fetch("https://spa-example.com", headless=True)
```
