Scrapling is an adaptive web scraping framework for Python that introduces "self-healing"
selectors — selectors that can track and find elements even when the website's DOM structure
changes. This solves one of the biggest maintenance headaches in web scraping: broken selectors
after website updates.
Key features include:
- Self-healing selectors
Scrapling uses smart element matching that can identify target elements even after the
page structure changes. It builds a fingerprint of the element based on multiple attributes
(text, position, siblings, attributes) and uses fuzzy matching to relocate it.
- Multiple parsing backends
Supports different parsing engines including lxml (fast) and a custom engine, allowing
you to choose the right balance of speed and features.
- Scrapy-like Spider API
Provides a familiar Spider class pattern for organizing crawling logic, similar to Scrapy
but with the added benefit of adaptive selectors.
- CSS and XPath selectors
Full support for CSS selectors and XPath, plus the adaptive matching system on top.
- Type hints and modern Python
Built with full type annotations and 92% test coverage for reliability.
- Async support
Supports asynchronous crawling for efficient concurrent scraping.
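
As a rough illustration of what concurrent crawling buys you, the sketch below fetches several pages at once while capping the number of requests in flight. It uses only the standard library's `asyncio`. The `fetch` coroutine is a stub that simulates latency, and the names `fetch`, `crawl`, and `max_concurrency` are illustrative assumptions, not Scrapling's own API.

```python
import asyncio


async def fetch(url: str) -> str:
    """Stub standing in for a real asynchronous HTTP request (assumption)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><title>{url}</title></html>"


async def crawl(urls: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        # The semaphore blocks here once max_concurrency fetches are running.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


urls = [f"https://example.com/page/{i}" for i in range(10)]
pages = asyncio.run(crawl(urls))
print(len(pages))  # 10
```

In a real crawler the stub would be replaced by an asynchronous fetcher call; the semaphore keeps the crawl polite regardless of which client is used.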
Scrapling gained significant traction in 2025 and was among the most-starred new Python
scraping libraries of that year. It is particularly useful for targets that frequently change
their HTML structure, where traditional selector-based scrapers would break.
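
The fingerprint-and-fuzzy-match idea behind self-healing selectors can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Scrapling's actual algorithm: the `Element` class, the four equally weighted signals, and the `0.6` threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the library really uses.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Hypothetical stand-in for a parsed DOM node."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    path: tuple = ()  # tag names from the root down to the parent


def _ratio(a: str, b: str) -> float:
    """Fuzzy string similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def _attr_string(el: Element) -> str:
    return " ".join(sorted(f"{k}={v}" for k, v in el.attrs.items()))


def similarity(saved: Element, candidate: Element) -> float:
    """Average several signals (tag, text, attributes, position) into one score."""
    scores = [
        1.0 if saved.tag == candidate.tag else 0.0,
        _ratio(saved.text, candidate.text),
        _ratio(_attr_string(saved), _attr_string(candidate)),
        _ratio("/".join(saved.path), "/".join(candidate.path)),
    ]
    return sum(scores) / len(scores)


def relocate(saved: Element, candidates: list, threshold: float = 0.6):
    """Return the candidate most similar to the saved fingerprint, or None."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Fingerprint stored before the site redesign
saved = Element("h1", "Product Title", {"class": "title"}, ("html", "body", "div"))

# Elements found on the redesigned page: class names and nesting have changed
candidates = [
    Element("h2", "Related items", {"class": "related"}, ("html", "body", "aside")),
    Element("h1", "Product Title", {"class": "product-title"},
            ("html", "body", "main", "section")),
    Element("p", "Free shipping over $50", {"class": "promo"}, ("html", "body", "main")),
]

match = relocate(saved, candidates)
print(match.attrs["class"])  # product-title
```

A plain `.title` CSS selector would find nothing on the redesigned page, whereas the combined score still singles out the renamed and re-nested heading.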
ralger is a small web scraping framework for R based on rvest and xml2.
Its goal is to simplify basic web scraping, and it provides a convenient, easy-to-use API.
It offers functions for retrieving pages, parsing HTML with CSS selectors, parsing tables automatically, and
extracting links, titles, images, and paragraphs.
```python
# Names below follow the Scrapling 0.2.x API; later releases may rename some
# of them, so check the documentation for your installed version.
from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher

# Simple fetching with adaptive parsing
fetcher = Fetcher()
page = fetcher.get("https://example.com/products")

# CSS selectors work as expected
products = page.css(".product-card")
for product in products:
    name = product.css_first(".name").text  # .text is a property, not a method
    price = product.css_first(".price").text
    print(f"{name}: {price}")

# Adaptive selector - finds the element even if the DOM changes.
# Uses element fingerprinting for resilient matching; the fingerprint must have
# been stored on an earlier run by passing auto_save=True.
# ".product-title" is a placeholder selector.
element = page.css(".product-title", auto_match=True)

# Stealth fetching with anti-bot bypass
stealth = StealthyFetcher()
page = stealth.fetch("https://protected-site.com")

# Playwright-based fetching for JS-rendered pages
pw = PlayWrightFetcher()
page = pw.fetch("https://spa-example.com", headless=True)
```
```r
library("ralger")
url <- "http://www.shanghairanking.com/rankings/arwu/2021"
# retrieve HTML and select elements using CSS selectors:
best_uni <- scrap(link = url, node = "a span", clean = TRUE)
head(best_uni, 5)
#> [1] "Harvard University"
#> [2] "Stanford University"
#> [3] "University of Cambridge"
#> [4] "Massachusetts Institute of Technology (MIT)"
#> [5] "University of California, Berkeley"
# ralger can also parse HTML attributes
attributes <- attribute_scrap(
  link = "https://ropensci.org/",
  node = "a",     # the a tag
  attr = "class"  # getting the class attribute
)
head(attributes, 10) # NA values are a tags without a class attribute
#>  [1] "navbar-brand logo" "nav-link"          NA
#>  [4] NA                  NA                  "nav-link"
#>  [7] NA                  "nav-link"          NA
#> [10] NA
# ralger can automatically scrape tables:
data <- table_scrap(link = "https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW")
head(data)
#> # A tibble: 6 × 4
#>    Rank Title                                      `Lifetime Gross`  Year
#>   <int> <chr>                                      <chr>            <int>
#> 1     1 Avatar                                     $2,847,397,339    2009
#> 2     2 Avengers: Endgame                          $2,797,501,328    2019
#> 3     3 Titanic                                    $2,201,647,264    1997
#> 4     4 Star Wars: Episode VII - The Force Awakens $2,069,521,700    2015
#> 5     5 Avengers: Infinity War                     $2,048,359,754    2018
#> 6     6 Spider-Man: No Way Home                    $1,901,216,740    2021
```