Scrapling vs xmltodict

Scrapling: BSD-3-Clause license, 36,206 stars, 397.4 thousand downloads/month, first released Aug 01 2024, latest version 0.4.5
xmltodict: MIT license, 5,734 stars, 114.5 million downloads/month, first released Jul 30 2007, latest version 1.0.4

Scrapling is an adaptive web scraping framework for Python that introduces "self-healing" selectors — selectors that can track and find elements even when the website's DOM structure changes. This solves one of the biggest maintenance headaches in web scraping: broken selectors after website updates.

Key features include:

  • Self-healing selectors: Scrapling uses smart element matching that can identify target elements even after the page structure changes. It builds a fingerprint of the element based on multiple attributes (text, position, siblings, attributes) and uses fuzzy matching to relocate it.
  • Multiple parsing backends: Supports different parsing engines including lxml (fast) and a custom engine, allowing you to choose the right balance of speed and features.
  • Scrapy-like Spider API: Provides a familiar Spider class pattern for organizing crawling logic, similar to Scrapy but with the added benefit of adaptive selectors.
  • CSS and XPath selectors: Full support for CSS selectors and XPath, plus the adaptive matching system on top.
  • Type hints and modern Python: Built with full type annotations and 92% test coverage for reliability.
  • Async support: Supports asynchronous crawling for efficient concurrent scraping.

Scrapling gained massive traction in 2025 as one of the most starred new Python scraping libraries. It is particularly useful for scraping targets that frequently update their HTML structure, where traditional selector-based scrapers would break.

xmltodict is a Python library that lets you work with XML data as if it were JSON: it parses XML documents into Python dictionaries, which can then be easily manipulated using standard dictionary operations.
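A quick illustration of the conversion rules: child elements become nested keys, element attributes become "@"-prefixed keys, and text content that coexists with attributes lands under a "#text" key:

```python
import xmltodict

# An element with attributes and text content
doc = xmltodict.parse('<book id="1" lang="en">The Great Gatsby</book>')

# Attributes map to "@"-prefixed keys; the element's text goes to "#text"
print(doc)
# {'book': {'@id': '1', '@lang': 'en', '#text': 'The Great Gatsby'}}
```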

You can also use the library to convert a dictionary back into an XML document. xmltodict is pure Python, built on the standard library's expat parser, and provides a simple, intuitive API for working with XML data.

Note that conversion speeds can be quite slow for large XML documents, so in web scraping xmltodict should be used to parse specific snippets rather than whole HTML documents.
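In practice that means cutting out just the fragment you need and converting only that piece. A minimal sketch, using a hypothetical RSS-style snippet as the extracted fragment:

```python
import xmltodict

# A small, self-contained fragment (e.g. one <item> cut out of a larger RSS feed)
snippet = """
<item>
  <title>Example post</title>
  <link>https://example.com/post</link>
</item>
"""

# Converting only the snippet keeps the parse fast and the dict shallow
item = xmltodict.parse(snippet)["item"]
print(item["title"])
# Example post
```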

xmltodict pairs well with JSON query tools like jmespath or jsonpath. Alternatively, it can be used in reverse: dictionaries built from JSON documents can be converted to XML with unparse() and then queried with HTML parsing tools like CSS selectors and XPath.

It can be installed via pip by running the `pip install xmltodict` command.

Highlights


css-selectors, xpath, fast, popular

Example Use


```python
from scrapling import Fetcher, StealthyFetcher, PlayWrightFetcher

# Simple fetching with adaptive parsing
fetcher = Fetcher()
page = fetcher.get("https://example.com/products")

# CSS selectors work as expected
products = page.css(".product-card")
for product in products:
    name = product.css_first(".name").text
    price = product.css_first(".price").text
    print(f"{name}: {price}")

# Adaptive selector - finds the element even if the DOM changes
# Uses element fingerprinting for resilient matching
element = page.find("Product Title", auto_match=True)

# Stealth fetching with anti-bot bypass
page = StealthyFetcher.fetch("https://protected-site.com")

# Playwright-based fetching for JS-rendered pages
page = PlayWrightFetcher.fetch("https://spa-example.com", headless=True)
```
```python
import xmltodict

xml_string = """
<book>
  <title>The Great Gatsby</title>
  <author>F. Scott Fitzgerald</author>
  <publisher>Charles Scribner's Sons</publisher>
  <publication_date>1925</publication_date>
</book>
"""

book_dict = xmltodict.parse(xml_string)
print(book_dict)
# {'book': {'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald',
#  'publisher': "Charles Scribner's Sons", 'publication_date': '1925'}}

# and to reverse:
book_xml = xmltodict.unparse(book_dict)
print(book_xml)

# the xml can be loaded and parsed using parsel or beautifulsoup:
from parsel import Selector

sel = Selector(book_xml)
print(sel.css('publication_date::text').get())
# '1925'
```
