dude: MIT license, version 0.1.3 (9 months ago), Feb 20 2022, 53 (month); other figures: 27, 2, 412
Photon: MIT license, version 1.1.9 (5 years ago), Aug 24 2018, 335 (month); other figures: 10,575, 3, 52

Dude (dude uncomplicated data extraction) is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code, with a syntax that is easy to learn.

The simplest web scraper will look like this:

from dude import select

# Register get_link as the handler for every <a> element on the page.
@select(css="a")
def get_link(element):
    return {"url": element.get_attribute("href")}
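To illustrate the Flask-style pattern behind such a decorator, here is a minimal sketch in plain Python. It is not dude's implementation: the `registry` list, the local `select` function, and `run_handlers` are hypothetical names used only for illustration, and matched elements are modeled as plain dictionaries rather than real parser nodes.

```python
# Hypothetical, simplified sketch of decorator-based handler registration.
# This is NOT dude's actual code; every name here is illustrative.
from typing import Callable

registry: list[tuple[str, Callable[[dict], dict]]] = []


def select(css: str) -> Callable:
    """Register the decorated function as the handler for a CSS selector."""

    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        registry.append((css, func))
        return func  # the function itself is returned unchanged

    return decorator


@select(css="a")
def get_link(element: dict) -> dict:
    return {"url": element["href"]}


def run_handlers(matches: dict[str, list[dict]]) -> list[dict]:
    """Call each registered handler on the elements matched by its selector."""
    results = []
    for css, handler in registry:
        for element in matches.get(css, []):
            results.append(handler(element))
    return results


# Stand-in for what a real parser backend would match on a page.
fake_matches = {"a": [{"href": "/first"}, {"href": "/second"}]}
scraped = run_handlers(fake_matches)
```

The point of the pattern is that the scraper author only writes small handler functions; the framework decides when to call them and how to collect what they return.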

dude supports multiple parser backends:

  • playwright
  • lxml
  • parsel
  • beautifulsoup
  • pyppeteer
  • selenium

Photon is a Python library for web scraping. It is designed to be lightweight and fast, and can be used to extract data from websites and web pages. Photon can extract the following data while crawling:

  • URLs (in-scope & out-of-scope)
  • URLs with parameters (
  • Intel (emails, social media accounts, amazon buckets etc.)
  • Files (pdf, png, xml etc.)
  • Secret keys (auth/API keys & hashes)
  • JavaScript files & Endpoints present in them
  • Strings matching custom regex pattern
  • Subdomains & DNS related data

The extracted information is saved in an organized manner or can be exported as JSON.
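As a rough illustration of a few of the categories listed above, the following sketch sorts links into in-scope and out-of-scope, flags URLs with parameters, collects email addresses, and exports the result as JSON. It uses only the standard library and is not Photon's own logic: the `extract` function, the two regular expressions, and the sample HTML are assumptions made for this example.

```python
# Illustrative sketch only; Photon's real extraction logic is more involved.
import json
import re
from urllib.parse import urljoin, urlparse

HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract(html: str, base_url: str) -> dict:
    """Classify links by scope and parameters, and collect email addresses."""
    host = urlparse(base_url).netloc
    found = {"internal": set(), "external": set(), "with_params": set()}
    for href in HREF_RE.findall(html):
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        scope = "internal" if parsed.netloc == host else "external"
        found[scope].add(url)
        if parsed.query:
            found["with_params"].add(url)
    found["emails"] = set(EMAIL_RE.findall(html))
    # Sets are not JSON-serializable, so convert them to sorted lists.
    return {key: sorted(values) for key, values in found.items()}


sample_html = """
<a href="/search?q=test">Search</a>
<a href="https://other.example.org/about">Partner</a>
<p>Contact: admin@example.com</p>
"""
report = extract(sample_html, "https://example.com/")
report_json = json.dumps(report, indent=2)
```

Regular expressions are a simplification here; a production crawler would normally parse the HTML properly before pulling out links.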

Example Use

from dude import select

# This example demonstrates how to use Parsel with async HTTPX.
# To access an attribute, use selector.attrib["href"].
# You can also access an attribute using the ::attr(name) pseudo-element,
# for example "a::attr(href)", then call selector.get().
# To get the text, use the ::text pseudo-element, then call selector.get().

@select(css="a.url", priority=2)
async def result_url(selector):
    return {"url": selector.attrib["href"]}

# Option to get url using ::attr(name) pseudo-element
@select(css="a.url::attr(href)", priority=2)
async def result_url2(selector):
    return {"url2": selector.get()}

@select(css=".title::text", priority=1)
async def result_title(selector):
    return {"title": selector.get()}

@select(css=".description::text", priority=0)
async def result_description(selector):
    return {"description": selector.get()}

if __name__ == "__main__":
    import dude

    dude.run(urls=[""], parser="parsel")

from photon import Photon

# Create a new Photon instance
ph = Photon()

# Extract data from a specific element of the website
url = ""
selector = "div.main"
data = ph.get_data(url, selector)

# Print the extracted data
print(data)

# Extract data from multiple websites asynchronously
urls = ["", ""]
data = ph.get_data_async(urls)
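For readers unfamiliar with what fetching several sites asynchronously involves, here is a minimal standard-library sketch of the general technique using `asyncio`. It is not Photon's API: `fetch_all`, `fake_fetch`, the `limit` parameter, and the example URLs are all hypothetical, and `fake_fetch` only simulates a request so the example runs without network access.

```python
# Sketch of concurrent fetching with asyncio; not Photon's API.
import asyncio
from typing import Awaitable, Callable


async def fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> dict[str, str]:
    """Fetch every URL concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> tuple[str, str]:
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (no network access)."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


pages = asyncio.run(
    fetch_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
)
```

In real use, `fake_fetch` would be replaced by a coroutine built on an async HTTP client such as HTTPX; the semaphore keeps the number of simultaneous requests bounded.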
