Geziyor is a blazing fast web crawling and web scraping framework. It can be used to crawl websites and extract structured data from them. Geziyor is useful for a wide range of purposes such as data mining, monitoring and automated testing.


  • JS Rendering
  • 5,000+ Requests/Sec
  • Caching (Memory/Disk/LevelDB)
  • Automatic Data Exporting (JSON, CSV, or custom)
  • Metrics (Prometheus, Expvar, or custom)
  • Limit Concurrency (Global/Per Domain)
  • Request Delays (Constant/Randomized)
  • Cookies, Middlewares, robots.txt
  • Automatic response decoding to UTF-8
  • Proxy management (Single, Round-Robin, Custom)
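Features like "Limit Concurrency (Global/Per Domain)" cap how many requests are in flight overall and against any single host. A minimal sketch of that idea in Python using asyncio semaphores (illustrative only; the class and method names here are not Geziyor's API):

```python
import asyncio
from urllib.parse import urlparse

# Hypothetical sketch of global + per-domain concurrency limiting,
# in the spirit of Geziyor's feature list. Not Geziyor's actual API.
class DomainLimiter:
    def __init__(self, global_limit: int, per_domain_limit: int):
        self.global_sem = asyncio.Semaphore(global_limit)
        self.per_domain_limit = per_domain_limit
        self.domain_sems: dict[str, asyncio.Semaphore] = {}

    def _sem_for(self, url: str) -> asyncio.Semaphore:
        # One semaphore per domain, created lazily.
        domain = urlparse(url).netloc
        if domain not in self.domain_sems:
            self.domain_sems[domain] = asyncio.Semaphore(self.per_domain_limit)
        return self.domain_sems[domain]

    async def fetch(self, url: str) -> str:
        # Both the global and the per-domain semaphore must be held.
        async with self.global_sem, self._sem_for(url):
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            return f"fetched {url}"

async def main() -> list[str]:
    limiter = DomainLimiter(global_limit=8, per_domain_limit=2)
    urls = [f"http://site{i % 3}.example/page{i}" for i in range(9)]
    return await asyncio.gather(*(limiter.fetch(u) for u in urls))

results = asyncio.run(main())
```

The same two-semaphore layering gives a global ceiling while still preventing any one domain from being hammered.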

Dude (dude uncomplicated data extraction) is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, makes it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax.

The simplest web scraper will look like this:

from dude import select

@select(css="a")
def get_link(element):
    return {"url": element.get_attribute("href")}
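The terseness comes from Flask-style decorator registration: decorating a function records it, together with its CSS selector, in a registry that the framework later iterates over. A minimal sketch of how such a decorator could be built (illustrative only, not dude's actual source):

```python
# Illustrative reimplementation of a Flask-style decorator registry.
# Not dude's internals; names here are assumptions.
RULES = []

def select(css: str, priority: int = 100):
    """Register the decorated function as the handler for a CSS selector."""
    def decorator(func):
        RULES.append({"css": css, "priority": priority, "func": func})
        return func  # the function itself is returned unchanged
    return decorator

@select(css="a")
def get_link(element):
    return {"url": element["href"]}

# A run() loop would match each page element against the registered
# selectors and collect the dicts the handlers return.
result = RULES[0]["func"]({"href": "https://example.com"})
```

Because the decorator returns the function unchanged, handlers stay plain, testable Python functions; registration is a side effect of module import.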

dude supports multiple parser backends:

  • playwright
  • lxml
  • parsel
  • beautifulsoup
  • pyppeteer
  • selenium

Example Use

// This example extracts all quotes from quotes.toscrape.com and exports them to a JSON file.
func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{"http://quotes.toscrape.com/"},
        ParseFunc: quotesParse,
        Exporters: []export.Exporter{&export.JSON{}},
    }).Start()
}

func quotesParse(g *geziyor.Geziyor, r *client.Response) {
    r.HTMLDoc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
        g.Exports <- map[string]interface{}{
            "text":   s.Find("span.text").Text(),
            "author": s.Find("small.author").Text(),
        }
    })
    if href, ok := r.HTMLDoc.Find("li.next > a").Attr("href"); ok {
        g.Get(r.JoinURL(href), quotesParse)
    }
}
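The control flow above is the classic crawl loop: parse a page, push extracted items to an export channel, and enqueue the "next page" link until pagination runs out. A stdlib Python sketch of the same shape, with fake page data standing in for HTTP responses:

```python
from collections import deque

# PAGES fakes fetched content so the loop is runnable without a network;
# the keys and values here are invented for illustration.
PAGES = {
    "/page/1": {"quotes": ["q1", "q2"], "next": "/page/2"},
    "/page/2": {"quotes": ["q3"], "next": None},
}

def crawl(start_url: str) -> list[str]:
    exports: list[str] = []      # plays the role of g.Exports
    queue = deque([start_url])
    while queue:
        page = PAGES[queue.popleft()]
        exports.extend(page["quotes"])  # ParseFunc extracts the items
        if page["next"]:                # follow pagination, like g.Get
            queue.append(page["next"])
    return exports

quotes = crawl("/page/1")
```

Because newly discovered links go through the same queue as the start URL, the parse function never has to know how deep the pagination goes.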
This example demonstrates how to use Parsel + async HTTPX. To access an attribute, use selector.attrib["name"]. You can also access an attribute using the ::attr(name) pseudo-element, for example "a::attr(href)", then call selector.get(). To get the text, use the ::text pseudo-element, then call selector.get().

from dude import select

@select(css="a.url", priority=2)
async def result_url(selector):
    return {"url": selector.attrib["href"]}

# Option to get url using ::attr(name) pseudo-element
@select(css="a.url::attr(href)", priority=2)
async def result_url2(selector):
    return {"url2": selector.get()}

@select(css=".title::text", priority=1)
async def result_title(selector):
    return {"title": selector.get()}

@select(css=".description::text", priority=0)
async def result_description(selector):
    return {"description": selector.get()}

if __name__ == "__main__":
    import dude
    dude.run(urls=[""], parser="parsel")
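Each handler above returns a small dict, and a scraper framework typically merges them into one record per page, in an order influenced by the priority= values. A generic sketch of that dispatch-and-merge step (illustrative; the handler names and the "higher priority runs first" convention are assumptions here, not dude's documented semantics):

```python
# Illustrative sketch: handlers each return a partial dict and the
# framework merges them into a single record. Not dude's internals.
def result_url(sel):
    return {"url": sel["href"]}

def result_title(sel):
    return {"title": sel["title"]}

handlers = [(2, result_url), (1, result_title)]
selector = {"href": "https://example.com", "title": "Example"}

record = {}
# Assumed convention: higher priority value runs first.
for _, handler in sorted(handlers, key=lambda h: h[0], reverse=True):
    record.update(handler(selector))
```

Returning partial dicts keeps each handler focused on one field, while the merge step assembles the final structured row.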

Alternatives / Similar