geziyor
Geziyor is a blazing fast web crawling and web scraping framework. It can be used to crawl websites and extract structured data from them. Geziyor is useful for a wide range of purposes such as data mining, monitoring and automated testing.
Features:
- JS Rendering
- 5,000+ Requests/Sec
- Caching (Memory/Disk/LevelDB)
- Automatic Data Exporting (JSON, CSV, or custom)
- Metrics (Prometheus, Expvar, or custom)
- Limit Concurrency (Global/Per Domain)
- Request Delays (Constant/Randomized)
- Cookies, Middlewares, robots.txt
- Automatic response decoding to UTF-8
- Proxy management (Single, Round-Robin, Custom)
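To make the round-robin proxy option in the list above concrete, here is a minimal standard-library sketch of the rotation idea. It is not Geziyor's implementation: the `roundRobin` type, its `pick` method, and the proxy hostnames are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"sync/atomic"
)

// roundRobin is a hypothetical helper (not part of Geziyor's API) that
// hands out proxy URLs in rotation.
type roundRobin struct {
	proxies []*url.URL
	next    uint32 // count of picks made so far
}

// newRoundRobin parses the given proxy URLs and rejects an empty list.
func newRoundRobin(rawURLs ...string) (*roundRobin, error) {
	if len(rawURLs) == 0 {
		return nil, fmt.Errorf("no proxy URLs given")
	}
	rr := &roundRobin{}
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		rr.proxies = append(rr.proxies, u)
	}
	return rr, nil
}

// pick returns the next proxy in rotation. The atomic counter makes it
// safe to call from many goroutines at once.
func (rr *roundRobin) pick() *url.URL {
	n := atomic.AddUint32(&rr.next, 1) - 1
	return rr.proxies[n%uint32(len(rr.proxies))]
}

func main() {
	// Placeholder proxy addresses, assumed for this example only.
	rr, err := newRoundRobin("http://proxy-a.example:8080", "http://proxy-b.example:8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := 0; i < 3; i++ {
		fmt.Println(rr.pick().Host)
	}
}
```

Each request takes the next proxy in the list and wraps around at the end, so load spreads evenly across proxies without any locking.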
Example Use
```go
// This example extracts all quotes from quotes.toscrape.com and exports them to a JSON file.
package main

import (
	"github.com/PuerkitoBio/goquery"
	"github.com/geziyor/geziyor"
	"github.com/geziyor/geziyor/client"
	"github.com/geziyor/geziyor/export"
)

func main() {
	geziyor.NewGeziyor(&geziyor.Options{
		StartURLs: []string{"http://quotes.toscrape.com/"},
		ParseFunc: quotesParse,
		Exporters: []export.Exporter{&export.JSON{}},
	}).Start()
}

// quotesParse exports every quote on the page, then follows the "next" link if there is one.
func quotesParse(g *geziyor.Geziyor, r *client.Response) {
	r.HTMLDoc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
		g.Exports <- map[string]interface{}{
			"text":   s.Find("span.text").Text(),
			"author": s.Find("small.author").Text(),
		}
	})
	if href, ok := r.HTMLDoc.Find("li.next > a").Attr("href"); ok {
		g.Get(r.JoinURL(href), quotesParse)
	}
}
```
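The pagination step in the example turns the `href` of the "next" link, which is relative, into an absolute URL before requesting it. As a rough illustration of that resolution step, the following sketch uses only the standard `net/url` package. `resolveLink` is a hypothetical helper, not Geziyor's `JoinURL`, and the `/page/2/` path is an assumed example value.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink is a hypothetical helper that resolves href against the URL
// of the page it was found on, as a browser would.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// "/page/2/" is an assumed value for the "next" link's href.
	next, err := resolveLink("http://quotes.toscrape.com/", "/page/2/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(next) // http://quotes.toscrape.com/page/2/
}
```

Resolving against the current page's URL, not the site root, keeps relative links such as `../page/3/` working on nested pages.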