Skip to content


MPL-2.0 23 1 2,470
Jun 06 2019 2024-04-04(a day ago)
491 2 22 GPL-3.0-or-later
2.0.0(9 months ago) May 04 2020 97 (month)

Geziyor is a blazing fast web crawling and web scraping framework. It can be used to crawl websites and extract structured data from them. Geziyor is useful for a wide range of purposes such as data mining, monitoring and automated testing.


  • JS Rendering
  • 5.000+ Requests/Sec
  • Caching (Memory/Disk/LevelDB)
  • Automatic Data Exporting (JSON, CSV, or custom)
  • Metrics (Prometheus, Expvar, or custom)
  • Limit Concurrency (Global/Per Domain)
  • Request Delays (Constant/Randomized)
  • Cookies, Middlewares, robots.txt
  • Automatic response decoding to UTF-8
  • Proxy management (Single, Round-Robin, Custom)

PHPScraper is a universal web-util for PHP. The main goal is to get stuff done instead of getting distracted with selectors, preparing & converting data structures, etc. Instead, you can just go to a website and get the relevant information for your project.

PHPScraper is a minimalistic scraper framework that is built on top of other popular scraping tools.


  • Direct access to page basic features like: Meta data, Links, Images, Headings, Content, Keywords etc.
  • File downloading.
  • RSS, Sitemap and other feed processing.
  • CSV, XML and JSON file processing.

Example Use

// This example extracts all quotes from and exports to JSON file.
func main() {
        StartURLs: []string{""},
        ParseFunc: quotesParse,
        Exporters: []export.Exporter{&export.JSON{}},

func quotesParse(g *geziyor.Geziyor, r *client.Response) {
    r.HTMLDoc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
        g.Exports <- map[string]interface{}{
            "text":   s.Find("span.text").Text(),
            "author": s.Find("").Text(),
    if href, ok := r.HTMLDoc.Find(" > a").Attr("href"); ok {
        g.Get(r.JoinURL(href), quotesParse)
// create scraper object
$web = new \Spekulatius\PHPScraper\PHPScraper;
// go to URL

// elements can be found using XPath:
echo $web->filter("//*[@id='by-id']")->text();   // "Content by ID"

// or pre-defined variables covering basic page data:
$web->links;  // for all links
$web->outline;  // basic page outline
$web->cleanOutlineWithParagraphs;  // basic page outline

Alternatives / Similar