Gerapy vs Roach

Gerapy: MIT · 64 · 4 · 3,199 · 545 (month) · Jul 04 2017 · 0.9.13 (8 months ago)
Roach: 1,315 · 2 · 14 · MIT · v3.2.0 (2 days ago) · Dec 27 2021 · 199 (month)

Gerapy is a distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js.

Gerapy builds on the Scrapy framework and adds a web-based dashboard for monitoring and managing scraping tasks, along with support for scheduling and distributed crawling. It is also designed to be extensible, so users can add custom plugins and integrations.
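To make the scheduling idea concrete, here is a minimal sketch of asking a Scrapyd daemon to run a spider through Scrapyd's `schedule.json` HTTP endpoint. This is not Gerapy's own code; it only illustrates the kind of request a Scrapyd-based manager issues. The helper names `build_schedule_request` and `schedule_spider`, the project and spider names, and the `localhost:6800` address (Scrapyd's default port) are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a Scrapyd daemon listening on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project: str, spider: str,
                           base_url: str = SCRAPYD_URL) -> Request:
    """Build (but do not send) a POST request for Scrapyd's schedule.json."""
    # Scrapyd expects form-encoded "project" and "spider" parameters.
    payload = urlencode({"project": project, "spider": spider}).encode("utf-8")
    return Request(f"{base_url}/schedule.json", data=payload, method="POST")


def schedule_spider(project: str, spider: str,
                    base_url: str = SCRAPYD_URL) -> dict:
    """Send the request and return Scrapyd's decoded JSON reply."""
    request = build_schedule_request(project, spider, base_url)
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

With a Scrapyd daemon running and a project already deployed, calling `schedule_spider("myproject", "myspider")` (both names hypothetical) would queue one run of that spider. A dashboard such as Gerapy wraps this kind of call behind its web interface, so the same action can be triggered and monitored for several Scrapyd hosts.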

Overall, Gerapy is useful for anyone who wants to automate Scrapy-based scraping tasks and manage them from one place.

Roach is a complete web scraping toolkit for PHP. It is heavily inspired by the popular Scrapy package for Python.

Roach allows us to define spiders that crawl and scrape web documents. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well.

Just like Scrapy, Roach supports:

- Middlewares
- Item pipelines
- Extensibility through plugins

It’s your all-in-one resource for web scraping in PHP.

Example Use


<?php

use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

class RoachDocsSpider extends BasicSpider
{
    /**
     * @var string[]
     */
    public array $startUrls = [
        'https://roach-php.dev/docs/spiders'
    ];

    public function parse(Response $response): \Generator
    {
        $title = $response->filter('h1')->text();

        $subtitle = $response
            ->filter('main > div:nth-child(2) p:first-of-type')
            ->text();

        yield $this->item([
            'title' => $title,
            'subtitle' => $subtitle,
        ]);
    }
}

Alternatives / Similar