gerapy vs phpscraper

gerapy: MIT, 74, 4, 3,495, 514 (month), Jul 04 2017, 0.9.13 (2023-07-19 18:53:46)
phpscraper: 583, 2, 28, GPL-3.0-or-later, May 04 2020, 104 (month), 3.0.0 (2024-04-09 15:34:59)

Gerapy is a distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js.

It is built on top of the Scrapy framework and provides a simple and easy-to-use interface for performing web scraping tasks. Gerapy also includes features such as support for scheduling and distributed crawling, as well as a built-in web-based dashboard for monitoring and managing scraping tasks. Additionally, Gerapy is designed to be highly extensible, allowing users to easily create custom plugins and integrations.

Overall, Gerapy is a useful tool for those looking to automate web scraping tasks and extract data from websites.
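
To make the scheduling and distributed-crawling idea more concrete, here is a minimal sketch of the kind of HTTP call that Scrapyd (one of the components Gerapy builds on) accepts for starting a spider run. It is not Gerapy's own code: the helper `build_schedule_request`, the node addresses, and the project and spider names are all hypothetical, and the sketch only builds the requests without sending them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(node_url, project, spider, **spider_args):
    """Build (but do not send) a POST request for Scrapyd's schedule.json endpoint."""
    # schedule.json expects the project and spider names; any extra
    # keyword arguments are passed along as spider arguments.
    params = {"project": project, "spider": spider, **spider_args}
    body = urlencode(params).encode("utf-8")
    return Request(f"{node_url.rstrip('/')}/schedule.json", data=body, method="POST")


# Hypothetical Scrapyd nodes; a management layer would keep such a list of hosts.
nodes = ["http://localhost:6800", "http://crawler-2.example:6800/"]

# One request per node, so the same spider can be started on several machines.
pending = [
    build_schedule_request(node, "demo_project", "demo_spider", category="books")
    for node in nodes
]

for req in pending:
    print(req.get_method(), req.full_url, req.data.decode())
```

Passing one of these requests to `urllib.request.urlopen` would actually submit the job to a running Scrapyd node. A dashboard such as Gerapy's saves you from issuing calls like this by hand for every host.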

PHPScraper is a universal web-util for PHP. The main goal is to get stuff done instead of getting distracted with selectors, preparing & converting data structures, etc. Instead, you can just go to a website and get the relevant information for your project.

PHPScraper is a minimalistic scraper framework that is built on top of other popular scraping tools.

Features:

  • Direct access to basic page features such as metadata, links, images, headings, content and keywords.
  • File downloading.
  • RSS, Sitemap and other feed processing.
  • CSV, XML and JSON file processing.

Example Use


```php
<?php
// Assumes PHPScraper was installed via Composer.
require 'vendor/autoload.php';

// create scraper object
$web = new \Spekulatius\PHPScraper\PHPScraper;

// go to URL
$web->go('https://test-pages.phpscraper.de/content/selectors.html');

// elements can be found using XPath:
echo $web->filter("//*[@id='by-id']")->text(); // "Content by ID"

// or pre-defined variables covering basic page data:
$web->links; // for all links
$web->headings;
$web->images;
$web->contentKeywords;
$web->orderedLists;
$web->unorderedLists;
$web->paragraphs;
$web->outline; // basic page outline
$web->cleanOutlineWithParagraphs; // page outline with paragraphs
```
