
php-spider

1,341 3 1 MIT
v0.7.6 (4 Dec 2025) Mar 16 2013 53 (month)

php-spider is a PHP library for web crawling and scraping. It allows developers to easily navigate and extract data from websites by simulating a web browser's behavior.

  • supports two traversal algorithms: breadth-first and depth-first
  • supports crawl depth limiting, queue size limiting and max downloads limiting
  • supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
  • comes with a useful set of URI filters, such as Domain limiting
  • supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
  • supports custom request handling logic
  • supports Basic, Digest and NTLM HTTP authentication. See example.
  • comes with a useful set of persistence handlers (memory, file)
  • supports custom persistence handlers
  • collects statistics about the crawl for reporting
  • dispatches useful events, allowing developers to add even more custom behavior
  • supports a politeness policy
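The difference between the two traversal algorithms, and the effect of a crawl depth limit, can be illustrated without the library. The following is a minimal sketch, not php-spider's implementation: the `$links` graph and the `crawlOrder()` function are hypothetical, and the frontier is a plain PHP array used as a queue (breadth-first) or a stack (depth-first).

```php
<?php
// Hypothetical link graph: each key is a page, each value lists the pages it links to.
$links = [
    '/'    => ['/a', '/b'],
    '/a'   => ['/a/1', '/a/2'],
    '/b'   => ['/b/1'],
    '/a/1' => [],
    '/a/2' => [],
    '/b/1' => [],
];

/**
 * Returns the order in which pages are visited, starting at $start.
 * $breadthFirst selects FIFO (queue) or LIFO (stack) handling of the frontier;
 * links found on pages at depth >= $maxDepth are not followed.
 */
function crawlOrder(array $links, string $start, bool $breadthFirst, int $maxDepth): array
{
    $frontier = [[$start, 0]];   // pairs of [uri, depth]
    $seen = [$start => true];    // avoids enqueueing the same URI twice
    $order = [];

    while ($frontier !== []) {
        [$uri, $depth] = $breadthFirst ? array_shift($frontier) : array_pop($frontier);
        $order[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit reached: do not discover further links
        }
        foreach ($links[$uri] ?? [] as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $frontier[] = [$next, $depth + 1];
            }
        }
    }

    return $order;
}

echo implode(' ', crawlOrder($links, '/', true, 2)), "\n";  // / /a /b /a/1 /a/2 /b/1
echo implode(' ', crawlOrder($links, '/', false, 2)), "\n"; // / /b /b/1 /a /a/2 /a/1
echo implode(' ', crawlOrder($links, '/', true, 1)), "\n";  // / /a /b
```

Breadth-first visits every page at one depth before moving deeper, which suits shallow surveys of a site. Depth-first follows one branch to the limit before backtracking.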

This Spider does not support JavaScript.

Example Use


```php
use Example\StatsHandler;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;
use Symfony\Contracts\EventDispatcher\Event;
use VDB\Spider\Event\SpiderEvents;
use VDB\Spider\Spider;

require_once('example_complex_bootstrap.php');

// Create Spider
$spider = new Spider('http://dmoztools.net');

// Add a URI discoverer. Without it, the spider does nothing.
// In this case, we want <a> tags from a certain <div>
$spider->getDiscovererSet()->set(new XPathExpressionDiscoverer("//div[@id='catalogs']//a"));

// Set some sane options for this example.
// In this case, we only get the first 10 items from the start page.
$spider->getDiscovererSet()->maxDepth = 1;
$spider->getQueueManager()->maxQueueSize = 10;

// Let's add something to enable us to stop the script
$spider->getDispatcher()->addListener(
    SpiderEvents::SPIDER_CRAWL_USER_STOPPED,
    function (Event $event) {
        echo "\nCrawl aborted by user.\n";
        exit();
    }
);

// Add a listener to collect stats to the Spider and the QueueManager.
// There are more components that dispatch events you can use.
$statsHandler = new StatsHandler();
$spider->getQueueManager()->getDispatcher()->addSubscriber($statsHandler);
$spider->getDispatcher()->addSubscriber($statsHandler);

// Execute crawl
$spider->crawl();

// Build a report
echo "\n  ENQUEUED:  " . count($statsHandler->getQueued());
echo "\n  SKIPPED:   " . count($statsHandler->getFiltered());
echo "\n  FAILED:    " . count($statsHandler->getFailed());
echo "\n  PERSISTED: " . count($statsHandler->getPersisted());

// Finally we could do some processing on the downloaded resources
// In this example, we will echo the title of all resources
echo "\n\nDOWNLOADED RESOURCES: ";
foreach ($spider->getDownloader()->getPersistenceHandler() as $resource) {
    echo "\n - " . $resource->getCrawler()->filterXpath('//title')->text();
}
```
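To see what the XPath expression in the example selects, the same query can be run with PHP's built-in DOM extension. This is a rough sketch and does not show how `XPathExpressionDiscoverer` works internally: the `$html` sample and the `discoverUris()` helper are hypothetical, and only `DOMDocument` and `DOMXPath` from ext-dom are used.

```php
<?php
// Hypothetical HTML standing in for a fetched start page.
$html = <<<HTML
<html><body>
  <div id="catalogs">
    <a href="/Arts/">Arts</a>
    <a href="/Science/">Science</a>
  </div>
  <div id="footer"><a href="/about">About</a></div>
</body></html>
HTML;

/**
 * Returns the href values of all elements matched by $xpathExpression.
 */
function discoverUris(string $html, string $xpathExpression): array
{
    $document = new DOMDocument();
    // Suppress the warnings libxml emits for imperfect real-world markup.
    @$document->loadHTML($html);

    $xpath = new DOMXPath($document);
    $uris = [];
    foreach ($xpath->query($xpathExpression) as $node) {
        if ($node instanceof DOMElement && $node->hasAttribute('href')) {
            $uris[] = $node->getAttribute('href');
        }
    }

    return $uris;
}

// Only links inside <div id="catalogs"> match; the footer link is ignored.
$uris = discoverUris($html, "//div[@id='catalogs']//a");
print_r($uris);
```

Restricting discovery to one container in this way keeps navigation, footer, and advertising links out of the crawl queue.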

Alternatives / Similar


  • goutte: 9,215; v4.0.3 (2023-04-01 09:05:33); Dec 02 2012

Other Languages

  • katana: 16,499; v1.5.0 (2026-03-10 14:52:47); Nov 07 2022
  • crawl4ai: 63,373; 0.8.6 (2026-03-24 15:07:50); May 01 2024
  • scrapling: 36,206; 0.4.5 (2026-04-07 04:22:27); Aug 01 2024
  • crawlee: 22,720; 3.16.0 (2026-04-09 07:36:53); Apr 22 2022
  • mechanize: 4,440; 2.14.0 (2025-01-05 18:30:46); Jul 25 2009
  • kimurai: 1,098; 2.2.0 (2026-01-27 17:36:19); Aug 23 2018
  • firecrawl: -; 0.0.0 (2025-03-15 00:00:00); Apr 01 2024