Frameworks
There are several popular web scraping frameworks of varying complexity. Whether to use one depends on a few key trade-offs:
Pros
- Frameworks come batteries-included: they automatically configure request headers, rate limiting, proxy switching, etc.
- Community plugins and documentation help to solve common problems.
- Easy to scale up.
Cons
- Learning curve.
- Frameworks are often very opaque, which makes it harder to debug and understand the scraping process.
- It is hard to patch the weak points that get a scraper blocked.
In summary, frameworks are the best fit for typical medium-sized scraping projects. Here's a list of popular web scraping frameworks:
language | framework | highlights |
---|---|---|
Python | scrapy | most popular web scraping framework, big community, feature rich |
Python | autoscraper | automatic parsing via fuzzy matching |
Go | colly | simple, aimed at crawling |
Go | gospider | similar to colly |
Go | dataflowkit | integrated browser automation |
Go | ferret | custom DSL, integrated browser automation (Chrome) |
Go | geziyor | scrapy-like |
PHP | panther | integrated browser automation |
PHP | php-spider | extendible |
Ruby | spidr | simple, aimed at crawling |
Ruby | wombat | custom DSL |
NodeJS | ayakashi | custom DSL, extendible |
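
To make the "batteries-included" point under Pros more concrete, here is a rough sketch of what handling default headers, rate limiting and proxy switching by hand might look like without a framework. It uses only the Python standard library and performs no network requests. The class `MiniScraperConfig`, its parameters and the proxy addresses are all hypothetical names made up for illustration; they are not part of any framework listed above.

```python
import itertools
import time


class MiniScraperConfig:
    """Hypothetical helper bundling three chores a framework usually automates:
    default request headers, rate limiting and proxy rotation."""

    def __init__(self, headers, proxies, min_delay,
                 clock=time.monotonic, sleep=time.sleep):
        self.headers = dict(headers)
        # Round-robin over the proxy list; the list is assumed to be non-empty.
        self._proxies = itertools.cycle(proxies)
        self.min_delay = min_delay
        # clock and sleep are injectable so the limiter can be tested without waiting.
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait_turn(self):
        """Block until at least min_delay seconds have passed since the last request."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_delay - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now

    def prepare(self, url, extra_headers=None):
        """Return the settings for one request: url, merged headers and next proxy."""
        self.wait_turn()
        headers = {**self.headers, **(extra_headers or {})}
        return {"url": url, "headers": headers, "proxy": next(self._proxies)}


if __name__ == "__main__":
    config = MiniScraperConfig(
        headers={"User-Agent": "example-bot/0.1"},  # made-up user agent
        proxies=["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
        min_delay=0.05,
    )
    for path in ("/a", "/b", "/c"):
        request = config.prepare("https://example.com" + path)
        print(request["url"], request["proxy"])
```

The returned dictionary would still have to be passed to an HTTP client of your choice. Retries, error handling and concurrency are not covered here, and that remaining work is much of what a framework saves you. On the other hand, every step above is visible and easy to patch, which is the transparency listed under Cons as something frameworks tend to lack.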