Skip to content

autoscrapervsscrapydweb

MIT 11 2 5,894
3.9 thousand (month) Jul 26 2019 1.1.14(1 year, 8 months ago)
2,983 1 59 GNU General Public License v3.0
1.5.0(a month ago) Sep 30 2018 1.6 thousand (month)

Autoscraper project is made for automatic web scraping to make scraping easy. It gets a url or the html content of a web page and a list of sample data which we want to scrape from that page. This data can be text, url or any html tag value of that page. It learns the scraping rules and returns the similar elements. Then you can use this learned object with new urls to get similar content or the exact same element of those new pages.

Autoscraper is minimalistic and auto-generative approach to web scraping. For example, here's a scraper that finds all titles on a stackoverflow.com page:

from autoscraper import AutoScraper

url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'

# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.
wanted_list = ["What are metaclasses in Python?"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)

ScrapydWeb is a web-based management tool for the Scrapyd service. It is built using the Python Flask framework and allows you to easily manage and monitor your Scrapy spider projects through a web interface.

ScrapydWeb allows you to view the status of your running spiders, view the logs of completed spiders, schedule new spider runs, and manage spider settings and configurations.

ScrapydWeb provides a simple way to manage your scraping tasks and allows you to schedule and run multiple spiders simultaneously. It also provides a user-friendly web interface that makes it easy to view the status of your spiders and monitor their progress.

You can install the package via pip by running pip install scrapydweb and then you can run the package by running scrapydweb command in your command prompt.

It will start a web server that you can access through your web browser at http://localhost:6800/ You will need to have Scrapyd running in order to use ScrapydWeb, Scrapyd is a service for running Scrapy spiders, it allows you to schedule spiders to run at regular intervals and also allows you to run spiders on remote machines.

Highlights


popularminimalisticauto-generating

Example Use


Alternatives / Similar