
Web Scraping FYI

Python photon Library in Web Scraping

photon

python framework html-extractor

11,149 3 51 GPL-3.0

Latest release: 1.1.9 (21 Oct 2018); first release: Aug 24 2018; downloads: 415 per month

Photon is a fast web crawler written in Python and designed for OSINT (open-source intelligence) gathering. It is lightweight and crawls a website from a root URL, extracting data from the pages it visits. While crawling, Photon can extract the following data:

URLs (in-scope & out-of-scope)
URLs with parameters (example.com/gallery.php?id=2)
Intel (emails, social media accounts, amazon buckets etc.)
Files (pdf, png, xml etc.)
Secret keys (auth/API keys & hashes)
JavaScript files & Endpoints present in them
Strings matching custom regex pattern
Subdomains & DNS related data
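Several of the categories above can be illustrated with a small sketch. This is not Photon's implementation: the function names is_in_scope and categorize, the bucket names, and the email pattern are assumptions chosen for illustration. The sketch treats the root host and its subdomains as in scope, flags URLs that carry a query string (such as example.com/gallery.php?id=2), and pulls email addresses out of page text with a regular expression.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; this is not Photon's code.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_in_scope(root_host: str, url: str) -> bool:
    """Treat the root host and any of its subdomains as in scope."""
    host = urlparse(url).hostname or ""
    return host == root_host or host.endswith("." + root_host)


def categorize(root: str, links: list[str], page_text: str) -> dict[str, list[str]]:
    """Sort crawled links into in-scope/out-of-scope buckets and collect emails."""
    root_host = urlparse(root).hostname or ""
    result: dict[str, list[str]] = {"internal": [], "external": [], "fuzzable": [], "intel": []}
    for link in links:
        bucket = "internal" if is_in_scope(root_host, link) else "external"
        result[bucket].append(link)
        # URLs with parameters (e.g. ?id=2) are kept separately as fuzzing candidates
        if urlparse(link).query:
            result["fuzzable"].append(link)
    # Intel: unique email addresses found in the page text
    result["intel"] = sorted(set(EMAIL_RE.findall(page_text)))
    return result
```

For example, categorize("https://example.com", ["https://example.com/gallery.php?id=2", "https://blog.example.com/post", "https://other.org/"], "Contact: admin@example.com") places the first two links under "internal", the third under "external", the gallery URL under "fuzzable", and admin@example.com under "intel".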

The extracted information is saved in an organized manner or can be exported as JSON.
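Saving results "in an organized manner" can be pictured as one text file per category, with a single JSON file as the export alternative. This is a hypothetical sketch: the save_results helper and the file names (<category>.txt and exported.json) are assumptions and are not necessarily what Photon itself writes.

```python
from __future__ import annotations

import json
from pathlib import Path


def save_results(results: dict[str, list[str]], out_dir: str, as_json: bool = False) -> list[Path]:
    """Write crawl results either as one file per category or as a single JSON file.

    Hypothetical helper; the file names are assumptions for illustration.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if as_json:
        # Export everything into one JSON document
        path = out / "exported.json"
        path.write_text(json.dumps(results, indent=2), encoding="utf-8")
        return [path]
    written = []
    for category, items in results.items():
        # One line per item; empty categories produce an empty file
        path = out / f"{category}.txt"
        path.write_text("".join(item + "\n" for item in items), encoding="utf-8")
        written.append(path)
    return written
```

Keeping one file per category makes it easy to feed a single result type (for example, only the fuzzable URLs) into another tool, while the JSON export suits programmatic post-processing.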

Example Use

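import subprocess
import sys

# Photon is a command-line crawler (photon.py) rather than an importable
# class, so one way to drive it from Python is to run the script.
# Assumes the Photon repository has been cloned into ./Photon
PHOTON = "Photon/photon.py"

# Crawl a website two levels deep (-l) with 10 threads (-t) and export JSON (-e)
url = "https://www.example.com"
subprocess.run(
    [sys.executable, PHOTON, "-u", url, "-l", "2", "-t", "10", "-e", "json"],
    check=True,
)
# Results are written to an output directory (-o) rather than returned

# Crawl multiple websites concurrently: one Photon process per URL
urls = ["https://www.example1.com", "https://www.example2.com"]
procs = [subprocess.Popen([sys.executable, PHOTON, "-u", u]) for u in urls]
for p in procs:
    p.wait()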
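Photon crawls with multiple threads and a configurable depth (its -t/--threads and -l/--level options). The sketch below shows that general idea with Python's standard library: a breadth-first crawl in which every page of one level is fetched concurrently by a ThreadPoolExecutor before moving to the next level. It is an assumed simplification, not Photon's code; crawl, http_fetch and the href regex are hypothetical names, and the fetch function is passed in so the crawler can be exercised without network access.

```python
from __future__ import annotations

import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.parse import urljoin, urlparse

# Naive link extraction; a real crawler would use an HTML parser
HREF_RE = re.compile(r"""href=["']([^"']+)["']""", re.IGNORECASE)


def http_fetch(url: str) -> str:
    """Fetch a page as text, returning an empty string on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(root: str, fetch: Callable[[str], str], levels: int = 2, threads: int = 2) -> set[str]:
    """Breadth-first crawl of same-host links, fetching each level in parallel."""
    host = urlparse(root).hostname
    seen = {root}
    frontier = [root]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(levels):
            # map() preserves order, so pages line up with frontier URLs
            pages = pool.map(fetch, frontier)
            next_frontier = []
            for base, html in zip(frontier, pages):
                for href in HREF_RE.findall(html):
                    url = urljoin(base, href).split("#")[0]
                    # Stay on the root host and skip already-seen URLs
                    if urlparse(url).hostname == host and url not in seen:
                        seen.add(url)
                        next_frontier.append(url)
            frontier = next_frontier
            if not frontier:
                break
    return seen
```

For a live crawl, pass the real fetcher, for example crawl("https://www.example.com/", http_fetch, levels=2, threads=10). Fetching a whole level at once keeps the worker threads busy while still bounding the crawl depth.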

Alternatives / Similar

html2text

1,897 2024.2.26 (1 year, 6 months ago) Dec 14 2008

scrapy

54,211 2.12.0 (9 months ago) Jul 26 2019

extruct

884 0.18.0 (9 months ago) Oct 27 2015

newspaper

14,364 0.2.8 (6 years ago) Dec 28 2012

3,791 2.0.0 (8 months ago) Jul 17 2019

2,724 0.8.1 (5 years ago) Jun 30 2011

sumy

3,548 0.11.0 (2 years ago) Oct 20 2013

scrapyd

2,980 1.5.0 (10 months ago) Sep 04 2013

6,638 1.1.14 (3 years ago) Jul 26 2019

gracy

247 1.34.0 (8 months ago) Feb 05 2023

3,218 1.6.0 (6 months ago) Sep 30 2018

gerapy

3,365 0.9.13 (2 years ago) Jul 04 2017

263 2.0.7 (2 years ago) Dec 11 2020

ruia

1,754 0.8.5 (2 years ago) Oct 17 2018

dude

428 0.1.3 (2 years ago) Feb 20 2022

Other Languages

colly

23,747 v2.1.0 (5 years ago) May 14 2018

pholcus

7,580 v1.3.4 (5 years ago) Feb 15 2020

geziyor

2,667 2025-02-18 (6 months ago) Jun 06 2019

676 2025-02-16 (6 months ago) Feb 09 2017

rvest

1,498 1.0.4 (3 years ago) Nov 22 2014

gofeed

2,641 v1.3.0 (1 year, 5 months ago) Apr 20 2016

gocrawl

2,039 (4 years ago) Nov 20 2016

ferret

5,716 v0.18.0 (2 years ago) Aug 06 2019

6,733 2.0.2 (1 year, 1 month ago) Sep 10 2012

panther

2,977 v2.2.0 (6 months ago) Jul 17 2018

spidr

813 0.7.2 (6 months ago) Jul 25 2009

wombat

1,316 3.0.0 (3 years ago) Dec 27 2011

ralger

156 2.2.4 (4 years ago) Dec 22 2019

roach

1,384 v3.2.0 (1 year, 4 months ago) Dec 27 2021

ayakashi

213 1.0.0-beta8.4 (2 years ago) Apr 18 2019

554 3.0.0 (1 year, 4 months ago) May 04 2020

1,335 v0.7.2 (1 year, 8 months ago) Mar 16 2013

356 v3.2.3 (6 months ago) Apr 18 2022