
photon vs gofeed

GPL-3.0 61 3 12,807
1.4 thousand (month) Aug 24 2018 1.1.9(2018-10-21 03:39:17 ago)
2,824 2 55 MIT
Apr 20 2016 58.1 thousand (month) v1.3.0(2024-03-01 03:34:34 ago)

Photon is a Python library for web scraping. It is designed to be lightweight and fast, and can be used to extract data from websites and web pages. Photon can extract the following data while crawling:

  • URLs (in-scope & out-of-scope)
  • URLs with parameters (example.com/gallery.php?id=2)
  • Intel (emails, social media accounts, amazon buckets etc.)
  • Files (pdf, png, xml etc.)
  • Secret keys (auth/API keys & hashes)
  • JavaScript files & Endpoints present in them
  • Strings matching custom regex pattern
  • Subdomains & DNS related data

The extracted information is saved in an organized manner and can also be exported as JSON.
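To make the categories above concrete, here is a minimal standard-library sketch of how links on a page might be sorted into in-scope URLs, out-of-scope URLs, URLs with parameters and emails, then exported as JSON. It does not use Photon and is not how Photon is implemented; the `classify_page` helper, the simplified regular expressions and the rule that "in scope" means "same host as the start URL" are all assumptions made for illustration.

```python
import json
import re
from urllib.parse import urljoin, urlparse

# Simplified patterns for illustration only; real pages need more robust parsing.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def classify_page(base_url, html):
    """Hypothetical helper: bucket the links and emails found in `html`."""
    base_host = urlparse(base_url).netloc
    result = {"in_scope": set(), "out_of_scope": set(), "with_params": set(), "emails": set()}
    for href in HREF_RE.findall(html):
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar links
        # Assumption: a URL is in scope when its host matches the start URL's host.
        bucket = "in_scope" if parsed.netloc == base_host else "out_of_scope"
        result[bucket].add(url)
        if parsed.query:
            result["with_params"].add(url)
    result["emails"].update(EMAIL_RE.findall(html))
    # Sets are not JSON-serializable, so convert them to sorted lists.
    return {key: sorted(values) for key, values in result.items()}


sample_html = """
<a href="/gallery.php?id=2">Gallery</a>
<a href="https://other.example.org/about">Partner</a>
<a href="mailto:admin@example.com">Contact</a>
"""
report = classify_page("https://example.com/", sample_html)
print(json.dumps(report, indent=2))
```

A real crawler would also follow the in-scope links it finds and would likely treat subdomains according to a configurable scope rule; both are left out here to keep the sketch short.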

The gofeed library is a robust feed parser that supports parsing RSS, Atom and JSON feeds. The library provides a universal gofeed.Parser that parses all feed types and converts them into a hybrid gofeed.Feed model.

You can also use the feed-specific atom.Parser, rss.Parser or json.Parser parsers, which generate atom.Feed, rss.Feed and json.Feed respectively.
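The idea of format-specific parsers feeding one shared model can be sketched in a few lines. The example below is written in Python with only the standard library and is not gofeed's code or API: `parse_rss`, `parse_atom` and `parse_feed` are hypothetical names, and the "hybrid model" is reduced to a dictionary with a title and a list of item titles.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_rss(root):
    """Format-specific step: read an RSS 2.0 document into the shared shape."""
    channel = root.find("channel")
    return {
        "type": "rss",
        "title": channel.findtext("title", default=""),
        "items": [item.findtext("title", default="") for item in channel.findall("item")],
    }


def parse_atom(root):
    """Format-specific step: read an Atom 1.0 document into the shared shape."""
    return {
        "type": "atom",
        "title": root.findtext(ATOM_NS + "title", default=""),
        "items": [e.findtext(ATOM_NS + "title", default="") for e in root.findall(ATOM_NS + "entry")],
    }


def parse_feed(text):
    """Universal step: pick a format-specific parser from the root element."""
    root = ET.fromstring(text)
    if root.tag == "rss":
        return parse_rss(root)
    if root.tag == ATOM_NS + "feed":
        return parse_atom(root)
    raise ValueError(f"unsupported feed root element: {root.tag}")


rss_doc = (
    '<rss version="2.0"><channel><title>Sample Feed</title>'
    "<item><title>First post</title></item></channel></rss>"
)
atom_doc = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Sample Atom</title>'
    "<entry><title>Hello</title></entry></feed>"
)

rss_feed = parse_feed(rss_doc)
atom_feed = parse_feed(atom_doc)
print(rss_feed["title"], rss_feed["items"])
print(atom_feed["title"], atom_feed["items"])
```

Callers that only need common fields can work with the shared shape, while callers that need format-specific details would use the dedicated parser directly.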

Supported feed types:

  • RSS 0.90
  • Netscape RSS 0.91
  • Userland RSS 0.91
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0
  • JSON 1.0
  • JSON 1.1
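As a rough illustration of how a parser might tell these feed types apart before parsing, the sketch below inspects the start of a document. It uses only the Python standard library; `detect_feed_type` is a hypothetical helper and does not reflect gofeed's actual detection logic. RDF-based feeds (RSS 0.90 and RSS 1.0) are reported without a version here, because telling them apart requires a namespace check that the sketch omits.

```python
import json
import xml.etree.ElementTree as ET


def detect_feed_type(text):
    """Hypothetical helper: return a (family, version) tuple for a feed document."""
    stripped = text.lstrip()
    if stripped.startswith("{"):
        # JSON Feed stores its version as a URL, e.g. https://jsonfeed.org/version/1.1
        version_url = json.loads(stripped).get("version", "")
        return "json", version_url.rsplit("/", 1)[-1]
    root = ET.fromstring(stripped)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
    if local_name == "rss":
        return "rss", root.get("version", "")
    if local_name == "RDF":
        return "rss", ""  # RSS 0.90 or 1.0; version left undetermined in this sketch
    if local_name == "feed":
        # Atom 0.3 carries a version attribute; Atom 1.0 does not.
        return "atom", root.get("version", "1.0")
    raise ValueError(f"unrecognized feed root element: {root.tag}")


print(detect_feed_type('<rss version="2.0"><channel/></rss>'))
print(detect_feed_type('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
print(detect_feed_type('{"version": "https://jsonfeed.org/version/1.1", "title": "x"}'))
```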

Example Use


```python
from photon import Photon

# Create a new Photon instance
ph = Photon()

# Extract data from a specific element of the website
url = "https://www.example.com"
selector = "div.main"
data = ph.get_data(url, selector)

# Print the extracted data
print(data)

# Extract data from multiple websites asynchronously
urls = ["https://www.example1.com", "https://www.example2.com"]
data = ph.get_data_async(urls)
```
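```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/mmcdole/gofeed"
)

func main() {
	// Parse a feed from a URL.
	fp := gofeed.NewParser()
	fp.UserAgent = "MyCustomAgent 1.0" // we can modify http client with custom headers etc.
	feed, err := fp.ParseURL("http://feeds.twit.tv/twit.xml")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Title)

	// Parse a feed from a string (sample markup reconstructed as minimal RSS 2.0).
	feedData := `<rss version="2.0"><channel><title>Sample Feed</title></channel></rss>`
	feed, err = fp.ParseString(feedData)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Title)

	// Or parse a feed from a file.
	file, err := os.Open("/path/to/a/file.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	feed, err = fp.Parse(file)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Title)
}
```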
