ayakashivskatana
Ayakashi is a web scraping library for Node.js that allows developers to easily extract structured data from websites. It is built on top of the popular "puppeteer" library and provides a simple and intuitive API for defining and querying the structure of a website.
Features:
- Powerful querying and data models
Ayakashi's way of finding things in the page and using them is done with props and domQL. Directly inspired by the relational database world (and SQL), domQL makes DOM access easy and readable no matter how obscure the page's structure is. Props are the way to package domQL expressions as re-usable structures which can then be passed around to actions or to be used as models for data extraction. - High level builtin actions
Ready made actions so you can focus on what matters. Easily handle infinite scrolling, single page navigation, events and more. Plus, you can always build your own actions, either from scratch or by composing other actions. - Preload code on pages
Need to include a bunch of code, a library you made or a 3rd party module and make it available on a page? Preloaders have you covered.
Katana is a next-generation web crawling and spidering framework written in Go by ProjectDiscovery. It is designed for fast, comprehensive endpoint and asset discovery and is widely used in the security research and bug bounty communities.
Katana offers multiple crawling modes:
- Standard mode Fast HTTP-based crawling without a browser. Parses HTML, JavaScript files, and other resources to discover endpoints and links.
- Headless mode Uses a headless Chrome browser for crawling JavaScript-rendered pages and single-page applications (SPAs).
- Passive mode Discovers URLs from external sources (Wayback Machine, CommonCrawl, etc.) without actively visiting the target.
Key features include:
- Scope control Configurable crawl scope with regex patterns for including/excluding URLs, domains, and file extensions.
- JavaScript parsing Extracts endpoints from JavaScript files, inline scripts, and AJAX requests even in standard (non-headless) mode.
- Customizable output Filter and format output with field selection, JSON output, and custom templates.
- Rate limiting Built-in rate limiting and concurrency control to avoid overwhelming targets.
- Proxy support HTTP and SOCKS5 proxy support with rotation.
- Form filling Can detect and auto-fill forms to discover endpoints behind form submissions.
While Katana was designed for security research and reconnaissance, its fast crawling capabilities and JavaScript parsing make it equally useful for web scraping discovery and sitemap generation.