ayakashi vs puppeteer

AGPL-3.0-only 8 1 213

119 (month) Apr 18 2019 1.0.0-beta8.4 (1 year, 7 months ago)

89,751 30 271 Apache-2.0

Mar 23 2013 17.5 million (month) 24.2.1 (4 days ago)

Ayakashi is a web scraping library for Node.js that allows developers to easily extract structured data from websites. It is built on top of the popular "puppeteer" library and provides a simple and intuitive API for defining and querying the structure of a website.

Features:

Powerful querying and data models
Ayakashi finds and uses things on the page through props and domQL. Directly inspired by the relational database world (and SQL), domQL makes DOM access easy and readable no matter how obscure the page's structure is. Props package domQL expressions as reusable structures, which can then be passed to actions or used as models for data extraction.
High level builtin actions
Ready-made actions so you can focus on what matters. Easily handle infinite scrolling, single-page navigation, events and more. Plus, you can always build your own actions, either from scratch or by composing other actions.
Preload code on pages
Need to include a bunch of code, a library you made or a 3rd party module and make it available on a page? Preloaders have you covered.
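
To make the props and domQL idea concrete, here is a toy sketch of query objects applied to plain JavaScript objects. It is not Ayakashi's implementation or API: the `operators` table, `matches`, `defineProp` and the local `extract` are hypothetical names invented for this illustration, and the "elements" are plain objects so the sketch runs without a browser.

```javascript
// Toy illustration only: this is NOT Ayakashi's implementation or API.
// Elements are plain objects so the sketch runs without a browser.
const elements = [
    {tagName: "div", class: "product-item", text: "Laptop"},
    {tagName: "div", class: "product-item featured", text: "Phone"},
    {tagName: "span", class: "price", text: "$10"}
];

// Hypothetical operators: eq (exact match) and like (substring match).
const operators = {
    eq: (actual, expected) => actual === expected,
    like: (actual, expected) => String(actual).includes(expected)
};

// Returns true when the element satisfies every condition in the query.
function matches(element, query) {
    return Object.entries(query).every(([attribute, condition]) =>
        Object.entries(condition).every(([op, expected]) =>
            operators[op](element[attribute], expected)
        )
    );
}

// In this sketch a "prop" is just a named, reusable query.
const props = new Map();
function defineProp(name, query) {
    props.set(name, query);
}

// Looks up a prop by name and returns the text of every matching element.
function extract(name) {
    if (!props.has(name)) {
        throw new Error(`Unknown prop: ${name}`);
    }
    const query = props.get(name);
    return elements.filter((el) => matches(el, query)).map((el) => el.text);
}

defineProp("productList", {class: {eq: "product-item"}});
defineProp("allProducts", {class: {like: "product-item"}});

console.log(extract("productList"));
console.log(extract("allProducts"));
```

The point of the sketch is the separation: the query is declared once under a name, and extraction later refers only to that name.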

Puppeteer is a Node.js library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It allows you to automate browser tasks such as generating screenshots, creating PDFs, and testing web pages by simulating user interactions.

Puppeteer is commonly used for web scraping, end-to-end testing, and browser automation.

Puppeteer is one of the most popular browser automation toolkits, though it is only available for Node.js. It offers an asynchronous API, which makes it easy to run many browser tasks concurrently.
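
As a rough sketch of what concurrent scaling with an asynchronous API can look like, the helper below runs a worker over a list of items with a cap on how many calls are in flight at once. The names `mapWithConcurrency` and `fakeScrape` are hypothetical and not part of Puppeteer; `fakeScrape` stands in for real browser work such as opening a page.

```javascript
// Runs worker(item) for every item, with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0;

    // Each runner claims the next unprocessed index until none remain.
    async function runner() {
        while (next < items.length) {
            const index = next++;
            results[index] = await worker(items[index], index);
        }
    }

    // Start up to `limit` runners and wait for all of them to finish.
    const runners = Array.from({length: Math.min(limit, items.length)}, () => runner());
    await Promise.all(runners);
    return results;
}

// Hypothetical stand-in for real browser work; it only simulates a delay.
async function fakeScrape(url) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    return url.toUpperCase();
}

mapWithConcurrency(["a", "b", "c"], 2, fakeScrape).then((out) => console.log(out));
```

Capping concurrency matters in practice because each browser page consumes memory and CPU, so launching everything at once with a bare `Promise.all` can overwhelm the machine.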

Example Use

const ayakashi = require("ayakashi");

// await is only valid inside an async function in a CommonJS script
(async () => {
    const myAyakashi = ayakashi.init();

    // navigate the browser
    await myAyakashi.goTo("https://example.com/product");

    // parsing HTML
    // first by defining a selector
    myAyakashi
        .select("productList")
        .where({class: {eq: "product-item"}});

    // then executing the selector on the current HTML:
    const productList = await myAyakashi.extract("productList");
    console.log(productList);
})();

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // go to the page
    await page.goto('https://www.example.com');
    // take a screenshot
    await page.screenshot({path: 'example.png'});
    // fill in the form
    await page.type('input[name="name"]', 'John Doe');
    await page.type('input[name="email"]', 'johndoe@example.com');
    await page.select('select[name="country"]', 'US');

    // submit the form and wait for the page to load afterwards;
    // starting the wait before the click avoids missing a fast navigation
    await Promise.all([
        page.waitForNavigation(),
        page.click('button[type="submit"]'),
    ]);

    // take a screenshot
    await page.screenshot({path: 'form-submission.png'});

    await browser.close();
})();

Alternatives / Similar

colly: 23,747
pholcus: 7,580
geziyor: 2,667
puppeteer: 89,751
dataflowkit: 676
scrapy: 54,211
puppeteer-stealth: 89,751
rvest: 1,498
ferret: 5,716
gocrawl: 2,039
scrapyd: 2,980
node-crawler: 6,733
panther: 2,977
autoscraper: 6,638
gracy: 247
spidr: 813
scrapydweb: 3,218
gerapy: 3,365
wombat: 1,316
ruia: 1,754
photon: 11,149
ralger: 156
roach: 1,384
dude: 428
phpscraper: 554
php-spider: 1,335
crwlr-crawler: 356
ayakashi: 213