
scrapegraphai

23,278 17 4 MIT
1.76.0 (9 Apr 2026) Jan 15 2024 59.6 thousand (month)

ScrapeGraphAI is a Python library that uses large language models (LLMs) to build web scraping pipelines automatically. Instead of writing CSS selectors or XPath expressions, you describe the data you want in natural language and, optionally, provide a Pydantic schema for the output; the library handles the rest.

Key features include:

  • Natural language extraction: Describe what you want to extract in plain English (e.g., "Extract all product names and prices"), and the LLM works out how to find and extract the data.
  • Pydantic schema output: Define the expected output structure with Pydantic models for type-safe, validated extraction results.
  • Graph-based pipeline: Built on a directed graph architecture in which each node performs a specific task (fetching, parsing, extracting, merging), which makes pipelines modular and debuggable.
  • Multiple graph types: SmartScraperGraph (single page), SearchGraph (search + scrape), SpeechGraph (audio output), and other specialized pipelines.
  • Multiple LLM providers: Works with OpenAI, Anthropic, Google, Groq, local models via Ollama, and more.
  • HTML and JSON support: Can extract data from both HTML pages and JSON API responses.
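The graph-based pipeline idea can be illustrated without the library itself. The sketch below is hypothetical: the node names `fetch`, `parse`, `extract` and `merge`, the shared `state` dict, and the keyword filter standing in for the LLM call are assumptions chosen for illustration, not ScrapeGraphAI's actual node classes. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) so that each node runs only after the nodes it depends on.

```python
import re
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# Hypothetical illustration of a node-graph scraping pipeline; these names are
# not ScrapeGraphAI's API. Every node reads and updates one shared state dict.
State = Dict[str, Any]


def fetch(state: State) -> State:
    # Stand-in for an HTTP request: the HTML is passed in directly as "source".
    state["doc"] = state["source"]
    return state


def parse(state: State) -> State:
    # Strip tags and keep each non-empty line as a text chunk.
    chunks = (re.sub(r"<[^>]+>", "", line).strip() for line in state["doc"].splitlines())
    state["chunks"] = [c for c in chunks if c]
    return state


def extract(state: State) -> State:
    # Stand-in for the LLM call: keep chunks that mention the prompt keyword.
    keyword = state["prompt"].lower()
    state["answers"] = [c for c in state["chunks"] if keyword in c.lower()]
    return state


def merge(state: State) -> State:
    # Combine partial answers into the final result.
    state["result"] = {"matches": state["answers"]}
    return state


# Each node maps to the set of nodes that must run before it (its predecessors).
DEPENDS_ON: Dict[str, Set[str]] = {
    "fetch": set(),
    "parse": {"fetch"},
    "extract": {"parse"},
    "merge": {"extract"},
}
NODES: Dict[str, Callable[[State], State]] = {
    "fetch": fetch,
    "parse": parse,
    "extract": extract,
    "merge": merge,
}


def run_pipeline(state: State) -> State:
    # Execute nodes in dependency order so each sees its predecessors' output.
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        state = NODES[name](state)
    return state


html = "<li>Widget price 9.99</li>\n<li>About us</li>\n<li>Gadget price 19.50</li>"
print(run_pipeline({"source": html, "prompt": "price"})["result"])
# prints {'matches': ['Widget price 9.99', 'Gadget price 19.50']}
```

Because each node only reads from and writes to the shared state, a single node (for example, `extract`) could be replaced by a real model call or inspected on its own, which is the kind of modularity the feature list describes.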

ScrapeGraphAI is particularly useful for rapid prototyping of scrapers and for extracting data from pages with complex or frequently changing layouts where traditional selectors would be brittle.

Highlights


ai-powered, popular

Example Use


```python
from typing import List
from pydantic import BaseModel, Field
from scrapegraphai.graphs import SmartScraperGraph

# Define the output schema
class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Price in USD")
    rating: float = Field(description="Customer rating out of 5")

class ProductList(BaseModel):
    products: List[Product]

# Create a scraping graph with a natural language instruction
graph = SmartScraperGraph(
    prompt="Extract all products with their names, prices, and ratings",
    source="https://example.com/products",
    schema=ProductList,
    config={
        "llm": {"model": "openai/gpt-4o", "api_key": "YOUR_API_KEY"},
    },
)

# Run the graph
result = graph.run()
for product in result["products"]:
    print(f"{product['name']}: ${product['price']} ({product['rating']}/5)")
```
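To run the example against a local model instead of OpenAI, the `llm` block in `config` can point at an Ollama model. This is a sketch based on the configuration pattern shown in the project README; the model name `llama3.2`, the `model_tokens` value, and the `format` key are assumptions to verify against the documentation for your installed version. The model must already be pulled (`ollama pull llama3.2`) and the Ollama server must be running.

```python
# Hypothetical local-model config; key names follow the README pattern and
# should be checked against your installed ScrapeGraphAI version.
local_config = {
    "llm": {
        "model": "ollama/llama3.2",  # provider prefix + local model name
        "model_tokens": 8192,        # context window assumed for this model
        "format": "json",            # ask Ollama for JSON-formatted output
    },
    "verbose": True,                 # print pipeline progress while running
}

# Pass it in place of the OpenAI config from the example above, e.g.:
# SmartScraperGraph(prompt=..., source=..., schema=ProductList, config=local_config)
```

No `api_key` entry is included because the model is served locally.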

Alternatives / Similar


crawl4ai: 63,373 · 0.8.6 (2026-03-24 15:07:50) · May 01 2024
firecrawl: - · 0.0.0 (2025-03-15 00:00:00) · Apr 01 2024
87,251 · 0.12.6 (2026-04-02 07:55:13) · Nov 01 2024
61,276 · 2.15.0 (2026-04-09 12:02:09) · Jul 26 2019
scrapling: 36,206 · 0.4.5 (2026-04-07 04:22:27) · Aug 01 2024
skyvern: 21,046 · 1.0.29 (2026-04-02 14:42:44) · Feb 01 2024
3,087 · 1.6.0 (2025-07-22 06:00:53) · Sep 04 2013
4,321 · 4.0.97 (2026-01-06 07:45:54) · Oct 01 2023
248 · 1.34.0 (2024-11-27 14:57:34) · Feb 05 2023
3,400 · 1.6.0 (2025-02-16 13:18:50) · Sep 30 2018
12,807 · 1.1.9 (2018-10-21 03:39:17) · Aug 24 2018
7,136 · 1.1.14 (2022-07-17 17:20:09) · Jul 26 2019
3,495 · 0.9.13 (2023-07-19 18:53:46) · Jul 04 2017
1,743 · 0.8.5 (2022-09-06 08:54:56) · Oct 17 2018
425 · 0.1.3 (2023-08-01 20:28:33) · Feb 20 2022

Other Languages

25,231 · v2.2.0 (2025-03-27 10:47:28) · May 14 2018
katana: 16,499 · v1.5.0 (2026-03-10 14:52:47) · Nov 07 2022
7,594 · v1.4.0 (2026-03-03 03:58:32) · Feb 15 2020
2,772 · 2026-04-11 (2026-04-11 21:30:25) · Jun 06 2019
711 · 2026-03-21 (2026-03-21 09:11:03) · Feb 09 2017
stagehand: 22,012 · 3.2.1 (2026-04-10 21:10:37) · Oct 29 2024
1,517 · 1.0.5 (2024-02-12 21:10:00) · Nov 22 2014
crawlee: 22,720 · 3.16.0 (2026-04-09 07:36:53) · Apr 22 2022
mechanize: 4,440 · 2.14.0 (2025-01-05 18:30:46) · Jul 25 2009
5,964 · v2.0.0-alpha.7 (2026-04-07 15:33:51) · Oct 28 2020
2,053 · (2021-05-19 15:14:49) · Nov 20 2016
6,790 · 2.0.2 (2025-05-28 09:36:01) · Sep 10 2012
3,062 · v2.4.0 (2026-01-08 05:29:21) · Jul 17 2018
goutte: 9,215 · v4.0.3 (2023-04-01 09:05:33) · Dec 02 2012
835 · 0.7.2 (2025-02-03 07:58:27) · Jul 25 2009
kimurai: 1,098 · 2.2.0 (2026-01-27 17:36:19) · Aug 23 2018
1,360 · 3.3.0 (2026-04-07 16:31:34) · Dec 27 2011
1,454 · v3.2.1 (2025-03-21 06:53:36) · Dec 27 2021
165 · 2.3.0 (2021-03-18 00:10:00) · Dec 22 2019
217 · 1.0.0-beta8.4 (2023-06-29 12:37:12) · Apr 18 2019
583 · 3.0.0 (2024-04-09 15:34:59) · May 04 2020
1,341 · v0.7.6 (2025-12-04 15:08:06) · Mar 16 2013
369 · v3.5.6 (2026-01-05 11:13:18) · Apr 18 2022