Gracy is an API client library based on httpx that provides an extra stability layer with:
- Retry logic
- Logging
- Connection throttling
- Tracking/Middleware
For web scraping, Gracy is a convenient foundation for building scraper API clients (see the examples below).
Firecrawl is an AI-powered web scraping API that converts web pages into clean Markdown or
structured data, optimized for use with large language models (LLMs) and retrieval-augmented
generation (RAG) pipelines. It handles JavaScript rendering, anti-bot bypass, and content
extraction automatically.
Firecrawl offers multiple modes:
- Scrape
Convert a single URL into clean Markdown, HTML, or structured data. Handles JavaScript
rendering and anti-bot protections automatically.
- Crawl
Crawl an entire website starting from a URL, with configurable depth, URL patterns,
and page limits. Returns all pages as clean Markdown.
- Map
Quickly discover all URLs on a website without fully scraping each page. Useful for
sitemap generation and crawl planning.
- Extract
Use LLMs to extract specific structured data from pages based on a schema definition.
Key features:
- Clean Markdown output ideal for LLM context windows
- Automatic JavaScript rendering with headless browsers
- Built-in anti-bot bypass for protected websites
- Structured extraction with JSON schemas
- Batch crawling with webhook notifications
- Python and JavaScript SDKs
Firecrawl is a commercial API service (API key required; free tier available) backed by
Y Combinator, and it has become one of the most popular tools for feeding web content
into AI applications across the LLM/RAG ecosystem.
Note: while the primary service is an API, the core is open source and can be self-hosted.
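A minimal sketch of pointing the Python SDK at a self-hosted instance, assuming the `api_url` constructor parameter of `firecrawl-py` (the URL, port, and key below are placeholders for your own deployment):
```python
from firecrawl import FirecrawlApp

# Target a self-hosted Firecrawl instance instead of the hosted API.
# http://localhost:3002 is a placeholder; a self-hosted setup may not
# validate the API key at all.
app = FirecrawlApp(api_key="self-hosted", api_url="http://localhost:3002")
```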
Basic Gracy usage:
```python
# 0. Imports
import asyncio

import httpx
from gracy import BaseEndpoint, Gracy, GracyConfig, LogEvent, LogLevel

# 1. Define your endpoints
class PokeApiEndpoint(BaseEndpoint):
    GET_POKEMON = "/pokemon/{NAME}"  # 👈 Put placeholders as needed

# 2. Define your Graceful API
class GracefulPokeAPI(Gracy[str]):
    class Config:  # type: ignore
        BASE_URL = "https://pokeapi.co/api/v2/"  # 👈 Optional BASE_URL
        # 👇 Define settings to apply to every request
        SETTINGS = GracyConfig(
            log_request=LogEvent(LogLevel.DEBUG),
            log_response=LogEvent(LogLevel.INFO, "{URL} took {ELAPSED}"),
            parser={"default": lambda r: r.json()},
        )

    async def get_pokemon(self, name: str) -> dict:
        return await self.get(PokeApiEndpoint.GET_POKEMON, {"NAME": name})

    # Since Gracy is built on httpx, the underlying client can be customized
    # (custom headers, proxies, etc.) by overriding _create_client
    def _create_client(self) -> httpx.AsyncClient:
        client = super()._create_client()
        client.headers.update({"User-Agent": "My Scraper"})
        return client

pokeapi = GracefulPokeAPI()

async def main():
    try:
        pokemon = await pokeapi.get_pokemon("pikachu")
        print(pokemon)
    finally:
        # Print a summary of requests, failures, and timings
        pokeapi.report_status("rich")

asyncio.run(main())
```
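The example above leaves Gracy's stability features (retries, throttling) at their defaults. A minimal sketch of enabling both, based on Gracy's documented `GracefulRetry` and `GracefulThrottle` settings (exact parameter names may differ between Gracy versions):
```python
from gracy import (
    Gracy,
    GracyConfig,
    GracefulRetry,
    GracefulThrottle,
    LogEvent,
    LogLevel,
    ThrottleRule,
)

class ResilientPokeAPI(Gracy[str]):
    class Config:  # type: ignore
        BASE_URL = "https://pokeapi.co/api/v2/"
        SETTINGS = GracyConfig(
            # Retry failed requests up to 3 times, waiting 1s, then 1.5s, then 2.25s
            retry=GracefulRetry(
                delay=1,
                max_attempts=3,
                delay_modifier=1.5,
                log_after=LogEvent(LogLevel.WARNING),
                log_exhausted=LogEvent(LogLevel.CRITICAL),
            ),
            # Throttle to at most 2 requests per second on every endpoint
            throttling=GracefulThrottle(rules=ThrottleRule(r".*", 2)),
        )
```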
Basic Firecrawl usage:
```python
from firecrawl import FirecrawlApp
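# NOTE: this example follows the pre-1.0 firecrawl-py interface (dict results,
# params={...}); firecrawl-py 1.x switched to keyword arguments and response objects.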
app = FirecrawlApp(api_key="YOUR_API_KEY")
# Scrape a single page - get clean markdown
result = app.scrape_url("https://example.com/blog/article")
print(result["markdown"]) # clean markdown content
# Extract structured data with a schema
result = app.scrape_url(
"https://example.com/product/123",
params={
"formats": ["extract"],
"extract": {
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"},
"description": {"type": "string"},
},
}
},
},
)
print(result["extract"]) # {"name": "...", "price": 29.99, ...}
# Crawl an entire website
crawl_result = app.crawl_url(
"https://example.com",
params={"limit": 100, "scrapeOptions": {"formats": ["markdown"]}},
)
for page in crawl_result["data"]:
print(page["metadata"]["title"], page["markdown"][:100])
# Map all URLs on a site
map_result = app.map_url("https://example.com")
print(f"Found {len(map_result['links'])} URLs")
```
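Large crawls can take minutes, which is where the batch/webhook workflow fits. A sketch of the non-blocking job flow, assuming the pre-1.0 `firecrawl-py` interface (`wait_until_done=False` plus `check_crawl_status`); newer SDK versions expose the same idea under different names:
```python
import time
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

# Start the crawl as a background job instead of blocking until it finishes
job = app.crawl_url(
    "https://example.com",
    params={"limit": 100, "scrapeOptions": {"formats": ["markdown"]}},
    wait_until_done=False,
)

# Poll for completion (in production, a webhook notification can replace polling)
while True:
    status = app.check_crawl_status(job["jobId"])
    if status["status"] == "completed":
        break
    time.sleep(5)

for page in status["data"]:
    print(page["metadata"]["title"])
```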