
stagehand vs rod

MIT 183 14 22,012
2.8 million (month) Oct 29 2024 3.2.1(2026-04-10 21:10:37 ago)
6,853 3 202 MIT
Sep 23 2022 v0.116.2(2024-07-12 11:52:28 ago)

Stagehand is an AI-powered browser automation framework for JavaScript and TypeScript, built by Browserbase. It provides a simple API for controlling browsers using natural language instructions, powered by large language models.

Stagehand offers three core primitives:

  • act(): Performs actions on the page described in natural language. For example, page.act("click the login button") finds and clicks the matching element.
  • extract(): Extracts structured data from the page based on a natural language description and an optional schema definition.
  • observe(): Analyzes the current page state and returns actionable elements with their descriptions, which helps in understanding what actions are available on a page.

Key features include:

  • TypeScript-first: Built with full TypeScript support and type-safe extraction using Zod schemas.
  • Multiple LLM providers: Works with OpenAI, Anthropic, and other LLM providers.
  • Vision and DOM analysis: Combines visual screenshot analysis with DOM inspection for robust element identification.
  • Playwright integration: Uses Playwright as the browser automation backend, giving access to the full Playwright API alongside AI-powered actions.
  • Browserbase cloud: Optionally integrates with Browserbase cloud for managed browser infrastructure.

Stagehand is particularly suited for automating complex web workflows where traditional selectors would be fragile, such as interacting with frequently changing UIs or scraping sites with dynamic layouts.

Rod is a high-level Go library for browser automation built on the Chrome DevTools Protocol (CDP). It provides a simpler and more intuitive API compared to chromedp, making it easier to write browser automation and web scraping scripts in Go.

Key features include:

  • Simple API: Rod's API is designed to be intuitive and requires less boilerplate than chromedp. Common operations like clicking, typing, and waiting are single-line calls.
  • Auto-wait: Automatically waits for elements to be ready before interacting with them, reducing the need for explicit wait statements and making scripts more reliable.
  • Page pool: Built-in page pool for managing multiple browser pages efficiently, useful for concurrent scraping tasks.
  • Stealth mode: A companion stealth package (github.com/go-rod/stealth) can mask common automation detection vectors.
  • Element screenshots: Can take screenshots of specific elements, not just full pages.
  • Network interception: Supports hijacking network requests and responses for modification or monitoring.
  • Input emulation: Realistic mouse and keyboard input emulation for interacting with complex web applications.
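To make the page pool idea above concrete, here is a minimal sketch of bounded page reuse in plain Go. It uses only the standard library; `Page`, `Pool`, `NewPool`, and `ScrapeAll` are hypothetical names chosen for illustration and are not Rod's types or API, which may differ in detail.

```go
package main

import (
	"fmt"
	"sync"
)

// Page is a hypothetical stand-in for a browser page; it is not rod's Page type.
type Page struct{ ID int }

// Pool hands out at most `limit` pages at a time using a buffered channel.
type Pool struct {
	slots chan *Page
}

// NewPool creates a pool with `limit` empty slots (nil means "not created yet").
func NewPool(limit int) *Pool {
	p := &Pool{slots: make(chan *Page, limit)}
	for i := 0; i < limit; i++ {
		p.slots <- nil
	}
	return p
}

// Get blocks until a slot is free, creating the page lazily on first use.
func (p *Pool) Get(create func() *Page) *Page {
	page := <-p.slots
	if page == nil {
		page = create()
	}
	return page
}

// Put returns a page to the pool so another worker can reuse it.
func (p *Pool) Put(page *Page) { p.slots <- page }

// ScrapeAll visits every URL with at most `limit` pages in flight and
// reports how many pages had to be created.
func ScrapeAll(urls []string, limit int) (results []string, created int) {
	pool := NewPool(limit)
	var mu sync.Mutex
	var wg sync.WaitGroup
	results = make([]string, len(urls))

	create := func() *Page {
		mu.Lock()
		defer mu.Unlock()
		created++
		return &Page{ID: created}
	}

	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			page := pool.Get(create)
			defer pool.Put(page)
			results[i] = "visited " + u // placeholder for real navigation
		}(i, u)
	}
	wg.Wait()
	return results, created
}

func main() {
	results, created := ScrapeAll([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(len(results), "results using", created, "pages")
}
```

The buffered channel caps concurrency: a worker that cannot obtain a slot simply blocks until another worker returns its page, so five URLs are processed with no more than two pages ever created.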

Rod is a strong choice for new Go browser automation projects thanks to its simpler API and active maintenance. Its developer experience is comparable to Playwright's, but it is native to the Go ecosystem.

Highlights


ai-powered, natural-language, typescript
cdp, fast

Example Use


```javascript
import { Stagehand } from '@browserbasehq/stagehand';
import { z } from 'zod';

const stagehand = new Stagehand({
  env: 'LOCAL', // or 'BROWSERBASE' for cloud browsers
  modelName: 'gpt-4o',
});
await stagehand.init();
const page = stagehand.page;

// Navigate to a page
await page.goto('https://news.ycombinator.com');

// Use natural language to interact
await page.act('click on the "new" link in the navigation');

// Extract structured data with a schema
const stories = await page.extract({
  instruction: 'Extract the top 5 story titles and their point counts',
  schema: z.object({
    stories: z.array(z.object({
      title: z.string(),
      points: z.number(),
    })),
  }),
});
console.log(stories);

// Observe available actions on the page
const actions = await page.observe('What actions can I take on this page?');
console.log(actions);

await stagehand.close();
```
```go
package main

import (
	"fmt"

	"github.com/go-rod/rod"
	"github.com/go-rod/rod/lib/launcher"
)

func main() {
	// Launch browser
	url := launcher.New().Headless(true).MustLaunch()
	browser := rod.New().ControlURL(url).MustConnect()
	defer browser.MustClose()

	// Navigate and wait for the page to become stable
	page := browser.MustPage("https://example.com")
	page.MustWaitStable()

	// Find elements and extract text - auto-waits for element
	title := page.MustElement("h1").MustText()
	fmt.Println("Title:", title)

	// Fill in a form
	page.MustElement("input[name='search']").MustInput("web scraping")
	page.MustElement("button[type='submit']").MustClick()

	// Wait for results and extract
	page.MustWaitStable()
	results := page.MustElements(".result-item")
	for _, el := range results {
		text := el.MustText()
		href := el.MustElement("a").MustProperty("href").String()
		fmt.Printf("Result: %s (%s)\n", text, href)
	}

	// Take screenshot of specific element
	page.MustElement(".results").MustScreenshot("results.png")
}
```


