Browser automation CLI for AI agents using Playwright
name: agent-browser
description: Browser automation CLI for AI agents using Playwright
version: 1
agent-browser
Browser automation CLI for AI agents. Use when tasks require web browsing, scraping, form filling, or web interaction. Built on Playwright with Rust CLI + Node.js daemon architecture.
Installation
npm install -g agent-browser
agent-browser install # Download Chromium
agent-browser install --with-deps # On Linux
Architecture
- Rust CLI (fast native binary) parses commands, communicates with daemon
- Node.js daemon manages Playwright browser instance, persists between commands
- Daemon auto-starts on first command, stays alive for fast subsequent ops
- Each session has own daemon process + Unix socket (or TCP on Windows)
Core Workflow
- Open page: agent-browser open <url>
- Get snapshot: agent-browser snapshot -i (interactive elements only)
- Use refs: agent-browser click @e2 / agent-browser fill @e3 "text"
- Repeat snapshot after page changes
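The open, snapshot, act, re-snapshot loop can be driven from a script. The following is a minimal Python sketch, not part of agent-browser itself: the helper names (`cli_runner`, `find_ref`, `open_and_click`) are hypothetical, and it assumes that `snapshot -i` prints lines shaped like `- button "Submit" [ref=e2]`, as in the examples in this document.

```python
import re
import subprocess
from typing import Callable, Sequence

# A runner takes the arguments after "agent-browser" and returns stdout.
Runner = Callable[[Sequence[str]], str]


def cli_runner(args: Sequence[str]) -> str:
    """Run one agent-browser command and return stdout (requires the CLI on PATH)."""
    done = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return done.stdout


def find_ref(snapshot: str, role: str, name: str) -> str:
    """Return '@eN' for the first snapshot line with this role and accessible name."""
    # Assumed line shape: - <role> "<name>" ... [ref=eN]
    pattern = re.compile(rf'{re.escape(role)} "{re.escape(name)}".*\[ref=(e\d+)\]')
    for line in snapshot.splitlines():
        match = pattern.search(line)
        if match:
            return "@" + match.group(1)
    raise LookupError(f"no {role} named {name!r} in snapshot")


def open_and_click(url: str, role: str, name: str, run: Runner = cli_runner) -> str:
    """Open, snapshot, act on a ref, then snapshot again because old refs go stale."""
    run(["open", url])
    ref = find_ref(run(["snapshot", "-i"]), role, name)
    run(["click", ref])
    return run(["snapshot", "-i"])
```

Passing the runner as a parameter keeps the loop testable without a browser: a fake runner can return canned snapshot text.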
Refs (Primary Selection Method)
Snapshot generates ARIA accessibility tree with [ref=eN] tags. Refs map to getByRole() locators internally.
agent-browser snapshot
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]
agent-browser click @e2 # Click button
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1 # Get heading text
Ref syntax: @e1, e1, or ref=e1 all work.
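When a script consumes snapshot text, it helps to index the refs and to accept all three ref spellings. A small Python sketch under the assumption that snapshot lines look like the sample above (without the leading `#` used here for comments); `normalize_ref` and `parse_snapshot` are illustrative names, not part of the CLI.

```python
import re
from typing import Dict, Tuple

# Accepts "@e1", "e1", or "ref=e1".
_REF = re.compile(r"^(?:@|ref=)?(e\d+)$")
# Assumed line shape: - <role> "<optional name>" ... [ref=eN]
_LINE = re.compile(r'^\s*-\s+(\w+)(?:\s+"([^"]*)")?.*\[ref=(e\d+)\]')


def normalize_ref(ref: str) -> str:
    """Convert any accepted ref spelling to the '@eN' form."""
    match = _REF.match(ref.strip())
    if match is None:
        raise ValueError(f"not a ref: {ref!r}")
    return "@" + match.group(1)


def parse_snapshot(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each ref id (e.g. 'e2') to its (role, name) pair."""
    refs: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            role, name, ref = match.groups()
            refs[ref] = (role, name or "")
    return refs


SAMPLE = """- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
- link "Learn more" [ref=e4]"""
```

For example, `parse_snapshot(SAMPLE)["e2"]` yields the role and name of the Submit button, and `normalize_ref("ref=e3")` gives the form used on the command line.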
Why refs?
- Deterministic: points to exact element from snapshot
- Fast: no DOM re-query, uses cached role/name locator
- AI-friendly: snapshot + ref workflow optimal for LLMs
Interactive Roles (get refs automatically)
button, link, textbox, checkbox, radio, combobox, listbox, menuitem, option, searchbox, slider, spinbutton, switch, tab, treeitem
Content Roles (get refs when named)
heading, cell, gridcell, columnheader, rowheader, listitem, article, region, main, navigation
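The two role lists amount to a simple rule, modelled below in Python. This is a sketch of the rule as documented here, not the tool's actual implementation; it assumes that roles in neither list never receive a ref, and `gets_ref` is a hypothetical name.

```python
INTERACTIVE_ROLES = frozenset({
    "button", "link", "textbox", "checkbox", "radio", "combobox", "listbox",
    "menuitem", "option", "searchbox", "slider", "spinbutton", "switch",
    "tab", "treeitem",
})
CONTENT_ROLES = frozenset({
    "heading", "cell", "gridcell", "columnheader", "rowheader", "listitem",
    "article", "region", "main", "navigation",
})


def gets_ref(role: str, name: str = "") -> bool:
    """Interactive roles always get a ref; content roles only when named."""
    if role in INTERACTIVE_ROLES:
        return True
    # Assumption: roles outside both lists get no ref.
    return role in CONTENT_ROLES and bool(name.strip())
```

An unnamed heading would therefore not be addressable by ref, which is one reason to fall back to the `find` command or a CSS selector.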
Commands
Navigation
agent-browser open <url> # Auto-prepends https:// if needed
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close # Closes browser + daemon
Interaction
agent-browser click <sel> # Left click
agent-browser dblclick <sel> # Double click
agent-browser fill <sel> <text> # Clear field + set value (atomic)
agent-browser type <sel> <text> # Type character by character (preserves existing)
agent-browser press <key> # Key press: Enter, Tab, Control+a, Shift+Tab
agent-browser keydown <key> # Hold key down
agent-browser keyup <key> # Release key
agent-browser hover <sel>
agent-browser focus <sel>
agent-browser select <sel> <val> # Select dropdown option by value
agent-browser check <sel> # Check checkbox
agent-browser uncheck <sel> # Uncheck checkbox
agent-browser scroll up|down|left|right [px] # Default 300px
agent-browser scrollintoview <sel>
agent-browser drag <src> <tgt> # Drag and drop
agent-browser upload <sel> <file1> [file2...]
Get Info
agent-browser get text <sel> # textContent
agent-browser get html <sel> # innerHTML
agent-browser get value <sel> # input value
agent-browser get attr <sel> <attr>
agent-browser get title
agent-browser get url
agent-browser get count <sel> # Number of matching elements
agent-browser get box <sel> # Bounding box {x,y,width,height}
Check State
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
Snapshot Options
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
Screenshots & PDF
agent-browser screenshot [path] # PNG to stdout (base64) or file
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot -f page.png
agent-browser pdf <path> # Save as PDF (format: Letter, A4, etc)
Wait
agent-browser wait <selector> # Wait for element visible
agent-browser wait <ms> # Wait for time (numeric = milliseconds)
agent-browser wait --text "Welcome"
agent-browser wait --url "**/dashboard"
agent-browser wait --load networkidle # load | domcontentloaded | networkidle
agent-browser wait --fn "window.ready === true" # Wait for JS condition
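A script that builds wait commands can encode the forms above in one helper. The Python sketch below only constructs the argument list; `wait_args` is a hypothetical helper, and limiting each call to a single condition is a simplifying assumption, since this document does not say whether conditions can be combined.

```python
from typing import List, Optional

LOAD_STATES = ("load", "domcontentloaded", "networkidle")


def wait_args(
    *,
    selector: Optional[str] = None,
    ms: Optional[int] = None,
    text: Optional[str] = None,
    url: Optional[str] = None,
    load: Optional[str] = None,
    fn: Optional[str] = None,
) -> List[str]:
    """Build the arguments for one `agent-browser wait ...` call."""
    candidates = (
        ("selector", selector), ("ms", ms), ("text", text),
        ("url", url), ("load", load), ("fn", fn),
    )
    given = [(kind, value) for kind, value in candidates if value is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one wait condition")
    kind, value = given[0]
    if kind == "selector":
        return ["wait", value]
    if kind == "ms":
        # A numeric argument is interpreted as milliseconds.
        return ["wait", str(int(value))]
    if kind == "load" and value not in LOAD_STATES:
        raise ValueError(f"load must be one of {LOAD_STATES}")
    return ["wait", f"--{kind}", value]
```

Prefer the semantic forms (`text`, `url`, `load`, `fn`) over `ms` where possible, so the script waits for a condition rather than a fixed delay.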
Find (Semantic Locators)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find placeholder "Search" fill "query"
agent-browser find alt "Logo" click # By alt text
agent-browser find title "Close" click # By title attribute
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click # First match
agent-browser find last ".item" click # Last match
agent-browser find nth 2 "a" text # Nth match (0-indexed)
Actions: click, fill, check, hover, text
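The `find` forms share one shape: locator kind, value, action, and an optional action argument. A Python sketch of a builder for that shape follows; `find_args` is a hypothetical helper, it deliberately leaves out `nth` (which takes an extra index argument), and its validation rules are assumptions drawn only from the examples above.

```python
from typing import List, Optional

LOCATORS = ("role", "text", "label", "placeholder", "alt", "title",
            "testid", "first", "last")
ACTIONS = ("click", "fill", "check", "hover", "text")


def find_args(
    locator: str,
    value: str,
    action: str,
    action_value: Optional[str] = None,
    name: Optional[str] = None,
) -> List[str]:
    """Build the arguments for one `agent-browser find ...` call."""
    if locator not in LOCATORS:
        raise ValueError(f"unknown locator: {locator!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "fill" and action_value is None:
        raise ValueError("fill needs the text to enter")
    argv = ["find", locator, value, action]
    if action_value is not None:
        argv.append(action_value)
    if name is not None:
        # --name filters role locators by accessible name.
        argv += ["--name", name]
    return argv
```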
Sessions (Isolated Browser Instances)
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
AGENT_BROWSER_SESSION=agent1 agent-browser click @e2
agent-browser session list # List active sessions
agent-browser session # Show current session
Each session has its own browser, cookies/storage, navigation history, and auth state.
Storage & Cookies
agent-browser cookies # Get all cookies
agent-browser cookies set <name> <val>
agent-browser cookies clear
agent-browser storage local # Get all localStorage
agent-browser storage local <key> # Get specific key
agent-browser storage local set <k> <v>
agent-browser storage local clear
agent-browser storage session # Same for sessionStorage
Tabs & Windows
agent-browser tab # List tabs
agent-browser tab new [url] # New tab
agent-browser tab <n> # Switch to tab n
agent-browser tab close [n] # Close tab
agent-browser window new # New window (new context)
Frames
agent-browser frame <sel> # Switch to iframe by selector
agent-browser frame main # Back to main frame
Dialogs (alert/confirm/prompt)
agent-browser dialog accept [text] # Accept (with optional prompt text)
agent-browser dialog dismiss # Dismiss
Network Interception
agent-browser network route <url> # Intercept requests
agent-browser network route <url> --abort # Block requests
agent-browser network route <url> --body '{"mock":true}' # Mock response
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter by URL pattern
Browser Settings
agent-browser set viewport <w> <h>
agent-browser set device "iPhone 14" # Emulate device
agent-browser set geo <lat> <lng> # Geolocation
agent-browser set offline [on|off] # Toggle offline mode
agent-browser set media dark|light # Color scheme
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set headers '{"X-Custom":"value"}'
Debug & Tracing
agent-browser --headed open <url> # Show browser window (not headless)
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear
agent-browser highlight <sel> # Highlight element visually
agent-browser eval <js> # Run JavaScript, returns result
agent-browser trace start [path] # Start recording trace
agent-browser trace stop [path] # Stop and save trace
Auth State Persistence
agent-browser state save auth.json # Save cookies, localStorage, sessionStorage
agent-browser state load auth.json # Load at next launch (must be at browser start)
Mouse Control
agent-browser mouse move <x> <y> # Move to coordinates
agent-browser mouse down [button] # left|right|middle
agent-browser mouse up [button]
agent-browser mouse wheel <dy> [dx] # Scroll wheel
Options
| Option | Description |
|---|---|
| --session <name> | Isolated session (or AGENT_BROWSER_SESSION env) |
| --json | JSON output (for parsing) |
| --full, -f | Full page screenshot |
| --headed | Show browser window |
| --name, -n | Locator name filter |
| --exact | Exact text match |
| --debug | Debug output |
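When commands are assembled programmatically, building an argument list avoids shell quoting problems. The Python sketch below mirrors the flag placement used in this document's examples (`--session` and `--headed` before the subcommand, `--json` after it); whether other placements also work is not stated here, and `build_command` is a hypothetical helper.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    args: Sequence[str],
    *,
    session: Optional[str] = None,
    json_output: bool = False,
    headed: bool = False,
) -> List[str]:
    """Return an argv list suitable for subprocess.run()."""
    argv = ["agent-browser"]
    if session:
        argv += ["--session", session]
    if headed:
        argv.append("--headed")
    argv += list(args)
    if json_output:
        argv.append("--json")
    return argv


# shlex.join renders the list as a copy-pasteable shell line for logging.
printable = shlex.join(build_command(["fill", "@e3", "a b"], session="agent1"))
```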
Selector Types
- Refs (preferred): @e1, @e2 from snapshot
- CSS: #id, .class, div > button
- Text: text=Submit
- XPath: xpath=//button
JSON Output Format
agent-browser snapshot --json
# {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
# {"success":true,"data":{"text":"Hello World"}}
agent-browser is visible @e2 --json
# {"success":true,"data":{"visible":true}}
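The JSON envelope shown above can be unwrapped with a few lines of Python. This is a sketch: the `success` and `data` fields come from the examples in this document, while the `error` field read on failure is an assumption, since the failure shape is not shown here. `unwrap`, `refs_with_role`, and `CommandFailed` are hypothetical names.

```python
import json
from typing import Any, Dict, List


class CommandFailed(RuntimeError):
    """Raised when the envelope reports success=false."""


def unwrap(raw: str) -> Dict[str, Any]:
    """Parse one --json response and return its `data` object."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # Assumption: failures may carry an "error" field.
        raise CommandFailed(payload.get("error", "command reported success=false"))
    return payload["data"]


def refs_with_role(snapshot_data: Dict[str, Any], role: str) -> List[str]:
    """List '@eN' refs from a snapshot response whose role matches."""
    refs = snapshot_data.get("refs", {})
    return ["@" + ref for ref, info in refs.items() if info.get("role") == role]
```

For example, `unwrap(raw)["text"]` extracts the string from a `get text ... --json` response, and `refs_with_role(unwrap(raw), "button")` lists clickable refs from a snapshot response.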
Patterns
Login Flow
agent-browser open https://example.com/login
agent-browser snapshot -i
# Identify refs for username, password, submit
agent-browser fill @e2 "username"
agent-browser fill @e3 "password"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json # Persist auth for later
Form Submission
agent-browser open https://example.com/form
agent-browser snapshot -i --json
# Parse refs from JSON response
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "option-value"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --text "Success"
Scraping Data
agent-browser open https://example.com/data
agent-browser snapshot
agent-browser get text @e1
agent-browser get attr @e2 "href"
agent-browser eval "document.querySelectorAll('.item').length"
agent-browser eval "JSON.stringify([...document.querySelectorAll('.price')].map(e => e.textContent))"
Multi-page Navigation
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e3 # Navigate to new page
agent-browser wait --load networkidle
agent-browser snapshot -i # Fresh snapshot for new page
Parallel Browser Sessions
# Terminal 1
agent-browser --session scraper1 open https://site-a.com
agent-browser --session scraper1 snapshot -i
# Terminal 2
agent-browser --session scraper2 open https://site-b.com
agent-browser --session scraper2 snapshot -i
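The two-terminal setup can also be driven from one script. In the Python sketch below, each session's commands run in order on its own thread while different sessions proceed concurrently. It assumes separate sessions are safe to use in parallel, which follows from each session having its own daemon; the helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence

Runner = Callable[[Sequence[str]], str]


def cli_runner(argv: Sequence[str]) -> str:
    """Run a full command line and return stdout (requires the CLI on PATH)."""
    done = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return done.stdout


def snapshot_site(session: str, url: str, run: Runner = cli_runner) -> str:
    """Open a URL in the named session and return its interactive snapshot."""
    base = ["agent-browser", "--session", session]
    run(base + ["open", url])
    return run(base + ["snapshot", "-i"])


def snapshot_sites(jobs: Dict[str, str], run: Runner = cli_runner) -> Dict[str, str]:
    """Run one session per job concurrently; keys are session names, values URLs."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {
            session: pool.submit(snapshot_site, session, url, run)
            for session, url in jobs.items()
        }
        return {session: future.result() for session, future in futures.items()}
```

A call such as `snapshot_sites({"scraper1": "https://site-a.com", "scraper2": "https://site-b.com"})` returns one snapshot per session name.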
Mock API Response
agent-browser network route "**/api/users" --body '[{"id":1,"name":"Mock User"}]'
agent-browser open https://example.com # App sees mocked data
agent-browser network unroute "**/api/users"
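When the mock body comes from program data, serializing it with a JSON library is less error-prone than hand-writing the string. A minimal Python sketch; `mock_route_args` is a hypothetical helper that only builds the arguments for the `network route ... --body` form shown above.

```python
import json
from typing import Any, List


def mock_route_args(url_pattern: str, payload: Any) -> List[str]:
    """Build arguments for `agent-browser network route <url> --body <json>`."""
    # Compact separators keep the body on one short argument.
    body = json.dumps(payload, separators=(",", ":"))
    return ["network", "route", url_pattern, "--body", body]


args = mock_route_args("**/api/users", [{"id": 1, "name": "Mock User"}])
```

Passing the list to `subprocess.run(["agent-browser", *args])` sidesteps shell quoting of the JSON entirely.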
Handle File Downloads
agent-browser open https://example.com/downloads
agent-browser snapshot -i
agent-browser click @e2 # Download button
# Download handled automatically, saved to suggested filename
Anti-Patterns
- Don't use CSS selectors when refs available - refs from snapshot are deterministic
- Don't skip snapshot after page changes - refs become stale after navigation
- Don't use type when fill works - fill atomically clears + sets, type appends
- Don't hardcode wait times - use semantic waits (--text, --url, --load, --fn)
- Don't use --headed in automation - only for debugging
- Don't call state load after browser started - must be at launch time
Timeouts
Default timeout: 10 seconds (Playwright's default is 30s). A command fails once the timeout elapses if the element is not found or the condition is not met.
Browser Support
| Platform | Binary | Fallback |
|---|---|---|
| macOS ARM64 | Native Rust | Node.js |
| macOS x64 | Native Rust | Node.js |
| Linux ARM64 | Native Rust | Node.js |
| Linux x64 | Native Rust | Node.js |
| Windows | - | Node.js |
