Comparing pinchtab with browserwing
pinchtab
View full →Author
@pinchtab
Stars
8.8k
Repository
pinchtab/pinchtab
Browser Automation with PinchTab
PinchTab gives agents a browser they can drive through stable accessibility refs, low-token text extraction, and persistent profiles or instances. Treat it as a CLI-first browser skill; use the HTTP API only when the CLI is unavailable or you need profile-management routes that do not exist in the CLI yet.
Preferred tool surface:
- Use
pinchtabCLI commands first. - Use
curlfor profile-management routes or non-shell/API fallback flows. - Use
jqonly when you need structured parsing from JSON responses.
Core Workflow
Every PinchTab automation follows this pattern:
- Ensure the correct server, profile, or instance is available for the task.
- Navigate with
pinchtab nav <url>orpinchtab instance navigate <instance-id> <url>. - Observe with
pinchtab snap -i -c,pinchtab snap --text, orpinchtab text, then collect the current refs such ase5. - Interact with those fresh refs using
click,fill,type,press,select,hover, orscroll. - Re-snapshot or re-read text after any navigation, submit, modal open, accordion expand, or other DOM-changing action.
Rules:
- Never act on stale refs after the page changes.
- Default to
pinchtab textwhen you need content, not layout. - Default to
pinchtab snap -i -cwhen you need actionable elements. - Use screenshots only for visual verification, UI diffs, or debugging.
- Start multi-site or parallel work by choosing the right instance or profile first.
Selectors
PinchTab uses a unified selector system. Any command that targets an element accepts these formats:
| Selector | Example | Resolves via |
|---|---|---|
| Ref | e5 | Snapshot cache (fastest) |
| CSS | #login, .btn, [data-testid="x"] | document.querySelector |
| XPath | xpath://button[@id="submit"] | CDP search |
| Text | text:Sign In | Visible text match |
| Semantic | find:login button | Natural language query via /find |
Auto-detection: bare e5 → ref, #id / .class / [attr] → CSS, //path → XPath. Use explicit prefixes (css:, xpath:, text:, find:) when auto-detection is ambiguous.
pinchtab click e5 # ref
pinchtab click "#submit" # CSS (auto-detected)
pinchtab click "text:Sign In" # text match
pinchtab click "xpath://button[@type]" # XPath
pinchtab fill "#email" "user@test.com" # CSS
pinchtab fill e3 "user@test.com" # ref
Same syntax in HTTP API via selector field. Legacy ref field still accepted.
Command Chaining
Use && when you don't need intermediate output: pinchtab nav <url> && pinchtab snap -i -c. Run separately when you must read refs before acting.
Challenge Solving
If a page shows a challenge instead of content (e.g., "Just a moment..."), call POST /solve with {"maxAttempts": 3} to auto-detect and resolve it. Use POST /tabs/TAB_ID/solve for tab-scoped. Works best with stealthLevel: "full" in config. Safe to call speculatively — returns immediately if no challenge is present. See api.md for full solver options.
Handling Authentication and State
Patterns: (1) One-off: pinchtab instance start → --server http://localhost:<port>. (2) Reuse profile: pinchtab instance start --profile work --mode headed → switch to headless after login. (3) HTTP: POST /profiles, then POST /profiles/<name>/start. (4) Human-assisted: headed login, then agent reuses headless.
Agent sessions: pinchtab session create --agent-id <id> or POST /sessions → set PINCHTAB_SESSION=ses_....
Essential Commands
Server and targeting
pinchtab server # Start server foreground
pinchtab daemon install # Install as system service
pinchtab health # Check server status
pinchtab instances # List running instances
pinchtab profiles # List available profiles
pinchtab --server http://localhost:9868 snap -i -c # Target specific instance
Navigation and tabs
pinchtab nav <url>
pinchtab nav <url> --new-tab
pinchtab nav <url> --tab <tab-id>
pinchtab nav <url> --block-images
pinchtab nav <url> --block-ads
pinchtab nav <url> --print-tab-id # Print only the new tabId on stdout
pinchtab back # Navigate back in history
pinchtab forward # Navigate forward
pinchtab reload # Reload current page
pinchtab tab # List tabs (no subcommand - just `tab`)
pinchtab tab <tab-id> # Focus an existing tab
pinchtab tab new <url> # Open a new tab
pinchtab tab close <tab-id> # Close a tab — use this to clean up stale tabs between runs
pinchtab instance navigate <instance-id> <url>
Tab workflow — most commands target the active tab by default, so single-tab flows need no plumbing:
pinchtab nav http://example.com
pinchtab snap -i -c # active tab
pinchtab click e5 # active tab
pinchtab text # active tab
When you need to pin to a specific tab (parallel tabs, long-running flows, or shell-isolated
runners like agent tool calls where env vars don't persist across invocations), capture the
tab ID with --print-tab-id and pass --tab on every subsequent command:
TAB_ID=$(pinchtab nav http://example.com --print-tab-id)
pinchtab --tab "$TAB_ID" snap -i -c
pinchtab --tab "$TAB_ID" click e5
pinchtab --tab "$TAB_ID" text
Within a single shell session you can also export PINCHTAB_TAB=$(pinchtab nav URL --print-tab-id)
and drop the --tab flag. This does not survive across separate shell invocations (each
Bash tool call in an agent runs a fresh shell), so prefer explicit --tab for agent workflows.
Priority: --tab <id> flag > PINCHTAB_TAB env var > active tab.
Observation
pinchtab snap
pinchtab snap -i # Interactive elements only
pinchtab snap -i -c # Interactive + compact
pinchtab snap -d # Diff from previous snapshot
pinchtab snap --selector <css> # Scope to CSS selector
pinchtab snap --max-tokens <n> # Token budget limit
pinchtab snap --text # Text output format
pinchtab text # Page text content (Readability-filtered; drops nav/repeated headlines)
pinchtab text --full # Full page text (document.body.innerText) — use when Readability is dropping content you need
pinchtab text --raw # Alias of --full
# CLI returns JSON; use `| jq -r .text` for plain text
pinchtab find <query> # Semantic element search
pinchtab find --ref-only <query> # Return refs only
Guidance:
snap -i -cis the default for finding actionable refs.snap -dis the default follow-up snapshot for multi-step flows.textis the default for reading articles, dashboards, reports, or confirmation messages.pinchtab find <query>is the direct route when you already know the semantic target (e.g. "login button", "email input", "accept cookies link") — skips the full snapshot and returns a ranked match with its ref. Pair with--ref-onlyon large/dense pages to get just the ref string for piping straight intoclick/fill/type. Preferfindoversnap -i -c+ visual scan whenever you can describe the target in a phrase.- Refs from
snap -iand fullsnapuse different numbering. Do not mix them — if you snapshot with-i, use those refs. If you re-snapshot without-i, get fresh refs before acting.
Interaction
All interaction commands accept unified selectors (refs, CSS, XPath, text, semantic). See the Selectors section above.
pinchtab click <selector> # Click element
pinchtab click --wait-nav <selector> # Click and wait for navigation
pinchtab click --x 100 --y 200 # Click by coordinates
pinchtab click <selector> --dialog-action accept # Click + auto-accept any alert/confirm the click opens
pinchtab click <selector> --dialog-action dismiss # Click + auto-dismiss
pinchtab click <selector> --dialog-action accept \
--dialog-text "hello" # Click + accept a prompt() with a response
pinchtab dblclick <selector> # Double-click element
pinchtab mouse move <selector> # Move pointer to element center
pinchtab mouse move <x> <y> # Move pointer to coordinates
pinchtab mouse down <selector> --button left # Press a mouse button at an explicit target
pinchtab mouse down --button left # Press a mouse button at current pointer
pinchtab mouse up <selector> --button left # Release a mouse button at an explicit target
pinchtab mouse up --button left # Release a mouse button at current pointer
pinchtab mouse wheel 240 --dx 40 # Dispatch wheel deltas at current pointer
pinchtab drag <from> <to> # Drag between selector/ref or x,y points (synthesized mouse sequence)
pinchtab drag <selector> --drag-x <n> --drag-y <n> # Single-step drag by pixel offset (mirrors HTTP /action dragX/dragY)
pinchtab type <selector> <text> # Type with keystrokes
pinchtab fill <selector> <text> # Set value directly
pinchtab press <key> # Press key (Enter, Tab, Escape...)
pinchtab hover <selector> # Hover element
pinchtab select <selector> <value|text> # Select dropdown option by value attr, or fall back to visible text
pinchtab scroll <pixels|direction|selector> # e.g. `scroll 1500`, `scroll down`, `scroll '#footer'`
Rules:
- Prefer
fillfor deterministic form entry. - Prefer
typeonly when the site depends on keystroke events. - Prefer
click --wait-navwhen a click is expected to navigate. - Prefer low-level
mousecommands only when normalclick/hoverabstractions are insufficient, such as drag handles, canvas widgets, or sites that depend on exact pointer sequences. - Re-snapshot immediately after
click,press Enter,select, orscrollif the UI can change. selectmatches by value attr first, then visible text (case-insensitive). Error lists available options if no match.- For JS dialogs: use
--dialog-action acceptor--dialog-action dismisson click. Add--dialog-textfor prompt responses. - For the
scrollaction via HTTP, use"scrollX"/"scrollY"for pixel deltas, or"selector"to scroll an element into view. Example:{"kind":"scroll","scrollY":1500}or{"kind":"scroll","selector":"#footer"}. Thex/yfields are target viewport coordinates, not scroll deltas. - The download HTTP endpoint (
GET /download?url=...orGET /tabs/TAB_ID/download?url=...) returns JSON{contentType, data (base64), size, url}, not raw bytes. Decodedatawith base64 to get the file. Onlyhttp/httpsURLs are allowed. Private/internal hosts are blocked unless listed insecurity.downloadAllowedDomains.
Waiting
Use wait when the DOM settles asynchronously — spinners, toasts, XHR-driven content.
pinchtab wait <selector> # Element to appear (default visible)
pinchtab wait <selector> --state hidden # Element to disappear
pinchtab wait --text "Order confirmed" # Text to appear
pinchtab wait --not-text "Loading..." # Text to disappear (spinner/toast dismiss)
pinchtab wait --url "**/dashboard" # URL glob match
pinchtab wait --load networkidle # Network idle
pinchtab wait 500 # Fixed delay in ms (last resort)
Default timeout 10s, max 30s via --timeout <ms>. Prefer --not-text / --state hidden over polling.
Export, debug, and verification
pinchtab screenshot
pinchtab screenshot -o /tmp/pinchtab-page.png # Format driven by extension
pinchtab screenshot -q 60 # JPEG quality
pinchtab pdf
pinchtab pdf -o /tmp/pinchtab-report.pdf
pinchtab pdf --landscape
Advanced operations: explicit opt-in only
Use these only when the task explicitly requires them and safer commands are insufficient.
pinchtab eval "document.title"
pinchtab eval --await-promise "fetch('/api/me').then(r => r.json())"
pinchtab download <url> -o /tmp/pinchtab-download.bin
pinchtab upload /absolute/path/provided-by-user.ext -s <css>
Rules:
evalis for narrow, read-only DOM inspection unless the user explicitly asks for a page mutation.downloadshould prefer a safe temporary or workspace path over an arbitrary filesystem location.uploadrequires a file path the user explicitly provided or clearly approved for the task.
HTTP API fallback
Use curl when CLI unavailable. Key endpoints on instance port (e.g. 9867):
POST /navigatewith{"url":"..."}GET /snapshot?filter=interactive&format=compactPOST /actionwith{"kind":"fill","selector":"e3","text":"..."}POST /actionswith a batch of actions — runs them in one round-trip. Body accepts either an array[{"kind":"fill",...},{"kind":"click",...}]or an envelope{"actions":[...],"stopOnError":true,"tabId":"..."}. Use this for tight form flows (fill + fill + click submit) to cut round-trip latency. SetstopOnError:trueto halt on the first failure; the response contains a per-step{index, success, result?, error?}array. Tab-scoped variant:POST /tabs/TAB_ID/actions.GET /textPOST /solvewith{"maxAttempts": 3}
Tab-scoped HTTP API
Use /tabs/TAB_ID/... routes to target specific tabs. Get tab ID from navigate response or GET /tabs.
Pattern: curl -H "Authorization: Bearer <token>" http://localhost:9867/tabs/TAB_ID/<endpoint>
Key endpoints: navigate, snapshot, text, action, screenshot, pdf, back, forward, close, wait, download, upload, handoff, resume.
Action examples:
- Click:
{"kind":"click","selector":"#btn"} - Click with nav:
{"kind":"click","selector":"#link","waitNav":true} - Drag:
{"kind":"drag","selector":"#piece","dragX":12,"dragY":-158} - Scroll:
{"kind":"scroll","scrollY":1500}or{"kind":"scroll","selector":"#footer"}
Common Patterns
- Form flow:
nav→snap -i -c→fillfields →click --wait-navsubmit →textto verify - Multi-step: After each action,
snap -d -i -cfor diff - Direct selectors: Skip snapshot when structure is known:
pinchtab click "text:Accept Cookies"orfill "#search" "query"
Form submission: Always click the submit button — never use press Enter.
Token Economy
Prefer low-token commands: text, snap -i -c, snap -d. Use --block-images for read-heavy tasks. Reserve screenshots/PDFs for visual verification.
Diffing and Verification
- Use
pinchtab snap -dafter each state-changing action in long workflows. - Use
pinchtab textto confirm success messages, table updates, or navigation outcomes. The default mode extracts Readability-filtered content (reader view), which may drop navigation, repeated headlines, short-text nodes, or collapse lists/grids down to a single representative item. Reach forpinchtab text --fullwhenever (a) you're verifying content on a list/grid/tab/accordion page, (b) the expected marker is short, or (c) a default read came back missing content you can see in the snapshot. It returns the rawdocument.body.innerTextand is almost always the safer choice once you know Readability is going to trim. - Use
pinchtab screenshotonly when visual regressions, CAPTCHA, or layout-specific confirmation matters. - If a ref disappears after a change, treat that as expected and fetch fresh refs instead of retrying the stale one.
- Action responses like
{"clicked":true,"submitted":true}mean the event fired on the target element — not that the form was accepted by the server or passed native HTML validation. Always verify the expected success marker or state change viasnap/textbefore treating a submission as complete. - Same-origin iframes are supported natively via
pinchtab frame <target>— a stateful scope that subsequent selector-based/snapshotand/actioncalls inherit. Typical flow:pinchtab frame '#payment-frame'→pinchtab snap -i -c(refs reflect iframe interior) →pinchtab fill '#card'/click '#pay'→pinchtab frame main. Target acceptsmain, an iframe ref, a CSS selector for the iframe element, a frame name, or a frame URL. Nested iframes need multiple hops. Refs emitted by a fullsnap(no-i) for iframe descendants carry frame context — ref-based actions work across the boundary without an explicit scope set. Cross-origin iframes are not exposed as frame scopes; fall back toevalagainstiframe.contentDocument(same-origin-policy permitting).pinchtab text(andtext --full) honors the active frame scope and also accepts an explicit--frame <frameId>flag for one-shot reads — so afterpinchtab frame '#content-frame', a followingpinchtab text --fullextracts from the iframe's document, not the outer page. The--frameargument must be a frame ID (the 32-char hexframeIdfrompinchtab frame <target>output), not a CSS selector. For a one-shot read, the idiom is:FID=$(pinchtab frame '#content-frame' | jq -r .current.frameId); pinchtab frame main; pinchtab text --full --frame "$FID". Passing a selector liketext --frame '#content-frame'returns "no frame for given id found". eval→ always IIFE.evalexpressions share the same realm across calls, so any top-levelconst/let/classfrom one call collides with the next:SyntaxError: Identifier 'x' has already been declared. Use an IIFE on everyevalthat introduces identifiers, not only on multi-statement ones:pinchtab eval "(() => { const r = document.querySelector('#x').getBoundingClientRect(); return {x: r.x, y: r.y, w: r.width, h: r.height}; })()". For a single expression that doesn't introduce identifiers (e.g.document.title,document.getElementById('x').value), the IIFE is optional. The IIFE pattern also fixes DOMRect serialization —getBoundingClientRect()returns a value whose own-enumerable fields don't survive JSON, so the explicit projection is what actually ships the numbers back.pinchtab text(both default and--full) returns content fromdisplay:noneandvisibility:hiddennodes because it readsdocument.body.innerText(and Readability's input) from raw DOM — the visibility cascade is not applied. When you need to confirm that a success banner or error message is actually visible (not just present as a pre-seeded hidden element), verify viapinchtab snap(the accessibility tree respects visibility and hides non-rendered subtrees) or viaevalagainst the element'soffsetHeight/getComputedStyle().display. A common trap: a page ships with a hidden success<div>pre-rendered;textwill report the success string before the form is ever submitted.- The compact snapshot shows
<option>elements by their visible text, not theirvalueattribute. You don't normally need to look up thevalue: theselectaction accepts either — it matches onvaluefirst and falls back to visible text (case-insensitive). Only reach foreval+Array.from(select.options)when debugging an unexpected no-match error. text:<value>selectors are resolved by a JS-level search over visible text and can intermittently fail withDOM Errororcontext deadline exceededon large/dynamic pages. If you have a freshsnap -i -cin hand, prefer the ref (e12) — refs resolve by stable backend node IDs and don't depend on page-side JS.snap -i -c(interactive, compact) skips non-interactive descendants. For iframe interiors, either set aframescope first or use a fullpinchtab snap(no-i) which flattens same-origin iframe descendants into the parent snapshot.- ARIA expansion state (
aria-expanded="true" | "false") is usually placed on the outermost container of an accordion/menu/disclosure section, not on the header/trigger that dispatches the click. When verifying state after a click, querydocument.querySelector('#section-a').getAttribute('aria-expanded')(or the wrapper's equivalent) rather than the clicked element. click --wait-navcan return{"success": true}or, immediately after the navigation fires,Error 409: unexpected page navigation— the latter means the server saw a navigation while mid-response and aborted its reply, not that the click failed. Treat 409 after a navigation-expected click as success and verify the resulting page with a freshsnap/text.
References
- Full API: api.md
- Minimal env vars: env.md
- Agent optimization: agent-optimization.md
- Profiles: profiles.md
- MCP: mcp.md
- Security model: TRUST.md
browserwing
View full →Author
@browserwing
Stars
1.2k
Repository
browserwing/browserwing
BrowserWing
Browser automation platform with 78 built-in scripts, CLI for AI agents, and full browser control API. Turn any website into structured JSON with one command.
Install
npm install -g browserwing
Then start the server:
browserwing --port 8080
Chrome must be installed on the host machine.
Verify Installation
browserwing doctor
Quick Start — Run Built-in Scripts
BrowserWing ships with 78 ready-to-use scripts across 10 categories. No configuration needed.
# Get Hacker News top stories
browserwing run hackernews-top
# Get GitHub trending repos
browserwing run github-trending
# Get Bilibili hot videos
browserwing run bilibili-hot
# Search with parameters
browserwing run jd-search --keyword="mechanical keyboard"
# Output as table or CSV
browserwing run douban-movie-hot --format=table
browserwing run sinafinance-rank --format=csv > stocks.csv
Discover Scripts
# List all scripts
browserwing ls
# List only built-in scripts
browserwing ls --builtin
# Search by keyword
browserwing ls --search=stock
# Filter by category
browserwing ls --category=finance
# JSON output for programmatic use
browserwing ls --format=json
Built-in Script Categories
| Category | Examples |
|---|---|
| Tech | hackernews-top, github-trending, producthunt-hot, devto-top, stackoverflow-hot |
| Finance | sinafinance-rank, eastmoney-hot, xueqiu-hot, binance-gainers, yahoo-finance-quote |
| News | 36kr-hot, toutiao-hot, bbc-news, bloomberg-rss, reuters-search |
| Social | weibo-hot, zhihu-hot, xiaohongshu-hot, reddit-popular, twitter-trending |
| Entertainment | bilibili-hot, douban-movie-hot, douyin-hot, imdb-trending, youtube-trending |
| Shopping | jd-search, taobao-search, amazon-bestsellers, xianyu-hot, smzdm-hot |
| Academic | arxiv-new, google-scholar-search, baidu-scholar-search, cnki-search |
| Jobs | boss-recommend, linkedin-jobs |
| Reading | douban-book-hot, weread-ranking, substack-feed, lesswrong-curated |
| Government | gov-policy, gov-law |
Direct Browser Control
Control Chrome directly from the CLI — navigate, click, type, extract, screenshot, and more.
Workflow: Navigate → Snapshot → Interact → Extract
# 1. Navigate to a page
browserwing exec navigate https://example.com
# 2. Get page structure with element RefIDs
browserwing exec snapshot
# Output shows elements like:
# @e1 - Search (textbox)
# @e2 - Login (button)
# @e3 - Sign Up (link)
# 3. Interact using RefIDs
browserwing exec type @e1 "search query"
browserwing exec press-key Enter
browserwing exec click @e2
# 4. Extract data
browserwing exec extract ".result-item" --fields=text,href --multiple
# 5. Take screenshot
browserwing exec screenshot --output=page.png
# 6. Execute JavaScript
browserwing exec eval 'document.title'
All exec Actions
| Action | Usage | Description |
|---|---|---|
| navigate | exec navigate <url> | Open a URL |
| snapshot | exec snapshot | Get accessibility tree with @e refs |
| click | exec click <@ref> | Click element |
| type | exec type <@ref> <text> | Type into input |
| extract | exec extract <selector> [--fields=text,href] [--multiple] | Extract data |
| wait | exec wait <@ref> [--state=visible] [--timeout=10] | Wait for element |
| press-key | exec press-key <Enter|Tab|Escape> | Press key |
| screenshot | exec screenshot [--output=file.png] | Take screenshot |
| eval | exec eval '<js>' | Run JavaScript |
| fill-form | exec fill-form --field=name:value [--submit] | Fill form fields |
| tabs | exec tabs <list|new|switch|close> | Manage tabs |
| select | exec select <@ref> <value> | Select dropdown |
| hover | exec hover <@ref> | Hover element |
| scroll | exec scroll | Scroll to bottom |
| back/forward | exec back, exec forward | Navigation history |
| page-info | exec page-info | Get URL and title |
Admin Operations
Manage LLM configs, browser instances, cookies, scripts, and AI exploration — all from the CLI.
# LLM Configuration
browserwing config list
browserwing config add --name=gpt --provider=openai --api-key=sk-xxx --model=gpt-4o
browserwing config test gpt
# Browser Instances
browserwing browser list
browserwing browser start
browserwing browser stop
# Cookie Management
browserwing cookie list
browserwing cookie save
browserwing cookie import cookies.json
# Script Management
browserwing script get <id>
browserwing script create my-script.json
browserwing script delete <id>
browserwing script export --output=SKILL.md
browserwing script summary
# AI Exploration (auto-generate scripts)
browserwing explore start --url=https://example.com --task="find top products"
browserwing explore script <session-id>
browserwing explore save <session-id>
# System
browserwing health
browserwing doctor
browserwing mcp status
Output Formats
All commands support --format=json|table|csv:
browserwing run zhihu-hot --format=json # default, for piping
browserwing run zhihu-hot --format=table # human-readable
browserwing run zhihu-hot --format=csv # spreadsheet-friendly
Agent Integration Pattern
Typical AI agent workflow:
# Step 1: Discover available scripts
browserwing ls --format=json
# Step 2: Find relevant script
browserwing ls --search=stock --format=json
# Step 3: Execute and get structured data
browserwing run sinafinance-rank
# Step 4: Pipe to other tools
browserwing run hackernews-top | jq '.[0:5]'
For direct browser control:
# Step 1: Open page
browserwing exec navigate https://example.com
# Step 2: Understand page structure
browserwing exec snapshot
# Step 3: Interact
browserwing exec click @e3
browserwing exec type @e5 "query"
# Step 4: Extract results
browserwing exec extract ".item" --fields=text,href --multiple --format=json
CLI Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Cannot connect to server |
| 3 | Script not found |
| 4 | Script execution failed |
| 64 | Bad arguments |
Environment Variables
| Variable | Description |
|---|---|
BROWSERWING_URL | Server URL (default: auto-detect from config.toml) |
BROWSERWING_JSON_ERRORS | Set to 1 for JSON error output on stderr |