Comparing pinchtab with browserwing

Author

@pinchtab

Stars

8.8k

Repository

pinchtab/pinchtab

skills/pinchtab/SKILL.md

Browser Automation with PinchTab

PinchTab gives agents a browser they can drive through stable accessibility refs, low-token text extraction, and persistent profiles or instances. Treat it as a CLI-first browser skill; use the HTTP API only when the CLI is unavailable or you need profile-management routes that do not exist in the CLI yet.

Preferred tool surface:

  • Use pinchtab CLI commands first.
  • Use curl for profile-management routes or non-shell/API fallback flows.
  • Use jq only when you need structured parsing from JSON responses.

Core Workflow

Every PinchTab automation follows this pattern:

  1. Ensure the correct server, profile, or instance is available for the task.
  2. Navigate with pinchtab nav <url> or pinchtab instance navigate <instance-id> <url>.
  3. Observe with pinchtab snap -i -c, pinchtab snap --text, or pinchtab text, then collect the current refs such as e5.
  4. Interact with those fresh refs using click, fill, type, press, select, hover, or scroll.
  5. Re-snapshot or re-read text after any navigation, submit, modal open, accordion expand, or other DOM-changing action.

Rules:

  • Never act on stale refs after the page changes.
  • Default to pinchtab text when you need content, not layout.
  • Default to pinchtab snap -i -c when you need actionable elements.
  • Use screenshots only for visual verification, UI diffs, or debugging.
  • Start multi-site or parallel work by choosing the right instance or profile first.

Selectors

PinchTab uses a unified selector system. Any command that targets an element accepts these formats:

SelectorExampleResolves via
Refe5Snapshot cache (fastest)
CSS#login, .btn, [data-testid="x"]document.querySelector
XPathxpath://button[@id="submit"]CDP search
Texttext:Sign InVisible text match
Semanticfind:login buttonNatural language query via /find

Auto-detection: bare e5 → ref, #id / .class / [attr] → CSS, //path → XPath. Use explicit prefixes (css:, xpath:, text:, find:) when auto-detection is ambiguous.

pinchtab click e5                        # ref
pinchtab click "#submit"                 # CSS (auto-detected)
pinchtab click "text:Sign In"            # text match
pinchtab click "xpath://button[@type]"   # XPath
pinchtab fill "#email" "user@test.com"   # CSS
pinchtab fill e3 "user@test.com"         # ref

Same syntax in HTTP API via selector field. Legacy ref field still accepted.

Command Chaining

Use && when you don't need intermediate output: pinchtab nav <url> && pinchtab snap -i -c. Run separately when you must read refs before acting.

Challenge Solving

If a page shows a challenge instead of content (e.g., "Just a moment..."), call POST /solve with {"maxAttempts": 3} to auto-detect and resolve it. Use POST /tabs/TAB_ID/solve for tab-scoped. Works best with stealthLevel: "full" in config. Safe to call speculatively — returns immediately if no challenge is present. See api.md for full solver options.

Handling Authentication and State

Patterns: (1) One-off: pinchtab instance start--server http://localhost:<port>. (2) Reuse profile: pinchtab instance start --profile work --mode headed → switch to headless after login. (3) HTTP: POST /profiles, then POST /profiles/<name>/start. (4) Human-assisted: headed login, then agent reuses headless.

Agent sessions: pinchtab session create --agent-id <id> or POST /sessions → set PINCHTAB_SESSION=ses_....

Essential Commands

Server and targeting

pinchtab server                                     # Start server foreground
pinchtab daemon install                             # Install as system service
pinchtab health                                     # Check server status
pinchtab instances                                  # List running instances
pinchtab profiles                                   # List available profiles
pinchtab --server http://localhost:9868 snap -i -c  # Target specific instance

Navigation and tabs

pinchtab nav <url>
pinchtab nav <url> --new-tab
pinchtab nav <url> --tab <tab-id>
pinchtab nav <url> --block-images
pinchtab nav <url> --block-ads
pinchtab nav <url> --print-tab-id                   # Print only the new tabId on stdout
pinchtab back                                       # Navigate back in history
pinchtab forward                                    # Navigate forward
pinchtab reload                                     # Reload current page
pinchtab tab                                        # List tabs (no subcommand - just `tab`)
pinchtab tab <tab-id>                               # Focus an existing tab
pinchtab tab new <url>                              # Open a new tab
pinchtab tab close <tab-id>                         # Close a tab — use this to clean up stale tabs between runs
pinchtab instance navigate <instance-id> <url>

Tab workflow — most commands target the active tab by default, so single-tab flows need no plumbing:

pinchtab nav http://example.com
pinchtab snap -i -c      # active tab
pinchtab click e5        # active tab
pinchtab text            # active tab

When you need to pin to a specific tab (parallel tabs, long-running flows, or shell-isolated runners like agent tool calls where env vars don't persist across invocations), capture the tab ID with --print-tab-id and pass --tab on every subsequent command:

TAB_ID=$(pinchtab nav http://example.com --print-tab-id)
pinchtab --tab "$TAB_ID" snap -i -c
pinchtab --tab "$TAB_ID" click e5
pinchtab --tab "$TAB_ID" text

Within a single shell session you can also export PINCHTAB_TAB=$(pinchtab nav URL --print-tab-id) and drop the --tab flag. This does not survive across separate shell invocations (each Bash tool call in an agent runs a fresh shell), so prefer explicit --tab for agent workflows.

Priority: --tab <id> flag > PINCHTAB_TAB env var > active tab.

Observation

pinchtab snap
pinchtab snap -i                                    # Interactive elements only
pinchtab snap -i -c                                 # Interactive + compact
pinchtab snap -d                                    # Diff from previous snapshot
pinchtab snap --selector <css>                      # Scope to CSS selector
pinchtab snap --max-tokens <n>                      # Token budget limit
pinchtab snap --text                                # Text output format
pinchtab text                                       # Page text content (Readability-filtered; drops nav/repeated headlines)
pinchtab text --full                                # Full page text (document.body.innerText) — use when Readability is dropping content you need
pinchtab text --raw                                 # Alias of --full
# CLI returns JSON; use `| jq -r .text` for plain text
pinchtab find <query>                               # Semantic element search
pinchtab find --ref-only <query>                    # Return refs only

Guidance:

  • snap -i -c is the default for finding actionable refs.
  • snap -d is the default follow-up snapshot for multi-step flows.
  • text is the default for reading articles, dashboards, reports, or confirmation messages.
  • pinchtab find <query> is the direct route when you already know the semantic target (e.g. "login button", "email input", "accept cookies link") — skips the full snapshot and returns a ranked match with its ref. Pair with --ref-only on large/dense pages to get just the ref string for piping straight into click / fill / type. Prefer find over snap -i -c + visual scan whenever you can describe the target in a phrase.
  • Refs from snap -i and full snap use different numbering. Do not mix them — if you snapshot with -i, use those refs. If you re-snapshot without -i, get fresh refs before acting.

Interaction

All interaction commands accept unified selectors (refs, CSS, XPath, text, semantic). See the Selectors section above.

pinchtab click <selector>                           # Click element
pinchtab click --wait-nav <selector>                # Click and wait for navigation
pinchtab click --x 100 --y 200                      # Click by coordinates
pinchtab click <selector> --dialog-action accept    # Click + auto-accept any alert/confirm the click opens
pinchtab click <selector> --dialog-action dismiss   # Click + auto-dismiss
pinchtab click <selector> --dialog-action accept \
    --dialog-text "hello"                           # Click + accept a prompt() with a response
pinchtab dblclick <selector>                        # Double-click element
pinchtab mouse move <selector>                      # Move pointer to element center
pinchtab mouse move <x> <y>                         # Move pointer to coordinates
pinchtab mouse down <selector> --button left        # Press a mouse button at an explicit target
pinchtab mouse down --button left                   # Press a mouse button at current pointer
pinchtab mouse up <selector> --button left          # Release a mouse button at an explicit target
pinchtab mouse up --button left                     # Release a mouse button at current pointer
pinchtab mouse wheel 240 --dx 40                    # Dispatch wheel deltas at current pointer
pinchtab drag <from> <to>                           # Drag between selector/ref or x,y points (synthesized mouse sequence)
pinchtab drag <selector> --drag-x <n> --drag-y <n>  # Single-step drag by pixel offset (mirrors HTTP /action dragX/dragY)
pinchtab type <selector> <text>                     # Type with keystrokes
pinchtab fill <selector> <text>                     # Set value directly
pinchtab press <key>                                # Press key (Enter, Tab, Escape...)
pinchtab hover <selector>                           # Hover element
pinchtab select <selector> <value|text>             # Select dropdown option by value attr, or fall back to visible text
pinchtab scroll <pixels|direction|selector>         # e.g. `scroll 1500`, `scroll down`, `scroll '#footer'`

Rules:

  • Prefer fill for deterministic form entry.
  • Prefer type only when the site depends on keystroke events.
  • Prefer click --wait-nav when a click is expected to navigate.
  • Prefer low-level mouse commands only when normal click / hover abstractions are insufficient, such as drag handles, canvas widgets, or sites that depend on exact pointer sequences.
  • Re-snapshot immediately after click, press Enter, select, or scroll if the UI can change.
  • select matches by value attr first, then visible text (case-insensitive). Error lists available options if no match.
  • For JS dialogs: use --dialog-action accept or --dialog-action dismiss on click. Add --dialog-text for prompt responses.
  • For the scroll action via HTTP, use "scrollX" / "scrollY" for pixel deltas, or "selector" to scroll an element into view. Example: {"kind":"scroll","scrollY":1500} or {"kind":"scroll","selector":"#footer"}. The x/y fields are target viewport coordinates, not scroll deltas.
  • The download HTTP endpoint (GET /download?url=... or GET /tabs/TAB_ID/download?url=...) returns JSON {contentType, data (base64), size, url}, not raw bytes. Decode data with base64 to get the file. Only http/https URLs are allowed. Private/internal hosts are blocked unless listed in security.downloadAllowedDomains.

Waiting

Use wait when the DOM settles asynchronously — spinners, toasts, XHR-driven content.

pinchtab wait <selector>                            # Element to appear (default visible)
pinchtab wait <selector> --state hidden             # Element to disappear
pinchtab wait --text "Order confirmed"              # Text to appear
pinchtab wait --not-text "Loading..."               # Text to disappear (spinner/toast dismiss)
pinchtab wait --url "**/dashboard"                  # URL glob match
pinchtab wait --load networkidle                    # Network idle
pinchtab wait 500                                   # Fixed delay in ms (last resort)

Default timeout 10s, max 30s via --timeout <ms>. Prefer --not-text / --state hidden over polling.

Export, debug, and verification

pinchtab screenshot
pinchtab screenshot -o /tmp/pinchtab-page.png       # Format driven by extension
pinchtab screenshot -q 60                            # JPEG quality
pinchtab pdf
pinchtab pdf -o /tmp/pinchtab-report.pdf
pinchtab pdf --landscape

Advanced operations: explicit opt-in only

Use these only when the task explicitly requires them and safer commands are insufficient.

pinchtab eval "document.title"
pinchtab eval --await-promise "fetch('/api/me').then(r => r.json())"
pinchtab download <url> -o /tmp/pinchtab-download.bin
pinchtab upload /absolute/path/provided-by-user.ext -s <css>

Rules:

  • eval is for narrow, read-only DOM inspection unless the user explicitly asks for a page mutation.
  • download should prefer a safe temporary or workspace path over an arbitrary filesystem location.
  • upload requires a file path the user explicitly provided or clearly approved for the task.

HTTP API fallback

Use curl when CLI unavailable. Key endpoints on instance port (e.g. 9867):

  • POST /navigate with {"url":"..."}
  • GET /snapshot?filter=interactive&format=compact
  • POST /action with {"kind":"fill","selector":"e3","text":"..."}
  • POST /actions with a batch of actions — runs them in one round-trip. Body accepts either an array [{"kind":"fill",...},{"kind":"click",...}] or an envelope {"actions":[...],"stopOnError":true,"tabId":"..."}. Use this for tight form flows (fill + fill + click submit) to cut round-trip latency. Set stopOnError:true to halt on the first failure; the response contains a per-step {index, success, result?, error?} array. Tab-scoped variant: POST /tabs/TAB_ID/actions.
  • GET /text
  • POST /solve with {"maxAttempts": 3}

Tab-scoped HTTP API

Use /tabs/TAB_ID/... routes to target specific tabs. Get tab ID from navigate response or GET /tabs.

Pattern: curl -H "Authorization: Bearer <token>" http://localhost:9867/tabs/TAB_ID/<endpoint>

Key endpoints: navigate, snapshot, text, action, screenshot, pdf, back, forward, close, wait, download, upload, handoff, resume.

Action examples:

  • Click: {"kind":"click","selector":"#btn"}
  • Click with nav: {"kind":"click","selector":"#link","waitNav":true}
  • Drag: {"kind":"drag","selector":"#piece","dragX":12,"dragY":-158}
  • Scroll: {"kind":"scroll","scrollY":1500} or {"kind":"scroll","selector":"#footer"}

Common Patterns

  • Form flow: navsnap -i -cfill fields → click --wait-nav submit → text to verify
  • Multi-step: After each action, snap -d -i -c for diff
  • Direct selectors: Skip snapshot when structure is known: pinchtab click "text:Accept Cookies" or fill "#search" "query"

Form submission: Always click the submit button — never use press Enter.

Token Economy

Prefer low-token commands: text, snap -i -c, snap -d. Use --block-images for read-heavy tasks. Reserve screenshots/PDFs for visual verification.

Diffing and Verification

  • Use pinchtab snap -d after each state-changing action in long workflows.
  • Use pinchtab text to confirm success messages, table updates, or navigation outcomes. The default mode extracts Readability-filtered content (reader view), which may drop navigation, repeated headlines, short-text nodes, or collapse lists/grids down to a single representative item. Reach for pinchtab text --full whenever (a) you're verifying content on a list/grid/tab/accordion page, (b) the expected marker is short, or (c) a default read came back missing content you can see in the snapshot. It returns the raw document.body.innerText and is almost always the safer choice once you know Readability is going to trim.
  • Use pinchtab screenshot only when visual regressions, CAPTCHA, or layout-specific confirmation matters.
  • If a ref disappears after a change, treat that as expected and fetch fresh refs instead of retrying the stale one.
  • Action responses like {"clicked":true,"submitted":true} mean the event fired on the target element — not that the form was accepted by the server or passed native HTML validation. Always verify the expected success marker or state change via snap/text before treating a submission as complete.
  • Same-origin iframes are supported natively via pinchtab frame <target> — a stateful scope that subsequent selector-based /snapshot and /action calls inherit. Typical flow: pinchtab frame '#payment-frame'pinchtab snap -i -c (refs reflect iframe interior) → pinchtab fill '#card' / click '#pay'pinchtab frame main. Target accepts main, an iframe ref, a CSS selector for the iframe element, a frame name, or a frame URL. Nested iframes need multiple hops. Refs emitted by a full snap (no -i) for iframe descendants carry frame context — ref-based actions work across the boundary without an explicit scope set. Cross-origin iframes are not exposed as frame scopes; fall back to eval against iframe.contentDocument (same-origin-policy permitting). pinchtab text (and text --full) honors the active frame scope and also accepts an explicit --frame <frameId> flag for one-shot reads — so after pinchtab frame '#content-frame', a following pinchtab text --full extracts from the iframe's document, not the outer page. The --frame argument must be a frame ID (the 32-char hex frameId from pinchtab frame <target> output), not a CSS selector. For a one-shot read, the idiom is: FID=$(pinchtab frame '#content-frame' | jq -r .current.frameId); pinchtab frame main; pinchtab text --full --frame "$FID". Passing a selector like text --frame '#content-frame' returns "no frame for given id found".
  • eval → always IIFE. eval expressions share the same realm across calls, so any top-level const/let/class from one call collides with the next: SyntaxError: Identifier 'x' has already been declared. Use an IIFE on every eval that introduces identifiers, not only on multi-statement ones: pinchtab eval "(() => { const r = document.querySelector('#x').getBoundingClientRect(); return {x: r.x, y: r.y, w: r.width, h: r.height}; })()". For a single expression that doesn't introduce identifiers (e.g. document.title, document.getElementById('x').value), the IIFE is optional. The IIFE pattern also fixes DOMRect serialization — getBoundingClientRect() returns a value whose own-enumerable fields don't survive JSON, so the explicit projection is what actually ships the numbers back.
  • pinchtab text (both default and --full) returns content from display:none and visibility:hidden nodes because it reads document.body.innerText (and Readability's input) from raw DOM — the visibility cascade is not applied. When you need to confirm that a success banner or error message is actually visible (not just present as a pre-seeded hidden element), verify via pinchtab snap (the accessibility tree respects visibility and hides non-rendered subtrees) or via eval against the element's offsetHeight / getComputedStyle().display. A common trap: a page ships with a hidden success <div> pre-rendered; text will report the success string before the form is ever submitted.
  • The compact snapshot shows <option> elements by their visible text, not their value attribute. You don't normally need to look up the value: the select action accepts either — it matches on value first and falls back to visible text (case-insensitive). Only reach for eval + Array.from(select.options) when debugging an unexpected no-match error.
  • text:<value> selectors are resolved by a JS-level search over visible text and can intermittently fail with DOM Error or context deadline exceeded on large/dynamic pages. If you have a fresh snap -i -c in hand, prefer the ref (e12) — refs resolve by stable backend node IDs and don't depend on page-side JS.
  • snap -i -c (interactive, compact) skips non-interactive descendants. For iframe interiors, either set a frame scope first or use a full pinchtab snap (no -i) which flattens same-origin iframe descendants into the parent snapshot.
  • ARIA expansion state (aria-expanded="true" | "false") is usually placed on the outermost container of an accordion/menu/disclosure section, not on the header/trigger that dispatches the click. When verifying state after a click, query document.querySelector('#section-a').getAttribute('aria-expanded') (or the wrapper's equivalent) rather than the clicked element.
  • click --wait-nav can return {"success": true} or, immediately after the navigation fires, Error 409: unexpected page navigation — the latter means the server saw a navigation while mid-response and aborted its reply, not that the click failed. Treat 409 after a navigation-expected click as success and verify the resulting page with a fresh snap / text.

References

browserwing

View full →

Author

@browserwing

Stars

1.2k

Repository

browserwing/browserwing

skills/browserwing/SKILL.md

BrowserWing

Browser automation platform with 78 built-in scripts, CLI for AI agents, and full browser control API. Turn any website into structured JSON with one command.

Install

npm install -g browserwing

Then start the server:

browserwing --port 8080

Chrome must be installed on the host machine.

Verify Installation

browserwing doctor

Quick Start — Run Built-in Scripts

BrowserWing ships with 78 ready-to-use scripts across 10 categories. No configuration needed.

# Get Hacker News top stories
browserwing run hackernews-top

# Get GitHub trending repos
browserwing run github-trending

# Get Bilibili hot videos
browserwing run bilibili-hot

# Search with parameters
browserwing run jd-search --keyword="mechanical keyboard"

# Output as table or CSV
browserwing run douban-movie-hot --format=table
browserwing run sinafinance-rank --format=csv > stocks.csv

Discover Scripts

# List all scripts
browserwing ls

# List only built-in scripts
browserwing ls --builtin

# Search by keyword
browserwing ls --search=stock

# Filter by category
browserwing ls --category=finance

# JSON output for programmatic use
browserwing ls --format=json

Built-in Script Categories

CategoryExamples
Techhackernews-top, github-trending, producthunt-hot, devto-top, stackoverflow-hot
Financesinafinance-rank, eastmoney-hot, xueqiu-hot, binance-gainers, yahoo-finance-quote
News36kr-hot, toutiao-hot, bbc-news, bloomberg-rss, reuters-search
Socialweibo-hot, zhihu-hot, xiaohongshu-hot, reddit-popular, twitter-trending
Entertainmentbilibili-hot, douban-movie-hot, douyin-hot, imdb-trending, youtube-trending
Shoppingjd-search, taobao-search, amazon-bestsellers, xianyu-hot, smzdm-hot
Academicarxiv-new, google-scholar-search, baidu-scholar-search, cnki-search
Jobsboss-recommend, linkedin-jobs
Readingdouban-book-hot, weread-ranking, substack-feed, lesswrong-curated
Governmentgov-policy, gov-law

Direct Browser Control

Control Chrome directly from the CLI — navigate, click, type, extract, screenshot, and more.

Workflow: Navigate → Snapshot → Interact → Extract

# 1. Navigate to a page
browserwing exec navigate https://example.com

# 2. Get page structure with element RefIDs
browserwing exec snapshot

# Output shows elements like:
#   @e1 - Search (textbox)
#   @e2 - Login (button)
#   @e3 - Sign Up (link)

# 3. Interact using RefIDs
browserwing exec type @e1 "search query"
browserwing exec press-key Enter
browserwing exec click @e2

# 4. Extract data
browserwing exec extract ".result-item" --fields=text,href --multiple

# 5. Take screenshot
browserwing exec screenshot --output=page.png

# 6. Execute JavaScript
browserwing exec eval 'document.title'

All exec Actions

ActionUsageDescription
navigateexec navigate <url>Open a URL
snapshotexec snapshotGet accessibility tree with @e refs
clickexec click <@ref>Click element
typeexec type <@ref> <text>Type into input
extractexec extract <selector> [--fields=text,href] [--multiple]Extract data
waitexec wait <@ref> [--state=visible] [--timeout=10]Wait for element
press-keyexec press-key <Enter|Tab|Escape>Press key
screenshotexec screenshot [--output=file.png]Take screenshot
evalexec eval '<js>'Run JavaScript
fill-formexec fill-form --field=name:value [--submit]Fill form fields
tabsexec tabs <list|new|switch|close>Manage tabs
selectexec select <@ref> <value>Select dropdown
hoverexec hover <@ref>Hover element
scrollexec scrollScroll to bottom
back/forwardexec back, exec forwardNavigation history
page-infoexec page-infoGet URL and title

Admin Operations

Manage LLM configs, browser instances, cookies, scripts, and AI exploration — all from the CLI.

# LLM Configuration
browserwing config list
browserwing config add --name=gpt --provider=openai --api-key=sk-xxx --model=gpt-4o
browserwing config test gpt

# Browser Instances
browserwing browser list
browserwing browser start
browserwing browser stop

# Cookie Management
browserwing cookie list
browserwing cookie save
browserwing cookie import cookies.json

# Script Management
browserwing script get <id>
browserwing script create my-script.json
browserwing script delete <id>
browserwing script export --output=SKILL.md
browserwing script summary

# AI Exploration (auto-generate scripts)
browserwing explore start --url=https://example.com --task="find top products"
browserwing explore script <session-id>
browserwing explore save <session-id>

# System
browserwing health
browserwing doctor
browserwing mcp status

Output Formats

All commands support --format=json|table|csv:

browserwing run zhihu-hot --format=json    # default, for piping
browserwing run zhihu-hot --format=table   # human-readable
browserwing run zhihu-hot --format=csv     # spreadsheet-friendly

Agent Integration Pattern

Typical AI agent workflow:

# Step 1: Discover available scripts
browserwing ls --format=json

# Step 2: Find relevant script
browserwing ls --search=stock --format=json

# Step 3: Execute and get structured data
browserwing run sinafinance-rank

# Step 4: Pipe to other tools
browserwing run hackernews-top | jq '.[0:5]'

For direct browser control:

# Step 1: Open page
browserwing exec navigate https://example.com

# Step 2: Understand page structure
browserwing exec snapshot

# Step 3: Interact
browserwing exec click @e3
browserwing exec type @e5 "query"

# Step 4: Extract results
browserwing exec extract ".item" --fields=text,href --multiple --format=json

CLI Exit Codes

CodeMeaning
0Success
1General error
2Cannot connect to server
3Script not found
4Script execution failed
64Bad arguments

Environment Variables

VariableDescription
BROWSERWING_URLServer URL (default: auto-detect from config.toml)
BROWSERWING_JSON_ERRORSSet to 1 for JSON error output on stderr

Links

AI Skill Finder

Ask me what skills you need

What are you building?

Tell me what you're working on and I'll find the best agent skills for you.