@dprevoznik dprevoznik commented Jan 15, 2026

Anthropic Computer Use Template Overhaul

This PR overhauls both the TypeScript and Python Anthropic Computer Use templates to use Kernel's Computer Controls API instead of Playwright for all browser interactions.

Why This Change

The previous implementation used Playwright directly, which required maintaining browser connections and handling lower-level browser automation. By migrating to Kernel's Computer Controls API, users get:

  • Native integration with Kernel's browser infrastructure
  • Built-in replay recording for debugging and auditing
  • Consistent API across all Kernel computer use templates
  • Simplified session management with automatic cleanup

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     Entry Point (index.ts / main.py)        │
│  - Defines the Kernel app and action                        │
│  - Creates browser session with KernelBrowserSession        │
│  - Invokes the sampling loop                                │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Session Manager (session.ts / .py)      │
│  - Manages browser lifecycle (create/delete)                │
│  - Handles replay recording (start/stop/poll for URL)       │
│  - Configures viewport (1024x768 @ 60Hz)                    │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Sampling Loop (loop.ts / .py)           │
│  - Implements the Anthropic prompt loop                     │
│  - Manages conversation history                             │
│  - Routes tool calls to ToolCollection                      │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Tool Collection (tools/)                │
│  - ComputerTool: Mouse, keyboard, screenshots via Kernel    │
│  - Maps Anthropic actions to Kernel Computer Controls API   │
│  - Tracks last known mouse position for drag operations     │
└─────────────────────────────────────────────────────────────┘

File Structure (TypeScript)

anthropic-computer-use/
├── index.ts              # Entry point - defines Kernel app and cua-task action
├── session.ts            # KernelBrowserSession - manages browser lifecycle + replays
├── loop.ts               # Anthropic sampling loop with tool routing
├── tools/
│   ├── collection.ts     # ToolCollection - routes tool calls, manages versions
│   ├── computer.ts       # ComputerTool - implements all mouse/keyboard actions
│   ├── types/
│   │   └── computer.ts   # TypeScript types for actions and results
│   └── utils/
│       ├── keyboard.ts   # Key mapping utilities
│       └── validator.ts  # Coordinate validation
├── types/
│   └── beta.ts           # Anthropic beta API types
├── utils/
│   ├── message-processing.ts  # Prompt caching, image filtering
│   └── tool-results.ts        # Format tool results for API
└── README.md

The Python template follows the same structure with equivalent modules.

Key Components

1. KernelBrowserSession (session.ts / session.py)

Manages the browser lifecycle as a context manager:

const session = new KernelBrowserSession(kernel, {
  stealth: true,
  recordReplay: true,  // Optional: capture video replay
});

await session.start();
// ... use session.sessionId for computer controls
const info = await session.stop();
// info.replayViewUrl contains the video URL if recording was enabled

Features:

  • Automatic browser creation with configurable viewport (1024x768 @ 60Hz)
  • Optional replay recording with grace period before stopping
  • Polls for replay URL after stopping
  • Automatic cleanup on exit
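
The "automatic cleanup on exit" behavior can be illustrated with a small wrapper. This is only a sketch: `SessionLike`, `withSession`, and `makeFakeSession` are hypothetical names introduced here, and the real KernelBrowserSession exposes more than the two methods assumed below.

```typescript
// Hypothetical minimal session shape (an assumption for this sketch);
// the real KernelBrowserSession has more fields and options.
interface SessionLike {
  start(): Promise<void>;
  stop(): Promise<unknown>;
}

// Run `task` between start() and stop().
// stop() runs even when `task` throws, so the browser is not leaked.
async function withSession<T>(
  session: SessionLike,
  task: () => Promise<T>,
): Promise<T> {
  await session.start();
  try {
    return await task();
  } finally {
    await session.stop();
  }
}

// Fake session that only records the order of lifecycle calls.
function makeFakeSession(log: string[]): SessionLike {
  return {
    start: async () => {
      log.push("start");
    },
    stop: async () => {
      log.push("stop");
      return {};
    },
  };
}
```

With a real session in place of the fake one, the sampling loop would be passed as `task`; an error inside the loop still reaches the caller, but only after `stop()` has run.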

2. ComputerTool (tools/computer.ts / tools/computer.py)

Maps Anthropic's computer use actions to Kernel's Computer Controls API:

| Anthropic Action                      | Kernel API                   |
|---------------------------------------|------------------------------|
| left_click, right_click, double_click | computer.clickMouse()        |
| mouse_move                            | computer.moveMouse()         |
| left_click_drag                       | computer.dragMouse()         |
| type                                  | computer.typeText()          |
| key                                   | computer.pressKey()          |
| scroll                                | computer.scroll()            |
| screenshot                            | computer.captureScreenshot() |
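
The mapping above can be sketched as a lookup table. The method names are taken from the table; the `KernelMethod` type and the `resolveKernelMethod` helper are hypothetical, and the template's actual dispatch code may be structured differently.

```typescript
// Kernel Computer Controls method names, as listed in the table above.
type KernelMethod =
  | "clickMouse"
  | "moveMouse"
  | "dragMouse"
  | "typeText"
  | "pressKey"
  | "scroll"
  | "captureScreenshot";

// Anthropic action name -> Kernel method name.
const ACTION_TO_METHOD = new Map<string, KernelMethod>([
  ["left_click", "clickMouse"],
  ["right_click", "clickMouse"],
  ["double_click", "clickMouse"],
  ["mouse_move", "moveMouse"],
  ["left_click_drag", "dragMouse"],
  ["type", "typeText"],
  ["key", "pressKey"],
  ["scroll", "scroll"],
  ["screenshot", "captureScreenshot"],
]);

// Hypothetical helper: returns undefined for actions with no Kernel
// equivalent (for example cursor_position), so the caller can report an error.
function resolveKernelMethod(action: string): KernelMethod | undefined {
  return ACTION_TO_METHOD.get(action);
}
```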

Key implementation details:

  • Maintains lastMousePosition to support drag operations from current position
  • Maps Anthropic key names to Kernel/xdotool format
  • Returns base64-encoded screenshots after each action
  • Supports both computer_use_20241022 and computer_use_20250124 API versions
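
As a rough illustration of the key-name mapping, the sketch below normalizes a few common key names to X11 keysym names that xdotool accepts. The alias table and the `toXdotoolKey` helper are assumptions made for this example; the template's own mapping is more complete and may differ.

```typescript
// Illustrative aliases only: lowercase input name -> X11 keysym / xdotool name.
const KEY_ALIASES = new Map<string, string>([
  ["enter", "Return"],
  ["return", "Return"],
  ["esc", "Escape"],
  ["escape", "Escape"],
  ["backspace", "BackSpace"],
  ["tab", "Tab"],
  ["delete", "Delete"],
  ["pageup", "Page_Up"],
  ["pagedown", "Page_Down"],
  ["arrowup", "Up"],
  ["arrowdown", "Down"],
  ["arrowleft", "Left"],
  ["arrowright", "Right"],
  ["control", "ctrl"],
  ["cmd", "super"],
]);

// Convert a combo such as "ctrl+Enter" part by part.
// Parts without an alias pass through unchanged.
// Note: a literal "+" key inside a combo is not handled by this sketch.
function toXdotoolKey(combo: string): string {
  return combo
    .split("+")
    .map((part) => {
      const trimmed = part.trim();
      return KEY_ALIASES.get(trimmed.toLowerCase()) ?? trimmed;
    })
    .join("+");
}
```

For example, `toXdotoolKey("ctrl+Enter")` yields `"ctrl+Return"`, while a plain character such as `"a"` is returned unchanged.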

3. Sampling Loop (loop.ts / loop.py)

Implements the Anthropic computer use prompt loop:

  • Sends messages to Claude with computer use tools
  • Processes tool calls and executes them via ToolCollection
  • Supports thinking mode with configurable budget
  • Handles prompt caching for efficiency
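
One step of such a loop, turning a model response into conversation history and a list of tool calls to route, might look like the sketch below. The `ContentBlock` shape and the `splitResponse` helper are simplified assumptions, not the Anthropic SDK's types. The point is that every block is copied into the history, including block types the code does not recognize.

```typescript
// Simplified content block (an assumption; the SDK's types are richer).
type ContentBlock = { type: string; [key: string]: unknown };

interface ToolUseBlock extends ContentBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: unknown;
}

// Type guard for blocks that should be routed to a tool.
function isToolUse(block: ContentBlock): block is ToolUseBlock {
  return (
    block.type === "tool_use" &&
    typeof block["id"] === "string" &&
    typeof block["name"] === "string"
  );
}

// Hypothetical helper: copy every block into the assistant message
// (unknown block types are preserved, not dropped) and collect tool calls.
function splitResponse(content: ContentBlock[]): {
  assistantContent: ContentBlock[];
  toolCalls: ToolUseBlock[];
} {
  return {
    assistantContent: content.map((block) => ({ ...block })),
    toolCalls: content.filter(isToolUse),
  };
}
```

The loop would append `assistantContent` to the message history, execute each entry in `toolCalls` through the tool collection, and stop when a response contains no tool calls.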

New Features

Replay Recording

Users can enable video replay recording by passing record_replay: true in the payload:

kernel invoke ts-anthropic-cua cua-task --payload '{"query": "...", "record_replay": true}'

The response includes a replay_url field with a link to view the recorded session.
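
A minimal sketch of how an action might validate this payload is shown below. Only the `query` and `record_replay` field names come from the command above; the `CuaTaskPayload` type and the `parsePayload` helper are hypothetical, and the template's real handling may differ.

```typescript
// Payload fields taken from the invoke example; record_replay is optional.
interface CuaTaskPayload {
  query: string;
  record_replay?: boolean;
}

// Hypothetical validator: rejects malformed payloads and
// treats a missing record_replay as false.
function parsePayload(raw: unknown): Required<CuaTaskPayload> {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("payload must be an object");
  }
  const { query, record_replay } = raw as Record<string, unknown>;
  if (typeof query !== "string" || query.length === 0) {
    throw new Error("payload.query must be a non-empty string");
  }
  return { query, record_replay: record_replay === true };
}
```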

Known Limitations

Cursor Position: The cursor_position action is not supported with Kernel's Computer Controls API. If the model attempts to use this action, an error is returned. This is a known limitation that does not significantly impact most workflows, as the model tracks cursor position through screenshots.

Testing

Both templates have been tested with the magnitasks.com Kanban board task, which exercises:

  • Navigation and clicking
  • Drag-and-drop (left_click_drag)
  • Multiple sequential actions

Updated Documentation

  • Template READMEs updated with setup, usage, replay recording, and limitations
  • QA command updated with new test task
  • CLI post-install instructions updated with new example

Note

Overhauls the Anthropic Computer Use templates to use Kernel’s Computer Controls API instead of Playwright, with built-in browser session management and optional replay recording.

  • Replace Playwright automation with Kernel controls in both TS (tools/computer.ts, loop.ts, session.ts, index.ts) and Python (tools/computer.py, loop.py, session.py, main.py) templates
  • Add KernelBrowserSession to manage browser lifecycle, live view, and replays (configurable viewport 1024x768@60Hz; stop/poll for replay_url)
  • Update sampling loops to construct ToolCollection with Kernel client and sessionId; handle thinking blocks and tool_use routing; enable prompt caching
  • Implement comprehensive key/mouse/scroll mappings to Kernel APIs; drop unsupported cursor_position; track last mouse position for drags; standardize typing delay and screenshot flow
  • Bump SDKs (@onkernel/sdk / kernel to 0.24.0); remove Playwright deps and code paths
  • Refresh READMEs with setup/usage and replay instructions; adjust QA/template invoke commands to magnitasks task and optional record_replay flag
  • Update template defaults in pkg/create/templates.go to new invoke payloads

Written by Cursor Bugbot for commit ac3baaa. This will update automatically on new commits.

@dprevoznik
Contributor Author

Verified that the new templates work with recorded replays. The Python and TypeScript templates produce similar videos.

replays.3.mp4

@dprevoznik dprevoznik requested review from tnsardesai and removed request for tnsardesai January 15, 2026 16:20
Contributor

@tembo tembo bot left a comment

Nice migration overall — the templates read a lot simpler without Playwright/CDP plumbing and the replay recording flow is a good addition.

Main things I called out inline:

  • Make session teardown a bit more robust (avoid leaving stale state around in TS; ensure Python cleanup always deletes the browser even if replay polling fails).
  • Small TS tool polish: remove now-unused import, and default scroll coordinates to the tracked mouse position instead of hardcoding (0,0).
  • Preserve unexpected Anthropic content blocks in _response_to_params to avoid silently dropping future/new block types.

None of these block the PR, just small hardening/maintainability tweaks.

@dprevoznik
Contributor Author

working on tembo / cursor comments now

Contributor

@rgarcia rgarcia left a comment

Good refactor from Playwright to Kernel Computer Controls API. The Python implementation looks solid. Main issues are in the TypeScript side:

  • Key mappings need to use X11 keysym names (Python has correct mappings to reference)
  • Minor naming/cleanup items

The existing comments from previous review cover the session cleanup and scroll behavior edge cases well.

@dprevoznik
Copy link
Contributor Author

Screenshot 2026-01-15 at 6 01 59 PM

@rgarcia all fixed and tested the templates. tagging in case you want to do another check

@dprevoznik dprevoznik requested a review from rgarcia January 15, 2026 23:02
Contributor

@tnsardesai tnsardesai left a comment

LGTM. should we consider adding an api to get cursor position?

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

tnsardesai and others added 8 commits January 16, 2026 13:19
Fix remaining TS items + update Python template for Anthropic CUA to utilize computer controls instead of Playwright. Still to do: optimize click location issues.
…8 viewport and Claude Sonnet 4.5

Updates both TypeScript and Python Anthropic Computer Use templates:

- Set viewport to 1024x768@60Hz (Anthropic recommended size)
- Update model to claude-sonnet-4-5-20250929
- Fix coordinate alignment between browser viewport and computer tool dimensions

Changes:
- pkg/templates/typescript/anthropic-computer-use/
  - tools/computer.ts: display_width_px=1024, display_height_px=768
  - session.ts: viewport 1024x768@60Hz
  - index.ts: model updated to claude-sonnet-4-5-20250929

- pkg/templates/python/anthropic-computer-use/
  - tools/computer.py: width=1024, height=768
  - session.py: viewport 1024x768@60Hz
  - main.py: model updated to claude-sonnet-4-5-20250929

Test replays (magnitasks.com Kanban drag test - moved 5 items to Done):
- TypeScript: https://proxy.iad-awesome-blackwell.onkernel.com:8443/browser/replays?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE4MDAwMTgyNTYsInNlc3Npb24iOnsiaWQiOiJmZDA3NGRxZjY5bnNlcjk4aDliNGtrb3giLCJjZHBQb3J0Ijo5MjIyLCJjZHBXc1BhdGgiOiIiLCJpbnN0YW5jZU5hbWUiOiJicm93c2VyLXN0ZWFsdGgtcHJvZHVjdGlvbi01LWFsbG93ZWQtaGFtbWVyaGVhZC00MjcxIiwiZnFkbiI6InF1aWV0LXRyZWUtM3kybnd6c2EucHJvZC1pYWQtdWtwLWJyb3dzZXJzLTAub25rZXJuZWwuYXBwIiwibWV0cm8iOiJodHRwczovL2FwaS5wcm9kLWlhZC11a3AtYnJvd3NlcnMtMC5vbmtlcm5lbC5ydW4vdjEiLCJ1c2VySWQiOiJ3ODdoNHd1dTRoazNmeHFyZW5iNzFrMnAiLCJvcmdJZCI6ImlxMnRmMjUzbWlsOWptOWhmZjI3bDhyMiIsInN0ZWFsdGgiOnRydWUsImhlYWRsZXNzIjpmYWxzZSwicmVwbGF5UHJlZml4IjoiczM6Ly9rZXJuZWwtYXBpLXByb2Qvc2Vzc2lvbnJlcGxheXMvaXEydGYyNTNtaWw5am05aGZmMjdsOHIyL2ZkMDc0ZHFmNjluc2VyOThoOWI0a2tveCIsImtlcm5lbEh0dHBTZXJ2ZXJQb3J0Ijo0NDQsInRpbWVvdXRTZWNvbmRzIjozMDAsImNyZWF0ZWRBdCI6IjIwMjYtMDEtMTVUMTM6MDQ6MTYuNzc2OTEwOTc5WiIsImltYWdlIjoib25rZXJuZWwva2VybmVsLWN1LXYyNTo5NmYzOGU0Iiwic3RlYWx0aFByb3h5SWRlbnRpZmllciI6Ijg3NTY1X25YREZGQDIxNi4yNDcuMTAyLjE1MDo2MTIzMiIsImxpdmVTbHVnIjoia3c5b0lBc1VzRkxlIiwicHJpdmF0ZUlQIjoiMTcyLjE2LjIuMjAxIiwidmlld3BvcnRXaWR0aCI6MTAyNCwidmlld3BvcnRIZWlnaHQiOjc2OCwidmlld3BvcnRSZWZyZXNoUmF0ZSI6NjB9fQ.GHE2BXg6qrtNMoqO6NvuJ9fbHTW15igfmXl7W-ls3Qg&replay_id=wipxrn813lmajv7ukdkuykoa
- Python: https://proxy.iad-awesome-blackwell.onkernel.com:8443/browser/replays?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE4MDAwMTc4OTUsInNlc3Npb24iOnsiaWQiOiJseTVxOXQxa3F6YXR3NzE1N3lpYzl2M3IiLCJjZHBQb3J0Ijo5MjIyLCJjZHBXc1BhdGgiOiIiLCJpbnN0YW5jZU5hbWUiOiJicm93c2VyLXN0ZWFsdGgtcHJvZHVjdGlvbi01LXJlYWwtd2F0Y2htZW4tNTUxNCIsImZxZG4iOiJ0d2lsaWdodC1ib25vYm8tZGFvZTd5ZngucHJvZC1pYWQtdWtwLWJyb3dzZXJzLTAub25rZXJuZWwuYXBwIiwibWV0cm8iOiJodHRwczovL2FwaS5wcm9kLWlhZC11a3AtYnJvd3NlcnMtMC5vbmtlcm5lbC5ydW4vdjEiLCJ1c2VySWQiOiJ3ODdoNHd1dTRoazNmeHFyZW5iNzFrMnAiLCJvcmdJZCI6ImlxMnRmMjUzbWlsOWptOWhmZjI3bDhyMiIsInN0ZWFsdGgiOnRydWUsImhlYWRsZXNzIjpmYWxzZSwicmVwbGF5UHJlZml4IjoiczM6Ly9rZXJuZWwtYXBpLXByb2Qvc2Vzc2lvbnJlcGxheXMvaXEydGYyNTNtaWw5am05aGZmMjdsOHIyL2x5NXE5dDFrcXphdHc3MTU3eWljOXYzciIsImtlcm5lbEh0dHBTZXJ2ZXJQb3J0Ijo0NDQsInRpbWVvdXRTZWNvbmRzIjozMDAsImNyZWF0ZWRBdCI6IjIwMjYtMDEtMTVUMTI6NTg6MTUuMzk0MjQyNTc3WiIsImltYWdlIjoib25rZXJuZWwva2VybmVsLWN1LXYyNTo5NmYzOGU0Iiwic3RlYWx0aFByb3h5SWRlbnRpZmllciI6Ijg3NTY1X25YREZGQDE0MC4yMzMuMjQ5LjE3NDo2MTIzNCIsImxpdmVTbHVnIjoiak5DdGdpdHRreGtrIiwicHJpdmF0ZUlQIjoiMTcyLjE2LjcuMTMzIiwidmlld3BvcnRXaWR0aCI6MTAyNCwidmlld3BvcnRIZWlnaHQiOjc2OCwidmlld3BvcnRSZWZyZXNoUmF0ZSI6NjB9fQ._AhzTu1HwawrWwDgo66K3FZkEh4dpiOEVPmBTO4A21A&replay_id=pa0ha28zodehf1e1jyv1qibn

Resolves KERNEL-725
Updated invoke command example for the anthropic templates
…sistency

TypeScript template:
- Add xdotool-format key mappings for consistency with Python template
- Rename methods from convertToOnKernelKey to convertToKernelKey
- Fix scroll fallback to use lastMousePosition instead of [0, 0]
- Fix scroll amount using ?? operator to handle zero correctly
- Remove unused KeyboardUtils import
- Fix error message: "OnKernel" → "Kernel"
- Reset all state fields (liveViewUrl, replayViewUrl) on session stop
- Handle replay recording failures gracefully with try/catch

Python template:
- Wrap cleanup in try/finally to ensure browser deletion on errors
- Handle replay recording failures gracefully with try/except
- Preserve unexpected Anthropic content block types in loop
…otool behavior

Anthropic's reference implementation uses xdotool where each scroll_amount
unit equals one scroll wheel click (~120 pixels). Previously:
- TypeScript used the value directly
- Python used a 10x multiplier

Both now use 120x to match Anthropic's expected behavior for AI agents.
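
The scroll conversion described in this commit could be sketched as follows. The 120 pixels per click figure comes from the message above; the function name, the default of one click when `scroll_amount` is omitted, and the sign convention (positive deltaY scrolls down) are assumptions for illustration.

```typescript
// One scroll wheel click is treated as roughly 120 pixels (from the commit message).
const PIXELS_PER_SCROLL_CLICK = 120;

// Hypothetical default when the model omits scroll_amount.
const DEFAULT_SCROLL_CLICKS = 1;

type ScrollDirection = "up" | "down" | "left" | "right";

// Convert Anthropic's scroll_direction / scroll_amount into pixel deltas.
// `??` (not `||`) keeps an explicit scroll_amount of 0 instead of
// replacing it with the default.
function scrollDelta(
  direction: ScrollDirection,
  scrollAmount?: number,
): { deltaX: number; deltaY: number } {
  const clicks = scrollAmount ?? DEFAULT_SCROLL_CLICKS;
  const pixels = clicks * PIXELS_PER_SCROLL_CLICK;
  switch (direction) {
    case "down":
      return { deltaX: 0, deltaY: pixels };
    case "up":
      return { deltaX: 0, deltaY: -pixels };
    case "right":
      return { deltaX: pixels, deltaY: 0 };
    case "left":
      return { deltaX: -pixels, deltaY: 0 };
  }
}
```

With these assumptions, `scrollDelta("down", 3)` gives a deltaY of 360, and `scrollDelta("down", 0)` gives 0 rather than falling back to the default.
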
Wrap replay stopping logic in try/finally to ensure browser session is
always deleted even if stopReplay() fails. This prevents resource leaks
on the Kernel platform when replay recording is enabled and stopping fails.

Matches the existing Python implementation behavior.
@dprevoznik dprevoznik merged commit d8b2d2d into main Jan 16, 2026
2 checks passed
@dprevoznik dprevoznik deleted the kernel-computer-use branch January 16, 2026 19:53
4 participants