Page Agent
alibaba
In-page JavaScript GUI agent - control any webpage with natural language, no headless browser or extension.
What is Page Agent?
Page Agent is a client-side TypeScript library that drops into any webpage and lets LLMs control the UI via text-based DOM manipulation, with no Python, headless browser, or extension required. An optional Chrome extension enables multi-tab workflows, and a beta MCP server enables agent integration.
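To make "text-based DOM manipulation" concrete, here is a minimal sketch of the general idea: interactive elements are flattened into indexed text lines that an LLM can read, and the index the model picks is mapped back to an element. This is an illustration under stated assumptions, not Page Agent's actual API or internal format; `ElementLike`, `describeElements`, and `resolveTarget` are hypothetical names introduced here.

```typescript
// Hypothetical sketch: NOT Page Agent's actual API or internal format.
// Minimal shape shared by real DOM elements and plain test objects.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
}

// Turn a list of interactive elements into indexed text lines for an LLM prompt.
function describeElements(elements: ElementLike[]): string {
  return elements
    .map((el, index) => {
      const tag = el.tagName.toLowerCase();
      // Prefer an explicit label, then a placeholder, then the visible text.
      const label =
        el.getAttribute("aria-label") ??
        el.getAttribute("placeholder") ??
        (el.textContent ?? "").trim();
      return `[${index}] <${tag}> ${label}`;
    })
    .join("\n");
}

// Map an index chosen by the model back to the element it refers to.
function resolveTarget<T extends ElementLike>(elements: T[], index: number): T {
  const target = elements[index];
  if (target === undefined) {
    throw new RangeError(`No element with index ${index}`);
  }
  return target;
}
```

In a browser, the element list could come from something like `Array.from(document.querySelectorAll("a, button, input, select, textarea"))`, since DOM elements already satisfy the `ElementLike` shape. Working on text rather than screenshots is also why canvas-rendered interfaces are a weak spot: there are no DOM elements to list.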
Pros & Cons
Pros
- Zero server-side infrastructure - runs entirely in-page, deployable as a script tag
- 32 versioned releases and active CI/CD suggest a disciplined release process
- Bring-your-own-LLM design avoids API lock-in
Cons
- Text-based DOM approach may struggle on canvas-heavy or very dynamic SPAs
- MCP server is still beta
- Alibaba origin may raise supply-chain concerns in some Western orgs
License
MIT (OSI-open)
When it is interesting
Embedding a natural-language copilot directly in a web product without backend infrastructure.
When it is too early
You need reliable multi-page orchestration - multi-tab flows require the optional Chrome extension, and the MCP server is still beta.
Commercial alternative & related
- Commercial counterpart: Anthropic Computer Use / Browserbase
This repo was featured in the 2026-07 edition of the Open-Source AI Radar.
UI-TARS-desktop
bytedance
Native desktop app for a GUI/computer-use agent powered by the open-weight UI-TARS model.
strix
usestrix
Framework of autonomous AI hacker agents for dynamic application security testing.
Browser Harness
browser-use
Self-healing browser harness that lets LLMs drive a real browser via CDP.