UI-TARS-desktop
bytedance
Native desktop app for a GUI/computer-use agent powered by the open-weight UI-TARS model.
What is UI-TARS-desktop?
A native desktop app (Windows/macOS, plus a browser build) for a GUI / computer-use agent: it takes a screenshot, a vision-language model reads the interface, and the agent drives mouse and keyboard from a natural-language instruction. It is powered by the open-weight UI-TARS model (e.g. UI-TARS-1.5-7B, run locally) or ByteDance's Seed series, and ships alongside Agent TARS, an MCP-based CLI/web sibling that works with any provider.
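The screenshot, model, action cycle described above can be sketched as a simple loop. This is a minimal illustration, not UI-TARS-desktop's actual code: `Action`, `run_agent`, and the `take_screenshot`, `predict_action`, and `execute` callables are hypothetical names, and the model and input drivers are passed in as plain functions so the loop runs without a GUI or a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step proposed by the model (hypothetical schema, for illustration)."""
    kind: str       # "click", "type", or "done"
    x: int = 0      # screen coordinates, used by "click"
    y: int = 0
    text: str = ""  # text to enter, used by "type"


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    predict_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Perceive-act loop: screenshot, ask the model, execute, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                             # 1. capture the screen
        action = predict_action(instruction, screenshot, history)  # 2. model reads the UI
        history.append(action)
        if action.kind == "done":                                  # model says the task is finished
            break
        execute(action)                                            # 3. drive mouse or keyboard
    return history


if __name__ == "__main__":
    # Stand-ins for a real screen grabber, vision-language model, and input driver.
    scripted = [Action("click", x=120, y=48), Action("type", text="hello"), Action("done")]
    steps = run_agent(
        "Type hello into the search box",
        take_screenshot=lambda: b"",
        predict_action=lambda instr, shot, hist: scripted[len(hist)],
        execute=lambda a: print("executing", a),
    )
    print(len(steps), "steps planned")
```

The `max_steps` cap is a design choice in this sketch: it bounds how long an agent that never reports `done` can keep acting.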
Pros & Cons
Pros
- Both the app and the base model (UI-TARS-1.5-7B) are genuinely Apache-2.0: free for commercial use and fully self-hostable
- Large, active community (36k+ stars), a peer-reviewed paper behind it, and cross-platform support
- Flexible: runs locally or against a cloud backend, and the Agent TARS stack adds MCP and a free choice of provider
Cons
- The open 7B model is Apache-2.0, but the strongest models (Doubao-1.5-UI-TARS, Seed-1.5-VL) are proprietary and paid via ByteDance's VolcEngine API, so top performance means cloud lock-in; the free remote operator was also discontinued in August 2025
- Computer use is inherently risky: an agent with full mouse/keyboard/browser control is exposed to prompt injection and misclicks, so run it in a sandbox or VM
- Pre-1.0 (v0.3.0) with 403 open issues; local hardware requirements for the 7B model are not documented
License
Apache-2.0 (OSI-open)
Both the app and the open model are Apache-2.0, but the highest-performing models sit behind a proprietary, paid cloud backend.
When it is interesting
An open, self-hostable computer-use agent for automation experiments.
When it is too early
Unsandboxed or production use, or if you need the top models without a paid VolcEngine plan.
Commercial alternative & related
- Commercial counterpart: Claude Computer Use / OpenAI Operator
This repo was featured in the 2026-06 edition of the Open-Source AI Radar.
strix
usestrix
Framework of autonomous AI hacker agents for dynamic application security testing.
Page Agent
alibaba
In-page JavaScript GUI agent - control any webpage with natural language, no headless browser or extension.
Browser Harness
browser-use
Self-healing browser harness that lets LLMs drive a real browser via CDP.