Agent Zero

A real computer handed to an agent, watched by a forced screenshot loop.

Edited by Michael Ruescher / reviewed 2026-06-03

Operator Read

Agent Zero is the most complete "visible computer" in the watchlist -- and as of v1.17 (2026-05-23), the visible computer extends beyond the container to the operator's actual host machine, with required visual verification on state-changing actions. The operator decision is no longer just "give the agent a desktop?" but "give the agent which desktop -- internal Xpra, host machine, or both?" with each routed through cleanly separated paths. The bet remains governance through visibility, now with a runtime-enforced screenshot loop instead of trusting tool outputs.

Operator Stance / as of 2026-06-23

Use it for: Work where the agent actually needs a desktop -- a real browser, a LibreOffice session, a terminal that remembers what it did. Operators trying to figure out whether giving an agent a full computer is more useful or more dangerous than a tool-only sandbox.
Avoid it for: Pipelines downstream of the agent that expect OOXML by default -- v1.13+ writes ODF unless you configure otherwise. UI automation that depends on coordinate clicks: the agent is now told to prefer named actions and reach for coordinates last.
Watch next: What lifecycle policies emerge for the persistent Xpra desktop (timeouts, storage caps, idle cleanup), and whether 'agent with a real computer' stabilizes as a competitive position or fragments back into tool-by-tool.

When A Real Desktop Earns Its Keep

Use Agent Zero when the work actually needs a full computer. The Playwright-powered browser runs a persistent Chromium with a live WebUI viewer, screencast streaming, tab management, and Chrome extension support. It also recovers from stale contexts: when a cached context is detected as closed, the Playwright instance is restarted cleanly. Multi-tab fanout auto-registers tabs opened by sites, and a "multi" action reads or mutates across tabs in a single tool call with parallel execution. The LibreOffice virtual desktop opens DOCX, XLSX, and PPTX in full sessions over Xpra/XFCE; the legacy Collabora/WOPI runtime is gone.
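The fanout pattern described above can be sketched in a few lines. This is a minimal illustration, not Agent Zero's implementation: Tab, read_title, and fan_out are hypothetical names, and the tabs are in-memory stand-ins rather than Playwright pages.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List


@dataclass
class Tab:
    """Hypothetical stand-in for a browser tab; not a Playwright object."""
    tab_id: str
    title: str
    closed: bool = False


async def read_title(tab: Tab) -> str:
    """Illustrative per-tab action: fail on a closed tab, else return its title."""
    if tab.closed:
        raise RuntimeError(f"tab {tab.tab_id} is closed")
    await asyncio.sleep(0)  # yield control, as a real page call would
    return tab.title


async def fan_out(
    tabs: List[Tab], action: Callable[[Tab], Awaitable[str]]
) -> Dict[str, dict]:
    """Run one action against every tab concurrently and collect per-tab results."""
    outcomes = await asyncio.gather(
        *(action(t) for t in tabs), return_exceptions=True
    )
    report: Dict[str, dict] = {}
    for tab, outcome in zip(tabs, outcomes):
        if isinstance(outcome, Exception):
            report[tab.tab_id] = {"ok": False, "error": str(outcome)}
        else:
            report[tab.tab_id] = {"ok": True, "value": outcome}
    return report


def run_demo() -> Dict[str, dict]:
    """Fan a read across three tabs, one of which is already closed."""
    tabs = [Tab("t1", "Docs"), Tab("t2", "Inbox"), Tab("t3", "Stale", closed=True)]
    return asyncio.run(fan_out(tabs, read_title))
```

Collecting exceptions per tab, rather than letting the first failure abort the call, means one closed tab does not hide results from the others. Whether the real tool behaves this way is not stated in the release notes.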

The Linux Desktop skill teaches Agent Zero to operate XFCE -- app launch, focus, click, cell edit, stable folder entry points -- and tells the agent to prefer structured, app-native, keyboard actions and to treat positional clicks as a last resort. If your UI-automation pipeline relies on coordinate clicks, expect a behavior shift: cell_edit(B3, 42) is the path now, not click(x=423, y=187).
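One way to picture the "coordinates last" preference is as a ranking over candidate actions. The sketch below is illustrative only: Candidate, PREFERENCE, and choose_action are invented names, and the relative order of the three non-positional kinds is an assumption. The source states only that positional clicks come last.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower index = preferred. Only "coordinate is last" comes from the skill's
# guidance; the order among the first three kinds is assumed for illustration.
PREFERENCE = ("structured", "app_native", "keyboard", "coordinate")


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of one way to perform a UI step."""
    kind: str
    call: str
    available: bool = True


def choose_action(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the most-preferred available candidate, or None if nothing is usable."""
    usable = [c for c in candidates if c.available and c.kind in PREFERENCE]
    if not usable:
        return None
    return min(usable, key=lambda c: PREFERENCE.index(c.kind))
```

With both a structured cell edit and a coordinate click on offer, the structured call wins; the coordinate click is chosen only when the structured path is marked unavailable.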

The Persistence Trade

The desktop session is persistent across canvas and modal navigation: a single Xpra iframe stays alive, and explicit shutdown is distinguished from crashes via a "Shutdown Desktop" launcher that requires confirmation. Unsafe affordances (logout, lock, switch-user) are hidden. The accessibility win: operators can watch agent work in a real environment without losing state on every navigation. The trade: accumulated state -- browser sessions, temporary files, LibreOffice locks, open applications -- is the operator's problem. No automatic session reset, idle cleanup, or storage cap is documented. Plan for manual cleanup or build it.
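For operators who choose to build cleanup themselves, a dry-run scan is a cautious first step. The sketch below assumes the desktop's working directory is reachable as an ordinary path; find_stale and its reason labels are hypothetical, and nothing here is an Agent Zero API. It relies only on LibreOffice's convention of naming lock files ".~lock.<document>#".

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def find_stale(
    root: Path, max_age_seconds: float, now: Optional[float] = None
) -> List[Tuple[Path, str]]:
    """Dry run: list regular files under root whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    stale: List[Tuple[Path, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if now - path.stat().st_mtime <= max_age_seconds:
            continue  # recently touched, possibly still in use
        # LibreOffice writes lock files named ".~lock.<document>#" beside open documents.
        is_lock = path.name.startswith(".~lock.") and path.name.endswith("#")
        stale.append((path, "libreoffice-lock" if is_lock else "old-file"))
    return stale
```

The function only reports; deleting is left to the caller so that a lock belonging to a document that is still open is never removed by accident.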

Open-Format Default

Verify that your downstream tooling handles ODF before upgrading to v1.13+. Document artifacts now default to ODF formats (ODT/ODS/ODP); OOXML (DOCX/XLSX/PPTX) is available but requires explicit opt-in. Pipelines that silently assume Word/Excel/PowerPoint output will break. This is the trend across the watchlist made local: safe-and-open by default, proprietary requires the operator to ask.
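If a downstream consumer cannot be updated, one workaround is to convert at the pipeline boundary. The sketch below is not Agent Zero's opt-in setting, which the release notes do not spell out here. It only builds a LibreOffice headless conversion command, and it assumes soffice is on PATH; conversion_command and ODF_TO_OOXML are hypothetical names.

```python
from pathlib import Path
from typing import List, Optional

# ODF extension -> OOXML target format understood by `soffice --convert-to`.
ODF_TO_OOXML = {".odt": "docx", ".ods": "xlsx", ".odp": "pptx"}


def conversion_command(artifact: Path, outdir: Path) -> Optional[List[str]]:
    """Return a soffice command converting an ODF artifact, or None if not ODF."""
    target = ODF_TO_OOXML.get(artifact.suffix.lower())
    if target is None:
        return None  # already OOXML or an unrelated file type; leave it alone
    return [
        "soffice",
        "--headless",
        "--convert-to",
        target,
        "--outdir",
        str(outdir),
        str(artifact),
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True). Returning None for non-ODF input lets the caller pass existing DOCX/XLSX/PPTX files through untouched.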

Host Desktop With Vision Verification

v1.17 (2026-05-23) exposes computer_use_remote as a callable tool that controls the operator's host desktop -- outside the Docker/Xpra container -- using platform-native structural targeting: macOS via Accessibility (AX) with ax_snapshot / ax_action, Windows via UIA, Linux via AT-SPI / Wayland. The category move sits in the runtime check: every state-changing action is treated as unverified until a fresh screenshot visibly confirms the outcome. Agents must stop when no screenshot is available. Screenshots return as multimodal vision messages, not text summaries.
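The verification rule reads naturally as a three-outcome gate. The sketch below is a hypothetical model of that rule, not the computer_use_remote runtime: run_verified, Status, and the three callables are invented for illustration, and the release notes do not confirm where enforcement actually lives.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    VERIFIED = "verified"      # a fresh screenshot confirmed the outcome
    UNVERIFIED = "unverified"  # a screenshot was taken but did not confirm it
    STOPPED = "stopped"        # no screenshot available, so the agent must halt


@dataclass
class ActionResult:
    status: Status
    detail: str


def run_verified(
    act: Callable[[], None],
    capture: Callable[[], Optional[bytes]],
    confirms: Callable[[bytes], bool],
) -> ActionResult:
    """Perform a state-changing action, then require a fresh screenshot to confirm it."""
    act()
    shot = capture()  # must be taken after the action, never reused from before
    if shot is None:
        return ActionResult(Status.STOPPED, "no screenshot available; halting")
    if confirms(shot):
        return ActionResult(Status.VERIFIED, "outcome visible in fresh screenshot")
    return ActionResult(Status.UNVERIFIED, "screenshot did not confirm outcome")
```

Keeping STOPPED distinct from UNVERIFIED mirrors the text: a missing screenshot ends the run, while a non-confirming one leaves the action marked as unverified for the caller to handle.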

The internal Docker/Xpra desktop continues to be controlled by the linux-desktop skill; the host path and container path are cleanly separated. macOS approval denials route to a re-arm-required stop flow rather than silent retry. Operators evaluating host control must decide whether computer_use_remote is permitted on their host at all -- the trust mode is opt-in, but the runtime checks are enforceable once enabled.

v1.16 made screenshot capture ephemeral and context-scoped by default: captures route through in-process image refs rather than disk, so the agent no longer leaves screenshot trails on the filesystem. Explicit user-initiated screenshots remain durable. The tradeoff: host-action audit evidence now lives in the model context, not on disk -- operators who want durable evidence must enable explicit capture.

v1.16 also split speech into independent built-in plugins (_kokoro_tts, _whisper_stt), removing the legacy speech APIs (breaking), and renamed document_artifact to office_artifact with the compatibility shims dropped. v1.18 added a configurable max_active_skills cap, skill visibility controls (hide skills from the model-facing catalog), and an MCP multimodal content handling fix.

Container Reality

Agent Zero is a Docker-deep install. Browser, desktop, and LibreOffice all run inside a long-lived container. The WebUI makes the agent visible; getting the container set up is the friction. Two operational details to know: OAuth settings expose account disconnect and remaining-quota visibility for OpenAI/ChatGPT OAuth (users see Codex usage quota and reset timing), and PTY master descriptors for terminal sessions are now properly closed on exit, preventing /dev/ptmx exhaustion under sustained use.

Posture basis: 2026-05-07-agent-zero-full-computer-workcell, 2026-05-12-agent-zero-browser-multitab-and-document-formats, 2026-05-27-agent-zero-host-desktop-with-vision-verification.

Open Questions

Where does host-action audit evidence land under ephemeral capture? Operators cannot inspect on-disk caches by default to verify what the agent saw on the host. Is the answer in-process model context only, or is there a structured audit trail elsewhere?
v1.17's "agents must stop when a screenshot is unavailable" is described as a runtime check, but the release notes do not fully distinguish whether the rule is enforced at the model-prompt level or at the tool-runtime return-shape level. Worth a v1.17 commit probe.
When both host and container desktops are available, routing-by-rank is documented but not enforced. How reliably does the agent pick the right path under prompt pressure?
Is the "prefer structured over coordinate clicks" guidance enforced at the runtime level, or is it agent-level instruction that a model can ignore? What happens in practice when a structured action is unavailable?
Is there a session timeout, idle cleanup, or storage limit for persistent Xpra desktop state? Or does the operator manage cleanup entirely manually?
The multi browser action fans out across tabs. Are the parallel executions isolated per tab, or do they share Playwright context state?
ODF is now the default output format. Are Agent Zero's downstream integrations (file browser, Memory, Projects, ZIP download) fully ODF-aware?
The Linux Desktop skill provides stable entry points for Workdir, Projects, Skills, Agents, and Downloads. How do these map to the underlying Docker container filesystem, and what persists across container restarts?

What To Watch Next

Whether host computer-use evolves toward per-app, per-tool, or per-domain gating beyond the current opt-in / vision-verification defaults -- operators with mixed-trust applications on the host need finer authority.
How the ephemeral-capture default coexists with audit requirements in enterprise deployments. The current setup is a privacy win; it may need an "evidence retention" knob in regulated environments.
Whether ODF-first generates integration friction with downstream tools (e.g., GitHub attachments, email clients, or workflows expecting DOCX).
State management for persistent desktops: whether an automated cleanup path (session timeout, disk quota, reset-on-task-completion) ships in a future version.
Whether the "structured over coordinate" guidance extends to the browser surface (form actions, element selectors) as a first-class constraint, or remains only in the Desktop skill.
Custom tool creation and subagent spawning within a long-running desktop session: how tool proliferation is managed and what the cleanup contract is.

Harvest Notes

2026-06-16 .. 2026-06-23 -- silent release channel. No commit reached the default branch in-window; the latest tag remains v1.20 (2026-06-04, pre-window). The current-state claims above are unchanged and re-verified as still standing. Caveat for next cycle: 23 commits landed in-window on a non-default ready staging branch (none merged to default, none tagged) -- Agent Zero is quiet in what operators run, not necessarily quiet as a project. The standing persistent-desktop lifecycle question (timeouts, storage caps, idle cleanup) was not advanced this cycle.

Verification

open source commits / evidence floor: release note / updated 2026-06-23

Verified claims

Agent Zero: The Workcell Is Becoming A Visible Computer / verified 2026-05-07
Agent Zero v1.11--v1.13: Visible Computer, ODF Documents, and Persistent Desktop / verified 2026-05-12
Agent Zero: Host Desktop Control With Required Visual Verification / verified 2026-05-27
Screenshot artifacts reverted to durable chat-scoped storage / verified 2026-06-03

Featured in

Foreground Attention Is No Longer the Control / 2026-07-02
Patched for Whom / 2026-07-01
Governance, Sold Separately / 2026-06-24
Protected on Paper / 2026-06-23
Who's Allowed to Say Yes / 2026-06-16
The Policy You Wrote Wasn't the Policy You Had / 2026-06-03
Auto Stops Asking / 2026-05-27
Governance Becomes Enforcement / 2026-05-12
The Harness Leaves The Chat Box / 2026-05-07


Edited and maintained by Bitter Frontier.
