Agent Zero
Operator Stance · as of 2026-06-03
- Use it for: work where the agent actually needs a desktop (a real browser, a LibreOffice session, a terminal that remembers what it did), and for operators trying to figure out whether giving an agent a full computer is more useful or more dangerous than a tool-only sandbox.
- Avoid it for: pipelines downstream of the agent that expect OOXML by default (v1.13+ writes ODF unless you configure otherwise), and UI automation that depends on coordinate clicks, since the agent is now told to prefer named actions and reach for coordinates last.
- Watch next: what lifecycle policies emerge for the persistent Xpra desktop (timeouts, storage caps, idle cleanup), and whether 'agent with a real computer' stabilizes as a competitive position or fragments back into tool-by-tool.
Active Claims
- Native Browser Playwright · verified 2026-05-07
- Linux Desktop Skill Controls · verified 2026-05-07
- OAuth Quota Visibility · verified 2026-05-07
- Browser Multi Tab Parallel Fanout · verified 2026-05-12
- Odf First Document Defaults · verified 2026-05-12
- Persistent Desktop Lifecycle · verified 2026-05-12
- Structured Actions Over Coordinates · verified 2026-05-12
- Host Computer Use Remote · verified 2026-05-27
- Vision Verification Required · verified 2026-05-27
- Platform Native Structural Targeting · verified 2026-05-27
- Ephemeral Capture Default · verified 2026-05-27
- Screenshot Durable Storage Reversal · verified 2026-06-03
Operator Read
Agent Zero is the most complete "visible computer" in the watchlist, and as of v1.17 (2026-05-23) the visible computer extends beyond the container to the operator's actual host machine, with required visual verification on state-changing actions. The operator decision is no longer just "give the agent a desktop?" but "which desktop: internal Xpra, host machine, or both?", with each option routed through a cleanly separated path. The bet remains governance through visibility, now with a runtime-enforced screenshot loop instead of trust in tool outputs.
When A Real Desktop Earns Its Keep
Use Agent Zero when the work actually needs a full computer. The Playwright-powered browser runs a persistent Chromium with live WebUI viewer, screencast streaming, tab management, and Chrome extension support, including stale-context recovery that restarts the Playwright instance cleanly when a cached context is detected as closed. The multi-tab fanout auto-registers tabs opened by sites and runs a multi action that reads or mutates across tabs in a single tool call with parallel execution. The LibreOffice virtual desktop opens DOCX, XLSX, and PPTX in full sessions over Xpra/XFCE; the legacy Collabora/WOPI runtime is gone. The Linux Desktop skill teaches Agent Zero to operate XFCE (app launch, focus, click, cell edit, stable folder entry points) and tells the agent to prefer structured, app-native, keyboard actions and treat positional clicks as a last resort. If your UI-automation pipeline relies on coordinate clicks, expect a behavior shift: cell_edit(B3, 42) is the path now, not click(x=423, y=187).
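The preference order described above can be sketched as a simple fallback chain. This is an illustration only, not Agent Zero's code: the Action type, the kind labels, and choose_action are hypothetical names, and the exact ranking of structured versus keyboard actions is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str      # "structured", "keyboard", or "coordinate" (illustrative labels)
    payload: dict  # free-form description of the action


def choose_action(structured: Optional[Action],
                  keyboard: Optional[Action],
                  coordinate: Optional[Action]) -> Action:
    """Return the first available candidate; coordinates are tried last."""
    for candidate in (structured, keyboard, coordinate):
        if candidate is not None:
            return candidate
    raise ValueError("no action available for this step")


# Hypothetical candidates mirroring the example in the text.
cell_edit = Action("structured", {"op": "cell_edit", "cell": "B3", "value": 42})
click = Action("coordinate", {"op": "click", "x": 423, "y": 187})

preferred = choose_action(cell_edit, None, click)  # structured wins when present
fallback = choose_action(None, None, click)        # coordinates only as last resort
```

A pipeline that previously emitted only coordinate clicks would, under this ordering, see them used only when nothing structured is offered for the step.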
The Persistence Trade
The desktop session is persistent across canvas and modal navigation: a single Xpra iframe stays alive, with explicit shutdown distinguished from crashes via a "Shutdown Desktop" launcher that requires confirmation. Unsafe affordances (logout, lock, switch-user) are hidden. The accessibility win: operators can watch agent work in a real environment without losing state on every navigation. The trade: accumulated state — browser sessions, temporary files, LibreOffice locks, open applications — is the operator's problem. There's no automatic session reset, no documented idle cleanup, no storage cap. Plan for manual cleanup or build it.
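For operators who choose to build the cleanup themselves, a minimal sketch of an age-based sweep might look like the following. Everything here is an assumption: Agent Zero does not document which directories accumulate state, so the root path, the age threshold, and the function names are hypothetical, and the dry-run default is a deliberate safety choice.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def stale_files(root: Path, max_age_seconds: float,
                now: Optional[float] = None) -> List[Path]:
    """List regular files under root whose mtime is older than the threshold."""
    now = time.time() if now is None else now
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and now - p.stat().st_mtime > max_age_seconds
    )


def remove_stale(root: Path, max_age_seconds: float,
                 dry_run: bool = True) -> List[Path]:
    """Return the stale files; delete them only when dry_run is False."""
    targets = stale_files(root, max_age_seconds)
    if not dry_run:
        for path in targets:
            path.unlink(missing_ok=True)
    return targets
```

Running it in dry-run mode first shows what would be deleted; LibreOffice lock files and live browser profiles deserve a manual check before any real deletion, since removing them under a running session may corrupt state.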
Open-Format Default
Verify that your downstream tooling handles ODF before upgrading to v1.13+. Document artifacts now default to ODF formats (ODT/ODS/ODP); OOXML (DOCX/XLSX/PPTX) is available but requires explicit opt-in. Pipelines that assume Word/Excel/PowerPoint output arrives without any configuration will break. This is the trend across the watchlist made local: safe-and-open by default, proprietary requires the operator to ask.
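If a downstream consumer cannot be changed, one option outside Agent Zero is to convert ODF artifacts after the fact with LibreOffice's headless converter. The sketch below only builds and runs the standard soffice --headless --convert-to command; it assumes soffice is on PATH, and the function names and the extension mapping are illustrative, not part of Agent Zero.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# ODF extension -> OOXML target format understood by `soffice --convert-to`.
ODF_TO_OOXML = {".odt": "docx", ".ods": "xlsx", ".odp": "pptx"}


def conversion_command(path: Path, outdir: Path) -> Optional[List[str]]:
    """Build the soffice command for an ODF file, or None for other files."""
    target = ODF_TO_OOXML.get(path.suffix.lower())
    if target is None:
        return None
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(path)]


def convert(path: Path, outdir: Path) -> bool:
    """Run the conversion; returns False when the file is not ODF."""
    cmd = conversion_command(path, outdir)
    if cmd is None:
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

Conversion is lossy for some formatting, so where fidelity matters the explicit OOXML opt-in described above is likely the better path.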
Host Desktop With Vision Verification
v1.17 (2026-05-23) exposes computer_use_remote as a callable tool that
controls the operator's host desktop, outside the Docker/Xpra
container, using platform-native structural targeting: macOS via
Accessibility (AX) with ax_snapshot / ax_action, Windows via UIA,
Linux via AT-SPI / Wayland. The significant change is the runtime
check: every state-changing action is treated as unverified until a
fresh screenshot visibly confirms the outcome. Agents must stop when
no screenshot is available. Screenshots return as multimodal vision
messages, not text summaries.
The internal Docker/Xpra desktop continues to be controlled by the
linux-desktop skill; the host path and container path are cleanly
separated. macOS approval denials route to a re-arm-required stop
flow rather than silent retry. Operators evaluating host control
must decide whether computer_use_remote is permitted on their host
at all — the trust mode is opt-in, but the runtime checks are
enforceable once enabled.
v1.16 made screenshot capture ephemeral and context-scoped by default:
captures route through in-process image refs rather than disk, so
the agent no longer leaves screenshot trails on the filesystem by
default. Explicit user-initiated screenshots remain durable. The
tradeoff: host-action audit evidence now lives in the model context,
not on disk — operators wanting durable evidence must enable
explicit capture. v1.16 also split speech into independent built-in
plugins (_kokoro_tts, _whisper_stt) — legacy speech APIs were
removed (breaking) — and renamed document_artifact to
office_artifact with shims dropped.
v1.18 added a configurable max_active_skills cap, skill visibility
controls (hide skills from the model-facing catalog), and an MCP
multimodal content handling fix.
Container Reality
Agent Zero is a Docker-deep install. Browser, desktop, and LibreOffice
all run inside a long-lived container. The WebUI makes the agent
visible; getting the container set up is the friction. Two operational
details to know: OAuth settings expose account disconnect and remaining
quota visibility for OpenAI/ChatGPT OAuth (users see Codex usage quota
and reset timing), and PTY master descriptors for terminal sessions are
now properly closed on exit, preventing /dev/ptmx exhaustion under
sustained use.
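The PTY fix addresses a general failure mode: each terminal session opens a master/slave descriptor pair, and masters that are never closed eventually exhaust the system's PTY pool. A minimal sketch of the close-on-exit pattern with Python's standard pty module follows; run_with_pty is a hypothetical helper and says nothing about how Agent Zero structures its own terminal code.

```python
import os
import pty
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_pty(work: Callable[[int, int], T]) -> T:
    """Open a PTY pair, run work(master_fd, slave_fd), and always close both."""
    master_fd, slave_fd = pty.openpty()
    try:
        return work(master_fd, slave_fd)
    finally:
        # Close both ends even if work() raises, so descriptors never leak.
        for fd in (master_fd, slave_fd):
            try:
                os.close(fd)
            except OSError:
                pass  # already closed by work(); nothing left to release
```

Because the close sits in a finally block, a session that crashes mid-command still releases its descriptors.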
Posture basis: 2026-05-07-agent-zero-full-computer-workcell,
2026-05-12-agent-zero-browser-multitab-and-document-formats,
2026-05-27-agent-zero-host-desktop-with-vision-verification.
Open Questions
- Where does host-action audit evidence land under ephemeral capture? Operators cannot inspect on-disk caches by default to verify what the agent saw on the host. Is the answer in-process model context only, or is there a structured audit trail elsewhere?
- v1.17's "agents must stop when a screenshot is unavailable" is described as a runtime check, but the release notes do not fully distinguish whether the rule is enforced at the model-prompt level or at the tool-runtime return-shape level. Worth a v1.17 commit probe.
- When both host and container desktops are available, routing-by-rank is documented but not enforced. How reliably does the agent pick the right path under prompt pressure?
- Is the "prefer structured over coordinate clicks" guidance enforced at the runtime level, or is it agent-level instruction that a model can ignore? What happens in practice when a structured action is unavailable?
- Is there a session timeout, idle cleanup, or storage limit for persistent Xpra desktop state? Or does the operator manage cleanup entirely manually?
- The multibrowser action fans out across tabs. Are the parallel executions isolated per tab, or do they share Playwright context state?
- ODF is now the default output format. Are Agent Zero's downstream integrations (file browser, Memory, Projects, ZIP download) fully ODF-aware?
- The Linux Desktop skill provides stable entry points for Workdir, Projects, Skills, Agents, and Downloads. How do these map to the underlying Docker container filesystem, and what persists across container restarts?
What To Watch Next
- Whether host computer-use evolves toward per-app, per-tool, or per-domain gating beyond the current opt-in / vision-verification defaults — operators with mixed-trust applications on the host need finer authority.
- How the ephemeral-capture default coexists with audit requirements in enterprise deployments. The current setup is a privacy win; it may need an "evidence retention" knob in regulated environments.
- Whether ODF-first generates integration friction with downstream tools (e.g., GitHub attachments, email clients, or workflows expecting DOCX).
- State management for persistent desktops: whether an automated cleanup path (session timeout, disk quota, reset-on-task-completion) ships in a future version.
- Whether the "structured over coordinate" guidance extends to the browser surface (form actions, element selectors) as a first-class constraint, or remains only in the Desktop skill.
- Custom tool creation and subagent spawning within a long-running desktop session: how tool proliferation is managed and what the cleanup contract is.
Featured in
- The Policy You Wrote Wasn't the Policy You Had · 2026-06-03
- Auto Stops Asking · 2026-05-27
- Governance Becomes Enforcement · 2026-05-12
- The Harness Leaves The Chat Box · 2026-05-07
Source contract: sources/agent-zero.yml · https://www.agent-zero.ai/
Profiles are maintained by the Bitter research loop.