Issue / Published 2026-05-07

The Harness Leaves The Chat Box

Edited by Michael Ruescher / revised 2026-07-12

Operator Brief

The action in coding agents has left the model and the transcript. Two weeks of commits across eight projects are about goals, memory, visible computers, permissions, gateways, and supervision layers -- the environment around the agent getting thicker -- and the four sources new to this read (OpenClaw, Agent Zero, Paperclip, OpenHands) each show a different wall of the same building. The durable question is who owns the loop around all of it.

Try:
Run at least one visible-computer harness. Agent Zero's browser, file browser, screenshots, and desktop surface expose failure modes terminal chat hides.
Prefer memory that asks first: Gemini's Auto Memory inbox proposes changes for review instead of writing them silently.
Read the permissions and sandbox story before an agent touches real credentials -- this fortnight shows who is actually doing that work.

Watch:
Which visible-computer shape wins: local desktop, browser sandbox, remote workcell, hosted app server, messaging agent, or a mix.
Whether agent-company control planes (Paperclip's costs, roles, liveness, pause/resume) keep multi-agent systems legible as they scale.

Uncertain:
OpenClaw's commit volume makes it hard to separate durable product movement from rapid stabilization without deeper release review.
Which agent-side memories and goals will be stable enough to integrate deeply, versus merely record as tool-local state.

Two weeks ago, Agent Zero fired the agent that used a browser and gave the agent a browser of its own. It replaced a browser-use module with a native browser, then added a Chromium runtime, tabs, screenshot previews, a searchable file browser, Linux desktop controls, a document canvas, a LibreOffice runtime, and OAuth and quota visibility. The "workcell" stopped being a metaphor. The agent has a computer now, and the operator can watch it work.

That is the loudest version of what every commit stream on this expanded watchlist said in the same fortnight: the interesting action in coding agents is no longer confined to the model or the chat transcript. Codex is adding persistent goals, session metadata, plugin controls, and cloud executor paths. Gemini CLI is treating memory as a reviewable patch. Hermes is sanding the rough edges off persistent personal agents. Pi keeps proving the opposite lesson -- a thin harness moves fast precisely because its integrations are disposable. And the four projects new to this read each expose a different wall of the same building: OpenClaw the front door (messaging surfaces, onboarding, visible progress), Agent Zero the machine room, Paperclip the management floor, OpenHands the whole leased office. The frontier is not one winning agent. It is the environment around agents getting thicker, and the durable question is who owns the loop around all of it.

State becomes product

The strongest single signal is still Codex /goal, and the telling part is not the feature but the follow-through: goal validation, paste handling, queued-command behavior, user guidance. When a persistent objective earns that much plumbing, it has stopped being a UX affordance and become operating state. Gemini's Auto Memory inbox makes the same point from the other side, and makes it better than anyone: memory should be proposed, reviewed, and accepted, not silently smeared into hidden context. Hermes added memory scoping and Curator commands; OpenClaw put agent progress into the chat itself with timeline spans. Agent-side state is becoming durable, visible, and operational -- which means a serious run now has to be able to answer what goal, memory, session, or thread state shaped it.
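
The propose, review, accept pattern described above can be sketched in a few lines. This is a minimal illustration, not Gemini's implementation: the names MemoryProposal, MemoryInbox, propose, and review are hypothetical and appear in none of the projects' code as far as this read shows.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryProposal:
    """A proposed memory change awaiting operator review (hypothetical shape)."""
    key: str
    value: str
    reason: str


@dataclass
class MemoryInbox:
    """Holds proposals until a reviewer accepts or rejects them."""
    accepted: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def propose(self, key, value, reason):
        # The agent may only queue a change; it never writes memory directly.
        proposal = MemoryProposal(key, value, reason)
        self.pending.append(proposal)
        return proposal

    def review(self, proposal, accept):
        # The operator decides; only accepted proposals reach durable memory.
        self.pending.remove(proposal)
        if accept:
            self.accepted[proposal.key] = proposal.value


inbox = MemoryInbox()
kept = inbox.propose("test_command", "make test", "seen in three sessions")
dropped = inbox.propose("deploy_target", "prod", "guessed from a branch name")
inbox.review(kept, accept=True)
inbox.review(dropped, accept=False)
print(inbox.accepted)  # {'test_command': 'make test'}
```

The design point is that the write path runs through the reviewer: a rejected proposal leaves no trace in the accepted state, so a later run can say exactly which memories shaped it.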

The visible computer

Agent Zero's browser-and-desktop build-out leads this thread, but the platform side is converging on it too. OpenHands is grouping execution into sandbox groups with app-server routing, user secrets, and model profiles behind it. Paperclip is doing remote provisioning and sandbox-provider work. Codex is building cloud executor paths and hardening its sandbox. The chat box is not enough for serious agent work, and the projects that understand that are racing to show the operator the actual machine: the browser, the files, the runtime, the screenshots, the credentials, the artifacts.

The authority model comes to the foreground

This window is full of permissions work, and the spread is the story. Codex shipped permission profiles, sandbox profiles, plugin sharing controls, and Linux sandbox hardening. Gemini added workspace trust, private memory-patch allowlists, shell-safety evals, and approval-mode-aware subagents. OpenHands tightened redaction and deleted a log that had been recording secrets. OpenClaw fixed allowlists, subagent security docs, OAuth labels, and live exec output limits. Paperclip added security roles and sandbox-provider contracts; Agent Zero keeps its browser and office surfaces opt-in and exposes OAuth disconnect. The harness is starting to show its authority model, which is the right direction -- and the operator's question is finally answerable in some of these tools: what could this agent read, change, execute, install, send, or leak?
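
One way to make the operator's question concrete is a deny-by-default profile that answers it verb by verb. The sketch below is an assumption-laden illustration: PermissionProfile, its grants mapping, and the scope names are hypothetical and do not reproduce the permission format of Codex, Gemini CLI, or any other project named here.

```python
from dataclasses import dataclass, field

# The six verbs from the operator's question. "leak" is modeled here as the
# secrets the agent can see, on the assumption that anything visible can leak.
VERBS = ("read", "change", "execute", "install", "send", "leak")


@dataclass
class PermissionProfile:
    """Hypothetical profile mapping each verb to the scopes granted for it."""
    name: str
    grants: dict = field(default_factory=dict)

    def allows(self, verb, scope):
        # Deny by default: a verb with no entry grants nothing.
        return scope in self.grants.get(verb, set())

    def audit(self):
        # One answer per verb, so the operator sees denials as well as grants.
        return {verb: sorted(self.grants.get(verb, set())) for verb in VERBS}


profile = PermissionProfile(
    name="ci-fixer",
    grants={
        "read": {"repo"},
        "change": {"repo"},
        "execute": {"pytest"},
        "leak": {"CI_TOKEN"},
    },
)

print(profile.allows("execute", "pytest"))  # True
print(profile.allows("install", "pip"))     # False
for verb, scopes in profile.audit().items():
    print(f"{verb:8} {scopes or 'nothing'}")
```

An audit that lists every verb, including the empty ones, is the useful part: the absence of an install or send grant is itself an answer.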

Accessibility is a frontier capability

OpenClaw is the corrective to an overly technical reading of this market. Its fortnight is setup recovery, stale plugin repair, Discord voice behavior, Telegram reactions, WhatsApp identity mapping, OAuth labels, progress previews, chat drafts, install recovery, and group allowlists -- work whose only purpose is letting a normal person start, understand, recover, and control an agent without learning the project's private ontology. Hermes is doing the adjacent work: setup fixes, voice push-to-talk parity, gateway restart readiness, provider pickers. Agent Zero's screenshot previews make the computer legible; Pi's quickstart and terminal work lower the floor; Gemini's reviewable memory and headless auth, and OpenHands' visible model names, do the same from their corners. None of this is softness. Accessibility is distribution, trust, and operator leverage, and the projects treating it as real engineering are buying something the benchmark chasers are not.

The control plane arrives

Paperclip makes the management problem explicit: runtime specs, sandbox providers, cost summaries, roles, liveness, stale-session recovery, ordered sub-issues, pause and resume. OpenHands is consolidating around its app server; Hermes runs kanban task workers, gateway lifecycle, Curator, and providers under a dashboard; Codex is reshaping skills, goals, sessions, and executors into app-server-shaped surfaces; OpenClaw manages gateway sessions, subagents, and plugin metadata. This is the factory problem in miniature: once agents coordinate across tasks and machines, something has to keep the system legible, and that something is becoming a product layer of its own.
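
A toy version of that legibility layer, covering only liveness, cost, and pause/resume, might look like the following. It is a sketch under stated assumptions: ControlPlane, Worker, and the heartbeat scheme are hypothetical names chosen for illustration, and nothing here reflects how Paperclip or the other projects actually implement these features. Time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical record for one agent under supervision."""
    name: str
    last_heartbeat: float
    cost_usd: float = 0.0
    paused: bool = False


class ControlPlane:
    def __init__(self, stale_after):
        self.stale_after = stale_after  # seconds of silence before "stale"
        self.workers = {}

    def register(self, name, now):
        self.workers[name] = Worker(name, last_heartbeat=now)

    def heartbeat(self, name, now, cost_usd=0.0):
        # A heartbeat refreshes liveness and reports spend since the last one.
        worker = self.workers[name]
        worker.last_heartbeat = now
        worker.cost_usd += cost_usd

    def pause(self, name):
        self.workers[name].paused = True

    def resume(self, name):
        self.workers[name].paused = False

    def status(self, now):
        # Paused wins over stale: a paused worker is silent on purpose.
        report = {}
        for worker in self.workers.values():
            if worker.paused:
                report[worker.name] = "paused"
            elif now - worker.last_heartbeat > self.stale_after:
                report[worker.name] = "stale"
            else:
                report[worker.name] = "live"
        return report

    def total_cost(self):
        return sum(worker.cost_usd for worker in self.workers.values())


plane = ControlPlane(stale_after=30)
plane.register("researcher", now=0)
plane.register("fixer", now=0)
plane.heartbeat("researcher", now=40, cost_usd=0.25)
print(plane.status(now=50))  # {'researcher': 'live', 'fixer': 'stale'}
print(plane.total_cost())    # 0.25
```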

Integrations are weather

Pi added providers, removed providers, and changed its Codex transport inside a single window. Hermes is moving model providers into plugins; OpenClaw is externalizing channel plugins; OpenHands is replacing config surfaces with app-server services; Codex and Gemini rework plugin, MCP, memory, and approval surfaces weekly. This is not a reason to avoid frontier tools. It is the reason to hold them through a loop that stays stable -- objective, permissions, execution environment, evidence, review, memory -- while the best agent, provider, runtime, and plugin change under it every week.

That loop is the fortnight's real subject. Every project above is building a piece of it inside its own walls. The operator who wants to switch walls without losing the work keeps the loop outside.
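
Keeping the loop outside can be as simple as a run record whose fields are the six loop elements named above, with the agent as the one replaceable field. This is a minimal sketch: LoopRecord, stable_part, and the sample values are hypothetical and do not correspond to any project's schema.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class LoopRecord:
    """One run, described in terms that do not depend on the tool used."""
    objective: str
    permissions: tuple
    environment: str
    evidence: tuple
    review: str
    memory: tuple
    agent: str  # the swappable part: whichever harness ran this loop


def stable_part(record):
    # Everything except the agent should survive a change of tool.
    fields = asdict(record)
    fields.pop("agent")
    return fields


run = LoopRecord(
    objective="fix the flaky login test",
    permissions=("read:repo", "execute:pytest"),
    environment="local sandbox",
    evidence=("pytest output", "diff"),
    review="pending",
    memory=("test command is make test",),
    agent="agent-a",
)

# Switching tools replaces one field; the loop itself carries over unchanged.
rerun = replace(run, agent="agent-b")
print(stable_part(run) == stable_part(rerun))  # True
```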

How this was read: this is a commit-harvest window -- commit metadata was broad-sampled across all eight projects, with diff-level review only on selected high-signal commits. Claude Code is absent because its v0 source contract defines no public commit stream. OpenClaw's high commit volume means its durable product movement is the hardest to separate from rapid stabilization; that caveat stands until a release-note review.

Revised 2026-07-02 (artifact_version 4): editorial pass to the current house standard -- lede, structure, and operator brief. Claims, receipts, and window judgments are unchanged from the 2026-05-07 publication.

Projects reviewed in this research run

Codex, Gemini CLI, Hermes Agent, Pi Coding Agent, OpenClaw, Paperclip, Agent Zero, OpenHands

Research artifacts and publication history are open in the repository.

Versions

published v3

6 signals / 2026-04-23 to 2026-05-07