Issue · Published 2026-05-07

The Harness Leaves The Chat Box

The last two weeks of commits make one thing clear: the interesting action in coding agents is no longer confined to the model or the chat transcript.

Agent harnesses are becoming operating surfaces.

Codex is adding persistent goals, session metadata, memory plumbing, plugin controls, sandbox work, and cloud executor paths. Gemini CLI is treating memory as a reviewable patch, with workspace trust, approval modes, shell safety, and structured non-interactive output close behind. Hermes is sanding down the rough edges of persistent personal agents: gateways, systemd, voice, themes, model providers, skills, search, kanban, and memory scoping. Pi keeps proving the opposite design lesson: a thin harness can move quickly because integrations can be added, removed, or rewritten without becoming the whole product.

The expanded watchlist changes the story. OpenClaw shows that accessibility is not a side quest; ordinary surfaces like Discord, Telegram, WhatsApp, OAuth, voice, onboarding, and visible progress are where agents become usable. Agent Zero shows the workcell becoming literal: browser, desktop, documents, file browser, screenshots, OAuth, and time-travel state. Paperclip shows the company/control-plane version of the problem: remote provisioning, sandbox providers, cost summaries, roles, liveness, pause/resume, and stale session recovery. OpenHands shows what happens when a harness becomes a platform: app server, model profiles, MCP proxying, secrets, security redaction, self-hosted integrations, sandbox grouping, and old runtime cleanup.

The frontier is not one winning agent. The frontier is the environment around agents getting thicker.

The Week In One Sentence

Coding agents are gaining goals, memory, computers, permissions, gateways, integrations, and supervision layers; the durable question is who owns the loop around all of that.

Main Signals

1. Persistent Agent State Is Becoming A Product Surface

The strongest single signal is still Codex /goal. It is not just a UX affordance. The goal validation work shows that persistent objectives are now treated as first-class state, with their own validation, paste handling, queued-command behavior, and user guidance.

Gemini's Auto Memory inbox points in the same direction from another angle: memory should be proposed, reviewed, and accepted, not silently smeared into hidden context. Hermes adds memory scoping and Curator commands. OpenClaw is making agent progress visible in chat with timeline spans.

This is a real shift. Agent-side state is becoming more durable, more visible, and more operational.
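
The propose, review, accept pattern is easy to state in code. What follows is a minimal sketch, not Gemini's or Hermes's implementation: MemoryInbox, MemoryProposal, and every field name are hypothetical, chosen only to show that nothing reaches memory until a reviewer says so.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class MemoryProposal:
    """A suggested memory change that has not been applied yet (hypothetical type)."""
    key: str
    new_value: str
    reason: str
    status: Status = Status.PENDING


@dataclass
class MemoryInbox:
    """Holds accepted memory plus every proposal ever made (hypothetical type)."""
    memory: dict[str, str] = field(default_factory=dict)
    proposals: list[MemoryProposal] = field(default_factory=list)

    def propose(self, key: str, new_value: str, reason: str) -> MemoryProposal:
        # Proposing only queues the change; memory is untouched.
        proposal = MemoryProposal(key, new_value, reason)
        self.proposals.append(proposal)
        return proposal

    def review(self, proposal: MemoryProposal, accept: bool) -> None:
        # Each proposal is decided exactly once, so the audit trail stays clean.
        if proposal.status is not Status.PENDING:
            raise ValueError("proposal already reviewed")
        if accept:
            self.memory[proposal.key] = proposal.new_value
            proposal.status = Status.ACCEPTED
        else:
            proposal.status = Status.REJECTED

    def pending(self) -> list[MemoryProposal]:
        return [p for p in self.proposals if p.status is Status.PENDING]
```

The design choice worth noticing is that rejected proposals stay in the list. The record of what the agent wanted to remember is itself part of the state that shaped the run.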

Builder question:

What goal, memory, session, recap, skill report, or thread state shaped this run?

2. The Agent Interface Is Becoming A Visible Computer

Agent Zero is the clearest evidence. It replaced a browser-use agent with a native browser, then added a Chromium runtime, browser tabs, screenshot previews, annotation, file browser search, ZIP downloads, Linux desktop controls, document canvas, LibreOffice runtime, and OAuth/quota visibility.

OpenHands is moving in the same broad direction from the platform side with sandbox grouping, app-server routing, ACP/MCP surfaces, user secrets, model profiles, and enterprise integrations. Paperclip adds remote provisioning and sandbox provider work. Codex is adding cloud executor paths and sandbox hardening.

The chat box is not enough. Serious agent work wants a visible machine.

Builder question:

Can I see the browser, files, runtime, screenshots, credentials, and artifacts that shaped this work?

3. Permissions, Secrets, And Sandboxes Are Moving Into The Foreground

This window is full of authority work. Codex has permission profiles, sandbox profiles, plugin sharing controls, MCP metadata, and Linux sandbox hardening. Gemini has workspace trust, private memory patch allowlists, shell safety evals, approval-mode-aware subagents, and policy-engine work. OpenHands tightened redaction and removed a secret log. OpenClaw is fixing allowlists, subagent security docs, OAuth labels, and live exec output limits. Paperclip is adding security roles and sandbox provider contracts. Agent Zero keeps browser and office surfaces opt-in and exposes OAuth disconnect and quota visibility.

This is the right direction. The harness is starting to show its authority model.

Builder question:

What could this agent read, change, execute, install, send, or leak?
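
One way to make that question answerable is to write the authority model down as data. This is a minimal sketch under stated assumptions: PermissionProfile and Action are hypothetical names, not the profile format of Codex, Gemini, or any other harness mentioned here. The actions mirror the verbs in the question; leaking is an outcome rather than something you grant, so it is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    CHANGE = "change"
    EXECUTE = "execute"
    INSTALL = "install"
    SEND = "send"


@dataclass(frozen=True)
class PermissionProfile:
    """Deny-by-default authority description (hypothetical type)."""
    name: str
    allowed: frozenset[Action] = frozenset()

    def permits(self, action: Action) -> bool:
        # Anything not explicitly allowed is denied.
        return action in self.allowed

    def denied(self) -> list[str]:
        # Alphabetical list of everything this profile cannot do.
        return sorted(a.value for a in Action if a not in self.allowed)


review_only = PermissionProfile("review-only", frozenset({Action.READ}))
print(review_only.permits(Action.EXECUTE))
print(review_only.denied())
```

An empty profile permits nothing, which is the point: authority has to be added deliberately, and the denied list gives an operator something concrete to read before handing over credentials.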

4. Accessibility Is A Frontier Capability

OpenClaw is the necessary corrective to an overly technical reading of the market. Its commits are full of work that makes agents usable by normal people: setup recovery, stale plugin repair, Discord voice behavior, Telegram reactions, WhatsApp identity mapping, OAuth labels, progress previews, chat drafts, typography cleanup, install recovery, and group allowlists.

Hermes is doing adjacent work through setup fixes, voice push-to-talk parity, dashboard themes, gateway restart readiness, provider pickers, and messaging surfaces. Agent Zero is making the computer visible with screenshot previews. Pi is improving login, terminal rendering, compact resource reads, clipboard behavior, and quickstart docs. Gemini is making memory reviewable and headless auth more reliable. OpenHands is exposing model names and model switching in the UI.

That matters. Accessibility is not softness. It is distribution, trust, and operator leverage.

Builder question:

Can a real person start, understand, recover, and control this thing without learning the project owner's private ontology?

5. Agent Systems Are Growing Control Planes

Paperclip makes the control-plane problem explicit. It is working on runtime specs, sandbox providers, cost summaries, roles, liveness, stale sessions, issue workflows, ordered sub-issues, pause/resume controls, and remote workspace shaping.

OpenHands is consolidating around the app server. Hermes has kanban task runners, gateway lifecycle, Curator, providers, and dashboard state. Codex is moving skills, goals, sessions, plugins, and executors into app-server-shaped surfaces. OpenClaw is handling gateway sessions, subagents, plugin metadata, and live execution timelines.

This is the factory problem in miniature.
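
Liveness and stale-session recovery reduce to a small check that a control plane runs on a schedule. The sketch below is illustrative only: Session, stale_sessions, and the 60-second default timeout are assumptions, not Paperclip's data model.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One agent session as a control plane might track it (hypothetical type)."""
    session_id: str
    last_heartbeat: float  # seconds on whatever clock the caller uses
    paused: bool = False


def stale_sessions(sessions: list[Session], now: float, timeout: float = 60.0) -> list[str]:
    """Return ids of sessions that stopped reporting and were not paused on purpose."""
    return [
        s.session_id
        for s in sessions
        if not s.paused and now - s.last_heartbeat > timeout
    ]
```

Treating paused sessions as healthy is a deliberate choice in this sketch. A session that an operator paused is silent for a known reason, and flagging it as stale would bury the sessions that actually need recovery.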

Builder question:

When agents coordinate across tasks and machines, what keeps the system legible?

6. Integrations Are Volatile; The Operating Loop Has To Be Durable

Pi added providers, removed providers, changed Codex transport, added auth flows, improved session behavior, and kept terminal output evolving. Hermes is moving model providers into plugins. OpenClaw is externalizing channel plugins. OpenHands is replacing config surfaces and moving toward app-server services. Codex and Gemini are evolving plugin, MCP, memory, and approval surfaces quickly.

This is not a warning against using frontier tools. It is the reason to use them through a durable loop.
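
A durable loop can be as small as an interface that every integration must satisfy. The following is a rough sketch, assuming a hypothetical AgentBackend protocol and two toy backends; real harnesses expose far richer surfaces, and none of these names come from the projects above.

```python
from typing import Protocol


class AgentBackend(Protocol):
    """The only surface the loop depends on (hypothetical interface)."""
    name: str

    def run(self, task: str) -> str: ...


class EchoBackend:
    name = "echo"

    def run(self, task: str) -> str:
        return f"echo: {task}"


class UpperBackend:
    name = "upper"

    def run(self, task: str) -> str:
        return task.upper()


def run_loop(backend: AgentBackend, tasks: list[str]) -> list[dict[str, str]]:
    """Run every task and keep a log whose shape never depends on the backend."""
    log = []
    for task in tasks:
        result = backend.run(task)
        log.append({"backend": backend.name, "task": task, "result": result})
    return log


print(run_loop(EchoBackend(), ["fix tests"]))
print(run_loop(UpperBackend(), ["fix tests"]))
```

Swapping EchoBackend for UpperBackend changes the results but not the loop or the log format. That stable record is what survives when the provider, transport, or plugin underneath is replaced.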

Builder question:

What should remain stable while the best agent, provider, runtime, protocol, or plugin changes every week?

What Serious Builders Should Try

  • Test persistent goals, but write down what owns the project-level objective before you trust the agent's local goal.
  • Prefer memory systems that show proposed changes before accepting them.
  • Try at least one visible-computer harness. The browser, file system, screenshots, and desktop surface reveal different failure modes than terminal chat.
  • Inspect the permissions and sandbox story before giving an agent real credentials.
  • Treat messaging and voice surfaces as product lessons, not consumer fluff.
  • Track exact harness version, provider, transport, plugin set, sandbox, and credential path for serious runs.
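
The last item in the list above can be made concrete with a small record written next to every serious run. This is a sketch under assumptions: RunManifest and its field names are hypothetical, and the fingerprint is just a convenient way to spot when two runs did not share an environment.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class RunManifest:
    """Environment around one agent run (hypothetical type)."""
    harness: str
    harness_version: str
    provider: str
    transport: str
    plugins: tuple[str, ...]
    sandbox: str
    credential_path: str  # where the credential came from, never the secret itself

    def fingerprint(self) -> str:
        """Short stable hash; plugin order does not affect it."""
        payload = asdict(self)
        payload["plugins"] = sorted(payload["plugins"])
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Two runs with the same fingerprint shared a harness version, provider, transport, plugin set, sandbox, and credential path. When the fingerprints differ, check the environment before blaming the model.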

What Remains Uncertain

  • OpenClaw's high commit volume makes it hard to separate durable product movement from rapid stabilization without deeper release-note and diff review.
  • This run is commit-harvest focused. Claude Code was excluded because the v0 source contract does not define a public commit stream.
  • Commit metadata was broad-sampled across all projects, but only selected high-signal commits received diff-level review.
  • The frontier may be converging on visible computers, but the winning shape is still open: local desktop, browser sandbox, remote workcell, hosted app server, messaging agent, or some combination.
  • It is unclear which agent-side memories and goals will remain stable enough to integrate deeply, and which should merely be recorded as tool-local state.

Backstage: what this changes for Bitter

This digest was produced by the Bitter autonomous research loop.

Sources

Primary links, including exact changelog lines when available.

Versions