Research Version
The Harness Leaves The Chat Box
2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
- Status: published
- Window: 2026-04-23 to 2026-05-07
- Signals: 6
Mode: manual_commit_harvest
Source contracts
Accepted signals from this run
- Codex: Persistent agent state is becoming a product surface
- Agent Zero: The agent interface is becoming a visible computer
- Codex: Permissions, secrets, and sandboxes are moving into the foreground
- OpenClaw: Accessibility is a frontier capability, not marketing polish
- Paperclip: Agent systems are growing control planes
- Pi Coding Agent: Integrations are volatile; the operating loop has to be durable
Artifact contents
Every file the loop produced for this run, anchored in the repo. Internal links go to the rendered page; the repo path opens the raw artifact on GitHub.
- manifest
- finding (8 files)
- signals: Accepted signals (YAML), runs/2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1/signals/frontier-signals.yml
- weekly: Weekly digest for 2026-04-23_2026-05-07, runs/2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1/weekly/2026-04-23_2026-05-07.md
- qa
- audit
Run digest
The last two weeks of commits make one thing clear: the interesting action in coding agents is no longer confined to the model or the chat transcript.
Agent harnesses are becoming operating surfaces.
Codex is adding persistent goals, session metadata, memory plumbing, plugin controls, sandbox work, and cloud executor paths. Gemini CLI is treating memory as a reviewable patch, with workspace trust, approval modes, shell safety, and structured non-interactive output close behind. Hermes is sanding down the rough edges of persistent personal agents: gateways, systemd, voice, themes, model providers, skills, search, kanban, and memory scoping. Pi keeps proving the opposite design lesson: a thin harness can move quickly because integrations can be added, removed, or rewritten without becoming the whole product.
The expanded watchlist changes the story. OpenClaw shows that accessibility is not a side quest; ordinary surfaces like Discord, Telegram, WhatsApp, OAuth, voice, onboarding, and visible progress are where agents become usable. Agent Zero shows the workcell becoming literal: browser, desktop, documents, file browser, screenshots, OAuth, and time-travel state. Paperclip shows the company/control-plane version of the problem: remote provisioning, sandbox providers, cost summaries, roles, liveness, pause/resume, and stale session recovery. OpenHands shows what happens when a harness becomes a platform: app server, model profiles, MCP proxying, secrets, security redaction, self-hosted integrations, sandbox grouping, and old runtime cleanup.
The frontier is not one winning agent. The frontier is the environment around agents getting thicker.
For Bitter, that is useful pressure. The stronger the agent tools get, the more valuable the developer's own loop becomes.
The Week In One Sentence
Coding agents are gaining goals, memory, computers, permissions, gateways, integrations, and supervision layers; the durable question is who owns the loop around all of that.
Main Signals
1. Persistent Agent State Is Becoming A Product Surface
The strongest single signal is still Codex /goal. It is not just a UX affordance. The diff-reviewed commit around goal validation shows persistent objectives being treated as a first-class feature, with their own validation, paste handling, queued-command behavior, and user guidance.
Gemini's Auto Memory inbox points in the same direction from another angle: memory should be proposed, reviewed, and accepted, not silently smeared into hidden context. Hermes adds memory scoping and Curator commands. OpenClaw is dealing with memory wiki details, task reload blockers, gateway session files, and visible tool progress in chat channels.
This is a real shift. Agent-side state is becoming more durable, more visible, and more operational.
The operator question:
What goal, memory, session, recap, skill report, or thread state shaped this run?
The Bitter answer:
Use agent-side state, but record it as agent-side state. The agent may carry a goal. The developer owns the charter.
Supported by Codex, Gemini CLI, Hermes, and OpenClaw.
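To make the "proposed, reviewed, and accepted" pattern concrete, here is a minimal Python sketch of a memory inbox. It is an illustration only, not how Gemini CLI, Hermes, or any other harness implements memory; the names `MemoryProposal`, `MemoryInbox`, `propose`, and `review` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class MemoryProposal:
    key: str
    value: str
    source: str  # hypothetical label for whoever proposed the memory
    status: Status = Status.PROPOSED


@dataclass
class MemoryInbox:
    accepted: dict[str, str] = field(default_factory=dict)
    pending: list[MemoryProposal] = field(default_factory=list)

    def propose(self, key: str, value: str, source: str) -> MemoryProposal:
        # A proposal waits in the inbox; it does not touch durable memory yet.
        proposal = MemoryProposal(key, value, source)
        self.pending.append(proposal)
        return proposal

    def review(self, proposal: MemoryProposal, accept: bool) -> MemoryProposal:
        # Only a reviewed and accepted proposal reaches durable memory.
        self.pending.remove(proposal)
        proposal.status = Status.ACCEPTED if accept else Status.REJECTED
        if accept:
            self.accepted[proposal.key] = proposal.value
        return proposal


inbox = MemoryInbox()
p1 = inbox.propose("test_command", "make test", source="agent")
p2 = inbox.propose("deploy_target", "prod", source="agent")
inbox.review(p1, accept=True)
inbox.review(p2, accept=False)
```

The design point is that the agent can only append to `pending`; writing to `accepted` requires an explicit review step that the developer controls.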
2. The Agent Interface Is Becoming A Visible Computer
Agent Zero is the clearest evidence. It replaced a browser-use agent with a native Playwright-powered browser, added persistent Chromium runtime work, browser tabs, screenshot previews, annotation, file browser search, ZIP downloads, Linux desktop controls, document canvas, LibreOffice runtime, and OAuth/quota visibility.
OpenHands is moving in the same broad direction from the platform side: sandbox grouping UI, app-server routing, ACP/MCP surfaces, user secrets, model profiles, and enterprise integrations. Paperclip adds remote runtime provisioning and sandbox provider work. Codex is adding cloud executor paths and sandbox hardening.
The chat box is not enough. Serious agent work wants a visible machine.
The operator question:
Can I see the browser, files, runtime, screenshots, credentials, and artifacts that shaped this work?
The Bitter answer:
BitterGrid workcells should be leased, bounded, visible, resumable, and evidence-bearing. A workcell is not just a place to run commands. It is where agent labor becomes inspectable software work.
Supported by Agent Zero browser, Chromium runtime, OpenHands, and Paperclip.
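One way to read "leased, bounded, visible, resumable, and evidence-bearing" is as a small data structure. The sketch below is hypothetical and does not describe BitterGrid or any harness named above; the `Workcell` class and its fields are assumptions. It models only a time-bounded lease and an evidence log, and takes timestamps from the caller so the behavior is easy to inspect.

```python
from dataclasses import dataclass, field


@dataclass
class Workcell:
    name: str
    lease_seconds: float  # bounded: the lease has a fixed length
    leased_at: float      # timestamp supplied by the caller
    evidence: list[dict] = field(default_factory=list)

    def is_active(self, now: float) -> bool:
        return now - self.leased_at < self.lease_seconds

    def record(self, now: float, kind: str, detail: str) -> None:
        # evidence-bearing: every action leaves an inspectable entry
        if not self.is_active(now):
            raise RuntimeError("lease expired; renew before doing more work")
        self.evidence.append({"t": now, "kind": kind, "detail": detail})

    def renew(self, now: float) -> None:
        # resumable: a renewed lease keeps the evidence gathered so far
        self.leased_at = now


cell = Workcell("demo", lease_seconds=60.0, leased_at=0.0)
cell.record(10.0, "screenshot", "login page")

try:
    cell.record(100.0, "command", "pytest -q")
    expired = False
except RuntimeError:
    expired = True

cell.renew(100.0)
cell.record(110.0, "command", "pytest -q")
```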
3. Permissions, Secrets, And Sandboxes Are Moving Into The Foreground
This window is full of authority work. Codex has permission profiles, sandbox profiles, plugin sharing controls, MCP metadata, and Linux sandbox hardening. Gemini has workspace trust, private memory patch allowlists, shell safety evals, approval-mode-aware subagents, and policy-engine work. OpenHands tightened log redaction and removed a debug log exposing hook config secrets. OpenClaw is fixing access group allowlists, subagent security docs, OAuth labels, and live exec output limits. Paperclip is adding security roles and sandbox provider contracts. Agent Zero keeps browser and office surfaces opt-in and exposes OAuth disconnect and quota visibility.
This is the right direction. The harness is starting to show its authority model.
The operator question:
What could this agent read, change, execute, install, send, or leak?
The Bitter answer:
BitterPass and permission profiles should model the real authority surface of each harness, not an idealized abstraction. The run record should include credentials, plugins, approval mode, sandbox, network posture, OAuth state, and any known secret-handling caveats.
Supported by OpenHands redaction, hook config, Gemini CLI, Codex, and OpenClaw.
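The run-record fields listed in the Bitter answer can be sketched as a plain record. This is an assumed shape, not a BitterPass schema: the class name, the field values such as "auto", "none", and "open", and the two warning rules are all hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuthorityRecord:
    harness: str
    approval_mode: str  # e.g. "auto" or "manual" (hypothetical values)
    sandbox: str        # e.g. "none" or "container" (hypothetical values)
    network: str        # e.g. "open" or "blocked" (hypothetical values)
    credentials: tuple[str, ...] = ()
    plugins: tuple[str, ...] = ()
    oauth_state: str = "none"
    secret_caveats: tuple[str, ...] = ()

    def warnings(self) -> list[str]:
        # Illustrative rules only; a real policy would be project-specific.
        found = []
        if self.credentials and self.sandbox == "none":
            found.append("credentials available outside any sandbox")
        if self.network == "open" and self.approval_mode == "auto":
            found.append("open network with automatic approvals")
        if self.secret_caveats:
            found.append(f"{len(self.secret_caveats)} known secret-handling caveat(s)")
        return found


record = AuthorityRecord(
    harness="example-harness",
    approval_mode="auto",
    sandbox="none",
    network="open",
    credentials=("GITHUB_TOKEN",),
    plugins=("search",),
)
serialized = json.dumps(asdict(record), sort_keys=True)
```

Storing the serialized record next to the run output answers the operator question after the fact, without relying on anyone's memory of how the harness was configured.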
4. Accessibility Is A Frontier Capability
OpenClaw is the necessary corrective to an overly internal Bitter reading of the market. Its commits are full of work that makes agents usable by normal people: channel setup recovery, stale plugin repair, Discord voice behavior, Telegram reactions, WhatsApp identity mapping, OAuth labels, progress previews, chat drafts, typography cleanup, install recovery, and group allowlists.
Hermes is doing adjacent work through setup wizard fixes, voice push-to-talk parity, dashboard themes, gateway restart readiness, provider pickers, and messaging surfaces. Agent Zero is making the computer visible. Pi is improving login, terminal rendering, compact resource reads, clipboard behavior, and quickstart docs. Gemini is making memory reviewable and headless auth more reliable. OpenHands is exposing model names and model switching in the UI.
That matters. Accessibility is not softness. It is distribution, trust, and operator leverage.
The operator question:
Can a real person start, understand, recover, and control this thing without learning the project owner's private ontology?
The Bitter answer:
Bitter must keep its internal doctrine, but the public product needs humane language and visible affordances. The lesson from OpenClaw is not to become casual. It is to make serious authority understandable.
Supported by OpenClaw setup, OAuth status, Hermes, Agent Zero, and Pi.
5. Agent Systems Are Growing Control Planes
Paperclip makes the control-plane problem explicit. It is working on remote provisioning, sandbox providers, cost summaries, roles, liveness, stale sessions, issue workflows, ordered sub-issues, pause/resume controls, and remote workspace shaping.
OpenHands is consolidating around app-server reality. Hermes has kanban task runners, gateway lifecycle, Curator, provider modules, and dashboard state. Codex is moving skills, goals, sessions, plugins, and executors into app-server-shaped surfaces. OpenClaw is handling gateway sessions, subagents, plugin metadata, and live execution timelines.
This is the factory problem in miniature.
The operator question:
When agents coordinate across tasks and machines, what keeps the system legible?
The Bitter answer:
Factory should not become the agent. It should own the joined operating view and the run contract: charter, mandate, agent, runtime, authority, cost, evidence, recovery, and next action.
Supported by Paperclip runtime specs, cost summaries, OpenHands, and Hermes.
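The nine parts of the run contract named above can be written down as a record with a completeness check. This is a sketch under assumptions: `RunContract` and `missing_fields` are hypothetical names, and treating every field as free text is a simplification.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunContract:
    charter: Optional[str] = None
    mandate: Optional[str] = None
    agent: Optional[str] = None
    runtime: Optional[str] = None
    authority: Optional[str] = None
    cost: Optional[str] = None
    evidence: Optional[str] = None
    recovery: Optional[str] = None
    next_action: Optional[str] = None


def missing_fields(contract: RunContract) -> list[str]:
    # A run stays legible only if every part of the contract is filled in.
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


contract = RunContract(charter="Keep the billing service green", agent="example-agent")
gaps = missing_fields(contract)
```

A control plane could refuse to start, or at least flag, any run whose `gaps` list is not empty.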
6. Integrations Are Volatile; The Operating Loop Has To Be Durable
Pi added providers, removed providers, changed Codex transports, added auth flows, improved session behavior, and kept terminal output evolving. Hermes is moving model providers into plugins. OpenClaw is externalizing channel plugins. OpenHands is replacing config surfaces and moving toward app-server services. Codex and Gemini are evolving plugin, MCP, memory, and approval surfaces quickly.
This is not a warning against using frontier tools. It is the reason to use them through a durable loop.
The operator question:
What should remain stable while the best agent, provider, runtime, protocol, or plugin changes every week?
The Bitter answer:
The agent can change. The charter, authority, evidence, verification, memory, and next run should compound.
Supported by Pi removals, Codex transport, Hermes providers, and OpenClaw plugins.
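A small sketch of "the agent can change, the loop should compound": the loop below depends only on a narrow adapter interface, so swapping the agent leaves the charter, verification, and log untouched. `AgentAdapter`, `EchoAdapter`, and `run_loop` are hypothetical names; a real adapter would call an actual harness rather than echo a string.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class AgentAdapter(Protocol):
    name: str

    def run(self, task: str) -> str: ...


@dataclass
class EchoAdapter:
    # Stand-in for a real harness integration.
    name: str

    def run(self, task: str) -> str:
        return f"[{self.name}] done: {task}"


def run_loop(
    charter: str,
    task: str,
    adapter: AgentAdapter,
    verify: Callable[[str], bool],
    log: list[dict],
) -> dict:
    # The durable parts (charter, verification, log) do not depend on the adapter.
    output = adapter.run(task)
    entry = {
        "charter": charter,
        "task": task,
        "adapter": adapter.name,
        "output": output,
        "verified": verify(output),
    }
    log.append(entry)
    return entry


log: list[dict] = []
charter = "Keep the test suite green"
for adapter in (EchoAdapter("agent-a"), EchoAdapter("agent-b")):
    run_loop(charter, "fix flaky test", adapter, verify=lambda out: "done" in out, log=log)
```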
What Serious Builders Should Try
- Test persistent goals, but write down what owns the project-level objective before you trust the agent's local goal.
- Prefer memory systems that show proposed changes before accepting them.
- Try at least one visible-computer harness. The browser, file system, screenshots, and desktop surface reveal different failure modes than terminal chat.
- Inspect the permissions and sandbox story before giving an agent real credentials.
- Treat messaging and voice surfaces as product lessons, not consumer fluff.
- Track exact harness version, provider, transport, plugin set, sandbox, and credential path for serious runs.
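For the last item in the list above, one lightweight way to track the exact environment is to hash it into a fingerprint and store that with each run. This is a sketch, not an established format: the function name, the field names, and the example values are all hypothetical.

```python
import hashlib
import json


def run_fingerprint(
    harness_version: str,
    provider: str,
    transport: str,
    plugins: list[str],
    sandbox: str,
    credential_path: str,
) -> str:
    payload = {
        "harness_version": harness_version,
        "provider": provider,
        "transport": transport,
        "plugins": sorted(plugins),  # plugin order should not change the result
        "sandbox": sandbox,
        "credential_path": credential_path,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = run_fingerprint("1.2.3", "example-provider", "websocket", ["a", "b"], "container", "env:TOKEN")
same = run_fingerprint("1.2.3", "example-provider", "websocket", ["b", "a"], "container", "env:TOKEN")
changed = run_fingerprint("1.2.3", "example-provider", "http", ["a", "b"], "container", "env:TOKEN")
```

Two runs with different fingerprints were not run under the same conditions, which is worth knowing before comparing their results.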
What Bitter Should Test Next
- Codex /goal as a provider-native goal under a Bitter charter.
- Gemini Auto Memory as a reviewable memory proposal source.
- Agent Zero as a BitterGrid-style visible workcell.
- Paperclip's adapter runtime command spec as a model for run provisioning contracts.
- OpenHands secret/log/sandbox patterns against BitterPass and Grid boundaries.
- OpenClaw setup recovery and channel progress visibility as accessibility benchmarks.
- Pi as a thin, replaceable agent adapter with exact provider and transport records.
What Remains Uncertain
- OpenClaw's high commit volume makes it hard to separate durable product movement from rapid stabilization without deeper release-note and diff review.
- This run is commit-harvest focused. Claude Code was excluded because the v0 source contract does not define a public commit stream.
- Commit metadata was broad-sampled across all projects, but only selected high-signal commits received diff-level review.
- The frontier may be converging on visible computers, but the winning shape is still open: local desktop, browser sandbox, remote workcell, hosted app server, messaging agent, or some combination.
- It is unclear which agent-side memories and goals will remain stable enough to integrate deeply versus merely record as tool-local state.
Receipts
Findings and signal records for this run are under:
runs/2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1/
Sources
Primary links, including exact changelog lines when available.
- release: v0.41.0 release · google-gemini/gemini-cli · v0.41.0
- line: Secure .env loading and workspace trust · google-gemini/gemini-cli · docs/changelogs/preview.md#L37-L38
- line: Shell validation and core tool allowlist · google-gemini/gemini-cli · docs/changelogs/preview.md#L35-L36
- line: Auto-memory scratchpad · google-gemini/gemini-cli · docs/changelogs/preview.md#L70-L72
- release: v2026.4.30 release · NousResearch/hermes-agent · v2026.4.30
- line: Curator release summary · NousResearch/hermes-agent · RELEASE_v0.12.0.md#L6-L12
- line: Curator feature details · NousResearch/hermes-agent · RELEASE_v0.12.0.md#L58-L64
- line: Self-improvement loop details · NousResearch/hermes-agent · RELEASE_v0.12.0.md#L71-L77
- line: v0.73.0 changelog highlights · badlogic/pi-mono · packages/coding-agent/CHANGELOG.md#L3-L9
- line: OpenAI Codex websocket transport and compact rendering fixes · badlogic/pi-mono · packages/coding-agent/CHANGELOG.md#L25-L31
- line: Removed Gemini CLI and Antigravity support · badlogic/pi-mono · packages/coding-agent/CHANGELOG.md#L68-L79
- line: Provider timeout/retry controls · badlogic/pi-mono · packages/coding-agent/CHANGELOG.md#L198-L209
- commit_diff_reviewed: Recover externalized channel plugin from stale config · github.com/openclaw/openclaw/commit/329580c64d13657592c3fabb97ff567c2e292bb6
- commit: Label Claude CLI OAuth status · github.com/openclaw/openclaw/commit/2b4b60b5514b47d8e242b9b11d9b395037e6674b
- commit: Prevent Discord voice self-feedback · github.com/openclaw/openclaw/commit/1c2832526f65cf23b469e9a1dc5694915c5be548
- commit: Honor Telegram access group allowlists · github.com/openclaw/openclaw/commit/b6ae0b83a61a1f779ee41b5d639b6049bfd422ce
- commit: Document sub-agent security boundaries · github.com/openclaw/openclaw/commit/33b112ad314dc8d9dfe0f5a68caed4811a23245a
- commit: Bound live exec output events · github.com/openclaw/openclaw/commit/3ee7c02bcacfdf6327747c1fe24dd6d11de8612a
- commit: Coarse agent turn timeline spans · github.com/openclaw/openclaw/commit/61223a74a43fd8768c426d5b22f1633dbad37477
- commit: Show Codex tool progress in channel drafts · github.com/openclaw/openclaw/commit/3f210b10ce3a19ef6a04205aa7420353945567a2
- commit_diff_reviewed: Adapters declare runtime command spec for remote provisioning · github.com/paperclipai/paperclip/commit/90631b09b36fa028ad24ca5375bfa50e3602799c
- commit: Fix remote workspace environment shaping · github.com/paperclipai/paperclip/commit/856c6cb192e53a992875821297b5fd8d29c95c2d
- commit: Add sandbox callback bridge for remote environment API access · github.com/paperclipai/paperclip/commit/a4ac6ff133fbe8bdb82f4046fda85f7cb372b6a9
- commit: Add E2B sandbox provider plugin · github.com/paperclipai/paperclip/commit/4ef969f0840810527333aa6ee44fed89f4551f7c
- commit: Issue cost summaries · github.com/paperclipai/paperclip/commit/c4269bab59fff7a73ff31797578cc97ece7f160f
- commit: First-class security agent role · github.com/paperclipai/paperclip/commit/c036bbfa98494dcfe2521aab65019a4cd021c769
- commit: Pause and resume sidebar agents · github.com/paperclipai/paperclip/commit/43b0f2ae582b18f2872ae60bf468f54b99b614ba
- commit_diff_reviewed: Replace browser-use agent with native browser · github.com/agent0ai/agent-zero/commit/983d431a5eb785eb9deba9fdfd471fa93f349603
- commit: Persistent full Chromium runtime for Browser · github.com/agent0ai/agent-zero/commit/fa7eef1919901093b117a98ad6e402d809687cf6
- commit: Browser multi-tab awareness and modifier-key click · github.com/agent0ai/agent-zero/commit/5012dd3128aa6218cc55f6cbce8be42b2db2fee4
- commit: Browser screenshot previews in tool messages · github.com/agent0ai/agent-zero/commit/c2fb2c3c94e1e1c85b783252332b3fc003f39f2b
- commit: Linux Desktop skill controls · github.com/agent0ai/agent-zero/commit/62ac20e7b248179825e05664c1df97ebc6214c54
- commit: Desktop document canvas · github.com/agent0ai/agent-zero/commit/24dd548ebf221e397323b5aa3a509f037fb1b9ae
- commit: OAuth disconnect and remaining quota visibility · github.com/agent0ai/agent-zero/commit/0da8f3dc2b640efbce22499053507837101fdf6f
- commit_diff_reviewed: Strengthen log redaction for API keys · github.com/OpenHands/OpenHands/commit/61e3dc2cadbefd4e0649b7c141ac2335c021ad2b
- commit: Remove debug log exposing hook_config secrets · github.com/OpenHands/OpenHands/commit/0c6c461555f8651347ed140f1c555ff8a88ddf56
- commit: Expose sandbox grouping strategy UI · github.com/OpenHands/OpenHands/commit/90cf5f8003c247597481bcbef9a5aa73eb899e10
- commit: Proxy Tavily MCP through app server · github.com/OpenHands/OpenHands/commit/949a15a560ef90cd3dd7f18baf6955430401edb4
- commit: Move server content to app_server · github.com/OpenHands/OpenHands/commit/5232d96dab0ca98e691d6307bd0759e943220d1c
- commit: Inject user secrets into ACP subprocess env · github.com/OpenHands/OpenHands/commit/cf156b0073350ca8e93067bc2f4ae18b90537a0a
- commit: Self-hosted GitLab support · github.com/OpenHands/OpenHands/commit/4e63531fa6595ec55102f08ef129845931fcd8ff
- commit: Removed V0 runtime · github.com/OpenHands/OpenHands/commit/e86067c15b54242fd611877aa9038a2f7a219658