<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Bitter Frontier</title><description>Source-backed field notes on what changed in agentic harnesses, what got easier, and what serious developers should do next.</description><link>https://frontier.bitter.sh/</link><item><title>The Policy You Wrote Wasn&apos;t the Policy You Had</title><link>https://frontier.bitter.sh/digests/2026-05-28_2026-06-03-weekly/</link><guid isPermaLink="true">https://frontier.bitter.sh/digests/2026-05-28_2026-06-03-weekly/</guid><description>Seven days, ten providers, one uncomfortable theme: the headline this
week is not new capability. It is the gap between the policy an operator
configured and the policy the runtime actually enforced -- and how
many providers spent the window quietly closing it.</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Policy You Wrote Wasn&amp;#39;t the Policy You Had&lt;/h1&gt;
&lt;p&gt;Seven days, ten providers, one uncomfortable theme: the headline this
week is not new capability. It is the gap between the policy an operator
&lt;em&gt;configured&lt;/em&gt; and the policy the runtime &lt;em&gt;actually enforced&lt;/em&gt; -- and how
many providers spent the window quietly closing it.&lt;/p&gt;
&lt;p&gt;A Claude Code operator who wrote a &lt;code&gt;Read&lt;/code&gt;-deny rule to hide a secret
file was still leaking it through &lt;code&gt;Glob&lt;/code&gt; and &lt;code&gt;Grep&lt;/code&gt;. A Pi user
authenticating against an OAuth server could be handed a verification
URI that ran shell commands. A Hermes Docker dashboard could drop its
auth because a heuristic misread the bind host. A Gemini CLI MCP
blacklist could be bypassed. None of these were the operator&amp;#39;s
misconfiguration. The rules were written; the enforcement silently
wasn&amp;#39;t there. This week, across
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;Claude Code&lt;/a&gt;,
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/releases/tag/v0.45.0&quot;&gt;Gemini CLI&lt;/a&gt;,
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/ba6e529&quot;&gt;Pi&lt;/a&gt;,
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/ce271ad&quot;&gt;OpenHands&lt;/a&gt;,
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.29&quot;&gt;Hermes&lt;/a&gt;,
and &lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;Flue&lt;/a&gt;, the same class
of fix landed: restore the enforcement the operator already believed
was in place.&lt;/p&gt;
&lt;p&gt;The quieter, more forward-looking thread is the inverse of a gap-close:
skills and plugins became &lt;strong&gt;governed, auditable, sometimes
agent-activated resources&lt;/strong&gt; across four providers in parallel --
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;Paperclip&lt;/a&gt;,
&lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;OpenClaw&lt;/a&gt;,
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;Flue&lt;/a&gt;, and
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;Agent Zero&lt;/a&gt;. Capability
that used to be an ambient default is becoming reviewable operating
state.&lt;/p&gt;
&lt;h2&gt;Security Advisories: Check These Before Upgrading&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Claude Code 2.1.160--2.1.162: three permission-bypass gaps closed at
once.&lt;/strong&gt; Custom &lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;&lt;code&gt;WebFetch&lt;/code&gt;&lt;/a&gt;
permission rules now override the built-in preapproved-domain whitelist;
Windows permission rules with backslashes or case-variant paths now
match; and &lt;code&gt;Read&lt;/code&gt;-deny rules now hide files from &lt;code&gt;Glob&lt;/code&gt; and &lt;code&gt;Grep&lt;/code&gt;
results. The sharpest of the three is the last one: a file an operator
&lt;em&gt;denied&lt;/em&gt; for &lt;code&gt;Read&lt;/code&gt; was still discoverable -- path and contents
both -- through search tools, defeating the access-control intent.
The attacker model is prompt injection or compromised task content
steering the agent toward a denied domain or walled-off path. The fix
needs no config change, only the upgrade, so the operator action is &lt;strong&gt;upgrade, then
re-audit whether any policy was silently bypassed in the prior window&lt;/strong&gt;,
especially on Windows and any setup relying on &lt;code&gt;Read&lt;/code&gt;-deny to hide
secrets from search. The changelog ships this as an ordinary entry; treat
it as the advisory it is.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code 2.1.160: execution-granting config writes now prompt even
in acceptEdits mode.&lt;/strong&gt; Two guardrails land together. &lt;code&gt;acceptEdits&lt;/code&gt; mode
now &lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;prompts before writing build-tool config that grants code
execution&lt;/a&gt; (&lt;code&gt;.npmrc&lt;/code&gt;,
&lt;code&gt;.yarnrc*&lt;/code&gt;, &lt;code&gt;bunfig.toml&lt;/code&gt;, &lt;code&gt;.bazelrc&lt;/code&gt;, &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt;,
&lt;code&gt;.devcontainer/&lt;/code&gt;), and the agent now prompts before writing &lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;shell
startup files&lt;/a&gt; (&lt;code&gt;.zshenv&lt;/code&gt;, &lt;code&gt;.zlogin&lt;/code&gt;, &lt;code&gt;.bash_login&lt;/code&gt;) and
&lt;code&gt;~/.config/git/&lt;/code&gt;. Operators running &lt;code&gt;acceptEdits&lt;/code&gt; or auto-leaning modes
previously had a &lt;em&gt;silent&lt;/em&gt; write path into files that execute on the next
shell login, install, or commit -- the classic agent-persistence and
supply-chain escalation vector. The prompt &lt;strong&gt;is&lt;/strong&gt; the guardrail here;
blanket-allowing it puts you back where you started.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pi: OAuth command injection and git-package path traversal closed.&lt;/strong&gt;
Commit &lt;a href=&quot;https://github.com/earendil-works/pi/commit/ba6e529&quot;&gt;&lt;code&gt;ba6e529&lt;/code&gt;&lt;/a&gt;
validates OAuth verification URIs (rejecting non-HTTP(S) schemes) and
launches the browser via &lt;code&gt;spawn()&lt;/code&gt; instead of shell &lt;code&gt;exec()&lt;/code&gt;, closing a
path where a malicious OAuth server could inject &lt;code&gt;$(id&amp;gt;/tmp/pwned)&lt;/code&gt;-style
commands. Commit
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/a98e087&quot;&gt;&lt;code&gt;a98e087&lt;/code&gt;&lt;/a&gt; rejects
git URLs with &lt;code&gt;..&lt;/code&gt;, null bytes, backslashes, or leading slashes at both
parse and resolution time, blocking writes outside the package install
root. The attacker is whoever controls the OAuth server or authors the
git package; both fixes need no config change, only the upgrade.&lt;/p&gt;
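&lt;p&gt;The shape of both Pi fixes can be sketched in a few lines. The Python below is an illustrative analogue under stated assumptions, not Pi&apos;s code (Pi uses Node&apos;s &lt;code&gt;spawn()&lt;/code&gt;): the helper names &lt;code&gt;is_safe_verification_uri&lt;/code&gt;, &lt;code&gt;browser_argv&lt;/code&gt;, and &lt;code&gt;is_safe_git_path&lt;/code&gt;, and the &lt;code&gt;xdg-open&lt;/code&gt; launcher, are hypothetical and chosen only for the example.&lt;/p&gt;

```python
from urllib.parse import urlsplit


# Hypothetical helper: accept only http(s) verification URIs that name a host.
def is_safe_verification_uri(uri):
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical helper: build an argv list for a browser launcher.
# Handing a list to subprocess.run (without shell=True) passes the URI as one
# argument, so shell metacharacters such as $(...) are never interpreted.
def browser_argv(uri, opener="xdg-open"):
    if not is_safe_verification_uri(uri):
        raise ValueError("refusing to open a non-HTTP(S) verification URI")
    return [opener, uri]


# Hypothetical helper: reject repository paths that could escape the install
# root. The rules mirror the ones listed above: no "..", no null bytes,
# no backslashes, no leading slash.
def is_safe_git_path(path):
    if path.startswith("/") or "\\" in path or "\0" in path:
        return False
    return ".." not in path
```

&lt;p&gt;A caller would hand the result of &lt;code&gt;browser_argv(uri)&lt;/code&gt; to &lt;code&gt;subprocess.run&lt;/code&gt;. Validation and shell-free launch are independent layers: a URI that passes the scheme check but carries shell syntax in its path still reaches the launcher as inert data.&lt;/p&gt;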
&lt;p&gt;&lt;strong&gt;OpenHands main: three named CVEs.&lt;/strong&gt; No tagged release fell in the
window, but main closed
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/73d1d9a&quot;&gt;CVE-2026-44492 (axios 1.16.0)&lt;/a&gt;,
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/b025cd2&quot;&gt;CVE-2026-41238 (dompurify 3.4.0)&lt;/a&gt;,
and
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/3eb16a9&quot;&gt;CVE-2026-42305 (dulwich 1.2.5)&lt;/a&gt;.
The first two are browser-facing (HTTP client and HTML/DOM sanitizer)
and need a frontend rebuild and redeploy; the third is a backend git
library and needs a &lt;code&gt;poetry.lock&lt;/code&gt; re-resolve and image rebuild.
Self-hosters pinning older lockfiles must bump manually.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gemini CLI v0.45.0: MCP blacklist bypass fixed.&lt;/strong&gt; The
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/releases/tag/v0.45.0&quot;&gt;stable release&lt;/a&gt;
bundles Termux relaunch/resize fixes, session-context filtering on
history resume, and -- the security-bearing item -- a fix for a path
where a &lt;em&gt;blacklisted&lt;/em&gt; MCP tool or server could still be reached.
Operators relying on MCP deny-lists for containment should upgrade
before trusting the blacklist, and test that blacklisted tools are
actually unreachable rather than assume full coverage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hermes v0.15.1: Docker insecure binding is now an explicit opt-in.&lt;/strong&gt;
The dashboard no longer infers insecure mode from the bind host; it
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.29&quot;&gt;requires &lt;code&gt;HERMES_DASHBOARD_INSECURE=1&lt;/code&gt;&lt;/a&gt;
explicitly. This removes a silent path where a misread bind host dropped
auth and exposed the dashboard to a network-adjacent attacker. Existing
Docker and hosted setups must update env config before upgrading. The
same patch fixes a v0.15.0 loopback-mode dashboard reload loop and
restores MCP bare-command resolution (&lt;code&gt;npx&lt;/code&gt;, &lt;code&gt;npm&lt;/code&gt;, &lt;code&gt;node&lt;/code&gt;) in Docker.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Paperclip v2026.529.0: first-admin claim is now the bootstrap gate.&lt;/strong&gt;
Unclaimed self-hosted deployments get a
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;one-time browser claim&lt;/a&gt;
to create the first admin. The flip side is a race: &lt;strong&gt;whoever completes
the claim first becomes admin&lt;/strong&gt;, so an attacker with network reach to a
freshly stood-up instance could seize control before the legitimate
operator. Claim promptly and restrict network exposure during the
unclaimed window.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hermes v0.15.0: Promptware defense, and a migration.&lt;/strong&gt; The Velocity
Release adds a built-in
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;defense against Brainworm-class prompt-injection&lt;/a&gt;
and closes 19 security-tagged issues. Operators running against untrusted
content (web, repos, MCP output) should validate the defense against
their own injection corpus rather than assume blanket coverage; novel
vectors outside the known class may still pass.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Flue v0.9.0: a hard breaking migration.&lt;/strong&gt; Routing imports move from
&lt;code&gt;@flue/runtime/app&lt;/code&gt; to &lt;code&gt;@flue/runtime/routing&lt;/code&gt;, provider model values now
require &lt;code&gt;provider-id/model-id&lt;/code&gt; format, SDK mount paths derive from
&lt;code&gt;baseUrl&lt;/code&gt;, and
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;persisted beta session state is rejected&lt;/a&gt;
-- clear or migrate the store before upgrading or sessions fail to
restore. Cloudflare Durable Object migrations are no longer auto-appended;
the operator now owns them in the Wrangler config, and interrupted
workflows no longer auto-retry.&lt;/p&gt;
&lt;h2&gt;The Enforcement Gap, Six Ways&lt;/h2&gt;
&lt;p&gt;The thread that cuts across the watchlist is consistent enough to name
plainly. In each case, a control the operator had reason to believe was
active was not -- and the fix is the same shape: make the enforcement
match the configuration.&lt;/p&gt;
&lt;p&gt;The Claude Code cluster is the clearest statement of it. A
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;&lt;code&gt;Read&lt;/code&gt;-deny rule that didn&amp;#39;t hide files from &lt;code&gt;Glob&lt;/code&gt;/&lt;code&gt;Grep&lt;/code&gt;&lt;/a&gt;,
a &lt;code&gt;WebFetch&lt;/code&gt; rule that didn&amp;#39;t override the preapproved-domain list, and
Windows path rules that silently didn&amp;#39;t match on case or separator
variance are three independent ways the same promise -- &amp;quot;the policy I
wrote is enforced&amp;quot; -- was broken. The same release line also converts a
&lt;em&gt;silent&lt;/em&gt; config-write into a
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;confirmation checkpoint&lt;/a&gt;
for files that grant code execution, and corrects an over-broad
managed-settings policy that was
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;wrongly blocking legitimate third-party provider sessions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Gemini CLI&amp;#39;s
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/releases/tag/v0.45.0&quot;&gt;MCP blacklist bypass&lt;/a&gt;
is the same bug class at the tool layer: a deny-list that didn&amp;#39;t deny.
Its companion
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/dceb2ea30650c3f6742e67ec71110857904e78b3&quot;&gt;policy-file resilience fix&lt;/a&gt;
closes a fail-open gap where a policy file that failed to persist (on
cross-device container mounts) or failed to parse (corrupt TOML) could
leave the agent running without the operator&amp;#39;s intended policy in
effect. Recovery now writes a &lt;code&gt;.bak&lt;/code&gt; backup and rebuilds the policy from defaults, so a
corrupted policy is silently reset -- re-verify that your intended policy is in effect whenever a
&lt;code&gt;.bak&lt;/code&gt; file appears.&lt;/p&gt;
&lt;p&gt;Pi&amp;#39;s quartet of hardening commits -- OAuth injection, git path traversal,
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/135fb54&quot;&gt;auth files created at &lt;code&gt;0o600&lt;/code&gt;&lt;/a&gt;
instead of briefly world-readable, and
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/ea3465a&quot;&gt;extension cache moved out of world-accessible &lt;code&gt;/tmp&lt;/code&gt;&lt;/a&gt;
-- is the multi-user-host version of the same theme: close the windows
where a control was assumed but a co-tenant could slip through. Flue&amp;#39;s
v0.9.1
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;WebSocket credential hardening&lt;/a&gt;
strips query strings and fragments before persisting Cloudflare
attachments so URL-carried handshake credentials are not retained, and
OpenHands moved
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/ce271ad&quot;&gt;ACP provider credentials off the plaintext &lt;code&gt;acp_env&lt;/code&gt; channel onto an
encrypted secrets channel&lt;/a&gt;.
And Hermes closed the same shape at the deployment edge: a Docker
dashboard that
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.29&quot;&gt;silently dropped auth&lt;/a&gt;
when a heuristic misread the bind host now demands an explicit
&lt;code&gt;HERMES_DASHBOARD_INSECURE=1&lt;/code&gt;.&lt;/p&gt;
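&lt;p&gt;Two of the hardening patterns in this paragraph are small enough to sketch: stripping URL-carried credentials before persisting, and creating a secret file with owner-only permissions from the first instant. The Python below is an analogue under stated assumptions, not code from Flue or Pi; the helper names &lt;code&gt;strip_query_and_fragment&lt;/code&gt; and &lt;code&gt;write_secret_file&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import os
from urllib.parse import urlsplit, urlunsplit


# Hypothetical helper: drop the query string and fragment before a URL is
# stored, so credentials carried in those parts are not persisted.
def strip_query_and_fragment(url):
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical helper: create a secret file that is owner-only from creation.
# The 0o600 mode is applied by os.open itself, so there is no interval in
# which the file exists with wider permissions. O_EXCL makes the call fail
# if the path already exists instead of reusing a file with unknown modes.
def write_secret_file(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
```

&lt;p&gt;The design point in both cases is ordering: the sensitive value is removed, or the restrictive mode applied, before anything reaches disk, rather than cleaned up afterwards.&lt;/p&gt;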
&lt;p&gt;The operator takeaway is uncomfortable but actionable: an upgrade is not
just a feature bump this week. For every provider above, &lt;strong&gt;the safe
assumption is that some control you configured on the prior build was not
holding&lt;/strong&gt;, and the post-upgrade action is a re-audit, not a victory lap.&lt;/p&gt;
&lt;h2&gt;Skills and Plugins Become Governed State&lt;/h2&gt;
&lt;p&gt;The second thread runs the other way. Four
providers, four surfaces, shipped the same move: agent capability stops
being an ambient default and becomes reviewable, sometimes approvable,
operating state.&lt;/p&gt;
&lt;p&gt;Paperclip made company skills first-class resources with an
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;&lt;code&gt;install / reset / audit / export / assign&lt;/code&gt; CLI&lt;/a&gt;. The load-bearing verbs
are &lt;code&gt;audit&lt;/code&gt; and &lt;code&gt;export&lt;/code&gt; -- which skills an agent holds becomes a
queryable, exportable fact rather than implicit config -- and &lt;code&gt;assign&lt;/code&gt;,
which makes a capability grant a distinct, reviewable authority action.&lt;/p&gt;
&lt;p&gt;OpenClaw&amp;#39;s &lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;Skill
Workshop&lt;/a&gt; inserts a human-in-the-loop gate: new skills enter a
pending-proposal queue reviewed via CLI or Gateway before taking effect.
A new &lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;&lt;code&gt;skill_workshop&lt;/code&gt;&lt;/a&gt; agent tool lets agents &lt;em&gt;file&lt;/em&gt; proposals
themselves, which widens the surface proposals originate from -- so the
operator decision is who may review and who may self-approve. Lax review
re-opens the unreviewed-skill path.&lt;/p&gt;
&lt;p&gt;Flue v0.9.2 went the other
direction on activation authority: an &lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;&lt;code&gt;activate_skill&lt;/code&gt;&lt;/a&gt; tool lets agents
load full skill instructions &lt;em&gt;on demand&lt;/em&gt; before matching work. The
operator&amp;#39;s visible control narrows to &lt;em&gt;which&lt;/em&gt; skills are configured; the
choice to activate moves to the agent. Workspace skills are reread on
activation, so mid-session edits take effect.&lt;/p&gt;
&lt;p&gt;Agent Zero v1.19 made
Office, Desktop, and Editor plugins toggleable behind a &lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;protected
plugin-state API&lt;/a&gt; -- a real authority lever that lets an operator disable
powerful capabilities (Desktop computer-use especially) on deployments
that should not have them. The release note describes a &amp;quot;protected&amp;quot;
toggle endpoint but no auth model or role-based capability management, so
treat it as a disable lever, not yet an audited capability register.&lt;/p&gt;
&lt;p&gt;The shapes differ -- catalog audit, proposal approval, agent
self-activation, capability toggle -- but the direction is one: the
question &amp;quot;what can this agent do?&amp;quot; is becoming answerable by inspecting
state rather than reading code or trusting defaults.&lt;/p&gt;
&lt;p&gt;The accessibility read is mixed. Claude Code&amp;#39;s &lt;code&gt;waitingFor&lt;/code&gt; field and
fan-out progress counter make agent state legible to operators who
previously had to open each session; Flue v0.9.0, by contrast, forces a
hard migration with no automated path, raising rather than lowering the
cost of staying current. The week made harnesses more governable, not
more reachable.&lt;/p&gt;
&lt;h2&gt;Control Plane&lt;/h2&gt;
&lt;p&gt;Control plane saw the most movement, in two directions. The
governance-of-capability cluster above (Paperclip skills, OpenClaw Skill
Workshop, Agent Zero plugin toggles, Flue agent-activated skills) sits
here, as does a steady relocation of authority onto standing credentials
and cloud paths: Codex
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;remote-exec API-key host registration&lt;/a&gt;,
Hermes
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;Bitwarden Secrets Manager replacing per-provider keys&lt;/a&gt;,
Claude Code
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;Auto Mode reaching Bedrock/Vertex/Foundry&lt;/a&gt;,
and Codex models
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;running under AWS IAM via Bedrock&lt;/a&gt;.
Claude Code also made agent supervision more legible: &lt;code&gt;claude agents --json&lt;/code&gt; now exposes a
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;&lt;code&gt;waitingFor&lt;/code&gt; field&lt;/a&gt; naming
what a blocked session waits on (e.g. a permission prompt), plus a
&lt;code&gt;done/total&lt;/code&gt; fan-out progress counter.&lt;/p&gt;
&lt;h2&gt;Runtime&lt;/h2&gt;
&lt;p&gt;Runtime carried most of the enforcement-gap closures -- the Claude
Code config-write prompts, Pi&amp;#39;s OAuth and path-traversal fixes, Flue&amp;#39;s
WebSocket credential stripping, Hermes&amp;#39;s Promptware defense, OpenHands&amp;#39;s
dulwich CVE -- plus one notable posture reversal: Agent Zero
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;reverted computer-use screenshots to durable chat-scoped storage&lt;/a&gt;,
undoing its prior ephemeral-by-default stance. That improves audit trails
but persists potentially sensitive on-screen content (credentials, PII,
internal UIs) with no automatic redaction -- a data-at-rest exposure
operators must scope and prune.&lt;/p&gt;
&lt;h2&gt;Platform&lt;/h2&gt;
&lt;p&gt;Platform was mostly steady-state plumbing: the OpenHands frontend CVE
cluster, Gemini CLI&amp;#39;s
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/releases/tag/v0.45.0&quot;&gt;v0.45.0 stable bundle&lt;/a&gt;
and an
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/211e7d1aec61f64aaace702ed2a4b97ff9de1ace&quot;&gt;editor-spam-loop fix&lt;/a&gt;,
OpenClaw&amp;#39;s
&lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;MiniMax M3 model support&lt;/a&gt;,
and Flue&amp;#39;s
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;OpenTelemetry tracing package&lt;/a&gt;.
Codex&amp;#39;s
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;Sites plugin&lt;/a&gt; -- in-app
website/web-app creation and deployment, included by default in Business
workspaces -- is the one platform item with a governance edge: a deploy
capability may already be active without an explicit enablement step.&lt;/p&gt;
&lt;h2&gt;Provider Notes&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Codex (CLI 0.135.0--0.136.0, iOS 1.2026.146)&lt;/strong&gt; shipped
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;named permission profiles with custom-config display&lt;/a&gt;
and &lt;code&gt;codex doctor&lt;/code&gt; diagnostics (0.135.0), a non-interactive installer
for CI, plus
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;remote-exec API-key host registration&lt;/a&gt;
and thread archiving (0.136.0). The iOS app added an
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;optional Face ID / passcode lock for Codex&lt;/a&gt;
and SSH-to-Windows. Two integrations landed: the
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;Sites plugin&lt;/a&gt; and
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;Amazon Bedrock&lt;/a&gt; under
AWS-managed auth and billing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code (2.1.158--2.1.162)&lt;/strong&gt; is the enforcement-gap headliner:
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;the permission/deny-rule cluster&lt;/a&gt;,
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;execution-granting config-write prompts&lt;/a&gt;,
the managed-settings third-party-session unblock, agent-status
observability, and Auto Mode reaching Bedrock/Vertex/Foundry for Opus
4.7/4.8.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gemini CLI (v0.44.1--v0.46.0-preview)&lt;/strong&gt; shipped the
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/releases/tag/v0.45.0&quot;&gt;v0.45.0 stable bundle&lt;/a&gt;
with the MCP blacklist fix and Termux hardening,
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/dceb2ea30650c3f6742e67ec71110857904e78b3&quot;&gt;policy-file resilience&lt;/a&gt;,
and a server-flag-gated
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/665228e983c611007c4e2d36550e67b34f75055e&quot;&gt;Gemini 3.5 Flash GA rollout&lt;/a&gt;
that decouples model-in-use from client version. A CI change to
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/cfcecebe8069f3714641a68e2898593698f739ba&quot;&gt;&lt;code&gt;pull_request_target&lt;/code&gt;&lt;/a&gt;
on the PR-size labeler is low-risk as written (it only reads line counts)
but removes the structural safety of &lt;code&gt;pull_request&lt;/code&gt; -- any future edit
adding fork-code checkout becomes immediately dangerous.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hermes Agent (v0.15.0--v0.15.1 + post-release commits)&lt;/strong&gt; is the
Velocity Release: a 76% &lt;code&gt;run_agent.py&lt;/code&gt; refactor, Kanban evolving into a
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;multi-agent orchestration platform&lt;/a&gt;
with auto-decomposition, swarm topology, and worktree-per-task,
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;Promptware defense&lt;/a&gt;,
and
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;Bitwarden Secrets Manager&lt;/a&gt;.
The v0.15.1 patch fixes the
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.29&quot;&gt;Docker insecure-binding opt-in and a dashboard reload loop&lt;/a&gt;;
June 3 commit waves hardened installer self-update, Windows/WSL2 PTY and
schtasks handling, and desktop session management.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pi coding agent (commits to main)&lt;/strong&gt; shipped a security-hardening
cluster:
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/ba6e529&quot;&gt;OAuth launch hardening&lt;/a&gt;,
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/a98e087&quot;&gt;git path-traversal rejection&lt;/a&gt;,
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/135fb54&quot;&gt;auth-file mode-on-create&lt;/a&gt;
(&lt;code&gt;0o600&lt;/code&gt; instead of briefly world-readable),
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/ea3465a&quot;&gt;extension-cache isolation&lt;/a&gt;
out of world-accessible &lt;code&gt;/tmp&lt;/code&gt;, and
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/6cb23f9&quot;&gt;HTML-export XSS sanitization&lt;/a&gt;.
Alongside, model-catalog maintenance removed stale Codex entries, added
&lt;a href=&quot;https://github.com/earendil-works/pi/commit/83afcdc&quot;&gt;Mistral Devstral 2 and Open Mistral Nemo&lt;/a&gt;,
and refreshed Claude model pricing and output-token caps (now 128k).
No tagged release could be reliably dated inside the window; the security work shipped as
commits to main.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenClaw (2026.5.31-beta.3 through 2026.6.1 stable)&lt;/strong&gt; shipped the
&lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;Skill Workshop proposal workflow&lt;/a&gt;,
interrupted-tool-call recovery, bounded request timers (re-evaluate SLOs),
enhanced plugin isolation, MiniMax M3, and
&lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;Tailscale Serve service-name binding&lt;/a&gt;
with SQLite-backed state migration for iMessage and plugin-install
tracking.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Paperclip (v2026.529.0)&lt;/strong&gt; shipped the
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;skills CLI/catalog&lt;/a&gt;,
the
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;first-admin claim flow&lt;/a&gt;,
inline document annotations, per-user sidebar controls, and
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;live Claude model discovery&lt;/a&gt;
from the UI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Zero (v1.19)&lt;/strong&gt; renamed Remote Link to
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;Remote Control with selectable tunnel providers&lt;/a&gt;,
made
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;Office/Desktop/Editor plugins toggleable behind a protected API&lt;/a&gt;,
reverted
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;screenshots to durable chat-scoped storage&lt;/a&gt;,
unified OAuth account management, and hardened Xpra desktop control.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenHands (main, no tagged release)&lt;/strong&gt; shipped the three-CVE remediation
cluster (&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/73d1d9a&quot;&gt;axios&lt;/a&gt;,
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/b025cd2&quot;&gt;dompurify&lt;/a&gt;,
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/3eb16a9&quot;&gt;dulwich&lt;/a&gt;),
the
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/ce271ad&quot;&gt;ACP-credentials-to-secrets-channel move&lt;/a&gt;,
a
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/82744a0&quot;&gt;cascade-delete-sole-org-requester change&lt;/a&gt;
on &lt;code&gt;DELETE /api/organizations&lt;/code&gt; (org deletion now also deletes the
requesting user if it is their only org), a git-proxy capability, and a
LiteLLM 1.84.1 upgrade.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Flue (Tier 2; v0.8.1--v0.9.2)&lt;/strong&gt; shipped
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;OpenTelemetry tracing&lt;/a&gt;, the
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;v0.9.0 breaking migration&lt;/a&gt;,
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;WebSocket credential hardening&lt;/a&gt;,
operator-owned workflow-run retention (the implicit 50-run prune is
gone), and
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;autonomous &lt;code&gt;activate_skill&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;What To Try&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude Code operators&lt;/strong&gt;: upgrade to 2.1.162 and re-audit any
allow/deny or &lt;code&gt;Read&lt;/code&gt;-deny policy that ran on older builds. Then wire
&lt;a href=&quot;https://code.claude.com/docs/en/changelog&quot;&gt;&lt;code&gt;waitingFor&lt;/code&gt; and the fan-out progress counter&lt;/a&gt;
into supervision tooling so stuck-agent triage stops requiring a human
to open each session.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paperclip operators&lt;/strong&gt;: use the
&lt;a href=&quot;https://github.com/paperclipai/paperclip/releases/tag/v2026.529.0&quot;&gt;skills CLI&lt;/a&gt;
to &lt;code&gt;audit&lt;/code&gt; and &lt;code&gt;export&lt;/code&gt; which agents hold which skills, and claim any
freshly stood-up self-hosted instance immediately.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex operators on iOS&lt;/strong&gt;: enable the
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;Face ID / passcode lock&lt;/a&gt;
before treating mobile as a trusted access surface.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hermes operators&lt;/strong&gt;: queue a decomposable task on the
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;new multi-agent Kanban&lt;/a&gt;
and confirm the orchestrator spawns the expected sub-agents in isolated
worktrees before trusting it with real work. Set
&lt;code&gt;HERMES_DASHBOARD_INSECURE=1&lt;/code&gt; only where insecure binding is genuinely
intended.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Zero operators&lt;/strong&gt;: disable the
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases&quot;&gt;Office/Desktop/Editor plugins&lt;/a&gt;
you do not need, and review retention/access controls for the now-durable
computer-use screenshots before capturing sensitive screens.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini CLI maintainers&lt;/strong&gt;: review the
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/cfcecebe8069f3714641a68e2898593698f739ba&quot;&gt;&lt;code&gt;pull_request_target&lt;/code&gt; labeler workflow&lt;/a&gt;
to confirm it only reads PR metadata and never checks out fork code
under the elevated token.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What Remains Uncertain&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Codex remote-exec key lifecycle&lt;/strong&gt;: scope, rotation, and revocation
for the
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;approved-host API-key registration&lt;/a&gt;
are undocumented. Whether a leaked key grants persistent remote exec is
unverified.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex iOS SSH trust handling&lt;/strong&gt;: host-key verification, key storage,
and scoping of the
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;iOS SSH-to-Windows client&lt;/a&gt;
are not described.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini CLI model routing&lt;/strong&gt;: with
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/665228e983c611007c4e2d36550e67b34f75055e&quot;&gt;Flash GA gated server-side&lt;/a&gt;,
the model in use is no longer determined by client version alone --
backend flag state is now part of the audit surface.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenClaw plugin-isolation depth&lt;/strong&gt;: the release note asserts
&lt;a href=&quot;https://github.com/openclaw/openclaw/releases&quot;&gt;tighter isolation&lt;/a&gt; but
does not describe the boundary&amp;#39;s depth, so operators cannot verify it
from the receipt.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hermes Promptware coverage&lt;/strong&gt;: the
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.28&quot;&gt;defense targets a known attack class&lt;/a&gt;;
novel injection vectors outside Brainworm patterns may still pass.
Validate against your own corpus.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flue persisted-session migration&lt;/strong&gt;: v0.9.0
&lt;a href=&quot;https://github.com/withastro/flue/blob/main/CHANGELOG.md&quot;&gt;rejects pre-upgrade session state&lt;/a&gt;
with no automated migration path; a self-scripted migration could
reintroduce stale, unredacted state.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenHands org-deletion blast radius&lt;/strong&gt;: operators on
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/82744a0&quot;&gt;the cascade-delete change&lt;/a&gt;
should enforce backups before any &lt;code&gt;DELETE /api/organizations&lt;/code&gt;, since a
sole-org delete now removes the user identity too.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Auto Stops Asking</title><link>https://frontier.bitter.sh/digests/2026-05-13_2026-05-27-weekly/</link><guid isPermaLink="true">https://frontier.bitter.sh/digests/2026-05-13_2026-05-27-weekly/</guid><description>Fifteen days, ten providers, one direction. The change that cuts across
the watchlist this fortnight is uncomfortable to ignore: autonomy stopped
asking for permission.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Auto Stops Asking&lt;/h1&gt;
&lt;p&gt;Fifteen days, ten providers, one direction. The change that cuts across
the watchlist this fortnight is uncomfortable to ignore: autonomy stopped
asking for permission.&lt;/p&gt;
&lt;p&gt;Claude Code &lt;a href=&quot;https://code.claude.com/docs/en/changelog#2-1-152&quot;&gt;2.1.152&lt;/a&gt;
flipped Auto mode from opt-in to default. Codex
&lt;a href=&quot;https://developers.openai.com/codex/changelog&quot;&gt;26.519&lt;/a&gt; graduated goal
mode out of experimental and turned it on by default across app, IDE,
and CLI. Gemini CLI
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/releases/tag/v0.44.0&quot;&gt;v0.44.0&lt;/a&gt;
collapsed multiple Auto variants into a single mode and added
shell-redirect auto-approval in &lt;code&gt;AUTO_EDIT&lt;/code&gt;. Three providers, three
surfaces, one shape: the permission ceremony that used to gate
productive autonomy is no longer the default surface. Operators don&amp;#39;t
choose to enable autonomy; they decide how to constrain it.&lt;/p&gt;
&lt;p&gt;The other half of the fortnight is the policy substrate that move
requires. Codex CLI 0.133.0 shipped permission profile &lt;strong&gt;inheritance&lt;/strong&gt;
and a managed &lt;code&gt;requirements.toml&lt;/code&gt; enforcement file consulted by the
runtime. Gemini CLI integrated PolicyEngine &lt;strong&gt;into ACP sessions&lt;/strong&gt;,
reaching enforcement into the protocol layer. OpenHands shipped
&lt;strong&gt;org-level LLM profiles&lt;/strong&gt; with two-tier permissions and concurrency-safe
activation. Three different products, three different surfaces, one
direction: policy lives in versioned, org-managed files now — not in
per-session flags.&lt;/p&gt;
&lt;p&gt;These themes are not independent. Autonomy moving from opt-in to baseline
makes per-session permission grants intractable. The policy file is the
correct primitive when the operator&amp;#39;s decision is &amp;quot;constrain the
baseline&amp;quot; rather than &amp;quot;consent to each escalation.&amp;quot;&lt;/p&gt;
&lt;h2&gt;Breaking Changes: Check These Before Upgrading&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Claude Code v2.1.149: a PowerShell permission bypass and a worktree
sandbox scope bug.&lt;/strong&gt; On Windows, PowerShell&amp;#39;s built-in &lt;code&gt;cd&lt;/code&gt;
functions (&lt;code&gt;cd..&lt;/code&gt;, &lt;code&gt;cd\&lt;/code&gt;, &lt;code&gt;cd~&lt;/code&gt;, &lt;code&gt;X:&lt;/code&gt;) can cross the
workspace boundary undetected, which affects operators who rely on
PowerShell allowlists. In Git worktree workflows, the sandbox write
allowlist covers the main repository root instead of only the shared
&lt;code&gt;.git&lt;/code&gt; directory. Anthropic ships these as ordinary changelog
entries; the changelog is the de facto advisory surface, and no separate
advisory page exists. Upgrade past 2.1.149
before deploying. v2.1.147 closes adjacent &lt;code&gt;forceLoginOrgUUID&lt;/code&gt; and
&lt;code&gt;forceLoginMethod&lt;/code&gt; enforcement gaps against third-party-provider and
API-key sessions; v2.1.148 closes a Vertex AI provider bypass.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code v2.1.152: Auto mode no longer requires opt-in consent.&lt;/strong&gt;
Auto mode — the permission classifier that runs safe actions without
prompting and blocks risky ones — is now the default permission
posture across the install base. Admins relying on the consent dialog
as a visible posture check have lost that surface. Re-audit managed
settings and decide where the equivalent check now lives.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Codex CLI 0.134.0: legacy profile configs rejected with migration
guidance.&lt;/strong&gt; &lt;code&gt;--profile&lt;/code&gt; is now the canonical permission selector
across CLI, TUI, and sandbox flows. Scripts that still rely on the older
mix of permission flags must migrate before upgrading.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenHands main (pre-2026-05-22 SaaS deployments): MCP server and
&lt;code&gt;acp_env&lt;/code&gt; cross-org credential leak.&lt;/strong&gt; Before
&lt;a href=&quot;https://github.com/OpenHands/OpenHands/pull/14528&quot;&gt;PR #14528&lt;/a&gt;, MCP
server configurations added by an org member were broadcast to every
other member&amp;#39;s row. The fix splits agent settings into shared and
private halves and strips legacy leaked values on read. Multi-tenant
SaaS operators on pre-fix deployments should rotate MCP credentials
added before 2026-05-22 and confirm they are on a post-fix main
build (no tagged release shipped in this window).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hermes Agent v0.14.0: PyPI distribution, lazy adapter install, and
the proxy.&lt;/strong&gt; Installation moves to
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/pull/26593&quot;&gt;&lt;code&gt;pip install hermes-agent&lt;/code&gt;&lt;/a&gt;;
the &lt;code&gt;[all]&lt;/code&gt; extras are removed in favor of lazy install of heavy
adapters on first use. Cold-start drops ~19s. A native Windows beta
ships. The &lt;code&gt;hermes proxy&lt;/code&gt; command exposes a local OpenAI-compatible
endpoint backed by whichever OAuth provider the operator is signed
into. The PR body does not specify the proxy&amp;#39;s bind address or auth
model; loopback-only binding is a reasonable expectation, but verify it
rather than assume it.&lt;/p&gt;
&lt;h2&gt;Autonomy Stops Asking&lt;/h2&gt;
&lt;p&gt;Three providers shipped default-on autonomy in the same fortnight, and
the framing is consistent enough to deserve its own section.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code&amp;#39;s Auto mode&lt;/strong&gt; was the explicit feature. Until 2.1.152 it
required consent — operators clicked through a dialog to enable it.
Now it is the default. Auto mode selectively runs safe actions
without prompting and blocks risky ones via a classifier; the
classification is runtime-defined, not enumerated in docs. The same
release adds &lt;code&gt;disallowed-tools&lt;/code&gt; in skill and slash-command frontmatter
(a skill can subtract from the agent&amp;#39;s tool surface) and a
&lt;code&gt;MessageDisplay&lt;/code&gt; hook event that can transform or hide assistant
message text on the output path. Skill authors get a way to scope down;
hook authors get a new vector to filter what operators see.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Codex&amp;#39;s goal mode&lt;/strong&gt; is the long-horizon variant. The 26.519 product
launch graduates it out of experimental across the app, IDE extension,
and CLI; CLI 0.133.0 turns goals on by default with dedicated storage
and progress tracking across active turns. Operators can point Codex
at an objective spanning &amp;quot;hours or even days.&amp;quot; The same launch
ships remote computer use after Mac lock with documented safeguards:
short-lived authorization, covered displays, automatic relock on local
input, manual unlock fallback. The locked-host computer-use surface is
gated, but the gates are policy choices, not absent capability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gemini CLI&amp;#39;s Auto modes merged.&lt;/strong&gt; The prior fan of Auto variants
collapses to one. The release frames this as UX simplification; in
practice it collapses whatever differentiation the variants carried.
v0.44.0 stable adds shell-redirect auto-approval in &lt;code&gt;AUTO_EDIT&lt;/code&gt;.
The release describes it as a quality-of-life change, but it also
expands the attack surface if the agent is steered toward sensitive
write paths.&lt;/p&gt;
&lt;p&gt;Operators who never enabled Auto mode now get its productivity benefit
without ceremony. Operators who used the consent dialog as a manual
sanity check before risky actions must build that check elsewhere —
managed settings, hook policy, or out-of-band review. The accessibility
win and the authority-visibility cost arrive together; the
RESEARCH_CONTRACT calls this the cross-axis tension, and it is the
shape of every default-on change this fortnight.&lt;/p&gt;
&lt;h2&gt;Policy Moves Into Versioned Files&lt;/h2&gt;
&lt;p&gt;The other half of the move is structural. If autonomy is the baseline
and the operator decision is constraint, then per-session flags are the
wrong surface. Three providers shipped, in the same fortnight, the same
answer: policy lives in versioned, org-managed files consulted by the
runtime.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Codex CLI 0.133.0&lt;/strong&gt; added permission profile &lt;strong&gt;inheritance&lt;/strong&gt; — a
profile can derive from another, layering changes on top of a base
instead of redeclaring every grant. Managed &lt;code&gt;requirements.toml&lt;/code&gt;
integration is the org-level enforcement surface; the release describes
it as enforcement, not advice. Runtime refresh lets profiles update
without restart. CLI 0.134.0 then made &lt;code&gt;--profile&lt;/code&gt; the canonical
selector across the CLI, TUI permission flows, and sandbox flows,
rejecting legacy configs with migration guidance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gemini CLI v0.44.0&lt;/strong&gt; integrated PolicyEngine into ACP (Agent
Client Protocol) sessions
(&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/issues/27252&quot;&gt;PR #27252&lt;/a&gt;) —
framed as a deadlock fix, but the effect is policy enforcement at the
protocol-session layer, not just at the shell-tool layer. The
&amp;quot;deadlock fix&amp;quot; framing understates the structural shift: enforcement
now reaches into the ACP layer the docs name explicitly as the
delegation primitive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenHands&lt;/strong&gt; added organization-level LLM profile storage in SaaS
mode (&lt;a href=&quot;https://github.com/OpenHands/OpenHands/pull/14406&quot;&gt;PR #14406&lt;/a&gt;).
Migration 116 adds an encrypted &lt;code&gt;llm_profiles&lt;/code&gt; JSON column on the org
table; six CRUD endpoints sit under
&lt;code&gt;/api/organizations/{org_id}/profiles&lt;/code&gt;. Permissions are two-tier:
&lt;code&gt;VIEW_ORG_SETTINGS&lt;/code&gt; for read; &lt;code&gt;EDIT_ORG_SETTINGS&lt;/code&gt; for create / update
/ delete / rename / &lt;strong&gt;activate&lt;/strong&gt;. Activate is the bigger surface;
the same transaction updates the org&amp;#39;s &lt;code&gt;profiles.active&lt;/code&gt; and the
acting member&amp;#39;s &lt;code&gt;agent_settings_diff&lt;/code&gt;, with &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt;
serializing concurrent writes.&lt;/p&gt;
&lt;p&gt;For enterprise operators, the practical implication is the same across
all three: stop maintaining flat policy in per-session flags. Build a
base policy (Codex profile, Gemini policy file, OpenHands org LLM
profile) and derive per-team variations. The runtime now treats the
file as the source of truth.&lt;/p&gt;
&lt;p&gt;The distribution and signing model for these files is not yet fully
documented in any of the three. That is the next thing to watch.&lt;/p&gt;
&lt;h2&gt;Authority Over Inputs, Three Surfaces&lt;/h2&gt;
&lt;p&gt;The third theme is quieter but the strongest single thread of the
fortnight. Three providers shipped, through three very different
surfaces, the same primitive: structural authority over what the
agent or its inputs can do.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt;
(&lt;a href=&quot;https://github.com/openclaw/openclaw/releases/tag/v2026.5.26&quot;&gt;v2026.5.26&lt;/a&gt;)
hardened the &lt;strong&gt;inbound-sender&lt;/strong&gt; layer. ClickClack &lt;code&gt;allowFrom&lt;/code&gt; sender
allowlists run before agent dispatch, not as post-dispatch blocking.
Browser snapshot reads honor SSRF policy before reading tab URLs.
Queued system-event text is sanitized so untrusted plugin or channel
labels cannot spoof nested prompt markers. Memory store gets a separate
prompt-like-text reject filter. Tool-call serializations are scrubbed
from replies. The pattern: deny unauthorized senders the chance to
influence agent behavior at all, rather than blocking specific actions
after the agent has been biased.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Zero&lt;/strong&gt;
(&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases/tag/v1.17&quot;&gt;v1.17&lt;/a&gt;)
hardened the &lt;strong&gt;host-runtime&lt;/strong&gt; layer. The new &lt;code&gt;computer_use_remote&lt;/code&gt;
tool controls the operator&amp;#39;s actual desktop — outside the
Docker/Xpra container — with platform-specific structural targeting
(macOS Accessibility / Windows UIA / Linux AT-SPI). Every
state-changing action is treated as unverified until a fresh
screenshot visibly confirms the outcome. Agents must stop when no
screenshot is available. macOS approval denials route to a
re-arm-required stop flow rather than silent retry. v1.16 made
screenshot capture ephemeral and context-scoped by default —
captures route through in-process image refs rather than disk, so
the agent no longer leaves screenshot trails by default.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenHands&lt;/strong&gt; (&lt;a href=&quot;https://github.com/OpenHands/OpenHands/pull/14528&quot;&gt;PR #14528&lt;/a&gt;)
hardened the &lt;strong&gt;org-member&lt;/strong&gt; layer. Before the fix, MCP server and
&lt;code&gt;acp_env&lt;/code&gt; configurations added by one org member were broadcast to
every other member&amp;#39;s row. The fix splits agent settings into a shared
half and a private half; private keys go only to the acting member&amp;#39;s
row. The fix also strips legacy leaked values on read so pre-fix data
stops contaminating after upgrade.&lt;/p&gt;
&lt;p&gt;Three providers, three surfaces, one primitive: authority over inputs
applied at the layer the input enters. The shapes are different —
allowlist, vision verification, per-member private settings — but the
principle is the same. Inputs cross trust boundaries with explicit
structural gates, not by prompt discipline.&lt;/p&gt;
&lt;h2&gt;Provider Notes&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Codex (26.519, CLI 0.131--0.134)&lt;/strong&gt; shipped goal-mode graduation,
remote computer use after Mac lock, Appshots, plugin marketplace
sharing, profile inheritance, managed &lt;code&gt;requirements.toml&lt;/code&gt;, &lt;code&gt;codex doctor&lt;/code&gt; diagnostics, Python SDK first-class authentication, &lt;code&gt;codex exec resume --output-schema&lt;/code&gt;, conversation history search, and
read-only MCP concurrency via &lt;code&gt;readOnlyHint&lt;/code&gt;. The product launch and
the CLI minor releases are tightly coordinated; goal-mode graduation
and CLI default-on landed the same day.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gemini CLI (v0.44.0)&lt;/strong&gt; shipped stable LocalSessionInvocation /
RemoteSessionInvocation protocols (closing the &amp;quot;tests but no observed
remote target&amp;quot; gap on the prior &lt;code&gt;AgentProtocol&lt;/code&gt;), first-wins
prioritize-project agent registration, OAuth refresh preservation
during rotation, keychain auth for &lt;code&gt;--list-sessions&lt;/code&gt; and
non-interactive mode, and MCP OAuth token refresh on
re-authentication. Two weeks of What&amp;#39;s-New digests (Weeks 21--22) are
not yet published; the changelog and release notes are the trailing
surface.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenHands (main branch, no tagged release in window)&lt;/strong&gt; shipped the
ACP agent settings UI, organization-level LLM profiles, scoped
MCP/ACP env to acting org members, Azure DevOps via Microsoft Entra ID
OAuth/OIDC, Bitbucket DC and Jira DC integrations with KOTS-managed
service accounts, and a batched CVE remediation cluster (9+ deps).
The shape is consolidation: OpenHands is positioning itself as the
self-hosted enterprise shell around third-party agents and Data Center
source control.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Zero (v1.15--v1.18)&lt;/strong&gt; shipped host-machine desktop control with
vision verification, ephemeral context-scoped capture by default,
speech as independent built-in plugins (breaking removal of legacy
APIs), &lt;code&gt;document_artifact&lt;/code&gt; → &lt;code&gt;office_artifact&lt;/code&gt; rename, dedicated
Markdown editor plugin, file-browser routing formalization,
configurable &lt;code&gt;max_active_skills&lt;/code&gt;, MCP multimodal content handling
fix, and skill visibility controls (operators can hide skills from
the model-facing catalog).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenClaw (v2026.5.18--v2026.5.26)&lt;/strong&gt; shipped the content-boundary
hardening suite, transcripts promoted to a core source-provider path
with Meeting Notes plugin, reaction-based approvals across Signal /
iMessage / WhatsApp, named model login profiles with credential
migrations for Hermes / OpenCode / Codex, realtime Talk inspectable /
steerable / cancellable across Web UI and Discord voice, on-by-default
gateway auth rate-limiter for unset &lt;code&gt;gateway.auth.rateLimit&lt;/code&gt;, and
release verification stanzas with full CI run URLs and evidence
manifests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hermes Agent (v0.14.0)&lt;/strong&gt; is the Foundation Release: PyPI
distribution, lazy adapter install with supply-chain advisory
checker, native Windows beta, Zed ACP Registry listing, the
OpenAI-compatible local &lt;code&gt;hermes proxy&lt;/code&gt;, Honcho identity-mapping with
peer-id in cache signatures, isolated credential pool on provider
fallback, and a sustained &lt;code&gt;fix(kanban)&lt;/code&gt; corruption-hardening wave
post-release.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Paperclip (v2026.513, v2026.517, v2026.525)&lt;/strong&gt; shipped scoped agent
permissions and protected assignments via a real authorization
service, routine env secrets with &lt;code&gt;agent &amp;lt; project &amp;lt; routine&lt;/code&gt;
precedence, board-managed document locks, Modal as a first-party
sandbox plugin, and an ACPX-Claude adapter that resolves bare Claude
model IDs, surfaces real diagnostic detail, and respects user
&lt;code&gt;~/.claude/settings.json&lt;/code&gt; permissions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pi coding agent (v0.74.1--v0.76.0)&lt;/strong&gt; shipped supply-chain
hardening (npm shrinkwrap, lifecycle-script controls, isolated
install smoke tests), &lt;code&gt;--session-id&lt;/code&gt; explicit session naming and
&lt;code&gt;excludeFromContext&lt;/code&gt; flag for the bash RPC, plus provider retry and
timeout bounds. Supply-chain posture lands the same fortnight as
Hermes&amp;#39;s lazy-install advisory work — two different providers
converging on the same hygiene.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Flue (Tier 2; v0.6.0--v0.8.0)&lt;/strong&gt; shipped the agents-vs-workflows
category split (persistent &lt;code&gt;agents/&lt;/code&gt; via &lt;code&gt;createAgent&lt;/code&gt; vs finite
&lt;code&gt;workflows/&lt;/code&gt; via &lt;code&gt;run&lt;/code&gt;), &lt;code&gt;local()&lt;/code&gt; sandbox factory with env
allowlist, Cloudflare Shell sandbox replacing the previously
misleading R2 model, run observability with bare runId routes, an
OpenAPI sub-app, and a read-only admin sub-app. The
runs-as-workflow-only choice is the cleanest &amp;quot;what is the receipt?&amp;quot; answer this cycle.&lt;/p&gt;
&lt;h2&gt;What To Try&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Codex operators&lt;/strong&gt;: point goal mode at an objective spanning hours
or days on 26.519 + CLI 0.133.0; observe the dedicated storage and
progress-tracking surface. If you have multiple teams, draft a base
permission profile and derive per-team variations using the new
inheritance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Code operators&lt;/strong&gt;: audit managed settings before upgrading to
2.1.152 if you relied on the Auto mode consent dialog as a manual
posture check. Skill authors should evaluate &lt;code&gt;disallowed-tools&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenHands evaluators&lt;/strong&gt;: enable &lt;code&gt;ENABLE_ACP&lt;/code&gt; and point it at Claude
Code, Codex, or Gemini CLI as the back-end. Observe how
LLM/Condenser/MCP settings grey out — authority shifts to the
back-end agent and the UI reflects the transfer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Zero operators (host adopters)&lt;/strong&gt;: enable
&lt;code&gt;computer_use_remote&lt;/code&gt; on a non-critical host. Test the
vision-verification stop flow: trigger a state change, withhold a
screenshot, observe whether the agent halts as the release notes
describe.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hermes adopters&lt;/strong&gt;: try &lt;code&gt;pip install hermes-agent&lt;/code&gt; and route Codex
CLI, Aider, Cline, or Continue through &lt;code&gt;hermes proxy&lt;/code&gt; against a
single OAuth subscription. Confirm the proxy&amp;#39;s bind address before
exposing it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenClaw operators&lt;/strong&gt;: verify your &lt;code&gt;gateway.auth.rateLimit&lt;/code&gt; setting;
the unset case is now rate-limited by default. Test the pre-dispatch
&lt;code&gt;allowFrom&lt;/code&gt; allowlist with a sender outside your trust set.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What Remains Uncertain&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Codex managed &lt;code&gt;requirements.toml&lt;/code&gt; distribution and signing&lt;/strong&gt;: the
release notes describe org-level enforcement but not how the file
reaches the runtime, whether it is signed, or whether tampering is
detectable. Enterprise adopters cannot rely on enforcement without
this answer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini PolicyEngine-in-ACP default posture&lt;/strong&gt;: per-session
enforcement by default, or only when an operator has configured a
policy? Release notes frame it as a deadlock fix. The structural
shift implied by the change is larger than that framing suggests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Zero ephemeral-capture audit evidence&lt;/strong&gt;: where does
host-action evidence land for audit when screenshots are ephemeral?
Operators cannot browse on-disk caches to confirm what the agent
saw.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hermes &lt;code&gt;hermes proxy&lt;/code&gt; bind and auth model&lt;/strong&gt;: the PR body does not
say whether the proxy binds to loopback only or requires a shared token.
Loopback-only binding is a reasonable expectation, but verify it rather
than assume it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini remote session invocation target&lt;/strong&gt;: the protocol is stable
but where remote invocations actually run (Google-hosted,
operator-hosted, both) is undocumented.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenHands no-tagged-release operators&lt;/strong&gt;: the strategic
positioning, the org-LLM-profile feature, and the cross-org
credential leak fix are all main-branch-only. Operators tracking
the 1.x release channel see none of this until the next release
consolidates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The composition pattern&lt;/strong&gt;: OpenHands ACP UI fronting Claude Code,
Codex, or Gemini CLI is a &lt;em&gt;multi-product composition&lt;/em&gt; claim that
does not fit the current finding schema&amp;#39;s single-subject
assumption. Paperclip&amp;#39;s ACPX-Claude adapter respecting
&lt;code&gt;~/.claude/settings.json&lt;/code&gt; is the same shape. This is a schema
doctrine question recorded in the audit note for this digest.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Two weeks of Claude Code What&amp;#39;s-New digests not yet published&lt;/strong&gt;
(Weeks 21--22). The official_digest priority-1 surface in
&lt;code&gt;sources/claude-code.yml&lt;/code&gt; is missing this fortnight. Harvesters
running this window must fall through to the changelog only.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Governance Becomes Enforcement</title><link>https://frontier.bitter.sh/digests/2026-05-12-weekly/</link><guid isPermaLink="true">https://frontier.bitter.sh/digests/2026-05-12-weekly/</guid><description>Five days, nine providers. The change that cuts across all of them is
deceptively simple: governance is moving from convention to enforcement.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Governance Becomes Enforcement&lt;/h1&gt;
&lt;p&gt;Five days, nine providers. The change that cuts across all of them is
deceptively simple: governance is moving from convention to enforcement.&lt;/p&gt;
&lt;p&gt;The older model was: the agent could do X, and operators relied on prompting,
documentation, and trust to prevent the wrong X. The new model -- visible in
at least four independent places this week -- is: the wrong X is structurally
blocked, logged, or defaults to off.&lt;/p&gt;
&lt;p&gt;Hermes made secret redaction the default, not an opt-in. Paperclip blocked
agents from self-transitioning to review without a real review path.
OpenHands defaulted sub-agent delegation to off and surfaced evaluation
scores in the UI. Agent Zero defaulted document output to open formats and
told agents to prefer named actions over coordinate clicks. These are
different tools, different teams, and different architectures. The pattern is
the same: risky behavior requires explicit enablement; safe behavior is what
happens by default.&lt;/p&gt;
&lt;p&gt;The other half of the week was about durability: agents that can stay on task
across turns, sessions, crashes, and context compression. Claude Code shipped a
full &lt;a href=&quot;https://code.claude.com/docs/en/changelog#2-1-139&quot;&gt;&lt;code&gt;claude agents&lt;/code&gt;&lt;/a&gt;
supervisor surface and a &lt;a href=&quot;https://code.claude.com/docs/en/changelog#2-1-139&quot;&gt;&lt;code&gt;/goal&lt;/code&gt;&lt;/a&gt;
command. Hermes shipped the same &lt;code&gt;/goal&lt;/code&gt; primitive and backed it with a
Kanban board that enforces completion evidence before marking work done.
Gemini made sessions portable across machines. Agent Zero made desktop
sessions persistent across navigation.&lt;/p&gt;
&lt;p&gt;These two themes -- governance as enforcement, long-horizon durability -- are
not coincidental. You need both. Durability without governance means persistent
agents doing the wrong thing persistently. Governance without durability means
agents that are safe but cannot hold a goal long enough to finish anything.&lt;/p&gt;
&lt;h2&gt;Breaking Changes: Check These Before Upgrading&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Hermes v0.13.0: secret redaction is now ON by default.&lt;/strong&gt; If you have Hermes
log pipelines that read raw agent output, they will receive sanitized logs
after upgrade. This is the right call as a default; it is a breaking change for
tooling that depends on unredacted output.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Paperclip v2026.512.0: SSH host environment was leaking.&lt;/strong&gt; Before PR #5142,
SSH remote execution forwarded the Paperclip host&amp;#39;s environment variables --
including API keys and tokens -- to remote execution targets. Operators running
SSH-managed agents should treat this as a security advisory and upgrade.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hermes v0.13.0: Discord role allowlists are now guild-scoped.&lt;/strong&gt; The prior
behavior allowed a role match from any guild to authorize a cross-guild DM --
a CVSS 8.1 bypass. Discord operators using role-based access control should
reverify their configuration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code v2.1.x: &lt;code&gt;worktree.baseRef&lt;/code&gt; now defaults to &lt;code&gt;&amp;quot;fresh&amp;quot;&lt;/code&gt;.&lt;/strong&gt; New
worktrees now branch from &lt;code&gt;origin/&amp;lt;default&amp;gt;&lt;/code&gt; rather than the local &lt;code&gt;HEAD&lt;/code&gt;.
Operators who depended on new worktrees carrying unpushed local commits should
set &lt;code&gt;worktree.baseRef: &amp;quot;head&amp;quot;&lt;/code&gt; explicitly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pi v0.74.0: package scope migration underway.&lt;/strong&gt; The npm package is moving
from &lt;code&gt;@mariozechner/pi-coding-agent&lt;/code&gt; to &lt;code&gt;@earendil-works/pi-coding-agent&lt;/code&gt;.
Global installs: run &lt;code&gt;pi update --self&lt;/code&gt; once the new package publishes.
CI, Dockerfiles, and &lt;code&gt;package.json&lt;/code&gt; pins: update the reference manually.&lt;/p&gt;
&lt;h2&gt;Evidence Before Completion&lt;/h2&gt;
&lt;p&gt;Two providers shipped independent enforcement of the same principle this week:
agents cannot self-attest that work is complete.&lt;/p&gt;
&lt;p&gt;Hermes&amp;#39;s Kanban board now requires workers to have
&lt;a href=&quot;https://github.com/NousResearch/hermes-agent/releases/tag/v2026.5.7&quot;&gt;valid card references&lt;/a&gt;
before a task card moves to done. The hallucination gate verifies that cards a
worker claims to have created actually exist and belong to that worker --
blocking phantom references and cross-worker card claims. Workers that exit
without completing are auto-blocked. Heartbeats detect stale workers; zombie
processes are detected on both platforms. Per-task retry budgets prevent silent
cascades.&lt;/p&gt;
&lt;p&gt;Paperclip&amp;#39;s control-plane fix (PR #5292) blocks agents from
self-transitioning an issue to &lt;code&gt;in_review&lt;/code&gt; state. The &lt;code&gt;in_review&lt;/code&gt; transition
now requires a real review precondition, not just a model deciding it is ready
for review.&lt;/p&gt;
&lt;p&gt;These are different mechanisms -- Hermes&amp;#39;s is multi-agent coordination,
Paperclip&amp;#39;s is a state-machine gate -- but the observation is the same:
an agent&amp;#39;s claim about its own completion is not sufficient evidence of
completion. The system needs to verify independently.&lt;/p&gt;
&lt;p&gt;For operators building multi-agent workflows: the completion contract is now
part of the orchestration contract, not just the prompt.&lt;/p&gt;
&lt;h2&gt;Long-Horizon Durability&lt;/h2&gt;
&lt;p&gt;The week&amp;#39;s most operator-visible features are all about agents staying on task.&lt;/p&gt;
&lt;p&gt;Claude Code&amp;#39;s
&lt;a href=&quot;https://code.claude.com/docs/en/changelog#2-1-139&quot;&gt;&lt;code&gt;claude agents&lt;/code&gt;&lt;/a&gt;
supervisor view shows every session by state -- working, waiting on you,
done, failed -- with background sessions running under a persistent supervisor
process that survives terminal close. Sessions isolate to separate git
worktrees automatically. You can dispatch from the prompt, background an
active session with one keystroke, and reply to blocked sessions from a peek
panel without attaching. Alongside it, the
&lt;a href=&quot;https://code.claude.com/docs/en/changelog#2-1-139&quot;&gt;&lt;code&gt;/goal&lt;/code&gt;&lt;/a&gt; command sets a
completion condition that Claude tracks across turns until met.&lt;/p&gt;
&lt;p&gt;Hermes&amp;#39;s &lt;code&gt;/goal&lt;/code&gt; Ralph loop does the same thing at the session level, backed
by the Kanban reliability primitives above. Lock the agent onto a target and
it persists across context compression, turn budgets, and branching. The
Kanban layer handles the multi-agent case: workers pick up tasks, execute
them, and cannot mark themselves done without evidence.&lt;/p&gt;
&lt;p&gt;Gemini CLI&amp;#39;s
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/3805640530a9&quot;&gt;session export/import&lt;/a&gt;
makes sessions portable: export a session, move it to another machine,
import and continue. State crosses as a serializable object rather than
ambient context.&lt;/p&gt;
&lt;p&gt;Agent Zero&amp;#39;s persistent desktop lifecycle (v1.13) changes the semantics of
the desktop environment: a single Xpra XFCE session stays alive across
canvas navigation, modal switches, and keepalive hosts. Explicit shutdown is
distinguished from crashes; unsafe affordances are hidden. The desktop is now
a persistent surface, not one that resets on navigation.&lt;/p&gt;
&lt;h2&gt;Authority Made Visible&lt;/h2&gt;
&lt;p&gt;Three separate tools shipped changes this week that make permission,
session, and evaluation state observable at a glance.&lt;/p&gt;
&lt;p&gt;Codex&amp;#39;s TUI now shows
&lt;a href=&quot;https://github.com/openai/codex/commit/e6312d44f073&quot;&gt;&lt;code&gt;permissions&lt;/code&gt; and &lt;code&gt;approval-mode&lt;/code&gt;&lt;/a&gt;
as separately configurable status-line items. The most common operator surprise
before this -- forgetting which permission posture is active before an
irreversible command -- is now a visual check.&lt;/p&gt;
&lt;p&gt;Claude Code&amp;#39;s &lt;code&gt;claude agents&lt;/code&gt; supervisor makes session state visible: working,
waiting, done, or failed. A single panel replaces five terminal windows. The
live overlay on &lt;code&gt;/goal&lt;/code&gt; tracks elapsed time, turns, and tokens consumed.&lt;/p&gt;
&lt;p&gt;OpenHands&amp;#39;s new critic evaluation display shows a score (0--1), star rating
(0--5), and color-coded bands in the GUI for every completed session:
&lt;code&gt;agent_behavioral_issues&lt;/code&gt;, &lt;code&gt;user_followup_patterns&lt;/code&gt;, and
&lt;code&gt;infrastructure_issues&lt;/code&gt;. The display is deployment-controlled via
&lt;code&gt;OH_ENABLE_CRITIC_BY_DEFAULT&lt;/code&gt; (disabled by default). When enabled, it creates
a feedback loop that doesn&amp;#39;t exist when evaluation lives in logs: users see
in real time when sessions are degrading.&lt;/p&gt;
&lt;p&gt;Agent Zero&amp;#39;s
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases/tag/v1.13&quot;&gt;Linux Desktop skill&lt;/a&gt;
takes this in a different direction: it tells the agent to use named
structured actions (&lt;code&gt;cell_edit&lt;/code&gt;, &lt;code&gt;app_launch&lt;/code&gt;, &lt;code&gt;form_submit&lt;/code&gt;) and treat
coordinate clicks (&lt;code&gt;click(x=423, y=187)&lt;/code&gt;) as a last resort. The principle is
audit clarity. &lt;code&gt;cell_edit(B3, 42)&lt;/code&gt; is meaningful; a coordinate click is not.
An action that can be named and described is easier to verify, replay, and
record than one that can only be described by its position.&lt;/p&gt;
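&lt;p&gt;To make the audit argument concrete, here is a small sketch of how a recorded
run might be rendered for review. The action names come from the skill
description above; the &lt;code&gt;Action&lt;/code&gt; type, the argument names, and the &lt;code&gt;audit&lt;/code&gt;
helper are assumptions for illustration, not Agent Zero&amp;#39;s API.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    name: str                                   # e.g. "cell_edit" or "click"
    args: dict = field(default_factory=dict)    # hypothetical argument names

    def describe(self) -> str:
        rendered = ", ".join(f"{k}={v!r}" for k, v in self.args.items())
        return f"{self.name}({rendered})"


# Named actions listed in the skill description; anything else is a fallback.
STRUCTURED = {"cell_edit", "app_launch", "form_submit"}


def audit(actions):
    """Return (description, is_structured) pairs for a recorded run."""
    return [(a.describe(), a.name in STRUCTURED) for a in actions]


run = [
    Action("cell_edit", {"cell": "B3", "value": 42}),
    Action("click", {"x": 423, "y": 187}),
]
for line, structured in audit(run):
    flag = "structured" if structured else "coordinate fallback"
    print(f"{line}  [{flag}]")
```

&lt;p&gt;A reviewer reading this log can tell what the first action intended. The
second only says where the pointer landed, which is why it makes sense to flag
it for closer inspection.&lt;/p&gt;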
&lt;h2&gt;Default-Closed Governance&lt;/h2&gt;
&lt;p&gt;The week also continued a trend across the watchlist: sensitive capabilities
default to closed, and operators must explicitly enable them.&lt;/p&gt;
&lt;p&gt;OpenHands&amp;#39;s sub-agent delegation (&lt;code&gt;enable_sub_agents&lt;/code&gt;) defaults to off. Behind
the gate, the orchestrator routes tasks to specialized sub-agents -- a bash
runner, a code explorer, a web researcher -- each with tool surfaces defined by
&lt;code&gt;TaskToolSet&lt;/code&gt; rather than full access. The default-off choice is right: routing
work to specialized agents changes session scope, cost, and authority in ways
that require a deliberate operator decision.&lt;/p&gt;
&lt;p&gt;OpenClaw&amp;#39;s skill archive upload gate (&lt;code&gt;skills.install.allowUploadedArchives&lt;/code&gt;)
defaults to closed. Trusted Gateway clients can stage and install zip-backed
skills only when the operator explicitly enables the flag. OpenClaw keeps
repeating this pattern: code-execution surfaces are opt-in, explicit, and
documented as requiring trust.&lt;/p&gt;
&lt;p&gt;Agent Zero&amp;#39;s ODF-first document default (v1.13) inverts the prior assumption:
document artifacts now default to ODT/ODS/ODP (open formats) rather than
DOCX/XLSX/PPTX. OOXML compatibility requires explicit opt-in. For operators
with downstream workflows expecting Office XML output, this is a change to
verify before upgrading.&lt;/p&gt;
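&lt;p&gt;The default-closed pattern itself is simple to sketch. The example below is a
generic illustration under stated assumptions: the config keys and the
&lt;code&gt;Capabilities&lt;/code&gt; type are hypothetical and do not reflect the configuration
format of OpenHands, OpenClaw, or Agent Zero.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Every sensitive capability starts closed; the operator must open it.
    sub_agents: bool = False
    uploaded_archives: bool = False


def load_capabilities(raw: dict) -> Capabilities:
    """Build capabilities from operator config.

    Only an explicit boolean True opens a gate. Missing keys, misspelled
    keys, and truthy strings such as "yes" all leave the gate closed.
    """
    return Capabilities(
        sub_agents=raw.get("sub_agents") is True,
        uploaded_archives=raw.get("uploaded_archives") is True,
    )


def require(caps: Capabilities, name: str) -> None:
    """Raise unless the named capability was explicitly enabled."""
    if not getattr(caps, name, False):
        raise PermissionError(f"{name} is disabled; enable it explicitly")


# The second key is misspelled on purpose: the gate stays closed.
caps = load_capabilities({"sub_agents": True, "uploaded_archive": True})
require(caps, "sub_agents")  # allowed
# require(caps, "uploaded_archives") would raise PermissionError
```

&lt;p&gt;The design choice worth copying is that ambiguity fails closed: a typo in
the config costs the operator a feature, not a security boundary.&lt;/p&gt;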
&lt;h2&gt;Provider Notes&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Claude Code (v2.1.139)&lt;/strong&gt; adds &lt;code&gt;settings.autoMode.hard_deny&lt;/code&gt;: hard blocks that
no allow rule can override. The
&lt;code&gt;continueOnBlock&lt;/code&gt; option for PostToolUse hooks feeds the rejection reason back
so Claude can adapt rather than just stop. API key auth now disables Remote
Control, &lt;code&gt;/schedule&lt;/code&gt;, and claude.ai MCP connectors -- operators using API key
auth should audit reliance on those surfaces.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenClaw (v2026.5.10 beta)&lt;/strong&gt; adds per-agent
&lt;a href=&quot;https://docs.openclaw.ai/message&quot;&gt;message send restrictions&lt;/a&gt;
(&lt;code&gt;tools.message.crossContext&lt;/code&gt;, &lt;code&gt;tools.message.actions.allow&lt;/code&gt;) that let you
deploy a sandboxed agent that can only reply in the thread it was addressed
in. Memory auto-promotion is now bounded: the dreaming process compacts the
oldest sections when the budget is reached, while preserving user-authored
notes. Transcript reads are now streaming; peak memory for a long session
dropped roughly 90%.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Paperclip (v2026.512.0)&lt;/strong&gt; adds secrets provider vault configuration with
AWS Secrets Manager as the first remote-import backend. The database gains
&lt;code&gt;secret_access_events&lt;/code&gt; and &lt;code&gt;company_secret_provider_configs&lt;/code&gt; tables. The new
&lt;a href=&quot;https://github.com/paperclipai/paperclip&quot;&gt;&lt;code&gt;cursor_cloud&lt;/code&gt; adapter&lt;/a&gt; routes
work to Cursor&amp;#39;s hosted-agent platform.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Zero (v1.11--v1.13)&lt;/strong&gt; completes what it calls the &amp;quot;visible computer&amp;quot;:
browser with
&lt;a href=&quot;https://github.com/agent0ai/agent-zero/releases/tag/v1.11&quot;&gt;multi-tab parallel fanout&lt;/a&gt;,
LibreOffice desktop via Xpra/XFCE, and a persistent desktop session. The
&lt;code&gt;multi&lt;/code&gt; browser action fans out reads or mutations across tabs in a single
tool call with parallel execution.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gemini CLI (v0.41.0)&lt;/strong&gt; adds a pluggable
&lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/54f1e8c6d7e2&quot;&gt;&lt;code&gt;AgentProtocol&lt;/code&gt;&lt;/a&gt;
with local and remote backends, forcing the &amp;quot;where does delegated work
actually run&amp;quot; question into a surface that can be inspected and configured.
Workspace trust is now enforced in headless mode; shell command validation gains
a core-tools allowlist.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pi coding agent (v0.74.0)&lt;/strong&gt; migrates from &lt;code&gt;badlogic/pi-mono&lt;/code&gt; to the Earendil
Works organization. JSONC parsing for &lt;code&gt;models.json&lt;/code&gt; is new (comments and
trailing commas are now valid).&lt;/p&gt;
&lt;h2&gt;What To Try&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hermes operators&lt;/strong&gt;: verify your log pipeline handles sanitized output before
upgrading to v0.13.0. Redaction is now on.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paperclip operators running SSH&lt;/strong&gt;: upgrade before deploying new remote
agents. In prior versions, the host env isolation gap is silent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;: dispatch a background session with &lt;code&gt;claude --bg &amp;quot;&amp;lt;prompt&amp;gt;&amp;quot;&lt;/code&gt;,
use &lt;code&gt;claude agents&lt;/code&gt; to monitor, and test peek/reply from the list. Set a
&lt;code&gt;/goal&lt;/code&gt; on a multi-step task and inspect the turn/token overlay.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenHands&lt;/strong&gt;: enable &lt;code&gt;enable_sub_agents&lt;/code&gt; in a multi-task session. Observe
whether sub-agent scoping reduces total session cost or context accumulation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Zero&lt;/strong&gt;: create a Writer document and confirm the output is ODT (not
DOCX) in v1.13+. Verify your downstream tooling handles ODT, or explicitly
configure OOXML output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex&lt;/strong&gt;: add both &lt;code&gt;permissions&lt;/code&gt; and &lt;code&gt;approval-mode&lt;/code&gt; to your status line
if you run multiple permission profiles.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What Remains Uncertain&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hermes Kanban hallucination gate&lt;/strong&gt;: what does verification involve? Is it
model-based, schema-based, or rule-based? The gate&amp;#39;s false-positive rate
under real multi-agent workloads is not yet documented.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paperclip &lt;code&gt;in_review&lt;/code&gt; gate&lt;/strong&gt;: what constitutes a &amp;quot;real review path&amp;quot;? The
PR notes do not define whether a human reviewer, an automated review step,
or a configured participant list is required.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenHands critic calibration&lt;/strong&gt;: what does a score of 0.4 mean operationally?
When does &lt;code&gt;agent_behavioral_issues&lt;/code&gt; fire versus &lt;code&gt;user_followup_patterns&lt;/code&gt;?
The calibration methodology is not yet documented.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini &lt;code&gt;RemoteSubagentProtocol&lt;/code&gt;&lt;/strong&gt;: ships with tests but no observed remote
target. Whether the remote execution surface runs on Google-hosted
infrastructure or on user-controlled infrastructure is not yet established.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Code &lt;code&gt;/ultrareview&lt;/code&gt;&lt;/strong&gt;: the research preview returns verdicts to
CLI/Desktop but the output schema is not documented. How should a CI
pipeline ingest or route the findings?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Zero desktop state&lt;/strong&gt;: is there a session timeout, an idle cleanup,
or a storage limit for persistent Xpra sessions? Or does the operator manage
cleanup entirely manually?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenClaw skill archive trust model&lt;/strong&gt;: &lt;code&gt;skills.install.allowUploadedArchives&lt;/code&gt;
is opt-in, but signature checking and sandbox isolation for uploaded archives
are not yet documented.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>The Harness Leaves The Chat Box</title><link>https://frontier.bitter.sh/digests/2026-04-23_2026-05-07-frontier-rollup-expanded/</link><guid isPermaLink="true">https://frontier.bitter.sh/digests/2026-04-23_2026-05-07-frontier-rollup-expanded/</guid><description>The last two weeks of commits make one thing clear: the interesting action in coding agents is no longer confined to the model or the chat transcript.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Harness Leaves The Chat Box&lt;/h1&gt;
&lt;p&gt;The last two weeks of commits make one thing clear: the interesting action in coding agents is no longer confined to the model or the chat transcript.&lt;/p&gt;
&lt;p&gt;Agent harnesses are becoming operating surfaces.&lt;/p&gt;
&lt;p&gt;Codex is adding persistent goals, session metadata, memory plumbing, plugin controls, sandbox work, and cloud executor paths. Gemini CLI is treating memory as a reviewable patch, with workspace trust, approval modes, shell safety, and structured non-interactive output close behind. Hermes is sanding down the rough edges of persistent personal agents: gateways, systemd, voice, themes, model providers, skills, search, kanban, and memory scoping. Pi keeps proving the opposite design lesson: a thin harness can move quickly because integrations can be added, removed, or rewritten without becoming the whole product.&lt;/p&gt;
&lt;p&gt;The expanded watchlist changes the story. OpenClaw shows that accessibility is not a side quest; ordinary surfaces like Discord, Telegram, WhatsApp, OAuth, voice, onboarding, and visible progress are where agents become usable. Agent Zero shows the workcell becoming literal: browser, desktop, documents, file browser, screenshots, OAuth, and time-travel state. Paperclip shows the company/control-plane version of the problem: remote provisioning, sandbox providers, cost summaries, roles, liveness, pause/resume, and stale session recovery. OpenHands shows what happens when a harness becomes a platform: app server, model profiles, MCP proxying, secrets, security redaction, self-hosted integrations, sandbox grouping, and old runtime cleanup.&lt;/p&gt;
&lt;p&gt;The frontier is not one winning agent. The frontier is the environment around agents getting thicker.&lt;/p&gt;
&lt;h2&gt;The Week In One Sentence&lt;/h2&gt;
&lt;p&gt;Coding agents are gaining goals, memory, computers, permissions, gateways, integrations, and supervision layers; the durable question is who owns the loop around all of that.&lt;/p&gt;
&lt;h2&gt;Main Signals&lt;/h2&gt;
&lt;h3&gt;1. Persistent Agent State Is Becoming A Product Surface&lt;/h3&gt;
&lt;p&gt;The strongest single signal is still Codex &lt;a href=&quot;https://github.com/openai/codex/commit/f09e1936e0fd464dcea78fe55b84bd20f721cad6&quot;&gt;&lt;code&gt;/goal&lt;/code&gt;&lt;/a&gt;. It is not just a UX affordance. The goal validation work shows that persistent objectives now deserve first-class validation, paste handling, queued-command behavior, and user guidance.&lt;/p&gt;
&lt;p&gt;Gemini&amp;#39;s &lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/a7beb890d093e2cf66ed1ac8debff690b75e1f6d&quot;&gt;Auto Memory&lt;/a&gt; inbox points in the same direction from another angle: memory should be proposed, reviewed, and accepted, not silently smeared into hidden context. Hermes adds memory scoping and &lt;a href=&quot;https://github.com/NousResearch/hermes-agent/commit/fe8560fc1249b4a7e448b5c3b80a7d213df9d78f&quot;&gt;Curator&lt;/a&gt; commands. OpenClaw is making agent progress visible in chat with &lt;a href=&quot;https://github.com/openclaw/openclaw/commit/61223a74a43fd8768c426d5b22f1633dbad37477&quot;&gt;timeline spans&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a real shift. Agent-side state is becoming more durable, more visible, and more operational.&lt;/p&gt;
&lt;p&gt;Builder question:&lt;/p&gt;
&lt;p&gt;What goal, memory, session, recap, skill report, or thread state shaped this run?&lt;/p&gt;
&lt;h3&gt;2. The Agent Interface Is Becoming A Visible Computer&lt;/h3&gt;
&lt;p&gt;Agent Zero is the clearest evidence. It replaced a browser-use agent with a &lt;a href=&quot;https://github.com/agent0ai/agent-zero/commit/983d431a5eb785eb9deba9fdfd471fa93f349603&quot;&gt;native browser&lt;/a&gt;, then added a &lt;a href=&quot;https://github.com/agent0ai/agent-zero/commit/fa7eef1919901093b117a98ad6e402d809687cf6&quot;&gt;Chromium runtime&lt;/a&gt;, browser tabs, screenshot previews, annotation, file browser search, ZIP downloads, Linux desktop controls, document canvas, LibreOffice runtime, and OAuth/quota visibility.&lt;/p&gt;
&lt;p&gt;OpenHands is moving in the same broad direction from the platform side with &lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/90cf5f8003c247597481bcbef9a5aa73eb899e10&quot;&gt;sandbox grouping&lt;/a&gt;, app-server routing, ACP/MCP surfaces, user secrets, model profiles, and enterprise integrations. Paperclip adds &lt;a href=&quot;https://github.com/paperclipai/paperclip/commit/90631b09b36fa028ad24ca5375bfa50e3602799c&quot;&gt;remote provisioning&lt;/a&gt; and sandbox provider work. Codex is adding cloud executor paths and sandbox hardening.&lt;/p&gt;
&lt;p&gt;The chat box is not enough. Serious agent work wants a visible machine.&lt;/p&gt;
&lt;p&gt;Builder question:&lt;/p&gt;
&lt;p&gt;Can I see the browser, files, runtime, screenshots, credentials, and artifacts that shaped this work?&lt;/p&gt;
&lt;h3&gt;3. Permissions, Secrets, And Sandboxes Are Moving Into The Foreground&lt;/h3&gt;
&lt;p&gt;This window is full of authority work. Codex has &lt;a href=&quot;https://github.com/openai/codex/commit/5119680f85ed01fe039ee8fba0245de24f3a5e37&quot;&gt;permission profiles&lt;/a&gt;, sandbox profiles, plugin sharing controls, MCP metadata, and Linux sandbox hardening. Gemini has &lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/a38f393af77c0ccf50da10d73c84cfb594dd8175&quot;&gt;workspace trust&lt;/a&gt;, private memory patch allowlists, shell safety evals, &lt;a href=&quot;https://github.com/google-gemini/gemini-cli/commit/40b384de2c1d251c9d13a6359216a9e6cff5a254&quot;&gt;approval-mode-aware subagents&lt;/a&gt;, and policy-engine work. OpenHands tightened &lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/61e3dc2cadbefd4e0649b7c141ac2335c021ad2b&quot;&gt;redaction&lt;/a&gt; and removed a &lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/0c6c461555f8651347ed140f1c555ff8a88ddf56&quot;&gt;secret log&lt;/a&gt;. OpenClaw is fixing &lt;a href=&quot;https://github.com/openclaw/openclaw/commit/b6ae0b83a61a1f779ee41b5d639b6049bfd422ce&quot;&gt;allowlists&lt;/a&gt;, subagent security docs, OAuth labels, and live exec output limits. Paperclip is adding security roles and sandbox provider contracts. Agent Zero keeps browser and office surfaces opt-in and exposes OAuth disconnect and quota visibility.&lt;/p&gt;
&lt;p&gt;This is the right direction. The harness is starting to show its authority model.&lt;/p&gt;
&lt;p&gt;Builder question:&lt;/p&gt;
&lt;p&gt;What could this agent read, change, execute, install, send, or leak?&lt;/p&gt;
&lt;h3&gt;4. Accessibility Is A Frontier Capability&lt;/h3&gt;
&lt;p&gt;OpenClaw is the necessary corrective to an overly technical reading of the market. Its commits are full of work that makes agents usable by normal people: &lt;a href=&quot;https://github.com/openclaw/openclaw/commit/329580c64d13657592c3fabb97ff567c2e292bb6&quot;&gt;setup recovery&lt;/a&gt;, stale plugin repair, Discord voice behavior, Telegram reactions, WhatsApp identity mapping, &lt;a href=&quot;https://github.com/openclaw/openclaw/commit/2b4b60b5514b47d8e242b9b11d9b395037e6674b&quot;&gt;OAuth labels&lt;/a&gt;, progress previews, chat drafts, typography cleanup, install recovery, and group allowlists.&lt;/p&gt;
&lt;p&gt;Hermes is doing adjacent work through &lt;a href=&quot;https://github.com/NousResearch/hermes-agent/commit/6388aafbd6cbfd22c26036291d884d4055b5f6bc&quot;&gt;setup fixes&lt;/a&gt;, voice push-to-talk parity, dashboard themes, gateway restart readiness, provider pickers, and messaging surfaces. Agent Zero is making the computer visible with &lt;a href=&quot;https://github.com/agent0ai/agent-zero/commit/c2fb2c3c94e1e1c85b783252332b3fc003f39f2b&quot;&gt;screenshot previews&lt;/a&gt;. Pi is improving login, terminal rendering, compact resource reads, clipboard behavior, and &lt;a href=&quot;https://github.com/badlogic/pi-mono/commit/010e9acfe959f437613bcba7139b264012ca43a4&quot;&gt;quickstart&lt;/a&gt; docs. Gemini is making memory reviewable and headless auth more reliable. OpenHands is exposing model names and model switching in the UI.&lt;/p&gt;
&lt;p&gt;That matters. Accessibility is not softness. It is distribution, trust, and operator leverage.&lt;/p&gt;
&lt;p&gt;Builder question:&lt;/p&gt;
&lt;p&gt;Can a real person start, understand, recover, and control this thing without learning the project owner&amp;#39;s private ontology?&lt;/p&gt;
&lt;h3&gt;5. Agent Systems Are Growing Control Planes&lt;/h3&gt;
&lt;p&gt;Paperclip makes the control-plane problem explicit. It is working on &lt;a href=&quot;https://github.com/paperclipai/paperclip/commit/90631b09b36fa028ad24ca5375bfa50e3602799c&quot;&gt;runtime specs&lt;/a&gt;, sandbox providers, &lt;a href=&quot;https://github.com/paperclipai/paperclip/commit/c4269bab59fff7a73ff31797578cc97ece7f160f&quot;&gt;cost summaries&lt;/a&gt;, roles, liveness, stale sessions, issue workflows, ordered sub-issues, pause/resume controls, and remote workspace shaping.&lt;/p&gt;
&lt;p&gt;OpenHands is consolidating around the &lt;a href=&quot;https://github.com/OpenHands/OpenHands/commit/5232d96dab0ca98e691d6307bd0759e943220d1c&quot;&gt;app server&lt;/a&gt;. Hermes has kanban task runners, gateway lifecycle, Curator, &lt;a href=&quot;https://github.com/NousResearch/hermes-agent/commit/f0d278412f8c14e94a11678be424f6a6ddb79fa2&quot;&gt;providers&lt;/a&gt;, and dashboard state. Codex is moving skills, goals, sessions, plugins, and executors into app-server-shaped surfaces. OpenClaw is handling gateway sessions, subagents, plugin metadata, and live execution timelines.&lt;/p&gt;
&lt;p&gt;This is the factory problem in miniature.&lt;/p&gt;
&lt;p&gt;Builder question:&lt;/p&gt;
&lt;p&gt;When agents coordinate across tasks and machines, what keeps the system legible?&lt;/p&gt;
&lt;h3&gt;6. Integrations Are Volatile; The Operating Loop Has To Be Durable&lt;/h3&gt;
&lt;p&gt;Pi added providers, &lt;a href=&quot;https://github.com/badlogic/pi-mono/commit/fe66edd943691f8eac295fef68ce36930c35fa05&quot;&gt;removed providers&lt;/a&gt;, changed &lt;a href=&quot;https://github.com/badlogic/pi-mono/commit/4745a9589883fb8200981ddfecb94a593d6e95a2&quot;&gt;Codex transport&lt;/a&gt;, added auth flows, improved session behavior, and kept terminal output evolving. Hermes is moving model providers into &lt;a href=&quot;https://github.com/NousResearch/hermes-agent/commit/9022804d78e88253d138d448e9107a3884b2b96c&quot;&gt;plugins&lt;/a&gt;. OpenClaw is externalizing &lt;a href=&quot;https://github.com/openclaw/openclaw/commit/42a32298f9681b6af7e8ed001401f24caefa895e&quot;&gt;channel plugins&lt;/a&gt;. OpenHands is replacing config surfaces and moving toward app-server services. Codex and Gemini are evolving plugin, MCP, memory, and approval surfaces quickly.&lt;/p&gt;
&lt;p&gt;This is not a warning against using frontier tools. It is the reason to use them through a durable loop.&lt;/p&gt;
&lt;p&gt;Builder question:&lt;/p&gt;
&lt;p&gt;What should remain stable while the best agent, provider, runtime, protocol, or plugin changes every week?&lt;/p&gt;
&lt;h2&gt;What Serious Builders Should Try&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Test persistent goals, but write down what owns the project-level objective before you trust the agent&amp;#39;s local goal.&lt;/li&gt;
&lt;li&gt;Prefer memory systems that show proposed changes before accepting them.&lt;/li&gt;
&lt;li&gt;Try at least one visible-computer harness. The browser, file system, screenshots, and desktop surface reveal different failure modes than terminal chat.&lt;/li&gt;
&lt;li&gt;Inspect the permissions and sandbox story before giving an agent real credentials.&lt;/li&gt;
&lt;li&gt;Treat messaging and voice surfaces as product lessons, not consumer fluff.&lt;/li&gt;
&lt;li&gt;Track exact harness version, provider, transport, plugin set, sandbox, and credential path for serious runs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What Remains Uncertain&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;OpenClaw&amp;#39;s high commit volume makes it hard to separate durable product movement from rapid stabilization without deeper release-note and diff review.&lt;/li&gt;
&lt;li&gt;This run is commit-harvest focused. Claude Code was excluded because the v0 source contract does not define a public commit stream.&lt;/li&gt;
&lt;li&gt;Commit metadata was broad-sampled across all projects, but only selected high-signal commits received diff-level review.&lt;/li&gt;
&lt;li&gt;The frontier may be converging on visible computers, but the winning shape is still open: local desktop, browser sandbox, remote workcell, hosted app server, messaging agent, or some combination.&lt;/li&gt;
&lt;li&gt;It is unclear which agent-side memories and goals will remain stable enough to integrate deeply versus merely record as tool-local state.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Coding Agents Are Becoming Working Environments</title><link>https://frontier.bitter.sh/digests/2026-04-22_2026-05-06-frontier-rollup/</link><guid isPermaLink="true">https://frontier.bitter.sh/digests/2026-04-22_2026-05-06-frontier-rollup/</guid><description>The last two weeks were not about one coding agent pulling ahead. They were
about the layer around coding agents getting more serious.</description><pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Coding Agents Are Becoming Working Environments&lt;/h1&gt;
&lt;p&gt;The last two weeks were not about one coding agent pulling ahead. They were
about the layer around coding agents getting more serious.&lt;/p&gt;
&lt;p&gt;By harness, I mean the practical wrapper around the model: the CLI, permission
system, memory surface, sandbox, plugin layer, review flow, and runtime
assumptions that determine what the agent can actually do.&lt;/p&gt;
&lt;p&gt;Codex added persistent goals. Claude Code pushed deeper into cloud review,
session recaps, plugins, hooks, MCP, and telemetry. Gemini CLI tightened
workspace trust and environment loading while experimenting with reviewable
memory patches. Hermes added a background Curator for skill maintenance. Pi
kept proving the other side of the market: a small terminal harness can move
quickly by keeping the core thin.&lt;/p&gt;
&lt;p&gt;The through line is simple: coding agents are becoming less like chat boxes
and more like working environments.&lt;/p&gt;
&lt;p&gt;That is useful, but it also raises the stakes. If the agent can remember,
review, load plugins, carry goals, and run under different permission modes,
then serious developers need to know which of those surfaces shaped the work.&lt;/p&gt;
&lt;h2&gt;The Signals&lt;/h2&gt;
&lt;h3&gt;Persistent goals move coding agents beyond single sessions&lt;/h3&gt;
&lt;p&gt;Codex &lt;code&gt;/goal&lt;/code&gt; is the strongest signal in this window. It gives the agent
something more durable than a prompt: a persistent objective it can carry
across a longer arc of work.&lt;/p&gt;
&lt;p&gt;That matters because long-horizon development is not just a code-generation
problem. It is an orientation problem. The agent has to stay pointed in the
right direction across sessions, reviews, interruptions, and course
corrections.&lt;/p&gt;
&lt;p&gt;The new question is not &amp;quot;can the agent remember?&amp;quot; It is &amp;quot;what goal is it
pursuing, and who decided that goal is still the right one?&amp;quot;&lt;/p&gt;
&lt;p&gt;For Bitter, the answer should be conservative: use agent goals, but record
which goal was active and reconcile it against the project charter and current
task before treating it as durable project memory.&lt;/p&gt;
&lt;p&gt;Supported by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/findings/codex/&quot;&gt;Codex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Agent memory is becoming a product surface&lt;/h3&gt;
&lt;p&gt;Claude session recaps, Gemini Auto Memory, and Hermes Curator all point in the
same direction: agent tools are learning how to carry context forward.&lt;/p&gt;
&lt;p&gt;That is good. It also means memory is no longer one thing. A serious run may
now be shaped by chat history, session recaps, generated memory patches, skill
reports, resume state, and local project notes.&lt;/p&gt;
&lt;p&gt;Bitter should not fight those surfaces. It should record which agent-side
memory affected a run, then decide what deserves to become part of the
project&amp;#39;s durable record.&lt;/p&gt;
&lt;p&gt;Supported by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/findings/claude-code/&quot;&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/gemini-cli/&quot;&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/hermes-agent/&quot;&gt;Hermes Agent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Permissions are getting clearer, but every agent does them differently&lt;/h3&gt;
&lt;p&gt;Codex expanded permission profiles and sandbox metadata. Gemini added secure
&lt;code&gt;.env&lt;/code&gt; loading, workspace trust, and shell allowlists. Claude Code kept
iterating on plugins, hooks, MCP, telemetry, and permission prompts. Pi&amp;#39;s provider
and extension layers changed quickly.&lt;/p&gt;
&lt;p&gt;The direction is good: agents are exposing more of the authority they run
with. The problem is fragmentation. Every tool names and scopes that authority
in its own way.&lt;/p&gt;
&lt;p&gt;For serious work, &amp;quot;the agent had access&amp;quot; is not enough. The useful question is
more specific: what could it read, what could it change, which plugins were
enabled, which credentials were exposed, which sandbox was active, and which
release channel was running?&lt;/p&gt;
&lt;p&gt;Supported by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/findings/codex/&quot;&gt;Codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/claude-code/&quot;&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/gemini-cli/&quot;&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/pi-coding-agent/&quot;&gt;Pi Coding Agent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Review is moving inside the agent tools&lt;/h3&gt;
&lt;p&gt;Claude &lt;code&gt;/ultrareview&lt;/code&gt; is the cleanest example: provider-native cloud fleets
can review branches and PRs. Codex multi-agent controls, Gemini subagent and
eval work, and Hermes Curator reports rhyme with it.&lt;/p&gt;
&lt;p&gt;This is a useful direction. Agent tools should be able to criticize their own
work. But native review is still evidence, not truth.&lt;/p&gt;
&lt;p&gt;A review surface can produce a useful claim: &amp;quot;this looks risky,&amp;quot; &amp;quot;this path
failed,&amp;quot; &amp;quot;this patch needs another pass.&amp;quot; The project still needs an external
standard for what counts as done.&lt;/p&gt;
&lt;p&gt;Supported by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/findings/claude-code/&quot;&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/codex/&quot;&gt;Codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/gemini-cli/&quot;&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/hermes-agent/&quot;&gt;Hermes Agent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Plugins and skills are becoming the new agent interface&lt;/h3&gt;
&lt;p&gt;Codex plugins, Claude plugins, Gemini extensions and MCP, Hermes skills, and
Pi extension APIs are all part of the same shift. The practical power of an
agent is moving into the things around it.&lt;/p&gt;
&lt;p&gt;That makes the harness more useful, but also harder to reason about. If a run
depends on a plugin, extension, hook, skill, or transport layer, that surface
is part of the work environment and should be visible in the record.&lt;/p&gt;
&lt;p&gt;Supported by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/findings/codex/&quot;&gt;Codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/claude-code/&quot;&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/gemini-cli/&quot;&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/hermes-agent/&quot;&gt;Hermes Agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/pi-coding-agent/&quot;&gt;Pi Coding Agent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Do not build your workflow around one agent&amp;#39;s current integration list&lt;/h3&gt;
&lt;p&gt;Pi removed built-in Gemini CLI and Antigravity support while adding many new
providers. Gemini&amp;#39;s stable, preview, and nightly channels differ materially.
Codex alpha and app-server surfaces move quickly.&lt;/p&gt;
&lt;p&gt;This is normal frontier motion. The mistake is treating any current integration
list as durable architecture.&lt;/p&gt;
&lt;p&gt;The stable layer should be the project workflow around the agent: objective,
permissions, execution environment, evidence, review, memory, and what the next
run should know.&lt;/p&gt;
&lt;p&gt;Supported by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/findings/pi-coding-agent/&quot;&gt;Pi Coding Agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/gemini-cli/&quot;&gt;Gemini CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/findings/codex/&quot;&gt;Codex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What Serious Developers Should Do&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat persistent goals as useful, but make sure they still match the
project-level direction.&lt;/li&gt;
&lt;li&gt;Treat agent-side memory as context; do not assume it is the project record.&lt;/li&gt;
&lt;li&gt;Record which goals, recaps, memories, plugins, skills, permission modes,
release channels, and transports were active during serious runs.&lt;/li&gt;
&lt;li&gt;Prefer tools that make trust, sandboxing, plugins, sessions, and review
state easy to inspect.&lt;/li&gt;
&lt;li&gt;Treat native agent review as evidence, not final judgment.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What Bitter Is Testing&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;How Codex &lt;code&gt;/goal&lt;/code&gt; changes long-running work in a real repo.&lt;/li&gt;
&lt;li&gt;How to record agent memory, permissions, plugins, review output, and release
channels without tying Bitter to one tool&amp;#39;s vocabulary.&lt;/li&gt;
&lt;li&gt;Whether Claude &lt;code&gt;/ultrareview&lt;/code&gt;, Gemini memory patches, Hermes Curator, and Pi
extension metadata produce evidence worth carrying forward.&lt;/li&gt;
&lt;li&gt;Which agent harnesses expose enough state to be trustworthy over long runs.&lt;/li&gt;
&lt;li&gt;How to keep the public research loop conservative: publish no signal unless
it can change what someone does next.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;What Remains Uncertain&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Whether persistent goals become stable enough for long-horizon development or
remain convenience features tied to one tool.&lt;/li&gt;
&lt;li&gt;Whether agent memory surfaces converge, or each product keeps inventing its
own private memory layer.&lt;/li&gt;
&lt;li&gt;Whether cloud/native review produces evidence that is inspectable enough for
serious work.&lt;/li&gt;
&lt;li&gt;Whether plugin and skill ecosystems converge around useful metadata.&lt;/li&gt;
&lt;li&gt;Which agent tools expose enough permission, session, plugin, transport, and
release-channel state to support trustworthy wrapping.&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item></channel></rss>