This Week in Agentic Harnesses / Published 2026-06-03

The Policy You Wrote Wasn't the Policy You Had

Edited by Michael Ruescher / revised 2026-07-12

Operator Brief

The dominant move this week was not new capability; it was closing the gap between the policy operators thought they had configured and the policy the runtime actually enforced. Claude Code, Gemini CLI, Pi, Hermes, OpenHands, and Flue each fixed a deny rule, credential boundary, or sandbox edge that was silently failing to hold, and several of those fixes are security advisories the changelogs do not flag. The quieter thread: skills and plugins became governed, auditable, sometimes agent-activated resources across Paperclip, OpenClaw, Flue, and Agent Zero.

Upgrade / check:
Claude Code 2.1.160-2.1.162 closes three permission-bypass gaps: WebFetch rules now override preapproved domains, Windows path rules match backslash/case variants, and Read-deny hides files from Glob/Grep. Upgrade, then re-audit any allow/deny policy from older builds.
Claude Code 2.1.160 also makes acceptEdits prompt before writing execution-granting config (.npmrc, .bazelrc, .pre-commit-config.yaml) and shell-startup files. Recognize the new prompts; do not blanket-allow them.
Pi closes OAuth browser-launch command injection and git-package path traversal. On prior builds, an untrusted OAuth server could execute commands and an untrusted git URL could write outside the install root.
OpenHands patches axios (CVE-2026-44492) and dompurify (CVE-2026-41238) in the frontend; the two commits need one action: rebuild and redeploy the bundle. dulwich (CVE-2026-42305) is a backend git library that needs a lockfile re-resolve and image rebuild.
Gemini CLI v0.45.0 stable closes an MCP blacklist bypass where a blacklisted tool or server could still be invoked. Upgrade before trusting MCP deny-lists for containment.
Hermes v0.15.1 makes the Docker dashboard insecure binding an explicit HERMES_DASHBOARD_INSECURE=1 opt-in. The heuristic that silently dropped auth is gone. Update env config before upgrade.
Flue v0.9.0 is a hard breaking migration: routing/provider imports move, provider model values need provider-id/model-id format, persisted beta sessions are rejected, and Cloudflare DO migrations are now operator-owned. Sessions will not restore until updated.
Try:
Claude Code: wire the waitingFor field and the done/total progress counter from claude agents --json into supervision tooling so stuck-agent triage stops requiring a human to open each session.
Paperclip: use the skills CLI (install/reset/audit/export/assign) to audit which agents hold which company skills and export the catalog for provenance review.
Codex: enable the optional Face ID / passcode lock on iOS 1.2026.146 before treating mobile as a trusted access surface.
Hermes v0.15.0: queue a decomposable task on the new multi-agent Kanban and confirm the orchestrator spawns sub-agents in isolated worktrees before trusting it with real work.
Agent Zero v1.19: disable the Office/Desktop/Editor plugins you do not need via the protected plugin-toggle endpoint to bound the agent's standing capability surface.
Watch:
Whether agent self-proposal and self-activation of skills (OpenClaw skill_workshop tool, Flue activate_skill) outpaces the review gate. The governance surface now spanning Paperclip, OpenClaw, Flue, and Agent Zero only holds if self-approval stays bounded.
Standing credentials and approved-host registries are replacing per-session prompts: Codex remote-exec API-key host registration, Hermes Bitwarden Secrets Manager, Codex Bedrock under AWS IAM. A leaked key or over-broad approval becomes ambient authority.
Cloud-provider deployment paths keep expanding: Claude Code Auto Mode now on Bedrock/Vertex/Foundry, Codex models via Amazon Bedrock. On those paths, managed-settings deny rules must carry the governance weight per-action prompts used to.
Agent Zero reversed its ephemeral-by-default posture: computer-use screenshots now persist to durable chat-scoped storage with no automatic redaction. Review retention and access controls before capturing sensitive screens.
Uncertain:
Codex remote-exec API-key registration: the changelog does not document key scope, rotation, or revocation. Whether a leaked key grants persistent remote exec is unverified.
Codex iOS SSH-to-Windows: host-key verification, key storage, and scoping of the iOS SSH client are undocumented.
Gemini CLI model routing: Gemini 3.5 Flash GA is gated server-side by experiment flag, so the same binary routes different users to different models and client version no longer determines the model in use.
OpenClaw enhanced plugin isolation: the release note asserts a tighter sandbox boundary but does not describe its depth, so operators cannot verify it from the receipt.
Flue v0.9.0 persisted-session rejection has no automated migration path; a self-scripted migration could reintroduce stale, unredacted state.
OpenHands DELETE /api/organizations now cascade-deletes the sole-org requester's user identity, the highest-consequence caveat this cycle; enforce backups before any org delete.

An operator told Claude Code to keep one file out of reach. They wrote a Read-deny rule, the sort of one-line guardrail security teams lean on, and moved on. The file stayed readable anyway: any Glob or Grep the agent ran could return its path and its contents, deny rule or not.

That was one of at least six ways this week, across six different makers of AI coding agents, that a safety rule turned out to be advisory. A Pi user signing in through a malicious OAuth server could be handed a verification link that ran shell commands. A Hermes dashboard dropped its authentication because a heuristic misread the bind host. A Gemini CLI deny-list let through a tool it was meant to block. None of it was misconfiguration. The operators had written the rules correctly; the runtimes simply were not enforcing them, and over the week Claude Code, OpenHands, and Flue shipped the same unglamorous patch as the rest: not a new feature, but enforcement for a rule the operator already believed was in place.

The week's other current ran the opposite way, toward more control rather than less. At four providers at once, the skills and plugins an agent can reach stopped being an ambient default and became something an operator can list, audit, and approve. Paperclip, OpenClaw, Flue, and Agent Zero each turned capability that used to be invisible into reviewable operating state.

Security Advisories: Check These Before Upgrading

Claude Code 2.1.160 through 2.1.162: three permission-bypass gaps closed at once. Custom WebFetch permission rules now override the built-in preapproved-domain whitelist; Windows permission rules with backslashes or case-variant paths now match; and Read-deny rules now hide files from Glob and Grep results. The sharpest of the three is the last one: a file an operator denied for Read was still discoverable through search tools, which could surface both its path and its contents and so defeat the access-control intent. The attacker model is prompt injection or compromised task content steering the agent toward a denied domain or walled-off path. The fix is gated purely on upgrading, so the operator action is to upgrade, then re-audit whether any policy was silently bypassed in the prior window, especially on Windows and in any setup relying on Read-deny to hide secrets from search. The changelog ships this as an ordinary entry; treat it as the advisory it is.
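For operators who want to reason about what "match backslash and case variants" and "hide from search" mean in practice, here is a minimal Python sketch. It is not Claude Code's implementation: the function names, the glob-style pattern syntax, and the sample paths are all assumptions made for illustration. It folds Windows paths to one canonical form before matching, then applies the same deny check to search results as to direct reads.

```python
import fnmatch
import ntpath


def normalize_windows_path(path: str) -> str:
    """Fold case and separators so 'C:/Repo/.ENV' and 'c:\\repo\\.env' compare equal."""
    # normpath collapses '..' and converts '/' to '\\'; normcase lowercases.
    return ntpath.normcase(ntpath.normpath(path))


def is_denied(path: str, deny_patterns: list[str]) -> bool:
    """Return True if the normalized path matches any normalized deny pattern."""
    candidate = normalize_windows_path(path)
    return any(
        fnmatch.fnmatchcase(candidate, normalize_windows_path(pattern))
        for pattern in deny_patterns
    )


def filter_search_results(paths: list[str], deny_patterns: list[str]) -> list[str]:
    """Drop denied paths from search output, so a read-deny also hides search hits."""
    return [p for p in paths if not is_denied(p, deny_patterns)]
```

A re-audit after upgrading can follow the same shape: take each deny rule, generate a backslash variant and a case variant of a path it should cover, and confirm that neither a direct read nor a search returns it.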

Claude Code 2.1.160: execution-granting config writes now prompt even in acceptEdits mode. Two guardrails land together. acceptEdits mode now prompts before writing build-tool config that grants code execution (.npmrc, .yarnrc*, bunfig.toml, .bazelrc, .pre-commit-config.yaml, .devcontainer/), and the agent now prompts before writing shell startup files (.zshenv, .zlogin, .bash_login) and ~/.config/git/. Operators running acceptEdits or auto-leaning modes previously had a silent write path into files that execute on the next shell login, install, or commit. That is the classic agent-persistence and supply-chain escalation vector. The prompt is the guardrail here; blanket-allowing it puts you back where you started.
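The file list above can double as an audit checklist for an operator's own tooling. The sketch below is a hypothetical classifier, not the logic Claude Code uses: the file names come from the paragraph above, while the function names and matching rules are assumptions. It flags any write whose target is one of those execution-granting files or sits under one of those directories.

```python
import fnmatch
from pathlib import PurePosixPath

# Names taken from the advisory text; the matching rules are an assumption.
EXEC_GRANTING_NAMES = [
    ".npmrc", ".yarnrc*", "bunfig.toml", ".bazelrc",
    ".pre-commit-config.yaml", ".zshenv", ".zlogin", ".bash_login",
]
EXEC_GRANTING_DIRS = [(".devcontainer",), (".config", "git")]


def _contains_dir(parts: tuple[str, ...], dir_parts: tuple[str, ...]) -> bool:
    """True if dir_parts appears as a consecutive run inside parts."""
    n = len(dir_parts)
    return any(parts[i:i + n] == dir_parts for i in range(len(parts) - n + 1))


def needs_confirmation(path: str) -> bool:
    """Flag writes to files that can cause code to run on install, commit, or login."""
    p = PurePosixPath(path)
    if any(fnmatch.fnmatchcase(p.name, pat) for pat in EXEC_GRANTING_NAMES):
        return True
    # Over-matches on purpose: any '.config/git' directory counts, not only the one in $HOME.
    return any(_contains_dir(p.parent.parts, d) for d in EXEC_GRANTING_DIRS)
```

Running such a check over an agent's write log from older builds is one way to see whether the previously silent path was ever exercised.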

Pi: OAuth command injection and git-package path traversal closed. Commit ba6e529 validates OAuth verification URIs (rejecting non-HTTP(S) schemes) and launches the browser via spawn() instead of shell exec(), closing a path where a malicious OAuth server could inject $(id>/tmp/pwned)-style commands. Commit a98e087 rejects git URLs with .., null bytes, backslashes, or leading slashes at both parse and resolution time, blocking writes outside the package install root. The attacker is whoever controls the OAuth server or authors the git package; both fixes need no config change, only the upgrade.
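The two fixes follow well-known patterns, sketched here in Python rather than in Pi's own code. Everything named below is an assumption for illustration: the function names, the xdg-open opener, and the treatment of the package location as a relative path. The first part validates the scheme and passes the URI as a single argv element with no shell, so a $(...) payload stays inert text; the second rejects the path shapes the commit description lists.

```python
import subprocess
from urllib.parse import urlparse


def validate_verification_uri(uri: str) -> str:
    """Accept only http(s) URIs that name a host; reject everything else."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"refusing to open non-HTTP(S) verification URI: {uri!r}")
    return uri


def browser_command(uri: str, opener: str = "xdg-open") -> list[str]:
    """Build an argv list; the opener name is a Linux-specific assumption."""
    return [opener, validate_verification_uri(uri)]


def launch_browser(uri: str) -> None:
    """Run the opener without a shell, so the URI is never parsed as shell syntax."""
    subprocess.run(browser_command(uri), check=False)


def is_safe_package_path(path: str) -> bool:
    """Reject '..', null bytes, backslashes, and leading slashes in a package path."""
    if not path or path.startswith("/"):
        return False
    return not any(bad in path for bad in ("..", "\x00", "\\"))
```

The substring check on ".." is deliberately strict: it also rejects harmless names such as "a..b", which trades a few false positives for a simpler rule.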

OpenHands main: three named CVEs. No tagged release fell in the window, but main closed CVE-2026-44492 (axios 1.16.0), CVE-2026-41238 (dompurify 3.4.0), and CVE-2026-42305 (dulwich 1.2.5). The first two are browser-facing (HTTP client and HTML/DOM sanitizer) and need a frontend rebuild and redeploy; the third is a backend git library and needs a poetry.lock re-resolve and image rebuild. Self-hosters pinning older lockfiles must bump manually.
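Self-hosters pinning their own lockfiles can check exposure with a few lines of Python. This sketch assumes the versions quoted above are the first patched releases and that pinned versions are plain dotted integers; the function names and the input dictionary are hypothetical, and extracting pins from a real lockfile is left out.

```python
# Minimum versions as quoted in the advisory; treating them as "first fixed" is an assumption.
PATCHED_MINIMUMS = {"axios": "1.16.0", "dompurify": "3.4.0", "dulwich": "1.2.5"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.16.0' into (1, 16, 0). Pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def outdated(pinned: dict[str, str]) -> list[str]:
    """Return the packages pinned below their patched minimum, sorted by name."""
    return sorted(
        name
        for name, minimum in PATCHED_MINIMUMS.items()
        if name in pinned and parse_version(pinned[name]) < parse_version(minimum)
    )
```

For example, outdated({"axios": "1.15.2", "dulwich": "1.2.5"}) reports only axios, which would point to the frontend rebuild rather than the backend image.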

Gemini CLI v0.45.0: MCP blacklist bypass fixed. The stable release bundles Termux relaunch/resize fixes, session-context filtering on history resume, and the security-bearing item: a fix for a path where a blacklisted MCP tool or server could still be reached. Operators relying on MCP deny-lists for containment should upgrade before trusting the blacklist, and test that blacklisted tools are actually unreachable rather than assume full coverage.

Hermes v0.15.1: Docker insecure binding is now an explicit opt-in. The dashboard no longer infers insecure mode from the bind host; it requires HERMES_DASHBOARD_INSECURE=1 explicitly. This removes a silent path where a misread bind host dropped auth and exposed the dashboard to a network-adjacent attacker. Existing Docker and hosted setups must update env config before upgrading. The same patch fixes a v0.15.0 loopback-mode dashboard reload loop and restores MCP bare-command resolution (npx, npm, node) in Docker.
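The design change is the difference between inferring a security posture and requiring the operator to state it. A minimal Python sketch of the explicit form follows; it is not Hermes code, the function names are hypothetical, and accepting only the literal value "1" is an assumption drawn from the variable as written above.

```python
def dashboard_insecure(env: dict[str, str]) -> bool:
    """Insecure mode only when the operator sets the flag to exactly '1'."""
    return env.get("HERMES_DASHBOARD_INSECURE") == "1"


def require_auth(env: dict[str, str], bind_host: str) -> bool:
    """Decide whether the dashboard must authenticate requests.

    bind_host is accepted but deliberately ignored: the point of the change is
    that the bind address no longer influences the authentication decision.
    """
    return not dashboard_insecure(env)
```

Under this rule an unset variable, a typo, or a value like "true" all leave authentication on, which is the fail-closed direction.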

Paperclip v2026.529.0: first-admin claim is now the bootstrap gate. Unclaimed self-hosted deployments get a one-time browser claim to create the first admin. The flip side is a race: whoever completes the claim first becomes admin, so an attacker with network reach to a freshly stood-up instance could seize control before the legitimate operator. Claim promptly and restrict network exposure during the unclaimed window.

Hermes v0.15.0: Promptware defense, and a migration. The Velocity Release adds a built-in defense against Brainworm-class prompt-injection and closes 19 security-tagged issues. Operators running against untrusted content (web, repos, MCP output) should validate the defense against their own injection corpus rather than assume blanket coverage; novel vectors outside the known class may still pass.

Flue v0.9.0: a hard breaking migration. Routing imports move from @flue/runtime/app to @flue/runtime/routing, provider model values now require provider-id/model-id format, SDK mount paths derive from baseUrl, and persisted beta session state is rejected. Clear or migrate the store before upgrading, or sessions fail to restore. Cloudflare Durable Object migrations are no longer auto-appended; the operator now owns them in the Wrangler config, and interrupted workflows no longer auto-retry.

The enforcement gap

These bugs share a signature, and it is the reason they are dangerous: the rule exists, it reads correctly in the config, and it does nothing. There is no error, no blocked action, no log line. The operator has every reason to think the control is live, which is precisely why no one goes back to check it.

Claude Code's cluster is the clearest case. A Read-deny that search walked right around, a WebFetch rule that failed to override the preapproved-domain list, and Windows path rules that quietly missed on a backslash or a capital letter were three faces of the same false promise. The same release line turned a silent agent config-write into a confirmation prompt for files that grant code execution, and, in the other direction, loosened an over-broad managed-settings policy that had been blocking legitimate third-party sessions it should have allowed.

Once you have the shape, you see it everywhere. Gemini CLI's deny-list that didn't deny is the same failure one layer up, at the tool boundary; its quieter policy-file fix closes a fail-open path where a policy that failed to save or parse left the agent running under no policy at all. Pi spent four commits sealing the gaps a co-tenant on a shared host could slip through, writing auth files at 0o600 instead of world-readable and moving its extension cache out of /tmp. Flue stopped retaining the credentials carried in WebSocket handshake URLs; OpenHands moved provider credentials off a plaintext channel onto an encrypted one; Hermes forced its Docker dashboard to stop guessing whether to require authentication.
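Two of these fixes are small enough to sketch. The Python below is illustrative only and reflects neither Gemini CLI's nor Pi's code: the JSON policy shape, the deny-all fallback, and the function names are assumptions. The first function fails closed when a policy cannot be parsed; the second creates a secret file with mode 0o600 from the first instant, instead of creating it with default permissions and tightening them afterwards.

```python
import json
import os


def deny_all() -> dict:
    """A fresh fallback policy that permits nothing."""
    return {"default": "deny", "rules": []}


def load_policy(text: str) -> dict:
    """Parse a JSON policy; on any failure fall back to deny-all, never to no policy."""
    try:
        policy = json.loads(text)
    except json.JSONDecodeError:
        return deny_all()
    if not isinstance(policy, dict) or "rules" not in policy:
        return deny_all()
    return policy


def write_private_file(path: str, data: bytes) -> None:
    """Create the file owner-read/write only; O_EXCL refuses to reuse an existing file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:  # fdopen takes ownership and closes fd
        handle.write(data)
```

The permission bits passed to os.open are honored on POSIX systems; on Windows they are largely ignored, so a different mechanism would be needed there.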

The practical reading is uncomfortable. An upgrade this week is not a feature bump; it is the moment you find out which of your controls were decorative. For every provider above, the safe assumption is that something you set on the last build was not holding, and the job after upgrading is to check, not to exhale.

Capability you can audit

For most of the short life of agent harnesses, the answer to "what can this agent actually do?" has lived in code and defaults, somewhere an operator had to go and read. Four providers spent the week turning it into something you can look up.

Paperclip made company skills first-class objects with an install / reset / audit / export / assign CLI. The verbs that matter are audit and export, which turn the skills an agent holds from implicit configuration into a fact you can query and hand to a reviewer, and assign, which makes granting a capability its own deliberate, recorded act. OpenClaw came at the same problem from the review side: its Skill Workshop holds new skills in a pending queue until a human approves them through the CLI or Gateway. The wrinkle is that a companion skill_workshop tool lets the agent file its own proposals, so the question an operator now owns is who may approve, and whether the agent is ever allowed to approve itself.
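The self-approval question can be made concrete with a toy model. The Python below is a hypothetical sketch of a pending queue, not OpenClaw's Skill Workshop: the class names, fields, and status strings are all invented for illustration. It encodes two rules an operator might want: only named approvers may approve, and no one may approve their own proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    skill: str
    proposed_by: str
    status: str = "pending"


@dataclass
class ApprovalQueue:
    approvers: set[str]
    proposals: list[Proposal] = field(default_factory=list)

    def propose(self, skill: str, proposed_by: str) -> Proposal:
        """Anyone, including an agent, may file a proposal; it starts as pending."""
        proposal = Proposal(skill=skill, proposed_by=proposed_by)
        self.proposals.append(proposal)
        return proposal

    def approve(self, proposal: Proposal, approver: str) -> None:
        """Approve only if the approver is authorized and is not the proposer."""
        if approver not in self.approvers:
            raise PermissionError(f"{approver} is not an approver")
        if approver == proposal.proposed_by:
            raise PermissionError("self-approval is not allowed")
        proposal.status = "approved"
```

The second check matters even when the agent's identity is on the approver list for other reasons, which is the case an operator is most likely to overlook.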

Flue pushed the other way, and the contrast is the interesting part. Its activate_skill tool lets an agent pull a skill's full instructions on demand, mid-task; the operator still picks which skills exist, but the decision to switch one on has moved to the model. Agent Zero, by contrast, shipped the bluntest instrument of the four, a protected API that toggles its Office, Desktop, and Editor plugins off, a real lever for pulling powerful capabilities (Desktop computer-use above all) out of deployments that should never have had them. For now it is a switch, not an audited register: the release note promises a "protected" endpoint without saying who is allowed to flip it.

Catalog, approval queue, on-demand activation, kill switch. Four different shapes, the same underlying shift: "what can this agent do?" is becoming a question you answer by reading state instead of trusting a default. Whether that also makes the tools easier to live with is a separate matter, and the week answered it both ways. Claude Code's new waitingFor field, surfaced through claude agents --json with a done-over-total progress counter, finally names what a stalled session is waiting on, so minding a fan-out of agents no longer means opening each one in turn. Flue's v0.9.0, in the same window, forced a hard migration with no automated path. The harnesses got more governable. They did not get more approachable.
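A supervision hook for the waitingFor field might look like the following Python sketch. The output schema of claude agents --json is not described here, so the shape is assumed: a JSON array of session objects, with waitingFor as the only field name taken from the text above and id, done, and total as hypothetical stand-ins. Check the real output before relying on any of them.

```python
import json


def stalled_sessions(agents_json: str) -> list[dict]:
    """Return one triage row per session that reports a non-empty waitingFor."""
    sessions = json.loads(agents_json)
    rows = []
    for session in sessions:
        waiting_for = session.get("waitingFor")
        if not waiting_for:
            continue  # session is not blocked on anything
        rows.append({
            "id": session.get("id"),  # hypothetical field name
            "waitingFor": waiting_for,
            # 'done' and 'total' are assumed names for the progress counter
            "progress": f"{session.get('done', 0)}/{session.get('total', 0)}",
        })
    return rows
```

Feeding the rows into an existing alerting channel would give a supervisor one list of blocked sessions and what each is blocked on, without opening any of them.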

A countervailing drift

Underneath the week's two visible currents ran a quieter one, and it does not point toward the operator. Authority kept migrating onto standing credentials and managed cloud paths, where a single secret or a one-time approval buys lasting reach. Codex added remote-exec host registration by API key and the option to run its models under AWS IAM through Bedrock; Hermes swapped per-provider keys for a Bitwarden Secrets Manager pool; Claude Code's Auto Mode reached Bedrock, Vertex, and Foundry. None of these is a flaw on its own. Together they mean more of an agent's power now rides on credentials that outlive any single session, the place where one leak or one over-broad grant quietly becomes standing authority.

The sharpest reversal came from Agent Zero, which returned computer-use screenshots to durable, chat-scoped storage and undid the ephemeral-by-default stance it had shipped only weeks earlier. The upside is a real audit trail; the cost is that every captured screen, credentials and internal dashboards included, now sits on disk with no automatic redaction, waiting to be scoped and pruned by hand. Codex's new Sites plugin carries a milder version of the same edge: in Business workspaces it is on by default, so the power to build and deploy a live web app may already be switched on before anyone chose to enable it. The rest was plumbing: a Gemini CLI editor-spam-loop fix, MiniMax M3 support in OpenClaw, and OpenTelemetry tracing in Flue.

Provider Notes

Codex (CLI 0.135.0 to 0.136.0, iOS 1.2026.146) shipped named permission profiles with custom-config display and codex doctor diagnostics (0.135.0), a non-interactive installer for CI, plus remote-exec API-key host registration and thread archiving (0.136.0). The iOS app added an optional Face ID / passcode lock for Codex and SSH-to-Windows. Two integrations landed: the Sites plugin and Amazon Bedrock under AWS-managed auth and billing.

Claude Code (2.1.158 to 2.1.162) is the enforcement-gap headliner: the permission/deny-rule cluster, execution-granting config-write prompts, the managed-settings third-party-session unblock, agent-status observability, and Auto Mode reaching Bedrock/Vertex/Foundry for Opus 4.7/4.8.

Gemini CLI (v0.44.1 to v0.46.0-preview) shipped the v0.45.0 stable bundle with the MCP blacklist fix and Termux hardening, policy-file resilience, and a server-flag-gated Gemini 3.5 Flash GA rollout that decouples model-in-use from client version. A CI change to pull_request_target on the PR-size labeler is low-risk as written (it only reads line counts) but removes the structural safety of pull_request, so any future edit adding fork-code checkout becomes immediately dangerous.

Hermes Agent (v0.15.0 to v0.15.1, plus post-release commits) is the Velocity Release: a 76% run_agent.py refactor; Kanban evolving into a multi-agent orchestration platform with auto-decomposition, swarm topology, and worktree-per-task; Promptware defense; and Bitwarden Secrets Manager. The v0.15.1 patch makes Docker insecure binding an explicit opt-in and fixes a dashboard reload loop; June 3 commit waves hardened installer self-update, Windows/WSL2 PTY and schtasks handling, and desktop session management.

Pi coding agent (commits to main) shipped a security-hardening cluster: OAuth launch hardening, git path-traversal rejection, auth-file mode-on-create (0o600 instead of briefly world-readable), extension-cache isolation out of world-accessible /tmp, and HTML-export XSS sanitization. Alongside, model-catalog maintenance removed stale Codex entries, added Mistral Devstral 2 and Open Mistral Nemo, and refreshed Claude model pricing and token caps to 128k output. No reliably in-window tagged release landed; the security work shipped as commits to main.

OpenClaw (2026.5.31-beta.3 through 2026.6.1 stable) shipped the Skill Workshop proposal workflow, interrupted-tool-call recovery, bounded request timers (re-evaluate SLOs), enhanced plugin isolation, MiniMax M3, and Tailscale Serve service-name binding with SQLite-backed state migration for iMessage and plugin-install tracking.

Paperclip (v2026.529.0) shipped the skills CLI/catalog, the first-admin claim flow, inline document annotations, per-user sidebar controls, and live Claude model discovery from the UI.

Agent Zero (v1.19) renamed Remote Link to Remote Control with selectable tunnel providers, made Office/Desktop/Editor plugins toggleable behind a protected API, reverted screenshots to durable chat-scoped storage, unified OAuth account management, and hardened Xpra desktop control.

OpenHands (main, no tagged release) shipped the three-CVE remediation cluster (axios, dompurify, dulwich), the ACP-credentials-to-secrets-channel move, a cascade-delete-sole-org-requester change on DELETE /api/organizations (org deletion now also deletes the requesting user if it is their only org), a git-proxy capability, and a LiteLLM 1.84.1 upgrade.

Flue (Tier 2; v0.8.1 to v0.9.2) shipped OpenTelemetry tracing, the v0.9.0 breaking migration, WebSocket credential hardening, operator-owned workflow-run retention (the implicit 50-run prune is gone), and autonomous activate_skill.

What To Try

Claude Code operators: upgrade to 2.1.162 and re-audit any allow/deny or Read-deny policy that ran on older builds. Then wire waitingFor and the fan-out progress counter into supervision tooling so stuck-agent triage stops requiring a human to open each session.
Paperclip operators: use the skills CLI to audit and export which agents hold which skills, and claim any freshly stood-up self-hosted instance immediately.
Codex operators on iOS: enable the Face ID / passcode lock before treating mobile as a trusted access surface.
Hermes operators: queue a decomposable task on the new multi-agent Kanban and confirm the orchestrator spawns the expected sub-agents in isolated worktrees before trusting it with real work. Set HERMES_DASHBOARD_INSECURE=1 only where insecure binding is genuinely intended.
Agent Zero operators: disable the Office/Desktop/Editor plugins you do not need, and review retention/access controls for the now-durable computer-use screenshots before capturing sensitive screens.
Gemini CLI maintainers: review the pull_request_target labeler workflow to confirm it only reads PR metadata and never checks out fork code under the elevated token.

What Remains Uncertain

Codex remote-exec key lifecycle: scope, rotation, and revocation for the approved-host API-key registration are undocumented. Whether a leaked key grants persistent remote exec is unverified.
Codex iOS SSH trust handling: host-key verification, key storage, and scoping of the iOS SSH-to-Windows client are not described.
Gemini CLI model routing: with Flash GA gated server-side, the model in use is no longer determined by client version alone; backend flag state is now part of the audit surface.
OpenClaw plugin-isolation depth: the release note asserts tighter isolation but does not describe the boundary's depth, so operators cannot verify it from the receipt.
Hermes Promptware coverage: the defense targets a known attack class; novel injection vectors outside Brainworm patterns may still pass. Validate against your own corpus.
Flue persisted-session migration: v0.9.0 rejects pre-upgrade session state with no automated migration path; a self-scripted migration could reintroduce stale, unredacted state.
OpenHands org-deletion blast radius: operators on the cascade-delete change should enforce backups before any DELETE /api/organizations, since a sole-org delete now removes the user identity too.

Revised 2026-07-02 (artifact_version 2): operator-brief thesis compressed to the house standfirst standard. Body, claims, and receipts unchanged.

Projects reviewed in this research run

Codex, Claude Code, Gemini CLI, Hermes Agent, Pi Coding Agent, OpenClaw, Paperclip, Agent Zero, OpenHands, Flue

Research artifacts and publication history are open in the repository.

32 signals / 2026-05-28 to 2026-06-03