Research Version

The Policy You Wrote Wasn't the Policy You Had

2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0

Status: complete
Window: 2026-05-28 to 2026-06-03
Signals: 32

Mode: weekly_digest / Model: claude-opus-4-8

This is the second full ten-provider weekly under the Amendment 004 schema, and the first run executed end-to-end as a background multi-agent workflow rather than a hand-orchestrated harvest. The adversarial verify stage did real work: the QA gate failed on first assembly (15 unsupported and 8 out-of-window receipts out of 63 promoted signals), and the digest was remediated to a receipt-clean state before publication (see qa.md). The harvest over-decomposed (100 findings / 63 raw signals for a 7-day window); the published signal set was curated down to 32 genuinely decision-bearing signals to honor the findings-rarer-than-signals discipline. Doctrine and source-contract follow-ups are in audit.md.

Sources harvested

Codex, Claude Code, Gemini CLI, Hermes Agent, Pi Coding Agent, OpenClaw, Paperclip, Agent Zero, OpenHands, Flue

Run digest

Seven days, ten providers, one uncomfortable theme: the headline this week is not new capability. It is the gap between the policy an operator configured and the policy the runtime actually enforced -- and how many providers spent the window quietly closing it.

A Claude Code operator who wrote a Read-deny rule to hide a secret file was still leaking it through Glob and Grep. A Pi user authenticating against an OAuth server could be handed a verification URI that ran shell commands. A Hermes Docker dashboard could drop its auth because a heuristic misread the bind host. A Gemini CLI MCP blacklist could be bypassed. None of these were the operator's misconfiguration. The rules were written; the enforcement silently wasn't there. This week, across Claude Code, Gemini CLI, Pi, OpenHands, Hermes, and Flue, the same class of fix landed: restore the enforcement the operator already believed was in place.

The quieter, more forward-looking thread is the inverse of a gap-close: skills and plugins became governed, auditable, sometimes agent-activated resources across four providers in parallel -- Paperclip, OpenClaw, Flue, and Agent Zero. Capability that used to be an ambient default is becoming reviewable operating state.

Breaking Changes: Check These Before Upgrading

Claude Code 2.1.160--2.1.162: three permission-bypass gaps closed at once. Custom WebFetch permission rules now override the built-in preapproved-domain whitelist; Windows permission rules with backslashes or case-variant paths now match; and Read-deny rules now hide files from Glob and Grep results. The sharpest of the three is the last one: a file an operator denied for Read was still discoverable -- path and contents surfaceable -- through search tools, defeating the access-control intent. The attacker model is prompt injection or compromised task content steering the agent toward a denied domain or walled-off path. The fix needs no config change, only the upgrade, so the operator action is to upgrade and then re-audit whether any policy was silently bypassed on prior builds, especially on Windows and in any setup relying on Read-deny to hide secrets from search. The changelog ships this as an ordinary entry; treat it as the advisory it is.
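
To make the bug class concrete, here is a minimal sketch of applying one deny list to both direct reads and search hits. It is not Claude Code's implementation or rule syntax: the function names, the glob-style rules, and the always-case-insensitive matching (chosen to model the Windows case) are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def normalize(path: str) -> str:
    """Fold separator and case variance so one rule matches every spelling.

    Assumption: case-insensitive matching, as on a typical Windows volume.
    """
    return path.replace("\\", "/").lower()


def is_denied(path: str, deny_rules: list[str]) -> bool:
    """Return True when any (hypothetical, glob-style) deny rule matches."""
    norm = normalize(path)
    return any(fnmatchcase(norm, normalize(rule)) for rule in deny_rules)


def filter_search_results(paths: list[str], deny_rules: list[str]) -> list[str]:
    """Apply the same deny rules to search hits that gate direct reads."""
    return [p for p in paths if not is_denied(p, deny_rules)]
```

The design point is that every tool that can surface a path goes through the same `is_denied` check; a deny rule enforced only in the read path is the gap described above.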

Claude Code 2.1.160: execution-granting config writes now prompt even in acceptEdits mode. Two guardrails land together. acceptEdits mode now prompts before writing build-tool config that grants code execution (.npmrc, .yarnrc*, bunfig.toml, .bazelrc, .pre-commit-config.yaml, .devcontainer/), and the agent now prompts before writing shell startup files (.zshenv, .zlogin, .bash_login) and ~/.config/git/. Operators running acceptEdits or auto-leaning modes previously had a silent write path into files that execute on the next shell login, install, or commit -- the classic agent-persistence and supply-chain escalation vector. The prompt is the defense; blanket-allowing it re-opens the vector.

Pi: OAuth command injection and git-package path traversal closed. Commit ba6e529 validates OAuth verification URIs (rejecting non-HTTP(S) schemes) and launches the browser via spawn() instead of shell exec(), closing a path where a malicious OAuth server could inject $(id>/tmp/pwned)-style commands. Commit a98e087 rejects git URLs with .., null bytes, backslashes, or leading slashes at both parse and resolution time, blocking writes outside the package install root. The attacker is whoever controls the OAuth server or authors the git package; both fixes need no config change, only the upgrade.
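
The two fix shapes described above can be sketched as follows. This is an illustration in Python, not Pi's code (Pi's fix uses Node's spawn()); the function names, the xdg-open opener, and the exact rejection rules are assumptions.

```python
from urllib.parse import urlsplit


def validate_verification_uri(uri: str) -> str:
    """Accept only http(s) URIs with a host; reject everything else."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"refusing non-HTTP(S) verification URI: {uri!r}")
    return uri


def browser_argv(uri: str, opener: str = "xdg-open") -> list[str]:
    """Build an argument vector for subprocess.run(argv) with no shell.

    Because no shell parses the string, `$(...)` inside the URI stays inert.
    The opener name is a placeholder and varies by platform.
    """
    return [opener, validate_verification_uri(uri)]


def is_safe_git_path(path: str) -> bool:
    """Reject '..' segments, null bytes, backslashes, and leading slashes."""
    if not path or path.startswith("/") or "\\" in path or "\x00" in path:
        return False
    return ".." not in path.split("/")
```

Validation and shell-free launch are independent layers: a well-formed https URI can still carry shell metacharacters in its query string, which is why passing an argument list instead of a command string matters even after the scheme check.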

OpenHands main: three named CVEs. No tagged release fell in the window, but main closed CVE-2026-44492 (axios 1.16.0), CVE-2026-41238 (dompurify 3.4.0), and CVE-2026-42305 (dulwich 1.2.5). The first two are browser-facing (HTTP client and HTML/DOM sanitizer) and need a frontend rebuild and redeploy; the third is a backend git library and needs a poetry.lock re-resolve and image rebuild. Self-hosters pinning older lockfiles must bump manually.

Gemini CLI v0.45.0: MCP blacklist bypass fixed. The stable release bundles Termux relaunch/resize fixes, session-context filtering on history resume, and -- the security-bearing item -- a fix for a path where a blacklisted MCP tool or server could still be reached. Operators relying on MCP deny-lists for containment should upgrade before trusting the blacklist, and test that blacklisted tools are actually unreachable rather than assume full coverage.

Hermes v0.15.1: Docker insecure binding is now an explicit opt-in. The dashboard no longer infers insecure mode from the bind host; it requires HERMES_DASHBOARD_INSECURE=1 explicitly. This removes a silent path where a misread bind host dropped auth and exposed the dashboard to a network-adjacent attacker. Existing Docker and hosted setups must update env config before upgrading. The same patch fixes a v0.15.0 loopback-mode dashboard reload loop and restores MCP bare-command resolution (npx, npm, node) in Docker.
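
The difference between inferring insecure mode and requiring an explicit opt-in can be sketched in a few lines. Only the HERMES_DASHBOARD_INSECURE=1 variable comes from the release note; the function names are hypothetical, and treating "1" as the only accepted value is an assumption.

```python
def insecure_binding_allowed(env: dict[str, str]) -> bool:
    """Explicit opt-in: only the exact value "1" disables auth (assumed)."""
    return env.get("HERMES_DASHBOARD_INSECURE") == "1"


def auth_required(env: dict[str, str], bind_host: str) -> bool:
    """Decide whether the dashboard must enforce auth.

    bind_host is accepted but deliberately unused: the point of the change
    is that the bind address no longer influences the auth decision.
    """
    return not insecure_binding_allowed(env)
```

A deployment preflight could call `auth_required(dict(os.environ), host)` and fail the rollout when the result does not match what the operator intended for that environment.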

Paperclip v2026.529.0: first-admin claim is now the bootstrap gate. Unclaimed self-hosted deployments get a one-time browser claim to create the first admin. The flip side is a race: whoever completes the claim first becomes admin, so an attacker with network reach to a freshly stood-up instance could seize control before the legitimate operator. Claim promptly and restrict network exposure during the unclaimed window.

Hermes v0.15.0: Promptware defense, and a migration. The Velocity Release adds a built-in defense against Brainworm-class prompt-injection and closes 19 security-tagged issues. Operators running against untrusted content (web, repos, MCP output) should validate the defense against their own injection corpus rather than assume blanket coverage; novel vectors outside the known class may still pass.

Flue v0.9.0: a hard breaking migration. Routing imports move from @flue/runtime/app to @flue/runtime/routing, provider model values now require provider-id/model-id format, SDK mount paths derive from baseUrl, and persisted beta session state is rejected -- clear or migrate the store before upgrading or sessions fail to restore. Cloudflare Durable Object migrations are no longer auto-appended; the operator now owns them in the Wrangler config, and interrupted workflows no longer auto-retry.
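
A pre-upgrade lint for the new model-value format might look like the sketch below. The helper names are hypothetical, and splitting on the first slash (so that a model id may itself contain slashes) is an assumption rather than documented Flue behavior.

```python
def split_model_ref(value: str) -> tuple[str, str]:
    """Split a 'provider-id/model-id' value on the first slash."""
    provider_id, sep, model_id = value.partition("/")
    if not sep or not provider_id or not model_id:
        raise ValueError(f"expected provider-id/model-id, got {value!r}")
    return provider_id, model_id


def find_legacy_model_values(values: list[str]) -> list[str]:
    """Return the configured values that would fail the new format."""
    legacy = []
    for value in values:
        try:
            split_model_ref(value)
        except ValueError:
            legacy.append(value)
    return legacy
```

Running a check like this over existing config before upgrading turns a runtime failure into a reviewable list of values to fix.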

The Enforcement Gap, Six Ways

The thread that cuts across the watchlist is consistent enough to name plainly. In each case, a control the operator had reason to believe was active was not -- and the fix is the same shape: make the enforcement match the configuration.

The Claude Code cluster is the clearest statement of it. A Read-deny rule that didn't hide files from Glob/Grep, a WebFetch rule that didn't override the preapproved-domain list, and Windows path rules that silently didn't match on case or separator variance are three independent ways the same promise -- "the policy I wrote is enforced" -- was broken. The same release line also converts a silent config-write into a confirmation checkpoint for files that grant code execution, and corrects an over-broad managed-settings policy that was wrongly blocking legitimate third-party provider sessions.

Gemini CLI's MCP blacklist bypass is the same bug class at the tool layer: a deny-list that didn't deny. Its companion policy-file resilience fix closes a near-fail-open gap: a policy file that failed to persist (on cross-device container mounts) or failed to parse (corrupt TOML) could leave the agent running without the operator's intended policy in effect. Recovery now writes a .bak and rebuilds, which means a corrupted policy is silently reset to defaults. Re-verify the intended policy whenever a .bak appears.

Pi's quartet of hardening commits -- OAuth injection, git path traversal, auth files created at 0o600 instead of briefly world-readable, and extension cache moved out of world-accessible /tmp -- is the multi-user-host version of the same theme: close the windows where a control was assumed but a co-tenant could slip through. Flue's v0.9.1 WebSocket credential hardening strips query strings and fragments before persisting Cloudflare attachments so URL-carried handshake credentials are not retained, and OpenHands moved ACP provider credentials off the deprecated plaintext-style acp_env channel onto a cipher-protected secrets channel.
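
Two of the patterns named above, mode-on-create and stripping URL-carried credentials before persistence, can be sketched with the standard library. The function names are hypothetical, the code assumes a POSIX host, and using O_EXCL is a choice made for this sketch rather than something the source describes.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def write_private_file(path: str, data: bytes) -> None:
    """Create the file with mode 0o600 from the start.

    Passing the mode to os.open avoids the window where a file is created
    with default permissions and tightened afterwards. O_EXCL makes the
    call fail if the file already exists, so an older, looser file is
    never silently reused.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)


def strip_url_secrets(url: str) -> str:
    """Drop the query string and fragment before a URL is persisted."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))
```

For example, `strip_url_secrets("wss://example.com/socket?token=abc#frag")` yields `wss://example.com/socket`, so a handshake token carried in the query string never reaches storage.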

The operator takeaway is uncomfortable but actionable: an upgrade is not just a feature bump this week. For every provider above, the safe assumption is that some control you configured on the prior build was not holding, and the post-upgrade action is a re-audit, not a victory lap.

Skills and Plugins Become Governed State

Running against the gap-close current is a constructive one. Four providers, four surfaces, shipped the same move: agent capability stops being an ambient default and becomes reviewable, sometimes approvable, operating state.

Paperclip made company skills first-class resources with an install / reset / audit / export / assign CLI. The load-bearing verbs are audit and export -- which skills an agent holds becomes a queryable, exportable fact rather than implicit config -- and assign, which makes a capability grant a distinct, reviewable authority action.

OpenClaw's Skill Workshop inserts a human-in-the-loop gate: new skills enter a pending-proposal queue reviewed via CLI or Gateway before taking effect. A new skill_workshop agent tool lets agents file proposals themselves, which widens the surface proposals originate from -- so the operator decision is who may review and who may self-approve. Lax review re-opens the unreviewed-skill path.

Flue v0.9.2 went the other direction on activation authority: an activate_skill tool lets agents load full skill instructions on demand before matching work. The operator's visible control narrows to which skills are configured; the choice to activate moves to the agent. Workspace skills are reread on activation, so mid-session edits take effect.

Agent Zero v1.19 made Office, Desktop, and Editor plugins toggleable behind a protected plugin-state API -- a real authority lever that lets an operator disable powerful capabilities (Desktop computer-use especially) on deployments that should not have them. The release note describes a "protected" toggle endpoint but no auth model or role-based capability management, so treat it as a disable lever, not yet an audited capability register.

The shapes differ -- catalog audit, proposal approval, agent self-activation, capability toggle -- but the direction is one: the question "what can this agent do?" is becoming answerable by inspecting state rather than reading code or trusting defaults.

Control Plane, Runtime, Platform

Control plane saw the most movement, in two directions. The governance-of-capability cluster above (Paperclip skills, OpenClaw Skill Workshop, Agent Zero plugin toggles, Flue agent-activated skills) sits here, as does a steady relocation of authority onto standing credentials and cloud paths: Codex remote-exec API-key host registration, Hermes Bitwarden Secrets Manager replacing per-provider keys, Claude Code Auto Mode reaching Bedrock/Vertex/Foundry, and Codex models running under AWS IAM via Bedrock. Claude Code also made agent supervision more legible: claude agents --json now exposes a waitingFor field naming what a blocked session waits on (e.g. a permission prompt), plus a done/total fan-out progress counter.
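
A supervision script could consume that output along these lines. Only the waitingFor field name comes from the release note; the top-level array shape, the "id" key, and the function name are assumptions, so check the real output of claude agents --json before relying on this sketch.

```python
import json


def blocked_sessions(raw_json: str) -> list[tuple[str, object]]:
    """List (session id, waitingFor) for sessions that report a blocker.

    Assumes the JSON is an array of session objects with an "id" key;
    the waitingFor value is passed through untouched because its type
    is not specified in the source.
    """
    sessions = json.loads(raw_json)
    return [
        (session.get("id", "?"), session["waitingFor"])
        for session in sessions
        if session.get("waitingFor")
    ]
```

Feeding the result into an alerting or dashboard layer is what removes the need to open each session by hand to see why it stalled.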

Runtime carried most of the enforcement-gap closures -- the Claude Code config-write prompts, Pi's OAuth and path-traversal fixes, Flue's WebSocket credential stripping, Hermes's Promptware defense, OpenHands's dulwich CVE -- plus one notable posture reversal: Agent Zero reverted computer-use screenshots to durable chat-scoped storage, undoing its prior ephemeral-by-default stance. That improves audit trails but persists potentially sensitive on-screen content (credentials, PII, internal UIs) with no automatic redaction -- a data-at-rest exposure operators must scope and prune.

Platform was mostly steady-state plumbing: the OpenHands frontend CVE cluster, Gemini CLI's v0.45.0 stable bundle and an editor-spam-loop fix, OpenClaw's MiniMax M3 model support, and Flue's OpenTelemetry tracing package. Codex's Sites plugin -- in-app website/web-app creation and deployment, included by default in Business workspaces -- is the one platform item with a governance edge: a deploy capability may already be active without an explicit enablement step.

Provider Notes

Codex (CLI 0.135.0--0.136.0, iOS 1.2026.146) shipped named permission profiles with custom-config display and codex doctor diagnostics (0.135.0), a non-interactive installer for CI, plus remote-exec API-key host registration and thread archiving (0.136.0). The iOS app added an optional Face ID / passcode lock for Codex and an SSH-to-Windows client. Two integrations landed: the Sites plugin and Amazon Bedrock under AWS-managed auth and billing.

Claude Code (2.1.158--2.1.162) is the enforcement-gap headliner: the permission/deny-rule cluster, execution-granting config-write prompts, the managed-settings third-party-session unblock, agent-status observability, and Auto Mode reaching Bedrock/Vertex/Foundry for Opus 4.7/4.8.

Gemini CLI (v0.44.1--v0.46.0-preview) shipped the v0.45.0 stable bundle with the MCP blacklist fix and Termux hardening, policy-file resilience, and a server-flag-gated Gemini 3.5 Flash GA rollout that decouples model-in-use from client version. A CI change to pull_request_target on the PR-size labeler is low-risk as written (it only reads line counts) but removes the structural safety of pull_request -- any future edit adding fork-code checkout becomes immediately dangerous.

Hermes Agent (v0.15.0--v0.15.2 + post-release commits) is the Velocity Release: a 76% run_agent.py refactor, Kanban evolving into a multi-agent orchestration platform with auto-decomposition, swarm topology, and worktree-per-task, Promptware defense, and Bitwarden Secrets Manager. The v0.15.1 patch makes Docker insecure binding an explicit opt-in and fixes a dashboard reload loop; June 3 commit waves hardened installer self-update, Windows/WSL2 PTY and schtasks handling, and desktop session management.

Pi coding agent (commits to main) shipped a security-hardening cluster: OAuth launch hardening, git path-traversal rejection, auth-file mode-on-create (0o600 instead of briefly world-readable), extension-cache isolation out of world-accessible /tmp, and HTML-export XSS sanitization. Alongside, model-catalog maintenance removed stale Codex entries and added Mistral Devstral 2, Open Mistral Nemo, and Claude Opus 4.8. No reliably in-window tagged release landed; the security work shipped as commits to main.

OpenClaw (2026.5.31-beta.3 through 2026.6.1 stable) shipped the Skill Workshop proposal workflow, interrupted-tool-call recovery, bounded request timers (operators should re-evaluate SLOs against them), enhanced plugin isolation, MiniMax M3, and Tailscale Serve service-name binding with SQLite-backed state migration for iMessage and plugin-install tracking.

Paperclip (v2026.529.0) shipped the skills CLI/catalog, the first-admin claim flow, inline document annotations, per-user sidebar controls, and live Claude model discovery from the UI.

Agent Zero (v1.19) renamed Remote Link to Remote Control with selectable tunnel providers, made Office/Desktop/Editor plugins toggleable behind a protected API, reverted screenshots to durable chat-scoped storage, unified OAuth account management, and hardened Xpra desktop control.

OpenHands (main, no tagged release) shipped the three-CVE remediation cluster, the ACP-credentials-to-secrets-channel move, a cascade-delete change on DELETE /api/organizations (deleting an organization now also deletes the requesting user if it is their only org), a git-proxy capability, and a LiteLLM 1.84.1 upgrade.

Flue (Tier 2; v0.8.1--v0.9.2) shipped OpenTelemetry tracing, the v0.9.0 breaking migration, WebSocket credential hardening, operator-owned workflow-run retention (the implicit 50-run prune is gone), and autonomous activate_skill.

What To Try

Claude Code operators: upgrade to 2.1.162 and re-audit any allow/deny or Read-deny policy that ran on older builds. Then wire waitingFor and the fan-out progress counter into supervision tooling so stuck-agent triage stops requiring a human to open each session.
Paperclip operators: use the skills CLI to audit and export which agents hold which skills, and claim any freshly stood-up self-hosted instance immediately.
Codex operators on iOS: enable the Face ID / passcode lock before treating mobile as a trusted access surface.
Hermes operators: queue a decomposable task on the new multi-agent Kanban and confirm the orchestrator spawns the expected sub-agents in isolated worktrees before trusting it with real work. Set HERMES_DASHBOARD_INSECURE=1 only where insecure binding is genuinely intended.
Agent Zero operators: disable the Office/Desktop/Editor plugins you do not need, and review retention/access controls for the now-durable computer-use screenshots before capturing sensitive screens.
Gemini CLI maintainers: review the pull_request_target labeler workflow to confirm it only reads PR metadata and never checks out fork code under the elevated token.

What Remains Uncertain

Codex remote-exec key lifecycle: scope, rotation, and revocation for the approved-host API-key registration are undocumented. Whether a leaked key grants persistent remote exec is unverified.
Codex iOS SSH trust handling: host-key verification, key storage, and scoping of the iOS SSH-to-Windows client are not described.
Gemini CLI model routing: with Flash GA gated server-side, the model in use is no longer determined by client version alone -- backend flag state is now part of the audit surface.
OpenClaw plugin-isolation depth: the release note asserts tighter isolation but does not describe the boundary's depth, so operators cannot verify it from the receipt.
Hermes Promptware coverage: the defense targets a known attack class; novel injection vectors outside Brainworm patterns may still pass. Validate against your own corpus.
Flue persisted-session migration: v0.9.0 rejects pre-upgrade session state with no automated migration path; a self-scripted migration could reintroduce stale, unredacted state.
OpenHands org-deletion blast radius: operators on the cascade-delete change should enforce backups before any DELETE /api/organizations, since a sole-org delete now removes the user identity too.
