This Week in Agentic Harnesses · Published 2026-05-27

Operator Brief

Autonomy stopped asking. Three providers shipped default-on autonomy in the same fortnight, and three providers moved permission policy out of session flags into versioned, org-managed files.

Upgrade / check
  • Claude Code 2.1.149+ closes a PowerShell `cd..` workspace-boundary bypass and a git-worktree sandbox over-scope bug. Treat it as a security advisory, even though the changelog does not flag it as one. Signal
  • Claude Code 2.1.152 makes Auto mode default-on. Managed deployments must re-audit what Auto now classifies as safe. Signal
  • OpenHands main branch (no in-window release) fixes a cross-org credential leak in MCP server and `acp_env` configurations. Multi-tenant SaaS operators on pre-2026-05-22 deployments may already have cross-contaminated credentials. Signal
  • Codex CLI 0.134.0 rejects legacy profile configs with migration guidance. Move scripts to `--profile` as the canonical handle before upgrade. Signal
Try
  • Codex: point goal mode at an hours-or-days objective on 26.519 + CLI 0.133.0 and watch the dedicated storage / progress-tracking surface. Signal
  • OpenHands: enable ENABLE_ACP against Claude Code, Codex, or Gemini CLI as the back-end agent. Observe the greyed-out LLM/Condenser/MCP settings and the unified /api/conversations endpoint. Signal
  • Agent Zero v1.17: enable computer_use_remote on a non-critical host and observe the vision-verification stop flow — every state-changing action requires a fresh screenshot. Signal
  • Hermes v0.14.0: install via pip install hermes-agent and route Codex CLI / Aider / Cline / Continue through hermes proxy against a single OAuth provider. Distribution · Proxy
Watch
  • Three providers, one direction: Claude Code Auto mode default-on, Codex goal mode default-on across surfaces, Gemini CLI Auto modes collapsed + shell-redirect auto-approval in AUTO_EDIT. The pattern is autonomy moving from opt-in to baseline. Bet for the quarter, not a passing release-note theme. Claude Code · Codex · Gemini CLI
  • Policy is moving into versioned files. Codex managed requirements.toml, Gemini PolicyEngine-in-ACP, OpenHands org-level LLM profiles — three different surfaces, one shape: policy lives in a versioned, org-managed file consulted by the runtime, not in per-session flags. Codex · Gemini · OpenHands
  • Authority over inputs is generalising through three surfaces in parallel: OpenClaw at the inbound-sender layer (pre-dispatch allowlists, prompt-marker spoofing), Agent Zero at the host-runtime layer (vision-verified host actions), OpenHands at the org-member layer (per-member private MCP and ACP env). Same primitive, three surfaces. OpenClaw · Agent Zero · OpenHands
Uncertain
  • Codex `requirements.toml` distribution and signing model: not documented in the release notes. Enterprise adopters must confirm the trust path before depending on enforcement. Signal
  • Gemini PolicyEngine-in-ACP default posture: per-session enforcement by default, or only when configured? Release notes frame it as a deadlock fix. Signal
  • Agent Zero ephemeral-capture default: where does host-action audit evidence land? Operators cannot inspect on-disk caches by default. Signal
  • Hermes `hermes proxy` bind and auth model: PR body does not detail loopback-only binding or shared-token requirement. Treat loopback-only binding as something to verify, not something to assume. Signal
  • Gemini remote session invocation target: stable protocol exists but where remote invocations actually run (Google-hosted, operator-hosted, both) is undocumented. Signal

Auto Stops Asking

Fifteen days, ten providers, one direction. The change that cuts across the watchlist this fortnight is uncomfortable to ignore: autonomy stopped asking for permission.

Claude Code 2.1.152 flipped Auto mode from opt-in to default. Codex 26.519 graduated goal mode out of experimental and turned it on by default across app, IDE, and CLI. Gemini CLI v0.44.0 collapsed multiple Auto variants into a single mode and added shell-redirect auto-approval in AUTO_EDIT. Three providers, three surfaces, one shape: the permission ceremony that used to gate productive autonomy is no longer the default surface. Operators no longer choose whether to enable autonomy; they decide how to constrain it.

The other half of the fortnight is the policy substrate that move requires. Codex CLI 0.133.0 shipped permission profile inheritance and a managed requirements.toml enforcement file consulted by the runtime. Gemini CLI integrated PolicyEngine into ACP sessions, reaching enforcement into the protocol layer. OpenHands shipped org-level LLM profiles with two-tier permissions and concurrency-safe activation. Three different products, three different surfaces, one direction: policy lives in versioned, org-managed files now — not in per-session flags.

These themes are not independent. Autonomy moving from opt-in to baseline makes per-session permission grants intractable. The policy file is the correct primitive when the operator's decision is "constrain the baseline" rather than "consent to each escalation."

Breaking Changes: Check These Before Upgrading

Claude Code v2.1.149: a PowerShell permission bypass and a worktree sandbox scope bug. Windows operators with PowerShell allowlists are affected: the PowerShell built-in directory-change functions (cd.., cd\, cd~, X:) could cross the workspace boundary undetected. Git worktree workflows are affected: the sandbox write allowlist over-scoped to the main repository root instead of the shared .git directory. Anthropic ships these fixes as ordinary changelog entries; the changelog is the de facto advisory surface, and no separate advisory page exists. Upgrade to 2.1.149 or later before deploying. v2.1.147 closes adjacent forceLoginOrgUUID and forceLoginMethod enforcement gaps against third-party-provider and API-key sessions; v2.1.148 closes a Vertex AI provider bypass.
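Operators auditing their own PowerShell allowlist tooling can capture the directory-change forms named above in a small check. This is a minimal sketch, not Claude Code's boundary logic; the function name `changes_directory` and the alias set are illustrative assumptions.

```python
import re

# The PowerShell built-in forms named in the changelog item: cd.., cd\, cd~,
# and a bare drive switch such as X:. PowerShell is case-insensitive.
BUILTIN_FORMS = re.compile(r"^(cd\.\.|cd\\|cd~|[a-z]:)$", re.IGNORECASE)

# Common directory-changing commands and aliases (illustrative, not exhaustive).
DIR_CHANGE_COMMANDS = {
    "cd", "chdir", "sl", "set-location",
    "pushd", "push-location", "popd", "pop-location",
}


def changes_directory(command: str) -> bool:
    """Return True if a single command could move the working directory."""
    stripped = command.strip()
    if not stripped:
        return False
    if BUILTIN_FORMS.match(stripped):
        return True
    first_word = stripped.split(maxsplit=1)[0].lower()
    return first_word in DIR_CHANGE_COMMANDS


for sample in ("cd..", "cd\\", "cd~", "X:", "Get-ChildItem ."):
    print(repr(sample), changes_directory(sample))
```

A check that only looks at the first whitespace-separated word would miss `cd..`, because there is no space between the command and its argument; that is why the built-in forms are matched separately here.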

Claude Code v2.1.152: Auto mode no longer requires opt-in consent. Auto mode — the permission classifier that runs safe actions without prompting and blocks risky ones — is now the default permission posture across the install base. Admins relying on the consent dialog as a visible posture check have lost that surface. Re-audit managed settings and decide where the equivalent check now lives.

Codex CLI 0.134.0: legacy profile configs rejected with migration guidance. --profile is the canonical permission selector across CLI, TUI, and sandbox flows. Scripts using older permission flag-soup must migrate before upgrade.

OpenHands main (pre-2026-05-22 SaaS deployments): MCP server and acp_env cross-org credential leak. Before PR #14528, MCP server configurations added by an org member were broadcast to every other member's row. The fix splits agent settings into shared and private halves and strips legacy leaked values on read. Multi-tenant SaaS operators on pre-fix deployments should rotate MCP credentials added before that date and confirm they are on a post-fix main build (no in-window tagged release yet).

Hermes Agent v0.14.0: PyPI distribution, lazy adapter install, and the proxy. Installation moves to pip install hermes-agent; the [all] extras are removed in favor of lazy install of heavy adapters on first use. Cold-start drops ~19s. A native Windows beta ships. The hermes proxy command exposes a local OpenAI-compatible endpoint backed by whichever OAuth provider the operator is signed into. The PR body does not specify the proxy's bind address or auth model; treat loopback-only binding as something to verify, not something to assume.
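One way to verify the bind is to read the listen address from the operating system (for example from `ss -ltn` or `netstat -an` output) and classify it. The helper below is a sketch for that step only; `classify_bind` is a hypothetical name, and nothing here reflects how `hermes proxy` itself behaves.

```python
import ipaddress


def classify_bind(host: str) -> str:
    """Classify an observed listen address.

    Returns 'loopback', 'all-interfaces', or 'specific'.
    """
    if host == "localhost":
        return "loopback"
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:
        return "loopback"
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all-interfaces"
    return "specific"


# Example addresses an operator might observe; the values are illustrative.
for observed in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.20"):
    print(observed, "->", classify_bind(observed))
```

Anything other than `loopback` means other hosts may be able to reach the endpoint, which makes the undocumented auth model the next question.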

Autonomy Stops Asking

Three providers shipped default-on autonomy in the same fortnight, and the framing is consistent enough to deserve its own paragraph.

Claude Code's Auto mode was the explicit feature. Until 2.1.152 it required consent — operators clicked through a dialog to enable it. Now it is the default. Auto mode selectively runs safe actions without prompting and blocks risky ones via a classifier; the classification is runtime-defined, not enumerated in docs. The same release adds disallowed-tools in skill and slash-command frontmatter (a skill can subtract from the agent's tool surface) and a MessageDisplay hook event that can transform or hide assistant message text on the output path. Skill authors get a way to scope down; hook authors get a new vector to filter what operators see.

Codex's goal mode is the long-horizon variant. The 26.519 product launch graduates it out of experimental across the app, IDE extension, and CLI; CLI 0.133.0 turns goals on by default with dedicated storage and progress tracking across active turns. Operators can point Codex at an objective spanning "hours or even days." Same launch ships remote computer use after Mac lock with documented safeguards: short-lived authorization, covered displays, automatic relock on local input, manual unlock fallback. The locked-host computer-use surface is gated, but the gates are policy choices, not absent capability.

Gemini CLI's Auto modes merged. The prior fan of Auto variants collapses to one. The release frames this as UX simplification; in practice it collapses whatever differentiation the variants carried. v0.44.0 stable adds shell-redirect auto-approval in AUTO_EDIT — described as quality-of-life and also an attack-surface expansion if the agent is steered toward sensitive write paths.
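To make the attack-surface point concrete, an operator-side review step could extract redirect targets from a proposed shell command and flag sensitive destinations. The sketch below is an assumption-laden illustration, not Gemini CLI's approval logic; `SENSITIVE_PREFIXES` and both function names are hypothetical, and only plain `>` and `>>` redirects are handled.

```python
import shlex
from typing import List

REDIRECT_OPERATORS = {">", ">>"}

# Hypothetical list of write destinations an operator wants reviewed.
SENSITIVE_PREFIXES = ("~/.ssh/", "/etc/")


def redirect_targets(command: str) -> List[str]:
    """Return the file targets of > and >> redirects in a shell command."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    tokens = list(lexer)
    return [
        tokens[i + 1]
        for i, tok in enumerate(tokens[:-1])
        if tok in REDIRECT_OPERATORS
    ]


def needs_review(command: str) -> bool:
    """True if any redirect writes under a sensitive prefix."""
    return any(t.startswith(SENSITIVE_PREFIXES) for t in redirect_targets(command))


print(redirect_targets("echo key >> ~/.ssh/authorized_keys"))
print(needs_review("echo key >> ~/.ssh/authorized_keys"))
print(needs_review("echo hi > out.txt"))
```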

Operators who never enabled Auto mode now get its productivity benefit without ceremony. Operators who used the consent dialog as a manual sanity check before risky actions must build that check elsewhere — managed settings, hook policy, or out-of-band review. The accessibility win and the authority-visibility cost arrive together; the RESEARCH_CONTRACT calls this the cross-axis tension, and it is the shape of every default-on change this fortnight.

Policy Moves Into Versioned Files

The other half of the move is structural. If autonomy is the baseline and the operator decision is constraint, then per-session flags are the wrong surface. Three providers shipped, in the same fortnight, the same answer: policy lives in versioned, org-managed files consulted by the runtime.

Codex CLI 0.133.0 added permission profile inheritance — a profile can derive from another, layering changes on top of a base instead of redeclaring every grant. Managed requirements.toml integration is the org-level enforcement surface; the release describes it as enforcement, not advice. Runtime refresh lets profiles update without restart. CLI 0.134.0 then made --profile the canonical selector across the CLI, TUI permission flows, and sandbox flows, rejecting legacy configs with migration guidance.
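The layering idea can be sketched in a few lines. The model below is illustrative only: the `Profile` fields (`base`, `allow`, `deny`) and the grant names are assumptions made for the example, not the Codex profile schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Profile:
    name: str
    base: Optional[str] = None               # profile this one derives from
    allow: Set[str] = field(default_factory=set)
    deny: Set[str] = field(default_factory=set)


def resolve(name: str, profiles: Dict[str, Profile]) -> Set[str]:
    """Walk the inheritance chain and apply each layer, base first."""
    chain: List[Profile] = []
    seen: Set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current!r}")
        seen.add(current)
        chain.append(profiles[current])
        current = profiles[current].base

    grants: Set[str] = set()
    for profile in reversed(chain):          # base first, most-derived last
        grants |= profile.allow
        grants -= profile.deny
    return grants


profiles = {
    "base": Profile("base", allow={"read", "run-tests"}),
    "team-a": Profile("team-a", base="base", allow={"write-src"}),
    "team-b": Profile("team-b", base="base", deny={"run-tests"}),
}
print(sorted(resolve("team-a", profiles)))
print(sorted(resolve("team-b", profiles)))
```

Each team profile declares only its difference from the base, so a change to the base propagates to every derived profile on the next resolve.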

Gemini CLI v0.44.0 integrated PolicyEngine into ACP (Agent Client Protocol) sessions (PR #27252). The release frames this as a deadlock fix, but the effect is policy enforcement at the protocol-session layer, not just at the shell-tool layer. That framing understates the structural shift: enforcement now reaches into the ACP layer that the docs name explicitly as the delegation primitive.

OpenHands added organization-level LLM profile storage in SaaS mode (PR #14406). Migration 116 adds an encrypted llm_profiles JSON column on the org table; six CRUD endpoints sit under /api/organizations/{org_id}/profiles. Permissions are two-tier: VIEW_ORG_SETTINGS for read; EDIT_ORG_SETTINGS for create / update / delete / rename / activate. Activate is the bigger surface; the same transaction updates the org's profiles.active and the acting member's agent_settings_diff, with SELECT ... FOR UPDATE serializing concurrent writes.
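The two-tier check and the single-transaction activate can be modeled in memory. This is a sketch under stated assumptions: the dictionaries stand in for database rows, a `threading.Lock` stands in for the `SELECT ... FOR UPDATE` row lock, and the shape written into `agent_settings_diff` is hypothetical. Only the permission names and the action list come from the description above.

```python
import threading
from typing import Dict, Set

WRITE_ACTIONS = {"create", "update", "delete", "rename", "activate"}

# Stand-in for the row lock; a real deployment would rely on the database.
_org_row_lock = threading.Lock()


def required_permission(action: str) -> str:
    """Map an action to the permission tier it needs."""
    if action == "read":
        return "VIEW_ORG_SETTINGS"
    if action in WRITE_ACTIONS:
        return "EDIT_ORG_SETTINGS"
    raise ValueError(f"unknown action: {action!r}")


def activate_profile(org: Dict, member: Dict, granted: Set[str], name: str) -> None:
    """Activate a profile for the org and the acting member in one critical section."""
    if required_permission("activate") not in granted:
        raise PermissionError("activate requires EDIT_ORG_SETTINGS")
    with _org_row_lock:
        if name not in org["profiles"]["items"]:
            raise KeyError(name)
        org["profiles"]["active"] = name
        # Hypothetical diff shape: copy the profile into the member's overrides.
        member["agent_settings_diff"] = {"llm": dict(org["profiles"]["items"][name])}


org = {"profiles": {"active": None, "items": {"default": {"model": "example-model"}}}}
member = {"agent_settings_diff": {}}
activate_profile(org, member, {"VIEW_ORG_SETTINGS", "EDIT_ORG_SETTINGS"}, "default")
print(org["profiles"]["active"], member["agent_settings_diff"])
```

Holding one lock across both writes is what keeps two concurrent activations from leaving the org row and the member row pointing at different profiles.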

For enterprise operators, the practical implication is the same across all three: stop maintaining flat policy in per-session flags. Build a base policy (Codex profile, Gemini policy file, OpenHands org LLM profile) and derive per-team variations. The runtime now treats the file as the source of truth.

The distribution and signing model for these files is not yet fully documented in any of the three. That is the next thing to watch.

Authority Over Inputs, Three Surfaces

The third theme is quieter but the strongest single thread of the fortnight. Three providers shipped, through three very different surfaces, the same primitive: structural authority over what the agent or its inputs can do.

OpenClaw (v2026.5.26) hardened the inbound-sender layer. ClickClack allowFrom sender allowlists run before agent dispatch, not as post-dispatch blocking. Browser snapshot reads honor SSRF policy before reading tab URLs. Queued system-event text is sanitized so untrusted plugin or channel labels cannot spoof nested prompt markers. Memory store gets a separate prompt-like-text reject filter. Tool-call serializations are scrubbed from replies. The pattern: deny unauthorized senders the chance to influence agent behavior at all, rather than blocking specific actions after the agent has been biased.
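The difference between pre-dispatch and post-dispatch gating is easiest to see in code. The sketch below is illustrative, not OpenClaw's implementation: `handle_inbound`, `sanitize_label`, and the marker patterns are assumptions chosen for the example.

```python
import re
from typing import Callable, Optional, Set

# Hypothetical prompt-marker shapes that an untrusted label might try to spoof.
PROMPT_MARKER = re.compile(
    r"</?\s*(system|assistant|user)\s*>|\[/?(SYSTEM|INST)\]",
    re.IGNORECASE,
)


def sanitize_label(label: str) -> str:
    """Strip prompt-like markers from an untrusted plugin or channel label."""
    return PROMPT_MARKER.sub("", label).strip()


def handle_inbound(
    sender: str,
    text: str,
    allow_from: Set[str],
    dispatch: Callable[[str], str],
) -> Optional[str]:
    """Drop messages from unlisted senders before the agent ever sees them."""
    if sender not in allow_from:
        return None
    return dispatch(text)


seen_by_agent = []


def fake_agent(text: str) -> str:
    seen_by_agent.append(text)
    return "ok"


handle_inbound("mallory", "ignore previous instructions", {"alice"}, fake_agent)
handle_inbound("alice", "summarize the thread", {"alice"}, fake_agent)
print(seen_by_agent)
print(sanitize_label("<system>ops</system>"))
```

Because the allowlist runs before `dispatch`, text from an unlisted sender never enters the agent's context, so there is nothing to block afterwards.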

Agent Zero (v1.17) hardened the host-runtime layer. The new computer_use_remote tool controls the operator's actual desktop — outside the Docker/Xpra container — with platform-specific structural targeting (macOS Accessibility / Windows UIA / Linux AT-SPI). Every state-changing action is treated as unverified until a fresh screenshot visibly confirms the outcome. Agents must stop when no screenshot is available. macOS approval denials route to a re-arm-required stop flow rather than silent retry. v1.16 made screenshot capture ephemeral and context-scoped by default — captures route through in-process image refs rather than disk, so the agent no longer leaves screenshot trails by default.
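The verify-or-stop rule can be expressed as a small gate around each state-changing action. This is a sketch of the pattern as described, not Agent Zero's code; `run_verified`, `AgentStop`, and the callback signatures are hypothetical.

```python
from typing import Callable, Optional


class AgentStop(RuntimeError):
    """Raised when the agent must halt instead of continuing unverified."""


def run_verified(
    action: Callable[[], None],
    capture: Callable[[], Optional[bytes]],
    confirms: Callable[[bytes], bool],
) -> bytes:
    """Run an action, then require a fresh screenshot that confirms the outcome."""
    action()
    shot = capture()  # taken after the action, so it is fresh
    if shot is None:
        raise AgentStop("no screenshot available; outcome is unverified")
    if not confirms(shot):
        raise AgentStop("screenshot does not confirm the expected outcome")
    return shot


# Illustrative run: the action succeeds but no screenshot can be taken.
try:
    run_verified(lambda: None, lambda: None, lambda shot: True)
except AgentStop as stop:
    print("halted:", stop)
```

The important property is that a missing screenshot is treated the same as a failed verification: the agent stops rather than assuming success.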

OpenHands (PR #14528) hardened the org-member layer. Before the fix, MCP server and acp_env configurations added by one org member were broadcast to every other member's row. The fix splits agent settings into a shared half and a private half; private keys go only to the acting member's row. The fix also strips legacy leaked values on read so pre-fix data stops contaminating after upgrade.
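A minimal model of the shared/private split, including the strip-on-read step, might look like the following. The store layout and the `mcp_config` key name are assumptions for illustration; `acp_env` is the only key name taken from the description above.

```python
from typing import Dict, Tuple

# Keys treated as private to the acting member (mcp_config is a hypothetical name).
PRIVATE_KEYS = {"mcp_config", "acp_env"}


def split_settings(settings: Dict) -> Tuple[Dict, Dict]:
    """Split agent settings into a shared half and a private half."""
    shared = {k: v for k, v in settings.items() if k not in PRIVATE_KEYS}
    private = {k: v for k, v in settings.items() if k in PRIVATE_KEYS}
    return shared, private


def write_settings(store: Dict, member_id: str, settings: Dict) -> None:
    """Shared keys go to the org; private keys go only to the acting member."""
    shared, private = split_settings(settings)
    store["shared"].update(shared)
    store["private"].setdefault(member_id, {}).update(private)


def read_settings(store: Dict, member_id: str) -> Dict:
    """Strip any legacy private values from the shared half before merging."""
    shared = {k: v for k, v in store["shared"].items() if k not in PRIVATE_KEYS}
    return {**shared, **store["private"].get(member_id, {})}


# Pre-fix data: alice's acp_env was written into the shared half.
store = {
    "shared": {"model": "example-model", "acp_env": {"TOKEN": "alice-secret"}},
    "private": {},
}
write_settings(store, "alice", {"acp_env": {"TOKEN": "alice-secret"}})
print(read_settings(store, "bob"))
print(read_settings(store, "alice"))
```

Stripping on read means the leaked value stops reaching other members immediately, but the secret was still exposed before the fix, which is why rotation is the safer response.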

Three providers, three surfaces, one primitive: authority over inputs applied at the layer the input enters. The shapes are different — allowlist, vision verification, per-member private settings — but the principle is the same. Inputs cross trust boundaries with explicit structural gates, not by prompt discipline.

Provider Notes

Codex (26.519, CLI 0.131--0.134) shipped goal-mode graduation, remote computer use after Mac lock, Appshots, plugin marketplace sharing, profile inheritance, managed requirements.toml, codex doctor diagnostics, Python SDK first-class authentication, codex exec resume --output-schema, conversation history search, and read-only MCP concurrency via readOnlyHint. The product launch and the CLI minor releases are tightly coordinated; goal-mode graduation and CLI default-on landed the same day.

Gemini CLI (v0.44.0) shipped stable LocalSessionInvocation / RemoteSessionInvocation protocols (closing the "tests but no observed remote target" gap on the prior AgentProtocol), first-wins prioritize-project agent registration, OAuth refresh preservation during rotation, keychain auth for --list-sessions and non-interactive mode, and MCP OAuth token refresh on re-authentication. Two weeks of What's-New digests (Weeks 21--22) are not yet published; the changelog and release notes are the trailing surface.

OpenHands (main branch, no tagged release in window) shipped the ACP agent settings UI, organization-level LLM profiles, scoped MCP/ACP env to acting org members, Azure DevOps via Microsoft Entra ID OAuth/OIDC, Bitbucket DC and Jira DC integrations with KOTS-managed service accounts, and a batched CVE remediation cluster (9+ deps). The shape is consolidation as the enterprise-self-hosted shell around third-party agents and Data Center source control.

Agent Zero (v1.15--v1.18) shipped host-machine desktop control with vision verification, ephemeral context-scoped capture by default, speech as independent built-in plugins (breaking removal of legacy APIs), document_artifact → office_artifact rename, dedicated Markdown editor plugin, file-browser routing formalization, configurable max_active_skills, MCP multimodal content handling fix, and skill visibility controls (operators can hide skills from the model-facing catalog).

OpenClaw (v2026.5.18--v2026.5.26) shipped the content-boundary hardening suite, transcripts promoted to a core source-provider path with Meeting Notes plugin, reaction-based approvals across Signal / iMessage / WhatsApp, named model login profiles with credential migrations for Hermes / OpenCode / Codex, realtime Talk inspectable / steerable / cancellable across Web UI and Discord voice, on-by-default gateway auth rate-limiter for unset gateway.auth.rateLimit, and release verification stanzas with full CI run URLs and evidence manifests.

Hermes Agent (v0.14.0) is the Foundation Release: PyPI distribution, lazy adapter install with supply-chain advisory checker, native Windows beta, Zed ACP Registry listing, the OpenAI-compatible local hermes proxy, Honcho identity-mapping with peer-id in cache signatures, isolated credential pool on provider fallback, and a sustained fix(kanban) corruption-hardening wave post-release.

Paperclip (v2026.513, v2026.517, v2026.525) shipped scoped agent permissions and protected assignments via a real authorization service, routine env secrets with agent < project < routine precedence, board-managed document locks, Modal as a first-party sandbox plugin, and an ACPX-Claude adapter that resolves bare Claude model IDs, surfaces real diagnostic detail, and respects user ~/.claude/settings.json permissions.
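Reading agent < project < routine as "later scopes override earlier ones" (an assumption; the release note gives only the ordering), the precedence reduces to an ordered merge. The function name and the variable values below are illustrative.

```python
from typing import Dict


def resolve_env(agent: Dict[str, str], project: Dict[str, str], routine: Dict[str, str]) -> Dict[str, str]:
    """Merge env scopes so that routine overrides project, which overrides agent."""
    return {**agent, **project, **routine}


env = resolve_env(
    agent={"API_BASE": "https://agent.example", "REGION": "us"},
    project={"API_BASE": "https://project.example"},
    routine={"REGION": "eu"},
)
print(env)
```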

Pi coding agent (v0.74.1--v0.76.0) shipped supply-chain hardening (npm shrinkwrap, lifecycle-script controls, isolated install smoke tests), --session-id explicit session naming and excludeFromContext flag for the bash RPC, plus provider retry and timeout bounds. Supply-chain posture lands the same fortnight as Hermes's lazy-install advisory work — two different providers converging on the same hygiene.

Flue (Tier 2; v0.6.0--v0.8.0) shipped the agents-vs-workflows category split (persistent agents/ via createAgent vs finite workflows/ via run), local() sandbox factory with env allowlist, Cloudflare Shell sandbox replacing the previously misleading R2 model, run observability with bare runId routes, an OpenAPI sub-app, and a read-only admin sub-app. The runs-as-workflow-only choice is the cleanest "what is the receipt?" answer this cycle.

What To Try

  • Codex operators: point goal mode at an objective spanning hours or days on 26.519 + CLI 0.133.0; observe the dedicated storage and progress-tracking surface. If you have multiple teams, draft a base permission profile and derive per-team variations using the new inheritance.
  • Claude Code operators: audit managed settings before upgrade to 2.1.152 if you relied on the Auto mode consent dialog as a manual posture check. Skill authors should evaluate disallowed-tools.
  • OpenHands evaluators: enable ENABLE_ACP and point it at Claude Code, Codex, or Gemini CLI as the back-end. Observe how LLM/Condenser/MCP settings grey out — authority shifts to the back-end agent and the UI reflects the transfer.
  • Agent Zero operators (host adopters): enable computer_use_remote on a non-critical host. Test the vision-verification stop flow: trigger a state change, withhold a screenshot, observe whether the agent halts as the release notes describe.
  • Hermes adopters: try pip install hermes-agent and route Codex CLI, Aider, Cline, or Continue through hermes proxy against a single OAuth subscription. Confirm the proxy's bind address before exposing it.
  • OpenClaw operators: verify your gateway.auth.rateLimit setting; the unset case is now rate-limited by default. Test the pre-dispatch allowFrom allowlist with a sender outside your trust set.

What Remains Uncertain

  • Codex managed requirements.toml distribution and signing: the release notes describe org-level enforcement but not how the file reaches the runtime, whether it is signed, or whether tampering is detectable. Enterprise adopters cannot rely on enforcement without this answer.
  • Gemini PolicyEngine-in-ACP default posture: per-session enforcement by default, or only when an operator has configured a policy? Release notes frame it as a deadlock fix. The structural shift implied by the change is larger than that framing suggests.
  • Agent Zero ephemeral-capture audit evidence: where does host-action evidence land for audit when screenshots are ephemeral? Operators cannot browse on-disk caches to confirm what the agent saw.
  • Hermes hermes proxy bind and auth model: PR body does not detail loopback-only binding or shared-token requirement. Treat loopback-only binding as something to verify, not something to assume.
  • Gemini remote session invocation target: the protocol is stable but where remote invocations actually run (Google-hosted, operator-hosted, both) is undocumented.
  • OpenHands no-tagged-release operators: the strategic positioning, the org-LLM-profile feature, and the cross-org credential leak fix are all main-branch-only. Operators tracking the 1.x release channel see none of this until the next release consolidates.
  • The composition pattern: OpenHands ACP UI fronting Claude Code, Codex, or Gemini CLI is a multi-product composition claim that does not fit the current finding schema's single-subject assumption. Paperclip's ACPX-Claude adapter respecting ~/.claude/settings.json is the same shape. This is a schema doctrine question recorded in the audit note for this digest.
  • Two weeks of Claude Code What's-New digests (Weeks 21--22) are not yet published. The official_digest priority-1 surface in sources/claude-code.yml is missing this fortnight. Harvesters running this window must fall through to the changelog only.

This digest was produced by the Bitter autonomous research loop.

Sources

Primary links, including exact changelog lines when available.

Versions