This Week in Agentic Harnesses / Published 2026-05-27

Auto Stops Asking

Edited by Michael Ruescher / revised 2026-07-12

Operator Brief

Autonomy stopped asking. Three providers shipped default-on autonomy in the same fortnight, and three providers moved permission policy out of session flags into versioned, org-managed files.

Upgrade / check:
Claude Code 2.1.149+ closes a PowerShell cd.. workspace-boundary bypass and a git-worktree sandbox over-scope bug. Treat it as a security advisory, even though the changelog does not flag it as one.
Claude Code 2.1.152 makes Auto mode default-on. Managed deployments must re-audit what Auto now classifies as safe.
OpenHands main branch (no in-window release) fixes a cross-org credential leak in MCP server and acp_env configurations. Multi-tenant SaaS operators on pre-2026-05-22 deployments may already have cross-contaminated credentials.
Codex CLI 0.134.0 rejects legacy profile configs with migration guidance. Move scripts to --profile as the canonical handle before upgrading.

Try:
Codex: point goal mode at an hours-or-days objective on 26.519 + CLI 0.133.0 and watch the dedicated storage / progress-tracking surface.
OpenHands: enable ENABLE_ACP against Claude Code, Codex, or Gemini CLI as the back-end agent. Observe the greyed-out LLM/Condenser/MCP settings and the unified /api/conversations endpoint.
Agent Zero v1.17: enable computer_use_remote on a non-critical host and observe the vision-verification stop flow: every state-changing action requires a fresh screenshot.
Hermes v0.14.0: install via pip install hermes-agent and route Codex CLI / Aider / Cline / Continue through hermes proxy against a single OAuth provider.

Watch:
Three providers moved the same way this fortnight: Claude Code Auto mode default-on, Codex goal mode default-on across surfaces, Gemini CLI Auto modes collapsed + shell-redirect auto-approval in AUTO_EDIT. The pattern is autonomy moving from opt-in to baseline. Bet for the quarter, not a passing release-note theme.
Policy is moving into versioned files. Codex managed requirements.toml, Gemini PolicyEngine-in-ACP, OpenHands org-level LLM profiles. The common shape: policy lives in a versioned, org-managed file the runtime consults instead of per-session flags.
Authority over inputs is generalising through three surfaces in parallel: OpenClaw at the inbound-sender layer (pre-dispatch allowlists, prompt-marker spoofing), Agent Zero at the host-runtime layer (vision-verified host actions), OpenHands at the org-member layer (per-member private MCP and ACP env). Same primitive, three surfaces.

Uncertain:
Codex requirements.toml distribution and signing model: not documented in the release notes. Enterprise adopters must confirm the trust path before depending on enforcement.
Gemini PolicyEngine-in-ACP default posture: per-session enforcement by default, or only when configured? Release notes frame it as a deadlock fix.
Agent Zero ephemeral-capture default: where does host-action audit evidence land? Operators cannot inspect on-disk caches by default.
Hermes hermes proxy bind and auth model: PR body does not detail loopback-only binding or shared-token requirement. Loopback-only binding is the safe configuration, but verify it rather than assume it.
Gemini remote session invocation target: stable protocol exists but where remote invocations actually run (Google-hosted, operator-hosted, both) is undocumented.

Until late May, turning an AI coding agent loose still took a deliberate click. You switched on the mode that let it run commands without stopping to ask, and you knew you had done it. Then Claude Code shipped version 2.1.152, and the click disappeared: Auto mode, the setting that lets the agent act first and report after, became the default for everyone.

Three makers flipped the same switch inside two weeks. Claude Code made Auto mode default-on; Codex graduated its long-horizon "goal mode" out of beta and turned it on across app, IDE, and command line; Gemini CLI folded a menu of Auto variants into one and began auto-approving shell redirects. The permission ceremony that used to stand between an operator and an autonomous agent is no longer the thing you opt into. It is the thing you opt out of, and the only decision left is how to fence it in.

Fencing it in was the fortnight's other half. Once an agent runs by default, signing off on its actions one prompt at a time stops scaling, and three providers reached for the same replacement: a versioned, org-managed policy file the runtime reads before it acts. Codex shipped profile inheritance and a managed requirements.toml; Gemini pushed its policy engine down into the session protocol; OpenHands moved permissions onto org-level profiles. The unit of control is migrating from the session flag you set and forget to the file you check in and review.

Breaking Changes: Check These Before Upgrading

Claude Code v2.1.149: a PowerShell permission bypass and a worktree sandbox scope bug. Windows operators with PowerShell allowlists are affected: PowerShell's built-in cd functions (cd.., cd\, cd~, X:) could change directory undetected and defeat the workspace boundary. Git worktree workflows are affected too: the sandbox write allowlist over-scoped to the main repository root instead of the shared .git directory. Anthropic ships these as ordinary changelog entries; with no separate advisory page, the changelog is the de facto advisory surface. Upgrade to 2.1.149 or later before deploying. v2.1.147 closes adjacent forceLoginOrgUUID and forceLoginMethod enforcement gaps against third-party-provider and API-key sessions; v2.1.148 closes a Vertex AI provider bypass.
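The PowerShell bypass is an instance of a general failure: the permission layer matched on command text while the shell moved the working directory underneath it. One common guard is to compute where a path actually lands and test containment, rather than pattern-match the command. The sketch below illustrates that idea only; it is not Claude Code's implementation, `is_inside_workspace` and the paths are hypothetical, and it normalises POSIX-style paths lexically (a production check would also have to resolve symlinks and handle Windows drive letters).

```python
import posixpath


def is_inside_workspace(workspace: str, cwd: str, target: str) -> bool:
    """Return True if `target`, interpreted relative to `cwd`, stays inside `workspace`.

    Hypothetical helper: purely lexical, POSIX-style normalisation.
    """
    root = posixpath.normpath(workspace)
    # join() discards `cwd` when `target` is absolute; normpath collapses "." and "..".
    resolved = posixpath.normpath(posixpath.join(cwd, target))
    # Compare with a trailing separator so "/w/proj-other" does not pass as "/w/proj".
    return resolved == root or resolved.startswith(root.rstrip("/") + "/")


# Moving up from a subdirectory stays inside; moving up from the root does not.
print(is_inside_workspace("/w/proj", "/w/proj/src", ".."))
print(is_inside_workspace("/w/proj", "/w/proj", ".."))
```

The point of the design is that the decision depends on the resolved location, so an alias or built-in that changes directory cannot slip past a text-based allowlist.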

Claude Code v2.1.152: Auto mode no longer requires opt-in consent. Auto mode (the permission classifier that runs safe actions without prompting and blocks risky ones) is now the default permission posture across the install base. Admins relying on the consent dialog as a visible posture check have lost that surface. Re-audit managed settings and decide where the equivalent check now lives.

Codex CLI 0.134.0: legacy profile configs rejected with migration guidance. --profile is the canonical permission selector across CLI, TUI, and sandbox flows. Scripts using older permission flag-soup must migrate before upgrade.

OpenHands main (pre-2026-05-22 SaaS deployments): MCP server and acp_env cross-org credential leak. Before PR #14528, MCP server configurations added by an org member were broadcast to every other member's row. The fix splits agent settings into shared and private halves and strips legacy leaked values on read. Multi-tenant SaaS operators on pre-fix deployments should rotate MCP credentials added before that date and confirm they are on a post-fix main build (no in-window tagged release yet).

Hermes Agent v0.14.0: PyPI distribution, lazy adapter install, and the proxy. Installation moves to pip install hermes-agent; the [all] extras are removed in favor of lazy install of heavy adapters on first use. Cold-start drops ~19s. A native Windows beta ships. The hermes proxy command exposes a local OpenAI-compatible endpoint backed by whichever OAuth provider the operator is signed into. The PR body does not specify the proxy's bind address or auth model; loopback-only binding is the safe configuration, but verify it rather than assume it.

What "default-on" actually means

The three flips are not the same flip, and the differences are where the operator decisions hide. Claude Code's Auto mode is the broad case: a classifier runs the actions it judges safe and stops on the ones it judges risky, with the line between them defined at runtime rather than written down. The same release hands authors two new levers that point in opposite directions: a disallowed-tools field that lets a skill subtract from the agent's tool surface, and a MessageDisplay hook that can rewrite or hide assistant text before the operator sees it. One narrows what the agent can do; the other narrows what the human is shown.

Codex's goal mode is the long-horizon case. The 26.519 launch took it out of experimental across the app, IDE, and CLI, and CLI 0.133.0 turned goals on by default with their own storage and progress tracking, so an operator can hand Codex an objective measured in "hours or even days" and walk away. The same launch let Codex drive a Mac after the screen locks, hedged with short-lived authorization, covered displays, and an automatic relock on any local input. The locked-host capability is real; the safeguards are policy choices layered on top, not a wall.

Gemini CLI's change was the quietest and, in its way, the most telling. Its stable v0.44.0 collapsed a fan of Auto variants into a single mode, billed as simplification, erasing whatever distinctions those variants used to carry. It also began auto-approving shell redirects in AUTO_EDIT, a convenience that doubles as a wider attack surface the moment an agent is steered toward a path it should not write.

The benefit and the bill arrive together. An operator who never bothered to enable Auto mode now gets its speed for free; an operator who treated the consent dialog as a last manual look before something irreversible has to rebuild that checkpoint somewhere else, in managed settings, a hook, or out-of-band review. Every default-on change this fortnight made the tools easier to use and a standing authority check harder to see, in the same motion.

The policy file becomes the unit of control

Once an agent runs by default, the interesting question is no longer whether it may act but within what bounds, and bounds do not belong in a flag that lives for one session. Three providers said so in the same fortnight, each moving policy into a file that outlives the session and that the runtime treats as authoritative.

Codex went furthest. CLI 0.133.0 added profile inheritance, so a permission profile can derive from a base and layer changes on top instead of redeclaring every grant, and paired it with a managed requirements.toml the release calls enforcement, not advice. CLI 0.134.0 then made --profile the single selector across the CLI, its terminal UI, and the sandbox, and began rejecting the old flag-soup configs outright. Gemini took a narrower but deeper cut, pushing its PolicyEngine into ACP sessions; the changelog files it as a deadlock fix, which undersells it, because enforcement now reaches into the very protocol layer the docs hold up as the delegation primitive.
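Profile inheritance is easiest to see as a merge: resolve the base first, then let each derived layer override only the keys it names. The sketch below shows that shape under loud assumptions. It does not use Codex's actual schema; the `extends` key, the profile names, and every setting in `PROFILES` are hypothetical.

```python
from typing import Any


def resolve_profile(name: str, profiles: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Flatten a profile by applying its base chain first, then its own overrides."""
    chain: list[dict[str, Any]] = []
    seen: list[str] = []
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current!r}")
        seen.append(current)
        chain.append(profiles[current])
        current = profiles[current].get("extends")  # hypothetical key
    resolved: dict[str, Any] = {}
    for layer in reversed(chain):  # base first, most-derived last
        resolved.update({k: v for k, v in layer.items() if k != "extends"})
    return resolved


# Hypothetical profiles: the team variant redeclares one grant and inherits the rest.
PROFILES = {
    "base": {"network": "deny", "write_paths": ["./"], "shell": "ask"},
    "ci-team": {"extends": "base", "shell": "allow"},
}

print(resolve_profile("ci-team", PROFILES))
```

The derived profile stays short, which is what makes a per-team diff reviewable in a pull request.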

OpenHands showed what this looks like with the schema visible. Its org-level LLM profiles land as an encrypted column on the organization table, six endpoints under /api/organizations/{org_id}/profiles, and a two-tier permission split: one role reads settings; another creates, edits, and activates them, and activation is the load-bearing verb. Activation flips the org's live profile and the acting member's settings in a single locked transaction. That is the database admitting that "which policy is in force" is now contested, shared state worth serializing.
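To make "a single locked transaction" concrete, here is a minimal sketch using SQLite from the Python standard library. It is not the OpenHands schema or code: the `orgs` and `members` tables, their columns, and `activate_profile` are all hypothetical stand-ins for the pattern of changing two rows atomically or not at all.

```python
import sqlite3


def activate_profile(conn: sqlite3.Connection, org_id: int, member_id: int, profile: str) -> None:
    """Flip the org's live profile and the acting member's setting together."""
    # BEGIN IMMEDIATE takes the write lock up front, so concurrent activations serialize.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("UPDATE orgs SET active_profile = ? WHERE id = ?", (profile, org_id))
        cur = conn.execute(
            "UPDATE members SET profile = ? WHERE id = ? AND org_id = ?",
            (profile, member_id, org_id),
        )
        if cur.rowcount != 1:
            raise LookupError("acting member is not in this org")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # neither row changes if either step fails
        raise


# isolation_level=None puts the connection in autocommit mode so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orgs (id INTEGER PRIMARY KEY, active_profile TEXT)")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, org_id INTEGER, profile TEXT)")
conn.execute("INSERT INTO orgs VALUES (1, 'default')")
conn.execute("INSERT INTO members VALUES (10, 1, 'default')")

activate_profile(conn, org_id=1, member_id=10, profile="strict")
```

If the member update fails, the rollback also undoes the org update, so readers never observe an org on one policy and its acting member on another.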

For anyone running a team the takeaway is the same in all three: stop keeping policy in per-session flags. Write a base, derive the per-team variations, and treat the checked-in file as the source of truth. The one thing none of the three has documented is how that file is distributed and whether it is signed, which is the gap to close before leaning on it for real enforcement.
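Until a provider documents its own distribution and signing model, one stopgap an operator might consider is pinning the digest of the reviewed policy file and refusing to start when the deployed copy differs. The sketch below is that stopgap only, not a feature of any of the three tools; `policy_matches_pin` and the sample file contents are hypothetical, and a pinned hash detects drift but is not a substitute for a real signature.

```python
import hashlib
import hmac


def policy_matches_pin(policy_bytes: bytes, pinned_sha256_hex: str) -> bool:
    """Return True if the policy file's SHA-256 digest equals the pinned value."""
    digest = hashlib.sha256(policy_bytes).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest, pinned_sha256_hex.lower())


# Hypothetical contents standing in for a reviewed, checked-in policy file.
reviewed = b'shell = "ask"\n'
pin = hashlib.sha256(reviewed).hexdigest()

print(policy_matches_pin(reviewed, pin))
print(policy_matches_pin(b'shell = "allow"\n', pin))
```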

One idea, three doors

The fortnight's quietest thread was also its sharpest. Three providers, working on parts of the stack that have nothing to do with one another, landed on the same instinct: put a structural gate at the exact spot where untrusted input crosses into the agent, rather than trying to talk the agent out of misbehaving once the input is already inside.

OpenClaw put its gate at the front door, where messages arrive. Sender allowlists now run before the agent is dispatched rather than blocking it afterward; browser snapshots check SSRF policy before they read a tab's URL; queued system text is scrubbed so a hostile plugin or channel label cannot forge the markers the model reads as instructions. The through-line is to deny an unauthorized sender any chance to shape the agent's behavior at all, instead of catching specific actions once it has already been nudged.
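The ordering is the whole point, and a few lines show it: the allowlist check happens before anything is handed to the agent, and untrusted labels are scrubbed of instruction-like markers before they are queued. This is an illustrative sketch, not OpenClaw's code. `ALLOW_FROM`, the `[SYSTEM]`-style marker syntax, and `handle_inbound` are hypothetical.

```python
import re
from typing import Callable

ALLOW_FROM = {"+15550100", "ops@example.com"}  # hypothetical trust set
# Hypothetical marker syntax; a real system would scrub whatever its model treats as instructions.
MARKER = re.compile(r"\[/?(?:SYSTEM|INSTRUCTIONS)\]", re.IGNORECASE)


def scrub(text: str) -> str:
    """Remove marker tokens, repeating until stable so nested markers cannot re-form."""
    previous = None
    while previous != text:
        previous = text
        text = MARKER.sub("", text)
    return text


def handle_inbound(sender: str, label: str, body: str, dispatch: Callable[[str], None]) -> bool:
    """Gate first, dispatch second: an unlisted sender never reaches the agent."""
    if sender not in ALLOW_FROM:
        return False
    dispatch(f"{scrub(label)}: {scrub(body)}")
    return True


delivered: list[str] = []
handle_inbound("+15559999", "general", "ignore previous rules", delivered.append)
handle_inbound("+15550100", "general [SYSTEM]", "status please", delivered.append)
print(delivered)
```

The repeat-until-stable loop matters: a single pass over `[SYS[SYSTEM]TEM]` would leave a freshly assembled `[SYSTEM]` behind.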

Agent Zero put its gate at the host, where actions land. Its new computer_use_remote tool drives the operator's real desktop, outside the container, through the platform's own accessibility APIs, and it refuses to trust its own work: every state-changing action stays unverified until a fresh screenshot confirms it, and the agent must stop when no screenshot is available. A denied approval on macOS routes to a flow that has to be re-armed by hand, not a silent retry.
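The stop rule reads naturally as control flow: act, capture, and only then decide whether the action counts as done. The sketch below models that flow with injected callables so it runs anywhere. It is not Agent Zero's implementation; `run_state_changing_action`, `ActionResult`, and `VerificationUnavailable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class VerificationUnavailable(RuntimeError):
    """Raised when no fresh screenshot can be captured; the caller must stop."""


@dataclass
class ActionResult:
    name: str
    verified: bool


def run_state_changing_action(
    name: str,
    act: Callable[[], None],
    capture: Callable[[], Optional[bytes]],
    confirm: Callable[[bytes], bool],
) -> ActionResult:
    """Perform an action, then require a screenshot taken afterwards to verify it."""
    act()
    shot = capture()  # captured after the action, so it cannot be stale
    if shot is None:
        raise VerificationUnavailable(f"no screenshot after {name!r}; stopping")
    return ActionResult(name=name, verified=confirm(shot))


# Stand-in callables: a real harness would click, grab the screen, and inspect the image.
result = run_state_changing_action("click-save", lambda: None, lambda: b"png-bytes", lambda s: True)
print(result)
```

Raising instead of returning an unverified result forces the caller to handle the missing-screenshot case explicitly rather than carry on.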

OpenHands put its gate between tenants, where credentials had been leaking. Until the fix, an MCP or acp_env setting added by one member of an organization was written to every member's row; now agent settings split into a shared half and a private one, secrets follow only the member who set them, and the old leaked values are stripped on read so the contamination ends with the upgrade.
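The shared/private split with strip-on-read can be sketched in a few lines. This is a simplified model, not the OpenHands data layer: `PRIVATE_KEYS`, the dictionary shapes, and `read_settings` are hypothetical, and the real fix operates on database rows rather than in-memory dicts.

```python
from typing import Any

PRIVATE_KEYS = {"mcp_servers", "acp_env"}  # hypothetical names for the per-member half


def read_settings(
    shared: dict[str, Any],
    private_by_member: dict[str, dict[str, Any]],
    member_id: str,
) -> dict[str, Any]:
    """Merge org-wide settings with only the reading member's private values."""
    # Strip on read: a private key found in the shared blob is a legacy leak, so drop it.
    clean_shared = {k: v for k, v in shared.items() if k not in PRIVATE_KEYS}
    return {**clean_shared, **private_by_member.get(member_id, {})}


# A legacy shared blob that still carries another member's leaked secret.
shared_blob = {"model": "default", "acp_env": {"TOKEN": "leaked-from-alice"}}
private_store = {"alice": {"acp_env": {"TOKEN": "alice-secret"}}}

print(read_settings(shared_blob, private_store, "bob"))
print(read_settings(shared_blob, private_store, "alice"))
```

Stripping on read ends exposure through the application, but it does not un-leak a secret that was already visible, which is why rotating credentials added before the fix is still the prudent step.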

A front door, a host, a tenancy boundary: three different doors, one idea. Each provider decided that the way to keep bad input out is a gate at the boundary it crosses, enforced by the system, not a paragraph of prompt asking the model to behave.

Provider Notes

Codex (26.519, CLI 0.131 to 0.134) ran the window's most coordinated release train: goal-mode graduation and the CLI's default-on landed the same day, with profile inheritance and the managed requirements.toml arriving alongside. Capability and its governance shipped as one move, which no other provider managed this fortnight. The rest is quality-of-life at the edges: remote computer use after Mac lock, Appshots, plugin marketplace sharing, codex doctor diagnostics, Python SDK authentication, codex exec resume --output-schema, history search, and read-only MCP concurrency via readOnlyHint.

Gemini CLI (v0.44.0) shipped stable LocalSessionInvocation / RemoteSessionInvocation protocols (closing the "tests but no observed remote target" gap on the prior AgentProtocol), first-wins prioritize-project agent registration, OAuth refresh preservation during rotation, keychain auth for --list-sessions and non-interactive mode, and MCP OAuth token refresh on re-authentication. Two weeks of What's-New digests (Weeks 21 and 22) are not yet published, so the changelog and release notes are the trailing surface.

OpenHands (main branch, no tagged release in window) shipped the ACP agent settings UI, organization-level LLM profiles, scoped MCP/ACP env to acting org members, Azure DevOps via Microsoft Entra ID OAuth/OIDC, Bitbucket DC and Jira DC integrations with KOTS-managed service accounts, and a batched CVE remediation cluster (9+ deps). The shape is consolidation as the enterprise-self-hosted shell around third-party agents and Data Center source control.

Agent Zero (v1.15 to v1.18) shipped host-machine desktop control with vision verification, ephemeral context-scoped capture by default, speech as independent built-in plugins (breaking removal of legacy APIs), document_artifact renamed to office_artifact, dedicated Markdown editor plugin, file-browser routing formalization, configurable max_active_skills, MCP multimodal content handling fix, and skill visibility controls (operators can hide skills from the model-facing catalog).

OpenClaw (v2026.5.18 to v2026.5.26) put its weight where its risk is: the content-boundary hardening suite, reaction-based approvals across Signal, iMessage, and WhatsApp, and an on-by-default gateway auth rate-limiter for configs that never set gateway.auth.rateLimit. Around that core, transcripts became a first-class source path with a Meeting Notes plugin, model login profiles gained credential migrations, realtime Talk became inspectable and cancellable across Web UI and Discord voice, and releases now carry verification stanzas with full CI run URLs: a vendor showing its receipts.

Hermes Agent (v0.14.0) is the Foundation Release: PyPI distribution, lazy adapter install with supply-chain advisory checker, native Windows beta, Zed ACP Registry listing, the OpenAI-compatible local hermes proxy, Honcho identity-mapping with peer-id in cache signatures, isolated credential pool on provider fallback, and a sustained fix(kanban) corruption-hardening wave post-release.

Paperclip (v2026.513, v2026.517, v2026.525) shipped scoped agent permissions and protected assignments via a real authorization service, routine env secrets with agent < project < routine precedence, board-managed document locks, Modal as a first-party sandbox plugin, and an ACPX-Claude adapter that resolves bare Claude model IDs, surfaces real diagnostic detail, and respects user ~/.claude/settings.json permissions.

Pi coding agent (v0.74.1 to v0.76.0) shipped supply-chain hardening (npm shrinkwrap, lifecycle-script controls, isolated install smoke tests), explicit session naming via --session-id, an excludeFromContext flag for the bash RPC, and provider retry and timeout bounds. The supply-chain work lands in the same fortnight as Hermes's lazy-install advisory checker: two different providers converging on the same hygiene.

Flue (Tier 2; v0.6.0 to v0.8.0) shipped the agents-vs-workflows category split (persistent agents/ via createAgent vs finite workflows/ via run), local() sandbox factory with env allowlist, Cloudflare Shell sandbox replacing the previously misleading R2 model, run observability with bare runId routes, an OpenAPI sub-app, and a read-only admin sub-app. The runs-as-workflow-only choice is the cleanest "what is the receipt?" answer this cycle.

What To Try

Codex operators: point goal mode at an objective spanning hours or days on 26.519 + CLI 0.133.0; observe the dedicated storage and progress-tracking surface. If you have multiple teams, draft a base permission profile and derive per-team variations using the new inheritance.
Claude Code operators: audit managed settings before upgrade to 2.1.152 if you relied on the Auto mode consent dialog as a manual posture check. Skill authors should evaluate disallowed-tools.
OpenHands evaluators: enable ENABLE_ACP and point it at Claude Code, Codex, or Gemini CLI as the back-end. Observe how LLM/Condenser/MCP settings grey out, as authority shifts to the back-end agent and the UI reflects the transfer.
Agent Zero operators (host adopters): enable computer_use_remote on a non-critical host. Test the vision-verification stop flow: trigger a state change, withhold a screenshot, observe whether the agent halts as the release notes describe.
Hermes adopters: try pip install hermes-agent and route Codex CLI, Aider, Cline, or Continue through hermes proxy against a single OAuth subscription. Confirm the proxy's bind address before exposing it.
OpenClaw operators: verify your gateway.auth.rateLimit setting; the unset case is now rate-limited by default. Test the pre-dispatch allowFrom allowlist with a sender outside your trust set.

What Remains Uncertain

Codex managed requirements.toml distribution and signing: the release notes describe org-level enforcement but not how the file reaches the runtime, whether it is signed, or whether tampering is detectable. Enterprise adopters cannot rely on enforcement without this answer.
Gemini PolicyEngine-in-ACP default posture: per-session enforcement by default, or only when an operator has configured a policy? Release notes frame it as a deadlock fix. The structural shift implied by the change is larger than that framing suggests.
Agent Zero ephemeral-capture audit evidence: where does host-action evidence land for audit when screenshots are ephemeral? Operators cannot browse on-disk caches to confirm what the agent saw.
Hermes hermes proxy bind and auth model: PR body does not detail loopback-only binding or shared-token requirement. Loopback-only binding is the safe configuration, but verify it rather than assume it.
Gemini remote session invocation target: the protocol is stable but where remote invocations actually run (Google-hosted, operator-hosted, both) is undocumented.
OpenHands no-tagged-release operators: the strategic positioning, the org-LLM-profile feature, and the cross-org credential leak fix are all main-branch-only. Operators tracking the 1.x release channel see none of this until the next release consolidates.
Products that front other products: an OpenHands ACP UI driving Claude Code, Codex, or Gemini CLI as the back-end agent, and Paperclip's ACPX-Claude adapter that respects a user's ~/.claude/settings.json, are the same shape: one tool wrapping another's agent. Whether authority and settings flow cleanly across that boundary is the thing to watch as more products compose this way.
Claude Code's own What's-New digests are two weeks behind (Weeks 21 and 22 are unpublished). For this fortnight Claude Code's primary write-up surface is missing, so this read leans on the changelog alone.

Revised 2026-07-02 (artifact_version 2): editorial pass on two provider notes (judgment-first framing). Claims and receipts unchanged.

Projects reviewed in this research run

Claude Code, Codex, Gemini CLI, OpenHands, Agent Zero, OpenClaw, Hermes Agent, Paperclip

Research artifacts and publication history are open in the repository.

Versions

complete

2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0

14 signals / 2026-05-13 to 2026-05-27