Signals

What changed enough to matter.

A signal is a finding that should change what a serious operator does next — how they configure an agent, what they test, or what they stop relying on. Each entry links to the run that produced it.

June 2026

2026-06-16 · Hermes Agent

Skills were poisoning every memory store and a skill delete could wipe the working tree (unreleased)

Runtime
- June 16 commits (main) stop a /skill invocation poisoning every connected memory provider with its raw body, and add tree-escape validation so an agent-triggered skill delete cannot rmtree outside the skills root (a fix ported from an incident that wiped another tool user's working directory). The self-improving-agent risk class made concrete.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
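The tree-escape validation described in the Hermes entry above can be illustrated with a containment check run before any recursive delete. This is a minimal sketch, not Hermes's code: `resolve_inside` and `delete_skill` are hypothetical names, and it assumes each skill is a directory under a single skills root.

```python
import shutil
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Return the resolved target, or raise if it is not strictly inside root."""
    root_resolved = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check
    target = (root_resolved / relative).resolve()
    # Reject the root itself and anything that resolves outside it
    if target == root_resolved or root_resolved not in target.parents:
        raise ValueError(f"refusing path outside skills root: {relative!r}")
    return target


def delete_skill(root: Path, name: str) -> None:
    """Delete one skill directory, but only after the containment check passes."""
    shutil.rmtree(resolve_inside(root, name))
```

Resolving before comparing is the point of the check: a name such as `..` or an absolute path looks harmless as a string but lands outside the root once resolved.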
2026-06-15 · Claude Code

Subagents can spawn subagents five deep, and auto mode now classifies spawns before launch

Control Plane
- 2.1.172 lets a subagent spawn its own subagents up to 5 levels deep (new capability and a new governance surface); 2.1.178 then made the auto-mode classifier evaluate a spawn before launch, closing a gap where a deeply nested agent could request an action the operator's policy would block at the top.
- Operators running under auto mode should upgrade past 2.1.178 before trusting a delegation tree, and use argument-aware permission rules to cap what spawned agents can do.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-15 · Claude Code

Permission rules can finally match a tool's arguments (Agent(model:opus))

Control Plane
- 2.1.178 added Tool(param:value) syntax so a rule can match input parameters, e.g. Agent(model:opus) blocks Opus subagents; permissions move from all-or-nothing per tool to per-argument.
- Operators governing delegated trees should reach for this to cap model tiers and arguments inside subagents.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
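To make the per-argument idea in the entry above concrete, here is a hedged sketch of how a `Tool(param:value)` rule could be matched against a tool call. It is not Claude Code's implementation: `RULE_RE` and `rule_matches` are hypothetical, and exact string equality on the value is an assumption, since the entry does not say whether patterns or wildcards are supported.

```python
import re

# Hypothetical grammar: "Tool" or "Tool(param:value)"
RULE_RE = re.compile(r"^(?P<tool>\w+)(?:\((?P<param>\w+):(?P<value>[^)]+)\))?$")


def rule_matches(rule: str, tool: str, args: dict) -> bool:
    """Return True when the rule applies to this tool call."""
    m = RULE_RE.match(rule)
    if m is None:
        raise ValueError(f"unparseable rule: {rule!r}")
    if m["tool"] != tool:
        return False
    if m["param"] is None:
        return True  # bare tool rule: applies to every call of that tool
    # Assumption: the argument must equal the rule value exactly
    return str(args.get(m["param"])) == m["value"]
```

Under these assumptions, `rule_matches("Agent(model:opus)", "Agent", {"model": "opus"})` is true while the same rule leaves a call with `{"model": "sonnet"}` untouched, which is the shift from per-tool to per-argument policy.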
2026-06-15 · OpenHands

Concurrency becomes a governed, billable resource (Personal 3, commercial 10; unreleased)

Control Plane
- PR #14168 (main, unreleased) caps concurrent conversations/sandboxes (Personal=3, commercial=10) with per-org and per-user override columns and HTTP 429 enforcement. A real resource-control and economics surface; tightens the free tier.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
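The cap-and-429 behaviour in the OpenHands entry above can be sketched as an admission check. The plan limits come from the entry; everything else is hypothetical, including the function names and the override precedence (user over org over plan default), which the entry does not specify.

```python
from typing import Optional

# Plan defaults as stated in the entry
PLAN_LIMITS = {"personal": 3, "commercial": 10}


def effective_limit(plan: str,
                    org_override: Optional[int] = None,
                    user_override: Optional[int] = None) -> int:
    """Pick the cap to enforce. Precedence here is an assumption."""
    if user_override is not None:
        return user_override
    if org_override is not None:
        return org_override
    return PLAN_LIMITS[plan]


def admit(active: int, plan: str,
          org_override: Optional[int] = None,
          user_override: Optional[int] = None) -> int:
    """Return the HTTP status for a request to start one more conversation."""
    limit = effective_limit(plan, org_override, user_override)
    return 429 if active >= limit else 200
```

Operators budgeting for the change can use a check like this to predict which users will start seeing 429s at their current concurrency.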
2026-06-15 · Hermes Agent

Fire-and-forget background subagents that re-inject results as a new turn (unreleased)

Runtime
- delegate_task(background=true) (main) dispatches an async subagent and re-injects its result as a new turn, with /stop and /agents as the control surface and a max_async_children cap. The same week removed the default 600s subagent timeout, so runaway detection now rests on heartbeat staleness alone. Changes the unit of work and the receipt boundary.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-15 · Gemini CLI

Three path-traversal holes in agent skill install/link/uninstall (fixed on main only)

Runtime
- Commit bca5667fc / PR #27767 (main, ahead of every stable, preview, and nightly tag as of 2026-06-16) fixes three path-traversal vulnerabilities so a malicious skill package cannot write outside .gemini/skills or delete sibling directories. The clearest confirmation that agent skill packages are an untrusted-input boundary; treat third-party skill installs as untrusted until the carrying release ships.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-15 · Codex

Codex CLI adds usage views, permanent session deletion, and managed encrypted Bedrock auth

Runtime
- CLI 0.140.0 adds /usage cost visibility, permanent codex delete (a data-retention/right-to-delete lever), /import, and managed Amazon Bedrock API-key auth with encrypted local storage; 0.139.0 made sandbox proxy-only networking enforcement more consistent. Use codex delete to purge sensitive sessions; re-validate proxy-only egress.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-14 · OpenHands

Admins can lock an org to a curated model set and hide custom-key fields (unreleased)

Control Plane
- PR #14773 (main, unreleased) adds allow_user_llm_configuration; setting it to off hides the custom model, base-URL, and API-key inputs and locks the org to a curated, proxy-served model set. The platform owns the model-access policy, not the user.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-13 · Hermes Agent

Hermes closes its own guardrail theater: cp into ~/.ssh, a status leak, fail-open adapters (unreleased)

Runtime
- June 13 commits (main, post-v0.16.0) gate cp/mv/install into ~/.ssh and credential/shell-rc files (an unpaired write deny the commit calls 'theater'), stop /api/status leaking host paths and the gateway PID on exposed binds, and make own-policy chat adapters fail closed without an allowlist, as the project's own SECURITY.md already required. The v0.16.0 release binary does NOT have these; run main or wait for the next tag.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-13 · OpenHands

Plaintext git tokens in the DB, a plaintext MCP key, and two frontend CVEs

Runtime
- OpenHands stopped persisting PluginSpec.source git tokens in plaintext in the DB (#14795, main) and stopped round-tripping remote MCP API keys in plaintext (#14613, main); the fix for react-router CVE-2026-42342 shipped in release 1.8.0 (uncredited), and the fix for postcss CVE-2026-41305 is on main. Rotate any token that was embedded in a repo source URL or MCP config before the fix; rebuild the frontend.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-13 · OpenClaw

A WCAG 2.1 AA pass (beta) and a deliberate consent-over-convenience choice on search

Platform
- OpenClaw shipped a measured WCAG 2.1 AA pass on its browser dashboard (contrast above 4.5:1, a focus ring, a 12px font floor across 136 elements) in a BETA tag (v2026.6.7-beta.1), plain-language mobile provider states, and pinned-commit ClawHub skill installs. It also made key-free web search an explicit opt-in (stable v2026.6.8), trading zero-config convenience for explicit consent on where queries egress.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-12 · Claude Code

Org model allowlists are finally binding, even against the default model

Control Plane
- enforceAvailableModels (2.1.175) makes the availableModels allowlist constrain the Default model and blocks user/project widening; a cluster of fixes closed env-var, /fast, subagent, advisor, and dispatch escape hatches. This is the lever an enterprise needs to decide whether Fable 5 is reachable per-org.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-12 · Paperclip

Shared-pool tenants were instance admins of the whole instance (fixed, unreleased)

Control Plane
- PR #7525 (merged to master 2026-06-12, NOT in a tagged release) removes a grant that made every cloud tenant on a shared pool an instance admin with reach into every other tenant's data, and purges stale admin rows. Shared-pool operators must track the next tag and provision a non-cloud-tenant admin identity first (the purge is destructive).
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-12 · Paperclip

Per-company JWT signing keys and a 1-hour TTL replace a single master key (unreleased)

Control Plane
- PR #5864 (master, unreleased) derives a per-company signing key and cuts the agent-token TTL from 48h to 1h, so one tenant's leaked key can no longer forge tokens for other tenants. Multi-tenant blast-radius control; track the next tag.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
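The blast-radius argument in the entry above can be shown with a small key-derivation sketch. This is not Paperclip's scheme: the entry does not describe how PR #5864 derives keys, so HMAC-SHA256 over a company ID, the `agent-jwt:` label, and all function names here are assumptions for illustration only.

```python
import hashlib
import hmac

TOKEN_TTL_SECONDS = 60 * 60  # the 1-hour TTL from the entry


def company_key(master: bytes, company_id: str) -> bytes:
    """Derive a per-company signing key (assumed construction, see lead-in)."""
    return hmac.new(master, b"agent-jwt:" + company_id.encode(), hashlib.sha256).digest()


def sign(key: bytes, payload: bytes) -> str:
    """Sign a payload with a company key."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign(key, payload), signature)
```

With per-company keys, a signature produced under one tenant's key fails verification under any other tenant's key, so a single leaked key no longer forges tokens instance-wide.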
2026-06-12 · Paperclip

A 'NOT APPROVED' comment could auto-complete an issue (fixed, unreleased)

Control Plane
- PR #5839 (master, unreleased) tightens an approval regex that matched negated phrasings (so 'NOT APPROVED' auto-completed an issue) and wraps comment + status + decision in one transaction. Makes 'a rejection can never auto-complete' and 'observable state cannot diverge from intended state' true invariants of the approval gate.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
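The regex failure in the entry above is easy to reproduce. The patterns below are illustrative rather than Paperclip's: the entry does not quote either the old or the new expression, so `NAIVE`, `STRICT`, and `is_approval` are hypothetical, and anchoring to the whole comment is just one conservative way to tighten the match.

```python
import re

# A loose pattern of the kind that matches negated phrasings
NAIVE = re.compile(r"approved", re.IGNORECASE)

# Assumed tightening: the entire comment must be the approval word
STRICT = re.compile(r"\s*approved[.!]?\s*", re.IGNORECASE)


def is_approval(comment: str) -> bool:
    """Treat a comment as approval only when nothing else surrounds the word."""
    return STRICT.fullmatch(comment) is not None
```

Here `NAIVE.search("NOT APPROVED")` finds a match, which is the bug, while `is_approval("NOT APPROVED")` returns `False`. The cost of the strict form is that free-text approvals such as "approved, thanks" no longer count, which errs toward requiring an explicit decision.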
2026-06-12 · OpenClaw

Exec approvals fail closed on timeout, and HTTP override surfaces are admin-gated

Control Plane
- v2026.6.6 made exec approvals fail closed on timeout (a pending dangerous command now denies rather than proceeds) across a dozen-surface boundary sweep that also closed a deleted-agent ACP bypass; v2026.6.8 gated HTTP session/model override surfaces behind admin privileges. The correct reversibility default for a surface aimed at non-experts.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
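Fail-closed-on-timeout, as described in the entry above, reduces to one rule: the absence of a decision is a denial. A minimal sketch follows, assuming decisions arrive on an in-process queue; `await_decision` and the `"allow"`/`"deny"` strings are hypothetical and not OpenClaw's API.

```python
import queue


def await_decision(decisions: "queue.Queue[str]", timeout_s: float) -> str:
    """Wait for an approval decision; anything other than an explicit allow denies."""
    try:
        decision = decisions.get(timeout=timeout_s)
    except queue.Empty:
        # Timeout: the pending command is denied, never run by default
        return "deny"
    # Unknown or malformed decisions also fail closed
    return "allow" if decision == "allow" else "deny"
```

The same shape applies to malformed input: the only path to execution is an explicit allow that arrives in time.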
2026-06-11 · Codex

Computer use expands to Europe and Enterprise, with the first per-app controls and a CDP browser surface

Control Plane
- App 26.609 added Developer mode giving the agent controlled Chrome DevTools Protocol access (network interception, arbitrary in-page JS, the debugger), the first per-app access controls for computer use on Windows, and Enterprise computer use; on 2026-06-16 computer use reached the EEA/UK/Switzerland and Chronicle previewed building memory from screen context.
- Keep Developer-mode CDP off by default; use the Windows per-app controls to allowlist apps; default Chronicle off on confidential machines.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-10 · OpenHands

OpenHands Enterprise: the first user to log in owns the organization (unreleased)

Control Plane
- PR #14752 (main, intended for an untagged 1.39.0) makes the first user to sign in after enabling the default org its owner, keyed to an is_default DB flag (migration 119). The multi-tenant foundation the window's enterprise work stacks on. Operators must control who signs in first.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-10 · OpenHands

hide_personal_workspaces is explicitly UI-only, not an access boundary

Control Plane
- PR #14741 (main, unreleased) hides personal workspaces in org-only installs but the docs state it is UI-only: the orgs API still returns personal orgs and there is no server-side enforcement. Operators must NOT treat it as an access-control boundary; the real boundary is the membership model.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-10 · Claude Code

Untrusted-repo OTEL cert injection and pre-warmed-worker trust bleed closed

Runtime
- 2.1.169 fixes untrusted project settings setting OTEL client-certificate paths without a trust prompt (credential-path injection from a hostile repo); 2.1.172/2.1.174 fix pre-warmed background workers reading another directory's .mcp.json approvals/trust and inheriting another session's ANTHROPIC_* provider env. Upgrade past 2.1.174 and re-audit background-agent and untrusted-repo workflows.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-10 · Gemini CLI

Gemini routes flash workloads to gemini-3.5-flash on stable, behind an experiment flag

Runtime
- Stable v0.46.0 began moving flash workloads to gemini-3.5-flash, gated by an experiment flag and auth-type access logic (so the same binary can route different users to different models). Anyone with cost or eval assumptions pinned to the old flash should re-baseline.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-09 · Gemini CLI

Google steers Gemini CLI users toward a separate Antigravity CLI

Platform
- A transition banner exempted from the 5-show cap shipped to STABLE (v0.45.2) so 'Antigravity is coming to town' shows every session; a PREVIEW build (v0.47.0-preview.0) added in-product migration commands and a skill pointing to Antigravity CLI, a separate Google product. Reads as the start of a managed succession for Gemini CLI; track whether feature investment shifts to Antigravity and whether trust/policy semantics carry over.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-09 · Codex

Codex ships one-click import of Claude Code and Cowork setup

Platform
- App 26.608 added Migrate-to-Codex flows importing supported setup from Claude Code and Claude Cowork, including during onboarding: a defection on-ramp off Anthropic's coding agents and a concrete cross-tool config-portability surface.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-09 · Claude Code

Anthropic's Fable 5 launches and is adopted across rival harnesses within days

Platform
- Claude Code 2.1.170 shipped access to Claude Fable 5, a 'Mythos-class' model; OpenClaw and Pi added Fable 5 support within days (Pi with xhigh effort). A frontier model now reaches the long tail of agent harnesses in a week; the governance lever is the model-allowlist work (separate signal).
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-09 · Codex

Goal mode, worktrees, and inline review come to the iPhone

Platform
- ChatGPT iOS 1.2026.153 added /goal, branch selection, worktree creation, and inline review comments. Persistent long-horizon objectives, env-isolated work, and code review now run from the smallest surface, widening who can drive serious agent work and from where.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-08 · Pi Coding Agent

Pi gates local settings, instructions, and packages behind a saved trust decision

Control Plane
- v0.79.0 added project trust for local settings, resources, instructions, and packages with saved decisions and --approve/--no-approve CLI controls. Pi now treats local project files as untrusted-by-default; open an untrusted repo and confirm it refuses to load local resources until approved.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-08 · Flue

Flue reaches a 1.0-line beta and makes durable, recoverable agent execution the default

Runtime
- Flue shipped durable, recoverable agent execution with pluggable SQLite or Postgres persistence (0.10.0) and reached its first 1.0-line beta (1.0.0-beta.1), a migration-heavy stabilization (valibot tool schemas, opaque run_<ulid> IDs, run-introspection exports). It also swapped standard WebSocket and SSE for a proprietary Durable Streams transport (0.10.2), narrowing external observability. Category evidence that the model+harness split is maturing into stateful infrastructure; the experimental-API caution starts to lift.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-06 · Claude Code

Relayed SendMessage from peer sessions no longer carries user authority

Control Plane
- 2.1.166: messages relayed via SendMessage from other Claude sessions no longer carry user authority; receivers refuse relayed permission requests and auto mode blocks them. Closes a confused-deputy path in multi-session orchestration.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-06 · Hermes Agent

Hermes adds a desktop app, a browser admin panel, and remote-gateway connect

Platform
- v0.16.0 'The Surface Release' adds a native Electron desktop app, a browser web-admin dashboard, and remote-gateway connect over OAuth or username/password, collapsing install-to-first-message to seconds and adding a new authority boundary (the dashboard auth gate) that operators exposing it must govern.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-05 · Paperclip

Deny-by-default authority preset for agents reviewing untrusted content

Control Plane
- PR #7530 (in v2026.609.0) adds a low_trust_review authority preset, source-trust tagging, route containment, and quarantine so an agent reviewing a hostile PR/comment/attachment gets narrower authority and its output cannot flow into higher-trust context. Enforced authority for the untrusted-input boundary, not a dashboard label.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-05 · Paperclip

Paperclip drops 'zero-human companies' for 'manage AI agents for work'

Platform
- PR #7580 retires the 'zero-human companies' tagline for 'the app people use to manage AI agents for work', a repositioning its in-window engineering backs up (human board visibility, audited recovery, approval gates). Calibration signal: the autonomous-company metaphor is being repriced toward human-in-the-loop operating software.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-04 · Codex

Remote controllers are now listable and revocable, and approvals carry environment identity

Control Plane
- CLI 0.137.0 lets remote-control clients pair and have controller grants listed/revoked via app-server v2 RPCs, and binds permission requests/approvals to an environment identity. A concrete authority-inventory and revocation surface for who can drive a session remotely.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-04 · Agent Zero

The public Tailscale tunnel now trusts only the active Remote Control origin

Runtime
- v1.20 (commit ca4efe6e6) normalizes active Remote Control URLs before CSRF allowlisting and restricts WebSocket origin validation to only the currently active Remote Control origin (the public tunnel exposing the whole visible computer), rejecting unrelated external origins.
Run: 2026-06-16-weekly-digest-2026-06-04_2026-06-16-frontier-v0
2026-06-03 · Claude Code

Permission and deny rules now enforced as written across WebFetch, Windows paths, and Glob/Grep

Control Plane
- Three distinct gaps where a configured permission/deny rule silently failed to apply are closed in the 2.1.160-2.1.162 line: custom WebFetch rules now override built-in preapproved domains, Windows rules with backslashes or case-variant paths now match, and Read deny rules now hide files from Glob and Grep results.
- Operators who wrote allow/deny policy and assumed it was enforced were running with a false sense of coverage; the fix is gated purely on upgrading past these versions, so the operator action is 'upgrade, then re-audit whether any policy was silently bypassed in the prior window.'
- The Read-deny-vs-Glob/Grep gap is the sharpest: a file an operator denied for Read was still discoverable (and its path/contents surfaceable) via search tools, defeating the access-control intent.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Claude Code

Agent view exposes why a session is blocked and fan-out progress for scripted supervision

Control Plane
- claude agents --json now includes a waitingFor field naming what a blocked session is waiting on (e.g. a permission prompt), and claude agents rows now show done/total progress before detail when work is fanned out.
- Operators scripting or monitoring agent fleets can now programmatically distinguish 'stuck on a permission prompt' from other waits and read parallel-task completion, which is the difference between a watchdog that can unblock a session and one that can only detect silence.
- The operator action is to wire waitingFor and the progress counter into supervision tooling so stuck-agent triage stops requiring a human to open each session.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Codex

CLI 0.136.0 adds API-key registration for approved remote exec-server hosts

Runtime
- An operator running remote execution can register approved hosts via API key instead of entering credentials per session, changing the remote-exec authentication model.
- This shifts trust to a pre-registered host allowlist keyed by API key — operators must decide which hosts are 'approved' and how those keys are scoped and rotated before enabling remote exec.
- Verification path: upgrade to 0.136.0, register a test host, confirm only approved hosts authenticate and that key scope/rotation behaves as expected before exposing remote execution.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Codex

Amazon Bedrock integration runs Codex models under AWS-managed authentication and billing

Platform
- An operator with AWS infrastructure can now run OpenAI models through Amazon Bedrock, moving authentication and billing under AWS IAM and cost allocation instead of an external OpenAI API path.
- This reframes where the trust and identity boundary sits — Codex model calls become AWS-native, which changes compliance and credential-management decisions for AWS-policy organizations.
- Verification path: provision Codex models via Bedrock, confirm IAM scoping and that no model traffic leaves the AWS-managed path before treating it as compliance-satisfying.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Codex

ChatGPT iOS 1.2026.146 adds optional Face ID / passcode lock for Codex

Control Plane
- An operator running Codex on iOS can now require Face ID or a passcode to open Codex, adding a device-level authority gate that did not exist before.
- It is optional, so the operator decision is whether to enable it as policy for mobile-deployed Codex access.
- Verification path: update to 1.2026.146, enable the lock, confirm Codex requires biometric/passcode on foreground before trusting mobile as an access surface.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Codex

Sites plugin (preview) adds in-app website and web-app creation and deployment

Platform
- An operator can now create, deploy, and manage websites, dashboards, and web apps directly within Codex, removing the external-tool step for web deployment.
- ChatGPT Business workspaces include Sites by default, so the operator decision is whether to allow/govern an in-product deploy surface that may already be enabled.
- Verification path: confirm whether Sites is enabled in your Business workspace and whether agent-initiated deployments fit your hosting/governance policy before relying on it.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Gemini CLI

v0.45.0 stable bundles terminal hardening, session-context cleanup, and an MCP blacklist-bypass fix

Platform
- Operators on preview or older stable builds get a single upgrade decision: move to v0.45.0 to pick up Termux relaunch/resize fixes, session-context filtering on history resume, sequential tool execution for update_topic, Vim keybinding fixes, and an MCP blacklist-bypass prevention fix.
- The MCP blacklist-bypass prevention is the security-bearing item: it closes a path where a blacklisted MCP tool/server could still be reached, so operators relying on MCP allow/deny controls should upgrade before trusting the blacklist.
- Verification path: release tag v0.45.0 notes (published 2026-06-03T01:05:14Z) enumerate the bundled fixes.
- Single composite upgrade decision - bundled small fixes all gated on 'upgrade to v0.45.0' stay one signal.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Gemini CLI

Policy file survives cross-device mounts and corruption via EBUSY fallback and TOML recovery

Control Plane
- Operators running in containers with cross-device mounts no longer hit silent policy-update failures - atomic rename now falls back to copy-then-unlink on EBUSY/EXDEV.
- A corrupted policy TOML is auto-backed-up to .bak and rebuilt from scratch rather than blocking on a syntax error, removing a manual-intervention failure mode.
- Verification path: packages/core/src/policy/config.ts adds the fallback and recovery; persistence.test.ts covers both paths.
- Single operator class (operator persisting policy/permission config), single consequence (policy persistence no longer fails silently).
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
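The rename fallback in the entry above follows a common pattern, sketched here in Python rather than the TypeScript of the Gemini CLI source. `replace_with_fallback` is a hypothetical name and the code is an illustration of the technique, not a port of `config.ts`.

```python
import errno
import os
import shutil


def replace_with_fallback(tmp_path: str, dest_path: str) -> None:
    """Move tmp_path over dest_path, tolerating cross-device or busy targets."""
    try:
        # Atomic when both paths are on the same filesystem
        os.replace(tmp_path, dest_path)
    except OSError as exc:
        if exc.errno not in (errno.EXDEV, errno.EBUSY):
            raise
        # Fallback is not atomic: copy the bytes, then remove the temp file
        shutil.copy2(tmp_path, dest_path)
        os.unlink(tmp_path)
```

The trade-off is worth noting for container operators: the fallback keeps policy updates from failing, but a crash between the copy and the unlink can leave a partially written destination, which is where the backup-and-rebuild recovery path matters.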
2026-06-03 · Gemini CLI

CI labeler switched to pull_request_target, granting write context to fork PR runs

Platform
- Contributors and maintainers should note the PR-size labeler now runs under pull_request_target, which executes in the base-repo context with write-capable token access on fork PRs.
- This is the classic pwn-request surface: pull_request_target with any checkout or execution of fork-controlled content can leak the elevated token; operators forking or auditing the repo's CI should confirm the workflow does not check out and run untrusted PR code.
- Verification path: .github/workflows/pr-size-labeler.yml line 4 trigger change from pull_request to pull_request_target.
- Single decision for the repo-security auditor: review this workflow's token scope and whether it touches fork-controlled inputs.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Gemini CLI

Gemini 3.5 Flash GA routes to flagged users via backend experiment flag, no client update

Control Plane
- Operators auditing which model their CLI calls cannot rely on client version alone - model selection is now gated server-side by experiment flag GEMINI_3_5_FLASH_GA_LAUNCHED (ID 45780819) via hasGemini35FlashGAAccess().
- Auto-routing logic silently switches to Flash GA when the flag is enabled for a user cohort, so the same binary can route to different models across users.
- Verification path: Config.hasGemini35FlashGAAccess() and the registered experiment flag determine routing; the model in use is no longer fully determined by local config.
- Single decision: operators must treat backend flag state as part of the model-routing audit surface.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Hermes Agent

Docker dashboard insecure binding now requires explicit HERMES_DASHBOARD_INSECURE=1 opt-in

Control Plane
- The dashboard no longer infers insecure mode from bind host, so operators whose Docker setups relied on that inference must add HERMES_DASHBOARD_INSECURE=1 explicitly or the dashboard will not bind insecurely.
- Existing Docker and hosted deployments must update env configuration before upgrading to v0.15.1 to avoid a broken or unexpectedly-secured dashboard.
- Verification path: upgrade to v0.15.1, set HERMES_DASHBOARD_INSECURE=1 only where intended, and confirm the dashboard binds as expected without falling back to host-derived inference.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Hermes Agent

Promptware defense added against Brainworm-class prompt-injection attacks

Runtime
- Operators running the agent against untrusted content (web, repos, MCP tool output) gain a built-in defense layer they should validate against their own injection test cases rather than assume blanket coverage.
- 19 security-tagged issues were closed in the same release, so the upgrade is the gate for these protections; staying on prior versions leaves the injection surface unmitigated.
- Verification path: upgrade to v0.15.0 and run known Brainworm-class injection patterns to confirm the defense triggers before exposing the agent to untrusted input.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Hermes Agent

Bitwarden Secrets Manager integration replaces per-provider API keys

Control Plane
- Operators managing credentials must decide whether to migrate from per-provider API keys to centralized Bitwarden Secrets Manager, changing where secrets live and how they rotate.
- Centralized secret management enables rotation and revocation that scattered per-provider keys did not; an operator wiring CI/automation must re-point credential sourcing.
- Verification path: configure Bitwarden Secrets Manager on v0.15.0, confirm the agent resolves credentials from it, and test a rotation to verify the agent picks up the new secret.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Hermes Agent

Kanban becomes a multi-agent orchestration platform with auto-decomposition, swarm topology, and worktree-per-task

Control Plane
- Operators who ran Kanban as a task board must now decide whether to adopt orchestrator auto-decomposition and swarm topology, which turn a queue into a self-spawning multi-agent fleet with new operating state to supervise.
- Per-task model overrides and worktree-per-task change the cost and isolation profile of every queued task; an operator must re-plan budget and concurrency.
- Verification path: deploy v0.15.0, queue a decomposable task, and confirm the orchestrator spawns the expected sub-agents in isolated worktrees before trusting it with real work.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Pi Coding Agent

OAuth browser-launch URI validation closes command-injection path

Runtime
- An operator authenticating against a third-party or attacker-influenced OAuth server was exposed to shell command injection via the verification URI; upgrading past ba6e529 removes that exposure.
- Verification path: confirm the build includes ba6e529 (non-HTTP(S) URIs rejected, browser launched via spawn() not shell exec()).
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
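The two halves of the fix in the entry above (reject non-HTTP(S) URIs, launch without a shell) can be sketched as follows. This is an illustration in Python, not Pi's code: `validate_verification_uri` and `browser_argv` are hypothetical, and `xdg-open` is an assumed Linux opener.

```python
from urllib.parse import urlsplit


def validate_verification_uri(uri: str) -> str:
    """Accept only absolute http(s) URIs; raise ValueError otherwise."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"refusing non-HTTP(S) verification URI: {uri!r}")
    return uri


def browser_argv(uri: str, opener: str = "xdg-open") -> list:
    """Build an argument vector so the URI is never parsed by a shell."""
    return [opener, validate_verification_uri(uri)]
```

Passing the result to `subprocess.run(browser_argv(uri))` without `shell=True` hands the URI to the opener as a single argument, so shell metacharacters in a server-supplied URI are not interpreted.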
2026-06-03 · Pi Coding Agent

Git package install path-traversal rejection

Runtime
- An operator installing a git-sourced package from an untrusted URL was exposed to files being written outside the package install root via traversal sequences; upgrading past a98e087 blocks this at parse and resolution time.
- Verification path: confirm a98e087 is present; a crafted git URL with '../' is rejected with 'Refusing to use path outside package install root'.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · OpenClaw

Skill Workshop adds a pending-proposal approval workflow with CLI/Gateway review and a skill_workshop agent tool

Control Plane
- Skill Workshop introduces a new pending-proposal lifecycle that an operator must approve or reject via CLI or Gateway before a skill takes effect, inserting a human-in-the-loop gate into skill provisioning.
- The skill_workshop agent tool lets agents themselves file proposals, expanding the automation surface; operators must decide who may review and who may self-approve.
- Decision is for the control-plane admin/skill-author: configure the review path and authority for skill proposals.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · OpenClaw

Enhanced plugin isolation tightens the plugin sandbox boundary in the 2026.6.1 line

Runtime
- Enhanced plugin isolation changes the sandbox boundary around plugins, including the externalized Tokenjuice and GitHub Copilot plugins, which now run as separate plugins.
- Operators running third-party or externalized plugins should re-test plugin behavior against the tightened isolation, since capabilities previously available in-process may now be constrained.
- Single runtime-admin decision: verify plugins still function under the new isolation after upgrade.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Paperclip

Unclaimed self-hosted deployments get a one-time browser claim to bootstrap the first admin

Control Plane
- Operators standing up a private self-hosted deployment now have a defined bootstrap path to create the first admin before any invite exists, replacing ad-hoc seeding.
- Whoever completes the one-time browser claim becomes the first admin, so an operator must claim a freshly deployed instance promptly to avoid a race for control.
- This changes the deployment runbook: the claim step is now the gate that establishes ownership of the control plane.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Paperclip

Company skills become first-class resources with an install/reset/audit/export/assign CLI

Control Plane
- Skills move from implicit configuration to governed resources: an operator can now audit which skills are installed and assigned, and export the catalog for review or provenance tracking.
- The CLI verbs (install, reset, audit, export, assign) give platform operators a programmatic path to manage agent capabilities across a company instead of clicking through a board.
- Assignment is a distinct authority action — an operator decides which agents get which skills — so capability grants become reviewable operating state rather than ambient defaults.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Agent Zero

Computer-use screenshots now persist to durable chat-scoped storage by default

Runtime
- Reverses the prior ephemeral-by-default posture for computer-use screenshots, so operators who relied on screenshots being transient must now account for retained artifacts.
- Changes deployment storage characteristics: long-running computer-use sessions accumulate screenshots in chat-scoped storage, requiring storage planning and retention/cleanup review.
- Directly hits the Grid/workcell calibration concern of persistence and cleanup for real computer access.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Agent Zero

Office, Desktop, and Editor plugins become toggleable behind a protected plugin-state API

Control Plane
- Operators can disable Office, Desktop, or Editor plugins (Desktop computer-use especially) on deployments that should not hold those capabilities, via the v1.19 plugin-toggle endpoint.
- The endpoint is described as 'protected' but the release note documents no auth model or role-based capability management, so treat it as a disable lever, not yet an audited capability register.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Agent Zero

Remote Link renamed to Remote Control with selectable tunnel providers and handshake version advertisement

Platform
- Operators managing distributed deployments must update remote-connectivity terminology (Remote Link -> Remote Control) and can now choose among Cloudflare, Microsoft Dev Tunnels, Serveo, and Tailscale.
- Version advertisement in connector handshakes lets CLI clients detect server compatibility, changing how operators coordinate client/server upgrades across a fleet.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · OpenHands

Upgrade frontend deps (axios 1.16.0, dompurify 3.4.0) to close CVE-2026-44492 and CVE-2026-41238

Platform
- Two browser-facing frontend dependencies were patched in the window: axios to 1.16.0 (CVE-2026-44492, commit 73d1d9a) and dompurify to 3.4.0 (CVE-2026-41238, commit b025cd2). Two commits, one operator action: rebuild and redeploy the frontend bundle.
- Self-hosters pinning older lockfiles must bump both manually; a stale frontend build leaves both CVEs live.
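
A stale lockfile can be caught before the rebuild. The sketch below assumes the frontend pins dependencies in an npm `package-lock.json` (lockfile v2/v3, with a top-level `packages` map keyed by `node_modules/<name>`); the function names are illustrative and it only inspects the top-level copy of each package, not nested ones.

```python
import json

# Minimum patched versions named in the signal above.
MINIMUMS = {"axios": (1, 16, 0), "dompurify": (3, 4, 0)}


def parse_version(text: str) -> tuple:
    """Turn '1.16.0' into (1, 16, 0), dropping any pre-release or build suffix."""
    core = text.split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def stale_packages(lockfile_text: str) -> dict:
    """Return {name: locked_version} for packages below the patched minimum.

    Assumes an npm lockfile with a top-level "packages" map. A package that
    is not found at the top level is reported as "missing".
    """
    packages = json.loads(lockfile_text).get("packages", {})
    stale = {}
    for name, minimum in MINIMUMS.items():
        entry = packages.get(f"node_modules/{name}")
        if entry is None:
            stale[name] = "missing"
        elif parse_version(entry["version"]) < minimum:
            stale[name] = entry["version"]
    return stale
```

An empty result means both top-level pins are at or above the patched versions; the deployed bundle still has to be rebuilt from that lockfile for the fix to ship.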
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · OpenHands

Upgrade dulwich to 1.2.5 to close CVE-2026-42305 in git operations

Runtime
- Operator must re-resolve poetry.lock (enterprise and root) and rebuild backend images to ship patched dulwich; git operations run inside the agent runtime path.
- Distinct from the frontend CVEs: this is a backend Python git library, different surface and different verification (lockfile pin, not frontend bundle).
- Verification path: confirm dulwich>=1.2.5 in deployed poetry.lock / installed environment.
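
The installed-environment half of that verification can be sketched with the standard library alone. The helper names are illustrative, and the comparison looks only at the numeric release segment, which is a simplification of full version ordering.

```python
import re
from importlib import metadata
from typing import Optional


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '1.2.5.post1' -> (1, 2, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "1.2.5") -> bool:
    """True when the installed release is at or above the patched release."""
    return release_tuple(installed) >= release_tuple(minimum)


def dulwich_is_patched() -> Optional[bool]:
    """Check the running environment; None means dulwich is not installed."""
    try:
        return meets_minimum(metadata.version("dulwich"))
    except metadata.PackageNotFoundError:
        return None
```

Run `dulwich_is_patched()` inside the rebuilt backend image rather than on a workstation, since the point is to confirm what the agent runtime actually loads.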
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · OpenHands

ACP provider credentials now route through cipher-protected agent_context.secrets, not acp_env

Control Plane
- Operators running ACP agents must understand provider API keys/base URLs now flow through the cipher-protected secrets channel; the deprecated acp_env channel no longer carries credentials.
- Changes the persistence and exposure surface for agent provider credentials, with SDK gap-fill logic specifically preventing re-folding into the insecure acp_env channel.
- Verification path: confirm ACP provider creds appear via agent_context.secrets and are absent from acp_env in agent context.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · OpenHands

DELETE /api/organizations now cascade-deletes the sole-org requester (personal org)

Control Plane
- Operators must understand that deleting a personal org now also deletes the requesting user account, enabling re-onboarding on next login — a destructive identity-state change behind one endpoint.
- Changes operating-state semantics of an existing destructive API: requires backup discipline before org deletion; multi-org members are protected by preflight orphan detection.
- Verification path: test DELETE /api/organizations against a sole-org account vs a multi-org member and confirm orphan-rejection behavior.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Flue

v0.9.0 breaking app-config migration: routing/provider imports, provider-ID format, SDK mount paths, and beta session-state reset

Platform
- Upgrading to v0.9.0 forces a developer to rewrite application imports: routing moves from `@flue/runtime/app` to `@flue/runtime/routing`, provider APIs and `observe` come from `@flue/runtime`, and Workers AI types from `@flue/runtime/cloudflare` — code will not compile until updated.
- Provider model values now require `provider-id/model-id` format and `registerProvider()`/`configureProvider()` must share one ID; SDK mount paths now derive from `baseUrl` pathname — both are silent runtime-behavior changes that mis-route calls if not updated.
- Persisted beta session state is now rejected; the operator must clear or migrate the session store before upgrading or sessions fail to restore — a distinct destructive pre-upgrade step gated on the same v0.9.0 cutover.
- All of these share one verb (update-before-upgrade) for one persona (the Flue app developer) and one verification path (build + smoke-test against v0.9.0), so they route as a single platform migration signal.
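
Two of these changes can be pre-checked mechanically before the build. The sketch below flags lines that still import from the old routing path and tests whether a model value has the `provider-id/model-id` shape; the function names are illustrative, and the exact grammar Flue accepts for IDs is an assumption (here: a non-empty provider with no slash, a slash, then a non-empty model ID).

```python
import re

OLD_ROUTING_IMPORT = "@flue/runtime/app"

# Assumed shape only: "<provider-id>/<model-id>", where the model ID may
# itself contain further slashes.
MODEL_VALUE = re.compile(r"^[^/\s]+/\S+$")


def find_old_imports(source: str) -> list:
    """Return 1-based line numbers that still reference the old routing path.

    Matches only the quoted module specifier, so longer paths that merely
    start with the same prefix are not flagged.
    """
    needles = (f"'{OLD_ROUTING_IMPORT}'", f'"{OLD_ROUTING_IMPORT}"')
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if any(needle in line for needle in needles)
    ]


def is_qualified_model(value: str) -> bool:
    """True when the value looks like provider-id/model-id."""
    return MODEL_VALUE.match(value) is not None
```

The import check catches what the compiler would catch anyway; the model-value check matters more, because an unqualified value is one of the silent runtime changes.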
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Flue

v0.9.1 strips WebSocket URL credentials and rejects blank requestIds

Runtime
- Operators deploying Flue on Cloudflare WebSockets get two upstream hardening fixes by upgrading to v0.9.1: query strings and fragments are stripped before attachment persistence, so URL-carried handshake credentials are no longer retained, and agent/workflow frames reject blank or whitespace-only `requestId` values.
- Both are the same consequence for one persona (the Cloudflare WebSocket operator) gated on the same upgrade, so they stay one signal; the operator action is to upgrade and confirm credentials are no longer in persisted attachments.
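
To make the two behaviors concrete, here is a minimal illustration of the same rules in Python. It is not Flue's implementation, and the function names are made up for this sketch; it only shows what "strip query strings and fragments" and "reject blank requestId" mean for a given input.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def is_valid_request_id(value: object) -> bool:
    """Reject missing, empty, or whitespace-only request IDs."""
    return isinstance(value, str) and value.strip() != ""
```

An operator can use the first helper when checking persisted attachments: any stored URL that differs from its stripped form still carries a query string or fragment. Note that userinfo embedded in the host part (`user:pass@host`) is left untouched here, since the release note covers only query strings and fragments.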
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-03 · Flue

v0.9.2 adds an activate_skill tool letting agents load skills autonomously

Control Plane
- Operators configuring skills now get a new agent-facing `activate_skill` tool: agents load full skill instructions on demand before matching work, shifting skill loading from operator-orchestrated to agent-initiated — a proactivity/authority change the operator should be aware of when scoping which skills are available.
- Workspace skills are reread on activation, so edits during an active session take effect (lazy loading preserved); verification is concrete (configure a skill, confirm the agent self-activates it and picks up an edit mid-session).
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-06-02 · Claude Code

Writes to execution-granting config and shell startup files now prompt even in acceptEdits mode

Runtime
- Two new guardrails land together: acceptEdits mode now prompts before writing build-tool config that grants code execution (.npmrc, .yarnrc*, bunfig.toml, .bazelrc, .pre-commit-config.yaml, .devcontainer/, etc.), and the agent now prompts before writing shell startup files (.zshenv, .zlogin, .bash_login) and ~/.config/git/.
- Operators who ran acceptEdits or auto-leaning modes previously had a silent write path into files that execute code on the next shell, install, or commit; the new prompt converts that into a confirmation checkpoint.
- The operator action is to recognize that these prompts will now fire and not blanket-allow them — the prompt is the supply-chain/persistence defense, so auto-approving it re-opens the vector.
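
For teams auditing which paths in their own automation would now hit a prompt, the named files can be approximated with a small matcher. This is a sketch built from the list in the first bullet only: that list ends in "etc.", so the real set is larger, and the directory match here is looser than the product's (it treats any `.config/git/` as guarded, not only the one under the home directory).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name patterns taken from the signal text; known to be incomplete.
GUARDED_FILE_PATTERNS = (
    ".npmrc", ".yarnrc*", "bunfig.toml", ".bazelrc",
    ".pre-commit-config.yaml", ".zshenv", ".zlogin", ".bash_login",
)
# Directories whose contents are guarded, as tuples of path components.
GUARDED_DIRS = ((".devcontainer",), (".config", "git"))


def _inside(parts: tuple, directory: tuple) -> bool:
    """True when `directory` appears in `parts` with at least one component after it."""
    width = len(directory)
    return any(
        tuple(parts[i:i + width]) == directory
        for i in range(len(parts) - width)
    )


def is_guarded(path: str) -> bool:
    """Approximate check: would a write to this path be in the guarded set?"""
    candidate = PurePosixPath(path)
    if any(fnmatchcase(candidate.name, pattern) for pattern in GUARDED_FILE_PATTERNS):
        return True
    return any(_inside(candidate.parts, directory) for directory in GUARDED_DIRS)
```

A match is a reason to review the write by hand, not a reason to add an allow rule for it.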
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0

May 2026

2026-05-30 · Claude Code

Auto Mode now available on Bedrock, Vertex, and Foundry for Opus 4.7 / 4.8

Control Plane
- Auto Mode's permission-handling posture, previously tied to first-party Anthropic auth, now extends to the cloud provider APIs (AWS Bedrock, Google Vertex, Foundry) for Opus 4.7 and 4.8, opt-in via CLAUDE_CODE_ENABLE_AUTO_MODE=1.
- The operator decision is governance-shaped: teams running Claude Code through a cloud-provider procurement path can now deploy the reduced-prompt autonomy posture they could not before, which changes what consent ceremony exists on those deployments.
- Because Auto Mode shifts permission decisioning away from per-action prompts, an operator enabling it on a Bedrock/Vertex deployment must confirm their managed-settings deny rules carry the governance weight the prompts used to.
Run: 2026-06-03-weekly-digest-2026-05-28_2026-06-03-frontier-v0
2026-05-27 · Claude Code

Auto mode becomes the default permission posture

Control Plane
- Operators with managed Claude Code deployments must re-audit what Auto mode classifies as safe by default — the consent gate is gone.
- Admins relying on the opt-in consent dialog as a visible posture check have lost that surface; equivalent visibility now comes from managed-settings policy, not from a runtime prompt.
- Skill authors should evaluate `disallowed-tools` for skills that should run with a reduced tool surface.
- Hook authors should consider whether `MessageDisplay` is a governance gain or a censorship hazard for their deployment.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Claude Code

Three de-facto security advisories without a separate advisory surface

Runtime
- Windows operators on 2.1.148 or earlier with PowerShell allowlists, git worktree workflows, or enterprise login pinning should upgrade to 2.1.149+ before deploying new agents.
- Operators monitoring for security-advisory-shape events (RSS, CVE feeds) need to recognize that Anthropic ships these as ordinary changelog entries; the changelog is the de-facto advisory surface.
- Source-contract owners should decide whether to amend `sources/claude-code.yml` to add an explicit security advisory surface or to document the changelog as carrying that role.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Codex

Goal mode graduates default-on; remote computer use after lock ships

Control Plane
- Operators using Codex must decide whether goal mode is permitted as a baseline or constrained via permission profiles — the inheritance + managed-requirements features are the right tool for this.
- Evaluators of remote computer use after Mac lock should treat the locked-host surface as a new authority decision, not a default; short-lived authorization and relock-on-input are sensible defaults, but the policy for which tasks may operate against a locked host is still an operator choice.
- Plugin-marketplace evaluators (ChatGPT Business; Enterprise coming soon) should treat plugin distribution-by-marketplace as a new supply-chain surface to govern.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Codex

Permission profiles get inheritance and an org-managed enforcement file

Control Plane
- Enterprise operators should restructure permission policy: stop maintaining flat profile lists; build a base profile plus per-team derivations using inheritance.
- Decide where `requirements.toml` lives (repo-rooted, org-rooted, signed) before depending on enforcement — the distribution and trust model are not yet documented.
- Migrate off legacy profile configs; 0.134.0 rejects them with migration guidance.
- Normalize permission selection on `--profile` as the canonical handle; flag-soup approaches are now legacy.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Gemini CLI

Local and remote session invocation protocols land stable

Runtime
- Operators building delegated workflows on Gemini CLI should re-test against v0.44.0 stable; the remote invocation protocol is no longer preview.
- Multi-scope deployments must audit agent name overlaps before upgrading — the new `first-wins prioritize project` resolution changes which definition wins.
- Until Google documents where remote invocations actually run, treat the remote path as infrastructure-to-be-defined; do not depend on it for production.
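
The overlap audit in the second bullet reduces to finding names defined in more than one scope. A minimal sketch, assuming the operator has already collected agent names per scope; the scope labels and the input shape are hypothetical, and how Gemini CLI stores agent definitions is not covered here.

```python
from collections import defaultdict


def overlapping_agent_names(names_by_scope: dict) -> dict:
    """Map each agent name defined in several scopes to the scopes defining it.

    `names_by_scope` is {scope_label: [agent_name, ...]}; scopes are reported
    in the order they appear in the input.
    """
    scopes_by_name = defaultdict(list)
    for scope, names in names_by_scope.items():
        for name in dict.fromkeys(names):  # de-duplicate within one scope
            scopes_by_name[name].append(scope)
    return {
        name: scopes
        for name, scopes in scopes_by_name.items()
        if len(scopes) > 1
    }
```

Every name this returns is one where the resolution change could swap the definition in use, so those are the agents to re-test first.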
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Gemini CLI

Auto modes collapse and PolicyEngine reaches into ACP sessions

Control Plane
- Operators on previous Auto variants must re-audit which behaviors the consolidated Auto mode treats as safe — the merger may have loosened or tightened constraints; release notes do not enumerate.
- `AUTO_EDIT` operators should explicitly decide whether shell-redirect auto-approval is acceptable for their environment.
- Operators evaluating Gemini ACP integration should treat PolicyEngine-in-ACP as the new enforcement boundary; the 'deadlock fix' framing understates the structural shift.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · OpenHands

OpenHands becomes the GUI shell for other harnesses, with org-level LLM profiles

Platform

composes with Claude Code, Codex, Gemini CLI
- Evaluators of OpenHands as a multi-agent shell: enable `ENABLE_ACP` against your preferred ACP back-end (Claude Code, Codex, Gemini CLI) and test the policy surface — the greyed-out settings while ACP is active are intentional.
- Multi-tenant SaaS operators must confirm they are on 2026-05-22+ to get the MCP/ACP env scoping fix. Audit MCP credentials that may have been shared across org members pre-fix.
- Enterprise admins should treat the org-level LLM profile model as the canonical place to set 'this org uses these models' policy.
- Operators on the release channel need to know none of this is in a tagged 1.x release yet — main-branch only.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Agent Zero

Host desktop control with required visual verification

Runtime
- Operators evaluating Agent Zero for host control must decide whether `computer_use_remote` is allowed at all on the host — the default trust mode is opt-in but the runtime checks are enforceable.
- Workcell operators should know that screenshot capture is now ephemeral and context-scoped by default; auditing what the agent saw requires explicit durable capture.
- Operators using the existing `linux-desktop` skill: verify your skill routes to the path you expect; host and container desktops are now cleanly separated.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · OpenClaw

Content-boundary hardening suite across inbound surfaces

Runtime
- Operators evaluating OpenClaw against 'is it safe to put agents on real channels' can use this suite as evidence of a threat model, not just a feature list.
- Gateway operators should verify whether `gateway.auth.rateLimit` was unset in their config: the on-by-default rate limit changes observable behavior for non-browser/HTTP auth flows.
- Plugin authors should treat `allowFrom` sender allowlists as the canonical inbound boundary; post-dispatch filtering is the older model.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Hermes Agent

Hermes ships PyPI, lazy adapter install, native Windows beta

Platform

composes with Aider, Cline, Codex, Continue
- Builders who bounced off the prior clone-and-shell installer should re-evaluate Hermes — `pip install hermes-agent` plus lazy adapter install plus Windows beta plus Zed ACP Registry listing materially lower the floor.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Hermes Agent

`hermes proxy`: local OpenAI-compatible endpoint backed by operator OAuth

Control Plane

composes with Aider, Cline, Codex, Continue
- Operators running `hermes proxy` on the documented loopback default (`--host 127.0.0.1`) inherit a low-risk posture; the proxy accepts client `Authorization` headers and strips them before attaching the Hermes OAuth upstream. Operators changing the bind to a non-loopback address must place their own auth in front of the port — the proxy itself does not authenticate local callers.
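
Both halves of that posture can be sketched in a few lines: a check for whether a bind address is loopback, and the header rewrite the entry describes. This is an illustration, not Hermes code; the function names are invented and the `Bearer` scheme for the upstream credential is an assumption.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True when the bind address is reachable only from the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A hostname we cannot classify: treat it as exposed.
        return False


def upstream_headers(client_headers: dict, upstream_token: str) -> dict:
    """Drop the caller's Authorization header and attach the upstream one.

    The caller's credential is discarded, never validated, which is why a
    non-loopback bind needs separate authentication in front of the port.
    """
    kept = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != "authorization"
    }
    kept["Authorization"] = f"Bearer {upstream_token}"  # scheme is assumed
    return kept
```

A deployment check can refuse to start, or at least warn, whenever `is_loopback_bind` is false and no fronting auth is configured.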
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Hermes Agent

Honcho identity mapping and credential-pool isolation

Control Plane

composes with Aider, Cline, Codex, Continue
- Multi-user gateway operators should upgrade past the Honcho commits (week of 2026-05-21) and the credential-pool isolation commit (2026-05-27) before running shared-thread deployments — these are quiet correctness fixes for cross-user contamination.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Hermes Agent

Kanban corruption-hardening wave (post-v0.14.0)

Runtime

composes with Aider, Cline, Codex, Continue
- Kanban-dependent multi-agent operators should treat the post-v0.14.0 line as the integrity-floor baseline; the sheer volume of corruption-hardening fixes is the signal.
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-27 · Paperclip

Scoped agent permissions, layered routine secrets, document locks

Control Plane
- Multi-agent operators: re-evaluate Paperclip's authz model. The principal-access backfill means pre-existing data is being normalized to the new model — confirm any operator action needed for older versions.
- Secret-handling operators: read PR #6212 before configuring routine env in a deployment where secrets matter — the `agent < project < routine` precedence is a structural operator concept.
- Approval-discipline operators: migrate to lock-backed approval; document locks give approval a persistent surface.
- ACPX-Claude operators: confirm `~/.claude/settings.json` is configured as the source of truth for Claude permissions — the Paperclip control plane defers to it.
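
The `agent < project < routine` precedence is easier to reason about with a worked merge. The sketch below reads the notation as lowest-to-highest precedence, so a routine value overrides a project value, which overrides an agent value; that reading is an assumption, and PR #6212 remains the authority. The function name and the provenance tuple are illustrative.

```python
def resolve_layers(agent: dict, project: dict, routine: dict) -> dict:
    """Merge three env layers and record which layer supplied each value.

    Returns {name: (value, winning_layer)}. Layers are applied from lowest
    to highest assumed precedence, so later layers overwrite earlier ones.
    """
    resolved = {}
    for layer_name, layer in (("agent", agent), ("project", project), ("routine", routine)):
        for name, value in layer.items():
            resolved[name] = (value, layer_name)
    return resolved
```

For example, with `B` set at all three layers the routine value wins, while a variable set only on the agent passes through unchanged. Keeping the winning layer alongside the value gives a reviewer a direct answer to "where did this secret come from".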
Run: 2026-05-27-weekly-digest-2026-05-13_2026-05-27-frontier-v0
2026-05-13 · OpenClaw

Per-sender tool policies via channel-scoped sender keys

Control Plane
- Operators running OpenClaw with public-facing channels can now restrict dangerous tools by requester identity rather than only by agent. Review your tool surfaces and decide whether the broader trust model (per-channel × per-sender) belongs in your deployment.
- Authority restriction now extends across global, agent, group, core, bundled, and plugin tool surfaces — operators should re-audit which surfaces hold authority decisions in their deployment and whether the requester-level layer makes some prior per-agent restrictions redundant.
- Three claim-level updates land in the same release: memory-wiki ingest now requires admin scope, Obsidian search requires write scope, and `openclaw models auth login --provider openai` defaults to ChatGPT/Codex login (API-key setup is now behind `--method api-key`). Setup scripts assuming read-only or API-key-first paths need to be updated.
Run: 2026-05-13-partial-cycle-openclaw-refresh-2026-05-13-frontier-v0
2026-05-12 · Pi Coding Agent

Package scope migration to earendil-works; harness SDK stream config

Platform
- Operators with global Pi installs should run `pi update --self` once @earendil-works/pi-coding-agent is published to migrate from the old @mariozechner scope.
- Operators with Pi pinned in CI, Dockerfiles, or package.json by the old @mariozechner/pi-coding-agent name should update their references to @earendil-works/pi-coding-agent.
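
Pinned references are plain strings, so the rename can be found and applied with a direct substitution. A minimal sketch with an illustrative function name; it works on text already read from a Dockerfile, CI config, or `package.json`, and reports how many references it changed.

```python
OLD_NAME = "@mariozechner/pi-coding-agent"
NEW_NAME = "@earendil-works/pi-coding-agent"


def migrate_references(text: str) -> tuple:
    """Return (updated_text, replacements_made) for one file's contents."""
    count = text.count(OLD_NAME)
    return text.replace(OLD_NAME, NEW_NAME), count
```

Lockfiles are better regenerated by the package manager than edited this way, since they also record resolved URLs and integrity hashes for the old package.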
Run: 2026-05-12-partial-cycle-pi-coding-agent-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · Paperclip

Secrets provider vaults (AWS Secrets Manager), host env isolation fix, cursor_cloud adapter

Control Plane
- Operators running SSH-managed execution environments should upgrade immediately: the host env isolation fix (PR #5142) closes a path where host environment variables (API keys, tokens, paths) were being forwarded to remote execution targets.
- Operators managing credentials at scale should evaluate the AWS Secrets Manager import path in Secrets settings UI — this enables rotation-aware credential management with an access-event audit trail.
- Operators using Cursor as an adapter can now configure the new `cursor_cloud` adapter for cloud-hosted Cursor routing with session reuse, streaming, and cancellation.
Run: 2026-05-12-partial-cycle-paperclip-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · OpenHands

Sub-agent delegation (opt-in) and critic evaluation GUI

Control Plane
- Operators running multi-task sessions can now enable sub-agent delegation via `enable_sub_agents`. Built-in sub-agents (bash-runner, code-explorer, general-purpose, web-researcher) handle scoped tasks with restricted tool surfaces. Default is off -- enable deliberately.
- Operators should configure `CRITIC_API_KEY` to route critic evaluation spend separately from the primary model key if centralized cost control matters.
- The critic display is deployment-controlled via `OH_ENABLE_CRITIC_BY_DEFAULT` (disabled by default). Deployments that want it enabled should set that flag; per-deployment toggle is `verification.critic_enabled`.
Run: 2026-05-12-partial-cycle-openhands-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · OpenClaw

Per-agent message restrictions, gated code install, and onboarding wayfinding

Control Plane · Platform
- Operators deploying public-facing or sandboxed agents should evaluate `tools.message.crossContext` and `tools.message.actions.allow` overrides to restrict agent message sends to the current conversation without changing the global bot policy.
- Operators running long-horizon OpenClaw sessions should know that session memory is now bounded: the memory dreaming promotion cap compacts oldest auto-promoted sections while preserving user-authored notes. Unbounded auto-memory growth is no longer the default behavior.
- Operators deploying OpenClaw for new users should test the improved CLI onboarding wayfinding: setup, onboarding, configure, and channel commands now explain the next useful command at each step.
Run: 2026-05-12-partial-cycle-openclaw-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · Hermes Agent

Hermes drops mistralai from [all] extras after PyPI quarantine of 2.4.6

Platform
- Operators who installed hermes-agent[all] on or around 2026-05-12 should verify whether mistralai==2.4.6 is present in their environment and remove it if so.
- Operators needing Mistral Voxtral TTS must switch to explicit hermes-agent[mistral] install; it no longer ships in [all] while quarantine is active.
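
The presence check in the first bullet can run against `pip freeze` output. A sketch assuming the usual `name==version` lines; the function name is illustrative.

```python
QUARANTINED_NAME = "mistralai"
QUARANTINED_VERSION = "2.4.6"


def has_quarantined_pin(freeze_text: str) -> bool:
    """True when the frozen requirements include the quarantined release."""
    for line in freeze_text.splitlines():
        name, separator, version = line.strip().partition("==")
        if not separator:
            continue  # editable installs, URLs, comments, blank lines
        normalized = name.strip().lower().replace("_", "-")
        if normalized == QUARANTINED_NAME and version.strip() == QUARANTINED_VERSION:
            return True
    return False
```

Run it per environment, including CI caches and prebuilt images, because an `[all]` install from that window may have been baked into more than one place.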
Run: 2026-05-12-partial-cycle-hermes-refresh-2026-05-12-frontier-v0
2026-05-12 · Hermes Agent

Durable Kanban with hallucination gate, redaction-on-by-default, channel allowlists

Control Plane
- Operators upgrading existing Hermes deployments must verify that secret redaction is now ON by default. Log pipelines that relied on unredacted output will see sanitized logs after upgrade.
- Discord operators with role-gated access (`DISCORD_ALLOWED_ROLES`) should re-verify their role-scoping configuration: the guild-scoped fix (CVSS 8.1) may change behavior in cross-guild bot deployments.
- Operators building multi-agent workflows on Hermes should evaluate the Kanban board's reliability primitives (heartbeat reclaim, zombie detection, hallucination gate, per-task retries) before building a custom coordination layer.
- Operators using cron should evaluate `no_agent` mode for script-only automation that does not require LLM invocation.
Run: 2026-05-12-partial-cycle-hermes-agent-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · Gemini CLI

Session resume now surfaces errors and finds legacy sessions

Control Plane
- Operators using --resume with legacy session formats should re-test: prior to this fix, resume failures silently started new sessions. Verify the behavior after upgrade.
Run: 2026-05-12-partial-cycle-gemini-refresh-2026-05-12-frontier-v0
2026-05-12 · Flue

Flue: programmable harness with run observability, virtual sandbox, and shell env security fix

Platform
- Operators using shell env for credentials in pre-v0.4.1 Flue sessions should verify their session store does not contain unredacted values — the v0.4.1 shell env redaction fix is a security patch.
- Operators using `sandbox: 'local'` should re-test: it is now genuinely local (direct host access, no just-bash), changing the isolation boundary for agents running in CI.
- Operators building on Flue should evaluate `flue logs` and run history (v0.5.0) as the primary evidence trail for autonomous agent invocations.
Run: 2026-05-12-partial-cycle-flue-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · Codex

PreToolUse hooks can now rewrite tool inputs before execution

Control Plane
- Hook authors who returned updatedInput in PreToolUse hooks expecting rewrites to apply should re-test: prior to this fix, the original input was used; after this fix, the rewritten input is used. Verify existing hooks behave as intended after upgrade.
- Operators can now build input-sanitizing PreToolUse hooks that modify tool arguments before dispatch -- path normalization, argument masking, destination redirection.
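
As one example of the path-normalization case, the rewrite itself can be a pure function over the tool input. This sketch covers only that transformation: the `path` argument name and the workspace-root parameter are hypothetical, and how the result is handed back to Codex as `updatedInput` is not shown because the hook wire format is not described here.

```python
import posixpath


def normalize_path_argument(tool_input: dict, workspace_root: str) -> dict:
    """Return a copy of the tool input with its path resolved inside the workspace.

    Raises ValueError when the resolved path would leave the workspace, so
    the hook can block instead of rewriting. Assumes POSIX-style paths and a
    workspace root other than "/".
    """
    updated = dict(tool_input)
    raw = updated.get("path")  # hypothetical argument name
    if isinstance(raw, str):
        root = posixpath.normpath(workspace_root)
        resolved = posixpath.normpath(posixpath.join(root, raw))
        if resolved != root and not resolved.startswith(root + "/"):
            raise ValueError(f"path escapes workspace: {raw!r}")
        updated["path"] = resolved
    return updated
```

Because rewritten input is now what actually executes, a function like this deserves its own tests; a bug in it changes what the tool does, not just what gets logged.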
Run: 2026-05-12-partial-cycle-codex-refresh-2026-05-12-frontier-v0
2026-05-12 · Claude Code

Agent view, goal completion, and governance hardening

Control Plane
- `claude agents` is the new canonical surface for multi-session supervision; operators running parallel Claude Code sessions should evaluate it now as their primary management interface.
- /goal changes how long-running autonomous work is structured; operators should test goal-based termination against their most common multi-turn workflows.
- `continueOnBlock` enables advisory governance hooks; existing PostToolUse blocks should be redesigned to pass rejection reasons so Claude can adapt rather than just stop.
- `x-claude-code-agent-id` / `x-claude-code-parent-agent-id` headers and OTel span attributes enable call-tree attribution; logging pipelines receiving Anthropic API calls should start capturing these to distinguish parent sessions from subagents.
- API key auth now disables Remote Control, /schedule, and claude.ai MCP connectors; operators using API-key auth should audit reliance on these surfaces before upgrading.
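
For the attribution headers, a logging pipeline needs little more than a case-insensitive lookup. A minimal sketch with an illustrative function name and record shape; it assumes a request with no parent header comes from a top-level session, which the source does not state outright.

```python
AGENT_ID_HEADER = "x-claude-code-agent-id"
PARENT_ID_HEADER = "x-claude-code-parent-agent-id"


def attribution(headers: dict) -> dict:
    """Extract call-tree fields from request headers for a log record."""
    lowered = {name.lower(): value for name, value in headers.items()}
    parent = lowered.get(PARENT_ID_HEADER)
    return {
        "agent_id": lowered.get(AGENT_ID_HEADER),
        "parent_agent_id": parent,
        # Assumption: only subagent requests carry a parent ID.
        "is_subagent": parent is not None,
    }
```

Storing both IDs on every record lets the call tree be rebuilt later by joining `parent_agent_id` to `agent_id`.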
Run: 2026-05-12-partial-cycle-claude-code-2026-05-07_2026-05-12-frontier-v0
2026-05-12 · Agent Zero

ODF-first document defaults, persistent desktop lifecycle, multi-tab browser fanout

Runtime
- Operators running Agent Zero should verify that downstream workflows handle ODT/ODS/ODP output from v1.13+. OOXML output now requires explicit configuration.
- Operators running long-horizon desktop sessions should plan for persistent desktop state: the Xpra Desktop no longer resets on canvas navigation. Accumulated desktop state (open apps, browser sessions) persists until explicitly shut down.
Run: 2026-05-12-partial-cycle-agent-zero-2026-05-07_2026-05-12-frontier-v0
2026-05-11 · Codex

Permissions glance surface and role-aware plugin sharing

Control Plane
- Bitter receipts should record permission posture + approval mode as standard fields.
- Plugin share role-awareness affects whether Bitter can share configs across roles.
- Authority visibility in the TUI is a worked example of governance ergonomics worth borrowing.
Run: 2026-05-11-partial-cycle-codex-2026-05-08_2026-05-11-frontier-v0
2026-05-11 · Gemini CLI

Subagents become pluggable; sessions become portable

Control Plane
- Capability-profile assumption "subagents inherit approval mode" is now under-specified.
- Run-contract design should record which subagent protocol variant a run used.
- Adapter work should distinguish local from remote subagent execution.
- Session export/import gives operators and Bitter a stable serialization point.
Run: 2026-05-11-partial-cycle-2026-05-08_2026-05-11-frontier-v0
2026-05-07 · Paperclip

Agent labor needs operating state, not just parallelism.

Control Plane

Run: 2026-05-07-expanded-watchlist-dry-run
2026-05-07 · Agent Zero · OpenHands

Real computers are becoming the agent work surface.

Runtime

Run: 2026-05-07-expanded-watchlist-dry-run
2026-05-07 · OpenHands

Agent harnesses are becoming full development platforms.

Platform

Run: 2026-05-07-expanded-watchlist-dry-run
2026-05-07 · OpenClaw · OpenHands · Agent Zero

Accessibility is becoming a frontier capability.

Platform

Run: 2026-05-07-expanded-watchlist-dry-run
2026-05-07 · Paperclip · Agent Zero · OpenHands · OpenClaw

Bitter needs a wrap, adapt, refuse decision for every frontier surface.

Run: 2026-05-07-expanded-watchlist-dry-run
2026-05-07 · Codex · Gemini CLI · Hermes Agent · OpenClaw

Persistent agent state is becoming a product surface

Control Plane
- Developers need to know which goals, memory patches, recaps, sessions, and skill maintenance loops shaped a serious run.
Run: 2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
2026-05-07 · Agent Zero · OpenHands · Paperclip · Codex

The agent interface is becoming a visible computer

Runtime
- A serious agent harness increasingly needs browser, desktop, file, runtime, sandbox, and artifact surfaces that can be inspected.
Run: 2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
2026-05-07 · Codex · Gemini CLI · OpenHands · OpenClaw · Paperclip · Agent Zero

Permissions, secrets, and sandboxes are moving into the foreground

Control Plane
- The harness must make trust state visible: what can be read, what can be changed, which credentials are exposed, and where execution happens.
Run: 2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
2026-05-07 · OpenClaw · Agent Zero · Hermes Agent · Pi Coding Agent · Gemini CLI · OpenHands

Accessibility is a frontier capability, not marketing polish

Platform
- Everyday adoption depends on setup recovery, visible progress, voice/chat surfaces, readable UI, OAuth clarity, and fewer dead ends.
Run: 2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
2026-05-07 · Paperclip · OpenHands · Hermes Agent · Codex · OpenClaw

Agent systems are growing control planes

Control Plane
- Once agents coordinate across tasks, runtimes, gateways, and integrations, operators need liveness, cost, role, session, and recovery controls.
Run: 2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
2026-05-07 · Pi Coding Agent · Hermes Agent · OpenClaw · Codex · Gemini CLI · OpenHands

Integrations are volatile; the operating loop has to be durable

Platform
- Provider lists, plugin systems, transports, and model profiles will keep changing.
Run: 2026-05-07-commit-harvest-2026-04-23_2026-05-07-frontier-v1
2026-05-06 · Codex

Worker-native goals unlock longer horizons.

Control Plane
- Operators now need to ask which durable objective the worker is pursuing, whether it is still aligned with the operator's charter, and how it maps to the current run mandate.
- Treat worker goals as first-class receipt fields: goal id, goal text, creation source, last update, status, scope, originating run, mapped charter, mapped mandate, and settlement status.
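
The field list above maps directly onto a record type. A sketch only: the field names follow the bullet, while the class name, the types, the ISO-timestamp convention, and the choice to make the charter and mandate mappings optional are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerGoalReceipt:
    """One worker-native goal as recorded on a run receipt."""
    goal_id: str
    goal_text: str
    creation_source: str      # e.g. operator, worker, or an imported plan
    last_update: str          # timestamp string; ISO 8601 is assumed
    status: str
    scope: str
    originating_run: str
    mapped_charter: Optional[str]
    mapped_mandate: Optional[str]
    settlement_status: str

    def is_unmapped(self) -> bool:
        """True when the goal is not tied to both a charter and a mandate."""
        return self.mapped_charter is None or self.mapped_mandate is None
```

`asdict(receipt)` gives a plain dictionary for serialization, and `is_unmapped()` surfaces the case the first bullet warns about: a durable objective the worker is pursuing that no operator charter or mandate accounts for.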
Run: 2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0
2026-05-06 · Claude Code · Gemini CLI · Hermes Agent

Worker-native state is becoming a memory layer.

Control Plane
- Recaps, memory patches, skill curators, and task state are moving into worker tools. Operators should use them, but should preserve an operator-owned record of what state governed each run.
- Add worker-native state fields to adapter receipts: recap handles, memory patch ids, curator reports, skill reports, and resume state.
Run: 2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0
2026-05-06 · Codex · Claude Code · Gemini CLI · Pi Coding Agent

Authority semantics are explicit but fragmented.

Control Plane
- Permission profiles, workspace trust, env loading, hooks, MCP behavior, extension schemas, and provider transports differ by worker and release.
- Bitter capability profiles should record worker-native permission and trust semantics instead of assuming a uniform authorization model.
Run: 2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0
2026-05-06 · Claude Code · Codex · Gemini CLI · Hermes Agent

Verification is becoming a worker capability.

Control Plane
- Provider-native review, multi-agent execution, subagent evals, curator reports, and QA-like cloud fleets can catch useful issues, but their verdicts are not automatically the operator's truth.
- Treat worker verification as evidence inputs. BitterQA or the run contract should still own the final evidence standard and settlement.
Run: 2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0
2026-05-06 · Codex · Claude Code · Gemini CLI · Hermes Agent · Pi Coding Agent

Plugin, extension, and skill ecosystems are becoming the integration surface.

Platform
- The practical power of worker CLIs increasingly depends on plugins, hooks, extensions, skills, and transport modules, not just the base model.
- Adapter receipts should include enabled plugin/extension/skill surfaces and should distinguish worker-local skills from Bitter-owned memory.
Run: 2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0
2026-05-06 · Pi Coding Agent · Gemini CLI · Codex

Worker integrations are not durable doctrine.

Platform
- Pi removed built-in Gemini CLI and Antigravity support while adding many providers; Gemini preview/nightly channels differ materially; Codex alpha releases and app-server surfaces move quickly.
- Keep worker adapters thin, versioned, source-contracted, and replaceable. The stable Bitter asset is the run contract and receipt chain.
Run: 2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0
2026-05-06 · Codex · Claude Code · Gemini CLI · Hermes Agent

Provider-native long-horizon state is now table stakes.

Control Plane

Run: 2026-05-06-candidate-2026-04-22_2026-05-06-frontier-v1