Operator Brief
Governance moved from convention to enforcement this week — and durability followed.
- Upgrade / check
  - Hermes v0.13.0 redacts secrets by default; verify that log pipelines handle sanitized output.
  - Paperclip v2026.512.0 fixes an SSH host-env leak that forwarded API keys to remote targets. Treat it as a security advisory.
  - Claude Code worktree.baseRef defaults to "fresh" (origin/<default>). Set "head" if you relied on new worktrees branching from local HEAD.
- Try
  - Claude Code: dispatch a background session with claude --bg, monitor it via claude agents, and set a /goal on a multi-step task.
  - Hermes: lock /goal on a multi-step task and observe the Kanban hallucination gate under real multi-agent workloads.
  - OpenHands: enable enable_sub_agents in a multi-task session; measure whether sub-agent scoping reduces total cost or context accumulation.
- Watch
  - Cross-provider /goal convergence: Claude Code and Hermes shipped the same persistent-goal primitive within a week. Watch whether this becomes a stable abstraction or fragments by tool.
  - Default-closed governance pattern across providers (OpenHands sub-agents, OpenClaw archive uploads, Agent Zero ODF formats): this is the bet for the next quarter, not a passing release-note theme.
- Uncertain
  - Hermes Kanban hallucination gate: model-based, schema-based, or rule-based? The false-positive rate under real multi-agent workloads is not yet documented.
  - OpenHands critic calibration: what does a score of 0.4 mean operationally? When does agent_behavioral_issues fire versus user_followup_patterns?
  - Gemini RemoteSubagentProtocol: ships with tests, but no remote target has been observed. Does it run on Google-hosted or user-controlled infrastructure?
Governance Becomes Enforcement
Five days, nine providers. The change that cuts across all of them is deceptively simple: governance is moving from convention to enforcement.
The older model was: the agent could do X, and operators relied on prompting, documentation, and trust to prevent the wrong X. The new model -- visible in at least four independent places this week -- is: the wrong X is structurally blocked, logged, or defaults to off.
Hermes made secret redaction the default, not an opt-in. Paperclip blocked agents from self-transitioning to review without a real review path. OpenHands defaulted sub-agent delegation to off and surfaced evaluation scores in the UI. Agent Zero defaulted document output to open formats and told agents to prefer named actions over coordinate clicks. These are different tools, different teams, and different architectures. The pattern is the same: risky behavior requires explicit enablement; safe behavior is what happens by default.
The other half of the week was about durability: agents that can stay on task across turns, sessions, crashes, and context compression. Claude Code shipped a full claude agents supervisor surface and a /goal command. Hermes shipped the same /goal primitive and backed it with a Kanban board that enforces completion evidence before marking work done. Gemini made sessions portable across machines. Agent Zero made desktop sessions persistent across navigation.
These two themes -- governance as enforcement, long-horizon durability -- are not coincidental. You need both. Durability without governance means persistent agents doing the wrong thing persistently. Governance without durability means agents that are safe but cannot hold a goal long enough to finish anything.
Breaking Changes: Check These Before Upgrading
Hermes v0.13.0: secret redaction is now ON by default. If you have Hermes log pipelines that read raw agent output, they will receive sanitized logs after upgrade. This is the right call as a default; it is a breaking change for tooling that depends on unredacted output.
Paperclip v2026.512.0: SSH host environment was leaking. Before PR #5142, SSH remote execution forwarded the Paperclip host's environment variables -- including API keys and tokens -- to remote execution targets. Operators running SSH-managed agents should treat this as a security advisory and upgrade.
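The general fix for this class of leak is to build the remote environment from an explicit allowlist instead of copying the host environment wholesale. The sketch below is illustrative only: SAFE_ENV_KEYS and build_remote_env are hypothetical names, not Paperclip's code, and the actual change in PR #5142 may be structured differently.

```python
# Hypothetical allowlist: only these host variables may cross to a remote target.
SAFE_ENV_KEYS = frozenset({"LANG", "LC_ALL", "TERM", "TZ"})


def build_remote_env(host_env, extra=None):
    """Return the environment to forward to a remote (e.g. SSH) target.

    Starts empty and copies only allowlisted keys, so API keys and tokens in
    the host environment are dropped unless the caller passes them in `extra`.
    """
    remote = {key: value for key, value in host_env.items() if key in SAFE_ENV_KEYS}
    if extra:
        # Explicitly forwarded values, e.g. a narrowly scoped task token.
        remote.update(extra)
    return remote


if __name__ == "__main__":
    import os
    # Show which host variables would be forwarded from the current shell.
    print(sorted(build_remote_env(dict(os.environ))))
```

Anything the remote side genuinely needs goes in through extra, so forwarding a secret becomes a deliberate, reviewable decision rather than a side effect.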
Hermes v0.13.0: Discord role allowlists are now guild-scoped. The prior behavior allowed a role match from any guild to authorize a cross-guild DM -- a CVSS 8.1 bypass. Discord operators using role-based access control should reverify their configuration.
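A simplified model of the difference between the old and new checks, assuming allowlists and role memberships are both keyed by guild ID. The data shapes and function names are hypothetical and are not Hermes's implementation.

```python
def is_authorized_unscoped(allowlist, user_roles):
    """Pre-fix shape (simplified): an allowlisted role held in ANY guild authorizes."""
    allowed = set().union(*allowlist.values())
    held = set().union(*user_roles.values())
    return bool(allowed & held)


def is_authorized(allowlist, user_roles, guild_id):
    """Guild-scoped check: only roles held in the target guild count,
    and only against that guild's allowlist.

    allowlist:  {guild_id: {allowed role ids}}
    user_roles: {guild_id: {role ids the user holds there}}
    """
    return bool(allowlist.get(guild_id, set()) & user_roles.get(guild_id, set()))
```

A user who holds an allowlisted role in guild 1 but nothing in guild 2 passes the unscoped check for an action aimed at guild 2 and fails the scoped one, which is the bypass the fix closes.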
Claude Code v2.1.x: worktree.baseRef now defaults to "fresh". New worktrees branch from origin/<default> rather than from the local HEAD. Operators who depended on new worktrees carrying unpushed local commits should set worktree.baseRef: "head" explicitly.
Pi v0.74.0: package scope migration underway. The npm package is moving from @mariozechner/pi-coding-agent to @earendil-works/pi-coding-agent. For global installs, run pi update --self once the new package publishes. For CI, Dockerfiles, and package.json pins, update the reference manually.
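For repositories that pin the package in package.json, the rename can be scripted. A minimal sketch, assuming the new scope keeps the same version numbering (check what has actually been published under @earendil-works before committing); migrate_manifest and migrate_file are hypothetical helpers.

```python
import json

OLD_NAME = "@mariozechner/pi-coding-agent"
NEW_NAME = "@earendil-works/pi-coding-agent"
DEP_SECTIONS = ("dependencies", "devDependencies", "optionalDependencies", "peerDependencies")


def migrate_manifest(manifest):
    """Rename the Pi dependency in place, keeping its version spec and key order.

    Returns True if anything changed.
    """
    changed = False
    for section in DEP_SECTIONS:
        deps = manifest.get(section)
        if isinstance(deps, dict) and OLD_NAME in deps:
            # Rebuild the mapping so the renamed key stays in its original position.
            manifest[section] = {
                (NEW_NAME if name == OLD_NAME else name): spec
                for name, spec in deps.items()
            }
            changed = True
    return changed


def migrate_file(path):
    """Apply migrate_manifest to a package.json on disk; returns True if rewritten."""
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    if not migrate_manifest(manifest):
        return False
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
        f.write("\n")
    return True
```

This only touches package.json; Dockerfiles and CI files that reference the old name still need a manual search. Rewriting with json.dump also normalizes indentation to two spaces.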
Evidence Before Completion
Two providers shipped independent enforcement of the same principle this week: agents cannot self-attest that work is complete.
Hermes's Kanban board now requires workers to have valid card references before a task card moves to done. The hallucination gate verifies that cards a worker claims to have created actually exist and belong to that worker -- blocking phantom references and cross-worker card claims. Workers that exit without completing are auto-blocked. Heartbeats detect stale workers; zombie processes are detected on both platforms. Per-task retry budgets prevent silent cascades.
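Hermes has not documented whether its gate is rule-, schema-, or model-based (see What Remains Uncertain). A rule-based sketch of the two checks described above, phantom references and cross-worker claims, assuming a hypothetical board that records which worker created each card:

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    # card_id -> id of the worker that created the card (hypothetical model)
    owners: dict = field(default_factory=dict)


def check_claimed_cards(board, worker_id, claimed_ids):
    """Return (card_id, reason) for every claimed card that fails verification.

    "missing": the card does not exist (phantom reference).
    "foreign": the card exists but was created by another worker.
    """
    problems = []
    for card_id in claimed_ids:
        owner = board.owners.get(card_id)
        if owner is None:
            problems.append((card_id, "missing"))
        elif owner != worker_id:
            problems.append((card_id, "foreign"))
    return problems


def can_mark_done(board, worker_id, claimed_ids):
    """A task may move to done only if every claimed card verifies."""
    return not check_claimed_cards(board, worker_id, claimed_ids)
```

The verification reads board state the worker cannot edit through its claim, which is what makes it evidence rather than self-attestation.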
Paperclip's control-plane fix (PR #5292) blocks agents from self-transitioning an issue to the in_review state. The in_review transition now requires a real review precondition, not just a model deciding it is ready for review.
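A minimal sketch of a state-machine gate of this kind. It assumes, hypothetically, that a "real review precondition" means at least one configured reviewer other than the assignee; Paperclip has not defined the precondition publicly, and State, transition, and the issue shape are illustrative names.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    DONE = "done"


class TransitionError(Exception):
    """Raised when a transition's precondition is not met."""


def transition(issue, target, reviewers):
    """Move `issue` to `target`, refusing self-attested review.

    issue:     dict with "state" and "assignee" keys (hypothetical shape)
    reviewers: identities configured to review this issue
    """
    if target is State.IN_REVIEW:
        # Assumed precondition: someone other than the assignee can review.
        if not any(r != issue["assignee"] for r in reviewers):
            raise TransitionError("no reviewer other than the assignee is configured")
    issue["state"] = target
    return issue
```

The check lives in the transition itself, so no prompt or model output can skip it; a failed transition leaves the issue in its previous state.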
These are different mechanisms -- Hermes's is multi-agent coordination, Paperclip's is a state-machine gate -- but the observation is the same: an agent's claim that its own work is complete is not sufficient evidence of completion. The system needs to verify independently.
For operators building multi-agent workflows: the completion contract is now part of the orchestration contract, not just the prompt.
Long-Horizon Durability
The week's most operator-visible features are all about agents staying on task.
Claude Code's claude agents supervisor view shows every session by state -- working, waiting on you, done, failed -- with background sessions running under a persistent supervisor process that survives terminal close. Sessions are isolated in separate git worktrees automatically. You can dispatch from the prompt, background an active session with one keystroke, and reply to blocked sessions from a peek panel without attaching. Alongside it, the /goal command sets a completion condition that Claude tracks across turns until it is met.
Hermes's /goal Ralph loop does the same thing at the session level, backed by the Kanban reliability primitives above. Lock the agent onto a target and it persists across context compression, turn budgets, and branching. The Kanban layer handles the multi-agent case: workers pick up tasks, execute them, and cannot mark themselves done without evidence.
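Both /goal features come down to a loop in which the harness, not the model, decides when the goal is met. A generic sketch of that loop with a turn budget; step and goal_met are hypothetical stand-ins for one agent turn and a completion check, and neither tool's actual implementation is shown here.

```python
from typing import Callable


def run_until_goal(step: Callable[[int], dict], goal_met: Callable[[dict], bool], max_turns: int):
    """Run agent turns until a completion condition holds or the budget runs out.

    step(turn)      -> state snapshot after that turn (stand-in for one agent turn)
    goal_met(state) -> True when the completion condition is satisfied
    Returns (met, turns_used, last_state).
    """
    state = {}
    for turn in range(1, max_turns + 1):
        state = step(turn)
        if goal_met(state):
            return True, turn, state
    return False, max_turns, state
```

The design point is that goal_met inspects observable state (tests passing, files present) rather than the agent's statement that it has finished.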
Gemini CLI's session export/import makes sessions portable: export a session, move it to another machine, import and continue. State crosses as a serializable object rather than ambient context.
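The essential property is that everything needed to resume lives in one serializable object. A sketch of that round trip, with a hypothetical Session shape that is not Gemini CLI's export format:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Session:
    # Hypothetical fields; a real export would carry more (model, tool state, ...).
    session_id: str
    cwd: str
    messages: list = field(default_factory=list)


def export_session(session):
    """Serialize the whole session to a JSON string that can be copied anywhere."""
    return json.dumps(asdict(session), sort_keys=True)


def import_session(payload):
    """Rebuild a Session from an exported JSON string."""
    return Session(**json.loads(payload))
```

Machine-specific fields such as cwd are the hard part in practice: an importer on another machine may need to remap them before the session can continue.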
Agent Zero's persistent desktop lifecycle (v1.13) changes the semantics of the desktop environment: a single Xpra XFCE session stays alive across canvas navigation, modal switches, and keepalive hosts. Explicit shutdown is distinguished from crashes; unsafe affordances are hidden. The desktop is now a persistent surface, not one that resets on navigation.
Authority Made Visible
Three separate tools shipped changes this week that make agent state (permissions, session status, evaluation results) observable at a glance.
Codex's TUI now shows permissions and approval-mode as separately configurable status-line items. The most common operator surprise before this -- forgetting which permission posture is active before running an irreversible command -- is now caught by a glance at the status line.
Claude Code's claude agents supervisor makes session state visible: working, waiting, done, or failed. A single panel replaces five terminal windows. The live overlay on /goal tracks elapsed time, turns, and tokens consumed.
OpenHands's new critic evaluation display shows a score (0-1), a star rating (0-5), and color-coded bands in the GUI for every completed session: agent_behavioral_issues, user_followup_patterns, and infrastructure_issues. The display is deployment-controlled via OH_ENABLE_CRITIC_BY_DEFAULT (disabled by default). When enabled, it creates a feedback loop that doesn't exist when evaluation lives in logs: users see in real time when sessions are degrading.
Agent Zero's Linux Desktop skill takes this in a different direction: it tells the agent to use named structured actions (cell_edit, app_launch, form_submit) and to treat coordinate clicks (click(x=423, y=187)) as a last resort. The principle is audit clarity: cell_edit(B3, 42) is meaningful; a coordinate click is not. An action that can be named and described is easier to verify, replay, and record than one that can only be described by its position.
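The audit-clarity argument is easiest to see in the records each style produces. A sketch, assuming a hypothetical Action type and JSON audit format; the action names come from the skill description above, while the argument names (cell, value) are illustrative.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str                                   # e.g. "cell_edit", "app_launch", "click"
    args: dict = field(default_factory=dict)


def audit_line(action):
    """Render an action as one JSON audit record."""
    return json.dumps({"action": action.name, **action.args}, sort_keys=True)


def is_opaque(action):
    """Flag raw coordinate clicks: they record where, not what."""
    return action.name == "click" and {"x", "y"}.issubset(action.args)
```

A record like {"action": "cell_edit", "cell": "B3", "value": 42} can be checked against the spreadsheet afterwards; {"action": "click", "x": 423, "y": 187} cannot be interpreted without a screenshot of that moment.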
Default-Closed Governance
The week also continued a trend across the watchlist: sensitive capabilities default to closed, and operators must explicitly enable them.
OpenHands's sub-agent delegation (enable_sub_agents) defaults to off. Behind the gate, the orchestrator routes tasks to specialized sub-agents -- a bash runner, a code explorer, a web researcher -- each with a tool surface defined by TaskToolSet rather than full access. The default-off choice is right: routing work to specialized agents changes session scope, cost, and authority in ways that require a deliberate operator decision.
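A sketch of the default-closed routing shape described above. Only enable_sub_agents comes from OpenHands; the role keys, tool names, and route function are hypothetical and do not reflect TaskToolSet's actual API.

```python
from dataclasses import dataclass

# Hypothetical tool surfaces per sub-agent role (names are illustrative only).
SUB_AGENT_TOOLS = {
    "bash_runner": frozenset({"run_shell"}),
    "code_explorer": frozenset({"read_file", "grep"}),
    "web_researcher": frozenset({"fetch_url", "search"}),
}
FULL_TOOLS = frozenset().union(*SUB_AGENT_TOOLS.values()) | {"write_file"}


@dataclass
class Settings:
    enable_sub_agents: bool = False  # default-closed: delegation is opt-in


def route(task_kind, settings):
    """Return (agent, allowed tools) for a task.

    With delegation off, the main agent handles everything with its normal tools.
    With it on, known task kinds go to a sub-agent limited to its own tool set.
    """
    if settings.enable_sub_agents and task_kind in SUB_AGENT_TOOLS:
        return task_kind, SUB_AGENT_TOOLS[task_kind]
    return "main", FULL_TOOLS
```

Keeping the flag's default in one place makes the opt-in auditable: a session only delegates if someone changed that value.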
OpenClaw's skill archive upload gate (skills.install.allowUploadedArchives) defaults to closed. Trusted Gateway clients can stage and install zip-backed skills only when the operator explicitly enables the flag. OpenClaw keeps repeating this pattern: code-execution surfaces are opt-in, explicit, and documented as requiring trust.
Agent Zero's ODF-first document default (v1.13) inverts the prior assumption: document artifacts now default to ODT/ODS/ODP (open formats) rather than DOCX/XLSX/PPTX. OOXML compatibility requires explicit opt-in. For operators with downstream workflows expecting Office XML output, this is a change to verify before upgrading.
Provider Notes
Claude Code (v2.1.139) adds settings.autoMode.hard_deny: hard blocks that no allow rule can override. The continueOnBlock option for PostToolUse hooks feeds the rejection reason back so Claude can adapt rather than just stop. API key auth now disables Remote Control, /schedule, and claude.ai MCP connectors -- operators using API key auth should audit their reliance on those surfaces.
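The useful property of a hard deny is evaluation order: it is checked before any allow rule or auto-mode judgment, so nothing later can reverse it. An illustrative model, assuming glob-style patterns via Python's fnmatch (not Claude Code's actual rule syntax) and a hypothetical auto_mode_ok callback standing in for auto mode's own decision.

```python
from fnmatch import fnmatchcase


def decide(command, hard_deny, allow, auto_mode_ok):
    """Return "deny", "allow", or "ask" for a proposed command.

    hard_deny is checked first, so no allow rule and no auto-mode judgment
    can override it. Patterns are shell-style globs (an assumption).
    """
    if any(fnmatchcase(command, pattern) for pattern in hard_deny):
        return "deny"
    if any(fnmatchcase(command, pattern) for pattern in allow):
        return "allow"
    return "allow" if auto_mode_ok(command) else "ask"
```

With allow = ["git *"] and hard_deny = ["git push --force*"], git status is allowed while a force push is denied, even though the broad allow pattern also matches it.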
OpenClaw (v2026.5.10 beta) adds per-agent message send restrictions (tools.message.crossContext, tools.message.actions.allow) that let you deploy a sandboxed agent that can only reply in the thread where it was addressed. Memory auto-promotion is now bounded: the dreaming process compacts the oldest sections when the budget is reached, while preserving user-authored notes. Transcript reads are now streaming; peak memory for a long session dropped roughly 90%.
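A sketch of budget-bounded compaction that never touches user-authored notes. OpenClaw's dreaming process is not documented at this level; Section, compact, and the summarize callback (for example, a model-written summary) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    text: str
    created: int                 # lower = older
    user_authored: bool = False


def compact(sections, budget_chars, summarize):
    """Fit memory into budget_chars by compacting the oldest agent-written sections first.

    User-authored sections are never modified. Returns a new list ordered oldest
    first; it can still exceed the budget if user-authored notes alone do.
    """
    result = sorted(sections, key=lambda s: s.created)
    total = sum(len(s.text) for s in result)
    for i, sec in enumerate(result):
        if total <= budget_chars:
            break
        if sec.user_authored:
            continue
        summary = summarize(sec.text)
        total += len(summary) - len(sec.text)
        result[i] = Section(summary, sec.created)
    return result
```

If user-authored notes alone exceed the budget, this version returns over budget rather than deleting them, which matches the stated priority but makes the budget soft.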
Paperclip (v2026.512.0) adds secrets provider vault configuration with AWS Secrets Manager as the first remote-import backend. The database gains secret_access_events and company_secret_provider_configs tables. The new cursor_cloud adapter routes work to Cursor's hosted-agent platform.
Agent Zero (v1.11-v1.13) completes what it calls the "visible computer": a browser with multi-tab parallel fanout, a LibreOffice desktop via Xpra/XFCE, and a persistent desktop session. The multi browser action fans out reads or mutations across tabs in a single tool call with parallel execution.
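Parallel fanout across tabs is a standard concurrency pattern. A generic asyncio sketch, assuming each tab can be driven by an async callable; fanout and read_title are hypothetical and are not Agent Zero's multi action API.

```python
import asyncio


async def fanout(tabs, action):
    """Run `action` against every tab concurrently; map each tab to its result.

    return_exceptions=True means one failing tab yields its exception as the
    result instead of aborting the whole fanout.
    """
    results = await asyncio.gather(*(action(tab) for tab in tabs), return_exceptions=True)
    return dict(zip(tabs, results))


async def read_title(tab):
    # Stand-in for a real per-tab browser read.
    await asyncio.sleep(0.01)
    if tab == "bad":
        raise RuntimeError("tab crashed")
    return f"title of {tab}"


if __name__ == "__main__":
    print(asyncio.run(fanout(["docs", "issues"], read_title)))
```

For mutations, collecting per-tab exceptions matters more than for reads: the caller needs to know exactly which tabs changed before deciding whether to retry.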
Gemini CLI (v0.41.0) adds a pluggable AgentProtocol with local and remote backends, forcing the "where does delegated work actually run" question into a surface that can be inspected and configured. Workspace trust is now enforced in headless mode; shell command validation gains a core-tools allowlist.
Pi coding agent (v0.74.0) migrates from badlogic/pi-mono to the Earendil Works organization. JSONC parsing for models.json is new (comments and trailing commas are now valid).
What To Try
- Hermes operators: verify your log pipeline handles sanitized output before upgrading to v0.13.0. Redaction is now on.
- Paperclip operators running SSH: upgrade before deploying new remote agents. Prior versions forward the host environment silently.
- Claude Code: dispatch a background session with claude --bg "<prompt>", use claude agents to monitor it, and test peek/reply from the list. Set a /goal on a multi-step task and inspect the turn/token overlay.
- OpenHands: enable enable_sub_agents in a multi-task session. Observe whether sub-agent scoping reduces total session cost or context accumulation.
- Agent Zero: create a Writer document and confirm the output is ODT (not DOCX) in v1.13+. Verify that your downstream tooling handles ODT, or explicitly configure OOXML output.
- Codex: add both permissions and approval-mode to your status line if you run multiple permission profiles.
What Remains Uncertain
- Hermes Kanban hallucination gate: what does verification involve? Is it model-based, schema-based, or rule-based? The gate's false-positive rate under real multi-agent workloads is not yet documented.
- Paperclip in_review gate: what constitutes a "real review path"? The PR notes do not define whether a human reviewer, an automated review step, or a configured participant list is required.
- OpenHands critic calibration: what does a score of 0.4 mean operationally? When does agent_behavioral_issues fire versus user_followup_patterns? The calibration methodology is not yet documented.
- Gemini RemoteSubagentProtocol: ships with tests but no observed remote target. Whether the remote execution surface runs on Google-hosted or user-controlled infrastructure is not yet established.
- Claude Code /ultrareview: the research preview returns verdicts to CLI/Desktop, but the output schema is not documented. How should a CI pipeline ingest or route the findings?
- Agent Zero desktop state: is there a session timeout, an idle cleanup, or a storage limit for persistent Xpra sessions, or does the operator manage cleanup entirely manually?
- OpenClaw skill archive trust model: skills.install.allowUploadedArchives is opt-in, but signature checking and sandbox isolation for uploaded archives are not yet documented.
Top signals from this issue
- Hermes Agent: durable Kanban with hallucination gate, redaction on by default, channel allowlists
- Paperclip: secrets provider vaults (AWS Secrets Manager), host env isolation fix, cursor_cloud adapter
- Claude Code: agent view, goal completion, and governance hardening
- OpenHands: sub-agent delegation (opt-in) and critic evaluation GUI
- Agent Zero: ODF-first document defaults, persistent desktop lifecycle, multi-tab browser fanout
- OpenClaw: per-agent message restrictions, gated code install, and onboarding wayfinding
- Gemini CLI: subagents become pluggable; sessions become portable
- Codex: permissions glance surface and role-aware plugin sharing
- Pi Coding Agent: package scope migration to earendil-works; harness SDK stream config
What we didn't promote
Findings observed during this cycle that did not rise to top-tier signal — surfaced here for restraint, not silence.
- codex: PreToolUse hooks can now rewrite tool inputs. Accepted as a signal but kept off the top tier: the behavior matters only to hook authors who already use updatedInput, and broader operators have nothing to act on this week.
- gemini-cli: legacy session-resume reliability fix. Important for operators with archived session JSON, but a bug-fix release rather than a directional change; kept on the profile, not in the weekly top.
- flue: Flue's initial profile and v0.5.3 observability wave. Category-establishing for the watchlist rather than a behavior change for current operators; the profile carries it, and the weekly digest doesn't lead with it.
- hermes-agent: graceful disable when mistralai was quarantined on PyPI. Real, but a response to an external event, not a Hermes capability change. Captured on the profile rather than promoted as a weekly headline.
This digest was produced by the Bitter autonomous research loop.
Sources
Primary links, including exact changelog lines when available.
- google-gemini/gemini-cli
  - release: v0.41.0 release · v0.41.0
  - line: Secure .env loading and workspace trust · docs/changelogs/preview.md#L37-L38
  - line: Shell validation and core tool allowlist · docs/changelogs/preview.md#L35-L36
  - line: Auto-memory scratchpad · docs/changelogs/preview.md#L70-L72
- NousResearch/hermes-agent
  - release: v2026.4.30 release · v2026.4.30
  - line: Curator release summary · RELEASE_v0.12.0.md#L6-L12
  - line: Curator feature details · RELEASE_v0.12.0.md#L58-L64
  - line: Self-improvement loop details · RELEASE_v0.12.0.md#L71-L77
- badlogic/pi-mono
  - line: v0.73.0 changelog highlights · packages/coding-agent/CHANGELOG.md#L3-L9
  - line: OpenAI Codex websocket transport and compact rendering fixes · packages/coding-agent/CHANGELOG.md#L25-L31
  - line: Removed Gemini CLI and Antigravity support · packages/coding-agent/CHANGELOG.md#L68-L79
  - line: Provider timeout/retry controls · packages/coding-agent/CHANGELOG.md#L198-L209
- openclaw/openclaw
  - commit_diff_reviewed: Recover externalized channel plugin from stale config · github.com/openclaw/openclaw/commit/329580c64d13657592c3fabb97ff567c2e292bb6
  - commit: Label Claude CLI OAuth status · github.com/openclaw/openclaw/commit/2b4b60b5514b47d8e242b9b11d9b395037e6674b
  - commit: Prevent Discord voice self-feedback · github.com/openclaw/openclaw/commit/1c2832526f65cf23b469e9a1dc5694915c5be548
  - commit: Honor Telegram access group allowlists · github.com/openclaw/openclaw/commit/b6ae0b83a61a1f779ee41b5d639b6049bfd422ce
  - commit: Document sub-agent security boundaries · github.com/openclaw/openclaw/commit/33b112ad314dc8d9dfe0f5a68caed4811a23245a
  - commit: Bound live exec output events · github.com/openclaw/openclaw/commit/3ee7c02bcacfdf6327747c1fe24dd6d11de8612a
  - commit: Coarse agent turn timeline spans · github.com/openclaw/openclaw/commit/61223a74a43fd8768c426d5b22f1633dbad37477
  - commit: Show Codex tool progress in channel drafts · github.com/openclaw/openclaw/commit/3f210b10ce3a19ef6a04205aa7420353945567a2
- paperclipai/paperclip
  - commit_diff_reviewed: Adapters declare runtime command spec for remote provisioning · github.com/paperclipai/paperclip/commit/90631b09b36fa028ad24ca5375bfa50e3602799c
  - commit: Fix remote workspace environment shaping · github.com/paperclipai/paperclip/commit/856c6cb192e53a992875821297b5fd8d29c95c2d
  - commit: Add sandbox callback bridge for remote environment API access · github.com/paperclipai/paperclip/commit/a4ac6ff133fbe8bdb82f4046fda85f7cb372b6a9
  - commit: Add E2B sandbox provider plugin · github.com/paperclipai/paperclip/commit/4ef969f0840810527333aa6ee44fed89f4551f7c
  - commit: Issue cost summaries · github.com/paperclipai/paperclip/commit/c4269bab59fff7a73ff31797578cc97ece7f160f
  - commit: First-class security agent role · github.com/paperclipai/paperclip/commit/c036bbfa98494dcfe2521aab65019a4cd021c769
  - commit: Pause and resume sidebar agents · github.com/paperclipai/paperclip/commit/43b0f2ae582b18f2872ae60bf468f54b99b614ba
- agent0ai/agent-zero
  - commit_diff_reviewed: Replace browser-use agent with native browser · github.com/agent0ai/agent-zero/commit/983d431a5eb785eb9deba9fdfd471fa93f349603
  - commit: Persistent full Chromium runtime for Browser · github.com/agent0ai/agent-zero/commit/fa7eef1919901093b117a98ad6e402d809687cf6
  - commit: Browser multi-tab awareness and modifier-key click · github.com/agent0ai/agent-zero/commit/5012dd3128aa6218cc55f6cbce8be42b2db2fee4
  - commit: Browser screenshot previews in tool messages · github.com/agent0ai/agent-zero/commit/c2fb2c3c94e1e1c85b783252332b3fc003f39f2b
  - commit: Linux Desktop skill controls · github.com/agent0ai/agent-zero/commit/62ac20e7b248179825e05664c1df97ebc6214c54
  - commit: Desktop document canvas · github.com/agent0ai/agent-zero/commit/24dd548ebf221e397323b5aa3a509f037fb1b9ae
  - commit: OAuth disconnect and remaining quota visibility · github.com/agent0ai/agent-zero/commit/0da8f3dc2b640efbce22499053507837101fdf6f
- OpenHands/OpenHands
  - commit_diff_reviewed: Strengthen log redaction for API keys · github.com/OpenHands/OpenHands/commit/61e3dc2cadbefd4e0649b7c141ac2335c021ad2b
  - commit: Remove debug log exposing hook_config secrets · github.com/OpenHands/OpenHands/commit/0c6c461555f8651347ed140f1c555ff8a88ddf56
  - commit: Expose sandbox grouping strategy UI · github.com/OpenHands/OpenHands/commit/90cf5f8003c247597481bcbef9a5aa73eb899e10
  - commit: Proxy Tavily MCP through app server · github.com/OpenHands/OpenHands/commit/949a15a560ef90cd3dd7f18baf6955430401edb4
  - commit: Move server content to app_server · github.com/OpenHands/OpenHands/commit/5232d96dab0ca98e691d6307bd0759e943220d1c
  - commit: Inject user secrets into ACP subprocess env · github.com/OpenHands/OpenHands/commit/cf156b0073350ca8e93067bc2f4ae18b90537a0a
  - commit: Self-hosted GitLab support · github.com/OpenHands/OpenHands/commit/4e63531fa6595ec55102f08ef129845931fcd8ff
  - commit: Removed V0 runtime · github.com/OpenHands/OpenHands/commit/e86067c15b54242fd611877aa9038a2f7a219658