Run Receipt

2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0

Status: gold
Window: 2026-04-22 to 2026-05-06
Signals: 6

Revision Reason

Manual gold run used as the source-backed reference for later candidate and goal-weighted versions.

Frontier Roll-Up: April 22-May 6, 2026

The last two weeks were not about one winning coding agent. They were about worker tools becoming fuller environments.

Codex added persisted goals, which is the most important signal in this window: worker-native objectives are beginning to survive beyond a single prompt or session. Around that, Codex also expanded permission profiles, plugin workflows, external session import, and multi-agent controls. Claude Code pushed cloud multi-agent review, session recaps, plugin distribution, hook behavior, MCP governance, and telemetry attribution. Gemini CLI hardened workspace trust and environment loading while experimenting with reviewable memory patches. Hermes introduced a background Curator for skill-library maintenance. Pi kept its minimal-harness posture while rapidly changing providers, transports, extensions, and terminal rendering.

The pattern is clear:

The frontier changes. Your loop should compound.

Bitter should use the strongest worker surface available at the moment. It should not confuse that worker surface with the operator's durable loop.

Main Signals

1. Worker-Native Goals Are Emerging

Codex /goal is the strongest signal in this window. It is not just memory. It is a worker-native objective register: a way for a coding worker to carry a durable direction of travel across longer arcs of work.

That unlocks more serious long-horizon work, but it also creates a new authority question. A provider-native goal can guide a worker, but it should not silently become the operator's charter, mandate, or memory.

Bitter should receipt worker goals explicitly and reconcile them against CHARTER.md, the run mandate, and the wake packet.
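
As a rough illustration of that reconciliation step, the sketch below receipts a worker goal and checks which operator-owned documents endorse it. Everything here is hypothetical: `WorkerGoalReceipt`, `Reconciliation`, `reconcile_goal`, the field names, and the exact-match rule are assumptions made for the example, not Bitter's schema and not anything Codex exposes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkerGoalReceipt:
    """Hypothetical record of a provider-native goal observed during a run."""
    worker: str      # e.g. "codex"
    goal_text: str   # the objective as the worker stored it
    run_id: str      # the run that observed the goal


@dataclass
class Reconciliation:
    """Result of comparing one worker goal with operator-owned documents."""
    receipt: WorkerGoalReceipt
    checked_against: list = field(default_factory=list)
    endorsed_by: list = field(default_factory=list)

    @property
    def status(self) -> str:
        # A goal that no operator document endorses stays a worker-local claim.
        return "endorsed" if self.endorsed_by else "worker_local"


def reconcile_goal(receipt, operator_docs):
    """operator_docs maps a document name to the set of goals it authorizes.

    Exact string matching is a deliberate simplification for this sketch.
    """
    result = Reconciliation(receipt=receipt)
    for name, authorized_goals in operator_docs.items():
        result.checked_against.append(name)
        if receipt.goal_text in authorized_goals:
            result.endorsed_by.append(name)
    return result


# Illustrative values only; none of these come from a real run.
receipt = WorkerGoalReceipt("codex", "migrate billing tests", "example-run")
docs = {
    "CHARTER.md": {"keep billing stable"},
    "run mandate": {"migrate billing tests"},
    "wake packet": set(),
}
outcome = reconcile_goal(receipt, docs)
print(outcome.status, outcome.endorsed_by)
```

The point of the design is that the worker's goal never becomes authoritative by default: it is recorded, compared, and only marked endorsed when an operator-owned document says so.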

Signal: 2026-05-06-worker-native-goals

Supported by:

findings/codex.md

2. Worker-Native State Is Becoming a Memory Layer

Claude session recaps, Gemini Auto Memory, and Hermes Curator all point in the same direction: workers are learning how to carry context forward.

That is good. Bitter should leverage it.

But worker-native state should be receipted as worker-native state. It should not silently become the operator's only memory. A Bitter run should know which recap, memory patch, skill report, or resume state governed the work, and what crossed back into operator-owned receipts and wake packets.
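
A minimal sketch of that distinction, assuming a hypothetical `WorkerStateRef` record: every piece of worker-native state that shaped a run is listed, and only items carrying an operator receipt count as having crossed back. The field names and identifiers are illustrative, not an existing Bitter format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerStateRef:
    """Hypothetical pointer to a piece of worker-native state used in a run."""
    worker: str                       # e.g. "claude-code"
    kind: str                         # "recap", "memory_patch", "skill_report", or "resume_state"
    ref: str                          # illustrative identifier, not a real path
    settled_in: Optional[str] = None  # operator receipt that accepted it, if any


def governing_state(refs):
    """Everything that shaped the run, settled or not."""
    return [(r.worker, r.kind, r.ref) for r in refs]


def crossed_back(refs):
    """Only the items an operator-owned receipt explicitly accepted."""
    return {r.ref: r.settled_in for r in refs if r.settled_in is not None}


# Illustrative values only.
refs = [
    WorkerStateRef("claude-code", "recap", "recap-1"),
    WorkerStateRef("gemini-cli", "memory_patch", "patch-7", settled_in="receipt-42"),
    WorkerStateRef("hermes-agent", "skill_report", "curator-report-3"),
]
print(governing_state(refs))
print(crossed_back(refs))
```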

Signal: 2026-05-06-worker-native-memory

Supported by:

findings/claude-code.md
findings/gemini-cli.md
findings/hermes-agent.md

3. Authority Semantics Are Getting Explicit, But Not Uniform

Codex expanded permission profiles and sandbox metadata. Gemini added secure .env loading, workspace trust, and shell allowlists. Claude's changelog touched plugin archives, hooks, MCP retries, permission prompts, and subprocess attribution. Pi's provider and extension layers changed quickly.

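This is exactly where operators get hurt if the system hand-waves: each worker describes trust and permissions in its own terms, so an assumption that holds for one worker cannot be carried over to another without being recorded.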

Bitter capability profiles should record the worker's actual trust and permission state: version, channel, env policy, sandbox/profile, plugin set, MCP surface, transport, and credential posture.
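
The fields named above map naturally onto a small record type. The following is a sketch under stated assumptions: `CapabilityProfile`, its attribute names, and the sample values are hypothetical, and the real shape of a Bitter capability profile is not defined in this roll-up.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class CapabilityProfile:
    """Hypothetical snapshot of a worker's trust and permission state."""
    worker: str
    version: Optional[str] = None
    channel: Optional[str] = None             # e.g. "stable", "preview", "nightly"
    env_policy: Optional[str] = None
    sandbox_profile: Optional[str] = None
    plugin_set: Optional[list] = None         # [] means "none enabled"; None means "not recorded"
    mcp_surface: Optional[list] = None
    transport: Optional[str] = None
    credential_posture: Optional[str] = None

    def unrecorded(self):
        """Names of fields nobody captured, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def to_json(self):
        """Stable serialization suitable for attaching to a receipt."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
profile = CapabilityProfile(worker="gemini-cli", channel="preview", plugin_set=[])
print(profile.unrecorded())
print(profile.to_json())
```

Keeping `None` distinct from an empty list lets a receipt say "we did not check" rather than implying "nothing was enabled", which is the kind of hand-waving the profile is meant to prevent.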

Signal: 2026-05-06-fragmented-authority-semantics

Supported by:

findings/codex.md
findings/claude-code.md
findings/gemini-cli.md
findings/pi-coding-agent.md

4. Verification Is Moving Into the Workers

Claude /ultrareview is the clearest signal: provider-native cloud fleets can review branches and PRs. Codex multi-agent controls, Gemini subagent/eval work, and Hermes Curator reports all rhyme with it.

Bitter should treat these as valuable evidence producers, not as final truth. The run contract still needs to say what evidence proves progress, which verification surfaces were used, and what gets settled into memory.
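
One way to keep worker review in the evidence column is to have the run contract name the kinds of evidence it requires and then check what actually arrived. The sketch below assumes a hypothetical `Evidence` record and `assess` helper; the surface and kind labels are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence item produced by some verification surface."""
    surface: str   # which surface produced it (illustrative label)
    kind: str      # e.g. "review", "tests", "operator_judgment"
    summary: str


def assess(required_kinds, evidence):
    """Report which surfaces were used and which required kinds are missing."""
    surfaces_used = sorted({e.surface for e in evidence})
    missing = sorted(set(required_kinds) - {e.kind for e in evidence})
    return {
        "surfaces_used": surfaces_used,
        "missing_kinds": missing,
        "requirements_met": not missing,
    }


# Illustrative values only: a worker review alone does not satisfy the contract.
collected = [Evidence("worker-review", "review", "no blocking findings")]
print(assess({"review", "operator_judgment"}, collected))
```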

Signal: 2026-05-06-worker-verification

Supported by:

findings/claude-code.md
findings/codex.md
findings/gemini-cli.md
findings/hermes-agent.md

5. Plugins, Extensions, and Skills Are the New Surface Area

Codex plugins, Claude plugins, Gemini extensions/MCP, Hermes skills, and Pi extension APIs are becoming the practical integration membrane.

That means Bitter adapters need to record enabled plugin/extension/skill surfaces. It also means BitterLearn should not ingest worker skills or memories as durable Bitter memory without settlement.
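
Recording enabled surfaces pays off most when two runs can be compared. The sketch below assumes a hypothetical `SurfaceSnapshot` captured by an adapter at the start of each run and a `surface_drift` helper that reports what changed; neither name refers to an existing Bitter or worker API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceSnapshot:
    """Hypothetical record of what a worker had enabled for one run."""
    worker: str
    plugins: frozenset = frozenset()
    extensions: frozenset = frozenset()
    skills: frozenset = frozenset()


def surface_drift(before, after):
    """Return added and removed names per surface; unchanged surfaces are omitted."""
    drift = {}
    for attr in ("plugins", "extensions", "skills"):
        old, new = getattr(before, attr), getattr(after, attr)
        if old != new:
            drift[attr] = {"added": sorted(new - old), "removed": sorted(old - new)}
    return drift


# Illustrative values only.
run_1 = SurfaceSnapshot("worker-a", plugins=frozenset({"lint", "deploy"}))
run_2 = SurfaceSnapshot("worker-a", plugins=frozenset({"lint"}), skills=frozenset({"triage"}))
print(surface_drift(run_1, run_2))
```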

Signal: 2026-05-06-plugin-extension-skill-surface

Supported by:

findings/codex.md
findings/claude-code.md
findings/gemini-cli.md
findings/hermes-agent.md
findings/pi-coding-agent.md

6. Worker Integrations Are Not Doctrine

Pi removed built-in Gemini CLI and Antigravity support while adding many new providers. Gemini's stable, preview, and nightly channels differ materially. Codex alpha and app-server surfaces move quickly.

The durable layer is not a provider list. The durable layer is the run contract: charter, mandate, authority, execution, evidence, judgment, memory, and next run.
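
To make that concrete, the eight parts of the run contract can be written down as one immutable record in which the worker is a single replaceable value. This is a sketch only: `RunContract`, `with_worker`, and the sample values are hypothetical, and treating `execution` as the only worker-specific field is a simplification (authority, for instance, may also vary by worker).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunContract:
    """Hypothetical record of the eight parts named in the text."""
    charter: str
    mandate: str
    authority: str
    execution: str    # worker surface used for this run
    evidence: tuple
    judgment: str
    memory: tuple
    next_run: str


def with_worker(contract, worker):
    """Return a copy of the contract that runs on a different worker."""
    return replace(contract, execution=worker)


# Illustrative values only.
contract = RunContract(
    charter="CHARTER.md",
    mandate="example mandate",
    authority="example authority grant",
    execution="worker-a",
    evidence=(),
    judgment="pending",
    memory=(),
    next_run="unscheduled",
)
moved = with_worker(contract, "worker-b")
print(moved.execution, moved.charter == contract.charter)
```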

Signal: 2026-05-06-worker-integrations-not-doctrine

Supported by:

findings/pi-coding-agent.md
findings/gemini-cli.md
findings/codex.md

Source Notes

Codex

High-signal change: 0.128.0 made persisted goals first-class and expanded permission profiles, plugin workflows, imported sessions, and multi-agent controls.

Bitter action: define receipt fields for Codex goal/session ids, permission profiles, sandbox profiles, plugin state, imported sessions, and multi-agent config.

Claude Code

High-signal change: /ultrareview moved provider-native review into a cloud fleet, while session recaps, plugins, MCP, hooks, OTel, and permission tooling continued to mature.

Bitter action: treat Claude review as an evidence input that must be cited, compared, and settled rather than accepted as final truth.

Gemini CLI

High-signal change: Gemini CLI's changes in this window centered on workspace trust, secure env loading, shell allowlists, release-channel handling, MCP lifecycle fixes, and the Auto Memory patch flow.

Bitter action: test and receipt trust state, env policy, release channel, MCP behavior, and reviewable memory semantics.

Hermes Agent

High-signal change: Curator now maintains the skill library on a schedule and emits run artifacts, while Hermes continues expanding messaging, integrations, runtime, and worker orchestration.

Bitter action: benchmark Curator as a worker-local self-improvement loop while keeping Bitter memory settlement operator-owned.

Pi Coding Agent

High-signal change: Pi continued rapid provider churn and extension/API evolution, including cached Codex websocket transport, TypeBox extension contracts, provider additions, provider removals, and terminal UX improvements.

Bitter action: keep Pi as a thin worker adapter with exact version, provider, transport, session, and extension metadata in receipts.

What Operators Should Do

Treat worker-native state as useful but not authoritative.
Treat persistent worker goals as mission registers that must be reconciled against the operator's charter and run mandate.
Record which worker goals, recaps, memories, plugins, skills, permission profiles, release channels, and transports were active during serious runs.
Prefer worker tools that expose their trust, sandbox, plugin, session, and verification state clearly.
Treat provider-native review as evidence, not final judgment.

What Bitter Should Do Next

Draft the adapter receipt vocabulary for worker-native state, permissions, plugin surfaces, verification outputs, and release-channel metadata.
Define worker_goal receipt fields and settlement rules for Codex /goal.
Build small probes for Codex /goal, Claude /ultrareview, Gemini workspace trust and memory patches, Hermes Curator output, and Pi session transport/extension metadata.
Create a worker capability matrix before deeper integration work.
Keep the public research loop conservative: no signal unless it can change the next action.
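
The capability matrix in the list above could start as a plain table that defaults every cell to "unknown" and is filled in only by probe results. The sketch below is an assumption about shape, not a finding: `DIMENSIONS`, `empty_matrix`, `record_probe`, and `render` are hypothetical names, and the worker labels are placeholders rather than claims about any real tool.

```python
DIMENSIONS = ("permission", "session", "plugin", "transport", "release_channel")


def empty_matrix(workers):
    """Every cell starts as 'unknown' until a probe says otherwise."""
    return {w: {d: "unknown" for d in DIMENSIONS} for w in workers}


def record_probe(matrix, worker, dimension, exposed):
    """Store one probe result; reject dimensions the matrix does not track."""
    if dimension not in DIMENSIONS:
        raise KeyError(f"unknown dimension: {dimension}")
    matrix[worker][dimension] = "yes" if exposed else "no"


def render(matrix):
    """Plain-text table with one row per worker."""
    name_width = max((len(w) for w in matrix), default=0)
    widths = [max(len(d), len("unknown")) for d in DIMENSIONS]
    header = " " * name_width + "  " + "  ".join(
        d.ljust(w) for d, w in zip(DIMENSIONS, widths)
    )
    rows = [header]
    for worker, cells in matrix.items():
        body = "  ".join(cells[d].ljust(w) for d, w in zip(DIMENSIONS, widths))
        rows.append(worker.ljust(name_width) + "  " + body)
    return "\n".join(rows)


# Placeholder workers and a made-up probe result, for illustration only.
matrix = empty_matrix(["worker-a", "worker-b"])
record_probe(matrix, "worker-a", "session", True)
print(render(matrix))
```

Starting from "unknown" keeps the matrix honest: a cell only changes when a probe has actually run, which matches the rule that nothing counts as a signal unless it can change the next action.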

What Remains Uncertain

Whether provider-native state will be stable enough for long-horizon work or remain a tool-local convenience.
Whether worker goals will remain a single-worker convenience or become durable enough to coordinate ultra-long-horizon work under an operator charter.
Whether cloud/native review surfaces produce evidence that is inspectable enough for Bitter receipts.
Whether plugin and skill ecosystems will converge around common metadata or remain fragmented.
Which worker surfaces expose enough permission, session, plugin, transport, and release-channel state for trustworthy Bitter adapters.

Receipts

Primary receipts for this roll-up are preserved in the run manifest, source-specific findings, and structured signals under:

runs/2026-05-06-manual-2026-04-22_2026-05-06-frontier-v0/