Backstage: Protected on Paper (2026-06-16 .. 2026-06-23)
Internal product-intake companion to the public digest. Not for publication. What Bitter and Factory should learn from this window's research.
The load-bearing intake: channel is a capability-profile property
Two consecutive windows now show the same thing: a large share of the sharpest authority and security work merges to a default branch (or a preview tag, or a later version) without reaching the binary an operator runs. This is not noise; it changes what a capability profile means.
Bitter implication. A capability profile keyed on "the project added X"
silently over-credits the deployed surface. Bitter's frontier-signal schema
should carry an explicit channel field (tagged-release | main-unreleased | preview-or-beta) on every capability claim, and capability-profile assumptions
should default to the tagged state unless the operator is known to run main.
This window: OpenHands' entire enterprise/security cluster (two windows
unreleased), Gemini's skill path-traversal fix (preview-only, second window),
Hermes' MCP-persistence wave (main, past v0.17.0), Paperclip's newest controls
(master, past v2026.618.0), and Flue's private-by-default observability (staged
in an Unreleased changelog section). An adapter that assumes a main-merged fix is
in force will mis-model the deployment.
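A minimal sketch of what a channel-carrying capability claim could look like, assuming a simple in-memory representation. The class and function names (CapabilityClaim, capabilities_in_force, operator_channels) and the "example-harness" entry are hypothetical illustrations, not Bitter's actual schema; only the three channel values come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    TAGGED_RELEASE = "tagged-release"
    MAIN_UNRELEASED = "main-unreleased"
    PREVIEW_OR_BETA = "preview-or-beta"


@dataclass(frozen=True)
class CapabilityClaim:
    project: str
    capability: str
    channel: Channel  # required: no claim is recorded without a channel


# Default assumption: the operator runs tagged releases only.
DEFAULT_CHANNELS = frozenset({Channel.TAGGED_RELEASE})


def capabilities_in_force(claims, operator_channels=None):
    """Map project -> capabilities assumed live on the deployed binary.

    operator_channels (hypothetical input) maps project -> set of channels
    the operator is known to run; unlisted projects fall back to tagged only.
    """
    operator_channels = operator_channels or {}
    in_force = {}
    for claim in claims:
        allowed = operator_channels.get(claim.project, DEFAULT_CHANNELS)
        if claim.channel in allowed:
            in_force.setdefault(claim.project, set()).add(claim.capability)
    return in_force


claims = [
    CapabilityClaim("hermes", "mcp-persistence-fixes", Channel.MAIN_UNRELEASED),
    CapabilityClaim("gemini", "skill-path-traversal-fix", Channel.PREVIEW_OR_BETA),
    # Hypothetical tagged claim, included only to show the default path.
    CapabilityClaim("example-harness", "example-control", Channel.TAGGED_RELEASE),
]
```

With no operator information, only the tagged claim counts as in force. The Hermes claim is credited only when the caller states that the operator runs main.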
Declared vs enforced: test the boundary, do not trust the note
Claude Code disclosed that two announced authority features (the 5-level
subagent depth cap and the Agent(type)/Agent(x,y) permission rules) did not
actually bind until fixes landed this window. The lesson generalizes: a
permission feature is not a permission boundary until something refuses the
disallowed action.
Bitter implication. BitterBench / capability-probe work should test
authority boundaries rather than ingesting changelog claims: write an
Agent(type) deny and confirm a named spawn is refused; grant a Codex approval
in one environment and confirm it does not leak to another. An "enforcement
verified" bit on authority claims is worth more than the presence of the
feature. This is a candidate eval pattern, not just a profile note.
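The eval pattern can be sketched as a probe that attempts the disallowed action and sets the bit only on an observed refusal. Everything here is an assumption for illustration: ActionRefused, AuthorityClaim, probe, and the two stand-in adapters are hypothetical names, and a real probe would drive the actual harness instead of a stub.

```python
from dataclasses import dataclass, replace
from typing import Callable


class ActionRefused(Exception):
    """Raised by a (hypothetical) harness adapter when it blocks an action."""


@dataclass(frozen=True)
class AuthorityClaim:
    feature: str
    declared: bool                      # the changelog says the feature exists
    enforcement_verified: bool = False  # a probe observed the refusal


def probe(claim: AuthorityClaim, attempt_disallowed: Callable[[], None]) -> AuthorityClaim:
    """Attempt the disallowed action; verified only if the harness refuses."""
    try:
        attempt_disallowed()
    except ActionRefused:
        return replace(claim, enforcement_verified=True)
    # The action went through, so the declared rule did not bind.
    return replace(claim, enforcement_verified=False)


# Stand-in adapters for illustration only.
def spawn_denied_agent_enforced() -> None:
    raise ActionRefused("named spawn blocked by deny rule")


def spawn_denied_agent_unenforced() -> None:
    return None  # the spawn succeeded despite the deny rule


claim = AuthorityClaim(feature="Agent(type) deny", declared=True)
```

The bit defaults to False, so a claim ingested from a changelog alone never counts as verified.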
Identity planes are splitting — a new credential membrane
OpenHands decoupled API-key auth from Keycloak sessions (machine identity no
longer dies with the human SSO session) and generalized a per-user secret
enricher that injects linked OAuth tokens into sandboxes from any conversation
start path. Hermes added a root-owned, user-immutable /etc/hermes managed
scope.
Bitter implication (BitterPass / BitterGrid). The "which credential follows which principal into the sandbox" question is now a first-class membrane concern. If Bitter wraps these harnesses, the secret-injection paths (web/Slack/API start paths carrying different linked tokens) are exactly where a wake-packet's credential scope must be explicit. Watch the machine-identity / human-SSO split as a pattern Bitter likely needs to mirror rather than inherit.
Runaway-cost ceiling: a gap Bitter should own, not borrow
Hermes shipped background fire-and-forget fan-out subagents with the default wall-clock timeout removed; a heartbeat/inactivity backstop remains but a busy runaway worker has no wall-clock or cost bound.
Bitter implication (run contract / BitterGrid). Do not rely on the harness for a spend ceiling. A Bitter run contract should impose a wall-clock and cost bound it owns and can enforce/replay, independent of whether the wrapped harness has one this release.
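A rough sketch of a run contract that owns both bounds, assuming the wrapper can observe cost events from the harness. RunContract, BoundExceeded, the cost unit, and the ledger layout are all hypothetical; the injected clock is there only so that the bound can be exercised deterministically.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class BoundExceeded(Exception):
    """Raised when the run contract's own ceiling is hit."""


@dataclass
class RunContract:
    max_wall_seconds: float
    max_cost: float
    clock: Callable[[], float] = time.monotonic
    started_at: float = field(default=0.0, init=False)
    spent: float = field(default=0.0, init=False)
    ledger: List[Tuple[float, float]] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def elapsed(self) -> float:
        return self.clock() - self.started_at

    def charge(self, cost: float) -> None:
        """Record a cost event, then enforce both bounds."""
        self.spent += cost
        self.ledger.append((self.elapsed(), cost))  # kept for replay and audit
        self.check()

    def check(self) -> None:
        """Raise if either bound is exceeded, regardless of harness state."""
        if self.elapsed() > self.max_wall_seconds:
            raise BoundExceeded("wall-clock bound exceeded")
        if self.spent > self.max_cost:
            raise BoundExceeded("cost bound exceeded")
```

A supervising loop would call check() on a timer and charge() on each billed event, then cancel the wrapped worker when BoundExceeded is raised. Because check() does not depend on worker inactivity, a busy runaway worker is still bounded.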
Factory relevance
- Paperclip budget enforcement (#8347, master-unreleased) moves budget from surfacing to preflight enforcement (cancel queued work before an adapter starts). This is the closest external mirror of Factory allocation discipline seen on the watchlist; track whether it tags and how the caps are scoped.
- Paperclip task watchdog (#8339), in which recovery/status actors structurally cannot mutate approvals, is the "review actor narrower than work actor" primitive Factory accountability wants. factory_relevance: medium.
- The Hermes exposed-control-plane → root agent → MCP-persistence failure mode is workcell-doctrine intake (BitterGrid): a startup posture audit + IOC blocklist is a replayable, auditable control Bitter can own and compare across harnesses. factory_relevance: low but workcell_relevance: high.
- Most of this window's signals are factory_relevance: none. Do not force the channel-gap thesis into an allocation story; it is a research-quality and capability-modeling lesson first.
Council / doctrine follow-up
- This window strongly motivates amendment 007 (security_advisory deployment-class scope). Every advisory here is sharply scoped: "shared-pool operators," "if you expose a dashboard," "builds from main," "stable users installing third-party skills." The flat boolean would over-claim each one. Recommend prioritizing 007 for the next ratification pass.
- The channel-as-evidence finding is drafted as amendment 010 (proposed) this cycle; see charter/proposed/. It is the standout doctrine signal of two consecutive windows.
- The Hermes single-source campaign is a good template for an extraordinary-claim-attribution rule (attribute, do not assert, when the only source is project-controlled). Surfaced in the audit; not yet drafted.
Run-quality notes
- Harness: 10 Opus harvesters (sub-spawn authorized) → 5 Opus adversarial verifiers (one dedicated to the lede) → coordinator synthesis. The verify stage caught the Hermes single-source overclaim, the Flue staged-vs-shipped misframe, an OpenHands SHA transcription error, and two Claude Code version/framing precision fixes before publication. Recommend standing.
- Receipts and channel resolved by git ancestry, not date inference.
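Resolving channel by ancestry can be sketched as follows, assuming a local checkout and using `git merge-base --is-ancestor`, which exits 0 when the first commit is reachable from the second and 1 when it is not. The function names and the "unknown" fallback are hypothetical, not the harness's actual code.

```python
import subprocess
from typing import Callable, Optional


def git_is_ancestor(repo: str, commit: str, ref: str) -> bool:
    """True if commit is reachable from ref, per git merge-base --is-ancestor."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        capture_output=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (bad ref, not a repo, ...).
    raise RuntimeError(result.stderr.decode(errors="replace"))


def resolve_channel(
    commit: str,
    latest_tag: str,
    default_branch: str,
    is_ancestor: Callable[[str, str], bool],
    preview_tag: Optional[str] = None,
) -> str:
    """Classify a commit by what it is reachable from, never by its date."""
    if is_ancestor(commit, latest_tag):
        return "tagged-release"
    if preview_tag is not None and is_ancestor(commit, preview_tag):
        return "preview-or-beta"
    if is_ancestor(commit, default_branch):
        return "main-unreleased"
    return "unknown"
```

For a real checkout, the ancestry test would be bound to a repository path, for example `resolve_channel(sha, "v0.17.0", "main", lambda c, r: git_is_ancestor(repo_path, c, r))`, where sha and repo_path are placeholders. Checking the tag first means that a fix merged before the tag was cut resolves to tagged-release even if its commit date looks recent.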