Backstage

Backstage: 2026-06-04 to 2026-06-16

Internal product intake for Bitter and Factory. Not part of the public digest.

What Bitter should test next

Channel-aware capability detection. This window showed that "merged" and "released" can diverge widely: Hermes, Paperclip, OpenHands, and Gemini all shipped sharp security work to a default branch only. A Bitter adapter that reports a provider's posture from release notes alone will be wrong. Bitter should determine whether a capability or fix is present from the artifact it actually runs (tag, commit, image digest), not from changelog text.
Recursive-delegation supervision. Claude Code (subagents 5 deep, classifier gating spawns) and Hermes (fire-and-forget background subagents, default timeout removed) both moved the unit of work below the top agent. Bitter's run-contract and receipt model should treat a delegated subtree as the unit, and test whether a background child's result re-entering as a new turn is captured in the receipt trail.
Untrusted-input authority presets. Paperclip's low-trust review containment, Pi's project trust, and Gemini's skill path-traversal fix are the same primitive: a trust boundary around input the agent did not author. Bitter should test a "low-trust" run-contract posture, and should test whether its own skill/MCP install paths are treated as an untrusted-input boundary (they are one).
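The channel-aware detection item above can be sketched as follows. This is a minimal illustration, assuming the adapter can learn which commit its running artifact was built from. The names RunningArtifact, Fix, linear_ancestry, and fix_present, the toy linear history, and the tag v1.0.0 are all hypothetical and not taken from Bitter or any provider. In a real checkout, the ancestry check could be backed by `git merge-base --is-ancestor`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RunningArtifact:
    """Identity of what is actually running (hypothetical shape)."""
    commit: str                        # commit the running build was made from
    tag: Optional[str] = None          # release tag, if the build is a release
    image_digest: Optional[str] = None


@dataclass(frozen=True)
class Fix:
    """A fix identified by the commit that introduced it, not by changelog text."""
    name: str
    commit: str


def linear_ancestry(history: List[str]) -> Callable[[str, str], bool]:
    """Build an ancestry check over a toy linear history, oldest commit first."""
    index = {sha: i for i, sha in enumerate(history)}

    def is_ancestor(older: str, newer: str) -> bool:
        # Unknown commits are treated as "not an ancestor" (fail closed).
        return older in index and newer in index and index[older] <= index[newer]

    return is_ancestor


def fix_present(fix: Fix, artifact: RunningArtifact,
                is_ancestor: Callable[[str, str], bool]) -> bool:
    """The fix is present only if its commit is contained in the running build."""
    return is_ancestor(fix.commit, artifact.commit)


# Toy history: the fix lands at c3, after the last release was cut at b2.
is_ancestor = linear_ancestry(["a1", "b2", "c3", "d4"])
fix = Fix(name="hypothetical-security-fix", commit="c3")
released = RunningArtifact(commit="b2", tag="v1.0.0")
main_build = RunningArtifact(commit="d4")

print(fix_present(fix, released, is_ancestor))    # False: merged, not released
print(fix_present(fix, main_build, is_ancestor))  # True: running main
```

The point of the design is that the answer depends on the running artifact's identity, so the same provider can correctly report two different postures depending on the channel in use.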

Argument-aware permissions. Claude Code's Tool(param:value) syntax is a more expressive permission grammar than per-tool allow/deny. Bitter's permission model should be able to express argument-level rules (e.g. model-tier caps in delegated calls), or it will under-govern delegation trees.
Model-allowlist binding. Claude Code's enforceAvailableModels plus the bypass-cluster fixes make org model allowlists actually binding, including against the default model. Bitter should verify that its own model routing respects an org allowlist on every path (default, env override, subagent, advisor).
ACP-as-shell. OpenHands ACP model switching reaching Docker/cloud, and Codex/Claude Code/Gemini being fronted under it, continue the "one harness wraps another's agent" pattern. Bitter should decide which side of that boundary it wants to be on and how credentials cross it (the in-window MCP-key and ACP-error-body plaintext fixes show the boundary is leaky by default).
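To make the argument-aware permissions item above concrete, here is a minimal sketch of argument-level rule matching. It borrows only the Tool(param:value) surface shape mentioned in these notes. The parsing regex, the Rule class, the deny-before-allow evaluation order, and the tool and model names are assumptions made for illustration, not Claude Code's or Bitter's actual semantics.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional

# Accepts "Tool" or "Tool(param:value)" (hypothetical grammar for this sketch).
RULE_RE = re.compile(r"^(?P<tool>\w+)(?:\((?P<param>\w+):(?P<value>[^)]+)\))?$")


@dataclass(frozen=True)
class Rule:
    tool: str
    param: Optional[str] = None
    value: Optional[str] = None

    def matches(self, tool: str, args: Mapping[str, object]) -> bool:
        if tool != self.tool:
            return False
        if self.param is None:
            return True  # per-tool rule: matches any arguments
        # Argument-level rule: the named argument must be present and equal.
        return self.param in args and str(args[self.param]) == self.value


def parse_rule(text: str) -> Rule:
    m = RULE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unparseable rule: {text!r}")
    return Rule(m["tool"], m["param"], m["value"])


def is_allowed(tool: str, args: Mapping[str, object],
               allow: Iterable[Rule], deny: Iterable[Rule]) -> bool:
    """Deny wins over allow; anything not explicitly allowed is refused."""
    if any(rule.matches(tool, args) for rule in deny):
        return False
    return any(rule.matches(tool, args) for rule in allow)


# Example: delegation is allowed, but not to a hypothetical "large" model tier.
allow = [parse_rule("Task")]
deny = [parse_rule("Task(model:large)")]

print(is_allowed("Task", {"model": "small"}, allow, deny))  # True
print(is_allowed("Task", {"model": "large"}, allow, deny))  # False
print(is_allowed("Bash", {"command": "ls"}, allow, deny))   # False
```

A per-tool allow/deny model could only allow or block Task as a whole. The argument-level deny is what lets a model-tier cap follow a call down a delegation tree.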

Agent Zero's Remote Control CSRF and WebSocket-origin hardening is directly relevant to how Grid reasons about origin trust when a leased workcell is exposed over a tunnel: trust only the active tunnel origin, normalize before allowlisting.
Hermes's "an unpaired write deny is theater" and the cp-into-.ssh gap are a reminder for Grid's own egress/write guardrails: enumerate the syntactic variants of a denied operation, or the deny is advisory.
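The "trust only the active tunnel origin, normalize before allowlisting" rule above can be sketched with the standard library alone. This is an illustrative sketch, not Agent Zero's or Grid's implementation. The function names, the default-port table, and the example hostnames are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

# Default ports per scheme, so "https://h" and "https://h:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def normalize_origin(origin: str) -> Optional[str]:
    """Return a canonical scheme://host:port string, or None if not a clean origin."""
    try:
        parts = urlsplit(origin.strip())
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return None
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").rstrip(".")  # hostname is already lowercased
    if scheme not in DEFAULT_PORTS or not host:
        return None
    # An origin carries no userinfo, path, query, or fragment; reject if present.
    if parts.username or parts.password or parts.query or parts.fragment:
        return None
    if parts.path not in ("", "/"):
        return None
    return f"{scheme}://{host}:{port or DEFAULT_PORTS[scheme]}"


def origin_allowed(request_origin: str, active_tunnel_origin: str) -> bool:
    """Exact match on normalized forms, against the single active tunnel origin."""
    wanted = normalize_origin(active_tunnel_origin)
    got = normalize_origin(request_origin)
    return wanted is not None and got == wanted


active = "https://tunnel.example.com"
print(origin_allowed("HTTPS://Tunnel.Example.com:443", active))        # True
print(origin_allowed("https://tunnel.example.com.evil.test", active))  # False
print(origin_allowed("http://tunnel.example.com", active))             # False
print(origin_allowed("null", active))                                  # False
```

Comparing normalized forms for exact equality avoids the prefix and substring matches that let a lookalike host through, and failing closed on anything unparseable keeps "null" and malformed origins out.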

Paperclip's repositioning (from "zero-human companies" to "manage agents for work"), together with its human board visibility and audited recovery action, is a calibration data point for Factory: the autonomous-company metaphor is being repriced as human-in-the-loop operating software. Factory's allocation/accountability model should assume a human approver in the loop, not its absence.
OpenHands concurrency limits and BYOK gating show the productized platform turning concurrency and model access into governed, billable resources with per-org/per-user knobs. Useful prior art if Factory ever meters run concurrency or model access per operating context.
Factory relevance is otherwise low this window; the dominant story is provider-side authority, not allocation.

Receipt confidence is high (every load-bearing claim was adversarially re-fetched), but several signals are channel-tagged main-unreleased: their operator value is conditional on running main, and they should be re-checked when the carrying release lands.
The Pi harvester's full report was lost to a subagent final-message gap and recovered by direct re-fetch (see audit.md, item 3).