A source contract is the public commitment that defines where the loop looks for evidence, what
it accepts as a finding, and what it refuses to promote before a profile or digest can carry the
claim. Each card below shows the contract as the loop sees it. Every field is rendered from
sources/[id].yml as-is, with no editorial overlay.
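As a rough illustration of the gate described above, the sketch below models one contract in Python. The class name `SourceContract`, its field names, and the `can_promote` method are hypothetical; the actual schema of sources/[id].yml is not shown on this page. The sample values are a subset copied from the Codex card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical in-memory shape of one sources/[id].yml file."""

    id: str
    accepts: frozenset[str]  # evidence kinds listed under "Accepts as evidence"
    refuses: frozenset[str]  # evidence kinds listed under "Refuses to promote"

    def can_promote(self, evidence_kind: str) -> bool:
        """Allow a finding only if its evidence kind is explicitly accepted."""
        if evidence_kind in self.refuses:
            return False
        # Anything not listed at all is treated as refused (an assumption).
        return evidence_kind in self.accepts


# Subset of the Codex card, for illustration only.
codex = SourceContract(
    id="codex",
    accepts=frozenset(
        {"official changelog", "official docs", "github release", "reproducible local probe"}
    ),
    refuses=frozenset({"unsourced social claim", "speculation", "stale model memory"}),
)
```

In this sketch an evidence kind that appears in neither list is refused by default. That is a conservative design choice made for the example, not something the contracts state.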
codex · active · tier 1 · daily
Codex · OpenAI
Watch Codex as provider-native frontier capability, not just as an open-source CLI. Pay special attention to features that change Bitter's wrapper posture: long-horizon work, goals, subagents, workflows, sandboxing, permissions, AGENTS.md behavior, skills, plugins, MCP, browser/computer-use surfaces, non-interactive execution, SDKs, cost reporting, and enterprise governance.
Primary surfaces
-
-
watch: releases · new features · improvements · bug fixes · breaking changes
-
watch: cli · app · ide extension · web · workflows · subagents · sandboxing · memories · commands · agents md · mcp · plugins · skills · authentication · approvals · security · governance · automation
-
watch: releases · tags · commits · pull requests · issues · docs · examples · security
-
watch: package version · publication date · install surface
Accepts as evidence
- official changelog
- official docs
- github release
- tagged release
- maintainer commit
- merged pr
- official blog or developer post
- package registry release
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- pricing or usage change: observe
High-signal patterns
goal · long-horizon · subagent · memory · workflow · sandbox · approval · permission · command · hook · MCP · plugin · skill · AGENTS.md · local environment · browser · computer use · automation · non-interactive · SDK · cost · usage
Discovery state
last verified: 2026-05-06 · manual web · high confidence
- Which GitHub releases, tags, and npm package versions should be treated as canonical when they disagree with the official Codex changelog?
- Which provider-native long-horizon features should Bitter explicitly detect through local probes rather than relying on release notes?
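One way to read the Codex card's Default actionability list is as pairs: each change kind followed by its default action. Under that reading, the lookup might be sketched as below. The dictionary name, the `default_action` helper, and the "observe" fallback for unlisted change kinds are all assumptions made for illustration.

```python
# Change kind -> default action, read off the Codex card as consecutive pairs.
CODEX_DEFAULT_ACTIONABILITY = {
    "release": "test",
    "docs change": "observe",
    "security change": "test",
    "breaking change": "adapt",
    "ecosystem package": "observe",
    "pricing or usage change": "observe",
}


def default_action(change_kind: str, fallback: str = "observe") -> str:
    """Return the default action for a change kind.

    The fallback for unlisted kinds is hypothetical; the contract does not
    say what happens to a change kind it does not name.
    """
    return CODEX_DEFAULT_ACTIONABILITY.get(change_kind, fallback)
```

For example, `default_action("breaking change")` yields `"adapt"`, while a kind the card does not list falls through to the assumed fallback.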
claude-code · active · tier 1 · daily
Claude Code · Anthropic
Watch Claude Code as a fast-moving provider-native coding environment with strong session, hook, plugin, skill, permission, and enterprise surfaces. Its changelog is granular; promote findings only when they change how developers should run it, trust it, review its output, or wrap it inside a longer-lived project workflow.
Primary surfaces
-
-
watch: releases · new features · improvements · bug fixes · breaking changes
-
watch: notable features · examples · operator guidance · weekly rollups
-
watch: memory · hooks · slash commands · plugins · skills · subagents · permissions · settings · sandboxing · mcp · sdk · headless · telemetry · enterprise
-
watch: package version · publication date · install surface
Accepts as evidence
- official changelog
- official docs
- official whats new
- package registry release
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- pricing or usage change: observe
High-signal patterns
recap · resume · rewind · plan · subagent · task · hook · permission · managed setting · plugin · skill · slash command · MCP · SDK · headless · telemetry · prompt caching · usage · model picker · enterprise
Discovery state
last verified: 2026-05-06 · manual web · high confidence
- Which GitHub source backing the published changelog should be captured directly in addition to the rendered official docs?
- Which Claude Code behaviors should be probed locally because the changelog is too granular to imply operator impact by itself?
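Because the Claude Code changelog is granular, the High-signal patterns list can serve as a first-pass filter over changelog lines. The sketch below shows one possible matcher. The function name, the case-insensitive whole-word matching, and the tolerance for a trailing plural "s" are assumptions, not behavior documented by the contract. The pattern list is a subset of the card's.

```python
import re

# Subset of the Claude Code card's high-signal patterns.
CLAUDE_CODE_PATTERNS = [
    "recap", "resume", "rewind", "hook", "permission",
    "managed setting", "slash command", "MCP", "headless",
]


def matched_patterns(text: str, patterns: list[str]) -> list[str]:
    """Return the patterns that occur in text as whole words, ignoring case.

    A trailing "s" is tolerated so that "hooks" still matches "hook".
    """
    hits = []
    for pattern in patterns:
        regex = r"\b" + re.escape(pattern) + r"s?\b"
        if re.search(regex, text, flags=re.IGNORECASE):
            hits.append(pattern)
    return hits
```

A match here only flags a line for review. Whether the finding changes how developers run, trust, or wrap the tool still has to be judged against the card's watch statement.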
gemini-cli · active · tier 1 · daily
Gemini CLI · Google
Watch Gemini CLI as a large open-source terminal agent with rapid release channels, explicit context-file behavior, tool and extension surfaces, checkpointing, sandboxing, IDE/GitHub integrations, and Google account or Vertex/enterprise authentication paths. Separate stable operator guidance from preview/nightly churn.
Primary surfaces
-
-
watch: releases · tags · commits · pull requests · issues · discussions · docs · roadmap · security
-
watch: stable · preview · nightly · breaking changes · security
-
watch: changelog · installation · authentication · configuration · commands · context files · checkpointing · tools · mcp · extensions · headless · ide · sandboxing · trusted folders · enterprise · telemetry
-
watch: package version · publication date · dist tags
Accepts as evidence
- official docs
- github release
- tagged release
- maintainer commit
- merged pr
- security advisory
- package registry release
- official google post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- pricing or usage change: observe
High-signal patterns
checkpoint · resume · context file · GEMINI.md · tool call · shell · web fetch · search grounding · MCP · extension · sandbox · trusted folder · permission · IDE · GitHub Action · output format · stream-json · authentication · enterprise · telemetry · preview channel · security
Discovery state
last verified: 2026-05-06 · manual web · high confidence
- Should nightly and preview releases be harvested into findings or only used for adapter-probe canaries?
- Which security advisories should be treated as direct signals even when they do not change public docs?
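Separating stable guidance from preview and nightly churn requires sorting each release into a channel first. A minimal sketch follows, assuming the channel name appears somewhere in the version or tag string. That naming convention, the function name, and the example version strings are hypothetical and should be checked against the real release tags.

```python
def release_channel(version: str) -> str:
    """Classify a version or tag string as nightly, preview, or stable.

    Assumes the channel name is embedded in the string; anything without
    a channel marker is treated as stable.
    """
    lowered = version.lower()
    if "nightly" in lowered:
        return "nightly"
    if "preview" in lowered:
        return "preview"
    return "stable"
```

With a classifier like this, stable releases could feed findings while preview and nightly releases feed only adapter-probe canaries, which is one of the options the first discovery question leaves open.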
hermes-agent · active · tier 1 · daily
Hermes Agent · Nous Research
Hermes should be watched as a broad self-improving agent platform, not just as a coding CLI. Pay special attention to memory, skills, automations, messaging surfaces, subagents, sandboxing, runtime portability, and research trajectory generation. Bitter's opening is the project workflow around tools like this: permissions, evidence, review, memory, and what the next run should know.
Primary surfaces
-
-
watch: releases · tags · commits · pull requests · issues · docs · examples · security
-
watch: release notes · breaking changes · migration notes · security
-
watch: installation · configuration · tools · toolsets · memory · skills · mcp · messaging · cron · security · terminal backends · architecture · context files · llms txt
Accepts as evidence
- official docs
- github release
- tagged release
- maintainer commit
- merged pr
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- pricing or usage change: observe
High-signal patterns
memory · skill · self-improvement · subagent · delegate · toolset · terminal backend · sandbox · container · SSH · Modal · Daytona · cron · messaging gateway · Telegram · Discord · Slack · MCP · context file · SOUL.md · llms.txt · trajectory · RL
Discovery state
last verified: 2026-05-06 · manual web · high confidence
- Which docs domain should be considered canonical if GitHub README links and deployed docs diverge?
- Which social or Discord announcements are maintainer-authored enough to include, and how should they be cited?
pi-coding-agent · active · tier 1 · daily
Pi Coding Agent · earendil-works / Mario Zechner
Watch Pi as a minimal, extensible terminal coding harness. It is important partly because of what it chooses not to include by default: subagents, plan mode, permission popups, MCP, and other governance features. That deliberate minimalism clarifies Bitter's wedge as the project workflow around coding agents: durable goals, permissions, evidence, verification, and memory.
Primary surfaces
-
-
watch: positioning · installation · modes · providers · design principles · package ecosystem
-
watch: quickstart · usage · sessions · context files · system prompt files · compaction · skills · extensions · prompt templates · themes · packages · rpc · sdk · providers · settings
-
watch: releases · tags · commits · pull requests · issues · packages coding agent · docs · examples
-
watch: package version · publication date · dist tags
Accepts as evidence
- official docs
- official site
- github release
- tagged release
- maintainer commit
- merged pr
- package registry release
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- pricing or usage change: observe
High-signal patterns
extension · skill · package · prompt template · theme · session tree · branch · share · export · AGENTS.md · SYSTEM.md · compaction · dynamic context · RPC · SDK · json mode · provider · login · permission · sandbox · MCP · subagent · plan mode
Discovery state
last verified: 2026-05-12 · manual web · high confidence
- Which package-registry or package-index surface should be watched for Pi extension ecosystem movement?
openclaw · active · tier 1 · daily
OpenClaw · OpenClaw
Watch OpenClaw as the accessibility calibration source for the agentic harness frontier. Its most important lesson may be product posture: making autonomous agent work feel reachable to everyday people. Pay special attention to onboarding, gateway surfaces, familiar channels, visual state, permissions, and any design move that hides setup complexity without hiding authority.
Primary surfaces
-
-
watch: releases · tags · commits · pull requests · issues · docs · examples · security
-
watch: getting started · gateway · installation · configuration · channels · plugins · skills · permissions · memory · remote access · security · mobile or desktop surfaces
-
watch: onboarding · setup steps · first run · user workflow
Accepts as evidence
- official docs
- github release
- tagged release
- maintainer commit
- merged pr
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
- seo clone or mirror
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- accessibility change: study
Research lenses
- accessibility
- distribution surface
- everyday use
- gateway
- authority visibility
High-signal patterns
onboarding · setup · gateway · visual surface · desktop · mobile · channel · notification · remote access · everyday user · natural language workflow · permission · approval · visibility · handoff · plugin · skill · daemon · background agent · long-running task · memory
Discovery state
last verified: 2026-05-07 · manual web · medium confidence
- Which OpenClaw release surface should be treated as canonical if docs and GitHub move at different speeds?
- Which user-facing gateway surfaces are official product posture rather than experimental examples?
- Which security and authority boundaries are visible enough for everyday users to understand?
paperclip · active · tier 1 · daily
Paperclip · Paperclip
Watch Paperclip as the coordination and economic-control-plane source. Its relevance to Bitter is the Factory question: can agent work be organized into goals, roles, budgets, accountability, approvals, and operating state without becoming theater?
Primary surfaces
-
-
watch: positioning · onboarding · governance · company model · pricing · demos
-
watch: setup · goals · agents · teams · governance · budgets · accountability · approvals · integrations · security
-
watch: releases · tags · commits · pull requests · issues · docs · examples · security
Accepts as evidence
- official docs
- official site
- github release
- tagged release
- maintainer commit
- merged pr
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- governance change: study
Research lenses
- coordination control plane
- economic governance
- accountability
- multi agent operations
- factory analogue
High-signal patterns
company · org chart · goal · budget · role · manager · employee · approval · governance · accountability · cost · task queue · progress · audit · session · dashboard · multi-agent · agent team
Discovery state
last verified: 2026-05-07 · manual web · medium confidence
- Which source is canonical for product changes if the public site, docs, and GitHub repository diverge?
- How much of the company/control-plane metaphor is backed by durable operating state versus UI framing?
- Which governance and budget primitives are enforceable rather than descriptive?
agent-zero · active · tier 1 · daily
Agent Zero · agent0ai
Watch Agent Zero as the workcell-autonomy source. Its relevance to Bitter is the Grid question: what happens when an agent gets a real computer environment, can use terminal/browser/files, and can grow tools or subagents inside that environment? Pay special attention to isolation, persistence, cleanup, visibility, and whether power remains governable.
Primary surfaces
-
-
watch: positioning · installation · ui · features · pricing · deployment
-
watch: installation · configuration · tools · code execution · browser · memory · subagents · custom tools · docker · security · remote access
-
watch: releases · tags · commits · pull requests · issues · docs · examples · security
Accepts as evidence
- official docs
- official site
- github release
- tagged release
- maintainer commit
- merged pr
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- runtime change: test
Research lenses
- workcell autonomy
- computer use
- runtime isolation
- tool creation
- visible autonomy
High-signal patterns
Linux · terminal · file system · browser · code execution · Docker · container · tool creation · plugin · custom tool · subagent · memory · task · project · remote access · UI · safety · sandbox · persistence · cleanup
Discovery state
last verified: 2026-05-07 · manual web · high confidence
- Which release or docs surface best describes the current runtime isolation model?
- Which parts of Agent Zero's tool creation are safe to compare against Bitter-owned tool and receipt boundaries?
- What should Bitter test locally versus only study as product posture?
openhands · active · tier 1 · daily
OpenHands · OpenHands
Watch OpenHands as the productized software-agent platform source. Its relevance to Bitter is breadth: SDK, CLI, GUI, cloud, enterprise, integrations, sandboxing, collaboration, and evaluation in one system. Study what a full platform makes easier, and where Bitter should stay a wrapper/control layer instead of becoming the whole platform.
Primary surfaces
-
-
watch: positioning · cloud · enterprise · integrations · pricing · deployment
-
watch: installation · sdk · cli · gui · cloud · enterprise · integrations · sandboxing · security · evaluation · configuration · runtime
-
watch: releases · tags · commits · pull requests · issues · docs · examples · security
-
watch: release notes · breaking changes · migration notes · security
Accepts as evidence
- official docs
- official site
- github release
- tagged release
- maintainer commit
- merged pr
- security advisory
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
Default actionability
- release: test
- docs change: observe
- security change: test
- breaking change: adapt
- ecosystem package: observe
- enterprise change: study
Research lenses
- productized agent platform
- sandboxed development
- cli gui cloud surface
- enterprise governance
- evaluation
High-signal patterns
SDK · CLI · GUI · cloud · enterprise · self-hosting · sandbox · runtime · browser · evaluation · benchmark · security · RBAC · permission · collaboration · Slack · Jira · Linear · GitHub · extension · integration · multi-user
Discovery state
last verified: 2026-05-07 · manual web · high confidence
- Which OpenHands surfaces should be treated as one product versus separate SDK, CLI, cloud, and enterprise sources?
- Which evaluation and sandboxing claims can be probed locally?
- Which integrations change operator behavior enough to become signals?
flue · active · tier 2 · weekly
Flue · withastro
Watch Flue as the programmable harness / headless agent calibration source. Its core framing — "Agent = Model + Harness" — explicitly separates the model from the harness, filesystem, sandbox, skills, memory, sessions, and deployment surface, which directly validates Bitter's thesis that the valuable layer is the shaped environment around the model, not just the model call itself. Treat it as category evidence and possible integration reference, not stable infrastructure. APIs are self-described as experimental; monitor direction before treating any primitive as architectural precedent. Bitter is the operating loop / receipt layer / local actuation membrane; Flue is an agent framework — they are adjacent, not the same thing.
Primary surfaces
-
-
watch: commits · releases · tags · pull requests · readme · changelog · examples · docs
-
watch: versions · breaking changes · new features · fixes
-
watch: framing · feature surface · deployment targets · skill system · sandbox api
Accepts as evidence
- github commit
- github release
- tagged release
- merged pr
- readme change
- official docs
- maintainer authored post
- reproducible local probe
Refuses to promote
- unsourced social claim
- third party summary without primary link
- speculation
- stale model memory
- benchmark claim without method
- duplicate commentary
- seo clone or mirror
Default actionability
- release: observe
- docs change: observe
- api change: study
- breaking change: note
- ecosystem package: observe
- philosophy change: study
Research lenses
- agent harness architecture
- model harness separation
- programmable runtime
- headless agent
- sandbox design
- skill primitives
- session memory
- ci deployment
- filesystem abstraction
High-signal patterns
model + harness separation · programmable harness · headless agent · sandboxed execution · skill system · markdown skills · AGENTS.md · session management · memory · filesystem abstraction · HTTP server · CLI agent · CI/CD deployment · Cloudflare Workers · API change · breaking change · experimental
Discovery state
last verified: 2026-06-03 · harvest run · medium confidence
- Confirm GitHub repo is github.com/withastro/flue (withastro org is unusual for an agent harness project — verify ownership).
- Is the Apache-2.0 license confirmed in the repo?
- What is the actual star count and commit velocity at time of first harvest?
- Are APIs stable enough to treat individual primitives as architectural precedent, or watch-only for now?