Watched Sources

The intake promise.

A source contract is the public commitment that defines where the loop looks for evidence, what it accepts as a finding, and what it refuses before a profile or digest can carry the claim. Each card below shows the contract as the loop sees it. Every field is rendered as-is from sources/[id].yml, with no editorial overlay.
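As a rough illustration of what one contract might look like once loaded, here is a minimal sketch. The class name and every field name are assumptions inferred from the card layout; the actual keys in sources/[id].yml are not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class SourceContract:
    """Hypothetical in-memory shape of one sources/[id].yml file."""

    id: str                      # e.g. "codex"
    status: str                  # e.g. "active"
    tier: int                    # e.g. 1
    cadence: str                 # e.g. "daily"
    accepts: list[str] = field(default_factory=list)
    refuses: list[str] = field(default_factory=list)
    actionability: dict[str, str] = field(default_factory=dict)
    patterns: list[str] = field(default_factory=list)


# Values copied from the Codex card; lists are shortened for brevity.
codex = SourceContract(
    id="codex",
    status="active",
    tier=1,
    cadence="daily",
    accepts=["official changelog", "official docs", "reproducible local probe"],
    refuses=["unsourced social claim", "speculation"],
    actionability={"release": "test", "breaking change": "adapt"},
    patterns=["sandbox", "MCP", "AGENTS.md"],
)
```

Keeping the accept and refuse lists as plain strings mirrors how the cards display them, so a rendered card and the loaded contract can be compared line by line.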

codex · active · tier 1 · daily

Codex · OpenAI

Watch Codex as provider-native frontier capability, not just as an open-source CLI. Pay special attention to features that change Bitter's wrapper posture: long-horizon work, goals, subagents, workflows, sandboxing, permissions, AGENTS.md behavior, skills, plugins, MCP, browser/computer-use surfaces, non-interactive execution, SDKs, cost reporting, and enterprise governance.

Primary surfaces

Accepts as evidence

  • official changelog
  • official docs
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • official blog or developer post
  • package registry release
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary
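The two lists above can be read as a gate on evidence kinds. The sketch below is one possible reading, not the loop's actual code: the function name and the "hold" outcome for unlisted kinds are assumptions.

```python
# Evidence kinds copied from the Codex card above.
ACCEPTS = {
    "official changelog", "official docs", "github release", "tagged release",
    "maintainer commit", "merged pr", "official blog or developer post",
    "package registry release", "reproducible local probe",
}
REFUSES = {
    "unsourced social claim", "third party summary without primary link",
    "speculation", "stale model memory", "benchmark claim without method",
    "duplicate commentary",
}


def gate(evidence_kind: str) -> str:
    """Return 'accept', 'refuse', or 'hold' for a labeled piece of evidence."""
    kind = evidence_kind.strip().lower()
    if kind in REFUSES:
        return "refuse"
    if kind in ACCEPTS:
        return "accept"
    # Assumption: kinds on neither list are held for review, not promoted.
    return "hold"
```

Checking the refuse list first means a kind that somehow appeared on both lists would still be blocked.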

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • pricing or usage change: observe
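The Default actionability pairs above map a change type to a default action. A minimal lookup sketch follows; the function name and the "observe" fallback for unlisted change types are assumptions.

```python
# Change type -> default action, copied from the Codex card above.
DEFAULT_ACTIONABILITY = {
    "release": "test",
    "docs change": "observe",
    "security change": "test",
    "breaking change": "adapt",
    "ecosystem package": "observe",
    "pricing or usage change": "observe",
}


def default_action(change_type: str, fallback: str = "observe") -> str:
    """Look up the default action for a change type (fallback is an assumption)."""
    return DEFAULT_ACTIONABILITY.get(change_type.strip().lower(), fallback)
```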

High-signal patterns

goal · long-horizon · subagent · memory · workflow · sandbox · approval · permission · command · hook · MCP · plugin · skill · AGENTS.md · local environment · browser · computer use · automation · non-interactive · SDK · cost · usage
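One way such patterns could be applied is a case-insensitive scan of a changelog entry. This is a sketch under assumptions: the page does not say how the loop matches patterns, and the optional trailing "s" is an illustrative choice so that "subagents" still counts.

```python
import re

# A subset of the Codex patterns listed above.
PATTERNS = [
    "goal", "long-horizon", "subagent", "sandbox", "approval",
    "MCP", "AGENTS.md", "computer use", "non-interactive",
]


def matched_patterns(text: str, patterns: list[str] = PATTERNS) -> list[str]:
    """Return the patterns found in text, in the order they are listed."""
    hits = []
    for pattern in patterns:
        # Match the pattern as a whole token, allowing a plural "s".
        rx = re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(pattern) + r"s?(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        if rx.search(text):
            hits.append(pattern)
    return hits
```

The token boundaries keep "goal" from firing on words like "goalposts", at the cost of missing other inflections.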

Discovery state

last verified: 2026-05-06 · manual web · high confidence

  • Which GitHub releases, tags, and npm package versions should be treated as canonical when they disagree with the official Codex changelog?
  • Which provider-native long-horizon features should Bitter explicitly detect through local probes rather than relying on release notes?
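The "last verified" line in each Discovery state block has a regular shape: date, method, confidence. A small parsing sketch, assuming the three parts are always present and separated by "·" (the function name and dictionary keys are hypothetical):

```python
from datetime import date


def parse_discovery_state(line: str) -> dict:
    """Split 'last verified: <date> · <method> · <level> confidence' into fields."""
    head, method, confidence = [part.strip() for part in line.split("·")]
    label, _, iso_day = head.partition(":")
    if label.strip() != "last verified":
        raise ValueError(f"unexpected label: {label!r}")
    return {
        "last_verified": date.fromisoformat(iso_day.strip()),
        "method": method,
        "confidence": confidence.removesuffix(" confidence"),
    }
```

`str.removesuffix` needs Python 3.9 or later.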

claude-code · active · tier 1 · daily

Claude Code · Anthropic

Watch Claude Code as a fast-moving provider-native coding environment with strong session, hook, plugin, skill, permission, and enterprise surfaces. Its changelog is granular; promote findings only when they change how developers should run it, trust it, review its output, or wrap it inside a longer-lived project workflow.

Primary surfaces

Accepts as evidence

  • official changelog
  • official docs
  • official whats new
  • package registry release
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • pricing or usage change: observe

High-signal patterns

recap · resume · rewind · plan · subagent · task · hook · permission · managed setting · plugin · skill · slash command · MCP · SDK · headless · telemetry · prompt caching · usage · model picker · enterprise

Discovery state

last verified: 2026-05-06 · manual web · high confidence

  • Which GitHub source backing the published changelog should be captured directly in addition to the rendered official docs?
  • Which Claude Code behaviors should be probed locally because the changelog is too granular to imply operator impact by itself?

gemini-cli · active · tier 1 · daily

Gemini CLI · Google

Watch Gemini CLI as a large open-source terminal agent with rapid release channels, explicit context-file behavior, tool and extension surfaces, checkpointing, sandboxing, IDE/GitHub integrations, and Google account or Vertex/enterprise authentication paths. Separate stable operator guidance from preview/nightly churn.
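Separating stable guidance from preview/nightly churn could start with classifying release tags by channel. The sketch below assumes semver-style tags whose pre-release label begins with "preview" or "nightly". The example tags are hypothetical; the real Gemini CLI tag scheme should be confirmed against its releases.

```python
def release_channel(tag: str) -> str:
    """Classify a version tag as stable, preview, nightly, or other-prerelease."""
    version = tag.lstrip("v")
    # Everything after the first "-" is treated as the pre-release label.
    _, _, prerelease = version.partition("-")
    prerelease = prerelease.lower()
    if not prerelease:
        return "stable"
    if prerelease.startswith("nightly"):
        return "nightly"
    if prerelease.startswith("preview"):
        return "preview"
    return "other-prerelease"
```

With a classifier like this, only "stable" tags would feed operator guidance, while the other channels could be routed to a separate queue.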

Primary surfaces

Accepts as evidence

  • official docs
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • security advisory
  • package registry release
  • official google post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • pricing or usage change: observe

High-signal patterns

checkpoint · resume · context file · GEMINI.md · tool call · shell · web fetch · search grounding · MCP · extension · sandbox · trusted folder · permission · IDE · GitHub Action · output format · stream-json · authentication · enterprise · telemetry · preview channel · security

Discovery state

last verified: 2026-05-06 · manual web · high confidence

  • Should nightly and preview releases be harvested into findings or only used for adapter-probe canaries?
  • Which security advisories should be treated as direct signals even when they do not change public docs?

hermes-agent · active · tier 1 · daily

Hermes Agent · Nous Research

Watch Hermes Agent as a broad self-improving agent platform, not just as a coding CLI. Pay special attention to memory, skills, automations, messaging surfaces, subagents, sandboxing, runtime portability, and research trajectory generation. Bitter's opening is the project workflow around tools like this: permissions, evidence, review, memory, and what the next run should know.

Primary surfaces

Accepts as evidence

  • official docs
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • pricing or usage change: observe

High-signal patterns

memory · skill · self-improvement · subagent · delegate · toolset · terminal backend · sandbox · container · SSH · Modal · Daytona · cron · messaging gateway · Telegram · Discord · Slack · MCP · context file · SOUL.md · llms.txt · trajectory · RL

Discovery state

last verified: 2026-05-06 · manual web · high confidence

  • Which docs domain should be considered canonical if GitHub README links and deployed docs diverge?
  • Which social or Discord announcements are maintainer-authored enough to include, and how should they be cited?

pi-coding-agent · active · tier 1 · daily

Pi Coding Agent · earendil-works / Mario Zechner

Watch Pi as a minimal, extensible terminal coding harness. It is important partly because of what it chooses not to include by default: subagents, plan mode, permission popups, MCP, and other governance features. That deliberate minimalism clarifies Bitter's wedge as the project workflow around coding agents: durable goals, permissions, evidence, verification, and memory.

Primary surfaces

Accepts as evidence

  • official docs
  • official site
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • package registry release
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • pricing or usage change: observe

High-signal patterns

extension · skill · package · prompt template · theme · session tree · branch · share · export · AGENTS.md · SYSTEM.md · compaction · dynamic context · RPC · SDK · json mode · provider · login · permission · sandbox · MCP · subagent · plan mode

Discovery state

last verified: 2026-05-12 · manual web · high confidence

  • Which package-registry or package-index surface should be watched for Pi extension ecosystem movement?

openclaw · active · tier 1 · daily

OpenClaw · OpenClaw

Watch OpenClaw as the accessibility calibration source for the agentic harness frontier. Its most important lesson may be product posture: making autonomous agent work feel reachable to everyday people. Pay special attention to onboarding, gateway surfaces, familiar channels, visual state, permissions, and any design move that hides setup complexity without hiding authority.

Primary surfaces

Accepts as evidence

  • official docs
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary
  • seo clone or mirror

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • accessibility change: study

Research lenses

  • accessibility
  • distribution surface
  • everyday use
  • gateway
  • authority visibility

High-signal patterns

onboarding · setup · gateway · visual surface · desktop · mobile · channel · notification · remote access · everyday user · natural language workflow · permission · approval · visibility · handoff · plugin · skill · daemon · background agent · long-running task · memory

Discovery state

last verified: 2026-05-07 · manual web · medium confidence

  • Which OpenClaw release surface should be treated as canonical if docs and GitHub move at different speeds?
  • Which user-facing gateway surfaces are official product posture rather than experimental examples?
  • Which security and authority boundaries are visible enough for everyday users to understand?

paperclip · active · tier 1 · daily

Paperclip · Paperclip

Watch Paperclip as the coordination and economic-control-plane source. Its relevance to Bitter is the Factory question: can agent work be organized into goals, roles, budgets, accountability, approvals, and operating state without becoming theater?

Primary surfaces

Accepts as evidence

  • official docs
  • official site
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • governance change: study

Research lenses

  • coordination control plane
  • economic governance
  • accountability
  • multi agent operations
  • factory analogue

High-signal patterns

company · org chart · goal · budget · role · manager · employee · approval · governance · accountability · cost · task queue · progress · audit · session · dashboard · multi-agent · agent team

Discovery state

last verified: 2026-05-07 · manual web · medium confidence

  • Which source is canonical for product changes if the public site, docs, and GitHub repository diverge?
  • How much of the company/control-plane metaphor is backed by durable operating state versus UI framing?
  • Which governance and budget primitives are enforceable rather than descriptive?

agent-zero · active · tier 1 · daily

Agent Zero · agent0ai

Watch Agent Zero as the workcell-autonomy source. Its relevance to Bitter is the Grid question: what happens when an agent gets a real computer environment, can use terminal/browser/files, and can grow tools or subagents inside that environment? Pay special attention to isolation, persistence, cleanup, visibility, and whether power remains governable.

Primary surfaces

Accepts as evidence

  • official docs
  • official site
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • runtime change: test

Research lenses

  • workcell autonomy
  • computer use
  • runtime isolation
  • tool creation
  • visible autonomy

High-signal patterns

Linux · terminal · file system · browser · code execution · Docker · container · tool creation · plugin · custom tool · subagent · memory · task · project · remote access · UI · safety · sandbox · persistence · cleanup

Discovery state

last verified: 2026-05-07 · manual web · high confidence

  • Which release or docs surface best describes the current runtime isolation model?
  • Which parts of Agent Zero's tool creation are safe to compare against Bitter-owned tool and receipt boundaries?
  • What should Bitter test locally versus only study as product posture?

openhands · active · tier 1 · daily

OpenHands · OpenHands

Watch OpenHands as the productized software-agent platform source. Its relevance to Bitter is breadth: SDK, CLI, GUI, cloud, enterprise, integrations, sandboxing, collaboration, and evaluation in one system. Study what a full platform makes easier, and where Bitter should stay a wrapper/control layer instead of becoming the whole platform.

Primary surfaces

Accepts as evidence

  • official docs
  • official site
  • github release
  • tagged release
  • maintainer commit
  • merged pr
  • security advisory
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary

Default actionability

  • release: test
  • docs change: observe
  • security change: test
  • breaking change: adapt
  • ecosystem package: observe
  • enterprise change: study

Research lenses

  • productized agent platform
  • sandboxed development
  • cli gui cloud surface
  • enterprise governance
  • evaluation

High-signal patterns

SDK · CLI · GUI · cloud · enterprise · self-hosting · sandbox · runtime · browser · evaluation · benchmark · security · RBAC · permission · collaboration · Slack · Jira · Linear · GitHub · extension · integration · multi-user

Discovery state

last verified: 2026-05-07 · manual web · high confidence

  • Which OpenHands surfaces should be treated as one product versus separate SDK, CLI, cloud, and enterprise sources?
  • Which evaluation and sandboxing claims can be probed locally?
  • Which integrations change operator behavior enough to become signals?

flue · active · tier 2 · weekly

Flue · withastro

Watch Flue as the programmable harness / headless agent calibration source. Its core framing, "Agent = Model + Harness", explicitly separates the model from the harness, filesystem, sandbox, skills, memory, sessions, and deployment surface. That separation directly validates Bitter's thesis that the valuable layer is the shaped environment around the model, not just the model call itself. Treat it as category evidence and a possible integration reference, not stable infrastructure. Its APIs are self-described as experimental; monitor direction before treating any primitive as architectural precedent. Bitter is the operating loop / receipt layer / local actuation membrane, while Flue is an agent framework: they are adjacent, not the same thing.

Primary surfaces

Accepts as evidence

  • github commit
  • github release
  • tagged release
  • merged pr
  • readme change
  • official docs
  • maintainer authored post
  • reproducible local probe

Refuses to promote

  • unsourced social claim
  • third party summary without primary link
  • speculation
  • stale model memory
  • benchmark claim without method
  • duplicate commentary
  • seo clone or mirror

Default actionability

  • release: observe
  • docs change: observe
  • api change: study
  • breaking change: note
  • ecosystem package: observe
  • philosophy change: study
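Flue's actionability pairs differ from the tier 1 cards in several places. A small diff sketch makes the divergence explicit by comparing the Codex table with the Flue table; the function name and the shape of the result are illustrative assumptions.

```python
# Both tables are copied from the cards on this page.
CODEX = {
    "release": "test",
    "docs change": "observe",
    "security change": "test",
    "breaking change": "adapt",
    "ecosystem package": "observe",
    "pricing or usage change": "observe",
}
FLUE = {
    "release": "observe",
    "docs change": "observe",
    "api change": "study",
    "breaking change": "note",
    "ecosystem package": "observe",
    "philosophy change": "study",
}


def diff_actionability(baseline: dict, other: dict) -> dict:
    """Report which change types were re-mapped, added, or dropped."""
    return {
        "changed": {
            key: (baseline[key], other[key])
            for key in baseline
            if key in other and baseline[key] != other[key]
        },
        "added": {key: other[key] for key in other if key not in baseline},
        "removed": sorted(key for key in baseline if key not in other),
    }
```

Read this way, the Flue card downgrades releases and breaking changes to watch-only actions, which is consistent with its "category evidence, not stable infrastructure" framing.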

Research lenses

  • agent harness architecture
  • model harness separation
  • programmable runtime
  • headless agent
  • sandbox design
  • skill primitives
  • session memory
  • ci deployment
  • filesystem abstraction

High-signal patterns

model + harness separation · programmable harness · headless agent · sandboxed execution · skill system · markdown skills · AGENTS.md · session management · memory · filesystem abstraction · HTTP server · CLI agent · CI/CD deployment · Cloudflare Workers · API change · breaking change · experimental

Discovery state

last verified: 2026-06-03 · harvest run · medium confidence

  • Confirm GitHub repo is github.com/withastro/flue (withastro org is unusual for an agent harness project — verify ownership).
  • Is the Apache-2.0 license confirmed in the repo?
  • What is the actual star count and commit velocity at time of first harvest?
  • Are APIs stable enough to treat individual primitives as architectural precedent, or watch-only for now?
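Each card header pairs a tier with a check cadence, for example `flue · active · tier 2 · weekly` versus the `tier 1 · daily` sources. The sketch below parses a header and decides whether a source is due for another look. The cadence-to-days mapping, the function names, and the `last_checked` input are all assumptions, not documented loop behavior.

```python
from datetime import date, timedelta

# Assumed interpretation of the cadence labels shown in the card headers.
CADENCE_DAYS = {"daily": 1, "weekly": 7}


def parse_header(line: str) -> dict:
    """Split '<id> · <status> · tier <n> · <cadence>' into fields."""
    source_id, status, tier, cadence = [part.strip() for part in line.split("·")]
    return {
        "id": source_id,
        "status": status,
        "tier": int(tier.removeprefix("tier ")),
        "cadence": cadence,
    }


def is_due(header: dict, last_checked: date, today: date) -> bool:
    """True when an active source has gone a full cadence interval unchecked."""
    if header["status"] != "active":
        return False
    interval = timedelta(days=CADENCE_DAYS[header["cadence"]])
    return today - last_checked >= interval
```

Passing `today` in explicitly keeps the check deterministic and easy to test.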