Sources · Bitter Frontier

codex · active · tier 1 · daily

Codex · OpenAI

Watch Codex as provider-native frontier capability, not just as an open-source CLI. Pay special attention to features that change Bitter's wrapper posture: long-horizon work, goals, subagents, workflows, sandboxing, permissions, AGENTS.md behavior, skills, plugins, MCP, browser/computer-use surfaces, non-interactive execution, SDKs, cost reporting, and enterprise governance.

Primary surfaces

homepage · https://developers.openai.com/codex/

changelog · official changelog · https://developers.openai.com/codex/changelog
watch: releases · new features · improvements · bug fixes · breaking changes

docs · official docs · https://developers.openai.com/codex/
watch: cli · app · ide extension · web · workflows · subagents · sandboxing · memories · commands · agents md · mcp · plugins · skills · authentication · approvals · security · governance · automation

repo · github repo · https://github.com/openai/codex
watch: releases · tags · commits · pull requests · issues · docs · examples · security

npm · package registry · https://www.npmjs.com/package/@openai/codex
watch: package version · publication date · install surface

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
pricing or usage change: observe
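
The defaults above read naturally as a lookup from change kind to action. A minimal sketch in Python: the table values are copied from the list, while the names `CODEX_DEFAULT_ACTIONS` and `default_action` and the `observe` fallback for unlisted kinds are illustrative assumptions, not something this registry defines.

```python
# Default actionability for Codex, copied from the list above.
# The dict name, the helper, and the "observe" fallback are assumptions.
CODEX_DEFAULT_ACTIONS = {
    "release": "test",
    "docs change": "observe",
    "security change": "test",
    "breaking change": "adapt",
    "ecosystem package": "observe",
    "pricing or usage change": "observe",
}


def default_action(change_kind: str, fallback: str = "observe") -> str:
    """Return the default action for a change kind, ignoring case and padding."""
    return CODEX_DEFAULT_ACTIONS.get(change_kind.strip().lower(), fallback)
```

For example, `default_action("Breaking change")` returns `"adapt"`, and a kind that is not in the table falls back to `"observe"`.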

High-signal patterns

goal · long-horizon · subagent · memory · workflow · sandbox · approval · permission · command · hook · MCP · plugin · skill · AGENTS.md · local environment · browser · computer use · automation · non-interactive · SDK · cost · usage
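
One way to apply these patterns is a case-insensitive whole-term match against changelog text. The sketch below is a hypothetical helper, not Bitter's actual matcher: it uses a subset of the patterns above, and the optional trailing "s" for plurals is an assumption about how loose matching should be.

```python
import re

# A subset of the Codex high-signal patterns listed above.
HIGH_SIGNAL_PATTERNS = [
    "goal", "long-horizon", "subagent", "sandbox", "approval",
    "permission", "MCP", "AGENTS.md", "non-interactive", "SDK",
]


def matched_patterns(text: str, patterns=HIGH_SIGNAL_PATTERNS) -> list[str]:
    """Return the patterns that occur in text as whole terms, in list order."""
    hits = []
    for pattern in patterns:
        # The lookarounds stop "goal" from matching inside "goalpost";
        # the optional "s" still catches simple plurals such as "subagents".
        regex = rf"(?<!\w){re.escape(pattern)}s?(?!\w)"
        if re.search(regex, text, flags=re.IGNORECASE):
            hits.append(pattern)
    return hits
```

A changelog line such as "Adds subagents and tighter sandbox permissions for MCP servers" would match `subagent`, `sandbox`, `permission`, and `MCP`.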

Discovery state

last verified: 2026-05-06 · manual web · high confidence

Which GitHub releases, tags, and npm package versions should be treated as canonical when they disagree with the official Codex changelog?
Which provider-native long-horizon features should Bitter explicitly detect through local probes rather than relying on release notes?
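
Each entry in this registry opens with a header of the form `id · status · tier N · cadence`, for example `codex · active · tier 1 · daily`. A minimal parsing sketch follows. The `SourceHeader` type and `parse_header` function are hypothetical names introduced for illustration, and the four-field layout is inferred from the headers on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceHeader:
    source_id: str
    status: str
    tier: int
    cadence: str


def parse_header(line: str) -> SourceHeader:
    """Parse a header such as 'codex · active · tier 1 · daily'."""
    parts = [part.strip() for part in line.split("·")]
    if len(parts) != 4 or not parts[2].startswith("tier "):
        raise ValueError(f"unrecognized source header: {line!r}")
    source_id, status, tier_text, cadence = parts
    # "tier 1" -> 1; int() raises ValueError if the tier is not a number.
    return SourceHeader(source_id, status, int(tier_text[len("tier "):]), cadence)
```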

claude-code · active · tier 1 · daily

Claude Code · Anthropic

Watch Claude Code as a fast-moving provider-native coding environment with strong session, hook, plugin, skill, permission, and enterprise surfaces. Its changelog is granular; promote findings only when they change how developers should run it, trust it, review its output, or wrap it inside a longer-lived project workflow.

Primary surfaces

homepage · https://code.claude.com/docs/en/overview

changelog · official changelog · https://code.claude.com/docs/en/changelog
watch: releases · new features · improvements · bug fixes · breaking changes

whats-new · official digest · https://code.claude.com/docs/en/whats-new
watch: notable features · examples · operator guidance · weekly rollups

docs · official docs · https://code.claude.com/docs/en/overview
watch: memory · hooks · slash commands · plugins · skills · subagents · permissions · settings · sandboxing · mcp · sdk · headless · telemetry · enterprise

package · package registry · https://www.npmjs.com/package/@anthropic-ai/claude-code
watch: package version · publication date · install surface

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
pricing or usage change: observe

High-signal patterns

recap · resume · rewind · plan · subagent · task · hook · permission · managed setting · plugin · skill · slash command · MCP · SDK · headless · telemetry · prompt caching · usage · model picker · enterprise

Discovery state

last verified: 2026-05-06 · manual web · high confidence

Which GitHub source backing the published changelog should be captured directly in addition to the rendered official docs?
Which Claude Code behaviors should be probed locally because the changelog is too granular to imply operator impact by itself?

gemini-cli · active · tier 1 · daily

Gemini CLI · Google

Watch Gemini CLI as a large open-source terminal agent with rapid release channels, explicit context-file behavior, tool and extension surfaces, checkpointing, sandboxing, IDE/GitHub integrations, and Google account or Vertex/enterprise authentication paths. Separate stable operator guidance from preview/nightly churn.
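
Separating stable guidance from preview and nightly churn could start with classifying release tags by channel. The sketch below assumes that prerelease tags contain the substring `preview` or `nightly` and that everything else is stable. That naming convention is an assumption to verify against the releases page. The function names are hypothetical, and filtering to stable tags is only one possible policy.

```python
def release_channel(tag: str) -> str:
    """Classify a release tag as 'nightly', 'preview', or 'stable'.

    Assumes prerelease tags contain the channel name; any other tag is
    treated as stable. Check this against real tags before relying on it.
    """
    lowered = tag.lower()
    for channel in ("nightly", "preview"):
        if channel in lowered:
            return channel
    return "stable"


def stable_only(tags: list[str]) -> list[str]:
    """Keep only the tags classified as stable, preserving input order."""
    return [tag for tag in tags if release_channel(tag) == "stable"]
```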

Primary surfaces

homepage · https://github.com/google-gemini/gemini-cli

repo · github repo · https://github.com/google-gemini/gemini-cli
watch: releases · tags · commits · pull requests · issues · discussions · docs · roadmap · security

releases · github releases · https://github.com/google-gemini/gemini-cli/releases
watch: stable · preview · nightly · breaking changes · security

docs · official docs · https://google-gemini.github.io/gemini-cli/docs/
watch: changelog · installation · authentication · configuration · commands · context files · checkpointing · tools · mcp · extensions · headless · ide · sandboxing · trusted folders · enterprise · telemetry

npm · package registry · https://www.npmjs.com/package/@google/gemini-cli
watch: package version · publication date · dist tags

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
pricing or usage change: observe

High-signal patterns

checkpoint · resume · context file · GEMINI.md · tool call · shell · web fetch · search grounding · MCP · extension · sandbox · trusted folder · permission · IDE · GitHub Action · output format · stream-json · authentication · enterprise · telemetry · preview channel · security

Discovery state

last verified: 2026-05-06 · manual web · high confidence

Should nightly and preview releases be harvested into findings or only used for adapter-probe canaries?
Which security advisories should be treated as direct signals even when they do not change public docs?

hermes-agent · active · tier 1 · daily

Hermes Agent · Nous Research

Watch Hermes as a broad self-improving agent platform, not just as a coding CLI. Pay special attention to memory, skills, automations, messaging surfaces, subagents, sandboxing, runtime portability, and research trajectory generation. Bitter's opening is the project workflow around tools like this: permissions, evidence, review, memory, and what the next run should know.

Primary surfaces

homepage · https://hermes-agent.nousresearch.com/docs

repo · github repo · https://github.com/NousResearch/hermes-agent
watch: releases · tags · commits · pull requests · issues · docs · examples · security

releases · github releases · https://github.com/NousResearch/hermes-agent/releases
watch: release notes · breaking changes · migration notes · security

docs · official docs · https://hermes-agent.nousresearch.com/docs
watch: installation · configuration · tools · toolsets · memory · skills · mcp · messaging · cron · security · terminal backends · architecture · context files · llms txt

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
pricing or usage change: observe

High-signal patterns

memory · skill · self-improvement · subagent · delegate · toolset · terminal backend · sandbox · container · SSH · Modal · Daytona · cron · messaging gateway · Telegram · Discord · Slack · MCP · context file · SOUL.md · llms.txt · trajectory · RL

Discovery state

last verified: 2026-05-06 · manual web · high confidence

Which docs domain should be considered canonical if GitHub README links and deployed docs diverge?
Which social or Discord announcements are maintainer-authored enough to include, and how should they be cited?

pi-coding-agent · active · tier 1 · daily

Pi Coding Agent · earendil-works / Mario Zechner

Watch Pi as a minimal, extensible terminal coding harness. It is important partly because of what it chooses not to include by default: subagents, plan mode, permission popups, MCP, and other governance features. That deliberate minimalism clarifies Bitter's wedge as the project workflow around coding agents: durable goals, permissions, evidence, verification, and memory.

Primary surfaces

homepage · https://pi.dev/

site · official site · https://pi.dev/
watch: positioning · installation · modes · providers · design principles · package ecosystem

docs · official docs · https://pi.dev/docs/latest
watch: quickstart · usage · sessions · context files · system prompt files · compaction · skills · extensions · prompt templates · themes · packages · rpc · sdk · providers · settings

repo · github repo · https://github.com/earendil-works/pi
watch: releases · tags · commits · pull requests · issues · packages coding agent · docs · examples

npm · package registry · https://www.npmjs.com/package/@mariozechner/pi-coding-agent
watch: package version · publication date · dist tags

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
pricing or usage change: observe

High-signal patterns

extension · skill · package · prompt template · theme · session tree · branch · share · export · AGENTS.md · SYSTEM.md · compaction · dynamic context · RPC · SDK · json mode · provider · login · permission · sandbox · MCP · subagent · plan mode

Discovery state

last verified: 2026-05-12 · manual web · high confidence

Which package-registry or package-index surface should be watched for Pi extension ecosystem movement?

openclaw · active · tier 1 · daily

OpenClaw · OpenClaw

Watch OpenClaw as the accessibility calibration source for the agentic harness frontier. Its most important lesson may be product posture: making autonomous agent work feel reachable to everyday people. Pay special attention to onboarding, gateway surfaces, familiar channels, visual state, permissions, and any design move that hides setup complexity without hiding authority.

Primary surfaces

homepage · https://docs.openclaw.ai/

repo · github repo · https://github.com/openclaw/openclaw
watch: releases · tags · commits · pull requests · issues · docs · examples · security

docs · official docs · https://docs.openclaw.ai/
watch: getting started · gateway · installation · configuration · channels · plugins · skills · permissions · memory · remote access · security · mobile or desktop surfaces

getting-started · official docs · https://docs.openclaw.ai/start/getting-started
watch: onboarding · setup steps · first run · user workflow

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
accessibility change: study

Research lenses

High-signal patterns

onboarding · setup · gateway · visual surface · desktop · mobile · channel · notification · remote access · everyday user · natural language workflow · permission · approval · visibility · handoff · plugin · skill · daemon · background agent · long-running task · memory

Discovery state

last verified: 2026-05-07 · manual web · medium confidence

Which OpenClaw release surface should be treated as canonical if docs and GitHub move at different speeds?
Which user-facing gateway surfaces are official product posture rather than experimental examples?
Which security and authority boundaries are visible enough for everyday users to understand?

paperclip · active · tier 1 · daily

Paperclip · Paperclip

Watch Paperclip as the coordination and economic-control-plane source. Its relevance to Bitter is the Factory question: can agent work be organized into goals, roles, budgets, accountability, approvals, and operating state without becoming theater?

Primary surfaces

homepage · https://paperclip.ing/

site · official site · https://paperclip.ing/
watch: positioning · onboarding · governance · company model · pricing · demos

docs · official docs · https://docs.paperclip.ing/
watch: setup · goals · agents · teams · governance · budgets · accountability · approvals · integrations · security

repo · github repo · https://github.com/paperclipai/paperclip
watch: releases · tags · commits · pull requests · issues · docs · examples · security

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
governance change: study

Research lenses

High-signal patterns

company · org chart · goal · budget · role · manager · employee · approval · governance · accountability · cost · task queue · progress · audit · session · dashboard · multi-agent · agent team

Discovery state

last verified: 2026-05-07 · manual web · medium confidence

Which source is canonical for product changes if the public site, docs, and GitHub repository diverge?
How much of the company/control-plane metaphor is backed by durable operating state versus UI framing?
Which governance and budget primitives are enforceable rather than descriptive?

agent-zero · active · tier 1 · daily

Agent Zero · agent0ai

Watch Agent Zero as the workcell-autonomy source. Its relevance to Bitter is the Grid question: what happens when an agent gets a real computer environment, can use terminal/browser/files, and can grow tools or subagents inside that environment? Pay special attention to isolation, persistence, cleanup, visibility, and whether power remains governable.

Primary surfaces

homepage · https://www.agent-zero.ai/

site · official site · https://www.agent-zero.ai/
watch: positioning · installation · ui · features · pricing · deployment

docs · official docs · https://www.agent-zero.ai/p/docs/
watch: installation · configuration · tools · code execution · browser · memory · subagents · custom tools · docker · security · remote access

repo · github repo · https://github.com/agent0ai/agent-zero
watch: releases · tags · commits · pull requests · issues · docs · examples · security

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
runtime change: test

Research lenses

High-signal patterns

Linux · terminal · file system · browser · code execution · Docker · container · tool creation · plugin · custom tool · subagent · memory · task · project · remote access · UI · safety · sandbox · persistence · cleanup

Discovery state

last verified: 2026-05-07 · manual web · high confidence

Which release or docs surface best describes the current runtime isolation model?
Which parts of Agent Zero's tool creation are safe to compare against Bitter-owned tool and receipt boundaries?
What should Bitter test locally versus only study as product posture?

openhands · active · tier 1 · daily

OpenHands · OpenHands

Watch OpenHands as the productized software-agent platform source. Its relevance to Bitter is breadth: SDK, CLI, GUI, cloud, enterprise, integrations, sandboxing, collaboration, and evaluation in one system. Study what a full platform makes easier, and where Bitter should stay a wrapper/control layer instead of becoming the whole platform.

Primary surfaces

homepage · https://openhands.dev/

site · official site · https://openhands.dev/
watch: positioning · cloud · enterprise · integrations · pricing · deployment

docs · official docs · https://docs.openhands.dev/
watch: installation · sdk · cli · gui · cloud · enterprise · integrations · sandboxing · security · evaluation · configuration · runtime

repo · github repo · https://github.com/OpenHands/OpenHands
watch: releases · tags · commits · pull requests · issues · docs · examples · security

releases · github releases · https://github.com/OpenHands/OpenHands/releases
watch: release notes · breaking changes · migration notes · security

Accepts as evidence

Refuses to promote

Default actionability

release: test
docs change: observe
security change: test
breaking change: adapt
ecosystem package: observe
enterprise change: study

Research lenses

High-signal patterns

SDK · CLI · GUI · cloud · enterprise · self-hosting · sandbox · runtime · browser · evaluation · benchmark · security · RBAC · permission · collaboration · Slack · Jira · Linear · GitHub · extension · integration · multi-user

Discovery state

last verified: 2026-05-07 · manual web · high confidence

Which OpenHands surfaces should be treated as one product versus separate SDK, CLI, cloud, and enterprise sources?
Which evaluation and sandboxing claims can be probed locally?
Which integrations change operator behavior enough to become signals?

flue · active · tier 2 · weekly

Flue · withastro

Watch Flue as the programmable-harness and headless-agent calibration source. Its core framing, "Agent = Model + Harness", explicitly separates the model from the harness, filesystem, sandbox, skills, memory, sessions, and deployment surface. That directly validates Bitter's thesis that the valuable layer is the shaped environment around the model, not just the model call itself. Treat Flue as category evidence and a possible integration reference, not as stable infrastructure: its APIs are self-described as experimental, so monitor direction before treating any primitive as architectural precedent. Bitter is the operating loop, receipt layer, and local actuation membrane; Flue is an agent framework. The two are adjacent, not the same thing.

Primary surfaces

homepage · https://flueframework.com/

repo · github repo · https://github.com/withastro/flue
watch: commits · releases · tags · pull requests · readme · changelog · examples · docs

changelog · changelog file · https://github.com/withastro/flue/blob/main/CHANGELOG.md
watch: versions · breaking changes · new features · fixes

homepage · official site · https://flueframework.com/
watch: framing · feature surface · deployment targets · skill system · sandbox api

Accepts as evidence

Refuses to promote

Default actionability

release: observe
docs change: observe
api change: study
breaking change: note
ecosystem package: observe
philosophy change: study
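
Flue's defaults differ from the Codex entry at the top of this registry: a release is observed rather than tested, and a breaking change is noted rather than adapted to. A small sketch of how two actionability tables could be compared follows. Both tables are copied from this page, `diff_actions` is a hypothetical helper, and `None` marks a change kind that one table does not list.

```python
# Both tables are copied from this registry; the names are illustrative.
CODEX_DEFAULTS = {
    "release": "test",
    "docs change": "observe",
    "security change": "test",
    "breaking change": "adapt",
    "ecosystem package": "observe",
    "pricing or usage change": "observe",
}
FLUE_DEFAULTS = {
    "release": "observe",
    "docs change": "observe",
    "api change": "study",
    "breaking change": "note",
    "ecosystem package": "observe",
    "philosophy change": "study",
}


def diff_actions(base: dict, other: dict) -> dict:
    """Map each change kind whose action differs to (base_action, other_action)."""
    kinds = sorted(set(base) | set(other))
    return {
        kind: (base.get(kind), other.get(kind))
        for kind in kinds
        if base.get(kind) != other.get(kind)
    }
```

Calling `diff_actions(CODEX_DEFAULTS, FLUE_DEFAULTS)` surfaces the changed actions for `release` and `breaking change`, along with the kinds that only one of the two tables lists.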

Research lenses

High-signal patterns

model + harness separation · programmable harness · headless agent · sandboxed execution · skill system · markdown skills · AGENTS.md · session management · memory · filesystem abstraction · HTTP server · CLI agent · CI/CD deployment · Cloudflare Workers · API change · breaking change · experimental

Discovery state

last verified: 2026-06-03 · harvest run · medium confidence

Confirm that the GitHub repo is github.com/withastro/flue. The withastro org is unusual for an agent harness project, so verify ownership.
Is the Apache-2.0 license confirmed in the repo?
What is the actual star count and commit velocity at time of first harvest?
Are APIs stable enough to treat individual primitives as architectural precedent, or watch-only for now?
