Finding · openhands
OpenHands: Sub-Agent Delegation and Critic Evaluation Surface
What Changed
OpenHands shipped two operator-visible additions in the May 7--12 window: opt-in sub-agent delegation and a critic evaluation display in the GUI.
Sub-agent delegation (enable_sub_agents user setting, default: False):
PR #14122 adds a TaskToolSet to the app server that routes tasks to built-in
sub-agents. Four sub-agents ship in-box: bash-runner, code-explorer,
general-purpose, and web-researcher. Custom sub-agents can be defined as
Markdown files under .agents/agents/*.md in the working directory. Sub-agents
inherit the parent agent's LLM configuration, with streaming disabled during
delegation. The feature is user-setting-gated: operators must enable it
explicitly. The default remains single-agent operation.
Critic evaluation GUI (PR #14133): CriticResult objects are now
surfaced in the GUI as a score (0--1) with a star rating (0--5) and color-coded
threshold bands: green at ≥60%, yellow at ≥40%, red below 40%. Categories
shown: agent_behavioral_issues, user_followup_patterns,
infrastructure_issues. The critic display is deployment-controlled via
OH_ENABLE_CRITIC_BY_DEFAULT; disabled by default unless explicitly enabled in
the deployment. Operators can also toggle it per-deployment via
verification.critic_enabled = false in config.
CRITIC_API_KEY allows centralized cost routing for the critic endpoint
separately from the primary model key.
v1.7.0 baseline context: the May 1 stable release (within this profile
window) shipped KVM sandbox acceleration (SANDBOX_KVM_ENABLED flag for
lower-latency container startup), exposed the SDK settings schema, moved
Tavily search to MCP settings, and patched several CVEs. These are noted
here as baseline context; the primary signals are the delegation and critic
additions above.
Operator Consequence
Sub-agent delegation is meaningful for operators running long-horizon or multi-task sessions: a single invocation can now spawn scoped sub-agents for bash execution, code exploration, web research, and general-purpose tasks. The opt-in gate (default off) is correct for a feature this consequential -- an operator choosing to enable it should understand what sub-agent delegation means for session scope, cost, and authority surface before enabling.
The critic GUI makes evaluation state visible to operators and users without
requiring separate tooling. A score display in the interface is a different
operator posture than evaluation that lives only in logs: it invites feedback
loops and surfaces degraded sessions in real time. Whether deployments enable it
is operator-controlled via OH_ENABLE_CRITIC_BY_DEFAULT.
Bitter Implication
Sub-agent delegation at the platform level (not the LLM level) is the direction
Bitter should study, not just note. OpenHands is building a routing layer where
the orchestrator assigns work to specialized sub-agents with constrained tool
surfaces. The bash-runner sub-agent doesn't need web access; the
web-researcher doesn't need bash. This is authority surface reduction as a
product pattern.
The critic evaluation surface belongs in the same category as OpenClaw's per-agent message restrictions: explicit, operator-configurable, and visible. Bitter should ask whether its own evaluation posture is visible to operators in the same way -- or whether evaluation state is opaque by default.
Signal
Sub-agent delegation and critic evaluation are both action-bearing:
- Operators running multi-task sessions should test
enable_sub_agentsand evaluate whether built-in sub-agents reduce session length or improve task routing. - Operators managing cost should configure
CRITIC_API_KEYto separate critic spend from primary model spend. - Deployments that want to enable the critic display should set
OH_ENABLE_CRITIC_BY_DEFAULT; operators can also toggle per-deployment viaverification.critic_enabled.
Finding metadata
Run: 2026-05-12-partial-cycle-openhands-2026-05-07_2026-05-12-frontier-v0
Finding ID: 2026-05-12-openhands-subagent-delegation-and-critic-evaluation
Accepted signals
Source links
Primary links, including exact changelog lines when available.