Multi-agent · Step 6 · How one model commands a crew
Cross-agent delegation via cli -p
A loop run does not have to be done by a single AI. One model — the orchestrator — can hand out work to a whole crew of other AI assistants, one small job at a time, by talking to each one through its command line. This lesson is about how that handoff actually works: who is on the crew, how the orchestrator checks an assistant is even available before trusting it, the exact commands it uses to call each one, and the one safety rule that makes the whole thing reliable.
1
The big idea: one model running a crew
So far the loop has had one worker doing every pass. But there are many capable AI assistants, and each is better at some things than others. The harness lets one of them act as a foreman — we call it the orchestrator — and hand out the actual work to the others, one small, well-defined job at a time.
The way the foreman talks to each assistant is delightfully ordinary: it runs the assistant's command-line program, the same kind of typed command you'd run in a terminal. Every one of these assistants ships a "headless" mode — a way to ask it a single question and get a single answer back, with no chat window, no clicking, just text in and text out. The foreman pipes a task in, reads the result, and moves on to the next job. That is the whole trick: delegation is just the orchestrator running another assistant's command and reading what comes back.
Three things have to be true for this to be safe, and they map onto three kinds of participant. A person decides, at the very start, which assistants are even allowed on the crew — that named list is called the roster. The orchestrator (itself an AI) checks each assistant is actually installed on this machine before trusting it, picks the right command to call it, and never lets the same assistant both build a thing and be the one to sign off on it. And the assistants themselves just do the one job they're handed and return the result. Nobody has to babysit the screen while this happens — that hands-off discipline came from the previous lesson.
Think of it like… a head chef during a dinner rush. The chef doesn't cook every plate personally. They call out one order at a time — "table four, the salmon" — to whichever cook owns that station, and they only call to stations that are actually staffed tonight. Crucially, the chef has a second person taste the dish before it leaves the kitchen — never the cook who made it, because the maker is the worst judge of their own work. Where the analogy bends: in our kitchen the chef is also one of the cooks, and the "calling out" is literally typing a command and reading the reply.
What "delegate via cli -p" means precisely
The orchestrator is a top-tier model driving the run. To delegate, it shells out to another agent's command-line interface in non-interactive mode — conventionally the -p ("prompt") flag or an exec subcommand — passing one bounded unit of work as the prompt and capturing stdout. There is no shared memory and no live session: each call is a fresh, stateless invocation, which is exactly what makes a unit bounded. The contract is "here is one job and the context it needs; return the result".
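Because each call is fresh and stateless, whatever a unit needs has to travel inside the prompt itself. A minimal sketch, assuming the context is a few files on disk; `build_unit_prompt` and its section layout are hypothetical, not the harness's own format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle one job plus its context into a single prompt,
# since a stateless call cannot see anything that is not passed in.
build_unit_prompt() {
  local job=$1
  shift
  local f
  printf 'TASK (one bounded unit):\n%s\n' "$job"
  for f in "$@"; do
    printf '\n[context: %s]\n' "$f"
    cat -- "$f"
  done
  printf '\nReturn only the result of this task.\n'
}
```

It would be used as, for example, `claude -p "$(build_unit_prompt "Add input validation" src/cli.py)" --output-format json`, where the task and the file path are placeholders.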
One unit at a time, on purpose
Delegation is serial per unit, not a free-for-all. The orchestrator hands out a single unit, reads the result, verifies it at the real boundary (the Proof Gate from lesson 3), and only then dispatches the next. That keeps every step auditable in LOOP-LOG.md and prevents two agents from racing on the same artifact. The roster can be heterogeneous — Claude, Codex, Kimi, Grok, GLM, Minimax — and the orchestrator is deliberately agnostic: it picks per unit, not once for the whole run.
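The serial rule reads naturally as a loop that refuses to move on until the current unit has passed. A minimal sketch: `pick_agent`, `dispatch_unit` and `proof_gate` are hypothetical hooks with placeholder bodies, standing in for real per-unit selection, a real `cli -p` call and a real boundary check; only the `LOOP-LOG.md` name comes from the lesson.

```shell
#!/usr/bin/env bash
# Sketch of serial, one-unit-at-a-time delegation.
LOG_FILE="${LOG_FILE:-LOOP-LOG.md}"

# Placeholder hooks (hypothetical): replace with real logic.
pick_agent()    { echo "claude"; }              # choose an agent per unit
dispatch_unit() { printf 'done: %s\n' "$2"; }   # args: agent, unit
proof_gate()    { [ -n "$1" ]; }                # real evidence check goes here

run_units() {
  local unit agent result
  for unit in "$@"; do
    agent=$(pick_agent "$unit")
    # The exit status of dispatch_unit propagates through the assignment.
    if ! result=$(dispatch_unit "$agent" "$unit"); then
      printf '%s\n' "- ERROR [$agent] $unit" >> "$LOG_FILE"
      return 1
    fi
    if ! proof_gate "$result"; then
      printf '%s\n' "- FAIL [$agent] $unit" >> "$LOG_FILE"
      return 1   # stop: the next unit is never dispatched
    fi
    printf '%s\n' "- PASS [$agent] $unit" >> "$LOG_FILE"
  done
}
```

Stopping on the first failed unit is a design choice of this sketch; it keeps the log a strict record of what was verified, in order.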
2
Delegation in one picture
Here is the whole handoff as a single flow. A person sets the roster; the orchestrator takes one unit, checks the chosen assistant is installed, runs its command, and reads the result back; then a different assistant validates it. If the assistant isn't installed, the orchestrator simply skips it — it never calls a command that isn't there.
Figure: the delegation flow, read left to right. The solid path is a successful delegation; the dashed red path is an agent that isn't installed, gated out before any command runs. Stages: person sets the roster · orchestrator picks per unit · preflight before trust · one unit at a time · validator ≠ builder.
3
The four layers, step by step
Delegation stacks up in four layers, and it's easiest to feel them by walking through them one at a time: a person naming the crew, the orchestrator's safety check, the actual command, and the independent sign-off.
loop-engineering · one bounded unit · orchestrator → installed agent
Hand one unit to the right assistant, safely
AFK · human on observability · 4 layers · roster: claude · codex · kimi · grok · glm · minimax
A person declares the crew — before the run starts.
The orchestrator does not get to invent its own helpers. At the very start of a run, a human writes down the Agents roster: the named set of assistants this run may delegate to. The default is agnostic — any capable assistant can be on it — but the list is explicit, so you always know who could touch your work.
claude (installed): Anthropic's CLI. Strong all-rounder; often the orchestrator itself.
codex (installed): OpenAI's coding agent, run headless with exec.
kimi · grok (installed): Additional builders, each with its own headless flags.
glm · minimax (installed): Reached through a local proxy (cliproxyapi) rather than a native CLI.
pi · agy (gated): On the roster by name, but not installed on this machine, so they must be gated, never called.
The orchestrator checks an assistant is really there — before it trusts it.
A roster is a wish-list; being on it doesn't mean the program exists on this computer. So before delegating, the orchestrator runs a tiny preflight: it asks the shell "does this command exist?" with command -v, or it runs a small detector script that fills in a PANELISTS= list of who's actually available. An assistant that fails the check — like pi or agy here — is gated: quietly dropped from the candidates for this unit. No command is ever run for an assistant that isn't installed.
preflight — does the agent exist on this box?
# cheapest possible check: is the binary on PATH?
command -v claude codex kimi grok   # prints the path of each that exists

# or let the harness build the available set for you
source detect_panel.sh
echo "$PANELISTS"   # e.g. "claude codex kimi grok glm minimax"
# pi / agy absent here → gated, never invoked
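The lesson does not show what `detect_panel.sh` contains, so the following is only a hypothetical sketch of a detector in that spirit. Native CLIs are checked with `command -v`; `glm` and `minimax` have no binary of their own, so they go through a placeholder `proxy_up` check that you would swap for whatever health check your cliproxyapi setup actually supports. The same roster works on every machine, and each host ends up with its own subset.

```shell
#!/usr/bin/env bash
# Hypothetical detector in the spirit of detect_panel.sh (the real script may differ).
ROSTER=(claude codex kimi grok glm minimax pi agy)

# Placeholder (assumption): treat the proxy as up when CLIPROXY_URL is set.
proxy_up() { [ -n "${CLIPROXY_URL:-}" ]; }

detect_panel() {
  local a
  local -a found=()
  for a in "${ROSTER[@]}"; do
    case $a in
      glm|minimax) proxy_up && found+=("$a") ;;
      *) command -v "$a" >/dev/null 2>&1 && found+=("$a") ;;
    esac
  done
  PANELISTS="${found[*]}"   # space-separated, like the lesson's example
}

detect_panel
echo "$PANELISTS"
```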
It runs the assistant's headless command and reads the answer.
Each assistant has its own exact, proven way to be called from a script. The flags matter: they force a single non-interactive answer, fix the output format so the orchestrator can parse it, and set the working directory. These are the invocations the harness actually uses — copy them verbatim.
the proven cli -p invocations (one bounded unit each)
# Claude — JSON so the result is machine-parseable
claude -p "<one bounded unit>" --output-format json
# Codex — exec subcommand, quiet for clean stdout
codex exec --quiet "<one bounded unit>"
# Kimi — plain text. NEVER combine -p with --yolo, and do NOT pass --work-dir
kimi -p "<one bounded unit>" --output-format text
# Grok — plain output, auto-approve tool calls, explicit working dir
grok -p "<one bounded unit>" --output-format plain --always-approve --cwd "$PWD"
# GLM / Minimax — no native CLI; routed through the local proxy
# (cliproxyapi exposes them on an OpenAI-compatible endpoint)
What the assistant does
Runs the one job in a fresh, stateless call and prints the result to stdout. No memory of other units.
What the orchestrator does
Builds the prompt, picks the command for the chosen agent, captures and parses the output.
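Put together, "pick the command for the chosen agent" is a small dispatch table. A minimal sketch that reuses the lesson's invocations verbatim and refuses to call a native CLI that fails preflight; the function name `delegate` and its exit codes (2 for an agent with no native invocation, 3 for a gated one) are hypothetical conventions, not the harness's.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: one bounded unit in, the agent's stdout out.
delegate() {
  local agent=$1 unit=$2
  case $agent in
    claude|codex|kimi|grok)
      # Preflight: never call a command that isn't there.
      if ! command -v "$agent" >/dev/null 2>&1; then
        echo "delegate: $agent is gated (not installed)" >&2
        return 3
      fi ;;
    glm|minimax)
      echo "delegate: $agent has no native CLI; route it via cliproxyapi" >&2
      return 2 ;;
    *)
      echo "delegate: no known headless invocation for '$agent'" >&2
      return 2 ;;
  esac

  # The proven invocations, copied verbatim from the lesson.
  case $agent in
    claude) claude -p "$unit" --output-format json ;;
    codex)  codex exec --quiet "$unit" ;;
    kimi)   kimi -p "$unit" --output-format text ;;  # no --yolo, no --work-dir
    grok)   grok -p "$unit" --output-format plain --always-approve --cwd "$PWD" ;;
  esac
}
```

Usage: `result=$(delegate codex "<one bounded unit>")`. The function's exit status is the agent's own, so the caller can tell a failed call from a gated one.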
A different assistant checks the work. Never the one that built it.
The rule that makes this trustworthy: the Validator is never the builder. Whoever produced the unit cannot be the one to sign it off — a maker is blind to its own mistakes and will happily call its own work correct. So validation is routed to a different agent from the roster, who checks the result against the real boundary (run the test, hit the endpoint, read the file).
This is why a heterogeneous roster is a feature, not just a convenience: with more than one assistant available, the orchestrator can always find a second pair of eyes that didn't write the code. The verdict still has to clear the Proof Gate — real evidence, never a claim — and it lands in LOOP-LOG.md for the human to audit later.
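Routing validation away from the author can be as simple as taking the first available agent whose name differs from the builder's. A minimal sketch, assuming the candidates are the space-separated `PANELISTS` list from preflight; `pick_validator` is a hypothetical helper, and a real orchestrator might rank candidates instead of taking the first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a validator that is not the builder.
pick_validator() {
  local builder=$1
  shift
  local a
  for a in "$@"; do
    if [ "$a" != "$builder" ]; then
      echo "$a"
      return 0
    fi
  done
  echo "pick_validator: no agent other than $builder is available" >&2
  return 1   # refuse rather than let the builder sign off
}

# Usage with the preflight result (PANELISTS is word-split on purpose):
# validator=$(pick_validator codex $PANELISTS)
```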
Why each flag is what it is
--output-format json (Claude) gives a structured envelope the orchestrator can parse without scraping prose. codex exec --quiet suppresses progress chatter so stdout is just the answer. kimi -p … --output-format text is the supported headless path — combining -p with --yolo is unsupported and --work-dir is rejected, so neither is used. grok -p … --output-format plain --always-approve --cwd "$PWD" runs non-interactively, auto-approves its own tool calls (there's no human to click), and pins the working directory explicitly.
GLM and Minimax have no native CLI
They are reached through cliproxyapi, a local process that exposes them on an OpenAI-compatible HTTP endpoint. The orchestrator calls that endpoint instead of a binary, but the contract is identical: one bounded unit in, one result out, validated by someone else.
Gating absent agents
pi and agy appear in the roster for portability, but on a box where command -v can't find them they are excluded from the candidate set for every unit. Gating is silent and per-machine: the same roster file works everywhere, and each host simply delegates to whatever subset is actually installed.
4
Preflight: never call a command that isn't there
This layer is worth its own beat because it's the difference between a robust crew and a fragile one. The orchestrator treats "is this assistant installed?" as a question to be answered by the machine, not assumed. If the answer is no, that assistant is gated — dropped from the running for this unit — and the work goes to someone who's actually present. It's the same anti-assumption habit from LEARN, applied to the crew itself.
Run the preflight by hand
From your project directory, ask the shell which agents exist. command -v prints the resolved path for each binary it finds and stays silent for the rest, so the ones that print are your installed candidates.
terminal — list which agents are installed right now
# run from anywhere; prints a line per agent
for a in claude codex kimi grok pi agy; do
  if command -v "$a" >/dev/null 2>&1; then
    echo "available: $a"
  else
    echo "gated: $a"   # not installed → never invoked
  fi
done
If the harness ships detect_panel.sh, sourcing it does the same sweep and leaves the result in PANELISTS. Either way, the orchestrator only ever delegates to a name that survived this check.
5
The proven invocations, side by side
Once an assistant has passed preflight, the orchestrator calls it with the one command known to work headless for that tool. You met these in layer 3 above; here they are collected as a reference, with the gotchas spelled out. The takeaway: each assistant is called differently, the flags are not interchangeable, and a couple of combinations are off-limits.
cli -p reference — copy verbatim, flags are load-bearing
claude -p "<unit>" --output-format json
codex exec --quiet "<unit>"
kimi -p "<unit>" --output-format text # never -p + --yolo; no --work-dir
grok -p "<unit>" --output-format plain --always-approve --cwd "$PWD"
# glm / minimax → via cliproxyapi (OpenAI-compatible endpoint, no native CLI)
Reading a result back
Because the calls are stateless, the orchestrator captures stdout per invocation and parses it according to the format it asked for — JSON for Claude, plain text for the others. There's no follow-up turn; if more is needed, that's a new bounded unit and a fresh call. This is what keeps each delegation independently auditable.
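Keeping each invocation's stdout in its own file is one simple way to make every delegation independently auditable. A minimal sketch; `capture_result`, the `RUNS_DIR` variable and the one-file-per-unit layout are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one stateless call and keep its stdout in its
# own file, so each delegation can be audited on its own.
capture_result() {
  local unit_id=$1 ext=$2
  shift 2
  local dir="${RUNS_DIR:-runs}"
  mkdir -p "$dir"
  local out="$dir/$unit_id.$ext"
  "$@" > "$out"     # one fresh invocation; no follow-up turn
  local rc=$?
  printf '%s\n' "$out"
  return "$rc"
}
```

For example, `f=$(capture_result unit-007 json claude -p "<unit>" --output-format json)`. Assuming `jq` is installed, `jq -r '.result' "$f"` should pull the answer text out of Claude's JSON envelope; confirm the field name against the output of your CLI version. Plain-text results can be read as they are.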
The two Kimi traps
Two specific mistakes break Kimi in headless mode: pairing -p with --yolo, and passing --work-dir. The supported form is exactly kimi -p "<unit>" --output-format text. The harness encodes that so a delegating orchestrator can't fall into either trap.
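One way to encode the two traps is a guard that inspects a Kimi argument list before it is ever executed. The lesson does not show how the harness does this, so `check_kimi_args` is a hypothetical sketch; it only rejects the two forbidden patterns and validates nothing else.

```shell
#!/usr/bin/env bash
# Hypothetical guard for the two Kimi traps: -p with --yolo, and --work-dir.
check_kimi_args() {
  local arg has_p=0 has_yolo=0
  for arg in "$@"; do
    case $arg in
      -p) has_p=1 ;;
      --yolo) has_yolo=1 ;;
      --work-dir|--work-dir=*)
        echo "kimi: --work-dir is rejected in headless mode" >&2
        return 1 ;;
    esac
  done
  if [ "$has_p" -eq 1 ] && [ "$has_yolo" -eq 1 ]; then
    echo "kimi: never combine -p with --yolo" >&2
    return 1
  fi
  return 0
}

# Usage: check the argv, then run it only if it passed.
# args=(-p "<unit>" --output-format text)
# check_kimi_args "${args[@]}" && kimi "${args[@]}"
```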
6
Why the Validator is never the builder
If you remember one rule from this lesson, make it this one. The assistant that produced a unit is the worst judge of whether it's correct — it's already convinced. So the harness never lets a builder sign off on its own work. Validation always goes to a different assistant from the roster, who checks the result against reality: run the test, call the endpoint, read the file. Only then does the unit count as done.
This is the deepest reason to keep a mixed crew. With several assistants available, there is always a second one who didn't write the code and can look at it fresh. The orchestrator routes the check away from the author automatically — and the verdict still has to be real evidence (the Proof Gate), recorded where the human can read it.
Top: a builder grading itself — blind to its own bugs. Bottom: a different agent validates against real evidence, then it's logged.
The one rule
Builder and validator are always two different agents. A maker cannot certify its own work — the second pair of eyes is the whole point of having a crew.
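The rule and the Proof Gate can be enforced together at the moment a verdict is recorded: refuse if the two names match, then let a real command's exit status, not anyone's say-so, decide the outcome. A minimal sketch; `record_verdict` and its log line format are hypothetical, the evidence command stands in for whatever check the validator runs, and only the `LOOP-LOG.md` name comes from the lesson.

```shell
#!/usr/bin/env bash
# Hypothetical: record a validation verdict, enforcing validator != builder.
LOG_FILE="${LOG_FILE:-LOOP-LOG.md}"

record_verdict() {
  local unit=$1 builder=$2 validator=$3
  shift 3   # the rest is the real check to run (a test, a request, a read)
  if [ "$builder" = "$validator" ]; then
    echo "record_verdict: $builder cannot validate its own work" >&2
    return 2
  fi
  local verdict=PASS rc=0
  "$@" >/dev/null 2>&1 || rc=$?
  [ "$rc" -eq 0 ] || verdict=FAIL
  printf '%s\n' "- $verdict $unit (built by $builder, validated by $validator, exit $rc)" >> "$LOG_FILE"
  return "$rc"
}
```

For example, `record_verdict unit-7 codex claude pytest -q` would log a PASS or FAIL line depending only on the test run's exit status.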
7
Who does what: a person, the orchestrator, the assistants
The whole mechanism comes down to three responsibilities, and keeping them straight is what makes a multi-agent run safe rather than chaotic. They are not three phases you do in order — they're three roles acting at once.
A person
Declares the Agents roster at the start of the run — the named set of assistants that may be delegated to. After that, they stay on observability (reading LOOP-LOG.md), not in the doing.
The orchestrator (an LLM)
Preflights each agent, picks the right cli -p command, hands out one bounded unit at a time, and routes validation so the Validator is never the builder.
The assistants (agents)
Each runs the single job it's given in a fresh, stateless call and returns the result. Ones not installed here — pi, agy — are gated and never called.
Why the split matters
The human owning the roster bounds who can touch the work; the orchestrator owning dispatch bounds how (one unit, right command, independent validation); the agents owning execution keep each call stateless and replaceable. Break any one — let the orchestrator invent helpers, or let a builder self-validate, or call an uninstalled binary — and the run loses the property that every step is auditable and reproducible.
8
Quick check
Three quick questions to make the rules stick.
Q1. Before delegating a unit to kimi, the orchestrator runs command -v kimi and gets nothing back. What happens?
Q2. Agent A just produced a unit. Who should validate it?
Q3. Which Kimi invocation is the supported headless one?
Your agent is your teacher. Want to see a real delegation, with your orchestrator preflighting the agents on your machine and handing one unit to whichever passed? Ask it to run the preflight and dispatch a single bounded unit, then validate it with a different agent. Next, in "fusion: a panel to judge", we keep more than one assistant in play at the same time and judge their answers.