Step 6 · Multi-agent · Loop Engineering
How one model commands a crew

Cross-agent delegation via cli -p

A loop run does not have to be done by a single AI. One model — the orchestrator — can hand out work to a whole crew of other AI assistants, one small job at a time, by talking to each one through its command line. This lesson is about how that handoff actually works: who is on the crew, how the orchestrator checks an assistant is even available before trusting it, the exact commands it uses to call each one, and the one safety rule that makes the whole thing reliable.

1

The big idea: one model running a crew


So far the loop has had one worker doing every pass. But there are many capable AI assistants, and each is better at some things than others. The harness lets one of them act as a foreman — we call it the orchestrator — and hand out the actual work to the others, one small, well-defined job at a time.

The way the foreman talks to each assistant is delightfully ordinary: it runs the assistant's command-line program, the same kind of typed command you'd run in a terminal. Each of these assistants can be run in a "headless" mode: you ask it a single question and get a single answer back, with no chat window and no clicking, just text in and text out. (The two without a command-line program of their own are reached through a small local proxy, but the exchange is the same.) The foreman hands a task in, reads the result, and moves on to the next job. That is the whole trick: delegation is just the orchestrator running another assistant's command and reading what comes back.

Three things have to be true for this to be safe, and they map onto three kinds of participant. A person decides, at the very start, which assistants are even allowed on the crew — that named list is called the roster. The orchestrator (itself an AI) checks each assistant is actually installed on this machine before trusting it, picks the right command to call it, and never lets the same assistant both build a thing and be the one to sign off on it. And the assistants themselves just do the one job they're handed and return the result. Nobody has to babysit the screen while this happens — that hands-off discipline came from the previous lesson.

Think of it like… a head chef during a dinner rush. The chef doesn't cook every plate personally. They call out one order at a time — "table four, the salmon" — to whichever cook owns that station, and they only call to stations that are actually staffed tonight. Crucially, the chef has a second person taste the dish before it leaves the kitchen — never the cook who made it, because the maker is the worst judge of their own work. Where the analogy bends: in our kitchen the chef is also one of the cooks, and the "calling out" is literally typing a command and reading the reply.

What "delegate via cli -p" means precisely

The orchestrator is a top-tier model driving the run. To delegate, it shells out to another agent's command-line interface in non-interactive mode — conventionally the -p ("prompt") flag or an exec subcommand — passing one bounded unit of work as the prompt and capturing stdout. There is no shared memory and no live session: each call is a fresh, stateless invocation, which is exactly what makes a unit bounded. The contract is "here is one job and the context it needs; return the result".
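A minimal sketch of one stateless delegation, assuming `claude` is on PATH and the unit fits in a single prompt string. `delegate_once` is a hypothetical helper name, not part of any CLI: it makes one headless call, captures stdout, and passes the agent's exit status through so a failed call is never mistaken for a result.

```shell
# delegate_once UNIT: hand one bounded unit to Claude headless, print its stdout.
# Hypothetical helper; the flags are the ones this lesson uses for Claude.
# Every call is a fresh process: no session, no memory of earlier calls.
delegate_once() (
  unit=$1
  # Capture stdout only; anything the agent writes to stderr still reaches the terminal.
  out=$(claude -p "$unit" --output-format json)
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "delegation failed (exit $status)" >&2
    exit "$status"
  fi
  printf '%s\n' "$out"
)
```

Everything the agent needs has to travel inside `unit`; a follow-up question is a second, independent call.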

One unit at a time, on purpose

Delegation is serial per unit, not a free-for-all. The orchestrator hands out a single unit, reads the result, verifies it at the real boundary (the Proof Gate from lesson 3), and only then dispatches the next. That keeps every step auditable in LOOP-LOG.md and prevents two agents from racing on the same artifact. The roster can be heterogeneous — Claude, Codex, Kimi, Grok, GLM, Minimax — and the orchestrator is deliberately agnostic: it picks per unit, not once for the whole run.
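The serial rule can be sketched as a plain loop. `run_unit` and `verify_unit` are hypothetical hooks standing in for your delegation call and your Proof Gate check, and the PASS/FAIL line format is invented for illustration; the only assumption about LOOP-LOG.md is that it is an append-only text file. No unit is dispatched until the previous one has passed verification.

```shell
# dispatch_serial LOG UNIT...: hand out units strictly one at a time.
# run_unit and verify_unit are hypothetical hooks you supply:
#   run_unit UNIT            -> prints the result on stdout
#   verify_unit UNIT RESULT  -> exit 0 only if the real boundary agrees
dispatch_serial() (
  log=$1; shift
  for unit in "$@"; do
    if ! result=$(run_unit "$unit"); then
      echo "FAIL run: $unit" >>"$log"
      exit 1
    fi
    if verify_unit "$unit" "$result"; then
      echo "PASS: $unit" >>"$log"
    else
      echo "FAIL verify: $unit" >>"$log"
      exit 1   # stop here: the next unit waits, nothing races ahead
    fi
  done
)
```

Usage: `dispatch_serial LOOP-LOG.md "unit one" "unit two"`. Because each line is written before the next unit starts, the log reads in the same order the work happened.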

2

Delegation in one picture


Here is the whole handoff as a single flow. A person sets the roster; the orchestrator takes one unit, checks the chosen assistant is installed, runs its command, and reads the result back; then a different assistant validates it. If the assistant isn't installed, the orchestrator simply skips it — it never calls a command that isn't there.

[Figure: Person declares the roster → Orchestrator takes ONE unit → Agent installed? Yes: invoke cli -p (prompt in → text out), result returned on stdout, then checked by a different agent (Validator, never the builder). No: gated and skipped (not on this machine).]
Read left → right. The solid path is a successful delegation; the dashed red path is an agent that isn't installed, gated out before any command runs.
Key ideas: person sets the roster · orchestrator picks per unit · preflight before trust · one unit at a time · validator ≠ builder
3

The four layers, step by step


Delegation stacks up in four layers, and it's easiest to feel them by walking through them one at a time. Each stop is one layer of the handoff: a person naming the crew, the orchestrator's safety check, the actual command, and the independent sign-off.

loop-engineering · one bounded unit · orchestrator → installed agent
Hand one unit to the right assistant, safely
AFK · human on observability · 4 layers · roster: claude · codex · kimi · grok · glm · minimax

A person declares the crew — before the run starts.

The orchestrator does not get to invent its own helpers. At the very start of a run, a human writes down the Agents roster: the named set of assistants this run may delegate to. The default is agnostic — any capable assistant can be on it — but the list is explicit, so you always know who could touch your work.

  • claude: Anthropic's CLI. Strong all-rounder; often the orchestrator itself. Status: installed.
  • codex: OpenAI's coding agent, run headless with exec. Status: installed.
  • kimi · grok: Additional builders, each with its own headless flags. Status: installed.
  • glm · minimax: Reached through a local proxy (cliproxyapi) rather than a native CLI. Status: installed.
  • pi · agy: On the roster by name, but not installed on this machine, so they must be gated, never called. Status: gated.
Layer 1 of 4 · Roster
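One way to make the roster explicit is a plain file, one agent name per line, written by the human before the run. The file format (one name per line, `#` comments) and the helper names `load_roster` and `is_on_roster` are assumptions for illustration; the lesson only requires that the list be named and fixed up front.

```shell
# load_roster FILE: print the declared agents, one per line.
# Hypothetical format: one name per line, '#' starts a comment, blank lines ignored.
load_roster() {
  sed -e 's/#.*//' -e 's/[[:space:]]//g' "$1" | grep -v '^$'
}

# is_on_roster FILE NAME: succeed only for names the human declared.
# The orchestrator checks this before delegating, so it cannot invent helpers.
is_on_roster() {
  load_roster "$1" | grep -qxF -- "$2"
}
```

Keeping the roster in a file also keeps gating per-machine: the file stays the same everywhere, and only what is installed changes.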

Why each flag is what it is

--output-format json (Claude) gives a structured envelope the orchestrator can parse without scraping prose. codex exec --quiet suppresses progress chatter so stdout is just the answer. kimi -p … --output-format text is the supported headless path — combining -p with --yolo is unsupported and --work-dir is rejected, so neither is used. grok -p … --output-format plain --always-approve --cwd "$PWD" runs non-interactively, auto-approves its own tool calls (there's no human to click), and pins the working directory explicitly.

GLM and Minimax have no native CLI

They are reached through cliproxyapi, a local process that exposes them on an OpenAI-compatible HTTP endpoint. The orchestrator calls that endpoint instead of a binary, but the contract is identical: one bounded unit in, one result out, validated by someone else.
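A sketch of the proxy path, assuming cliproxyapi is listening at a base URL you supply in the hypothetical `CLIPROXY_URL` variable and accepts the standard OpenAI `POST /v1/chat/completions` request. The model names it expects and whether it needs a key (the hypothetical `CLIPROXY_KEY` here) depend on your proxy configuration and are not specified by this lesson. It also assumes `curl` and `jq` are installed; `jq` builds the JSON so quotes and newlines in the unit are escaped correctly.

```shell
# proxy_body MODEL UNIT: one chat-completions request carrying one bounded unit.
proxy_body() {
  jq -n --arg model "$1" --arg unit "$2" \
    '{model: $model, messages: [{role: "user", content: $unit}]}'
}

# proxy_call MODEL UNIT: send the unit, print only the reply text.
# CLIPROXY_URL and CLIPROXY_KEY are hypothetical names for your local setup.
proxy_call() (
  body=$(proxy_body "$1" "$2") || exit
  resp=$(curl -sS --fail \
      -H "Content-Type: application/json" \
      ${CLIPROXY_KEY:+-H "Authorization: Bearer $CLIPROXY_KEY"} \
      --data-binary "$body" \
      "${CLIPROXY_URL:?set CLIPROXY_URL to the proxy base URL}/v1/chat/completions") || exit
  printf '%s' "$resp" | jq -r '.choices[0].message.content'
)
```

The shape of the exchange matches the native CLIs: one unit in, one text answer out, no session carried between calls.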

Gating absent agents

pi and agy appear in the roster for portability, but on a box where command -v can't find them they are excluded from the candidate set for every unit. Gating is silent and per-machine: the same roster file works everywhere, and each host simply delegates to whatever subset is actually installed.
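Silent, per-machine gating reduces to a filter over the roster. `installed_subset` and `pick_builder` are hypothetical helper names, and "first installed name wins" is a placeholder policy. The sketch assumes every roster entry is a binary name that `command -v` can resolve, so proxy-reached agents (glm, minimax) would need a separate health check and are left out here.

```shell
# installed_subset NAME...: print only the names that resolve on this machine.
# Absent agents are dropped silently: no error, and nothing is ever run for them.
installed_subset() (
  for a in "$@"; do
    if command -v "$a" >/dev/null 2>&1; then printf '%s\n' "$a"; fi
  done
)

# pick_builder NAME...: first installed candidate for this unit, or fail loudly.
# A real orchestrator would weigh the unit; first-match is only a placeholder.
pick_builder() (
  first=$(installed_subset "$@" | head -n 1)
  if [ -z "$first" ]; then
    echo "no installed agent on the roster for this unit" >&2
    exit 1
  fi
  printf '%s\n' "$first"
)
```

Run against the same roster, a laptop and a CI box can return different subsets; that difference is exactly the per-machine gating described above.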

4

Preflight: never call a command that isn't there


This layer is worth its own beat because it's the difference between a robust crew and a fragile one. The orchestrator treats "is this assistant installed?" as a question to be answered by the machine, not assumed. If the answer is no, that assistant is gated — dropped from the running for this unit — and the work goes to someone who's actually present. It's the same anti-assumption habit from LEARN, applied to the crew itself.

Run the preflight by hand

From any directory, ask the shell which agents exist. command -v succeeds, printing the resolved path, for each binary it finds and fails silently for the rest; the loop turns that into one "available" or "gated" line per agent, and the "available" ones are your installed candidates.

terminal — list which agents are installed right now
# run from anywhere; prints a line per agent
for a in claude codex kimi grok pi agy; do
  if command -v "$a" >/dev/null 2>&1; then
    echo "available: $a"
  else
    echo "gated:     $a"   # not installed → never invoked
  fi
done

If the harness ships detect_panel.sh, sourcing it does the same sweep and leaves the result in PANELISTS. Either way, the orchestrator only ever delegates to a name that survived this check.

5

The proven invocations, side by side


Once an assistant has passed preflight, the orchestrator calls it with the one command known to work headless for that tool. You met these in the demo; here they are collected as a reference, with the gotchas spelled out. The takeaway: each assistant is called differently, the flags are not interchangeable, and a couple of combinations are off-limits.

cli -p reference — copy verbatim, flags are load-bearing
claude  -p "<unit>" --output-format json
codex   exec --quiet "<unit>"
kimi    -p "<unit>" --output-format text        # never -p + --yolo; no --work-dir
grok    -p "<unit>" --output-format plain --always-approve --cwd "$PWD"
# glm / minimax → via cliproxyapi (OpenAI-compatible endpoint, no native CLI)
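The reference above can live in one function so nobody retypes the flags. `run_agent` is a hypothetical helper; it uses only the invocations listed in this lesson, assumes the agent already passed preflight, and refuses names it has no recipe for, including glm and minimax, which go through the proxy instead.

```shell
# run_agent AGENT UNIT: invoke one installed agent headless with its proven flags.
# Exit status is the agent's own; 2 means "no recipe for this name".
run_agent() {
  case $1 in
    claude) claude -p "$2" --output-format json ;;
    codex)  codex exec --quiet "$2" ;;
    kimi)   kimi -p "$2" --output-format text ;;   # never -p with --yolo; no --work-dir
    grok)   grok -p "$2" --output-format plain --always-approve --cwd "$PWD" ;;
    *)      echo "no headless recipe for '$1' (glm/minimax go through cliproxyapi)" >&2
            return 2 ;;
  esac
}
```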

Reading a result back

Because the calls are stateless, the orchestrator captures stdout per invocation and parses it according to the format it asked for — JSON for Claude, plain text for the others. There's no follow-up turn; if more is needed, that's a new bounded unit and a fresh call. This is what keeps each delegation independently auditable.
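Parsing by requested format can be a single dispatch on the agent name. This sketch assumes Claude's JSON envelope carries the final answer in a top-level `result` field (true of current Claude Code releases, but worth checking against your version) and that `jq` is installed; text and plain outputs are passed through unchanged. `read_result` is a hypothetical helper.

```shell
# read_result AGENT < STDOUT_OF_ONE_CALL: extract the answer from one invocation.
read_result() {
  case $1 in
    claude) jq -er '.result' ;;   # JSON envelope; -e makes a missing/null result an error
    *)      cat ;;                # kimi text, grok plain, codex quiet: already the answer
  esac
}
```

Usage: `run output | read_result claude`. No session id is read or kept, so asking for more means a new unit and a new call.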

The two Kimi traps

Two specific mistakes break Kimi in headless mode: pairing -p with --yolo, and passing --work-dir. The supported form is exactly kimi -p "<unit>" --output-format text. The harness encodes that so a delegating orchestrator can't fall into either trap.
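One way a harness might encode the two Kimi traps is a guard that inspects the argument list before anything runs. `check_kimi_args` is hypothetical; it rejects only the two combinations named here and says nothing about other flags. It skips the argument right after `-p`, so a prompt that merely mentions `--yolo` is not mistaken for the flag.

```shell
# check_kimi_args ARG...: refuse the two combinations that break headless Kimi.
check_kimi_args() (
  has_p=no; has_yolo=no; skip=no
  for arg in "$@"; do
    if [ "$skip" = yes ]; then skip=no; continue; fi   # the prompt itself, never a flag
    case $arg in
      -p) has_p=yes; skip=yes ;;
      --yolo) has_yolo=yes ;;
      --work-dir|--work-dir=*) echo "kimi: --work-dir is rejected" >&2; exit 1 ;;
    esac
  done
  if [ "$has_p" = yes ] && [ "$has_yolo" = yes ]; then
    echo "kimi: -p with --yolo is unsupported" >&2
    exit 1
  fi
)
```

Usage: `check_kimi_args -p "$unit" --output-format text && kimi -p "$unit" --output-format text`.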

6

Why the Validator is never the builder


If you remember one rule from this lesson, make it this one. The assistant that produced a unit is the worst judge of whether it's correct — it's already convinced. So the harness never lets a builder sign off on its own work. Validation always goes to a different assistant from the roster, who checks the result against reality: run the test, call the endpoint, read the file. Only then does the unit count as done.

This is the deepest reason to keep a mixed crew. With several assistants available, there is always a second one who didn't write the code and can look at it fresh. The orchestrator routes the check away from the author automatically — and the verdict still has to be real evidence (the Proof Gate), recorded where the human can read it.
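Routing the check away from the author is a small rule over the installed candidates. `pick_validator` is a hypothetical helper: given the builder and the agents that passed preflight, it returns the first different one and refuses, rather than falling back to self-validation, when no second agent exists. First-match is a placeholder policy, and the evidence check itself (the Proof Gate) is not shown.

```shell
# pick_validator BUILDER CANDIDATE...: any candidate except the builder.
# Never returns BUILDER; with no second agent it fails instead of self-validating.
pick_validator() (
  builder=$1; shift
  for a in "$@"; do
    if [ "$a" != "$builder" ]; then printf '%s\n' "$a"; exit 0; fi
  done
  echo "no independent validator available for '$builder'" >&2
  exit 1
)
```

Logging both names next to the verdict in LOOP-LOG.md lets the human confirm at a glance that builder and validator differ.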

[Figure: FORBIDDEN: agent A builds, agent A validates. REQUIRED: agent A builds, agent B validates at the real boundary (Proof Gate), and the result is logged to LOOP-LOG.md.]
Top: a builder grading itself — blind to its own bugs. Bottom: a different agent validates against real evidence, then it's logged.

The one rule

Builder and validator are always two different agents. A maker cannot certify its own work — the second pair of eyes is the whole point of having a crew.

7

Who does what: a person, the orchestrator, the assistants


The whole mechanism comes down to three responsibilities, and keeping them straight is what makes a multi-agent run safe rather than chaotic. They are not three phases you do in order — they're three roles acting at once.

A person

Declares the Agents roster at the start of the run — the named set of assistants that may be delegated to. After that, they stay on observability (reading LOOP-LOG.md), not in the doing.

The orchestrator (an LLM)

Preflights each agent, picks the right cli -p command, hands out one bounded unit at a time, and routes validation so the Validator is never the builder.

The assistants (agents)

Each runs the single job it's given in a fresh, stateless call and returns the result. Ones not installed here — pi, agy — are gated and never called.

Why the split matters

The human owning the roster bounds who can touch the work; the orchestrator owning dispatch bounds how (one unit, right command, independent validation); the agents owning execution keep each call stateless and replaceable. Break any one — let the orchestrator invent helpers, or let a builder self-validate, or call an uninstalled binary — and the run loses the property that every step is auditable and reproducible.

8

Quick check


Three quick questions to make the rules stick.

Q1. Before delegating a unit to kimi, the orchestrator runs command -v kimi and gets nothing back. What happens?

Q2. Agent A just produced a unit. Who should validate it?

Q3. Which Kimi invocation is the supported headless one?

Your agent is your teacher. Want to see a real delegation, with your orchestrator preflighting the agents on your machine and handing one unit to whichever passed? Ask it to run the preflight and dispatch a single bounded unit, then validate it with a different agent.

Next, in "Fusion: a panel to judge", we keep more than one assistant in play at the same time and judge their answers.