Cross-agent delegation via cli -p

A loop run does not have to be done by a single AI. One model — the orchestrator — can hand out work to a whole crew of other AI assistants, one small job at a time, by talking to each one through its command line. This lesson is about how that handoff actually works: who is on the crew, how the orchestrator checks an assistant is even available before trusting it, the exact commands it uses to call each one, and the one safety rule that makes the whole thing reliable.

loop-engineering · one bounded unit · orchestrator → installed agent

Hand one unit to the right assistant, safely

AFK · human on observability 4 layers roster: claude · codex · kimi · grok · glm · minimax

A person declares the crew — before the run starts.

The orchestrator does not get to invent its own helpers. At the very start of a run, a human writes down the Agents roster: the named set of assistants this run may delegate to. The default is agnostic — any capable assistant can be on it — but the list is explicit, so you always know who could touch your work.

claudeAnthropic's CLI. Strong all-rounder; often the orchestrator itself.installed
codexOpenAI's coding agent, run head-less with exec.installed
kimi · grokAdditional builders, each with its own headless flags.installed
glm · minimaxReached through a local proxy (cliproxyapi) rather than a native CLI.installed
pi · agyOn the roster by name, but not installed on this machine — so they must be gated, never called.gated

The orchestrator checks an assistant is really there — before it trusts it.

A roster is a wish-list; being on it doesn't mean the program exists on this computer. So before delegating, the orchestrator runs a tiny preflight: it asks the shell "does this command exist?" with command -v, or it runs a small detector script that fills in a PANELISTS= list of who's actually available. An assistant that fails the check — like pi or agy here — is gated: quietly dropped from the candidates for this unit. No command is ever run for an assistant that isn't installed.

preflight — does the agent exist on this box?

# cheapest possible check: is the binary on PATH?
command -v claude codex kimi grok # prints the path of each that exists

# or let the harness build the available set for you
source detect_panel.sh
echo "$PANELISTS"           # e.g. "claude codex kimi grok glm minimax"
                            # pi / agy absent here → gated, never invoked

It runs the assistant's headless command and reads the answer.

Each assistant has its own exact, proven way to be called from a script. The flags matter: they force a single non-interactive answer, fix the output format so the orchestrator can parse it, and set the working directory. These are the invocations the harness actually uses — copy them verbatim.

the proven cli -p invocations (one bounded unit each)

# Claude — JSON so the result is machine-parseable
claude -p "<one bounded unit>" --output-format json

# Codex — exec subcommand, quiet for clean stdout
codex exec --quiet "<one bounded unit>"

# Kimi — plain text. NEVER combine -p with --yolo, and do NOT pass --work-dir
kimi -p "<one bounded unit>" --output-format text

# Grok — plain output, auto-approve tool calls, explicit working dir
grok -p "<one bounded unit>" --output-format plain --always-approve --cwd "$PWD"

# GLM / Minimax — no native CLI; routed through the local proxy
# (cliproxyapi exposes them on an OpenAI-compatible endpoint)

What the assistant does

Runs the one job in a fresh, stateless call and prints the result to stdout. No memory of other units.

What the orchestrator does

Builds the prompt, picks the command for the chosen agent, captures and parses the output.

A different assistant checks the work. Never the one that built it.

The rule that makes this trustworthy: the Validator is never the builder. Whoever produced the unit cannot be the one to sign it off — a maker is blind to its own mistakes and will happily call its own work correct. So validation is routed to a different agent from the roster, who checks the result against the real boundary (run the test, hit the endpoint, read the file).

This is why a heterogeneous roster is a feature, not just a convenience: with more than one assistant available, the orchestrator can always find a second pair of eyes that didn't write the code. The verdict still has to clear the Proof Gate — real evidence, never a claim — and it lands in LOOP-LOG.md for the human to audit later.

Layer 1 of 4 · Roster

# run from anywhere; prints a line per agent for a in claude codex kimi grok pi agy; do if command -v "$a" >/dev/null 2>&1; then echo "available: $a" else echo "gated: $a" # not installed → never invoked fi done

claude -p "<unit>" --output-format json codex exec --quiet "<unit>" kimi -p "<unit>" --output-format text # never -p + --yolo; no --work-dir grok -p "<unit>" --output-format plain --always-approve --cwd "$PWD" # glm / minimax → via cliproxyapi (OpenAI-compatible endpoint, no native CLI)

Cross-agent delegation via cli -p

The big idea: one model running a crew

What "delegate via cli -p" means precisely

One unit at a time, on purpose

Delegation in one picture

The four layers, step by step

What the assistant does

What the orchestrator does

Why each flag is what it is

GLM and Minimax have no native CLI

Gating absent agents

Preflight: never call a command that isn't there

Run the preflight by hand

The proven invocations, side by side

Reading a result back

The two Kimi traps

Why the Validator is never the builder

Who does what: a person, the orchestrator, the assistants

A person

The orchestrator (an LLM)

The assistants (agents)

Why the split matters

Quick check