Multi-agent · Step 8 · The biggest calls, and the family that powers them

Council and the three-tier adapter family

Some decisions are too big to trust to a single voice. Council is the harness's full board of AI models — members with weights and tiers that debate, vote, and synthesize a verdict, with cost caps and a full audit trail. It is the heavyweight sibling of fusion. And underneath both sits one quiet piece of plumbing that this lesson also unpacks: the three-tier adapter family — bash, mjs, and TypeScript layers that all drive the same agent CLIs from one proven source of truth.

Read the plain version, or open the technical layer on any section.

The big idea: a board for the biggest calls

By now you have seen the loop hand one job to one assistant, and you have seen fusion ask a small panel the same question at once and have a judge merge the answers. Council is the next step up. It is a standing board of AI models — a configured group, each with a named role, a voting weight, and a cost budget — convened to decide the calls that are too important to leave to one voice.

The shape is deliberately like a real committee. Each member reads the question and the evidence, then casts a vote — GO, PIVOT, or NO-GO. The votes are weighted (a more trusted member counts for more), scored against the dimensions the board cares about, and folded by a synthesizer into one written verdict: a recommendation, how confident the board is, how much it agreed, and — crucially — the dissent that did not win, written down on purpose so it is never lost. The whole session retries if a member stumbles, stops if it blows its time or money budget, and is logged in full so a human can replay it later.

That last part is the point of Council, and what separates it from a quick panel: it is built for doctrine-level decisions — the rules and big bets a project lives by — and for acting as a heavyweight, consensus Validator when being confidently wrong would be very expensive. A person convenes the board for those moments. fusion is the everyday second opinion; Council is the one you call when the answer becomes policy.

Think of it like… a courtroom, not a hallway chat. fusion is asking three sharp colleagues in passing and taking the sense of the room. Council is a full panel of judges: each has standing and weight, each writes their opinion, the majority forms a ruling, and the dissents are recorded in the official record — because tomorrow someone will need to know not just what was decided but why, and who disagreed. Where the analogy bends: here the judges are AI models, the "ruling" is a JSON report plus a memo, and the entire hearing can be re-run from its transcript.

What a board is made of

A board is defined by three kinds of file. members/<id>/member.yaml declares each model on a five-axis contract — model (provider + id + fallback), the adapter key that knows how to call it, a voting.weight from 0 to 2, a cost_policy (per-turn and per-run caps), and the tools and filesystem scope it is allowed. boards/<id>/board.yaml wires those members into roles, sets thresholds (the score cut-offs for GO / PIVOT), declares tension_pairs (the dialectics the synthesizer must explicitly test — "speed vs diligence", "user value now vs market shift"), and pins time and budget constraints. A workflows/<id>.yaml sets the phase order — how many rounds, who synthesizes, who verifies.
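To make the five-axis contract concrete, here is a minimal sketch of one member as a JavaScript object, with a check for the constraints named above. The five axes come from the text; the key spellings inside each axis (provider, fallback, per_turn_usd, per_run_usd, fs_scope), the placeholder values, the rule that the per-run cap is at least the per-turn cap, and the validateMember helper are all assumptions for illustration, not the real member.yaml schema.

```javascript
// Hypothetical in-memory shape of one member.yaml (key spellings and values are assumed).
const advocate = {
  id: 'advocate',
  model: { provider: 'example-provider', id: 'example-model', fallback: null },
  adapter: 'claude-cli',
  voting: { weight: 1.5 },
  cost_policy: { per_turn_usd: 0.4, per_run_usd: 2.0 },
  tools: { allowed: ['read'], fs_scope: './' },
};

// Returns a list of problems; an empty list means the record passes this sketch's checks.
function validateMember(member) {
  const problems = [];
  for (const axis of ['model', 'adapter', 'voting', 'cost_policy', 'tools']) {
    if (member[axis] === undefined) problems.push(`missing axis: ${axis}`);
  }
  const weight = member.voting?.weight;
  if (typeof weight !== 'number' || weight < 0 || weight > 2) {
    problems.push('voting.weight must be a number from 0 to 2');
  }
  const cost = member.cost_policy ?? {};
  // Assumed rule: a run cap smaller than a single turn's cap would make no sense.
  if (!(cost.per_turn_usd > 0) || !(cost.per_run_usd >= cost.per_turn_usd)) {
    problems.push('cost_policy needs a per-turn cap and a per-run cap at least as large');
  }
  return problems;
}

console.log(validateMember(advocate));
```

Returning a list of problems rather than throwing on the first one lets a loader report everything wrong with a roster in a single pass.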

Vote → score → synthesis → verdict

Each run emits a decision-report.json on a fixed schema: a vote_tracker with every member's round-1 and round-2 position (and whether they shifted and why), a weighted_score, a recommendation (GO | PIVOT | NO-GO), a confidence and a consensus level, the top_tensions with their resolution, and a dissent_preserved[] array. An observer member — the verifier — does not vote; it decomposes the claims into atomic, tool-checkable units and grades them on a five-level ladder (PERFECT → VERIFIED → PARTIAL → FEEDBACK → FAILED). "A false PERFECT is worse than an honest PARTIAL", so the verifier grades conservatively, and an unresolved tension downgrades confidence by a notch.
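The vote-to-verdict fold can be sketched in a few lines. This is only an illustration under stated assumptions: the numeric value given to each vote, the threshold numbers, and the tally function itself are invented here, and the real scoring dimensions are not shown in this lesson. The field names weighted_score, recommendation and dissent_preserved follow the report schema described above, and the weights are borrowed from the lesson's example roster.

```javascript
// Assumed numeric value of each vote; the real scoring is not shown in the lesson.
const VOTE_VALUE = { 'GO': 1, 'PIVOT': 0.5, 'NO-GO': 0 };

function tally(votes, thresholds) {
  // Observers (for example the verifier) never vote, so they are filtered out first.
  const voters = votes.filter(v => !v.is_observer);
  const totalWeight = voters.reduce((sum, v) => sum + v.weight, 0);
  if (totalWeight === 0) throw new Error('no voting weight on the board');

  const weighted_score =
    voters.reduce((sum, v) => sum + v.weight * VOTE_VALUE[v.vote], 0) / totalWeight;
  const recommendation =
    weighted_score >= thresholds.go ? 'GO' :
    weighted_score >= thresholds.pivot ? 'PIVOT' : 'NO-GO';

  // Keep the positions that did not win, so the report can preserve them.
  const dissent_preserved = voters
    .filter(v => v.vote !== recommendation)
    .map(v => ({ member: v.member, vote: v.vote }));

  return { weighted_score, recommendation, dissent_preserved };
}

const report = tally(
  [
    { member: 'advocate', weight: 1.5, vote: 'GO' },
    { member: 'skeptic', weight: 1.0, vote: 'NO-GO' },
    { member: 'architect', weight: 1.2, vote: 'GO' },
    { member: 'verifier', weight: 0, vote: null, is_observer: true },
  ],
  { go: 0.7, pivot: 0.4 }, // illustrative cut-offs
);
console.log(report.recommendation, report.dissent_preserved);
```

With these made-up numbers the two GO votes carry 2.7 of 3.7 total weight, which clears the assumed GO cut-off, and the skeptic's NO-GO is kept in dissent_preserved instead of being dropped.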

Tiers decide how heavy the hearing is

Every change is classified T1–T4 (a tier-classifier.mjs suggests one from a description). A low tier runs a quick, cheap workflow; a high tier convenes the full board for more rounds. The board only ends a deliberation once it is past its minimum time and budget, and the runner aborts converse() in code — not by polite request — the moment it hits the maximum. Cost and audit are first-class: cost_usd, duration_ms, and an append-only activity.jsonl are part of every run.
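The floor-and-ceiling rule above reads naturally as a small guard. The sketch below assumes hypothetical constraint keys (min_ms, max_ms, min_usd, max_usd) and a hypothetical nextAction helper, and it reads "past its minimum time and budget" as meaning both floors must be cleared. The abort itself uses the standard AbortController, which fits the signal parameter that the adapters accept, but how the real runner wires it up is not shown here.

```javascript
// Hypothetical constraint keys and values; the lesson only says a board pins time and budget limits.
const constraints = {
  min_ms: 60_000, max_ms: 600_000,
  min_usd: 0.5, max_usd: 5.0,
};

// Decide what the runner does next, given what the session has consumed so far.
function nextAction(elapsedMs, spentUsd, c) {
  if (elapsedMs >= c.max_ms || spentUsd >= c.max_usd) return 'abort';      // hard ceiling
  if (elapsedMs >= c.min_ms && spentUsd >= c.min_usd) return 'may-finish'; // both floors cleared
  return 'keep-deliberating';                                              // too early to stop
}

// Enforcing the ceiling in code: abort the in-flight call instead of asking the model to stop.
const controller = new AbortController();
if (nextAction(700_000, 1.2, constraints) === 'abort') controller.abort();
console.log(controller.signal.aborted);
```

The ceiling is checked before the floor on purpose: a session that has blown its maximum is stopped even if it never reached its minimum spend.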

How a board turns voices into a verdict

Here is the whole hearing in one picture. The same question goes to every member at once. Each returns a vote. The votes are weighted and scored, a synthesizer writes them up into a single verdict, and a separate verifier checks the claims before anything is signed. Notice that the verifier is an observer — it never votes on the outcome it is checking.

One brief → weighted votes → a synthesized verdict, with a non-voting verifier checking the claims. Dissent is written into the record, not discarded.

Members carry weight and a budget

The diagram's three members map onto real member.yaml roles. Each declares a voting.weight (0–2), a default_position, whether it may_abstain, and whether it is_observer (the verifier sets this true, so it never votes). Below is the shape of a board's roster — display values, not a config dump.

advocate · claude-cli · weight 1.5 · cap $0.40/turn
skeptic · codex-cli · weight 1.0 · cap $0.40/turn
architect · kimi-cli · weight 1.2 · cap $0.30/turn

The synthesizer must attempt to resolve at least one declared tension_pair per run, or mark it unresolved in top_tensions[] — and an unresolved tension knocks confidence down a level. That is how the board is forced to face its hardest disagreement rather than paper over it.
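Here is one way the tension rule could look in code. It is a sketch under assumptions: the three confidence level names, the 'unresolved' marker value, and the applyTensionRule helper are invented for illustration, since the lesson gives the rule but not its implementation.

```javascript
// Assumed confidence ladder, lowest to highest; the real level names are not given in the lesson.
const CONFIDENCE_LEVELS = ['low', 'medium', 'high'];

function applyTensionRule(confidence, declaredPairs, topTensions) {
  const index = CONFIDENCE_LEVELS.indexOf(confidence);
  if (index === -1) throw new Error(`unknown confidence level: ${confidence}`);

  // Only tensions the board actually declared count toward the rule.
  const declared = new Set(declaredPairs);
  const attempted = topTensions.filter(t => declared.has(t.pair));
  if (attempted.length === 0) {
    throw new Error('synthesizer must address at least one declared tension_pair');
  }

  const hasUnresolved = attempted.some(t => t.resolution === 'unresolved');
  if (!hasUnresolved) return confidence;
  return CONFIDENCE_LEVELS[Math.max(0, index - 1)]; // one notch down, never below the lowest
}

const pairs = ['speed vs diligence', 'user value now vs market shift'];
console.log(applyTensionRule('high', pairs, [
  { pair: 'speed vs diligence', resolution: 'unresolved' },
]));
```

Throwing when no declared tension was addressed makes the "face your hardest disagreement" requirement a hard check rather than a convention.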

fusion or Council? Choose by the stakes

Council is more powerful than fusion, but power has a price: more models, more rounds, more time, more money. So you do not reach for it by default. The honest question is how expensive it would be to be wrong. If a mistake costs a few minutes to redo, a single check — or a quick fusion panel — is plenty. If a mistake would set a rule the whole project follows, or send a long autonomous run down the wrong road, that is when you convene the board.

An AI working inside the harness makes this same call constantly, and the rule it follows is exactly that: pick the lightest tool that fits the stakes. A routine unit gets one assistant. A risky or genuinely ambiguous fork gets fusion — a blind panel whose agreement is the strongest cheap signal you can get. A doctrine-level decision, or a verdict that must stand up to scrutiny later, gets Council. Spending a board's budget on a trivial question is as much a mistake as rubber-stamping a doctrine change with a single opinion.

Think of it like… deciding how serious a medical visit needs to be. A scrape is a bandage — handle it yourself. Something nagging gets a quick word with a couple of doctors who happen to be on shift. But a life-changing diagnosis goes to a full case conference where every specialist weighs in and the decision is documented. You match the ceremony to the cost of getting it wrong — not to how interesting the question feels.

The decision in the harness's own words

The loop's reference draws the line at the Proof Gate. A normal unit is checked once. "When a unit is risky or genuinely ambiguous and being confidently wrong is expensive", the Proof Gate runs as a fusion panel→judge: N blind panelists drawn from the live roster answer first, then a judge sorts their answers into consensus, contradictions and blind spots and issues a grounded verdict (for code, it runs both candidates and merges the verified result). Council is the tier above that: the same panel→judge muscle, but standing, weighted, tiered, budgeted and audited, used for adjudicating doctrine and as a consensus Validator on the highest-stakes work.

Tier classification drives it

Council's own tier-classifier.mjs --change "<description>" emits {tier, board, workflow, reason}. T1/T2 changes run light; T3/T4 — doctrine and architecture — convene the full board for more rounds. The classifier is the codified version of "choose by the stakes": the size of the change picks the weight of the hearing.
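To show what "the size of the change picks the weight of the hearing" might look like, here is a toy classifier that returns the same four fields. Only the output shape {tier, board, workflow, reason} comes from the text. The keyword rules, the tier each keyword maps to, and the board and workflow names are invented for illustration; the real tier-classifier.mjs rules are not shown in this lesson.

```javascript
// Invented rules, ordered from heaviest to lightest; the first match wins.
const RULES = [
  { tier: 'T4', pattern: /doctrine|policy/i, board: 'full-board', workflow: 'deep' },
  { tier: 'T3', pattern: /architecture|schema|migration/i, board: 'full-board', workflow: 'standard' },
  { tier: 'T2', pattern: /feature|refactor/i, board: 'small-board', workflow: 'quick' },
];

function classifyChange(description) {
  const hit = RULES.find(rule => rule.pattern.test(description));
  if (hit) {
    const { tier, board, workflow, pattern } = hit;
    return { tier, board, workflow, reason: `matched ${pattern}` };
  }
  // Nothing escalated the change, so it gets the lightest hearing.
  return { tier: 'T1', board: 'small-board', workflow: 'quick', reason: 'no escalation keyword matched' };
}

console.log(classifyChange('Rewrite the review doctrine'));
console.log(classifyChange('Fix a typo in the README'));
```

Checking the heaviest rules first means a description that mentions both a refactor and a doctrine change is routed by its most serious aspect.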

Walk the choice, one step at a time

Flowchart: a unit comes up for review, and two yes/no questions about the stakes route it, top to bottom, to the lightest reviewer that fits: one assistant, a fusion panel, or the full Council board.

The same forks, as guard clauses

The flowchart is just two early-return checks: cheap by default, escalate only when the stakes demand it. The order matters: you ask "is being wrong expensive?" before you ask "is this doctrine?", because for most units the answer to the first question is no, so they return at the first check and never reach the second.

function pickReviewer(unit) {
  if (!unit.wrongIsExpensive) return 'one-assistant';   // routine → single check
  if (!unit.isDoctrine && !unit.mustBeAudited)
    return 'fusion';                                  // risky/ambiguous → blind panel
  return 'council';                                    // doctrine / audited → the board
}
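A few example calls show how the guard clauses route. pickReviewer is repeated from the lesson only so the snippet runs on its own; the sample units are made up.

```javascript
// Repeated from the lesson so this snippet is self-contained.
function pickReviewer(unit) {
  if (!unit.wrongIsExpensive) return 'one-assistant';
  if (!unit.isDoctrine && !unit.mustBeAudited) return 'fusion';
  return 'council';
}

const routine = pickReviewer({ wrongIsExpensive: false });
const riskyFork = pickReviewer({ wrongIsExpensive: true, isDoctrine: false, mustBeAudited: false });
const policy = pickReviewer({ wrongIsExpensive: true, isDoctrine: true });
const audited = pickReviewer({ wrongIsExpensive: true, mustBeAudited: true });
// A unit tagged as doctrine but cheap to get wrong still exits at the first check.
const cheapDoctrine = pickReviewer({ wrongIsExpensive: false, isDoctrine: true });

console.log(routine, riskyFork, policy, audited, cheapDoctrine);
// prints: one-assistant fusion council council one-assistant
```

The last call is the practical consequence of the check order: the cost of being wrong gates everything, so the doctrine flag alone never convenes the board.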

One adapter family, three tiers

Step back from Council for a moment, because there is a second idea hiding under it — and under fusion too. Every one of these systems has to do the same humble thing: actually call an AI model's command-line program and read what comes back. That little piece of plumbing is called an adapter. And in this suite there is not one adapter but a family of them, in three layers, all driving the very same agent CLIs.

Why three? Because three different parts of the suite need to make the same call with different amounts of armor. The bash tier is what fusion uses — quick shell scripts that run a CLI and grab its output. The mjs tier is what Council uses — the same call, now wrapped with retries, cost tracking and an audit log, returning a tidy result object. The TypeScript tier lives in a product codebase and adds the heaviest armor of all — strict validation at the edges, a guarantee that it never crashes, a "circuit breaker" that stops hammering a failing service, and tests. Same destination, three thicknesses of safety belt.

The thing to hold onto: they all dial the same number. The exact, proven way to invoke each CLI — which flags, in which order — is shared across all three tiers. So when you learn how the harness calls Kimi or Codex once, you have learned it everywhere.

Think of it like… three vehicles built on one engine. A go-kart (bash) is light and fast and has almost no safety gear. A family car (mjs) adds seatbelts, airbags and a service light. A racing car with full telemetry (TypeScript) adds a roll cage, a kill switch and a pit crew watching every reading. Different jobs, different protection — but the same engine block, and if you retune the engine you retune all three.

bash

fusion's run_*.sh — runs a CLI, captures stdout. Adds a perl timeout (macOS has no timeout), a pty for TTY-less CLIs, anti-empty guards, and a throwaway workdir.

mjs

Council's adapters/*.mjs — invoke(opts) → Result. Adds retry/backoff, cost accounting, an audit log, and a structured success/failure result.

ts

Alembic's local-cli.ts — ModelAdapter.run(input) → ModelRunResult. Adds a Zod-validated boundary, a never-throws invariant, a circuit breaker, and tests.

The three tiers, precisely

bash — ~/.claude/skills/fusion/scripts/run_*.sh: shell, captures stdout; adds the perl timeout helper (no gtimeout on stock macOS), a pty for CLIs that emit nothing without a TTY, anti-empty retries, and a throwaway copy of the workdir so a panelist's writes never touch your checkout.

mjs — scripts/council/adapters/*.mjs: each exports invoke({ memberId, prompt, system, model, maxTokens, runId, timeoutMs, cwd, signal }) → Result. A success Result is { success:true, response, cost_usd, latency_ms, model_version_used, raw }; a failure is { success:false, error, retryable, status_code?, reason?, latency_ms }. A shared _lib.mjs provides withRetry, isTransient, writeAudit, estimateCost and runSubprocess.

ts — appfy/alembic/packages/adapters/src/local-cli.ts: a ModelAdapter.run(input) → ModelRunResult on a Zod-validated boundary. "The whole call is wrapped so it NEVER throws: spawn errors, non-zero exits, and timeouts all become classified failures." A circuit breaker stops calling a service that keeps failing, and the behavior is covered by tests.
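The two pieces of armor that the ts tier adds, a never-throws wrapper and a circuit breaker, can be sketched together. This is a JavaScript illustration of the general technique, not Alembic's code: createBreaker, guardedRun, the thresholds, and the returned field names are all assumptions.

```javascript
// Minimal breaker state machine; thresholds and names are illustrative.
function createBreaker({ failureThreshold = 3, cooldownMs = 30_000 } = {}) {
  let consecutiveFailures = 0;
  let openedAt = null; // timestamp when the breaker tripped, or null while closed

  return {
    canCall(now) {
      if (openedAt === null) return true;
      return now - openedAt >= cooldownMs; // after the cooldown, allow a trial call
    },
    recordSuccess() { consecutiveFailures = 0; openedAt = null; },
    recordFailure(now) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= failureThreshold) openedAt = now;
    },
  };
}

// Never-throws wrapper: every outcome, including a thrown error, becomes a classified result.
async function guardedRun(breaker, call, now = Date.now()) {
  if (!breaker.canCall(now)) {
    return { success: false, error: 'circuit open', retryable: true };
  }
  try {
    const response = await call();
    breaker.recordSuccess();
    return { success: true, response };
  } catch (err) {
    breaker.recordFailure(now);
    return { success: false, error: String(err?.message ?? err), retryable: true };
  }
}
```

Passing the clock in as a parameter keeps the breaker easy to test: a test can move time forward by handing in a larger now instead of waiting out the cooldown.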

One source of truth for the proven invocations

Here is why the family matters, told as a small disaster. The exact way to call a CLI is fiddly — Kimi, for instance, refuses some flag combinations and silently fails on others. Once, that exact invocation drifted: someone tweaked it in Council's adapter and, separately, in fusion's table, and the two slowly fell out of step. The fix was a rule the whole suite now lives by: there is one source of truth for the proven headless invocations, and all three tiers must stay in step with it.

This is a perfect, tiny example of the loop's anti-assumption discipline applied to the plumbing. You do not remember how to call a CLI and hope it still works — flags drift between versions. You re-prove it against the live CLI, and when you change one tier you update the other two in the same breath. That is a job for an agent: when a flag drifts, re-prove it at the real boundary, then propagate the fix across bash, mjs and TypeScript so they never disagree again.

Think of it like… a recipe pinned in a shared kitchen. Three cooks work from it. The day one cook scribbles a change on their own copy, the dishes stop matching. The rule that saves the kitchen: there is one master recipe on the wall, every copy is checked against it, and you only change it after you have actually cooked the new version and it worked.

The shared invocations, verbatim

These are the proven headless calls — verified at the real boundary 2026-06-16, mirrored across fusion's run_*.sh, Council's *.mjs, and the cross-agent-headless-cli-flags memory. Do not "improve" them without re-proving against the live CLI.

scripts/council/adapters/README.md — the shared source of truth

# codex — default path is cliproxyapi /v1/chat/completions (clean text + tokens)
#         subprocess path (COUNCIL_CODEX_USE_SUBPROCESS=1):
codex exec --model <m> --quiet <prompt>

# kimi — run IN the cwd. Rejects -p + --yolo; has NO --work-dir/--print/--quiet
kimi -p <prompt> --output-format text

# grok — -p is --single; --output-format is plain|json|streaming-json (NO "text")
grok -p <prompt> --output-format plain --always-approve --cwd <cwd>

The rule, when you touch an adapter

The README spells out the discipline: (1) re-prove the invocation against the live CLI by running the adapter's built-in smoke — flags drift between versions; (2) update the matching run_*.sh (fusion) and the cross-agent-headless-cli-flags memory so the three tiers stay identical; (3) remember the cross-platform gap — macOS has no timeout/gtimeout, so each tier handles it its own way (perl helper in bash, runSubprocess's timeoutMs in mjs, withRetry's policy in ts).

terminal — run an adapter's smoke to re-prove it

# each adapter ships a `node <adapter>.mjs` smoke at the bottom of the file
cd ~/Documents/Resources/scripts/council/adapters
node codex-cli.mjs    # prints a live Result — proof, not memory
node kimi-cli.mjs
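Re-proving catches a flag that no longer works. A separate, cheaper check can catch copies that have drifted apart. The sketch below compares each tier's copy of an invocation with the source of truth. findDrift and the hand-typed tables are hypothetical; extracting the real strings from run_*.sh, the *.mjs adapters and local-cli.ts is left out.

```javascript
// Collapse whitespace so cosmetic differences are not reported as drift.
const normalize = cmd => cmd.trim().split(/\s+/).join(' ');

// Compare each tier's copy of an invocation with the source of truth.
function findDrift(sourceOfTruth, tiers) {
  const drift = [];
  for (const [cli, expected] of Object.entries(sourceOfTruth)) {
    for (const [tierName, table] of Object.entries(tiers)) {
      const actual = table[cli];
      if (actual === undefined) {
        drift.push({ cli, tier: tierName, problem: 'missing' });
      } else if (normalize(actual) !== normalize(expected)) {
        drift.push({ cli, tier: tierName, problem: 'differs', expected, actual });
      }
    }
  }
  return drift;
}

const truth = { kimi: 'kimi -p <prompt> --output-format text' };
const drift = findDrift(truth, {
  bash: { kimi: 'kimi  -p <prompt>  --output-format text' }, // same call, extra spaces
  mjs: { kimi: 'kimi -p <prompt> --output-format plain' },   // deliberately drifted value
  ts: {},                                                    // copy missing entirely
});
console.log(drift.map(d => `${d.tier}: ${d.problem}`));
// prints: [ 'mjs: differs', 'ts: missing' ]
```

A check like this only proves the copies agree with each other. It says nothing about whether the agreed invocation still works, which is why the live smoke comes first.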

In the code: an mjs adapter's Result

To make the middle tier concrete, here is the shape of a Council adapter. The detail that matters even at a glance: it never just returns text. It returns a Result — either a success carrying the answer, its cost and how long it took, or a failure that says what went wrong and whether it is worth retrying. That structure is what lets a board track its budget and write an honest audit trail.

scripts/council/adapters/codex-cli.mjs — invoke() returns a Result

import { existsSync } from 'node:fs';
// Shared helper; this path assumes _lib.mjs sits beside the adapter.
import { withRetry } from './_lib.mjs';
// CODEX_BIN_CANDIDATES, invokeSubprocess and invokeProxy are defined elsewhere in this file (not shown).

export async function invoke({
  memberId, prompt, system = '', model = 'gpt-5.5',
  maxTokens = 128000, reasoningEffort = 'high',
  runId = null, timeoutMs = 300_000,
} = {}) {
  // Hard failure: nothing to send, so do not retry.
  if (!prompt) return { success: false, error: 'prompt is required', retryable: false };

  if (process.env.COUNCIL_CODEX_USE_SUBPROCESS === '1') {       // optional CLI path
    const bin = CODEX_BIN_CANDIDATES.find(p => existsSync(p));  // first installed binary wins
    if (bin) {
      const r = await withRetry(() => invokeSubprocess({ bin, prompt, system, model, timeoutMs }),
        { adapter: 'codex-cli:exec', model, runId, maxAttempts: 2 });
      if (r.success) return r;                                  // otherwise fall through to the proxy
    }
  }
  // default: route via cliproxyapi, with retry/backoff
  return withRetry(() => invokeProxy({ prompt, system, model, maxTokens, reasoningEffort, timeoutMs }),
    { adapter: 'codex-cli', model, runId, maxAttempts: 3 });
}
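The two Result shapes can be captured with a pair of small constructors. The field names follow the success and failure contract given earlier in this lesson; the success, failure and summarize helpers are hypothetical and are not the helpers in _lib.mjs.

```javascript
// Hypothetical helpers that build the two Result shapes.
function success({ response, cost_usd, latency_ms, model_version_used, raw }) {
  return { success: true, response, cost_usd, latency_ms, model_version_used, raw };
}

function failure({ error, retryable, latency_ms, status_code, reason }) {
  const result = { success: false, error, retryable, latency_ms };
  if (status_code !== undefined) result.status_code = status_code; // optional field
  if (reason !== undefined) result.reason = reason;                // optional field
  return result;
}

// A caller branches on `success` before touching any other field.
function summarize(result) {
  return result.success
    ? `ok in ${result.latency_ms} ms, cost $${result.cost_usd.toFixed(4)}`
    : `failed (${result.retryable ? 'retryable' : 'final'}): ${result.error}`;
}

console.log(summarize(success({
  response: 'GO', cost_usd: 0.0123, latency_ms: 850, model_version_used: 'example-model', raw: {},
})));
console.log(summarize(failure({ error: 'rate limited', retryable: true, latency_ms: 12, status_code: 429 })));
```

Because a failure carries retryable and latency_ms instead of being thrown, a board can add the time to its budget and record the attempt in the audit log before deciding whether to try again.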

Where it lives and how to look

The adapters sit in scripts/council/adapters/ (a vendored copy also ships under loop-engineering-suite/vendor/council/engine/adapters/). Open the file and read the top-of-file comment block — each adapter documents why it chose HTTP vs subprocess and which flags are load-bearing. The success/failure Result contract is defined in _lib.mjs.

terminal — read the adapter and its shared contract

cd ~/Documents/Resources/scripts/council/adapters
# the Result helpers + retry/cost/audit live here:
sed -n '1,30p' _lib.mjs
# the README is the one source of truth for the invocations:
sed -n '1,40p' README.md

Notice withRetry(..., { adapter, model, runId, maxAttempts }) wrapping both paths: transient failures (a 5xx, a 429, a timeout) are retried with backoff; a hard failure (missing prompt, missing key) is returned immediately with retryable:false. That is the mjs tier's armor — the same call as bash, now safe to run unattended inside a board.
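The retry behavior described above can be sketched as follows. This is not the withRetry from _lib.mjs: retryWithBackoff, isRetryable, backoffMs, the delay values and the 'timeout' reason string are assumptions chosen to illustrate retry with exponential backoff.

```javascript
// Assumed classification: timeouts, HTTP 429 and 5xx are worth retrying.
function isRetryable({ status_code, reason } = {}) {
  return reason === 'timeout' || status_code === 429 ||
    (status_code >= 500 && status_code <= 599);
}

// Exponential backoff: baseMs, then 2x, 4x, and so on, capped at maxMs.
function backoffMs(attempt, baseMs = 500, maxMs = 8_000) {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// fn must resolve to a Result-like object: { success, retryable, ... }.
async function retryWithBackoff(fn, { maxAttempts = 3, baseMs = 500 } = {}) {
  let last;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    last = await fn(attempt);
    if (last.success || !last.retryable) return last; // success or hard failure: stop now
    if (attempt < maxAttempts) await sleep(backoffMs(attempt, baseMs));
  }
  return last; // still failing after the final attempt
}
```

An adapter would set retryable on each failure, for example with isRetryable({ status_code: 429 }), and let the loop decide. A missing prompt is returned on the first pass because its retryable flag is false.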

Who does what: humans, LLMs, agents

You have now met both ideas in this lesson — the board and the adapter family — so here is how the three kinds of participant use them, pulled together in one place. The split is the same one that runs through the whole harness.

A human convenes a board for the biggest calls — a doctrine change, a bet the project will live with — and then steps back to observability: they read the verdict, the confidence, and the preserved dissent in the audit trail, but they do not sit in the deliberation. An LLM working in the loop chooses the right instrument for the stakes: one assistant for routine work, fusion for a risky or ambiguous fork, Council when the answer becomes policy or must withstand scrutiny later. And an agent maintaining the suite guards the plumbing: when a CLI flag drifts, it re-proves the invocation at the real boundary and updates all three tiers — bash, mjs and TypeScript — so they never disagree.

Think of it like… a company's governance. The board members (humans) call the big votes and read the minutes afterward. The duty manager (the LLM) decides, hour to hour, which decisions need a quick huddle and which need the full board. And the facilities team (the agent) keeps the building's wiring to code — re-checking it after every change so nothing fails mid-meeting.

The one thing to remember

Council is fusion grown up: a weighted, tiered, audited board for doctrine-level calls and a consensus Validator. Both ride one adapter family — bash, mjs, TypeScript — that dials the same CLIs from a single proven source of truth. Match the ceremony to the stakes; keep the three tiers in step.

Human → convene the board, then observe
LLM → fusion vs Council by stakes
Agent → flag drifts → re-prove → update all three tiers

Quick check

Three quick questions.

Q1. When should you convene Council instead of a quick fusion panel?

Q2. What does a Council member's voting.weight do?

Q3. A CLI flag drifts. What does a maintaining agent do?


Your agent is your teacher. Want to see a real board's decision-report.json, or watch an adapter's smoke print a live Result? Ask it to open scripts/council/adapters/README.md and run node codex-cli.mjs with you. Next up — the everyday toolbelt the loop reaches for: Bright Data, Computer Use, and ultragoal.