Multi-agent · Step 7 · One question, many minds, one verdict

fusion: panel to judge

Sometimes one assistant's answer is not enough: the stakes are high, or two good options pull in opposite directions. Fusion puts the very same question to several assistants at once, each working blind to the others, and then has one model read all the answers and write a single grounded verdict: what they agree on, where they clash, and what only one of them saw. It is the loop's way of getting a high-confidence answer when one voice is too risky to trust alone.


The big idea: ask many, judge once

Most of this course has one assistant doing the work, with a second one checking it. Fusion is for the moments when that is not enough — when you want a high-confidence answer to a hard question, or two good options are pulling in opposite directions and someone has to settle it.

The move is simple to say. Take your question and send the exact same question to several assistants at the same time — a panel. Each one answers blind: it never sees what the others wrote, so nobody can copy, defer to a louder voice, or drift toward the group. Then one capable model — the judge — reads every answer side by side and writes a single verdict: where the panel agrees, where it openly disagrees, what only one member noticed, and a final answer that rests on all of it.

Why bother fanning out at all? Because independent answers fail in independent ways. If four assistants reach the same conclusion on their own, that agreement is real evidence. If they split, the split itself is the finding — it tells you exactly where the question is genuinely hard, instead of one assistant papering over the doubt with a confident-sounding paragraph. You get both the answer and a map of how trustworthy it is.

One discipline holds the whole thing together, and it is the heart of this lesson: independence first, synthesis second. The panelists are not handed contrived roles ("you be the optimist, you be the skeptic") — that just manufactures disagreement that was never there. Each one simply answers the real question as well as it can, on its own. The judge only steps in after all the independent answers are on the table.

Think of it like… a panel of doctors reading the same scan in separate rooms. None of them hears the others' opinion first, so no one is swayed by the most senior voice or the loudest one. Then a lead physician collects all the written reads: where every doctor agrees, that is solid; where two disagree, that is flagged for a closer look; if one spotted something the rest missed, that gets weighed too. The final call is grounded in the whole panel — not in whoever happened to speak first. Where the analogy breaks: the doctors are people with schedules, while fusion's panelists answer in parallel in seconds, so asking five instead of one costs you almost no extra time.

The panel, concretely

A full fusion run fans one prompt to N panelists in parallel — in the default roster that is Opus 4.8 alongside codex (GPT-5.5), kimi, grok, and agy (Gemini 3.1 Pro). Each runs as its own process with no shared context, so the answers are genuinely independent. When every panelist has returned, Opus 4.8 takes the judge seat and adjudicates the set into one structured verdict.

Independence-then-synthesis, not contrived lenses

The panelists are not assigned opposing personas. Manufactured "devil's advocate" roles produce disagreement that is an artifact of the prompt, not a signal about the question. Fusion keeps each answer honest and independent, then lets real overlaps and real conflicts emerge under the judge. Agreement that survives independence means something; disagreement that survives it tells you exactly where to dig.

Why this beats one strong answer

A single model — however good — has one failure mode at a time: one blind spot, one bias, one stale fact. Fanning out trades that single point of failure for a distribution. The judge's grounded final is anchored in what the panel collectively saw, and its blind-spots note flags what none of them covered, so you also learn the limits of the answer.

Fusion in one picture

The whole shape fits on one line of flow. One question fans out to a panel that answers in parallel and blind; all the answers fan back in to a single judge; the judge writes one grounded verdict. Fan-out, then fan-in.

Read left → right: one prompt fans out to a blind panel, the answers fan back in to one judge, the judge emits a single grounded verdict.

[Diagram labels: same prompt to all → each panelist blind → answers in parallel → one judge synthesizes → grounded final]

Blind first, judged second

The order is everything. Fusion runs in two clean phases, and the value comes from keeping them apart.

Phase one — independence. The same question goes to every panelist, and each answers alone. No panelist sees another's draft. There are no assigned personas, no "you argue the other side" — every member just answers the real question as well as it can. This is what makes the result trustworthy later: when answers line up, it is because they independently arrived at the same place, not because they copied each other.

Phase two — synthesis. Only once all the blind answers are in does the judge read them together and reconcile them into one verdict. The judge is not a sixth opinion thrown onto the pile; its job is to compare — to find the overlaps, name the conflicts, and decide what the grounded answer actually is.

Why not let them talk to each other and converge on their own? Because that destroys the independence you paid for. The moment one assistant sees another's confident answer, it tends to anchor on it, and five voices quietly collapse into one. Keeping them blind preserves five genuinely separate reads — which is the whole point of asking more than one.

The rule that makes fusion work

Independence first, synthesis second. Real agreement is earned by answering blind; manufactured disagreement (contrived roles) is worthless. Let overlaps and conflicts emerge on their own, then judge them.

Process isolation, not a polite instruction

Each panelist runs in its own invocation with its own context window. Blindness is a property of how they are launched — separate processes, same prompt, no shared transcript — not a sentence in the prompt asking them to "please ignore the others". That is why fusion fans out in parallel rather than running a round-table: parallelism and isolation are the same mechanism here.

What the judge is actually given

The judge receives the original question plus the full set of panelist answers, labeled by source. It is told to synthesize — not to re-answer from scratch. Its output is structured (the categories in the next section), so a reader can see the shape of the agreement, not just a final paragraph. The judge is also expected to ground the final: where a claim needs an external fact, it is confirmed (via the Bright Data CLI, lesson 11), never asserted from memory.
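As a rough illustration of "the original question plus the full set of answers, labeled by source", the sketch below assembles a judge input from one file per panelist. The function name `build_judge_input`, the `--- ANSWER FROM: ... ---` delimiter, and the instruction wording are assumptions made for this example; the course does not show the real judge prompt.

```shell
# Hypothetical helper: build the judge's input from <answers_dir>/<member>.txt.
build_judge_input() {
  question=$1
  answers_dir=$2
  printf 'QUESTION:\n%s\n\n' "$question"
  printf 'Synthesize the panel answers below. Do not re-answer from scratch.\n'
  for f in "$answers_dir"/*.txt; do
    [ -e "$f" ] || continue              # glob matched nothing: no answers yet
    member=$(basename "$f" .txt)         # the file name is the source label
    printf '\n--- ANSWER FROM: %s ---\n' "$member"
    cat "$f"
  done
}

# Demo with two stand-in answers (paraphrased, not real model output).
demo=$(mktemp -d)
printf 'Retry only transient errors, with capped backoff.\n' > "$demo/opus.txt"
printf 'On 429, honour Retry-After.\n' > "$demo/agy.txt"
build_judge_input "Should the upload client retry failed uploads?" "$demo"
```

Labeling each answer by file name keeps the source visible to the judge without giving any panelist sight of the others.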

What the judge produces

The judge does not just pick a favourite answer. It sorts the panel's reads into a small, fixed set of buckets, and then writes a final answer that sits on top of them. Those buckets are what make a fusion verdict more useful than any single reply — they show you the structure of the answer.

Consensus — what every panelist agreed on, independently. This is the bedrock; treat it as the most reliable part of the answer.
Contradictions — where panelists openly disagree. The judge does not hide these; a real conflict is a signal that this part of the question is genuinely hard.
Partial — points some members raised and others simply did not address — agreement that is suggestive but not unanimous.
Unique — something only one panelist saw. Often the most valuable line in the whole run: the insight you would have missed by asking just one assistant.
Blind spots — what none of them covered. The judge names the gap so you know the limits of the answer, not just its content.
Grounded final — the single answer, built from all of the above, with any external facts confirmed rather than assumed.

The judge sorts the blind answers into five buckets, then writes one grounded final on top of them.
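To make three of the buckets concrete, here is a deliberately mechanical sketch: it assumes each panelist's answer has already been reduced to one normalized claim per line, then counts how many panelists made each claim. The real judge is a model comparing meaning, not strings, and contradictions and blind spots cannot be found by counting, so treat `bucket_claims` and the file layout as hypothetical teaching aids only.

```shell
# Hypothetical: <dir>/<panelist>.txt holds one normalized claim per line.
bucket_claims() {
  dir=$1
  set -- "$dir"/*.txt              # one file per panelist
  n=$#                             # panel size
  for f in "$@"; do
    sort -u "$f"                   # a panelist counts once per claim
  done | sort | uniq -c | awk -v n="$n" '
    {
      count = $1
      sub(/^ *[0-9]+ /, "")        # strip the count that uniq -c prepends
      if (count == n)      bucket = "consensus"
      else if (count == 1) bucket = "unique"
      else                 bucket = "partial"
      print bucket "\t" $0
    }'
}

# Demo: three stand-in panelists answering the retry question.
demo=$(mktemp -d)
printf '%s\n' 'retry transient errors with capped backoff' \
  'never retry a plain 4xx' 'make the upload idempotent' > "$demo/p1.txt"
printf '%s\n' 'retry transient errors with capped backoff' \
  'never retry a plain 4xx' > "$demo/p2.txt"
printf '%s\n' 'retry transient errors with capped backoff' \
  'never retry a plain 4xx' 'make the upload idempotent' \
  'on 429 honour Retry-After' > "$demo/p3.txt"
bucket_claims "$demo"
```

In the demo, the two claims every panelist made land in consensus, the idempotency claim (two of three) lands in partial, and the Retry-After claim lands in unique.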

How to actually use each bucket

Consensus is your safe ground — act on it with confidence. Contradictions are where you slow down: the panel is telling you this sub-question is unsettled, so either gather more evidence or make the trade-off explicitly. Unique insights deserve a second look precisely because they survived only one mind — they are easy to dismiss and often the most valuable. Blind spots protect you from a confident answer that quietly skipped something important.

Why a structured verdict beats a single paragraph

A lone answer flattens certainty and doubt into one tone of voice. The bucketed verdict keeps them separate, so you can see which parts of the answer to trust and which to probe. That is the real product of fusion: not just an answer, but a calibrated answer.

For code: run both, merge the verified result

When the question is "answer this", fusion judges words. When the question is "write this code", fusion does something stronger: it does not just read the two candidate solutions and guess which is better — it runs both at the real boundary (the tests, the build, the actual output) and merges the result that actually worked. The verdict is grounded in execution, not in how confident either candidate sounded.

This is the same Proof-Gate discipline from earlier in the course, applied to a panel: a candidate's merit is decided by whether it passes the real check, not by the prose around it. Two strong drafts go in; one verified, working result comes out.

$ how a code fusion is invoked (full panel)

# Ask the same coding task to the panel, judged by Opus 4.8
/fusion-3 "Implement retry-with-backoff for the upload client; keep the public API."

# Fusion fans the prompt to the live roster, then for CODE candidates it
# RUNS both and keeps the one the boundary accepts.
# (apply and run_tests are placeholders for your own patch and test commands)
for cand in candidate_a candidate_b; do
  if apply "$cand" && run_tests; then   # the real boundary, not a claim
    echo "verified: $cand"              # only verified candidates reach the merge
  fi
done
# → judge merges the VERIFIED result into one grounded patch

The human entry point

You invoke a full panel with /fusion-3 — Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro in parallel, judged by Opus 4.8. There are lighter single-seat variants too (/fusion-opus4.8, /fusion-gpt5.5) when you want one specific model rather than the whole board. For prose, the judge returns the bucketed verdict; for code, it returns a merged patch whose pieces have each been run.

Why "run both" and not "pick the better-looking one"

Two candidates can both look correct and only one compiles, passes the edge-case test, or returns the right bytes. Reading them cannot tell you which — only the boundary can. Fusion therefore treats each candidate as a hypothesis to be executed, and the merge is assembled from the parts that actually passed. That is what "grounded" means for code: verified, never asserted.
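The point that two candidates can look equally correct until the boundary runs them can be shown with a small, self-contained sketch. Everything here is hypothetical: `verify_candidates`, the two toy candidate scripts, and the `run_tests` check stand in for real patches and a real test suite, and the merge step itself is left to the judge.

```shell
# Hypothetical: run every candidate through the same check and report PASS or FAIL.
verify_candidates() {
  check=$1; shift
  for cand in "$@"; do
    if "$check" "$cand" >/dev/null 2>&1; then
      printf 'PASS %s\n' "$cand"
    else
      printf 'FAIL %s\n' "$cand"
    fi
  done
}

# Two toy candidates for "double the input". Both return 4 for input 2.
work=$(mktemp -d)
printf 'echo $(( $1 * 2 ))\n' > "$work/candidate_a.sh"
printf 'echo $(( $1 + 2 ))\n' > "$work/candidate_b.sh"

# The boundary: a second input separates the candidate that only looked right.
run_tests() {
  [ "$(sh "$1" 2)" = 4 ] && [ "$(sh "$1" 5)" = 10 ]
}

verify_candidates run_tests "$work/candidate_a.sh" "$work/candidate_b.sh"
```

Only `candidate_a.sh` passes, even though both agree on the first input. Reading the two one-liners side by side is exactly the kind of comparison where a confident guess could go either way.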

Where this sits in the loop

A code fusion is most useful at a hard fork inside EXECUTE, or as a Validator at the Proof Gate when a single agent's patch is too risky to trust on its own. The independence of the candidates plus the execution check is exactly the kind of evidence a high-stakes unit deserves.

Try it: assemble a verdict

See it move. Here is the same fusion run, frozen at the moment all four blind answers have landed and the judge is about to write the verdict. Switch between the panelists to read each independent answer, then open the judge's view to watch those answers sort into consensus, a contradiction, a unique catch, and a grounded final. The diagram on the left highlights whose answer you are reading.

The question on the table: "Should the upload client retry failed uploads, and if so how?"

Reading Opus's blind answer — one of four independent reads feeding the judge.

Blind panelist

Opus

Yes — retry, but only on transient network errors, with exponential backoff and a cap. Never retry a 4xx; that just repeats a client mistake.

+ Retry on timeouts / 5xx, with backoff and a max attempt count.
+ Make the upload idempotent so a retry can't double-create.
! Do not retry 4xx: the request itself is wrong.

Blind panelist

agy · Gemini 3.1 Pro

Yes — and honour the server's Retry-After header on a 429 instead of using your own backoff. The server is telling you exactly how long to wait.

+ Backoff with a cap on transient errors.
★ On 429, read Retry-After and wait that long (only one panelist raised this).
+ Idempotency key so retries are safe.

Judge · Opus 4.8

Grounded verdict

All four answered blind and converged on the core, with one open conflict and one catch worth keeping. Here is how they sort:

Consensus: Retry only transient errors (timeouts / 5xx), with capped exponential backoff. Never retry a plain 4xx. All four, independently.

Contradiction: How to treat a 429, own-backoff vs special-case. Genuinely unsettled; flagged, not hidden.

Unique catch: agy alone. On 429, honour the server's Retry-After. Resolves the contradiction, so adopt it.

Grounded final: Capped backoff with jitter on transient errors; on 429 honour Retry-After; idempotent upload; public API unchanged. For code, both candidates were run against the tests and the passing pieces merged.
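For readers who want to see the grounded final as logic rather than prose, here is a minimal sketch of the retry decision it describes. The function name `retry_delay`, the limits (6 attempts, 1 s base, 10 s cap), and the choice of "full jitter" are assumptions for illustration, not the patch the panel produced. It also handles only the delay-in-seconds form of Retry-After; the header may instead carry an HTTP date, which this sketch does not parse.

```shell
# retry_delay STATUS ATTEMPT [RETRY_AFTER_SECONDS]
# Prints how many seconds to wait before the next try, or "stop".
retry_delay() {
  status=$1; attempt=$2; retry_after=${3:-}
  max_attempts=6; base=1; cap=10         # assumed limits, tune for your client
  if [ "$attempt" -ge "$max_attempts" ]; then echo stop; return; fi
  case $status in
    429)                                 # the server says how long to wait
      if [ -n "$retry_after" ]; then echo "$retry_after"; return; fi ;;
    4??) echo stop; return ;;            # client mistake: retrying repeats it
    5??|timeout) ;;                      # transient: fall through to backoff
    *) echo stop; return ;;              # success or unknown: nothing to retry
  esac
  delay=$base; i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2)); i=$((i + 1))   # exponential growth per attempt
  done
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
  # Full jitter: a random whole number of seconds between 0 and delay.
  awk -v d="$delay" 'BEGIN { srand(); print int(rand() * (d + 1)) }'
}

retry_delay 404 1        # stop
retry_delay 429 1 7      # 7
retry_delay 503 3        # some value from 0 to 4
```

A 429 without a usable Retry-After value falls through to the ordinary backoff path, which is one reasonable reading of the verdict rather than something the panel specified.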

Anatomy

The controls are a role="tablist" of real <button> elements — four blind panelists plus the judge. Selecting one sets aria-selected="true", shows the matching role="tabpanel", lights the corresponding node in the inline SVG, and rewrites the figcaption so screen-reader and sighted users stay in sync. Arrow keys / Home / End move between roles (the WAI-ARIA tabs pattern).

What it is teaching

Notice the shape of the real result: the panelists overlap on the safe core (consensus), split on exactly one sub-question (the 429 contradiction), and one of them alone supplies the fact that resolves it (the unique catch). That is the everyday payoff of asking many and judging once — and you would have missed the Retry-After insight by asking a single assistant.

The live roster, and the bash tier

Who actually sits on the panel is not hard-coded. Before a run, a small script looks at this machine and asks: which of the candidate assistants are installed and reachable right now? Only those make the panel. If grok isn't set up on your box, fusion simply runs the panelists you do have — the roster is discovered, not assumed, the same look-before-you-leap discipline from the rest of the course.

Once the roster is picked, each panelist is launched the same humble way the crew was in the last lesson: a small shell script per assistant that takes the prompt, calls that assistant's command line, and hands back its answer. Same prompt in, one answer out, no shared context — that is what keeps them blind.

detect_panel.sh — pick the live roster, then run each panelist blind

# 1) detect_panel.sh: keep only the assistants actually installed here
panel=$(detect_panel.sh)          # e.g. "opus codex kimi grok agy"

# 2) run_*.sh: one bash-tier wrapper per assistant, same prompt, blind
mkdir -p answers                  # the output directory must exist before the redirects
for member in $panel; do          # unquoted on purpose: split the roster on spaces
  "run_${member}.sh" "$PROMPT" > "answers/${member}.txt" &   # parallel · isolated
done
wait                              # block until every panelist has returned

# 3) the judge (Opus 4.8) reads answers/*.txt and writes the verdict

detect_panel.sh — the roster picker

detect_panel.sh probes the machine for each candidate assistant (is its CLI on the PATH, does it authenticate) and emits the list of live members. That list is the panel for this run. Because it is detected per machine, the same /fusion-3 command adapts to whatever crew you actually have — no panelist is ever assumed present.
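The actual detect_panel.sh is not shown in this lesson, so the following is only a guess at its simplest possible shape: keep every candidate whose command resolves on the PATH. It assumes each member's CLI has the same name as the member, and it skips the authentication probe entirely, since that check is specific to each assistant's CLI.

```shell
# Hypothetical roster picker: print the candidates whose CLI is on the PATH.
detect_panel() {
  live=""
  for member in "$@"; do
    if command -v "$member" >/dev/null 2>&1; then
      live="${live:+$live }$member"      # space-separated, no leading space
    fi
  done
  printf '%s\n' "$live"
}

# Demo with commands that exist on most systems, plus one that should not.
detect_panel sh no-such-assistant-cli ls
```

On a typical system the demo prints `sh ls`: the missing command is silently dropped, which mirrors how a panelist that is not installed simply never joins the run.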

run_*.sh — the bash tier of the adapter family

The run_*.sh scripts are the shell tier of the same adapter family you will meet again with Council (lesson 8). Each one is a thin wrapper: take the prompt, invoke one assistant headlessly, return the text. Running them in parallel with isolated output files is exactly what makes the panel both fast and blind. The judge then consumes those files and synthesizes — it is handed the answers, never the live panelists.
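One thing a bare `wait` does not tell you is which panelist failed. The sketch below is one possible way to fan out while tracking each background job, so that a crashed or unauthenticated panelist is dropped instead of handing the judge an empty file. `fan_out` and the stub `run_alpha` / `run_beta` functions are hypothetical stand-ins for the real run_*.sh wrappers.

```shell
# fan_out PROMPT OUTDIR MEMBER...
# Launch every member in parallel, then keep only the answers that succeeded.
fan_out() {
  prompt=$1; outdir=$2; shift 2
  mkdir -p "$outdir"
  pids=""
  for member in "$@"; do
    "run_${member}" "$prompt" > "$outdir/${member}.txt" 2> "$outdir/${member}.err" &
    pids="$pids $member:$!"              # remember which PID belongs to whom
  done
  failed=""
  for entry in $pids; do
    member=${entry%%:*}; pid=${entry##*:}
    if ! wait "$pid"; then               # wait returns that job's exit status
      failed="$failed $member"
      rm -f "$outdir/${member}.txt"      # never hand the judge a broken answer
    fi
  done
  [ -z "$failed" ] || printf 'dropped:%s\n' "$failed" >&2
}

# Stub panelists: one answers, one fails.
run_alpha() { printf 'answer to: %s\n' "$1"; }
run_beta()  { echo "auth expired" >&2; return 1; }

answers=$(mktemp -d)
fan_out "Should we retry?" "$answers" alpha beta
ls "$answers"
```

Each member still gets the same prompt and its own output file, so the isolation described above is unchanged; the only addition is bookkeeping about who actually returned.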

Who reaches for fusion, and when

Three kinds of participant touch fusion, each for a different reason — and they map onto the three audiences this whole course keeps in view.

A person runs /fusion-3 when they want a high-confidence answer to a question that matters — a design call, a tricky bug, a "which of these two approaches is right" — and a single assistant's reply feels too thin to bet on. The bucketed verdict tells them not just the answer but how solid it is.

An orchestrator LLM reaches for fusion as a tool, not a chat. It uses a panel as a Validator at the Proof Gate when a unit is too important to sign off on one opinion, or to adjudicate a hard fork — two plausible paths, real disagreement — by getting independent reads and a grounded ruling instead of flipping a coin. Crucially, the model that judges should not be the lone author of what's being judged; fusion's independence is what makes it a fair check.

The agents — the assistants themselves — do the humble part. detect_panel.sh picks whichever of them are live on this machine, and the run_*.sh wrappers each take the prompt and return one blind answer. They don't coordinate and they don't see each other; they just answer well and hand the result back for the judge to weigh.

At the Proof Gate

When a unit's correctness is high-stakes, an orchestrator can route it to a fusion panel instead of a single validator. Independent reads plus — for code — actually running the candidates gives the gate far stronger evidence than one agent's say-so. The builder of the unit is never the sole judge of it; that separation is the same rule that protected the Proof Gate throughout the course.

Adjudicating a hard fork

When the loop hits a genuine fork — two defensible designs, a real disagreement between agents — fusion turns the standoff into data: who agrees, where the conflict actually is, and what the grounded ruling is. It replaces "argue until someone gives up" with "ask independently, then judge".

Recap

Fusion is the loop's high-confidence answer machine. Fan the same question to a panel of assistants in parallel, each blind so their answers stay independent; then one judge — Opus 4.8 — reads them all and writes a single grounded verdict: consensus, contradictions, partial agreement, unique catches, blind spots, and a final answer built on top. For code, it doesn't guess between candidates — it runs both and merges what actually passed.

Carry this forward

Independence first, synthesis second. Real agreement is earned by answering blind; real disagreement is a map of where the question is hard. Humans run /fusion-3 for a high-confidence answer; an orchestrator uses it as a Validator at the Proof Gate or to settle a hard fork; detect_panel.sh picks the live roster and the run_*.sh scripts are the bash tier that launches each panelist.

Your agent is your teacher. Want to feel the payoff? Ask your assistant to run /fusion-3 on a real decision you're chewing on and read the verdict bucket by bucket — notice where the panel agrees, where it splits, and whether anyone caught something the rest missed. Next, we widen the panel into a standing, configurable board with its own roles and weights: Council and the adapter family.
