Before any of the moving parts make sense, one idea has to land: the loop-engineering-suite is a harness — a verify-and-improve loop that drives any task to a measurable finish, then ships and publishes a course about it. It runs the same way no matter which AI model or command-line tool is doing the work, and three different audiences read it at once: the human who watches, the model that runs the method, and the agents that are the runtime.
A harness is the gear that lets you control something powerful safely — the rig on a climber, the frame around a test engine. The loop-engineering-suite is exactly that, but for getting work done with AI. You wrap it around a task — write some code, fix a bug, draft a prompt, produce a document — and it carries that task from a rough request all the way to a finished, double-checked result.
It is not one tool, and it is not tied to one AI. It is a way of working: a small machine with a single heartbeat. Every beat, it does four things in order. It looks at how things really stand right now. It picks one next move — not ten. It makes that one move. Then it proves the move worked by checking the real thing — running the actual test, opening the actual page — never by simply declaring success. Then it beats again.
The heartbeat keeps going until the task crosses a finish line you wrote down before starting: a plain, measurable done-when ("the test passes", "the page loads under one second"). The moment it crosses that line, the harness does two more things on its own — it delivers a self-contained visual course that explains what it built (a course exactly like the one you are reading), and it publishes that course so you get a link you can share.
That last habit is why this lesson exists. The whole suite is built to be understood, not just run — so the natural place to start is the idea that one disciplined loop, plus a written finish line, plus a course at the end, is the entire shape of the thing. Everything else in this course is detail hung on that frame.
Think of it like… a pit crew running a race car through repeated laps. Each lap: check the car, change exactly one thing — one tyre, not four guesses at once — send it back out, then read the lap time off the timing board to prove the change helped. Nobody trusts a mechanic who says "that felt faster"; they trust the clock. The race ends at a lap target set before the green flag. That is the harness: look, change one thing, measure on the real clock, repeat to a finish line you set in advance.
The suite is model- and agent-agnostic. The unit of work is one pass of a five-step cycle: LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE (improve the artifact, or improve the prompt that drives it, then loop). The loop only stops when a written, measurable done-when is satisfied — never on a feeling that it is probably finished.
"Verify at the real boundary" is the load-bearing phrase: done is established by evidence produced where the work actually meets reality (a real test run, a live HTTP response, a build that compiles), never by a model asserting it in prose. That checkpoint is called the Proof Gate, and it is the spine of the whole method — lesson 3 is devoted to it.
Crossing the done-when fires two standing deliverables: a visual-teach course (the human-readable result, a self-contained HTML file with inline CSS, SVG, and JS — no build step) and a Publish step (a private gist, optionally a hosted page). Delivering and publishing are not optional polish; they are part of what "done" means here.
Here is the entire idea on one canvas. One inner loop drives the work; three audiences sit around it, each with a different job. The loop repeats until the done-when is met, then the result fans out into a delivered course and a published link.
What makes this harness unusual is that three very different readers are always present at once, and the system is designed so that each has a clearly separated job. Keeping those jobs apart is not bureaucracy — it is exactly what keeps the work honest. Here is who they are and what each one does.
The human observes and decides — and, crucially, is not in the path of the work. When the harness runs, it runs on its own; your job is to watch the timing board and make only the calls a person should make. You watch through two plain text files the loop keeps updating: LOOP-LOG.md, a running diary of every pass (what it looked at, the one move it picked, whether the proof passed), and review.md, a quality write-up of what is solid and what is shaky. You step in for exactly one kind of moment — a decision that is irreversible, outward-facing, or about business intent (money moving, something shipping publicly, a direction that cannot be cheaply undone). For that, the harness pauses and hands you a clear, decision-ready question. Everything else, it handles without you.
The model — the LLM driving a pass — runs the method and honours the gates. Its discipline is the product: it looks before it leaps, picks exactly one bounded move, makes it, and then proves it at the real boundary — running the actual thing and reading the actual result, never a claim, never a mock, never a green sticker it placed on its own work. After any change that touches the path it just proved, it proves it again, because the state it read a moment ago may already be stale. And it never sits idle: each pass either advances the work or improves the prompt that drives it. When it genuinely cannot reach the real boundary, or the next move is a human-only call, it does not fake progress and it does not block forever — it stops and hands back a decision-ready question.
The agents are the runtime — the hands that actually turn the wrenches. The method is run by a model, but the work happens inside concrete command-line tools: Claude Code, Codex, Kimi, Grok, and others, including a Council of several models voting together. An orchestrator hands one bounded unit at a time to one of these agents through its headless command-line mode, so different passes can run on whichever agent is best or available for that move. The whole harness — its skills and instructions — is installed into every one of them, so the same way of working is present no matter which tool picks up the next unit. One rule is sacred and runs through the entire suite: the agent that checks a result is never the agent that built it.
Think of it like… a race team. The crew chief only watches the timing board and makes the big calls — pit now, or push on (the human). The rulebook everyone follows — one change per lap, always measured on the clock — is the discipline (the model). The mechanics with the tools are the ones actually under the car, and a different inspector signs off the work than the one who did it (the agents). Same team, three roles, never blurred — which is exactly why nothing slips through.
The human role is AFK (away-from-keyboard) observability. You do not execute units and you are never handed a QA task — validation is done by the harness, through an agent that is not the one that built the thing. You are pulled in only at a genuine user-only fork, surfaced as a decision-ready handoff. The files you read are LOOP-LOG.md (the per-pass diary) and review.md (the QA observability report); a status view summarizes progress. You consume; you do not run.
# LOOP-LOG · search-handler hardening

pass 007
  LEARN    read services/search/handler.py + test output
  ANALYZE  bucket=correctness · pick=empty-query guard (1 unit)
  EXECUTE  added guard + 1 test
  VERIFY   real run: pytest -k empty_query → PASS ✓ proof at boundary
  DECIDE   improved artifact · 2 done-when criteria left → loop

pass 008
  LEARN    re-read state (repo moved since 007)
  ...
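To make the diary format concrete, here is a minimal sketch of how a pass entry like the sample above could be rendered and appended. The PassRecord type, the format_pass and append_pass helpers, and the exact column layout are hypothetical illustrations; the suite's real log writer may look quite different.

```python
from dataclasses import dataclass


@dataclass
class PassRecord:
    """One loop pass, with a short note for each of the five steps (hypothetical type)."""
    number: int
    learn: str
    analyze: str
    execute: str
    verify: str
    decide: str


def format_pass(record: PassRecord) -> str:
    """Render one pass as a header line followed by aligned step lines."""
    steps = [
        ("LEARN", record.learn),
        ("ANALYZE", record.analyze),
        ("EXECUTE", record.execute),
        ("VERIFY", record.verify),
        ("DECIDE", record.decide),
    ]
    lines = [f"pass {record.number:03d}"]
    # Pad each step name to 8 characters so the notes line up in a column.
    lines += [f"  {name:<8}{text}" for name, text in steps]
    return "\n".join(lines)


def append_pass(log_path: str, record: PassRecord) -> None:
    """Append the rendered pass to the log file, creating the file if missing."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_pass(record) + "\n\n")
```

Appending rather than rewriting keeps the log a running diary: earlier passes are never edited, so a human reading it later sees the history as it actually happened.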
One pass = one bounded unit. Run LEARN → ANALYZE → EXECUTE (one) → VERIFY → DECIDE; never batch units, never idle, and re-run the proof after any change to the proven path. If the boundary is unreachable, stop and hand back a decision-ready question; never claim a pass, and never simulate the check in your head and report it as passed.
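The rule about re-running the proof after any change to the proven path can be sketched as a staleness check. This is only one possible approach, assuming the proven path is a known set of files; the fingerprint function and ProofRecord class are hypothetical names, not part of the suite.

```python
import hashlib
from pathlib import Path


def fingerprint(paths: list[Path]) -> str:
    """Hash the names and contents of the proven files, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(str(path).encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


class ProofRecord:
    """Remembers what the proven path looked like when the proof passed."""

    def __init__(self, paths: list[Path]) -> None:
        self.paths = list(paths)
        self.fingerprint_at_proof = fingerprint(self.paths)

    def is_stale(self) -> bool:
        # Any byte change since the proof means the proof must be re-run.
        return fingerprint(self.paths) != self.fingerprint_at_proof
```

A loop using this would record a ProofRecord right after a successful check and consult is_stale() at the start of the next pass, instead of trusting a result it read a moment ago.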
The orchestrator delegates one bounded unit to an installed agent through that agent's headless command-line mode (-p for Claude Code, Kimi, and Grok; exec for Codex), after a preflight (command -v or a panel-detect step) so that only live agents are used. The Validator is always a different agent from the builder; that separation is what makes the Proof Gate trustworthy. These are the proven invocations (flags verbatim):
# Claude Code: JSON out
claude -p "<one bounded unit>" --output-format json

# Codex: quiet exec
codex exec --quiet "<one bounded unit>"

# Kimi: text out (never -p WITH --yolo; no --work-dir)
kimi -p "<one bounded unit>" --output-format text

# Grok: plain out, auto-approve, explicit cwd
grok -p "<one bounded unit>" --output-format plain --always-approve --cwd .
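The preflight and the builder/validator split can be sketched in a few lines. This is an illustration under assumptions: shutil.which stands in for command -v, the binary names are taken from the invocations above, and live_agents and pick_builder_and_validator are hypothetical helpers rather than the orchestrator's real interface.

```python
import shutil

# Binary names taken from the invocations above; the order is an arbitrary preference.
KNOWN_AGENTS = ("claude", "codex", "kimi", "grok")


def live_agents(candidates=KNOWN_AGENTS, which=shutil.which) -> list[str]:
    """Return the candidates whose executable is found on PATH."""
    return [name for name in candidates if which(name) is not None]


def pick_builder_and_validator(agents: list[str]) -> tuple[str, str]:
    """Choose two distinct agents: one builds, a different one validates."""
    if len(agents) < 2:
        raise RuntimeError(
            "need at least two live agents so the validator differs from the builder"
        )
    return agents[0], agents[1]
```

Passing which as a parameter keeps the preflight testable without any of the agents installed. Refusing to continue with fewer than two live agents is a design choice of this sketch: it makes the separation rule impossible to skip silently.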
The same skills are copied or symlinked into every agent (Claude Code reads them as /loop-engineering & /fusion-*; Codex/Kimi/Grok run headless; Cursor/Gemini/Aider/OpenCode/Crush/Goose read the SKILL.md). The runtime differs; the method is identical.
If three audiences are the who, four gates are the rules of the game — the checkpoints a run must clear so that "finished" is a fact rather than an opinion. You will meet each one in depth later; here is the shape, so the word "gate" is never mysterious when it comes up.
The Scope Gate comes first: the loop refuses to start churning until there is a written, measurable done-when. No finish line, no race. The Proof Gate is the heart of everything: a pass only counts as progress when the change is proven at the real boundary — a real run, a real response — not asserted. The Course Gate fires on convergence: when the work is done, a visual-teach course explaining it must exist. And the Publish Gate ships that course outward — a private gist, optionally a hosted page — so the result is shareable, not trapped on one machine.
Think of it like… the checkpoints on a marathon course. You cannot start without a registered finish line (Scope). Your time only counts if you actually cross the timing mats, not if you say you ran fast (Proof). At the end you collect a printed result certificate (Course). And the result is posted publicly, not whispered to you alone (Publish). Skip a mat and the run simply does not count.
An LLM can produce fluent, confident prose describing a success that never happened. The Proof Gate exists precisely to neutralize that failure mode: the only thing that advances a pass is an observation taken where the artifact meets reality — the exit code of a real test, the bytes of a real HTTP response, a build that actually compiles. A claim, a mock, or a self-graded check is rejected. Lesson 3 covers the Proof Gate in full.
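As a rough illustration of evidence taken at the boundary, the sketch below runs a real command and lets only its exit code decide. The Proof type and this version of verify_at_boundary are hypothetical; the 60-second timeout is an arbitrary assumption.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Proof:
    """What was actually observed when the command ran (hypothetical type)."""
    command: tuple[str, ...]
    exit_code: int
    output: str

    @property
    def passed(self) -> bool:
        # Only the observed exit code decides; no prose claim is consulted.
        return self.exit_code == 0


def verify_at_boundary(command: list[str], timeout_s: float = 60.0) -> Proof:
    """Run the real command and record what actually happened."""
    done = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    return Proof(tuple(command), done.returncode, done.stdout + done.stderr)


if __name__ == "__main__":
    ok = verify_at_boundary([sys.executable, "-c", "assert 1 + 1 == 2"])
    bad = verify_at_boundary([sys.executable, "-c", "assert 1 + 1 == 3"])
    print(ok.passed, bad.passed)  # True False
```

In a real run the command would be the actual test invocation for the unit just built. Keeping the captured output alongside the exit code means a failed proof carries its own evidence into the next pass.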
Scope is a precondition (you cannot loop without a done-when). Proof runs every pass. Course and Publish are convergence triggers. Together they turn "I think it is done" into "here is the evidence, and here is the link".
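The ordering of the four gates can be written down as a small state check. This is a sketch under assumptions: the RunState fields and the next_gate function are hypothetical names chosen for illustration, and the real suite may track this state differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gate(Enum):
    SCOPE = "scope"
    PROOF = "proof"
    COURSE = "course"
    PUBLISH = "publish"


@dataclass
class RunState:
    done_when: str = ""          # written, measurable finish line
    done_when_met: bool = False  # set only from evidence at the boundary
    course_built: bool = False
    published: bool = False


def next_gate(state: RunState) -> Optional[Gate]:
    """Return the gate the run must clear next, or None when all are cleared."""
    if not state.done_when.strip():
        return Gate.SCOPE    # precondition: no finish line, no loop
    if not state.done_when_met:
        return Gate.PROOF    # checked on every pass until convergence
    if not state.course_built:
        return Gate.COURSE   # convergence trigger
    if not state.published:
        return Gate.PUBLISH  # convergence trigger
    return None
```

For example, next_gate(RunState()) returns Gate.SCOPE, because a run with no written done-when is not allowed to start looping.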
The suite is a handful of named parts that snap together. You meet each one properly later in the course; here is the map, so the names are already familiar when they arrive.
loop-engineering is the core loop itself — the heartbeat from section 1. Forge is the front-end you reach for when the request is a rough idea rather than a spec; it turns a vague "I want…" into a measurable scope in seven steps. ultragoal is the discipline behind a durable, verifiable goal — the GOAL.md the loop tests against. visual-teach is the engine that builds the course you are reading right now. brightdata-cli is how the harness gets real, current web evidence instead of guessing from memory; computer-use-cli is how it drives native macOS apps when a task lives in a desktop program. fusion fans one hard question out to a panel of models in parallel and has a judge pick the best answer; Council is the heavyweight board — multiple models with defined roles and votes — for the biggest calls. The adapter family is the one shared piece of wiring that lets all of these reach the same underlying CLIs. And the publish gates are what ship the finished course outward at the end.
Takeaway
One loop, four gates (Scope · Proof · Course · Publish), three audiences (human observes, model runs the method, agents are the runtime), and a named set of parts. Hold those four facts and the rest of the course is just detail filling in around them.
Now watch the pieces work together as a single pipeline: a rough idea on the left, a published course on the right. The diagram shows the flow, and the snippet at the end is the shape of one proven pass.
You hand over a few sentences and, some time later, a finished, checked deliverable plus a shareable link comes back. In between, the request passes through five stages: the rough idea arrives, Forge sharpens it into a measurable scope, the loop grinds through passes, the Proof Gate demands real evidence on each pass, and the verified result is delivered and published.
Think of it like… commissioning a custom part. You describe what you want (idea). A shop foreman writes the exact spec and tolerances (Forge). Machinists make it in measured passes (loop). An inspector puts it on the gauge — a different person than the maker (proof). Then it is boxed with its certificate and shipped (deliver & publish).
A vague prompt enters Forge (grill → research → prototype → PRD → issues + /goal GOAL.md). The compiled, measurable scope feeds the loop, which dispatches one bounded unit per pass to an agent via cli -p. Each pass must clear the Proof Gate at the real boundary. On convergence, visual-teach builds the course (Course Gate) and the Publish Gate ships it.
Idea: A human describes what they want in a few sentences, with no spec and no acceptance criteria yet. Nothing has been built; this step just delivers the intent to a machine that knows how to sharpen it.
Forge: Forge interrogates the idea (grill), optionally researches and prototypes, then compiles a PRD, issues, and a GOAL.md with a measurable done-when. The fuzzy "I want…" becomes a finish line the loop can test against, which is the Scope Gate being satisfied.
Loop: Each pass reads the real state, picks one bounded unit, and executes it, dispatched to an agent via cli -p. One change at a time, never a batch of guesses, repeating until the done-when is satisfied.
Proof: Before a pass counts as progress, the change is proven by running the real thing and reading the real result. A claim is not proof; a mock is not proof. A different agent from the builder does the checking. This is the Proof Gate.
Deliver and publish: When the done-when is met, visual-teach builds a self-contained course (the Course Gate) and the Publish Gate ships it as a private gist, optionally a hosted page, handing the human a finished result and a link.
Here is the shape of one pass — read it top to bottom and you will recognize every stage from the diagram. This is the heartbeat the harness repeats until done.
# one bounded unit, five steps, proven at the boundary
def one_pass(scope, agent):
    state = learn(scope)                    # LEARN · read the real state
    if meets(state, scope.done_when):
        return deliver_and_publish(scope)   # Course + Publish gates
    unit = analyze(state, scope)            # ANALYZE · pick exactly ONE
    result = agent.run(unit)                # EXECUTE · via cli -p
    proof = verify_at_boundary(result)      # VERIFY · run the real thing
    if not proof.passed:                    # Proof Gate: evidence, not claims
        return retry_or_handoff(proof)
    return decide(scope, result)            # DECIDE · improve · loop again
From any installed agent, start the harness on a task with /loop-engineering; for a rough idea, start with Forge instead. Watch progress in LOOP-LOG.md and read the QA write-up in review.md — you observe, the loop runs.
The Validator (verify_at_boundary) is always a different agent than the builder (agent.run) — that separation is what makes the Proof Gate trustworthy. The publish step runs node ~/.claude/skills/loop-engineering/publish-course-gist.js <courseDir> and reports the gist URL.
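To see the builder/validator separation in motion, here is a small runnable driver with both roles passed in as plain callables. It is a toy sketch, not the suite's orchestrator: run_loop, the max_passes budget, and the stand-in done-when of "every unit proven" are all assumptions made for illustration.

```python
from typing import Callable


def run_loop(
    units: list[str],
    build: Callable[[str], str],
    validate: Callable[[str], bool],
    max_passes: int = 20,
) -> list[str]:
    """Drive units one at a time; a unit counts only once validate() accepts it."""
    if build is validate:
        raise ValueError("the validator must not be the builder")
    log: list[str] = []
    pending = list(units)
    for number in range(1, max_passes + 1):
        if not pending:              # stand-in done-when: nothing left to prove
            break
        unit = pending[0]            # ANALYZE: exactly one unit per pass
        result = build(unit)         # EXECUTE: the builder does the work
        if validate(result):         # VERIFY: a separate callable checks it
            pending.pop(0)
            log.append(f"pass {number}: {unit} proven")
        else:
            log.append(f"pass {number}: {unit} failed proof, retrying")
    # Either the work converged or the budget ran out and a human is asked.
    log.append("converged" if not pending else "handoff: pass budget exhausted")
    return log
```

The pass budget is what keeps a failing unit from looping forever: when it runs out, the driver reports a handoff instead of claiming success. In practice build and validate would each wrap a different agent's headless invocation.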
You now have the frame the whole course hangs on. The loop-engineering-suite is a harness: look, change one thing, prove it at the real boundary, repeat to a written finish line — then deliver and publish a course. It is model- and agent-agnostic, so the same method runs on any capable AI and any command-line tool. And it is read by three audiences whose jobs never blur: the human observes and decides, the model runs the method and honours the gates, the agents are the runtime that turns the wrenches.
The one rule
Nothing is "done" because a model said so. It is done when the real boundary says so — and then the result is taught and published, not just claimed. Everything else in this course is how that promise is kept.