Step 2 · Fundamentals · The harness · Loop Engineering
Module 1 · Fundamentals · How the work actually moves

The loop: one bounded unit per pass

Loop engineering does not do the whole job in one giant leap. It does it in passes — small, repeating turns where each turn looks at reality, makes exactly one change, and checks that change against something real before the next turn begins. This lesson walks through one full pass, beat by beat, so you can read what is happening when the harness is running.

1

The big idea: small turns, not one giant leap


Imagine you ask the harness to fix a bug, build a feature, or polish a piece of writing. It will not try to do the whole thing in one heroic burst and then hand you a finished pile and hope it works. Instead it works in passes. A pass is one short, complete turn of the same six-step cycle, and the harness keeps taking passes until the job is provably finished.

Every pass follows the exact same shape: look at where things really stand, decide what the single most important next move is, make just that one move, check it against something real, and decide what to fix next — then loop. The discipline is the point: one change at a time, each one checked before the next begins. That is what keeps a long, unattended run from drifting into a mess.

Think of it like climbing a ladder in the dark. You do not leap for the top — you cannot even see it. You feel for the next rung, put your weight on it to make sure it holds, and only then reach for the one after. Each rung is a pass: a small move you actually test before trusting it. Skip the testing and rush three rungs at once, and the first weak one drops you all the way down.

The cycle, named

One pass is: LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE → (loop). SCOPE sits before the very first pass and defines what "done" means; it is the contract every later pass is measured against. The loop repeats until the scope's done-when conditions are all met at a real boundary, not in the model's imagination.
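The control flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the harness's implementation: `run_loop`, `PassRecord`, and the `max_passes` safety cap are hypothetical names introduced here, and each beat is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names here are illustrative assumptions, not the harness's real API.
State = Dict[str, int]
DoneWhen = Dict[str, Callable[[State], bool]]  # condition name -> check on real state


@dataclass
class PassRecord:
    number: int
    unit: str
    verified: bool


def run_loop(
    done_when: DoneWhen,                  # SCOPE: fixed before the first pass
    learn: Callable[[], State],           # LEARN: read the real state
    analyze: Callable[[State], str],      # ANALYZE: pick exactly one unit
    execute: Callable[[str], None],       # EXECUTE: apply that one unit
    verify: Callable[[str], bool],        # VERIFY: check at the real boundary
    max_passes: int = 50,                 # safety cap (an assumption of this sketch)
) -> List[PassRecord]:
    log: List[PassRecord] = []
    for number in range(1, max_passes + 1):
        state = learn()
        if all(check(state) for check in done_when.values()):
            break                         # every done-when observed true: stop
        unit = analyze(state)
        execute(unit)
        log.append(PassRecord(number, unit, verify(unit)))
        # DECIDE is implicit here: the next LEARN re-reads reality
        # and the next ANALYZE re-picks a single unit.
    return log


# Toy usage: "done" means the counter reaches 3; each pass adds exactly 1.
world = {"count": 0}
history = run_loop(
    done_when={"count is at least 3": lambda s: s["count"] >= 3},
    learn=lambda: dict(world),
    analyze=lambda s: "increment count",
    execute=lambda unit: world.update(count=world["count"] + 1),
    verify=lambda unit: world["count"] > 0,
)
```

The termination test runs against freshly learned state at the top of every pass, so the loop can only exit on an observed condition, never on a remembered one.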

Why a loop at all

A long task done in one shot accumulates unverified assumptions: by the end, an error early on has silently shaped everything after it. Bounding each pass to one verified change turns a fragile monologue into a sequence of checkpoints. If pass 7 breaks something, you know it was pass 7 — the previous six were each proven good at their own boundary.

2

It starts with "done"


Before the loop turns even once, there is a contract: what does "done" actually mean here? Not a vibe — a checkable condition. "Done" is not "the tests feel like they pass" or "it looks better." It is something you could hand to a stranger who would agree, just by looking, whether you hit it or not. "The login page returns in under 300 ms for 95% of requests." "All 14 unit tests pass and the build is green." "The article reads at a 9th-grade level and every claim has a source."

This is the one place where you, the human, are firmly in charge. You set the target — the measurable done-when — or you approve the one the harness proposes. Everything the loop does afterward is in service of that target, so if the target is fuzzy, the whole run is fuzzy. A sharp, measurable "done" is the single highest-leverage thing you provide.

Think of it like a finish line painted on the track. Runners can pace themselves, sprint, or coast — but nobody argues about who won, because the line is right there on the ground. A measurable "done" is that painted line. Without it, every pass is a runner asking "are we there yet?" and getting a different answer each time.

done-when is the spec

In the harness, scope is captured as a small set of done-when conditions — each one observable at a boundary (a test, a build, an HTTP response, a file diff, a rendered page). The next lesson covers the gates that enforce them; for now the key idea is that the loop has a fixed thing to aim at, written down before any change is made.
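One way to picture done-when conditions is as named checks over observations taken at a boundary. The sketch below is an assumption about shape only: the dictionary keys, the sample numbers, and the helper `share_under` are made up for illustration, and the conditions echo the examples given earlier in this lesson.

```python
from typing import Callable, Dict, List


def share_under(samples_ms: List[float], limit_ms: float, percent: int) -> bool:
    """True when at least `percent`% of samples are below `limit_ms`."""
    fast = sum(1 for s in samples_ms if s < limit_ms)
    return 100 * fast >= percent * len(samples_ms)   # integer math, no rounding


# Observations would come from a real boundary (load test, test runner, build).
# These values are made-up sample data.
observed = {
    "latencies_ms": [120.0] * 19 + [450.0],   # 20 requests, one slow outlier
    "tests_passed": 14,
    "tests_total": 14,
    "build_exit_code": 0,
}

done_when: Dict[str, Callable[[dict], bool]] = {
    "login under 300 ms for 95% of requests":
        lambda o: share_under(o["latencies_ms"], 300, 95),
    "all 14 unit tests pass":
        lambda o: o["tests_passed"] == o["tests_total"] == 14,
    "build is green":
        lambda o: o["build_exit_code"] == 0,
}

results = {name: check(observed) for name, check in done_when.items()}
is_done = all(results.values())
```

Each condition returns a plain boolean from observed data, which is what lets a stranger agree on whether it was hit.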

For a raw, vague ask, the front-end Forge exists precisely to turn "make it better" into measurable done-when conditions before the loop starts (you will meet Forge in Module 2). Either way, the rule is the same: no measurable done, no loop.

3

The six beats of a single pass


Here is one full pass, in order. Read it as a story — each beat hands its result to the next.

LEARN — look first. The pass opens by reading reality: the current state of the code or document, the scope, and any trusted sources. It inspects the real artifacts rather than guessing what they probably contain. Starting from a guess is the cheapest way to ruin a whole pass.

ANALYZE — name the gap, then pick ONE. With reality in hand, it compares where things are against where "done" says they should be, and buckets the gap into candidate moves. Each candidate gets a quick rating — Fit, Risk, Proof, Blocker, Next — and from that ranking it picks exactly one unit of work to do this pass. Not three. One.

EXECUTE — do just that one thing. It makes the single chosen change and nothing else. The temptation to "while I'm here, also fix…" is exactly what the loop refuses, because a pass that changes five things can't tell you which one broke.

VERIFY — check it for real. It then tests the change at a real boundary — runs the test, builds the project, loads the page, diffs the file — and looks at the actual result. Not "this should work." Proof.

DECIDE — improve, then loop. Based on what verification showed, it decides what to fix next: usually the artifact itself, but sometimes the instructions driving the work. Then it starts a fresh pass at LEARN. The cycle repeats until every done-when condition is met.

Think of it like a careful cook tasting as they go. Look at the pot (LEARN), decide the dish needs salt and only salt (ANALYZE → pick one), add a pinch (EXECUTE), taste it (VERIFY), then decide the next single adjustment (DECIDE). A cook who dumps in salt, pepper, lemon, and chili all at once and tastes at the end has no idea what to change — and neither would the loop.

Each beat names a contract

LEARN reads state from the real boundary (filesystem, git, running process, trusted docs) — never from stale memory. ANALYZE produces a ranked list and selects a single bounded unit. EXECUTE applies that one unit. VERIFY runs the check at the boundary and records the observed result. DECIDE chooses the next target — artifact or prompt — and re-enters the loop. SCOPE's done-when is the termination condition.

The detail beats expand later

LEARN, ANALYZE, EXECUTE and VERIFY each get fuller treatment across the course; the gates that make VERIFY trustworthy are the whole of Lesson 3. This lesson's job is the shape: that these beats run in this order, once per pass, every pass.

4

The loop in one picture


The same six beats, drawn as the cycle they form. Scope sets the target once; then the five inner beats turn, again and again, with verification as the gate that decides whether you advance or loop back to fix.

[Figure: the loop engineering cycle. SCOPE (done-when) feeds into a repeating cycle of LEARN (read the real state), ANALYZE (rate, pick ONE), EXECUTE (one bounded unit), VERIFY (real boundary), and DECIDE (artifact or prompt), which loops back to LEARN; the cycle exits when done-when is met.]
One pass = LEARN → ANALYZE → EXECUTE one unit → VERIFY → DECIDE. The dashed arrow is the loop back; the green exit fires only when every done-when condition is met at a real boundary.
5

ANALYZE: bucket the gap, rate it, pick ONE


ANALYZE is where a pass earns its focus. Looking at reality usually surfaces several things that could be done — a failing test here, a missing edge case there, a rough sentence, a slow query. The loop does not attack all of them. It first buckets the gap (groups what's missing into a short list of candidate moves), then gives each candidate a quick rating, and picks the single best one to do this pass.

The rating asks five plain questions about each candidate:

Fit: does it move us toward "done"?
Risk: how likely is it to break things?
Proof: can we verify it at a real boundary?
Blocker: is anything else stuck behind it?
Next: is it the natural next step?

The winner is the one unit with the best balance: high Fit, manageable Risk, something you can actually prove, ideally a Blocker that frees up later work. Everything else waits for a future pass. This is the rule that keeps the loop honest — one bounded unit per pass, chosen on purpose, not whatever is most tempting.

Think of it like triage in an emergency room. Five patients arrive at once. The nurse does not treat all five at half-speed; they rate each by urgency and what can actually be helped right now, and the most critical one goes first. The others are not forgotten — they are next in line. ANALYZE is that triage nurse for the work.

Fit / Risk / Proof / Blocker / Next

The five axes are a fast, repeatable rubric rather than a heavy scoring model. Proof is decisive: a candidate that cannot be verified at a real boundary this pass is deprioritised, because an unverifiable change can't safely close. Blocker captures dependency order — picking the unit that unblocks three others is usually higher-leverage than a flashy but isolated change.
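The rubric can be sketched as a small ranking function. The 0 to 3 scales, the field names, and the way the axes are combined are assumptions chosen for illustration; the lesson only fixes the five axes and the rule that Proof is decisive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    fit: int            # 0-3: how far it moves toward "done"
    risk: int           # 0-3: how likely it is to break things (lower is better)
    provable: bool      # Proof: verifiable at a real boundary this pass?
    unblocks: int       # Blocker: how many other candidates wait on it
    natural_next: bool  # Next: is it the natural next step?


def pick_one(candidates: List[Candidate]) -> Candidate:
    """Rank the candidates and return exactly one unit for this pass."""
    def rank(c: Candidate) -> Tuple[bool, int, bool]:
        # Tuple order encodes priority: Proof first, so unverifiable work
        # always sorts below anything that can be proven this pass.
        return (c.provable, c.fit + c.unblocks - c.risk, c.natural_next)
    return max(candidates, key=rank)   # raises ValueError on an empty list


candidates = [
    Candidate("handle expired-token retry", fit=3, risk=1, provable=True,
              unblocks=1, natural_next=True),
    Candidate("refactor the whole module", fit=3, risk=3, provable=False,
              unblocks=2, natural_next=False),
    Candidate("rename a variable", fit=1, risk=0, provable=True,
              unblocks=0, natural_next=False),
]
chosen = pick_one(candidates)
```

Returning a single `Candidate` rather than a sorted list is deliberate: the function cannot be used to batch.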

"Bounded" is the operative word

A bounded unit is one whose effect and verification both fit inside a single pass — small enough to execute and check before the next LEARN. "Refactor the whole module" is not bounded; "extract this one function and keep the tests green" is. If a candidate is too big to bound, the right move is often to pick the smaller unit that splits it.

6

Why exactly one unit — never batch, never idle


The single most important rule of the loop is the one that sounds almost too simple: do exactly one bounded thing per pass. Not zero. Not five. One.

Why not five? Because if you change five things and then verify, and the result is wrong, you cannot tell which of the five caused it. You have lost the thing the loop is for — the ability to point at a single change and say "this one is proven good." Batching trades a moment of apparent speed for a debugging swamp later.

Why not zero? Because a pass that looks, thinks, and then does nothing is wasted motion. The loop never idles: every pass either ships one verified change or hits a real blocker it surfaces clearly. "I'll wait and see" is not a pass; it is a stall, and over a long unattended run, stalls are how progress quietly dies.

Think of it like surgery, one incision at a time. A surgeon does not make five cuts at once "to save time," and they do not stand frozen over the patient either. Each deliberate action is made, checked, and only then is the next one taken. The patient — your codebase, your document — survives precisely because nothing happens that isn't one controlled, verified move.

Bisectability is the payoff

One-unit passes make a run trivially bisectable: every checkpoint isolates exactly one change against its verification. A regression introduced at pass N is attributable to pass N by construction — there is no "which of these edits did it?" because each pass made one edit and proved it. This is what allows a long autonomous run to stay debuggable.
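Because each pass leaves one checkpoint, locating a regression reduces to a binary search over checkpoints. The sketch below assumes a hypothetical list of per-pass checkpoints and a caller-supplied `is_good` check, and it assumes that once a regression appears it persists in later checkpoints.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_bad_pass(
    checkpoints: Sequence[T],
    is_good: Callable[[T], bool],
) -> Optional[int]:
    """Return the 1-based pass number whose checkpoint first fails `is_good`.

    Returns None when every checkpoint is good. Assumes a regression,
    once introduced, is still present in all later checkpoints.
    """
    lo, hi = 0, len(checkpoints)        # search window over indices [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(checkpoints[mid]):
            lo = mid + 1                # regression is after mid
        else:
            hi = mid                    # regression is at or before mid
    return lo + 1 if lo < len(checkpoints) else None


# Eight passes; the regression enters at pass 7 (made-up data).
states = ["ok"] * 6 + ["broken"] * 2
culprit = first_bad_pass(states, lambda s: s == "ok")
```

With one change per pass, the returned pass number names the change itself, so no further narrowing is needed.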

Never idle is a liveness rule

The discipline cuts both ways. An LLM driving the loop must convert each pass into either a shipped, verified unit or an explicitly surfaced blocker — never a no-op, never "let me think about it" with no output. Idle passes burn budget and erode observability, since the human watching the log can no longer tell whether progress is happening.
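The liveness rule can be expressed as a classification every pass must fall into. This is a sketch with hypothetical names (`Outcome`, `classify_pass`); it treats a failed verification as a blocker, matching how the log panel later in this lesson resolves units.

```python
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    SHIPPED = "shipped"   # one change made and verified
    BLOCKED = "blocked"   # a blocker was surfaced, or verify failed
    IDLE = "idle"         # nothing changed and nothing surfaced


def classify_pass(
    changed: bool,
    verify_passed: bool,
    blocker: Optional[str] = None,
) -> Outcome:
    if blocker is not None:
        return Outcome.BLOCKED
    if changed:
        return Outcome.SHIPPED if verify_passed else Outcome.BLOCKED
    return Outcome.IDLE


def idle_count(outcomes: List[Outcome]) -> int:
    """The number a supervisor expects to stay at zero."""
    return sum(1 for o in outcomes if o is Outcome.IDLE)


run = [
    classify_pass(changed=True, verify_passed=True),
    classify_pass(changed=False, verify_passed=False, blocker="needs API key"),
    classify_pass(changed=False, verify_passed=False),   # a stalled pass
]
```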

7

VERIFY at the real boundary


A change isn't done because it looks done. It's done because you checked it against reality and reality agreed. That check is VERIFY, and it always happens at a real boundary — the actual place where the truth lives. If the unit was a code fix, you run the test and read the result. If it was a build change, you build the project. If it was a page, you load the page. If it was a sentence, you re-read it in context.

What VERIFY refuses is the comfortable lie: claiming success from memory, or trusting a mock that always says yes. The harness has a hard rule here — verify by actually running the check, never by simulating it in your head and declaring victory. A pass that "should pass" hasn't passed. Only the boundary gets to say so.

Think of it like a smoke detector versus your own nose. You might smell nothing and feel sure the house is fine. The detector is the real boundary — it doesn't care how confident you are; it samples the actual air. The loop trusts the detector, not the hunch. That's the difference between "I think it works" and "it works."

Proof Gate, in brief

This is the heart of what Lesson 3 calls the Proof Gate: every claimed completion must be backed by observed output from the real boundary — a test runner's exit code, a build log, an HTTP status, a rendered diff. "Claim" and "mock" are explicitly not evidence. A unit only advances when its done-when condition is observed true, not asserted true.

The boundary depends on the artifact

Boundaries are concrete and varied: a unit/integration test, a compiler, a linter, a running server hit with a request, a screenshot, a file diff reviewed against the spec, or live web evidence pulled via a tool. The skill is choosing a boundary that actually exercises the change — a test that doesn't touch the changed path proves nothing.
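For a command-line boundary, "observed, not asserted" can be as simple as running the check and recording its exit code and output. The sketch below uses Python's standard `subprocess.run`; the `Evidence` type and `verify_at_boundary` name are hypothetical, and the two commands are stand-ins so the example runs anywhere. A real pass would run the project's own check, such as the test command shown later in this lesson.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    command: List[str]
    exit_code: int
    output: str

    @property
    def passed(self) -> bool:
        return self.exit_code == 0


def verify_at_boundary(command: List[str], timeout_s: float = 120.0) -> Evidence:
    """Run the real check and record what was observed."""
    done = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )
    return Evidence(command, done.returncode, done.stdout + done.stderr)


# Stand-in checks: one that succeeds, one that fails.
ok = verify_at_boundary([sys.executable, "-c", "assert 1 + 1 == 2"])
bad = verify_at_boundary([sys.executable, "-c", "assert 1 + 1 == 3"])
```

Keeping the raw output alongside the exit code means the log can show why a check failed, not just that it did.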

8

DECIDE: improve the artifact — or the prompt — then loop


Verification just told you the truth about your one change. DECIDE is what you do with that truth. Usually the answer is straightforward: pick the next single improvement to the artifact — the next bug, the next missing piece — and start a fresh pass at LEARN.

But there's a subtler, powerful move. Sometimes the thing that needs fixing isn't the artifact at all — it's the instructions driving the work. If pass after pass keeps missing the same way, the smartest change might be to sharpen the prompt or the scope itself, so the next pass aims better. The loop is allowed to improve itself, not just its output. Then, either way, it loops — and keeps looping until every done-when condition is met.

Think of it like a GPS rerouting. After each stretch of road it checks where you actually are against where you should be. Usually it just says "continue" (improve the artifact). But if you keep drifting off course, it doesn't repeat the same wrong turn louder — it recalculates the whole route (improve the prompt). Same destination, smarter directions.

Two surfaces of improvement

DECIDE can target either editable surface: the artifact (the code, doc, or design under construction) or the prompt/scope (the instructions and done-when guiding the loop). Repeated near-misses on the same axis are the classic signal to improve the prompt rather than grind the artifact — a cheap meta-correction that re-aims every subsequent pass.
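The "repeated near-misses on the same axis" signal can be sketched as a simple counter. The threshold of three and the function name `decide_target` are assumptions made for illustration; the lesson does not specify how many misses should trigger a prompt change.

```python
from collections import Counter
from typing import List, Optional


def decide_target(recent_misses: List[Optional[str]], threshold: int = 3) -> str:
    """Choose which surface to improve next.

    `recent_misses` holds one entry per recent pass: the axis on which
    verification missed, or None when the pass verified cleanly.
    Returns "prompt" once any single axis has missed `threshold` times,
    otherwise "artifact".
    """
    counts = Counter(axis for axis in recent_misses if axis is not None)
    if counts and max(counts.values()) >= threshold:
        return "prompt"
    return "artifact"


steady = decide_target([None, "token expiry", None])
drifting = decide_target(["token expiry", "token expiry", None, "token expiry"])
```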

Convergence and termination

The loop terminates when the scope's done-when conditions are all observed true at their boundaries — that is convergence. If a pass surfaces a blocker that needs a human decision, the loop pauses at a clearly-marked handoff rather than guessing. Otherwise it keeps taking passes; "improve until convergence" is the literal stop rule.

9

Watching passes go by in LOOP-LOG.md


While the loop runs — often for hours, unattended — you don't sit and drive it. You watch it. The harness writes a running record, LOOP-LOG.md, that turns the invisible passes into something you can read at a glance: how many passes have happened, how many units actually shipped, how often verification passed, and whether anything is blocked. This is your observability window. You set "done"; the log lets you confirm the loop is honestly converging on it.

The panel below is that window, made live. The four tiles up top are the loop's vital signs; the table is one row per unit of work, each with a badge for where it stands. Hit "Run a pass" to advance the loop by one turn, or flip on "Auto-loop" to watch passes tick by the way they would during a real AFK (away-from-keyboard) run.

LOOP-LOG.md: auth-refresh goal
one bounded unit per pass · verify at the real boundary

Status: Converging, passes healthy (pass 6 · just now)

Passes run: 6 (+1 this turn)
Units shipped: 4 / 6 (+1 verified)
Verify pass-rate: 83% (+5 pts)
Idle passes: 0 (held at zero)

Units this run (one bounded unit per pass)
Columns: Unit of work | State | Fit | Last verify (real boundary)

One model, two views

A single list of units drives both the table and the rollup banner. Each pass advances exactly one unit — moving a queued unit into "verifying", or resolving a "verifying" unit into "shipped" (verify passed) or "blocked" (verify failed at the boundary). The vital-sign tiles are recomputed from that list every pass: passes run, units shipped, the rolling verify pass-rate, and idle passes held at zero. The overall banner reads the worst unit: any blocker turns it amber, otherwise it reports healthy convergence.

Why these four numbers

Passes run is the loop's heartbeat. Units shipped over passes is its true throughput — verified work, not attempts. Verify pass-rate is the honesty signal; a falling rate says the prompt may need improving (back to DECIDE). Idle passes must stay at zero — the moment it climbs, the loop is stalling rather than progressing. Reading these four is exactly how a human supervises an AFK run without touching it.
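The four tiles and the banner can be recomputed from plain records. This sketch assumes hypothetical `Unit` and `PassResult` shapes and uses made-up sample data; it is one way the rollup described above could be derived, not the panel's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Unit:
    name: str
    state: str                      # "queued" | "verifying" | "shipped" | "blocked"


@dataclass
class PassResult:
    unit: str
    verify_passed: Optional[bool]   # None means the pass produced nothing (idle)


def rollup(units: List[Unit], passes: List[PassResult]) -> Dict[str, Union[int, str]]:
    verified = [p for p in passes if p.verify_passed is not None]
    passed = sum(1 for p in verified if p.verify_passed)
    return {
        "passes_run": len(passes),
        "units_shipped": sum(1 for u in units if u.state == "shipped"),
        "units_total": len(units),
        "verify_pass_rate": round(100 * passed / len(verified)) if verified else 0,
        "idle_passes": len(passes) - len(verified),
        # The banner reads the worst unit: any blocker turns it amber.
        "banner": "amber" if any(u.state == "blocked" for u in units) else "healthy",
    }


units = [Unit(f"unit-{i}", "shipped") for i in range(1, 5)]
units += [Unit("unit-5", "blocked"), Unit("unit-6", "queued")]
passes = [PassResult(f"unit-{i}", True) for i in range(1, 6)]
passes.append(PassResult("unit-5", False))
tiles = rollup(units, passes)
```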

10

Who does what: you, the LLM, the agents


The loop is one cycle, but three different kinds of actor touch it, each with a clear job.

You, the human, own the edges. You set or approve the measurable done-when at the start, and then you mostly watch — reading LOOP-LOG.md to confirm the run is converging honestly. You step back in only when the loop surfaces a real decision it shouldn't make alone. You are the supervisor, not the driver.

The LLM is the engine of a pass. It runs the six beats and obeys the iron rule: exactly one bounded unit per pass — never batch several changes, never idle on a no-op. Each pass it produces either one verified change or a clearly-surfaced blocker. That discipline is what makes its long, unattended output trustworthy.

The agents are how passes get spread across tools. A single pass can be handed to a different command-line model than the last one — one pass driven by one CLI, the next by another — so the strongest tool for each step does that step. The orchestration layer dispatches a pass to whichever agent fits, and the log keeps it all legible.

Think of it like a film set. The director (you) sets the vision and watches the monitor, but doesn't operate the camera. Each shot (pass) is taken by whichever specialist crew is right for it — and the call sheet (the log) lets the director see every take without standing behind every lens.

Dispatching a pass to a different CLI

In a multi-agent run, the orchestrator can dispatch any single pass to a specific command-line agent in headless mode — conventionally cli -p "<the one bounded unit for this pass>". Because each pass is bounded and independently verified, it does not matter that pass 4 ran on one model and pass 5 on another; the boundary check is what certifies the result, not the identity of the engine. A roster of agents is chosen up front, and the validator of a pass is never the same agent that produced it.
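The two constraints in that paragraph, a roster chosen up front and a validator that is never the producer, can be sketched as follows. The rotation scheme and the agent names are assumptions for illustration, and `cli` stays a placeholder exactly as in the text; only the `-p "<unit>"` convention comes from the lesson.

```python
from typing import List, Tuple


def assign_agents(roster: List[str], pass_number: int) -> Tuple[str, str]:
    """Pick (producer, validator) for a pass by rotating through the roster.

    Assumes the roster holds unique agent names. With at least two of
    them, the validator is always a different agent from the producer.
    """
    if len(set(roster)) < 2:
        raise ValueError("need two distinct agents so no pass validates itself")
    producer = roster[pass_number % len(roster)]
    validator = roster[(pass_number + 1) % len(roster)]
    return producer, validator


def dispatch_command(cli: str, unit: str) -> List[str]:
    """Build the headless invocation `cli -p "<unit>"` as an argument list."""
    return [cli, "-p", unit]


roster = ["agent-a", "agent-b", "agent-c"]   # hypothetical agent names
producer, validator = assign_agents(roster, pass_number=4)
command = dispatch_command("cli", "EXECUTE one unit: handle expired-token retry")
```

Building the command as an argument list rather than a shell string avoids quoting problems when the unit description itself contains quotes.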

The human stays out of the inner loop

Everything inner-loop runs AFK (away-from-keyboard): the human has observability, not a steering wheel. The only place a run blocks for a person is a deliberate, decision-ready handoff. Watching LOOP-LOG.md (and, later, a review file) is the supervision surface; it never requires the human to execute a step themselves.

11

In the code: one pass, written down


None of this is abstract. A pass leaves a trail you can read. Here is the kind of entry the loop appends to LOOP-LOG.md after a single pass. Notice that it records every beat, with the scope's done-when restated in the learn line: what it learned, the one unit it picked and why, what it did, the real verification it ran, and the decision for next time.

LOOP-LOG.md — appended after pass 7
## pass 7  — 2026-06-14 09:41
learn    read src/auth/refresh.ts + 14 tests; scope done-when = "all auth tests green"
analyze  gap bucketed into 3 candidates; rated Fit/Risk/Proof/Blocker/Next
        picked → "handle expired-token retry"  (Fit:high Risk:low Proof:test Blocker:yes)
execute  edited refresh.ts: retry once on 401, then surface error  (1 unit, no batching)
verify   $ npm test -- auth/refresh
        ✓ 14 passing  (real boundary: test runner exit 0)
decide   artifact ok; next unit = "rotate refresh token on success" → loop

How to find & run this yourself

When the harness drives a goal, the log lives at the root of the working directory. To watch passes arrive in real time, tail it:

your terminal
$ cat LOOP-LOG.md            # read the whole run so far
$ tail -f LOOP-LOG.md         # follow new passes as they land
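Beyond tailing, the entry format shown above is regular enough to summarise with a short script. The parser below is a sketch that assumes every entry follows that layout (a `## pass N` heading, then beat lines starting with `learn`, `analyze`, `execute`, `verify`, or `decide`, with indented continuation lines); the function name and the trimmed sample text are illustrative.

```python
import re
from typing import Dict, List, Optional

PASS_HEADING = re.compile(r"^## pass (\d+)\b")
BEATS = ("learn", "analyze", "execute", "verify", "decide")


def parse_log(text: str) -> List[Dict[str, str]]:
    """Split log text into one dict per pass: {"pass": "7", "learn": ..., ...}."""
    entries: List[Dict[str, str]] = []
    beat: Optional[str] = None
    for line in text.splitlines():
        heading = PASS_HEADING.match(line)
        if heading:
            entries.append({"pass": heading.group(1)})
            beat = None
            continue
        if not entries or not line.strip():
            continue                       # skip preamble and blank lines
        first, _, rest = line.partition(" ")
        if first in BEATS:
            beat = first
            entries[-1][beat] = rest.strip()
        elif beat is not None:             # indented continuation of a beat
            entries[-1][beat] += " " + line.strip()
    return entries


SAMPLE = (
    "## pass 7\n"
    "learn    read src/auth/refresh.ts + 14 tests\n"
    "verify   $ npm test -- auth/refresh\n"
    "        ✓ 14 passing\n"
    "decide   artifact ok\n"
)
parsed = parse_log(SAMPLE)
```

To run it against a live log, pass the file's contents, for example `parse_log(open("LOOP-LOG.md", encoding="utf-8").read())`.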

A pass dispatched to a specific command-line agent is launched in headless mode with the one bounded unit as its prompt:

orchestrator — dispatch one pass
$ cli -p "EXECUTE one unit: handle expired-token retry in refresh.ts; verify with: npm test -- auth/refresh"

The exact agent behind cli can differ pass to pass — what certifies the result is the verification at the boundary, not which engine ran it.

The takeaway

Every pass is one bounded unit, learned from reality, executed alone, and proven at a real boundary before the loop turns again. That is the whole engine: small verified steps, repeated until "done" is observably true.

12

Quick check


Three quick questions. Try answering each from memory before scrolling back: retrieval beats re-reading.

Q1. During ANALYZE, the loop finds five things that could be fixed. How many does it do this pass?

Q2. What does "VERIFY at the real boundary" rule out?

Q3. In an AFK run, what is the human's main job once the loop is going?

Your agent is your teacher. Want to see a real pass on your own repo, or curious how VERIFY decides what counts as proof? Ask it to run one bounded unit and show you the log. Next up — what makes that verification trustworthy: The gates and the Proof Gate.