The loop never declares victory on its own say-so. Four gates stand between a task and "shipped" — Scope, Proof, Course, and Publish — and one of them is the heart of the whole method: the Proof Gate. It accepts only real-boundary evidence — you actually ran it and hit the real path — never a claim, a mock, or a green assertion you wrote yourself.
A gate is a checkpoint with one rule: you do not pass until a specific condition is true — and the condition is checked, not assumed. The loop has four of them, and together they are the contract for what "done" means. Nobody — not a person in a hurry, not an AI feeling confident — gets to wave a task through. Either the gate's condition is met, or the work is not done. Full stop.
The four gates, in the order a piece of work meets them: Scope asks "is 'done' written down as something we can actually measure?" Proof asks "did we run it for real and watch it work?" Course asks "is there a clear, self-contained lesson explaining the result?" And Publish asks "is the result out where it can be found and used?" Each one is a yes/no you can point at — never a vibe.
Why bother turning "done" into four hard checks? Because "done" is the most over-claimed word in any project. It is cheap to say something works and expensive to show it. The gates exist so the expensive thing — showing — is the only thing that counts. The middle gate, Proof, is where most of the danger lives, so most of this lesson lives there too.
Think of it like… the locks of a canal. A boat can't just float from one water level to the next because it feels ready — each lock has to physically fill before the gate opens, in order, one at a time. The lock doesn't care how confident the captain is; it cares whether the water is actually level. The gates of the loop are those locks: the water has to really rise before the next gate opens.
Every pass of the loop is LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE. The Scope Gate is settled before the loop starts spinning (it is the measurable done-when from lesson 2). The Proof Gate is what the VERIFY step must clear on the way to DECIDE — it is not a separate phase, it is the bar VERIFY is held to. The Course and Publish gates fire once, at convergence, when the whole goal is met.
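To make the placement of the gates concrete, here is a minimal Python sketch of one run. Every name in it (run_loop, done_when, max_passes, and the callables passed in) is hypothetical and chosen for illustration, not taken from the harness. The non-empty check on done_when is only a stand-in for "measurable", and LEARN and ANALYZE are reduced to picking the next pending unit.

```python
from typing import Callable, List, Sequence


def run_loop(
    done_when: str,
    units: Sequence[str],
    execute: Callable[[str], None],
    verify: Callable[[str], bool],
    course_met: Callable[[], bool],
    publish_met: Callable[[], bool],
    max_passes: int = 10,  # hypothetical safety cap, not part of the method
) -> bool:
    """Return True only when every gate has actually been cleared."""
    # Scope Gate: settled before the loop starts spinning.
    # (Stand-in check: a real gate would require something measurable.)
    if not done_when.strip():
        raise ValueError("Scope Gate: no measurable done-when")

    pending: List[str] = list(units)
    for _ in range(max_passes):
        if not pending:
            break
        unit = pending[0]       # LEARN / ANALYZE would choose this unit
        execute(unit)           # EXECUTE one bounded unit
        if verify(unit):        # VERIFY, held to the Proof Gate
            pending.pop(0)      # DECIDE: proven, move on
        # else DECIDE: the unit stays pending and goes back for another pass

    if pending:
        return False            # not converged; nothing is waved through

    # Convergence gates fire once, after the whole goal is met.
    return course_met() and publish_met()
```

The point of the shape is that verify is the only way a unit leaves the pending list, and the Course and Publish checks are not even consulted until that list is empty.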
It helps to read each gate as a boolean function over reality: scopeMet(), proofMet(), courseMet(), publishMet(). "Done" for the run is the logical AND of all four. Because each returns a value derived from something observed (a measurement, a command's exit code, a file that exists, a URL that resolves), there is no place for an opinion to sneak in. That is the entire point: the contract is enforced by evidence, not by trust.
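As a rough illustration of the four boolean gates, the sketch below uses Python with snake_case names that mirror scopeMet(), proofMet(), courseMet(), and publishMet(). The bodies are assumptions: the scope check is a placeholder, the proof check trusts a command's exit code, the course check looks for a non-empty file, and the URL check is injected as a callable (resolves) so the example does not need network access.

```python
import os
import subprocess
from typing import Callable, Sequence


def scope_met(done_when: str) -> bool:
    # Placeholder: a real check would confirm the criterion is measurable.
    return bool(done_when.strip())


def proof_met(command: Sequence[str]) -> bool:
    # True only if the command actually ran and exited 0.
    return subprocess.run(list(command), capture_output=True).returncode == 0


def course_met(path: str) -> bool:
    # A file that exists and is not empty.
    return os.path.isfile(path) and os.path.getsize(path) > 0


def publish_met(url: str, resolves: Callable[[str], bool]) -> bool:
    # `resolves` is a hypothetical probe supplied by the caller.
    return bool(url) and resolves(url)


def done(
    done_when: str,
    command: Sequence[str],
    course_path: str,
    url: str,
    resolves: Callable[[str], bool],
) -> bool:
    # "Done" is the logical AND of all four observations.
    return all(
        (
            scope_met(done_when),
            proof_met(command),
            course_met(course_path),
            publish_met(url, resolves),
        )
    )
```

None of the four functions takes an argument like "confidence" or "status"; each one can only answer by looking at something outside itself.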
Here are the four, left to right, the way work flows through them. A piece of work has to clear each before the next opens, and if any one fails, the work is sent back, not waved on. Notice the second one is highlighted: that is the heart of the method, and the rest of the lesson zooms into it.
Scope Gate
"Done" is written as something you can measure: a number to hit, a test to pass, a behavior to observe. A vague goal can't open this gate.
Proof Gate
The change was run at the real boundary (the actual command, the actual path, the actual output) and observed to work. A claim or a mock does not count.
Course Gate
The result comes with a self-contained, multi-lesson visual-teach course, so a human can learn what was built without reading the code.
Publish Gate
The deliverable is pushed somewhere findable (a private gist, a Pages site) and the URL is reported back. A result no one can reach isn't shipped.
Scope is fixed up front and re-read every pass (it can be sharpened, never quietly widened). Proof is enforced on every VERIFY — each bounded unit has to prove itself before the loop moves on. Course and Publish are convergence gates: they fire once the whole goal is met, turning a working result into something a human can learn and reach. The Publish Gate gets its own lesson (lesson 11); the Course Gate is what lesson 10 is about.
It is easy to think shipping code is the finish line. In this harness it isn't: a result you can't explain is treated as not-yet-done, because the next person (or the next agent) can't safely build on it. The Course Gate forces the build to end in understanding, not just bytes — which is exactly why this very page exists.
Of the four, the Proof Gate is the one the whole method is built around. Its rule is short and unforgiving: "done" requires real-boundary evidence — you ran it and hit the real path — never a claim, a mock, or a green assertion. Let's unpack that, because every word is load-bearing.
A claim is anyone saying "it works" — including a very fluent AI that sounds completely sure. The Proof Gate does not accept claims, no matter who makes them. A mock is a stand-in: a fake version of the thing that always answers nicely, used to avoid touching the real system. Mocks are useful while building, but they prove nothing about reality, so they don't open this gate. A green assertion is the trickiest: a test that passes — but only because it was written to pass, or only checks the easy half, or runs against the mock instead of the real boundary. Green is not the same as true.
What does open the gate is the opposite of all three: you take the real thing, run it, push it down the real path that matters, and watch what actually happens. Then you point at that — the command, the output, the page that loaded, the value that came back. If you can't point at the evidence, the gate stays shut and the work is not done.
Think of it like… a parachute. The packing slip says "inspected: OK". The slip is a claim. A practice fold on the table is a mock. A unit test that confirms the slip was filled in is a green assertion. None of them is the thing you actually want, which is: the canopy opened, in the air, on a real jump. The Proof Gate is the harness refusing to count anything less than the canopy opening.
The Proof Gate in one sentence
If you cannot point at the moment the real thing did the real work, you have a claim — and a claim is not done.
The boundary is wherever the change meets reality and could fail: the HTTP endpoint really returns 400, the binary really exits 0, the page really renders in a browser, the file really lands on disk, the migration really runs against a real schema. Proving against an inner layer you fully control (a function you just wrote, a mock you configured) is the classic miss — it tests your intent, not the world. Pick the outermost seam the unit is responsible for, and hit that.
A test is evidence only when it exercises the real boundary and could genuinely fail. Three ways "green" lies: it runs against a mock instead of the real dependency; it asserts something trivially true (the function returns a value, not the right value); or it was written after the fact to match whatever the code already does. The Proof Gate asks of any green check: what real thing did this touch, and how would it have gone red if the work were wrong?
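One way to ask "how would it have gone red?" is to run the check against a deliberately broken variant and confirm it fails. The Python sketch below does that for the empty-query guard. All names are hypothetical, and the handlers are plain in-process functions to keep the example self-contained, so it illustrates only the "could genuinely fail" half of the rule. It is not a substitute for hitting the real boundary.

```python
from typing import Callable

Handler = Callable[[str], int]


def search_status(query: str) -> int:
    # Intended behavior: an empty query is rejected.
    return 400 if query == "" else 200


def search_status_broken(query: str) -> int:
    # Deliberately wrong variant: the guard is missing.
    return 200


def trivial_check(handler: Handler) -> bool:
    # Trivially true: passes for any handler that returns an int.
    return isinstance(handler(""), int)


def real_check(handler: Handler) -> bool:
    # Asserts the right values, including the happy path.
    return handler("") == 400 and handler("shoes") == 200


def can_go_red(
    check: Callable[[Handler], bool], good: Handler, broken: Handler
) -> bool:
    # A check is worth something only if it passes on the good
    # version and fails on a known-broken one.
    return check(good) and not check(broken)
```

Here can_go_red(real_check, search_status, search_status_broken) is True, while the same call with trivial_check is False: the trivial check stays green even when the guard is gone, which is exactly the lie described above.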
Sometimes you genuinely can't hit the real path — no credentials, a service is down, the environment is missing. The rule then is not "mock it and call it done". It is stop and ask, decision-ready: surface exactly what is blocked, what you'd run if unblocked, and the options — so a human can make one clean call. Faking the proof to keep moving is the one thing the gate exists to prevent.
The gates are a contract, and a contract only works if everyone honors it the same way. In this harness three kinds of participant meet the Proof Gate — the human watching, the language model doing the reasoning, and the agent (a CLI tool) doing the running — and each has a clear duty. They aren't separate audiences to address; they're three hands on the same gate.
The human
You read the result the way an inspector reads a report: you look for the proof, not the confidence. "It works" earns nothing; a command you can re-run and an output you can see earn everything. The gates are your contract: they're what lets you stay out of the doing and still trust the done.
The LLM
Every time you touch the path, the old proof is stale, so you prove it again, against the real boundary. And if the boundary is unreachable, you don't fake it: you stop and ask, decision-ready, laying out what's blocked and what the options are, so the human can make one clean call.
The agent
Whoever built the change does not get to be the one who proves it. A separate agent, the Validator, runs the boundary check. An author marking their own homework is exactly the conflict the Proof Gate removes; independence is what makes the evidence worth trusting.
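The independence rule can be made structural rather than left to good intentions. The following Python sketch shows one possible shape: a proof record that names both builder and validator, and an acceptance check that refuses self-validated work. The ProofRecord fields and the accept function are hypothetical names for illustration; the agent-A and agent-B labels follow the log entries in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofRecord:
    unit: str        # the bounded unit being proven
    builder: str     # who made the change
    validator: str   # who ran the boundary check
    command: str     # the re-runnable command
    observed: str    # what actually came back


def accept(record: ProofRecord) -> bool:
    # The builder never validates their own work.
    if record.validator == record.builder:
        return False
    # There must be a command to re-run and an observation to point at.
    return bool(record.command.strip()) and bool(record.observed.strip())
```

A record with builder "agent-A" and validator "agent-B" can be accepted; the same evidence with "agent-A" in both fields is rejected no matter what was observed.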
"The Validator is never the builder" is the same principle as a second pair of eyes on a deploy, or a different accountant signing off the books. The builder is invested in the change passing; that bias is human and unavoidable, so the harness routes the proof through someone (something) without it. In a multi-agent run the orchestrator hands the bounded unit to one agent and the verification to another — the independence is structural, not a matter of good intentions.
When an LLM hits an unreachable boundary, "stop and ask" is only useful if the ask is complete. Decision-ready means: here is exactly what is blocked, here is the evidence I do have, here are the 2–3 options with their trade-offs, and here is my recommendation. That lets the human unblock with a single reply instead of a round of questions — and it keeps the human on observability, never in the doing (the AFK discipline of lesson 5).
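A decision-ready ask has a fixed set of parts, so its completeness can be checked before it is sent. The Python sketch below is one assumed representation: the Ask and Option classes and the missing method are hypothetical, and the rule that the recommendation must name one of the listed options is an added assumption, not something the lesson states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Option:
    label: str
    trade_off: str


@dataclass
class Ask:
    blocked: str                  # exactly what is blocked
    evidence_so_far: str          # the evidence that does exist
    options: List[Option] = field(default_factory=list)
    recommendation: str = ""      # label of the recommended option

    def missing(self) -> List[str]:
        """Return the parts that keep this ask from being decision-ready."""
        gaps: List[str] = []
        if not self.blocked.strip():
            gaps.append("blocked")
        if not self.evidence_so_far.strip():
            gaps.append("evidence_so_far")
        if not 2 <= len(self.options) <= 3:
            gaps.append("options")
        # Assumption: the recommendation must be one of the options.
        if self.recommendation not in {o.label for o in self.options}:
            gaps.append("recommendation")
        return gaps
```

An ask is ready to send when missing() returns an empty list; anything else is a round of questions waiting to happen.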
Proof isn't an idea — it's a few lines anyone can re-run. When a unit clears the Proof Gate, the evidence is captured so the human can audit it later without being in the room. Here is the same bounded unit recorded two ways: a claim (which the gate rejects) and the proof that replaced it (which the gate accepts). Read both and the difference is obvious.
# unit: /search must reject an empty query with 400
# status: claimed done
note: "added the guard, looks correct, should return 400 now"   ← a CLAIM
test: "wrote test_empty_query, it passes"   ← GREEN, but against a mock router
# Proof Gate: REJECTED. No real boundary was hit.
LOOP-LOG.md: the same unit, now WITH proof
# unit: /search must reject an empty query with 400
# VERIFY: run the REAL endpoint, observe the REAL response
$ curl -s -o /dev/null -w "%{http_code}\n" "localhost:8000/search?q="
400   ← observed at the real boundary
$ curl -s -o /dev/null -w "%{http_code}\n" "localhost:8000/search?q=shoes"
200   ← the happy path still works
# Proof Gate: PASSED. Evidence above, re-runnable. Validator: agent-B (not the builder).
The proof above is just curl hitting the running service and printing the status code with -w "%{http_code}". You start the service, then run the two calls and read the numbers with your own eyes — 400 for the empty query, 200 for a real one. That observed pair is the evidence; it goes straight into LOOP-LOG.md, the file the human reads to audit a run (you'll meet the log in lesson 5).
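For readers who prefer a script to a shell transcript, here is a rough Python counterpart to the two curl calls. It is a sketch under stated assumptions: SearchHandler is a hypothetical stand-in service started in-process on an ephemeral port purely so the example is self-contained. In a real run the check would point at the already-running service instead, and only status_of would be needed.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Iterable
from urllib.parse import parse_qs, urlparse

# Ignore proxy settings so localhost requests go straight to the socket.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class SearchHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the service under test."""

    def do_GET(self) -> None:
        parsed = urlparse(self.path)
        q = parse_qs(parsed.query).get("q", [""])[0]
        status = 400 if parsed.path == "/search" and not q else 200
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args) -> None:
        pass  # keep the example quiet


def status_of(url: str) -> int:
    """Return the HTTP status actually observed, like curl -w "%{http_code}"."""
    try:
        with _opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still the observation.
        code = err.code
        err.close()
        return code


def observe(queries: Iterable[str]) -> Dict[str, int]:
    """Start the stand-in service, hit /search over real HTTP, stop it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = f"http://127.0.0.1:{server.server_port}/search?q="
        return {q: status_of(base + q) for q in queries}
    finally:
        server.shutdown()
        server.server_close()
```

Calling observe(["", "shoes"]) returns {"": 400, "shoes": 200}, the same observed pair the curl transcript records. The requests travel over a real socket, which is what separates this from asserting on the handler function directly.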
If the fact you need lives on the web rather than in the repo — a current API behavior, a version, a spec — the trusted-source path is the Bright Data CLI (brightdata search "…"), never a guess and never WebSearch/WebFetch; that's lesson 9. Same principle, different boundary: evidence over recollection.
Here is the gate in action. Below is a real proposed change, the very /search guard from the log above, annotated the way a Validator reviews it. The pills along the top are the Validator's one-glance read of how the change moves things. The clay dots are notes pinned to the exact line they're about; one is blocking, which means the Proof Gate stays shut until it's resolved.
Think of it like… a building inspector walking the site with a clipboard. The inspector green-checks most things but pins a red tag to the one beam that isn't to code, and that single red tag is enough to withhold the certificate. The blocking note is that red tag.
Adds a guard to GET /search so an empty q returns 400, not the whole index. 1 file changed · +6 −1
The blocking note sits on the test line because the test passes against FakeRouter() — a mock — not the real endpoint. That is a green assertion with no real-boundary evidence behind it: it proves the author's intent, not the world's behavior. The fix is the proof from the previous section — a real curl against the running service showing 400 for ?q= and 200 for a real query. Until that observed evidence exists, the gate is correctly shut.
The whole discipline collapses into one fork. A unit reaches VERIFY; the Proof Gate asks for real-boundary evidence. Yes → it passes and the loop moves on. No → it goes back, every single time. There is no third door labelled "close enough".
The one rule
The gate has no door for "probably". Either the evidence exists and you point at it, or the unit is not done — and if you truly can't reach the boundary, you stop and ask instead of faking it.
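The fork, plus the stop-and-ask escape for an unreachable boundary, fits in a few lines. This Python sketch is an assumed encoding: the Outcome enum and the proof_gate signature are hypothetical, and comparing one observed status code to one expected value is a simplification of "real-boundary evidence".

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    SEND_BACK = "send back"
    STOP_AND_ASK = "stop and ask"


def proof_gate(
    boundary_reachable: bool, observed: Optional[int], expected: int
) -> Outcome:
    # An unreachable boundary is never papered over with a fake.
    if not boundary_reachable:
        return Outcome.STOP_AND_ASK
    # Evidence exists and matches: the unit passes.
    if observed is not None and observed == expected:
        return Outcome.PASS
    # Missing or wrong evidence: back it goes. There is no "probably".
    return Outcome.SEND_BACK
```

Note that there is no branch that returns PASS without an observation, which is the whole rule.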
Watch the gate do its job on one unit, start to finish. The task: "make /search reject an empty query with a 400." An author reports it done. The Proof Gate disagrees — twice — before it finally opens. Each beat is the gate refusing a different impostor of "done".
A confident sentence and nothing to point at. The Proof Gate doesn't read intentions — it reads evidence. Rejected: no boundary was hit. Go prove it.
The test runs against FakeRouter(), a mock — it asserts the author's intent, not the real endpoint's behavior. Green here proves the fake works. Rejected: a green assertion against a mock is not real-boundary evidence.
curl "localhost:8000/search?q=" → 400, and curl "localhost:8000/search?q=shoes" → 200. Observed at the real boundary, re-runnable, captured in the log. A different agent — the Validator — ran it. Passed: now there is something to point at.
What the gate produced
Two rejections and one pass — and the only difference between "rejected" and "done" was a pair of status codes someone actually watched come back from the real service. The claim and the green test changed nothing about reality; the curl did. That is the Proof Gate earning its place as the heart of the loop.
Reads only on the way in, evidence on the way out — every claim has a re-runnable source, and the Validator line records that the prover was not the builder.
# VERIFY | fix/empty-query | Validator: agent-B (builder was agent-A)
$ curl -s -o /dev/null -w "%{http_code}\n" "localhost:8000/search?q="        # → 400
$ curl -s -o /dev/null -w "%{http_code}\n" "localhost:8000/search?q=shoes"   # → 200
# Proof Gate: PASSED. evidence above. unit done; loop may DECIDE.
Four gates make up the contract for "done" — Scope, Proof, Course, Publish — and they're checked, never assumed. The middle one is the heart: the Proof Gate takes real-boundary evidence and nothing else. A claim is not proof. A mock is not proof. A green assertion you can't trace to the real path is not proof. You ran it, you hit the real seam, you watched it work, and you can point at that — or the work isn't done.
The three hands on that gate keep it honest: the human trusts evidence over claims, the LLM re-proves after any change and stops-and-asks when the boundary is out of reach, and the agent keeps the Validator separate from the builder. Carry that and the rest of the harness — the AFK runs, the multi-agent crews, the publishing — stays trustworthy, because everything downstream is built on proof you could re-run yourself.