Sometimes what you hand the harness isn't a spec — it's a hunch. "Build me a thing that does roughly this." The loop you met in the last lessons needs a measurable done to run against, and a hunch doesn't have one. Forge is the front-end that turns the hunch into that contract: seven steps that grill the idea, write it down as a PRD, slice it into tickets and a GOAL.md, then hand the whole thing to the loop to build and prove — while you just watch.
The loop from the last lessons is precise: each pass picks one bounded unit, makes the change, and proves it at the real boundary. But that precision needs something to aim at — a measurable "done". Give it a sharp spec and it runs beautifully. Give it a vague wish — "build me a tool that helps people track their reading" — and it has nothing crisp to check itself against. It would have to guess what you meant, and guessing is exactly what this whole method exists to avoid.
Forge is the part you run before the loop when all you have is a hunch. Its only job is to turn that rough idea into a real contract the loop can build against: a written description, a sliced list of work, and a goal file with a measurable finish line. Once Forge has done its job, the loop takes over — and it now knows exactly what success looks like.
Crucially, Forge does this with you but mostly without needing you. You drop in the rough idea and then step back to watch. The machine interrogates its own thinking, fills in its own gaps, and only taps you on the shoulder for the rare decision that genuinely only you can make — a true fork in the road, like "should this be free or paid?" Everything else, it figures out and writes down so you can read it later.
Think of it like… briefing an architect. You don't hand over engineering drawings — you say "I want a light, open house for a family of four, near the garden." A good architect doesn't start pouring concrete. They ask sharp questions, sketch, and come back with blueprints and a materials list you can sign off on. Forge is that architect's process: it refuses to build until the hunch has become a plan precise enough to build from. The loop is the construction crew that then follows the plan.
Forge is the front-end to the loop, not a replacement for it. It runs once, up front, to manufacture the inputs the loop assumes already exist: a PRD (the written spec), a set of issues on a kanban board with blocking relationships, and a GOAL.md with a measurable done-when. The moment those exist, the autonomous build loop you learned in lessons 2 and 3 can run, because it finally has a contract to verify against.
When the prompt you start with is already a tight spec — clear scope, obvious done-when — you skip Forge and go straight to the loop. Forge earns its keep precisely when the input is raw: an idea, a sentence, a screenshot, a "wouldn't it be cool if…".
The whole flow fits on one line: a rough idea goes in on the left, passes through seven steps, and what comes out on the right is a contract precise enough for the loop to build and prove. The two middle steps in dashed boxes are optional — used only when the idea needs more grounding or a quick visual before it's written down.
The one rule
Forge does not let the build start until the idea has become a contract with a measurable finish line. No measurable done-when, no construction.
Here is the whole flow as a list. Read it top to bottom — each step takes the output of the one above and sharpens it. Steps 2 and 3 are skippable; the other five always run.
grill — interrogate the idea
The idea gets stress-tested with hard questions: who is this for, what does "good" mean, what's explicitly out of scope, what could go wrong. Most questions the machine answers itself by reasoning aloud; only a genuine you-only decision is escalated to you.
research — ground it in reality (optional)
If the idea leans on facts that live in the world — a competitor's pricing, what an API actually returns, whether a library still exists — those are checked against a trusted source rather than assumed. Skipped when the idea needs no outside facts.
prototype — sketch it fast (optional)
A throwaway mock or a rough screen, just enough to make the idea concrete and catch a "that's not what I pictured" early — before it's written into the spec. Skipped when the shape is already clear.
to-prd — write the spec
Everything settled so far is written down as a Product Requirements Document: the problem, who it's for, what's in and out, and what success means. This is the first durable artifact — the thing everything downstream is checked against.
issues + /goal GOAL.md — slice and set the finish line
The PRD is cut into small tickets on a kanban board, with blocking relationships so the order is explicit (this can't start until that is done). Alongside it, GOAL.md records the measurable done-when using the ultragoal discipline — the single line the loop will hold itself to.
implement — the AFK build loop
Now the loop runs, ticket by ticket, away-from-keyboard. For each ticket an Executor builds the change and a separate Validator proves it at the real boundary. You aren't in this path — you're watching it.
review — AFK QA pass → review.md
When the build converges, a final quality pass runs — also away-from-keyboard — and writes its findings to review.md, the observability report you read to see how it went. Even the QA is something you observe, not something you run by hand.
The steps line up with concrete moves in the suite: grill (a self-questioning pass), an optional research pass, an optional prototype, to-prd (writes the PRD), to-issues plus /goal (writes the kanban tickets and GOAL.md), the implement loop, and a review pass that emits review.md. Every step leaves a durable file behind, which is what makes the run observable after the fact.
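The step list above can be pictured as plain data. This is a minimal sketch, not Forge's actual implementation: the step names and most artifact names come from the lesson, while the `Step` class, `plan_run`, and the notes files for grill, research, and prototype are hypothetical names invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    artifact: str            # durable file or folder the step leaves behind
    optional: bool = False

# Names follow the lesson; the three "notes" artifacts are hypothetical placeholders.
FORGE_STEPS = [
    Step("grill", "grill-notes.md"),
    Step("research", "research-notes.md", optional=True),
    Step("prototype", "prototype/", optional=True),
    Step("to-prd", "PRD.md"),
    Step("to-issues + /goal", "issues/ + GOAL.md"),
    Step("implement", "LOOP-LOG.md"),
    Step("review", "review.md"),
]

def plan_run(needs_outside_facts: bool, shape_unclear: bool) -> list[Step]:
    """Return the steps that will run, dropping optional ones the idea doesn't need."""
    wanted = {"research": needs_outside_facts, "prototype": shape_unclear}
    return [s for s in FORGE_STEPS if not s.optional or wanted[s.name]]

plan = plan_run(needs_outside_facts=False, shape_unclear=False)
for step in plan:
    print(f"{step.name:20} -> {step.artifact}")
```

With both flags off, only the five mandatory steps remain, which matches the "steps 2 and 3 are skippable" rule.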
The blocking relationships on the issues matter: they encode the dependency order so the build loop never picks up a ticket whose prerequisites aren't met. That is how a flat list of work becomes a safe sequence to execute unattended.
The reason Forge can run while you make coffee is that the work is split across three kinds of actor, each with a strict lane. Knowing the lanes is the whole trick to trusting the process.
You (the human) do the smallest, most valuable thing: hand over the rough idea, then observe. You read the artifacts as they appear — the PRD, the board, review.md — but you don't drive the steps. The one exception is a genuine fork only you can settle, where Forge stops and asks.
The model (the LLM) does the thinking that used to need a meeting. In grill it argues with itself — poses the hard question, then answers it from reason — and writes the answer down. It self-answers almost everything; it only escalates the rare decision that truly belongs to you.
The agents do the building, in the implement step, and they come in two roles that are never the same actor: an Executor writes the code for a ticket, and a Validator proves that ticket actually works at the real boundary. Separating them is deliberate — the one who built it is never the one who certifies it.
Think of it like… a film set. You're the producer who greenlights the concept and watches the dailies. The director (the model) makes the thousand creative calls and only brings the biggest ones to you. The crew (the agents) actually shoot — and a separate continuity supervisor checks each shot, because the person who filmed it is the last one who should judge whether it's good.
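The Executor/Validator split can be sketched in a few lines. Treat this as an illustration under stated assumptions: `Ticket`, `run_ticket`, and the actor ids are hypothetical, and the "build" here is a stand-in function rather than real generated code. The point is only that a ticket is marked done by a different actor than the one that built it.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Ticket:
    id: int
    title: str
    status: str = "todo"

def run_ticket(
    ticket: Ticket,
    executor_id: str,
    build: Callable[[Ticket], Any],
    validator_id: str,
    prove: Callable[[Ticket, Any], bool],
) -> Ticket:
    """Build with one actor, certify with another; refuse if they are the same."""
    if executor_id == validator_id:
        raise ValueError("Executor and Validator must be different actors")
    result = build(ticket)
    ticket.status = "done" if prove(ticket, result) else "failed"
    return ticket

def executor_build(ticket: Ticket):
    # Stand-in for generated code: a function that adds a title to a shelf.
    def add_book(shelf: list[str], title: str) -> None:
        shelf.append(title)
    return add_book

def validator_prove(ticket: Ticket, add_book) -> bool:
    # Exercise the built thing instead of trusting a success claim.
    shelf: list[str] = []
    add_book(shelf, "Dune")
    return shelf == ["Dune"]

ticket = run_ticket(
    Ticket(3, "add/list books"),
    "executor-1", executor_build,
    "validator-1", validator_prove,
)
print(ticket.status)
```

Passing the same id for both roles raises an error before anything is built, which is the "never the same actor" rule expressed as a guard.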
Now make it concrete. Pick a step and a lens — yourself, the model, or the agents — and the panel shows exactly what that actor does at that step, plus the file or artifact it leaves behind. This is the same idea as the three lanes above, but you steer it: change either control and the readout re-renders.
The panel has no per-combination markup. Each segmented control just writes one value onto a small state object — state.step and state.lens — and a single render() reads both, paints the pill, the title, the role sentence, and the code readout. It is the same token-driven pattern as a design-system preview: change a value, re-render, no new markup. Here the "tokens" are which step and whose lane.
The matrix is the lesson in data form: for a given step, the human row is usually "observe / read the artifact", the model row is "decide and write", and the agent row only lights up with real work at implement (Executor builds, Validator proves) and at review (the AFK QA pass).
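Here is a rough sketch of that state-plus-render pattern, written in Python for consistency with the other examples rather than in whatever the real panel uses. The `State` class, the `render` signature, and the "no work at this step" wording are assumptions; the role sentences are taken from the matrix described above and deliberately simplified to one line per lens.

```python
from dataclasses import dataclass

STEPS = ["grill", "research", "prototype", "to-prd",
         "to-issues + /goal", "implement", "review"]
LENSES = ["human", "model", "agents"]

# The agents lane only has real work at these two steps.
AGENT_WORK = {
    "implement": "Executor builds, Validator proves",
    "review": "AFK QA pass",
}

@dataclass
class State:
    step: str = "grill"
    lens: str = "human"

def render(state: State) -> str:
    """Read both values from state and produce the readout; no per-combination markup."""
    if state.step not in STEPS or state.lens not in LENSES:
        raise ValueError(f"unknown step or lens: {state}")
    if state.lens == "human":
        role = "observe / read the artifact"
    elif state.lens == "model":
        role = "decide and write"
    else:
        role = AGENT_WORK.get(state.step, "no work at this step")
    return f"[{state.lens}] {state.step}: {role}"

state = State()
print(render(state))
state.step, state.lens = "implement", "agents"   # change a value, re-render
print(render(state))
```

Every combination is derived from the two values on `state`, so adding a step or a lens means adding data, not another branch of hand-written output.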
Two artifacts deserve a closer look, because they are what make the build safe to run unattended. The first is the kanban of issues — the PRD chopped into small tickets, each with blocking relationships that say what must finish before it can start. The second is one GOAL.md that pins the measurable finish line for the whole run, written with the ultragoal discipline.
Why both? The issues say what to build and in what order; GOAL.md says how the run knows it's actually done. Without the ordering, an agent could pick up a ticket whose foundations don't exist yet. Without the done-when, the loop would have no boundary to prove against — the exact problem Forge exists to fix.
Think of it like… a recipe card pinned next to a prep list. The prep list (issues) is ordered — chop before you sauté, sauté before you plate. The recipe card (GOAL.md) is the single line that tells you the dish is finished: "serves four, plated, sauce reduced." The crew follows the prep list; the card is how anyone knows to stop.
GOAL.md holds the one measurable line the loop verifies against.
/goal compiles GOAL.md under the ultragoal discipline: it refuses to finalize until the done-when is measurable and signed off, and it structures the file into explicit blocks (goal, context, constraints, verification, done-when). That structure is what lets an unattended loop check itself honestly: there is a named, testable target instead of a vibe.
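To make the block structure concrete, here is a small sketch that checks a GOAL.md draft for the five named blocks. It assumes each block is written as a `## <name>` heading, which the lesson does not specify, and `missing_blocks` is a hypothetical helper, not part of /goal. It only checks that blocks exist and are non-empty; judging whether a done-when is truly measurable still takes the grill-style reasoning described earlier.

```python
import re

REQUIRED_BLOCKS = ["goal", "context", "constraints", "verification", "done-when"]

def missing_blocks(goal_md: str) -> list[str]:
    """Return required blocks that are absent or empty in a GOAL.md text."""
    # Assumed layout: each block is a '## <name>' heading followed by its body.
    sections: dict[str, str] = {}
    current = None
    for line in goal_md.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1).lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line.strip()
    return [block for block in REQUIRED_BLOCKS if not sections.get(block)]

draft = """\
## goal
Track the books I'm reading
## context
Single-user v1
## constraints
Local storage only
## done-when
can add a book, set progress and rating, data survives reload; suite green
"""

gaps = missing_blocks(draft)
print("refuse to finalize, missing:" if gaps else "structure ok", gaps)
```

The draft above has no verification block, so a gate built this way would hold the run back until one is written.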
The blocking relationships on issues are a dependency graph, not decoration. The build loop only pulls a ticket once its blockers are Done, which is what keeps an AFK run from building floor three before the foundation exists.
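The "only pull a ticket once its blockers are Done" rule is easy to sketch. The tickets below are hypothetical, and the dictionary layout is an assumption for illustration rather than how the real board stores its issues.

```python
# Hypothetical board: ticket id -> title and the ids that block it.
TICKETS = {
    1: {"title": "scaffold", "blocked_by": []},
    2: {"title": "data model", "blocked_by": [1]},
    3: {"title": "UI shell", "blocked_by": [1]},
    4: {"title": "list view", "blocked_by": [2, 3]},
}

def ready_tickets(tickets: dict, done: set[int]) -> list[int]:
    """Tickets not yet done whose blockers are all done."""
    return sorted(
        tid for tid, info in tickets.items()
        if tid not in done and all(b in done for b in info["blocked_by"])
    )

done: set[int] = set()
order: list[int] = []
while len(done) < len(TICKETS):
    ready = ready_tickets(TICKETS, done)
    if not ready:                      # nothing pullable: a cycle or a bad edge
        raise RuntimeError("board is stuck; check blocking relationships")
    next_id = ready[0]                 # the loop takes one bounded unit per pass
    order.append(next_id)
    done.add(next_id)

print(order)
```

Ticket 4 never becomes ready until both 2 and 3 are done, so an unattended run cannot start it early no matter how the list is sorted.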
Forge isn't abstract — it's a sequence you kick off, then watch produce real files. Here is the shape of a run: one command to start from the rough idea, and the durable artifacts that appear as each step lands. Notice that after the first line, your job is to read, not type.
# hand over the rough idea — this is the only thing YOU drive
forge "a small web app to track books I'm reading"

# Forge runs the steps and leaves artifacts you READ, in order:
PRD.md        # the written spec (from to-prd)
issues/       # kanban tickets with blocking relationships
GOAL.md       # the measurable done-when (from /goal, ultragoal)
LOOP-LOG.md   # the AFK build loop's running trail (implement)
review.md     # the AFK QA findings (review) — your final read
You start the flow from the rough idea and then step out of the path. The artifacts above are your windows in: open PRD.md to check the spec matches your intent, scan issues/ and the board to see the slicing, read GOAL.md to confirm the finish line is the one you wanted, and follow LOOP-LOG.md and review.md to watch the build and its QA without touching either.
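If you want a quick read on how far a run has progressed, you can simply check which artifacts exist so far. This is a sketch under one loud assumption: that all artifacts land in a single run directory. The file names are the ones listed above; `artifacts_so_far` is a hypothetical helper, and the temporary directory only simulates a run that has finished to-prd and to-issues.

```python
import tempfile
from pathlib import Path

# Artifact names come from the listing above; a shared run directory is assumed.
ARTIFACTS = ["PRD.md", "issues", "GOAL.md", "LOOP-LOG.md", "review.md"]

def artifacts_so_far(run_dir: Path) -> dict[str, bool]:
    """Map each expected artifact to whether it exists yet."""
    return {name: (run_dir / name).exists() for name in ARTIFACTS}

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "PRD.md").write_text("# PRD\n")   # simulate a finished to-prd
    (run_dir / "issues").mkdir()                 # simulate a finished to-issues
    status = artifacts_so_far(run_dir)

for name, present in status.items():
    print(f"{'ready  ' if present else 'pending'}  {name}")
```

Reading files this way keeps you out of the path: you observe what the run has left behind without touching the run itself.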
If grill needs a fact from the world during research, that fact comes from a trusted source — the Bright Data CLI (brightdata search "…"), never a guess and never WebSearch/WebFetch. The one place Forge stops for you is a genuine you-only fork; everything else it self-answers and records.
Let's run a single rough idea through Forge end to end: "a small web app to track the books I'm reading." Watch a vague wish become a contract, then a built-and-proven thing — with you reading along, never driving.
In grill, the model poses the hard questions and answers most from reason: "Track what: title, progress, rating? All three. Multi-user? No, single-user v1. Out of scope? Social features." One question is a true fork ("store data locally or in the cloud?"), so it escalates that one to you and waits. You answer "local for now." That is the only time it needs you.
There are no outside facts to ground and the shape is obvious, so both optional steps, research and prototype, are skipped. Forge doesn't pad the flow with steps the idea doesn't earn.
In to-prd, the model turns everything settled into a short PRD.md: problem, single-user scope, the three tracked fields, local storage, social explicitly out. You read it and confirm it matches what you pictured. First durable artifact, signed off.
Five tickets appear: #1 scaffold → #2 data model → #3 add/list books → #4 progress + rating → #5 local persistence, each blocking the next. GOAL.md records the done-when: "can add a book, set progress and rating, data survives reload; suite green." Now the loop has a target.
The loop runs unattended, ticket by ticket. For #3, the Executor writes the add/list code; the Validator — a different actor — actually adds a book and lists it back to prove it at the real boundary, not by claiming success. You watch it advance in LOOP-LOG.md.
In review, once the board is green and GOAL.md's done-when is met, the agents run a final quality pass on their own, away from the keyboard, and write the findings to review.md. You read that report, and that's the run. You handed over a sentence and one fork decision; everything else was observed.
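To see what "prove it at the real boundary" could look like for this done-when, here is a minimal sketch. `BookStore` and `validate_done_when` are hypothetical stand-ins for the built app and the Validator's check, and JSON-on-disk is an assumed reading of "local for now." The check covers add, progress, rating, and survival across a reload; the "suite green" half of the done-when is not modelled.

```python
import json
import tempfile
from pathlib import Path

class BookStore:
    """Hypothetical local store for the book tracker; persists to a JSON file."""

    def __init__(self, path: Path):
        self.path = path
        self.books = json.loads(path.read_text()) if path.exists() else {}

    def add(self, title: str) -> None:
        self.books[title] = {"progress": 0, "rating": None}
        self._save()

    def update(self, title: str, progress: int, rating: int) -> None:
        self.books[title].update(progress=progress, rating=rating)
        self._save()

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.books))

def validate_done_when(path: Path) -> bool:
    """Add a book, set progress and rating, then check the data survives a reload."""
    store = BookStore(path)
    store.add("Dune")
    store.update("Dune", progress=40, rating=5)
    reloaded = BookStore(path)          # a fresh instance stands in for a reload
    return reloaded.books.get("Dune") == {"progress": 40, "rating": 5}

with tempfile.TemporaryDirectory() as tmp:
    ok = validate_done_when(Path(tmp) / "books.json")

print("done-when met" if ok else "done-when NOT met")
```

The check reads the data back through a second store instance, so it passes only if the write actually reached disk, not because the first object still holds it in memory.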
What Forge produced
A sentence became a signed PRD, a blocking board, a measurable GOAL.md, a built app proven ticket-by-ticket, and a review.md to read — with exactly one human decision in the whole run. That is the bargain: you bring the idea and your judgement on the true forks; the machine brings the discipline and the hands.
Every step leaves a line you can read after the fact — the observability the whole suite runs on (you'll meet it fully in lesson 5). The build never claims success; the Validator proves it.
# FORGE — "book tracker"
grill      # self-answered 6 Qs; escalated 1 fork (storage) → human: local
research   # skipped — no external facts needed
prototype  # skipped — shape is clear
to-prd     # → PRD.md (single-user, 3 fields, social out)
to-issues  # → #1..#5 with blocking edges
/goal      # → GOAL.md done-when (add/track/persist; suite green)
implement  # Executor builds #1..#5; Validator proves each at boundary
review     # → review.md (human reads; run complete)
Recall beats re-reading. Answer each question from memory before you peek.
Q1. When should you reach for Forge instead of going straight to the loop?
Q2. During grill, what does the model do with most of its hard questions?
Q3. In the implement step, why are the Executor and the Validator never the same actor?
Q4. What do the blocking relationships on the issues board actually enforce?
Q5. Across a Forge run, what is the human's role?
Next up, now that the contract exists and the build runs unattended: how you watch a run without ever stepping into its path, in "AFK + observability: never in the path."