O fluxo · How the work runs while you are away

AFK + observability: the human is never in the path

In this harness, the work runs AFK — away from keyboard. The whole loop — learn, build, verify, decide — runs without a person feeding it the next move. Your job changes from operator to observer: you read what happened, you don't push the buttons. The machine only ever stops for you at one specific kind of decision — and never to hand you a chore.


The big idea: you watch, you don't drive

AFK stands for away from keyboard. In most tools, "AI helps you work" means a tight back-and-forth: it does a little, then stops and waits for you to type the next instruction, approve the next step, click the next button. This harness is built the opposite way. Once the goal is set, the loop runs the whole job — figure out the state, make one change, prove it worked, decide whether to keep going — on its own, pass after pass, without a person in the middle.

So what is left for the human? Observability. That is a fancy word for a simple promise: at any moment you can see exactly what the system has done and is doing, without having to take part in it. You read a running log, you read a report, you check a status line. You are a person standing at a window watching a kitchen, not a line cook on the station. The work does not pause because you looked away, and it does not need your hands to continue.

This matters because the failure it prevents is so common. The moment a system needs a human to advance — "waiting for approval", "please confirm", "what should I do next?" — it stops being autonomous. It becomes a thing that runs only as fast as you can babysit it, and stalls the instant you step away. The harness refuses that. The default is: keep going, leave a trail, and only ever stop for the human at one very specific kind of decision (we will get to exactly which one).

The rule has a second half, and it is the part people get backwards. Not only does the human stay out of the doing; the human is also never handed the checking. The system does not finish its work and then turn to you with a list of "now please test this" chores. Verification is the machine's job too. Your reading is for your own understanding and trust; it is never a task queue the agents offloaded onto you.

Think of it like… the dashboard of a self-driving car versus being the driving instructor with the second brake. A dashboard shows you speed, route, and what the car sees — you stay informed, and you can take over if something genuinely calls for a human, but you are not steering, and the car does not ask you to grade its parking afterward. The harness puts you in the passenger seat with a great dashboard, not in the instructor's seat pumping the pedals. Where the analogy bends: a car asks you to take over for danger; this harness pulls you in only for a decision that is properly yours to make — not because it got stuck.

"AFK" is the operating mode, not a feature

Every non-trivial task in this suite is driven by the loop, and the loop is designed to run unattended: LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE, repeated until it converges on the measurable done-when from the Scope Gate (lesson 3). Nothing in that cycle blocks on a person. DECIDE chooses "go again" or "converged" from evidence the loop itself gathered, not from a human's say-so.
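Here is a minimal sketch of that cycle, assuming each phase is a plain function supplied by the caller. The names runLoop, phases, and maxPasses are hypothetical. The pass budget is an added assumption, not something the lesson specifies. The point is structural: nothing in the body waits on a person, and DECIDE reads only the proof that VERIFY produced.

```javascript
// Hypothetical driver for LEARN → ANALYZE → EXECUTE → VERIFY → DECIDE.
// `phases` supplies one function per phase; `doneWhen` is the measurable
// done-when from the Scope Gate, expressed as a predicate over the proof.
function runLoop(phases, doneWhen, maxPasses = 10) {
  const log = [];                          // stands in for LOOP-LOG.md
  let state = phases.initial;
  for (let pass = 1; pass <= maxPasses; pass++) {
    const facts = phases.learn(state);     // LEARN: read the current state
    const unit = phases.analyze(facts);    // ANALYZE: pick ONE bounded unit
    state = phases.execute(state, unit);   // EXECUTE: apply only that unit
    const proof = phases.verify(state);    // VERIFY: evidence from the boundary
    const converged = doneWhen(proof);     // DECIDE: from evidence, not a click
    log.push({ pass, unit, proof, converged });
    if (converged) return { converged, passes: pass, log };
    // not converged: "go again" without asking anyone
  }
  // assumed safety budget: stop and report rather than spin forever
  return { converged: false, passes: maxPasses, log };
}

// Usage with a toy done-when: "the counter reaches 3".
const result = runLoop(
  {
    initial: 0,
    learn: (s) => ({ value: s }),
    analyze: () => "increment",
    execute: (s) => s + 1,
    verify: (s) => ({ value: s }),
  },
  (proof) => proof.value >= 3
);
console.log(result.converged, result.passes); // true 3
```

The budget only bounds a run that never converges. Even then, the run ends with a complete log rather than a prompt.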

Observability is the human's only interface

The human reads three things and writes none of the work: a running narrative (LOOP-LOG.md), a final observability report (review.md), and a live status readout. These are one-way outputs of the run: the agents produce them, and the human consumes them. The human's hands never enter the execution path; that is what "never in the path" means.

The two halves of the rule

Half one: LLMs never block waiting for a human on routine work. Half two: LLMs never hand the human a QA task. Together they force the system to be genuinely autonomous in both directions — it neither stalls for you nor delegates its checking to you. The single exception is a deliberate, well-defined fork covered in section 8.

Who does what: human, LLM, agent

Three kinds of player share this stage, and the whole method works because each one stays in its lane. It is worth naming them plainly, because the magic of AFK is really just a clean division of labour.

The human — that's you — does exactly one thing during a run: observe. You read the log, you read the report, you glance at status. You set the goal at the start, and you may be pulled back for one special decision at the end, but in between you do not execute and you do not test. The LLMs (the models driving the work) carry the discipline: they never sit idle waiting for you on routine work, and they never turn around and hand you a checking chore — they keep the loop turning and leave a trail behind them. The agents (the orchestrator and the workers it delegates to) do the building and the proving, and they emit the report — review.md — as a record of what was observed about the run, not as a to-do list aimed at you.

Read the diagram below as three lanes that almost never cross. The only place a line reaches back to the human is the single dotted fork on the right — and that is a decision, not a task.

Three lanes that rarely cross. Goal flows down once; log and report flow up to the human's reading. The human never reaches down into the work.

[Diagram lanes: human · observe only | LLM · never blocks | LLM · never hands off QA | agent · emits a report. Goal in, trail out.]

In one picture: the human is beside the loop, not inside it

Here is the whole idea in a single shape. The loop spins on its own — each step feeds the next, and "go again" returns to the start without anyone's permission. The human sits beside the loop, reading its outputs, never wired into the chain that makes it turn. The forbidden wiring — the dotted red arrow — is a human placed inside the loop, where the machine would have to stop and wait for a click to continue. That is exactly what AFK removes.

The loop turns on its own; the human reads it through a one-way window. Wiring a person into the ring (dotted red) is the anti-pattern AFK exists to delete.

The one rule

If the system has to wait for a human to take its next routine step, it is not AFK. The human belongs beside the loop reading it — never inside the loop feeding it.

What the human actually reads

"Observability" is only real if there is something concrete to observe. In this harness there are three things, and that's the whole list. You don't dig through internals or attach a debugger; you read three plain outputs the run keeps up to date for you.

First, LOOP-LOG.md — the running narrative. Every pass of the loop appends a few lines: what it learned, the one unit it picked, what it changed, and how it verified. Reading it top to bottom is like reading a ship's log: you see the journey, in order, as it happened. Second, review.md — the report the QA agent writes when the work converges. It is a snapshot of how the finished run looks under inspection: what was checked, what held up, what is worth a human's attention. Third, the status readout — a one-glance "where are we right now": which pass, converged or still going, any blocker. Between these three you always know the past (log), the verdict (review), and the present (status) — without ever touching the controls.

Think of it like… following a long bake through a glass oven door. The LOOP-LOG.md is the timer ticking through each stage; the status light tells you it is still baking or done; and review.md is the note the baker leaves describing how the loaf came out. You learn everything you need by looking — you never have to open the door and stick your hands in.

Three read-only outputs, one direction

LOOP-LOG.md grows by one entry per loop pass and is never rewritten — that immutability is what makes it trustworthy as a record. status is a derived view (current pass, converged flag, open blocker) you can print at any time. review.md is produced once at convergence by the validator — and critically, by an agent that is not the one that built the work, so the report is an independent read, not a self-grade.
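Here is a minimal sketch of that split. It assumes log entries with the hypothetical shape { pass, step, note } and treats a trailing BLOCKED entry as the open blocker, which is an assumed convention. The log only grows and hands readers a copy of frozen entries. Status is computed from the log on demand, so there is nothing to keep in sync.

```javascript
// Append-only narrative: entries can be added and read, never edited.
class LoopLog {
  #entries = [];
  append(entry) {
    this.#entries.push(Object.freeze({ ...entry }));
  }
  entries() {
    return [...this.#entries];   // a copy, so readers cannot rewrite history
  }
}

// status is a derived view, recomputed from the log whenever it is asked for.
function deriveStatus(log) {
  const entries = log.entries();
  const last = entries[entries.length - 1];
  return {
    pass: last ? last.pass : 0,
    converged: entries.some((e) => e.step === "DECIDE" && e.note === "converged"),
    blocker: last && last.step === "BLOCKED" ? last.note : null,   // assumed convention
  };
}

const log = new LoopLog();
log.append({ pass: 1, step: "VERIFY", note: "curl /search?q= → 400" });
log.append({ pass: 1, step: "DECIDE", note: "converged" });
console.log(deriveStatus(log)); // { pass: 1, converged: true, blocker: null }
```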

Why these and not a live console

A streaming console would tempt a human to jump in and steer. Files and a status command keep the relationship one-way: the human pulls information when they want it, and the loop never waits to see whether they did. Pull, don't push; read, don't drive.

This is the same observability you met in the gates

The proof that VERIFY ran at the real boundary (lesson 3) is exactly what lands in these outputs. Observability and the Proof Gate are two views of one idea: every claim the run makes is backed by something a human can go read.

The models never block — and never delegate the checking

This is the rule the models live by, stated as plainly as possible. On routine work, an LLM never stops to wait for a human. If it hits a fork it can settle from evidence — which file, which fix, whether the test passed — it settles it and keeps moving, leaving the reasoning in the log. It does not post "let me know how you'd like to proceed" and idle. Idling on the routine is the exact behaviour that turns an autonomous loop back into a babysitting job.

The mirror-image rule is just as important: an LLM never hands the human a QA task. When the work is done, the agents do not turn to you and say "please run the tests" or "can you confirm this looks right?" Checking the work is part of the work, and the work belongs to the machine. The report you read afterward is the result of that checking, already done — not a request for you to do it. If you ever feel like the system finished by handing you homework, something has gone wrong with the discipline.

Put the two together and you get the shape of a genuinely AFK system: it neither stalls waiting for you, nor offloads its verification onto you. It runs, it checks itself, it writes down what happened, and it only ever reaches for you at the one decision that is truly yours — which is the next section but one.

"Routine" has a precise edge

Routine = anything resolvable from the artifact, the scope, or a trusted source. A missing fact is fetched (via the Bright Data CLI for web facts, lesson 9), not asked of the human. An ambiguous instruction is resolved by re-reading the done-when, or by picking the safer interpretation and recording the choice. Only an irreversible, outward, or business-intent decision is allowed to leave the routine lane — that is the fork in section 8.
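One way to picture that edge is a resolver that tries each non-human source in turn. If none of them answers, it takes the safer reading and writes the choice down. The function name, the source list, and the string log format are hypothetical. In the harness, the trusted-source step for web facts would be a Bright Data CLI lookup, which this sketch does not model.

```javascript
// Try each non-human source in order; the first defined answer wins.
// If none answers, fall back to the safer interpretation and record it.
// Note what is missing: there is no branch that asks a person.
function resolveRoutine(question, saferDefault, sources, log) {
  for (const [name, lookup] of sources) {
    const answer = lookup(question);
    if (answer !== undefined) {
      log.push(`${question}: ${answer} (from ${name})`);
      return answer;
    }
  }
  log.push(`${question}: ${saferDefault} (safer interpretation, recorded)`);
  return saferDefault;
}

// Usage with toy sources standing in for the artifact, the scope, and a trusted source.
const trail = [];
const sources = [
  ["artifact", () => undefined],
  ["scope", (q) => (q === "page size" ? 50 : undefined)],
  ["trusted source", () => undefined],
];
console.log(resolveRoutine("page size", 20, sources, trail)); // 50
console.log(resolveRoutine("timeout", 30, sources, trail));   // 30
console.log(trail[1]); // timeout: 30 (safer interpretation, recorded)
```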

QA stays inside the loop

The VERIFY step is non-negotiable and never exported. The validator that writes review.md is itself an agent in the run — usually a different model from the builder, so it is an independent check rather than a self-grade — but it is still the machine checking, not the human. The human reading review.md is auditing trust, not completing the test plan.

Self-answering, not human-pinging

Where a phase has questions, the agents answer them themselves and only surface a genuinely author-only question with a recommendation attached. The bias is heavily toward keeping the human out of the loop, precisely so the loop can run while the human is away.

Side by side: a system that needs you vs one you observe

Feel the difference. Here are two ways to run the same long job. The first is not AFK: it stops and waits for a human at every routine step, so the stall and the human's stale picture both grow the longer they look away. The second is AFK: it keeps running and writes down what it does, so the human's picture stays fresh just by reading, at zero cost to the run.

Babysat run (needs a human in the path): stops at every routine step to wait for a click. The moment the human looks away, progress stalls and their picture goes stale.

Autonomous run (AFK + observability): keeps running and appends to LOOP-LOG.md every round. The human reads when they like; the picture is never stale and the run never waited.

Time moves every round whether the human is watching or not. Only the AFK run keeps making progress and keeps the human informed — because observing is read-only.

Babysat run — blocks on a human each round

async function round() {
  const step = plan();
  await waitForHumanClick();   // stalls here until a person acts
  return run(step);
}

AFK run — never blocks, just records

async function round() {
  const step = plan();
  const out = await run(step);      // awaits only its own work, never a person
  appendTo("LOOP-LOG.md", out);     // leaves a trail to OBSERVE
  return out;
}

The whole difference is one line: delete the await waitForHumanClick() and replace it with an appendTo("LOOP-LOG.md", …). That single move turns "a human must be in the path" into "a human may read the trail" — autonomy plus observability, instead of babysitting.
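To put numbers on that gap, here is a synchronous toy simulation. Each round is one tick of time. The babysat run can advance only in a round where a person is present to click. The AFK run advances every round and appends to an in-memory stand-in for LOOP-LOG.md. plan, run, simulate, and the presence function are hypothetical stand-ins, not harness code.

```javascript
// Hypothetical stand-ins for the lesson's plan() and run() helpers.
const plan = (n) => `step ${n}`;
const run = (step) => `${step} done`;

// One tick of wall-clock time per round.
function simulate(rounds, humanPresent) {
  let babysat = 0;        // steps completed by the babysat run
  let stalled = 0;        // rounds it spent waiting for a click
  let afk = 0;            // steps completed by the AFK run
  const loopLog = [];     // in-memory stand-in for LOOP-LOG.md
  for (let r = 0; r < rounds; r++) {
    if (humanPresent(r)) {
      run(plan(babysat)); // advances only because someone clicked
      babysat++;
    } else {
      stalled++;          // nobody there: the babysat run waits
    }
    loopLog.push(run(plan(afk)));  // AFK: advance and record, every round
    afk++;
  }
  return { babysat, stalled, afk, loopLog };
}

const res = simulate(5, (r) => r === 0);   // the human looks away after round 0
console.log(res.babysat, res.stalled, res.afk); // 1 4 5
console.log(res.loopLog[4]);                     // step 4 done
```

With a human present every round, the two runs keep pace. The AFK run simply never depends on that presence.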

review.md is an observability report, not a to-do list

When a run converges, an agent writes review.md. It is easy to misread what this file is for, so let's be exact. It is a report: a description of how the finished work looks when an independent agent inspects it. It is emphatically not a list of chores assigned to the human. There is no "TODO: test the login flow" waiting for you in there, because the testing already happened — that is what the report is reporting on.

Why does the distinction matter so much? Because the instant a report becomes a to-do, the human is back in the path. "Here are five things for you to verify" is just babysitting wearing a nicer hat — the run didn't really finish, it stopped and delegated. A true observability report closes the loop: it says this was done, here is the evidence, here is what an outside reader should know. You read it to decide whether you trust the result, not to find out what you still have to do.

Think of it like… a home-inspection report when you buy a house. The inspector already climbed onto the roof and ran the taps — the report tells you what they found so you can make a decision. A good inspector hands you findings; a bad one hands you a ladder and says "go check the roof yourself." review.md is the findings, never the ladder.

review.md — an observability report (findings, not chores)

# review.md — RHG search-handler run · converged ✓

## What was checked (by the validator, not the human)
- empty-query guard: covered  — pytest test_empty_query_returns_400 now passes
- regression suite:  green    — 12 passed, 0 failed (was 11 passed, 1 skipped)
- real boundary:     hit      — curl "/search?q=" → 400, observed live

## How it looks under inspection
- change is one bounded unit (a guard in api.py); no scope creep
- LOOP-LOG.md trail is complete: 3 passes, each with proof

## For the reader's awareness (NOT tasks)
- downstream callers of /search were not audited — out of this run's scope
- consider a follow-up goal if empty-body POSTs need the same guard

Read it, don't action it

Open it like any file: cat review.md, or read it in your editor. The "for the reader's awareness" section is observability, not assignment — it names things an outside reader should know, often explicitly out of scope. If you decide one of them matters, that becomes a new goal for a fresh run, set the normal way — it is never an implicit chore the run left for you.
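To check that a report stays on the findings side of that line, a crude text heuristic catches the obvious cases. The patterns below are an illustrative assumption, not something the harness ships. They flag lines phrased as work for the reader.

```javascript
// Crude heuristic: lines that read like work handed to the reader.
const CHORE_PATTERNS = [
  /\bTODO\b/i,
  /\bplease (run|test|check|verify|confirm)\b/i,
  /\bcan you\b/i,
];

function findChores(reviewText) {
  return reviewText
    .split("\n")
    .filter((line) => CHORE_PATTERNS.some((re) => re.test(line)));
}

const findings = [
  "## For the reader's awareness (NOT tasks)",
  "- downstream callers of /search were not audited (out of this run's scope)",
  "- consider a follow-up goal if empty-body POSTs need the same guard",
].join("\n");
const chores = "- TODO: test the login flow\n- please run the tests";

console.log(findChores(findings).length); // 0
console.log(findChores(chores).length);   // 2
```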

Who writes it

The validator agent — by policy, not the agent that built the change — so the report is an independent read. In a multi-agent run the builder and the validator are different models; the human reading the report is the third, outermost check, auditing trust rather than doing QA.
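As a policy check, that independence is easy to state in code. The record fields (agent, model) and the multiAgent flag are hypothetical. The rule they encode is the one above: the validator is never the builder, and in a multi-agent run it is also a different model.

```javascript
// Returns a list of independence problems; an empty list means the
// review can be treated as an independent read rather than a self-grade.
function independenceProblems(builder, validator, multiAgent) {
  const problems = [];
  if (validator.agent === builder.agent) {
    problems.push("validator is the same agent that built the change");
  }
  if (multiAgent && validator.model === builder.model) {
    problems.push("multi-agent run, but builder and validator share a model");
  }
  return problems;
}

const builder = { agent: "worker-1", model: "model-a" };
console.log(independenceProblems(builder, { agent: "validator", model: "model-b" }, true)); // []
console.log(independenceProblems(builder, builder, true).length); // 2
```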

The one time the loop stops for you

There is exactly one situation where the autonomous run will deliberately pause and pull the human in. It is not because it got stuck, and not because it wants you to test something. It is a user-only fork: a decision that is genuinely yours to make because it is irreversible, reaches outward into the world, or expresses business intent the system can't infer.

Examples: publishing something public, sending money, deleting data that can't be recovered, choosing between two directions that are both valid but mean different things for the business. These aren't routine — no amount of reading the artifact tells you which one the human wants. So the loop stops, but it stops the right way: through a deliberate handoff, presented decision-ready. That means the system has already done all the homework — gathered the facts, laid out the options, and attached its recommendation — so the human makes a clean call and the loop resumes. You are answering one well-framed question, not picking up an abandoned task.

Everything else — every routine fork — the loop settles itself. This single, narrow exception is the only thread that reaches from the loop back to the human's hands, and even then it hands you a decision, fully prepared, never a chore.

Think of it like… an autopilot that flies the whole route, but for one thing asks the captain: "Two valid diversion airports, here's weather, fuel, and my recommendation — your call." It doesn't ask the captain to fly; it doesn't ask the captain to re-check its instruments. It asks the one judgement only a human should own, and it asks it fully briefed.

Almost every decision takes the top (routine) path and never touches the human. Only the bottom (user-only) path pauses — and it pauses decision-ready.

Where the fork lives

In the Forge front-end (lesson 4) this is the handoff mechanism: the run blocks only at a user-only fork and presents it decision-ready. Everywhere else the bias is to self-answer and keep moving. "Decision-ready" is a hard bar — a handoff that just says "what now?" is a defect; it must carry the gathered facts, the laid-out options, and a recommendation.
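The "decision-ready" bar can be written down as a check on the handoff itself. The field names (facts, options, recommendation) are hypothetical. Requiring at least two options, with the recommendation among them, is an added assumption about what "laid-out options" means. The "hold" alternative in the example is illustrative.

```javascript
// Lists what a handoff is missing; an empty list means it is decision-ready.
function missingForDecisionReady(handoff) {
  const missing = [];
  if (!Array.isArray(handoff.facts) || handoff.facts.length === 0) {
    missing.push("facts");
  }
  if (!Array.isArray(handoff.options) || handoff.options.length < 2) {
    missing.push("options");
  }
  if (!handoff.recommendation || !(handoff.options || []).includes(handoff.recommendation)) {
    missing.push("recommendation");
  }
  return missing;
}

// A bare "what now?" fails every part of the bar...
console.log(missingForDecisionReady({ question: "what now?" }));
// → [ 'facts', 'options', 'recommendation' ]

// ...while a publish fork with facts, options, and a recommendation clears it.
console.log(missingForDecisionReady({
  facts: ["diff summarised", "risk noted"],
  options: ["ship", "hold"],
  recommendation: "ship",
})); // []
```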

Three tests for "user-only"

A decision earns a handoff if it is irreversible (can't be cleanly undone — destructive deletes, sends), outward (it leaves the sandbox into the world — publishing, money, third parties), or business intent (it encodes a preference the system can't read off the artifact or the goal). Fail all three and it is routine: the loop decides and logs it.
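The three tests reduce to a single OR. Here is a sketch with a hypothetical decision shape that carries one boolean per test. The two example decisions mirror the overnight run in the worked example.

```javascript
// A decision earns a handoff if ANY of the three tests holds; otherwise
// it is routine, and the loop decides it and logs the choice.
function route(decision, log) {
  const userOnly = Boolean(
    decision.irreversible || decision.outward || decision.businessIntent
  );
  log.push(`${decision.name}: ${userOnly ? "user-only, decision-ready handoff" : "routine, decided and logged"}`);
  return userOnly ? "handoff" : "decide";
}

const trail = [];
console.log(route({ name: "empty-param value", irreversible: false, outward: false, businessIntent: false }, trail)); // decide
console.log(route({ name: "publish the fix", irreversible: true, outward: true, businessIntent: false }, trail));   // handoff
```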

In the files: how a human observes a live run

Here is what observability looks like as the actual things you'd do while a run is going — all of them reads, none of them steering. You tail the log, you print status, you read the report when it lands. Notice there is no command here that advances the work; that's the point.

observing a run — all reads, never drives it

# watch the running narrative as passes append to it
tail -f LOOP-LOG.md

# one-glance: which pass, converged or going, any blocker
cat status            # or: ./status

# when it converges, read the independent report (findings, not chores)
cat review.md

# there is NO "advance" command — the loop moves itself.
# the human only ever acts at a decision-ready handoff.

Run it yourself

From the run's working directory: tail -f LOOP-LOG.md follows the narrative live; cat status prints the current pass and convergence flag; cat review.md reads the validator's report once it exists. All three are pure reads — running them never changes the run or unblocks it.

The absence is the feature

There is intentionally no loop next or approve command for routine work. The only human action point in the whole run is the handoff at a user-only fork (section 8), and that one is surfaced to you explicitly, decision-ready. If you find yourself looking for a button to push to keep things moving, the system is telling you it doesn't need one.

Worked example: an overnight run, from the human's chair

Let's watch one realistic run entirely from the observer's seat. The goal was set in the evening: "make /search reject an empty query with a 400." You go to bed. Here is the whole night in four moments, only three of them yours: setting the goal, reading the trail, and exactly one decision.

21:40 · you · Set the goal, then walk away (write once)

You write the done-when and start the run. From here you do not type another instruction. The loop takes over: LEARN reads the repo, ANALYZE picks one unit, EXECUTE adds the guard, VERIFY hits the real endpoint. You are asleep for all of it.

23:15 · the run · A routine fork, settled without you (no human)

The agent is unsure whether Flask gives None or "" for an empty param. That's routine — a fact that lives on the web — so it grounds it via the Bright Data CLI and writes the answer into LOOP-LOG.md. It did not wait for you. Reversible, inferable, internal: the loop decides and logs.

07:50 · you · Read the trail over coffee (read)

tail LOOP-LOG.md shows three clean passes, each with proof. cat status reads converged ✓. cat review.md is an independent report: guard covered, 12 passed, real boundary hit at /search?q=. Nobody handed you a test to run — it was already done. You are auditing trust, not doing QA.

07:52 · the fork · The one decision that's yours (decision-ready)

review.md notes a user-only fork waiting: the fix is ready to ship publicly, and publishing is outward and irreversible. The handoff is decision-ready — diff summarised, risk noted, recommendation: "ship". You answer once. The loop resumes and publishes. That single click is the only time the run needed your hands.

What the human did all night

Set a goal, read three files, answered one decision-ready question. Zero routine steps, zero QA chores. The loop ran AFK end-to-end; observability told you everything; the one fork that was truly yours was handed to you fully prepared. That is the human staying out of the path.

The trail, append-only, that you read in the morning

# LOOP-LOG.md — fix/empty-query (AFK, no human in the path)
[pass 1] LEARN  api.py:42 has no empty-q guard; suite 11 passed, 1 skipped
[pass 1] ANALYZE pick ONE unit → add guard returning 400 on empty q
[pass 1] EXECUTE added guard in api.py
[pass 1] VERIFY  curl "/search?q=" → 400 (real boundary) · pytest 12 passed
[pass 1] DECIDE  done-when met → converged
[fork]   user-only: publish (outward, irreversible) → decision-ready handoff
# human answered: ship → loop resumed → published. no QA delegated.
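Because every line follows the same [pass N] STEP shape, the trail is easy to check mechanically. The sketch below assumes exactly the format shown above. It groups steps by pass, confirms every pass carries a VERIFY line, and picks out any [fork] entry. It only reads the text, in keeping with the observer's role.

```javascript
function summarizeLog(text) {
  const steps = new Map();   // pass number → set of step names seen
  let fork = null;
  for (const line of text.split("\n")) {
    const p = line.match(/^\[pass (\d+)\]\s+([A-Z]+)\s/);
    if (p) {
      const n = Number(p[1]);
      if (!steps.has(n)) steps.set(n, new Set());
      steps.get(n).add(p[2]);
      continue;
    }
    const f = line.match(/^\[fork\]\s+(.*)$/);
    if (f) fork = f[1];
  }
  return {
    passes: steps.size,
    everyPassVerified: [...steps.values()].every((s) => s.has("VERIFY")),
    fork,
  };
}

const trailText = `[pass 1] LEARN  api.py:42 has no empty-q guard
[pass 1] VERIFY  curl "/search?q=" → 400 (real boundary) · pytest 12 passed
[pass 1] DECIDE  done-when met → converged
[fork]   user-only: publish (outward, irreversible) → decision-ready handoff`;

const summary = summarizeLog(trailText);
console.log(summary.everyPassVerified); // true
console.log(summary.fork); // user-only: publish (outward, irreversible) → decision-ready handoff
```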

Quick check: did it stick?

Recall beats re-reading. Answer each question from memory before you look back at the section.

Q1. During a routine AFK run, what is the human's job?

Q2. An LLM hits a routine fork it can settle from the artifact. What does it do?

Q3. What is review.md?

Q4. Which decision is allowed to pause the loop for a human?

Q5. What does "decision-ready" mean for a handoff?

Your agent is your teacher. Want to watch a real run from the observer's seat, or unsure whether a given decision counts as a user-only fork? Ask. Next up — how the loop hands one bounded unit to another model entirely: Cross-agent delegation via cli -p.