A loop that only thinks is a loop that drifts. Three tools keep it honest. Bright Data is how it looks things up on the live web. Computer Use is how it operates a real Mac app. And ultragoal is how it sets a finish line it can actually prove it crossed. This lesson is about what each one is, when to reach for it, and the rule that makes them trustworthy.
An AI working alone is very good at reasoning and surprisingly bad at knowing. It can argue beautifully about a fact it half-remembers, write code against a screen it has never actually seen, and declare a job finished without ever checking. The loop you have been learning fights that tendency with one habit: don't guess — go and look. Tools are how it looks. They are the loop's hands and eyes on the real world.
This lesson covers the three that matter most. Bright Data is the one tool for anything that lives on the web — searching, reading a page, even driving a live browser. Computer Use is for the desktop: it can read and operate an ordinary Mac application, the same buttons and fields a person would click. And ultragoal is different in kind — it isn't a way to reach out and touch something, it's a way to write down, before any work starts, exactly what "done" means and how a finish will be proven. The first two let the loop act on the world and see it; the third makes sure it can tell when it has actually succeeded.
One discipline ties all three together: whenever the loop is about to lean on something it isn't sure of — a fact, a screen, a claim that the work is done — it reaches for the matching tool and gets real evidence instead. A person never has to sit and watch this happen; the same hands-off, observe-only stance from earlier lessons holds here. The tools are simply the loop's approved way to be sure.
Think of it like… a careful journalist. They don't run a story from memory: they phone the source to check a fact (that's Bright Data), they go to the building and try the door themselves (that's Computer Use), and before they file, they have a fixed standard for what counts as a confirmed story — two independent sources, on the record — written down in advance (that's ultragoal). The reporting is the loop; these three are the notebook, the legwork, and the standard the editor will hold them to.
Two of the three are effectors plus sensors — they change or read the world and feed real observations back into the loop. brightdata covers the web surface (SERP, arbitrary page fetch behind bot-walls, a live browser session, and 40+ structured datasets); computer-use covers the native-app surface on macOS via the Accessibility API. The third, ultragoal, is a contract: it does not touch the world at all, it constrains when the loop is allowed to declare victory.
Across all three, the rule from the Proof Gate holds — evidence comes from a real boundary, never from the model's own say-so. Bright Data answers "is this still true?" with a fetched page; Computer Use answers "did the UI actually change?" by driving the app; ultragoal answers "are we done?" with a verifier that runs and can fail. None of them accepts a claim in place of a check.
Before the detail, here is the whole toolbelt on one row. Each card is one tool: what it is for, and the one thing to never do with it. The "never" lines matter as much as the "for" lines — picking the wrong tool is the most common way these go wrong.
Bright Data (brightdata). The one path to web data. Run a search, read any page (even ones behind bot-walls or CAPTCHA), drive a live browser, or pull from 40+ structured datasets: social, commerce, maps, finance, jobs, real estate.
NEVER WebSearch / WebFetch · NEVER the Bright Data MCP. Always the brightdata CLI.
Computer Use (computer-use, or cu). Operate a native macOS app from the shell: read its accessibility tree, click a button, type into a field, drag a slider or a window, launch an app. The same moves a person makes, done through the Accessibility API.
For native apps only. Web pages → the browser tools. An app with its own MCP → use that MCP.
ultragoal (/ultragoal). Turn a fuzzy intent into a durable, verifiable goal: an observable finish line, a verifier that can actually fail, a red-team pass before you commit, and state that survives on disk in SCOPE.md, GOAL.md, LOOP-LOG.md.
Not a web or app tool: it does not touch the world. It decides when the loop is allowed to stop.
Here is where the three sit relative to the loop. The loop reasons in the middle. When it needs a fact or a page, it reaches out through Bright Data to the live web. When it needs to operate an app, it reaches out through Computer Use to the desktop. Both send a real observation back. ultragoal isn't a reach-outward at all — it sits at the edge as the gate that decides whether the loop has actually finished.
Any time the loop needs something that lives on the internet — a fact to confirm, a page to read, a price or a profile to pull, a site to click through — it uses Bright Data, and only Bright Data. It is one uniform command (brightdata) that every assistant can run from its shell, and it does four kinds of job: run a search and read the results; scrape any page, even ones that try to block robots; drive a real browser session step by step; and pull clean, structured records from dozens of big sites — social networks, shops, maps, finance, job boards, real estate.
The reason it is the only web tool is reliability, not preference. A plain web fetch trips over bot-walls, CAPTCHAs and rate limits; Bright Data is built to get through them, so the loop's evidence is a real page instead of a blocked error screen. That is why the rule is strict: for web data, it is always brightdata — never the generic WebSearch or WebFetch, and never the Bright Data "MCP" plug-in. One path, everywhere, so the loop's window on the web behaves the same no matter which assistant is driving.
The habit this enables is the important part. When the loop hits any doubt — "is this still true?", "what does that page actually say?", "is this library current?" — it does not lean on what it half-remembers. It reaches for Bright Data and grounds the answer in a page it just fetched. Guessing is the failure mode; fetching is the fix.
# 1 · search the web (SERP): read real results, not memory
brightdata search "loop engineering harness"

# 2 · scrape ANY page: gets through bot-walls / CAPTCHA
brightdata scrape "https://example.com/pricing"

# 3 · drive a live browser session, step by step
brightdata browser "open the page, accept cookies, read the table"

# 4 · pull a structured record from a known site (40+ datasets)
brightdata dataset linkedin_company_profile "https://www.linkedin.com/company/example-co"
The brightdata CLI is on the PATH for every wired-up agent. From any project directory you can sanity-check that it exists and run a search without leaving the terminal. The point of doing it by hand once is to feel that the loop's "look it up" is a real command with real output, not a black box.
# is the web tool present?
command -v brightdata   # prints its path if installed

# ground a single doubt with a live search
brightdata search "loop engineering harness pricing" | head -n 20
The generic fetchers fail on exactly the sites worth reading — anything with a bot-wall, a CAPTCHA, or aggressive rate limiting — so their "evidence" is often a block page. The Bright Data MCP exists but is deliberately not used here: a single CLI is the one uniform path every agent shares through the shell, which keeps behaviour identical across the whole roster. So the standing rule is always the brightdata CLI, for search, scrape, browser, and datasets alike.
Concretely, reach for Bright Data when confirming a fact before acting on it, fetching a page the user referenced, checking whether a claim is still current, enriching a dataset, or pulling a structured record (an X/Reddit/YouTube/LinkedIn item, a product, a listing). It is not for local files or code, which are read directly, and it is not a replacement for a first-party API you already hold keys for.
Some jobs don't live on the web — they live in an app on your Mac. Computer Use (the computer-use command, also spelled cu) is how the loop operates one. It can read an app's controls, click a button, type into a field, drag a slider or move a window, and launch an app — the same handful of moves a person makes with a mouse and keyboard. It is how the loop both acts on the desktop and verifies a desktop change actually happened: it can drive the UI and then read the UI back.
The clever, calm part is how it does this. It does not grab your real mouse and start moving the cursor while you watch. It works through the Mac's built-in accessibility layer — the same plumbing screen-readers use — so it reads and presses controls directly. That means it never fights you for the pointer, never yanks a window to the front, and stays out of your way. You grant permission once, per app, and from then on the loop can operate that app without interrupting you.
Choosing it is a matter of matching the tool to the surface. A web page is a job for the browser tools, not this. An app that ships its own dedicated integration should be driven through that. But for an ordinary native macOS app with no special integration — Calculator, Notes, Finder, System Settings, some third-party .app — Computer Use is exactly the right hands.
Think of it like… a helper who operates your computer through the accessibility menu rather than by grabbing the mouse out of your hand. They read the labelled controls and activate them directly, so the cursor never jumps and your window stays where you left it — the work gets done quietly in the background while you keep typing.
# grant once, per app (System Settings → Privacy → Accessibility, then a per-app grant)
cu grant Calculator

# read the app's controls (the accessibility tree): look before you act
cu tree Calculator

# press a control by its label / index: no synthetic mouse, cursor never moves
cu click Calculator "7"

# set a text field's value, or move a slider / window with drag
cu type Notes "meeting at 3pm"
computer-use drives apps purely through the macOS Accessibility API — it issues AX press on an element (by index, or by hit-testing an x/y point) and sets values directly, rather than synthesising mouse and keyboard events. Consequences that matter: it never moves the operator's real cursor, never raises or focuses the target app, and so does not interrupt whatever the human is doing. It can read the full accessibility tree, click, set a field's value (type), move a window or slider (drag), launch apps, and authorize apps (grant).
Access is granted per app, and some categories are restricted. Browsers are read-only to computer-use: they are visible in screenshots, but clicks and typing are blocked, so use the browser tools for those. Terminals and IDEs are click-only: you can press a Run button but not type into them, so use the shell for commands. Every other app gets full access. So the decision tree is simple: web page → browser tools; an app with its own MCP → that MCP; an ordinary native macOS app → computer-use. It runs on macOS only and requires Accessibility permission plus a per-app grant.
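The decision tree and the access tiers can be written down as a small lookup. This is a minimal sketch, not part of computer-use itself: pickTool, canDo, the surface object and the category names are hypothetical, and the assumption that reading is allowed in every tier is ours. Only the routing rules and the read-only / click-only / full split come from the text.

```javascript
// Hypothetical helper: route a surface to the matching tool.
function pickTool(surface) {
  if (surface.kind === "web") return "browser tools";
  if (surface.kind === "native-app" && surface.hasOwnMcp) return "that app's MCP";
  if (surface.kind === "native-app") return "computer-use";
  return "none";
}

// Access tiers by app category (category names are illustrative).
// Assumption: reading is allowed in every tier.
const ACCESS = {
  browser:  { read: true, click: false, type: false }, // read-only
  terminal: { read: true, click: true,  type: false }, // click-only
  ide:      { read: true, click: true,  type: false }, // click-only
  other:    { read: true, click: true,  type: true  }, // full
};

// Would computer-use be allowed to perform this action on this category?
function canDo(category, action) {
  const tier = ACCESS[category] ?? ACCESS.other;
  return tier[action] === true;
}

console.log(pickTool({ kind: "native-app", hasOwnMcp: false })); // computer-use
console.log(canDo("terminal", "type")); // false
```

Keeping the rules in a table rather than scattered conditionals makes the "never" cases as visible as the allowed ones.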
Because it can both operate the UI and read it back, Computer Use closes its own Proof Gate for desktop work: make the change, then re-read the tree to confirm the field really holds the new value or the button really toggled. That is the loop's "did it actually happen?" answered against the real app, not against a claim.
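The act-then-read-back pattern is small enough to sketch. In the snippet below, actAndConfirm is a hypothetical helper, and act and readBack stand in for real tool calls (for example a click followed by re-reading the accessibility tree). A fake in-memory app is used so the sketch runs anywhere.

```javascript
// Sketch only: `act` and `readBack` are injected stand-ins for real tool calls.
function actAndConfirm(act, readBack, expected) {
  act();                        // make the change
  const observed = readBack();  // re-read the real state afterwards
  return { ok: observed === expected, observed, expected };
}

// A fake "app" with one text field, purely for illustration.
const fakeApp = { field: "" };

// The change lands, so the read-back matches.
const landed = actAndConfirm(
  () => { fakeApp.field = "meeting at 3pm"; },
  () => fakeApp.field,
  "meeting at 3pm"
);

// The action silently does nothing, so the read-back exposes it.
const missed = actAndConfirm(
  () => {},
  () => fakeApp.field,
  "meeting at 4pm"
);

console.log(landed.ok, missed.ok); // true false
```

The second call is the interesting one: the action "succeeded" in the sense that nothing threw, and only the read-back shows the field never changed.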
The point worth feeling, not just reading, is that a tool call moves the loop through real states — and that some moves simply aren't allowed from where you are. Below is one "touch the world" action modelled as a tiny machine: the loop calls a tool, the tool runs, an observation comes back, and only then is it gated against the finish line. Press an event and watch the highlighted state move; buttons grey out the moment a move isn't legal. Try to skip a step and the machine refuses — exactly like the real loop refusing to declare "done" before a verifier has run.
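[Interactive demo: a state diagram in which the filled node marks the current state and faint nodes are unreachable from it, with panels for the current state, the allowed transitions, and an event log. The demo starts in REASON: the loop is thinking, has a doubt to ground, and must pick a tool and call it.]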
Everything the demo does is driven by this table. The buttons read it to decide what to enable; pressing one looks up machine[state][event] and moves there. There is exactly one current state, a fixed set of events, and an illegal move simply isn't in the table — so it can't be taken. That is the same shape as the loop refusing to mark a unit done before its verifier has actually run.
const machine = {
  REASON:   { call: 'RUNNING' },
  RUNNING:  { return: 'OBSERVED' },
  OBSERVED: { verify: 'VERIFIED', fail: 'BLOCKED' },
  VERIFIED: {}, // terminal: the finish line, reached on evidence
  BLOCKED:  {}  // terminal: verifier failed at the boundary
};

function send(state, event) {
  const next = machine[state][event];
  return next ?? state; // reject: stay put if the move isn't allowed
}
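To see the table refuse an illegal move without the demo, you can replay a list of events through it. The sketch below restates the table and send so it runs on its own. The replay helper is our addition, not part of the lesson's demo, and it treats "state did not change" as "rejected", which is only valid because this table has no self-transitions.

```javascript
const machine = {
  REASON:   { call: 'RUNNING' },
  RUNNING:  { return: 'OBSERVED' },
  OBSERVED: { verify: 'VERIFIED', fail: 'BLOCKED' },
  VERIFIED: {},
  BLOCKED:  {}
};

function send(state, event) {
  const next = machine[state][event];
  return next ?? state; // stay put if the move isn't in the table
}

// Hypothetical helper: feed events in order and record what happened.
function replay(start, events) {
  let state = start;
  const log = [];
  for (const event of events) {
    const next = send(state, event);
    log.push({ event, from: state, to: next, accepted: next !== state });
    state = next;
  }
  return { state, log };
}

// The legal path reaches the finish line.
console.log(replay('REASON', ['call', 'return', 'verify']).state); // VERIFIED

// Trying to verify before anything has run goes nowhere.
console.log(replay('REASON', ['verify']).state); // REASON
```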
The third tool is not about reaching out into the world at all — it is about deciding, in advance, what counts as done. ultragoal is the discipline of writing a goal that is durable and verifiable before any work starts. "Durable" means it lives on disk, so a crash, a new session, or a different assistant picks up exactly where things stood. "Verifiable" means the goal comes with a way to check it that can actually fail — not a vibe, a test.
Three things make a goal an ultragoal. First, an observable finish line: "done" is stated as something you could point a stranger at and they would agree it's met — not "improve the page" but "the page loads under 200ms on this measurement". Second, a verifier that fails at the boundary: a real check — run the test, hit the endpoint, read the file — that comes back red when the goal isn't met, so success can never be merely claimed. Third, a red-team pass before activation: before committing to the goal, you deliberately try to break it — find the loophole, the way it could be "passed" without really being done — and tighten it. Only then is the goal turned on.
And because it's durable, the goal isn't held in one assistant's head — it's written down in plain files. SCOPE.md captures what's in and out of bounds; GOAL.md states the finish line and how it's verified; LOOP-LOG.md records each pass so a person can audit the run later. That is what lets the loop run unattended and still be trusted: anyone can open those files and see what "done" means and whether it has been reached.
Think of it like… a contractor who won't start until the contract is signed. The contract spells out exactly what "finished" looks like (the finish line), names the inspector who will come and check it against code (the verifier that can fail), and is reviewed by a lawyer hunting for loopholes before anyone signs (the red-team). Once signed, it's filed where both sides can read it — not remembered, written down.
The /ultragoal workflow has three moves and refuses to skip the middle one. Design drafts the goal with a measurable done-when. Critique is the red-team: it adversarially tries to satisfy the letter of the goal while violating its spirit, surfaces the loophole, and tightens the wording — a goal that can be gamed is sent back. Activate only happens after the goal survives critique, at which point it becomes the contract the loop runs against.
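The "refuses to skip the middle one" rule can be modelled in a few lines. This is an illustrative sketch, not the /ultragoal implementation: the function names, the stage labels and the shape of the goal object are all assumptions made for the example.

```javascript
// Design: draft a goal with a measurable done-when.
function design(doneWhen) {
  return { doneWhen, stage: "designed", loopholes: [] };
}

// Critique: any loophole found sends the goal back to design.
function critique(goal, loopholesFound) {
  return loopholesFound.length > 0
    ? { ...goal, stage: "designed", loopholes: loopholesFound }
    : { ...goal, stage: "critiqued", loopholes: [] };
}

// Activate: only a goal that survived critique may become the contract.
function activate(goal) {
  if (goal.stage !== "critiqued") {
    throw new Error("cannot activate: goal has not survived critique");
  }
  return { ...goal, stage: "active" };
}

const draft = design("the page loads under 200ms on this measurement");
const sentBack = critique(draft, ["could be passed by measuring a cached page"]);
const survived = critique(draft, []);

console.log(sentBack.stage);           // designed
console.log(activate(survived).stage); // active
```

Calling activate(draft) directly throws, which is the whole point: there is no path from Design to Activate that bypasses Critique.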
The non-negotiable property is that verification runs at the real boundary and can return red. This is the Proof Gate from lesson 3, made durable: GOAL.md's verification block is a command or check that is executed, not a sentence the model writes. If it "passes" only because the model asserts it passes, it is not an ultragoal — it's a wish.
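One way to make "executed, not asserted" concrete is to gate on a command's exit status. The sketch below is an assumption about how such a gate could look, not how ultragoal does it: runVerifier is a hypothetical helper, and Node itself plays the verification command so the example runs anywhere. A real check would be your test runner or an endpoint probe.

```javascript
const { spawnSync } = require("node:child_process");

// Run a verification command; pass only if it actually ran and exited 0.
function runVerifier(command, args = []) {
  const result = spawnSync(command, args, { encoding: "utf8" });
  const passed = result.error === undefined && result.status === 0;
  return {
    passed,
    status: result.status,
    output: (result.stdout ?? "") + (result.stderr ?? ""),
  };
}

// Stand-in verifiers: one that exits 0, one that exits 1.
const green = runVerifier(process.execPath, ["-e", "process.exit(0)"]);
const red = runVerifier(process.execPath, ["-e", "process.exit(1)"]);

console.log(green.passed, red.passed); // true false
```

Note that a command that cannot be started at all also counts as red. The gate fails closed, so a missing verifier can never be mistaken for a passing one.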
Because the goal, scope, and log are files, an overnight or multi-agent run is resumable and auditable. A fresh assistant — or the same one after a restart — reads SCOPE.md and GOAL.md to know the boundary and the finish line, appends to LOOP-LOG.md, and the human reads those files for observability rather than sitting in the loop. The contract outlives any single session.
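Resuming from files is simple to sketch. The file names below come from the lesson, but everything else is an assumption for illustration: the appendLog and resume helpers, the one-line-per-pass log format, and the file contents are hypothetical, and the demo writes to a throwaway temp directory.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Append one timestamped line per pass (assumed log format).
function appendLog(dir, entry) {
  const line = `${new Date().toISOString()} ${entry}\n`;
  fs.appendFileSync(path.join(dir, "LOOP-LOG.md"), line);
}

// What a fresh session would read to learn the boundary and the finish line.
function resume(dir) {
  const read = (name) => {
    const file = path.join(dir, name);
    return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : null;
  };
  const log = read("LOOP-LOG.md") ?? "";
  return {
    scope: read("SCOPE.md"),
    goal: read("GOAL.md"),
    passes: log.split("\n").filter((l) => l.trim() !== "").length,
  };
}

// Demo run in a temporary directory with made-up contents.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ultragoal-"));
fs.writeFileSync(path.join(dir, "SCOPE.md"), "In: pricing page. Out: checkout.\n");
fs.writeFileSync(path.join(dir, "GOAL.md"), "Done when: the verifier exits 0.\n");
appendLog(dir, "pass 1: verifier red");
appendLog(dir, "pass 2: verifier green");

console.log(resume(dir).passes); // 2
```

Nothing in resume depends on who wrote the files, which is what lets a different assistant, or the same one after a restart, pick up where the last pass left off.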
It helps to see how the same three tools look from each side of a run. They are one toolbelt, but a person, the reasoning model, and the working agents each relate to them differently — and keeping that straight is what makes the toolbelt safe rather than noisy.
For you, the person: these are how the loop actually touches the real world and persists. Bright Data and Computer Use are its hands and eyes; ultragoal's files are how a run survives and stays auditable. You read those files; you don't sit in the loop.
For the reasoning model: whenever it would otherwise guess, it reaches for evidence instead. That means Bright Data for any web doubt, Computer Use to see a real app, and an ultragoal so it only ever stops on a check that can fail, never on its own say-so.
For the working agents: each tool is a CLI or skill available everywhere (brightdata, computer-use/cu, and /ultragoal). It is the one uniform path every assistant on the roster has through its shell, so behaviour is identical no matter who is driving.
The reason all three are shaped as a shell CLI or a loadable skill is portability: the orchestrator and every delegated agent (from lesson 6) get the same tools through the same interface. There's no per-agent special-casing — brightdata means the same thing to Claude, Codex, Grok, GLM and the rest, which is what keeps a heterogeneous crew's evidence consistent and a run reproducible.
Web data is always the brightdata CLI (never WebSearch/WebFetch, never the Bright Data MCP). Native macOS apps with no dedicated MCP are computer-use (web → browser tools; app-with-MCP → that MCP). And a non-trivial run gets an ultragoal so the finish line is observable, durable in SCOPE.md/GOAL.md/LOOP-LOG.md, and gated by a verifier that can fail.
Three quick questions to make the tool choices stick. Pick an answer; you'll see immediately whether it holds.
Q1. The loop needs to confirm a current price on a shopping site that blocks ordinary bots. What does it use?
Q2. A task needs the loop to click a button in Calculator, an ordinary Mac app with no special integration. Which tool, and why is it safe to run while you work?
Q3. What makes a goal an ultragoal rather than just a to-do?
Try it: ask the loop to ground a doubt with brightdata search, to read an app's accessibility tree with cu tree, or to draft a one-line GOAL.md with a done-when a verifier could check. Next, we look at the engine that turns a finished run into a course like this one: visual-teach, the course engine.