PR write-up: narrate a change from motivation to rollout
Use this when you want a reviewer (or your future self) to understand why a change happened, which files moved and why, the one tricky bit, and how it ships safely — not just read a diff.
This is a copyable exemplar. Lift the <section> below into a lesson built from assets/lesson-template.html — keep the design tokens and the Simple → Technical toggle pattern intact.
1. Narrating the rate-limit change
A good change tells a story. Before anyone reads a single line of code, they should know the motivation (why we touched this at all), get a quick file tour (what moved and why), see the focus (the one subtle part worth a second look), and trust the rollout (how it goes live without breaking anyone).
Think of it like… a tour guide, not a map dump. A map shows every street at once; a guide walks you through, points at the one statue that matters, and tells you where the exit is. The four buttons below are stops on that walk.
acme/api · pull request #2184
Replace fixed-window rate limiter with sliding-window counter
Ready for review · +318 −96 · 4 files · flag: ratelimit_sliding_v2
Why we're touching the rate limiter at all.
Our old limiter counted requests in fixed one-minute buckets. At the boundary between two buckets a caller could fire a full minute's quota in the last second of one window and again in the first second of the next — twice the intended traffic in a heartbeat. That burst tipped a downstream service into timeouts twice last week.
The pain
Boundary bursts let clients send 2× the per-minute quota within ~2 seconds, overloading the payments service.
The goal
Smooth enforcement so the limit holds across any rolling 60-second span, not just clock-aligned minutes.
Why now
Two incidents in seven days; the workaround (lower the cap) hurt well-behaved customers.
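The boundary burst described above can be reproduced in a few lines. This is a minimal sketch, not the old limiter's actual code: the `FixedWindow` type, the `BoundaryBurst` helper, and the timestamp are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// FixedWindow is a hypothetical stand-in for the old limiter: it counts
// requests per clock-aligned minute and resets at each minute boundary.
type FixedWindow struct {
	limit  int
	window time.Time // start of the current one-minute bucket
	count  int
}

// Allow reports whether a request at time now fits in its minute bucket.
func (fw *FixedWindow) Allow(now time.Time) bool {
	bucket := now.Truncate(time.Minute)
	if !bucket.Equal(fw.window) {
		fw.window, fw.count = bucket, 0 // new minute: the counter starts over
	}
	if fw.count >= fw.limit {
		return false
	}
	fw.count++
	return true
}

// BoundaryBurst fires `limit` requests in the last second of one minute and
// `limit` more in the first second of the next, returning how many passed.
func BoundaryBurst(limit int) int {
	fw := &FixedWindow{limit: limit}
	boundary := time.Date(2024, 1, 1, 12, 1, 0, 0, time.UTC) // arbitrary minute edge
	allowed := 0
	for i := 0; i < limit; i++ {
		if fw.Allow(boundary.Add(-500 * time.Millisecond)) {
			allowed++
		}
	}
	for i := 0; i < limit; i++ {
		if fw.Allow(boundary.Add(500 * time.Millisecond)) {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(BoundaryBurst(100)) // 200: twice the cap within about one second
}
```

Every request here is individually legal under the fixed-window rule, which is why lowering the cap only shrinks the burst instead of removing it.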
Four files moved. Here's each one and why.
add · limiter/sliding_window.go · The new algorithm. Keeps a small ring of per-second sub-counts and sums the trailing 60 of them. This is the heart of the change.
edit · limiter/middleware.go · Swaps the limiter behind the flag. When ratelimit_sliding_v2 is off, the old fixed-window path is untouched: zero behavior change by default.
edit · config/flags.yaml · Registers the new flag, defaulted to false. Lets us enable per-environment without a redeploy.
test · limiter/sliding_window_test.go · Reproduces the boundary-burst case that fixed windows fail, and asserts the new limiter blocks it. The test is the spec.
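To make the "ring of per-second sub-counts" concrete, here is one way such a structure could look. It is a sketch under stated assumptions, not the contents of limiter/sliding_window.go: `RingLimiter`, its fields, and `BurstAllowed` are hypothetical, and it assumes timestamps are non-decreasing and after the Unix epoch.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const windowSeconds = 60

// RingLimiter is an illustrative sketch: one slot per second, reused
// modulo 60, plus a running total of the slots still inside the window.
type RingLimiter struct {
	mu     sync.Mutex
	limit  int
	counts [windowSeconds]int   // requests recorded in each one-second slot
	stamps [windowSeconds]int64 // Unix second each slot currently represents
	total  int
}

func NewRingLimiter(limit int) *RingLimiter {
	return &RingLimiter{limit: limit}
}

// Allow evicts slots that have aged out of the trailing 60 seconds, then
// admits the request only if the remaining total is under the limit.
func (r *RingLimiter) Allow(now time.Time) bool {
	r.mu.Lock()
	defer r.mu.Unlock()

	sec := now.Unix()
	for i := range r.counts {
		if r.counts[i] > 0 && sec-r.stamps[i] >= windowSeconds {
			r.total -= r.counts[i]
			r.counts[i] = 0
		}
	}
	if r.total >= r.limit {
		return false
	}
	slot := int(sec % windowSeconds)
	r.stamps[slot] = sec
	r.counts[slot]++
	r.total++
	return true
}

// BurstAllowed replays the boundary burst: `limit` requests just before a
// minute edge and `limit` more just after it.
func BurstAllowed(limit int) int {
	r := NewRingLimiter(limit)
	boundary := time.Date(2024, 1, 1, 12, 1, 0, 0, time.UTC) // arbitrary minute edge
	allowed := 0
	for _, at := range []time.Time{
		boundary.Add(-500 * time.Millisecond),
		boundary.Add(500 * time.Millisecond),
	} {
		for i := 0; i < limit; i++ {
			if r.Allow(at) {
				allowed++
			}
		}
	}
	return allowed
}

func main() {
	fmt.Println(BurstAllowed(100)) // 100: the second half of the burst is rejected
}
```

Per-second slots trade a little precision (the window edge is rounded to whole seconds) for constant memory, which is the usual reason to prefer a ring over a log of individual timestamps.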
The one tricky bit worth a careful read.
Read this slowly: the ring buffer is shared across goroutines, so the read-sum-and-increment must be atomic. A naïve "check then add" lets two concurrent requests both pass when only one slot remains.
limiter/sliding_window.go — the critical section
// guarded by sw.mu; do NOT split the read and the write
func (sw *SlidingWindow) Allow(now time.Time) bool {
	sw.mu.Lock()
	defer sw.mu.Unlock()
	sw.evictOlderThan(now.Add(-60 * time.Second))
	if sw.total >= sw.limit {
		return false // over budget for the trailing 60s
	}
	sw.record(now) // increment INSIDE the same lock
	return true
}
If you only review one hunk, make it this one. Everything else is plumbing; correctness lives here.
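One way to exercise that critical section is to send many goroutines at a limiter with a small budget and count the admits. The sketch below strips the limiter down to a total and a limit so only the locking pattern is on show; `budget` and `AdmittedUnderContention` are hypothetical names, not part of the PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// budget is a deliberately tiny stand-in for the limiter's state.
type budget struct {
	mu    sync.Mutex
	total int
	limit int
}

// allow performs the check and the increment under one lock, mirroring
// the shape of Allow in the hunk above.
func (b *budget) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.total >= b.limit {
		return false
	}
	b.total++
	return true
}

// AdmittedUnderContention launches `callers` goroutines at once and
// returns how many of them were admitted.
func AdmittedUnderContention(limit, callers int) int {
	b := &budget{limit: limit}
	var wg sync.WaitGroup
	var admitted int64
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.allow() {
				atomic.AddInt64(&admitted, 1)
			}
		}()
	}
	wg.Wait()
	return int(admitted)
}

func main() {
	fmt.Println(AdmittedUnderContention(10, 1000)) // 10: never more than the limit
}
```

If the check and the increment were each locked separately, every memory access would still be synchronized, so a data-race detector would stay quiet while the count could exceed the limit. A count-based assertion like this one is what catches that class of bug.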
How it goes live without waking anyone at 3am.
Ship dark behind a flag
Merge with ratelimit_sliding_v2=false everywhere. No traffic hits the new path. Safe to merge today.
Canary at 5% in staging, then prod
Enable the flag for 5% of API keys. Watch for 1 hour before widening.
Monitor the right signals
Dashboards: ratelimit.rejections (should rise slightly), payments.timeouts (should drop to zero), and limiter.allow.p99 latency (must stay under 1ms). Alert if p99 doubles.
Ramp 5% → 25% → 100%
Step up only when the prior step is clean for a full hour. Full rollout expected over two days.
Rollback plan
Flip ratelimit_sliding_v2=false — instant revert to the old limiter, no deploy. Keep the flag for two weeks, then delete the dead path in a follow-up PR.
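The percentage ramp works best when a given API key lands on the same side of the flag at every step. One common way to get that is stable hash bucketing, sketched below. This is an assumption for illustration: the write-up does not say how ratelimit_sliding_v2 selects keys, and `InRollout` and `RampIsMonotonic` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// InRollout reports whether apiKey falls inside the first `percent` of 100
// stable buckets. The same key always maps to the same bucket.
func InRollout(apiKey string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(apiKey)) // Write on a hash.Hash never returns an error
	return h.Sum32()%100 < percent
}

// RampIsMonotonic checks, over n synthetic keys, that no key enabled at an
// earlier ramp step is dropped at a later one, and that 100% covers all keys.
func RampIsMonotonic(n int) bool {
	steps := []uint32{5, 25, 100}
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("key-%d", i)
		enabled := false
		for _, p := range steps {
			now := InRollout(key, p)
			if enabled && !now {
				return false
			}
			enabled = now
		}
		if !enabled {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(InRollout("example-key", 0), InRollout("example-key", 100)) // false true
	fmt.Println(RampIsMonotonic(10000))                                     // true
}
```

With this scheme, setting the percentage to 0 is equivalent to the rollback: no key reaches the new path, and no deploy is involved.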
The reviewer's job, made cheap
A diff answers "what changed?" but a reviewer's real questions are "why?", "where do I look?", and "will this break prod?". The four-stage shape front-loads exactly those answers, so review time drops and the right scrutiny lands on the right hunk.
Motivation prevents the wrong fix
Without the "why now" framing, a reviewer can't tell whether lowering a config value would have sufficed. Stating the incident count and the rejected workaround pre-empts the bikeshed.
Focus directs attention
Most hunks are mechanical; one is load-bearing. Naming the concurrency-critical section (read-check-increment under one lock) means the careful review happens where a bug would actually hurt, not spread thin across renamed variables.
Rollout earns trust
Flag-gated, canaried, monitored on named metrics, with a no-deploy rollback. The reviewer approves a plan, not a leap — the merge is reversible by a config flip, not a revert-and-redeploy scramble.
2. The whole arc in one picture
Read it left to right: the change starts with a reason and ends safely in production. The four stops above are these four beats.
The forward path ships the change; the dashed red path is the instant rollback — a config flip, not a redeploy.