Demo type · 17

PR write-up: narrate a change from motivation to rollout

Use this when you want a reviewer (or your future self) to understand why a change happened, which files moved and why, the one tricky bit, and how it ships safely — not just read a diff.

This is a copyable exemplar. Lift the <section> below into a lesson built from assets/lesson-template.html — keep the design tokens and the Simple → Technical toggle pattern intact.

Narrating the rate-limit change

A good change tells a story. Before anyone reads a single line of code, they should know the motivation (why we touched this at all), get a quick file tour (what moved and why), see the focus (the one subtle part worth a second look), and trust the rollout (how it goes live without breaking anyone).

Think of it like… a tour guide, not a map dump. A map shows every street at once; a guide walks you through, points at the one statue that matters, and tells you where the exit is. The four buttons below are stops on that walk.

acme/api · pull request #2184

Replace fixed-window rate limiter with sliding-window counter

Ready for review · +318 −96 · 4 files · flag: ratelimit_sliding_v2

Why we're touching the rate limiter at all.

Our old limiter counted requests in fixed one-minute buckets. At the boundary between two buckets a caller could fire a full minute's quota in the last second of one window and again in the first second of the next — twice the intended traffic in a heartbeat. That burst tipped a downstream service into timeouts twice last week.

The pain

Boundary bursts let a client send twice its per-minute quota within about 2 seconds, overloading the payments service.

The goal

Smooth enforcement so the limit holds across any rolling 60-second span, not just clock-aligned minutes.

Why now

Two incidents in seven days; the workaround (lower the cap) hurt well-behaved customers.
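The boundary burst described above can be reproduced with a toy clock-aligned counter. This is a minimal sketch, not the limiter this PR replaces: the `FixedWindow` type, the `burstAcrossBoundary` helper, and the limit of 100 are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// FixedWindow is a hypothetical clock-aligned limiter: one counter per
// calendar minute, reset whenever a request lands in a new minute.
type FixedWindow struct {
	limit  int
	window time.Time // start of the current one-minute bucket
	count  int
}

// Allow admits a request if the current minute's bucket still has budget.
func (fw *FixedWindow) Allow(now time.Time) bool {
	bucket := now.Truncate(time.Minute)
	if !bucket.Equal(fw.window) {
		fw.window, fw.count = bucket, 0 // new minute: the counter starts over
	}
	if fw.count >= fw.limit {
		return false
	}
	fw.count++
	return true
}

// burstAcrossBoundary sends limit requests in the last second of one minute
// and limit more in the first second of the next, and returns how many pass.
func burstAcrossBoundary(limit int) int {
	fw := &FixedWindow{limit: limit}
	boundary := time.Date(2024, 1, 1, 12, 1, 0, 0, time.UTC)
	allowed := 0
	for i := 0; i < limit; i++ {
		if fw.Allow(boundary.Add(-500 * time.Millisecond)) {
			allowed++
		}
	}
	for i := 0; i < limit; i++ {
		if fw.Allow(boundary.Add(500 * time.Millisecond)) {
			allowed++
		}
	}
	return allowed
}

func main() {
	// Both bursts pass in full, one second apart.
	fmt.Println(burstAcrossBoundary(100)) // prints 200
}
```

Every request is individually within its bucket's budget, which is why lowering the cap only shrinks the burst rather than removing it.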

Four files moved. Here's each one and why.

add · limiter/sliding_window.go · The new algorithm. Keeps a small ring of per-second sub-counts and sums the trailing 60 of them. This is the heart of the change.
edit · limiter/middleware.go · Swaps the limiter behind the flag. When ratelimit_sliding_v2 is off, the old fixed-window path is untouched, so behavior is unchanged by default.
edit · config/flags.yaml · Registers the new flag, defaulted to false. Lets us enable it per environment without a redeploy.
test · limiter/sliding_window_test.go · Reproduces the boundary-burst case that fixed windows fail, and asserts the new limiter blocks it. The test is the spec.
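One way the ring of per-second sub-counts could look is sketched below, together with the boundary-burst check the test file describes. Only the body of `Allow` is shown in the PR excerpt; the struct fields, the bodies of `evictOlderThan` and `record`, and the `boundaryBurst` helper are assumptions made for illustration. The sketch also assumes timestamps never go backwards.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const slots = 60 // one sub-count per second of the trailing window

// SlidingWindow is an illustrative reconstruction, not the PR's actual type.
type SlidingWindow struct {
	mu     sync.Mutex
	limit  int
	total  int          // running sum of all live sub-counts
	counts [slots]int   // requests admitted during the second in stamps[i]
	stamps [slots]int64 // Unix second each slot currently represents
}

// evictOlderThan clears every slot whose second is at or before the cutoff.
func (sw *SlidingWindow) evictOlderThan(cutoff time.Time) {
	c := cutoff.Unix()
	for i := range sw.counts {
		if sw.counts[i] > 0 && sw.stamps[i] <= c {
			sw.total -= sw.counts[i]
			sw.counts[i] = 0
		}
	}
}

// record adds one request to the slot for now's second.
func (sw *SlidingWindow) record(now time.Time) {
	sec := now.Unix()
	i := int((sec%slots + slots) % slots) // non-negative index even before 1970
	if sw.stamps[i] != sec {              // slot last used for an older second
		sw.total -= sw.counts[i]
		sw.counts[i] = 0
		sw.stamps[i] = sec
	}
	sw.counts[i]++
	sw.total++
}

// Allow checks and records under one lock, as in the PR's critical section.
func (sw *SlidingWindow) Allow(now time.Time) bool {
	sw.mu.Lock()
	defer sw.mu.Unlock()
	sw.evictOlderThan(now.Add(-60 * time.Second))
	if sw.total >= sw.limit {
		return false
	}
	sw.record(now)
	return true
}

// boundaryBurst fires limit requests just before a minute boundary and limit
// more just after it, returning how many the sliding window admits.
func boundaryBurst(limit int) int {
	sw := &SlidingWindow{limit: limit}
	boundary := time.Date(2024, 1, 1, 12, 1, 0, 0, time.UTC)
	allowed := 0
	for _, offset := range []time.Duration{-500 * time.Millisecond, 500 * time.Millisecond} {
		for i := 0; i < limit; i++ {
			if sw.Allow(boundary.Add(offset)) {
				allowed++
			}
		}
	}
	return allowed
}

func main() {
	// The second burst is rejected because the first is still in the window.
	fmt.Println(boundaryBurst(100)) // prints 100
}
```

Because the sub-counts are per second, this version enforces the window at one-second resolution: a request ages out between 59 and 60 seconds after it was recorded.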

The one tricky bit worth a careful read.

Read this slowly: the ring buffer is shared across goroutines, so the read-sum-and-increment must be atomic. A naïve "check then add" lets two concurrent requests both pass when only one slot remains.

limiter/sliding_window.go — the critical section

// guarded by sw.mu — do NOT split the read and the write
func (sw *SlidingWindow) Allow(now time.Time) bool {
  sw.mu.Lock()
  defer sw.mu.Unlock()

  sw.evictOlderThan(now.Add(-60 * time.Second))
  if sw.total >= sw.limit {
    return false          // over budget for the trailing 60s
  }
  sw.record(now)        // increment INSIDE the same lock
  return true
}

If you only review one hunk, make it this one. Everything else is plumbing; correctness lives here.
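A quick way to convince yourself the single critical section holds is to hammer it from many goroutines and count admissions. The sketch below uses a stripped-down stand-in (`budget`, with no time window) rather than the real limiter; the type, the `hammer` helper, and the numbers are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// budget is a hypothetical stand-in for the limiter's trailing-window total.
type budget struct {
	mu    sync.Mutex
	total int
	limit int
}

// allow does the check and the increment under one lock. Splitting them into
// two separately locked steps would let two callers both see total == limit-1
// and both be admitted.
func (b *budget) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.total >= b.limit {
		return false
	}
	b.total++
	return true
}

// hammer fires n concurrent calls and reports how many were admitted.
func hammer(b *budget, n int) int64 {
	var admitted int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.allow() {
				atomic.AddInt64(&admitted, 1)
			}
		}()
	}
	wg.Wait()
	return admitted
}

func main() {
	b := &budget{limit: 100}
	// Exactly the limit is admitted, however the goroutines interleave.
	fmt.Println(hammer(b, 1000)) // prints 100
}
```

Running a check like this under Go's race detector (`go test -race`) is a reasonable complement, since it flags unsynchronized access even when the final count happens to come out right.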

How it goes live without waking anyone at 3am.

Ship dark behind a flag

Merge with ratelimit_sliding_v2=false everywhere. No traffic hits the new path. Safe to merge today.

Canary at 5% in staging, then prod

Enable the flag for 5% of API keys. Watch for 1 hour before widening.

Monitor the right signals

Dashboards: ratelimit.rejections (should rise slightly), payments.timeouts (should drop to zero), and limiter.allow.p99 latency (must stay under 1 ms). Alert if p99 doubles.

Ramp 5% → 25% → 100%

Step up only when the prior step is clean for a full hour. Full rollout expected over two days.

Rollback plan

Flip ratelimit_sliding_v2=false: instant revert to the old limiter, no deploy. Keep the flag for two weeks, then delete the dead path in a follow-up PR.
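The write-up does not say how "5% of API keys" is chosen. One common approach, sketched here as an assumption, is to hash each key into a stable bucket from 0 to 99 and compare it with the current ramp percentage. The `rollout` type, `bucket`, and `useSliding` are hypothetical names, not part of the PR.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rollout is a hypothetical view of the ratelimit_sliding_v2 flag state.
type rollout struct {
	enabled bool   // false sends every key to the old limiter
	percent uint32 // share of API keys on the new path: 5, 25, or 100
}

// bucket maps an API key to a stable value in [0, 100).
func bucket(apiKey string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(apiKey)) // Write on a hash.Hash never returns an error
	return h.Sum32() % 100
}

// useSliding reports whether this key should take the new limiter.
func (r rollout) useSliding(apiKey string) bool {
	return r.enabled && bucket(apiKey) < r.percent
}

func main() {
	canary := rollout{enabled: true, percent: 5}
	onNewPath := 0
	for i := 0; i < 10000; i++ {
		if canary.useSliding(fmt.Sprintf("key-%d", i)) {
			onNewPath++
		}
	}
	// Roughly 5% of keys, though the exact count depends on the hash.
	fmt.Println("keys on the new path:", onNewPath)

	// Rollback: one field flips and every key is back on the old limiter.
	rolledBack := rollout{enabled: false, percent: 5}
	fmt.Println(rolledBack.useSliding("key-1")) // prints false
}
```

Bucketing by hash keeps the ramp monotonic: a key admitted at 5% stays on the new path at 25% and 100%, so no customer flips back and forth between limiters as the rollout widens.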

The reviewer's job, made cheap

A diff answers "what changed?" but a reviewer's real questions are "why?", "where do I look?", and "will this break prod?". The four-stage shape front-loads exactly those answers, so review time drops and the right scrutiny lands on the right hunk.

Motivation prevents the wrong fix

Without the "why now" framing, a reviewer can't tell whether lowering a config value would have sufficed. Stating the incident count and the rejected workaround pre-empts the bikeshed.

Focus directs attention

Most hunks are mechanical; one is load-bearing. Naming the concurrency-critical section (read-check-increment under one lock) means the careful review happens where a bug would actually hurt, not spread thin across renamed variables.

Rollout earns trust

Flag-gated, canaried, monitored on named metrics, with a no-deploy rollback. The reviewer approves a plan, not a leap — the merge is reversible by a config flip, not a revert-and-redeploy scramble.

The whole arc in one picture

Read it left to right: the change starts with a reason and ends safely in production. The four stops above are these four beats.

The forward path ships the change; the dashed red path is the instant rollback — a config flip, not a redeploy.