Use this when you want to deep-dive one feature or mechanism — a focused page with an on-page table of contents, tabbed code (algorithm / config / usage), one annotated diagram, and an FAQ that answers the questions readers actually ask.
This is a copyable exemplar. Lift the .demo-card section below into a lesson built from assets/lesson-template.html — the design tokens, tech-toggle, tabs, and SVG patterns are already wired to match.
Feature deep-dive · Rate limiting with the token-bucket algorithm
A rate limiter decides how many requests a single caller is allowed to make in a given window of time. If a client stays under its allowance, every request goes through. If it suddenly floods the server, the limiter starts turning requests away — usually with an HTTP 429 Too Many Requests — so one noisy caller can't starve everyone else.
The token-bucket algorithm is the most common way to do this. Picture a bucket that holds tokens. Every request must take one token to proceed. Tokens drip back in at a steady rate, and the bucket has a maximum size. When the bucket is empty, requests are denied until it refills — but because the bucket can hold a reserve, short bursts are allowed while the long-run average stays capped.
Think of it like… a coin-operated turnstile. Each entry costs one coin. A machine drops a fresh coin into the tray at a fixed pace (say, 5 coins a second), and the tray only holds 10 coins at most. A quick rush can spend the 10 saved-up coins all at once, but after that, people enter only as fast as new coins appear.
The bucket has two parameters: capacity B (max tokens, the burst ceiling) and refill rate r (tokens added per second, the sustained ceiling). State per key is just two numbers: tokens and last_refill_timestamp.
Refill is computed lazily on access rather than by a background timer: tokens = min(B, tokens + (now - last_refill) * r). A request of cost c is admitted iff tokens >= c, in which case tokens is decremented by c; otherwise it is rejected and the limiter can report Retry-After = (c - tokens) / r seconds, rounded up to a whole number because the HTTP header carries integer seconds.
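To make the lazy-refill formula concrete, here is a minimal sketch that runs the turnstile numbers (B = 10, r = 5) through it. The names BucketState and try_take are illustrative, not part of any library, and timestamps are passed in explicitly so the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class BucketState:
    tokens: float        # tokens currently in the bucket
    last_refill: float   # timestamp (seconds) of the last update


def try_take(state, now, capacity, rate, cost=1.0):
    """Lazy refill, then admit or reject. Returns (allowed, retry_after_seconds)."""
    elapsed = now - state.last_refill
    state.tokens = min(capacity, state.tokens + elapsed * rate)
    state.last_refill = now
    if state.tokens >= cost:
        state.tokens -= cost
        return True, 0.0
    return False, (cost - state.tokens) / rate


# B = 10, r = 5 tokens/s, bucket starts full at t = 0.
state = BucketState(tokens=10.0, last_refill=0.0)

# A burst of 12 requests at t = 0: the first 10 drain the reserve, the rest are rejected.
burst = [try_take(state, now=0.0, capacity=10, rate=5) for _ in range(12)]
allowed_in_burst = sum(ok for ok, _ in burst)   # 10
retry_after = burst[-1][1]                      # (1 - 0) / 5 = 0.2 s

# At t = 0.5 s, 0.5 * 5 = 2.5 tokens have dripped back in, so one request passes
# and 1.5 tokens remain.
ok_later, _ = try_take(state, now=0.5, capacity=10, rate=5)
```

No timer runs between t = 0 and t = 0.5: the 2.5 tokens are credited only when the next request arrives.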
This differs from a fixed-window counter (cheaper, but it can admit up to 2× the limit when a burst straddles a window boundary) and from a leaky bucket (which shapes output to a constant rate and does not permit bursts at all). Token-bucket takes O(1) time per request and O(1) memory per key, which is one reason it is so widely used in production limiters.
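The window-edge problem is easiest to see with numbers. The sketch below is an illustrative comparison, not a reference implementation: FixedWindowCounter and token_bucket_admits are hypothetical names, and both limiters are configured for roughly 10 requests per second.

```python
class FixedWindowCounter:
    """Allows at most `limit` requests per aligned window of `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # A new window began: the counter resets to zero.
            self.current_window = window_index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def token_bucket_admits(timestamps, capacity, rate):
    """Count admitted requests for a token bucket that starts full at t = 0."""
    tokens, last, admitted = float(capacity), 0.0, 0
    for now in timestamps:
        tokens = min(capacity, tokens + (now - last) * rate)
        last = now
        if tokens >= 1:
            tokens -= 1
            admitted += 1
    return admitted


# 10 requests just before the 1 s boundary and 10 just after: 20 requests in 0.5 s.
timestamps = [0.75] * 10 + [1.25] * 10

fixed = FixedWindowCounter(limit=10, window=1.0)
fixed_admitted = sum(fixed.allow(t) for t in timestamps)                # 20: twice the limit
bucket_admitted = token_bucket_admits(timestamps, capacity=10, rate=10)  # 10 + 0.5 * 10 = 15
```

The token bucket never admits more than B + r·t requests in any interval of length t, so its worst case is bounded no matter where the burst falls.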
Tokens drip in from the top at a steady rate. Each request you send drains one token. Spend the reserve and the next request gets denied — watch the counter.
Three views of the same feature: the algorithm that decides allow/deny, the config you tune in production, and how you use it as middleware on a route.
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # B: burst ceiling
        self.refill_rate = refill_rate    # r: tokens / second
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        elapsed = now - self.updated
        # lazy refill, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0              # allowed
        retry_after = (cost - self.tokens) / self.refill_rate
        return False, retry_after         # denied: respond with 429
# One bucket policy per route tier. Key = client API token.
buckets:
  default:
    capacity: 10       # allow short bursts up to 10
    refill_rate: 5     # sustained 5 req/s long-run
  search:
    capacity: 30
    refill_rate: 10
  auth_login:
    capacity: 5        # brute-force guard: tight burst
    refill_rate: 1
storage: redis         # shared state across app servers
key_by: api_token      # or: ip, user_id
on_deny: "429"         # + Retry-After header
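When tuning these values it can help to translate each policy into what it permits. The following is a rough sketch, assuming the buckets section has already been parsed into a dict; the helper names max_requests and full_refill_seconds are made up for illustration.

```python
# Mirrors the `buckets:` section of the config above, already parsed into a dict.
POLICIES = {
    "default":    {"capacity": 10, "refill_rate": 5},
    "search":     {"capacity": 30, "refill_rate": 10},
    "auth_login": {"capacity": 5,  "refill_rate": 1},
}


def max_requests(policy, seconds):
    """Upper bound on requests admitted in any interval of `seconds`: B + r * t."""
    return policy["capacity"] + policy["refill_rate"] * seconds


def full_refill_seconds(policy):
    """Time for an empty bucket to refill completely: B / r."""
    return policy["capacity"] / policy["refill_rate"]


# Worst-case requests per minute and full-refill time, per policy.
summary = {
    name: (max_requests(p, 60), full_refill_seconds(p))
    for name, p in POLICIES.items()
}
# default:    (310, 2.0)
# search:     (630, 3.0)
# auth_login: (65, 5.0)
```

Read this way, auth_login lets a single key try at most 65 logins in any one minute, which is the figure to weigh against the brute-force risk.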
from limiter import limit    # decorator built on TokenBucket

@app.route("/search")
@limit(policy="search")      # 30 burst, 10/s sustained
def search(req):
    return do_search(req.query)

# when the bucket is empty the decorator short-circuits:
#   HTTP/1.1 429 Too Many Requests
#   Retry-After: 2
#   X-RateLimit-Remaining: 0
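The limiter module itself is not shown, so the following is only a guess at how such a decorator might be put together. It is framework-agnostic: the request is assumed to be a plain dict with api_token and query keys, responses are (status, headers, body) tuples, state lives in an in-process dict instead of Redis, and the clock parameter exists purely to make the demo deterministic.

```python
import functools
import math
import time

POLICIES = {"search": {"capacity": 30, "refill_rate": 10}}
_state = {}  # (policy, api_token) -> (tokens, updated); in-process stand-in for shared storage


def limit(policy, clock=time.monotonic):
    cfg = POLICIES[policy]

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(req):
            now = clock()
            key = (policy, req["api_token"])
            # A key seen for the first time starts with a full bucket.
            tokens, updated = _state.get(key, (cfg["capacity"], now))
            tokens = min(cfg["capacity"], tokens + (now - updated) * cfg["refill_rate"])
            if tokens >= 1:
                _state[key] = (tokens - 1, now)
                return handler(req)
            # Bucket empty: short-circuit without calling the handler.
            _state[key] = (tokens, now)
            wait = math.ceil((1 - tokens) / cfg["refill_rate"])  # header wants whole seconds
            headers = {"Retry-After": str(wait), "X-RateLimit-Remaining": "0"}
            return 429, headers, "Too Many Requests"
        return wrapper
    return decorator


@limit(policy="search", clock=lambda: 0.0)   # frozen clock: no refill during the demo
def search(req):
    return 200, {}, "results for " + req["query"]


# 31 back-to-back requests from one client: 30 pass on the burst reserve, the 31st is denied.
responses = [search({"api_token": "t1", "query": "cats"}) for _ in range(31)]
```

Keeping the handler unaware of the limiter means the policy name is the only thing a route has to declare.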
Find it yourself: grep -rn "class TokenBucket" limiter/
capacity acts as a savings account that absorbs legitimate bursts, while refill_rate still bounds the long-run average. If you truly need a perfectly smooth output rate, use the leaky-bucket variant instead.
Read the Retry-After header and wait that many seconds before retrying; don't hammer immediately. A good client backs off exponentially with jitter on repeated 429s. The server computes the wait as (cost − tokens) / refill_rate, so it is the earliest moment at which enough tokens can exist.
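A client-side retry loop along these lines might look like the sketch below. It assumes Retry-After arrives as a number of seconds (the header may also carry an HTTP date, which is not handled here), and send stands for any hypothetical callable that returns (status, headers, body). The sleep and rng parameters are injected only so the demo can run without waiting.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send()` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            # Never retry sooner than the server asked for.
            server_wait = float(headers.get("Retry-After", 0))
            # Exponential backoff with jitter: random share of min(cap, base * 2^attempt).
            jitter = rng() * min(cap, base * 2 ** attempt)
            sleep(server_wait + jitter)
    return status, headers, body


# Demo: a fake server rejects twice, then succeeds. Sleeps are recorded, not performed.
replies = iter([
    (429, {"Retry-After": "2"}, ""),
    (429, {"Retry-After": "1"}, ""),
    (200, {}, "ok"),
])
sleeps = []
result = call_with_backoff(lambda: next(replies), sleep=sleeps.append, rng=lambda: 0.5)
# sleeps == [2.25, 1.5]; result == (200, {}, "ok")
```

The jitter spreads out clients that were rejected at the same instant, so they do not all return together when the bucket refills.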
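Keep the bucket state in a shared store (storage: redis) and do the refill-and-decrement atomically, typically in a small Lua script, so all servers debit the same bucket. A plain INCR/EXPIRE pair is simpler, but it gives you a fixed-window counter rather than a token bucket.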