Build Masterclass
How one developer made a 1v1 quiz platform fast, cheat-resistant, and portable on a managed edge stack, decision by decision.
01·The game-module boundary
If game two means a rewrite, the roadmap is a lie.
Ship the first game fast, hard-code its rules into the match loop, and promise to 'extract an interface later'. Later never has time, and by then the lobby, the scoring, the reveal flow all assume two numbers being compared.
A single GameModule contract that every game implements; the platform talks only to the interface, never to a specific game.
The platform code (lobby, match orchestration, scoring shell, anti-cheat) never imports higher-or-lower directly. It calls gameModuleForCategory(category), which switches on category.gameType and returns the right module. When multi-choice arrived months later it slotted in as a second case in that switch, and the three existing categories kept working by adding one gameType field. The abstraction paid for itself the first time it was used, not the day it was written.
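A minimal sketch of what such a boundary could look like. Only gameModuleForCategory and the gameType field come from the text above; the GameModule members, the round shapes, and the module names are illustrative assumptions, not the project's actual contract.

```typescript
// Hypothetical contract: the real GameModule interface is not shown in the source.
type GameType = "higher-or-lower" | "multi-choice";

interface Category {
  id: string;
  gameType: GameType;
}

// Everything the platform is allowed to know about a game.
interface GameModule {
  readonly gameType: GameType;
  // Judge one answer against one round; pure and synchronous.
  isCorrect(round: unknown, answer: unknown): boolean;
}

// Assumed round shape: a shown number and a hidden number to compare.
const higherOrLower: GameModule = {
  gameType: "higher-or-lower",
  isCorrect(round, answer) {
    const { shown, hidden } = round as { shown: number; hidden: number };
    return answer === (hidden >= shown ? "higher" : "lower");
  },
};

// Assumed round shape: the index of the correct option.
const multiChoice: GameModule = {
  gameType: "multi-choice",
  isCorrect(round, answer) {
    const { correctIndex } = round as { correctIndex: number };
    return answer === correctIndex;
  },
};

// The single place where the platform learns which game it is running.
function gameModuleForCategory(category: Category): GameModule {
  switch (category.gameType) {
    case "higher-or-lower":
      return higherOrLower;
    case "multi-choice":
      return multiChoice;
    default: {
      // Compile-time exhaustiveness check: a new GameType must add a case.
      const unreachable: never = category.gameType;
      throw new Error(`Unknown game type: ${String(unreachable)}`);
    }
  }
}
```

Under these assumptions, adding a third game means one new module object and one new case; the lobby and scoring shell never change.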
Discovering the boundary later, after game two: the spec is explicit that the interface was designed up front because the whole point of the platform was that game two lands as a module, not a rewrite. Waiting would have meant unwinding higher-or-lower assumptions out of the lobby and scoring at the worst possible time.
02·Realtime without a server to run
The 'opponent answered' ping has to land in under 100 ms, and you are one person with no ops budget.
Stand up a Node WebSocket server with Socket.io and Redis for shared state. Now you own a process that can crash, a Redis you have to scale, sticky sessions for the socket layer, and a pager that goes off at 2 a.m., all to move a few 50-byte messages.
One Durable Object per match, one Lobby DO for the queue, running at the Warsaw edge via PartyKit, with native WebSocket and no message protocol on top.
Each match is its own stateful object holding the game in memory, so there is no global lock and no shared cache to keep consistent. State lives where the match lives. D1 is touched only at session boundaries (login, match start, match end), never on the per-message hot path, so the thing that has to be fast (the ping) never waits on a database. Native WebSocket instead of Socket.io drops the protocol overhead the spec called out as pure cost for 50-byte payloads.
Socket.io + Redis on a Hetzner box. The spec keeps it as the documented migration target for >10k DAU and estimates roughly a week of work, precisely because the transport layer was kept thin and isolated from day one. It is the fallback, not the start.
03·Keeping the exit unlocked
The fear that follows every such choice: the day the bill or the limits force you off it, you are trapped.
Let Cloudflare-specific APIs leak into the game rules and the schema. Durable Object handles in the scoring code, D1 quirks in the queries, and 'portable' becomes a word in the README that no longer matches the code.
Game logic is pure TypeScript with zero Cloudflare imports; the transport layer is kept deliberately thin so it can be swapped without touching the rules.
The game engine runs in a Worker, a browser, and a Node test unchanged, because it depends on no platform API. The registry comment makes 'keep it pure and sync so it runs in any environment' an explicit rule. That same purity is what lets the same scoring code replay a run on the server to catch cheaters. The migration plan reads as 'replace PartyKit transport, keep everything else' and budgets about a week, which is only credible because the expensive parts never learned they were on Cloudflare.
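One way to picture the split, as a sketch only: the rules are a pure function, and the platform is reduced to a one-method interface. Every name here (MatchState, applyAnswer, Transport, InMemoryTransport) is hypothetical; the source does not show the real transport surface.

```typescript
// Pure rules: no platform imports, no I/O, no clock reads.
interface MatchState {
  answered: Record<string, boolean>;
}

type MatchEvent = { type: "opponent-answered"; playerId: string };

function applyAnswer(
  state: MatchState,
  playerId: string,
): { state: MatchState; event: MatchEvent } {
  return {
    // Return a new state object instead of mutating the old one.
    state: { answered: { ...state.answered, [playerId]: true } },
    event: { type: "opponent-answered", playerId },
  };
}

// The only surface a platform adapter has to implement.
interface Transport {
  broadcast(event: MatchEvent): void;
}

// Test double; a PartyKit or Socket.io adapter would implement the same interface.
class InMemoryTransport implements Transport {
  readonly sent: MatchEvent[] = [];
  broadcast(event: MatchEvent): void {
    this.sent.push(event);
  }
}

// Glue code: the one place where rules and transport meet.
function handleAnswer(state: MatchState, playerId: string, transport: Transport): MatchState {
  const { state: next, event } = applyAnswer(state, playerId);
  transport.broadcast(event);
  return next;
}
```

With a shape like this, a migration replaces the class behind Transport and leaves applyAnswer untouched, which is the property the one-week estimate depends on.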
Full Cloudflare optimization with DO handles threaded through the game code. Faster to write on day one, but it converts the documented one-week migration into a rewrite. The spec treats the thin, isolated transport as the price of keeping that exit cheap.
04·Anonymous-first identity
But everything that matters (rating, streaks, leaderboard rank) lives in localStorage, and one cache clear wipes a player's entire history.
Put a login wall in front of play to make identity durable. You protect the rating and lose the player, because the three-second promise is the product. Or skip accounts entirely and watch retention leak out through every cleared cache and every new phone.
Better Auth creates a real anonymous account on first connection; the player still just types a nick, but identity now lives server-side and can be upgraded later without ever blocking play.
An anonymous user is a first-class server account from the first touch, so the session cookie, not localStorage, owns identity, and it survives a cache clear or a device switch. The nick-and-play flow is untouched because account creation happens silently on connection. Upgrading to a passkey, Google, or magic-link is offered only after the player has something worth saving (a first win, a top-50 rank), so the cost of signing in is paid exactly when the value is visible, never before.
A login wall, rejected because it trades the three-second first-touch, the core of the product, for durability. The spec frames localStorage-only identity as the dominant churn risk for a product that wants 12-month retention, so doing nothing was not on the table either.
05·Bridging the cookie to the edge
The realtime game lives on a different origin entirely, *.partykit.dev, where that cookie does not exist.
Reach for the easy fix and let the realtime endpoint mint a token from whatever nick the request body claims. Now anyone can ask for a token under any name, with no session behind it, and the 'identity' the whole match system trusts is just a string a stranger typed.
A SvelteKit endpoint reads the verified Better Auth session and mints a short-lived HS256 JWT whose subject is the stable user id; PartyKit only ever trusts that signed token.
The cookie cannot cross to *.partykit.dev, so the bridge is a stateless token the edge can verify on its own with no callback to the web app. Crucially the JWT subject is the stable Better Auth user id, not a fresh value per call, so leaderboard nick-hashes, lobby pairing keys, and match-room slots all stay bound to the same identity across sessions and devices. The earlier path, where the lobby minted tokens straight from a request body with no session check, was flagged in the security audit precisely because the subject was random and the nick was unverified; moving the mint behind the session closes that.
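The mint-and-verify pair can be sketched by hand to show what the edge actually checks. This is an illustration under assumptions, not the project's code: it uses Node's crypto module for brevity (a Worker would more likely use Web Crypto or a JWT library), the 60-second lifetime is a guess at "short-lived", and the function names are hypothetical. The userId argument is assumed to come from an already verified Better Auth session, which is out of scope here.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const encodeJson = (value: unknown): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");

const hmac = (data: string, secret: string): Buffer =>
  createHmac("sha256", secret).update(data).digest();

// Web-app side: called only after the session has been verified.
function mintRealtimeToken(userId: string, secret: string, nowSec: number, ttlSec = 60): string {
  const header = encodeJson({ alg: "HS256", typ: "JWT" });
  // sub is the stable user id, so every token for this player names the same identity.
  const payload = encodeJson({ sub: userId, iat: nowSec, exp: nowSec + ttlSec });
  const signature = hmac(`${header}.${payload}`, secret).toString("base64url");
  return `${header}.${payload}.${signature}`;
}

// Edge side: returns the user id, or null if the token is forged, malformed, or expired.
function verifyRealtimeToken(token: string, secret: string, nowSec: number): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header = "", payload = "", signature = ""] = parts;
  const expected = hmac(`${header}.${payload}`, secret);
  const given = Buffer.from(signature, "base64url");
  // Check the signature before parsing anything the caller controls.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  try {
    const head = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
    if (head.alg !== "HS256") return null;
    if (typeof claims.sub !== "string" || typeof claims.exp !== "number") return null;
    return claims.exp > nowSec ? claims.sub : null;
  } catch {
    return null;
  }
}
```

Verification needs only the shared secret and the current time, which is why the edge never has to call back to the web app.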
Sharing the session cookie across origins, impossible for *.partykit.dev, or keeping the old body-supplied token mint, which the audit shows lets any web page mint a player token from a victim's rate-limit budget. The bridge exists so all the existing PartyKit verification code stays unchanged and only the token issuer moves.
06·Carrying progress across the upgrade
The moment they finally register a passkey, all of that is owned by an anonymous account that is about to disappear.
Treat the upgrade as a fresh start: the new account is empty, the guest's rating and history evaporate. The player did exactly what you asked, committed to the product, and you punished them for it with a wiped profile.
On upgrade, re-parent every game-state row from the anonymous user onto the real one inside Better Auth's onLinkAccount hook, then delete the anonymous shell.
Counters are summed, solo bests are merged per bucket keeping the better entry, and matches are re-pointed, so the player keeps everything they earned. But the Glicko rating deliberately keeps the real user's value rather than importing the anonymous one, because anonymous ratings are noisy and re-importing them would punish people the moment they upgrade. The current streak resets at the anon-to-real boundary on purpose: the next match's identity is genuinely a different user, so pretending the streak continued would be a lie the data shouldn't tell.
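The merge rules above can be written as one pure function. This is a sketch under assumptions: the GameState shape is hypothetical, "better" is assumed to mean a higher score, and the database work (re-pointing match rows, deleting the anonymous shell inside the onLinkAccount hook) is left out.

```typescript
// Hypothetical per-user state; the real schema is not shown in the source.
interface GameState {
  counters: Record<string, number>;  // e.g. matches played, wins
  soloBests: Record<string, number>; // best score per bucket; higher is assumed better
  rating: number;                    // Glicko rating
  currentStreak: number;
}

function mergeOnUpgrade(anon: GameState, real: GameState): GameState {
  // Counters are summed across both accounts.
  const counters: Record<string, number> = { ...real.counters };
  for (const [key, value] of Object.entries(anon.counters)) {
    counters[key] = (counters[key] ?? 0) + value;
  }

  // Solo bests are merged per bucket, keeping the better entry.
  const soloBests: Record<string, number> = { ...real.soloBests };
  for (const [bucket, score] of Object.entries(anon.soloBests)) {
    const existing = soloBests[bucket];
    soloBests[bucket] = existing === undefined ? score : Math.max(existing, score);
  }

  return {
    counters,
    soloBests,
    rating: real.rating, // the noisy anonymous rating is deliberately not imported
    currentStreak: 0,    // the streak resets at the anon-to-real boundary
  };
}
```

For example, an anonymous account with 3 wins merged into a real account with 1 win yields 4 wins, while the rating stays at whatever the real account already had.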
Re-importing the anonymous rating verbatim, rejected in the merge code's own comment because noisy guest ratings would drag down a careful player right as they commit. A clean fresh start was rejected for the opposite reason: it makes upgrading feel like a loss, killing the conversion the whole anonymous-first design exists to earn.
07·Replaying the run to score it
The client knows the questions, the answers, and its own score. Whatever number it sends, you are about to rank the whole world by it.
Trust the client's reported score. It is the obvious move, the browser already computed it. And it works perfectly until the first person opens the network tab, posts correctCount: 1000, and parks themselves at the top of every board forever.
The client submits its seed and answers; the server replays the entire run with the same deterministic engine and scores it from scratch. The client's tally is never trusted.
Because the game engine is pure and deterministic, the server can re-run the exact same rounds from the same seed and arrive at the only score it will accept. The client's number is treated as a claim to be verified, not a result to be stored. Two cheap sanity floors back it up: a 1500 ms-per-round minimum that lets fast humans through but flags scripted replays, and a single-use run token persisted to DO storage so an eviction can't be used to redeem the same run twice. The security audit confirmed this path holds: spamming the board still requires actually playing a correct game.
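A compact sketch of the replay idea. The seeded generator, the round shape, and every name below are assumptions made for illustration; a Set stands in for Durable Object storage, and the elapsed time is assumed to be measured by the server rather than reported by the client.

```typescript
// Small seeded PRNG (mulberry32-style), so the same seed always yields the same rounds.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface Round { shown: number; hidden: number; }
type Answer = "higher" | "lower";

function generateRounds(seed: number, count: number): Round[] {
  const rand = seededRandom(seed);
  return Array.from({ length: count }, () => ({
    shown: Math.floor(rand() * 100),
    hidden: Math.floor(rand() * 100),
  }));
}

const correctAnswer = (round: Round): Answer =>
  round.hidden >= round.shown ? "higher" : "lower";

interface RunSubmission {
  runToken: string;
  seed: number;
  answers: Answer[];
  serverElapsedMs: number; // assumed to be measured server-side
  claimedScore: number;    // received but never used
}

const MIN_MS_PER_ROUND = 1500;

type Verdict = { ok: true; score: number } | { ok: false; reason: string };

function verifyRun(run: RunSubmission, usedTokens: Set<string>): Verdict {
  // Single-use token: burn it before doing anything else.
  if (usedTokens.has(run.runToken)) return { ok: false, reason: "run token already redeemed" };
  usedTokens.add(run.runToken);

  // Speed floor: anything faster than 1500 ms per round is flagged.
  if (run.serverElapsedMs < run.answers.length * MIN_MS_PER_ROUND) {
    return { ok: false, reason: "faster than the per-round minimum" };
  }

  // Replay: regenerate the rounds from the seed and score from scratch.
  const rounds = generateRounds(run.seed, run.answers.length);
  const score = rounds.filter((round, i) => run.answers[i] === correctAnswer(round)).length;
  return { ok: true, score };
}
```

A submission that claims 1000 correct answers but contains five real ones scores at most five, because claimedScore never enters the calculation.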
Trusting the client tally, rejected because it makes the leaderboard meaningless the day someone reads the docs. Running the full game server-side per player was unnecessary: replay gets the same guarantee at session boundaries only, keeping the per-message path free of the database.
08·Failing loud on a weak secret
Set it to a short dev placeholder by accident, and every token in the system is forgeable. Silently. With everything still appearing to work.
Check only that the secret exists. A one-byte 'x' passes. A copy-pasted dev placeholder passes. The app boots green, signs tokens with a key an attacker can guess, and nothing tells you until someone forges their way into a match.
getSecret enforces a 32-character minimum and a blacklist of known leaked dev placeholders, throwing at startup rather than booting with a weak key.
A missing secret is an obvious failure; a weak one is the dangerous case because everything keeps working while every token is forgeable. So the guard rejects both short keys and the specific placeholders that have leaked before, and it throws loudly instead of degrading quietly. A misconfigured deploy fails to start, which is the only safe failure mode for a signing key. The same discipline was later mirrored into the web app's Better Auth secret check, so neither half of the system can boot on a guessable key.
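A guard in this spirit might look like the sketch below. The 32-character minimum and the fail-at-startup behaviour come from the text; the signature, the environment variable name in the usage note, and the blacklist entries are placeholders invented for illustration, since the real leaked values are not given.

```typescript
const MIN_SECRET_LENGTH = 32;

// Hypothetical entries: stand-ins for whatever dev placeholders have actually leaked.
// Both are longer than 32 characters, so only the blacklist can catch them.
const KNOWN_WEAK_SECRETS = new Set<string>([
  "dev-secret-change-me-dev-secret-change-me",
  "changeme-changeme-changeme-changeme",
]);

// Assumed signature; the real getSecret may take different arguments.
function getSecret(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`${name} is not set`);
  }
  if (value.length < MIN_SECRET_LENGTH) {
    throw new Error(`${name} must be at least ${MIN_SECRET_LENGTH} characters`);
  }
  if (KNOWN_WEAK_SECRETS.has(value)) {
    throw new Error(`${name} matches a known leaked placeholder`);
  }
  return value;
}
```

Calling getSecret(env, "JWT_SECRET") once at startup (the variable name is an assumption) means a misconfigured deploy throws before it can sign a single token.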
A bare non-empty check, the default most projects ship, rejected because it lets a three-byte secret through, which is functionally no protection. Trusting the environment to always be set correctly was rejected too: the README documents that an older build leaked a placeholder, so the blacklist exists because the failure already happened once.
09·A live counter without a bottleneck
Every visitor wants that number, and at viral peak that is a lot of reads hitting one shared piece of state.
Point every homepage load at the single object that holds the count. Durable Objects are single-threaded per id, so one global counter read by every visitor becomes the exact bottleneck you built the edge to avoid: the busiest page in the product, funneled through one lock.
Lobby and match objects push state changes into one Presence counter DO; the homepage reads a cached endpoint, so the counter sees writes but almost no reads.
The counter only ever sees the writes it cannot avoid: a lobby heartbeat at about one per second plus match start/end events, a load that stays far under the per-object request ceiling. Reads are absorbed by the Cache API at roughly one counter hit every five seconds no matter how many people are on the page, so a million viewers and ten viewers cost the counter the same. Staleness is handled honestly: a heartbeat older than thirty seconds is treated as zero in case the lobby hibernated, and a DO alarm sweeps match entries that a crash left behind.
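The counting and staleness rules can be sketched as plain functions. The state shape and names are hypothetical, the maximum match age used by the sweep is an assumed parameter (the source does not give one), and the small TTL memo only stands in for the Cache API to show why readers rarely reach the counter.

```typescript
const HEARTBEAT_STALE_MS = 30000;

// Hypothetical shape of what the Presence object holds.
interface PresenceState {
  lobbyCount: number;
  lobbyHeartbeatAt: number; // ms timestamp of the last lobby heartbeat
  matches: Map<string, { players: number; startedAt: number }>;
}

function onlineCount(state: PresenceState, nowMs: number): number {
  // A heartbeat older than thirty seconds counts as an empty lobby.
  const lobbyIsFresh = nowMs - state.lobbyHeartbeatAt <= HEARTBEAT_STALE_MS;
  let total = lobbyIsFresh ? state.lobbyCount : 0;
  state.matches.forEach((match) => {
    total += match.players;
  });
  return total;
}

// What an alarm handler might do: drop match entries a crash left behind.
function sweepStaleMatches(state: PresenceState, nowMs: number, maxMatchAgeMs: number): number {
  let removed = 0;
  state.matches.forEach((match, id) => {
    if (nowMs - match.startedAt > maxMatchAgeMs) {
      state.matches.delete(id);
      removed += 1;
    }
  });
  return removed;
}

// Stand-in for the cached endpoint: at most one real read per ttlMs.
function cachedReader(read: () => number, ttlMs: number, clock: () => number): () => number {
  let cached: { value: number; at: number } | undefined;
  return () => {
    const now = clock();
    if (cached !== undefined && now - cached.at < ttlMs) return cached.value;
    const value = read();
    cached = { value, at: now };
    return value;
  };
}
```

With a five-second TTL, any number of calls inside the window triggers a single underlying read, which is the property that makes a million viewers cost the same as ten.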
Pushing the count to the homepage over WebSocket: the spec rejects it as a different complexity tier and notes polling is enough for a number that changes slowly. Letting every visitor read the DO directly is the trap itself: it recreates the singleton bottleneck the edge architecture was chosen to escape.
10·Signing the calls between your own services
The counter has a public address. Anyone can POST to it.
Assume that because these are 'internal' calls, they are safe. But there is no network boundary at the edge. The counter endpoint is reachable from the open internet, so an unsigned 'internal' POST is just an open write endpoint waiting to be flooded with fake match-start events.
Every service-to-service call carries an HMAC-SHA256 signature over a timestamp plus the body; the receiver recomputes it in constant time and rejects anything stale or unsigned.
There is no private network between these Workers, so trust has to travel in the request itself, not in where it came from. The signature covers the body so it can't be tampered with, and it covers a timestamp rejected beyond a thirty-second skew so a captured request can't be replayed later. The comparison is constant-time to avoid leaking the signature byte by byte through timing. If the shared secret is ever missing, presence calls are skipped rather than sent unsigned: matches keep working and the count just freezes, which is the safe direction to fail.
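A sketch of the sign-and-verify pair described above. It uses Node's crypto module for brevity, where a Worker would more likely use Web Crypto; the "timestamp.body" layout, the header names, and the function names are assumptions, not the project's wire format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_MS = 30000;

// Signature over timestamp plus body; the "." separator is an assumed layout.
function signRequest(secret: string, timestampMs: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

function verifyRequest(
  secret: string,
  timestampMs: number,
  body: string,
  signatureHex: string,
  nowMs: number,
): boolean {
  // Reject stale or future-dated requests so a captured call cannot be replayed later.
  if (!Number.isFinite(timestampMs) || Math.abs(nowMs - timestampMs) > MAX_SKEW_MS) return false;
  const expected = Buffer.from(signRequest(secret, timestampMs, body), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Sender side: with no secret, skip the call entirely instead of sending it unsigned.
function buildPresenceCall(
  secret: string | undefined,
  body: string,
  nowMs: number,
): { headers: Record<string, string>; body: string } | null {
  if (!secret) return null;
  return {
    headers: {
      "x-timestamp": String(nowMs),                   // hypothetical header name
      "x-signature": signRequest(secret, nowMs, body), // hypothetical header name
    },
    body,
  };
}
```

Because the timestamp is part of the signed input, an attacker cannot refresh it on a captured request without invalidating the signature.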
Trusting internal calls implicitly was rejected because at the edge there is no 'internal'; the endpoint is on the public internet. IP-allowlisting was not viable either, since edge Workers don't present stable origin IPs to each other the way boxes on a VPC would.