Build Masterclass

duels

How one developer made a 1v1 quiz platform fast, cheat-resistant, and portable on a managed edge stack, decision by decision.

What you'll learn

10 chapters

01·The game-module boundary

You are building one game today and a second one is already on the roadmap.

If game two means a rewrite, the roadmap is a lie.

The trap

Ship the first game fast, hard-code its rules into the match loop, and promise to 'extract an interface later'. Later never has time, and by then the lobby, the scoring, the reveal flow all assume two numbers being compared.

Layers: Platform: lobby + match + scoring | GameModule contract (registry) | higher-or-lower / multi-choice

The decision

A single GameModule contract that every game implements; the platform talks only to the interface, never to a specific game.

The platform code (lobby, match orchestration, scoring shell, anti-cheat) never imports higher-or-lower directly. It calls gameModuleForCategory(category), which switches on category.gameType and returns the right module. When multi-choice arrived months later, it slotted in as a second case in that switch, and the three existing categories kept working once each gained a single gameType field. The abstraction paid for itself the first time it was used, not the day it was written.
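
Here is a minimal sketch of that boundary. GameModule, gameModuleForCategory and category.gameType come from the text. The Question shape, the isCorrect method and both module bodies are hypothetical, included only to show the platform depending on the interface rather than on a specific game.

```typescript
// Sketch only: GameModule, gameModuleForCategory and category.gameType come
// from the text; Question, isCorrect and the module bodies are hypothetical.
type GameType = "higher-or-lower" | "multi-choice";

interface Category {
  slug: string;
  gameType: GameType;
}

interface Question {
  // higher-or-lower: [shown, hidden]; multi-choice: one value per option.
  values: number[];
  // Only used by multi-choice.
  correctIndex?: number;
}

interface GameModule {
  readonly gameType: GameType;
  // Pure and synchronous, so it runs unchanged in a Worker, a browser or a test.
  isCorrect(question: Question, answer: number): boolean;
}

const higherOrLower: GameModule = {
  gameType: "higher-or-lower",
  // answer 1 = "hidden is higher", answer 0 = "hidden is lower".
  isCorrect(question, answer) {
    const [shown, hidden] = question.values;
    return answer === 1 ? hidden > shown : hidden < shown;
  },
};

const multiChoice: GameModule = {
  gameType: "multi-choice",
  isCorrect(question, answer) {
    return answer === question.correctIndex;
  },
};

// The only place that knows concrete games exist; lobby, match and scoring
// code call this and then talk exclusively to the GameModule interface.
function gameModuleForCategory(category: Category): GameModule {
  switch (category.gameType) {
    case "higher-or-lower":
      return higherOrLower;
    case "multi-choice":
      return multiChoice;
    default: {
      // Exhaustiveness check: a new GameType without a case fails to compile.
      const unknown: never = category.gameType;
      throw new Error(`Unsupported gameType: ${String(unknown)}`);
    }
  }
}
```

With this shape, a third game would need one new module and one new case. The `never` check turns a forgotten case into a compile error instead of a runtime surprise.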

Road not taken

Discovering the boundary later, after game two: the spec is explicit that the interface was designed up front because the whole point of the platform was that game two lands as a module, not a rewrite. Waiting would have meant unwinding higher-or-lower assumptions out of the lobby and scoring at the worst possible time.

02·Realtime without a server to run

Two strangers race to answer the same question.

The 'opponent answered' ping has to land in under 100 ms, and you are one person with no ops budget.

The trap

Stand up a Node WebSocket server with Socket.io and Redis for shared state. Now you own a process that can crash, a Redis you have to scale, sticky sessions for the socket layer, and a pager that goes off at 2 a.m., all to move a few 50-byte messages.

Layers: Lobby DO (FIFO queue) | Match DO (one per match, in-memory state) | D1 (session boundaries only)

The decision

One Durable Object per match, one Lobby DO for the queue, running at the Warsaw edge via PartyKit, with native WebSocket and no message protocol on top.

Each match is its own stateful object holding the game in memory, so there is no global lock and no shared cache to keep consistent. State lives where the match lives. D1 is touched only at session boundaries (login, match start, match end), never on the per-message hot path, so the thing that has to be fast (the ping) never waits on a database. Native WebSocket instead of Socket.io drops the protocol overhead the spec called out as pure cost for 50-byte payloads.
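
The following sketch shows what "state lives where the match lives" can look like inside one per-match object. Every name is hypothetical. `send` stands in for whatever the transport provides; in a PartyKit server it would presumably be backed by the room's connections. The two boundary hooks are the only places a database (D1 in the text) would be touched.

```typescript
// Hypothetical per-match state, as one Durable Object per match might hold it.
// `send` abstracts the transport; the hooks are the only database touch points.
type Send = (connectionId: string, message: string) => void;

interface BoundaryHooks {
  onMatchStart(matchId: string, players: string[]): void;
  onMatchEnd(matchId: string, scores: Record<string, number>): void;
}

class MatchState {
  private readonly scores = new Map<string, number>();
  private readonly answered = new Set<string>(); // "player:round" keys already seen

  constructor(
    private readonly matchId: string,
    private readonly players: [string, string],
    private readonly send: Send,
    private readonly hooks: BoundaryHooks,
  ) {
    for (const p of players) this.scores.set(p, 0);
    hooks.onMatchStart(matchId, [...players]); // session boundary: one write
  }

  // Hot path: in-memory bookkeeping plus one tiny message to the opponent.
  // `correct` is assumed to be computed server-side by the game module.
  answer(playerId: string, round: number, correct: boolean): void {
    const key = `${playerId}:${round}`;
    if (!this.scores.has(playerId) || this.answered.has(key)) return;
    this.answered.add(key);
    if (correct) this.scores.set(playerId, (this.scores.get(playerId) ?? 0) + 1);
    const opponent = this.players[0] === playerId ? this.players[1] : this.players[0];
    // Roughly 26 bytes of JSON: no protocol layer, no database round trip.
    this.send(opponent, JSON.stringify({ t: "opp_answered", r: round }));
  }

  finish(): void {
    this.hooks.onMatchEnd(this.matchId, Object.fromEntries(this.scores)); // boundary write
  }
}
```

Because each match gets its own instance, two matches never contend for a lock. The ping path does no I/O beyond the message itself.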

Road not taken

Socket.io + Redis on a Hetzner box. The spec keeps it as the documented migration target for >10k DAU and estimates roughly a week of work, precisely because the transport layer was kept thin and isolated from day one. It is the fallback, not the start.

03·Keeping the exit unlocked

You picked a managed platform to move fast.

The fear that follows every such choice: the day the bill or the limits force you off it, you are trapped.

The trap

Let Cloudflare-specific APIs leak into the game rules and the schema. Durable Object handles in the scoring code, D1 quirks in the queries, and 'portable' becomes a word in the README that no longer matches the code.

The decision

Game logic is pure TypeScript with zero Cloudflare imports; the transport layer is kept deliberately thin so it can be swapped without touching the rules.

The game engine runs in a Worker, a browser, and a Node test unchanged, because it depends on no platform API. The registry comment makes 'keep it pure and sync so it runs in any environment' an explicit rule. That same purity is what lets the same scoring code replay a run on the server to catch cheaters. The migration plan reads as 'replace PartyKit transport, keep everything else' and budgets about a week, which is only credible because the expensive parts never learned they were on Cloudflare.
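
As a rough illustration of "pure and sync", here is an engine fragment with zero imports. It does not read the clock or call `Math.random()`, so one seed always yields the same rounds in any runtime. mulberry32 is a small, widely used PRNG chosen for the sketch. The text does not say which generator the real engine uses, and `HolRound` / `generateRounds` are hypothetical names.

```typescript
// Pure engine sketch: no platform imports, no I/O, no clock, no Math.random().
// mulberry32 is an illustrative choice of seeded PRNG, not the project's own.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

interface HolRound {
  shown: number;
  hidden: number;
}

// Same seed in, same rounds out, whether called in a Worker, a browser or Node.
function generateRounds(seed: number, count: number): HolRound[] {
  const rand = mulberry32(seed);
  const rounds: HolRound[] = [];
  for (let i = 0; i < count; i++) {
    const shown = Math.floor(rand() * 1000);
    let hidden = Math.floor(rand() * 1000);
    if (hidden === shown) hidden = (hidden + 1) % 1000; // avoid unanswerable ties
    rounds.push({ shown, hidden });
  }
  return rounds;
}
```

That determinism is the property the server-side replay in chapter 07 relies on. Scoring a run twice from the same seed cannot disagree with itself.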

Road not taken

Full Cloudflare optimization with DO handles threaded through the game code. Faster to write on day one, but it converts the documented one-week migration into a rewrite; the spec treats the thin, isolated transport as the price of keeping that exit cheap.

04·Anonymous-first identity

The whole product promise is 'type a nick and play in three seconds'.

But everything that matters (rating, streaks, leaderboard rank) lives in localStorage, and one cache clear wipes a player's entire history.

The trap

Put a login wall in front of play to make identity durable. You protect the rating and lose the player, because the three-second promise is the product. Or skip accounts entirely and watch retention leak out through every cleared cache and every new phone.

The decision

Better Auth creates a real anonymous account on first connection; the player still just types a nick, but identity now lives server-side and can be upgraded later without ever blocking play.

An anonymous user is a first-class server account from the first touch, so the session cookie, not localStorage, owns identity, and it survives a cache clear or a device switch. The nick-and-play flow is untouched because account creation happens silently on connection. Upgrading to a passkey, Google, or magic-link is offered only after the player has something worth saving (a first win, a top-50 rank), so the cost of signing in is paid exactly when the value is visible, never before.
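
The "offer the upgrade only when there is something worth saving" rule can be written as a small policy function. Nothing below is Better Auth API. The `PlayerSnapshot` shape and `shouldOfferUpgrade` are hypothetical, and an `isAnonymous` flag is assumed to be available on the user record. Only the two triggers, a first win and a top-50 rank, come from the text.

```typescript
// Hypothetical upgrade-prompt policy. Play is never gated; this only decides
// whether to *offer* passkey / Google / magic-link to an anonymous player.
interface PlayerSnapshot {
  isAnonymous: boolean; // assumed flag on the user record
  wins: number;
  bestRank: number | null; // best leaderboard position, 1-based; null if never ranked
}

const RANK_WORTH_SAVING = 50; // "a top-50 rank" from the text

function shouldOfferUpgrade(p: PlayerSnapshot): boolean {
  if (!p.isAnonymous) return false; // already has a durable login
  const hasFirstWin = p.wins >= 1;
  const hasTopRank = p.bestRank !== null && p.bestRank <= RANK_WORTH_SAVING;
  return hasFirstWin || hasTopRank;
}
```

A brand-new guest gets no prompt at all, so the three-second first touch stays intact. The prompt appears only once the account holds something the player would miss.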

Road not taken

A login wall, rejected because it trades the three-second first-touch, the core of the product, for durability. The spec frames localStorage-only identity as the dominant churn risk for a product that wants 12-month retention, so doing nothing was not on the table either.

05·Bridging the cookie to the edge

Identity lives in a session cookie on your domain.

The realtime game lives on a different origin entirely, *.partykit.dev, where that cookie does not exist.

The trap

Reach for the easy fix and let the realtime endpoint mint a token from whatever nick the request body claims. Now anyone can ask for a token under any name, with no session behind it, and the 'identity' the whole match system trusts is just a string a stranger typed.

The decision

A SvelteKit endpoint reads the verified Better Auth session and mints a short-lived HS256 JWT whose subject is the stable user id; PartyKit only ever trusts that signed token.

The cookie cannot cross to *.partykit.dev, so the bridge is a stateless token the edge can verify on its own with no callback to the web app. Crucially the JWT subject is the stable Better Auth user id, not a fresh value per call, so leaderboard nick-hashes, lobby pairing keys, and match-room slots all stay bound to the same identity across sessions and devices. The earlier path, where the lobby minted tokens straight from a request body with no session check, was flagged in the security audit precisely because the subject was random and the nick was unverified; moving the mint behind the session closes that.
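
Below is a compact sketch of both halves of the bridge: minting an HS256 JWT whose `sub` is the session's user id, and verifying it at the edge without calling back to the web app. `mintPartyToken`, `verifyPartyToken` and the five-minute lifetime are assumptions; the text says only "short-lived". Node's `node:crypto` is used so the sketch runs as-is. A Worker would more likely use Web Crypto or a JWT library for the same steps.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical token bridge. HS256 and sub = stable user id come from the text;
// the names and the 300-second TTL are illustrative assumptions.
const TOKEN_TTL_SECONDS = 300;

function b64url(input: string): string {
  return Buffer.from(input, "utf8").toString("base64url");
}

function sign(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Web-app side: call only after the Better Auth session is verified; userId
// comes from that session, never from the request body.
function mintPartyToken(userId: string, secret: string, nowSec = Math.floor(Date.now() / 1000)): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify({ sub: userId, iat: nowSec, exp: nowSec + TOKEN_TTL_SECONDS }));
  return `${header}.${payload}.${sign(`${header}.${payload}`, secret)}`;
}

// Edge side: recompute the signature, compare in constant time, check expiry.
// Returns the trusted user id, or null.
function verifyPartyToken(token: string, secret: string, nowSec = Math.floor(Date.now() / 1000)): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, sig] = parts;
  const expected = Buffer.from(sign(`${header}.${payload}`, secret));
  const given = Buffer.from(sig);
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (typeof claims.sub !== "string" || typeof claims.exp !== "number" || claims.exp <= nowSec) return null;
  return claims.sub;
}
```

Because `sub` is the stable account id, a token minted today and one minted next week on another device both resolve to the same player.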

Road not taken

Sharing the session cookie across origins, impossible for *.partykit.dev, or keeping the old body-supplied token mint, which the audit shows lets any web page mint a player token from a victim's rate-limit budget. The bridge exists so all the existing PartyKit verification code stays unchanged and only the token issuer moves.

06·Carrying progress across the upgrade

A player has been grinding as a guest: a rating, a streak, leaderboard entries.

The moment they finally register a passkey, all of that is owned by an anonymous account that is about to disappear.

The trap

Treat the upgrade as a fresh start: the new account is empty, the guest's rating and history evaporate. The player did exactly what you asked, committed to the product, and you punished them for it with a wiped profile.

The decision

On upgrade, re-parent every game-state row from the anonymous user onto the real one inside Better Auth's onLinkAccount hook, then delete the anonymous shell.

Counters are summed, solo bests are merged per bucket keeping the better entry, and matches are re-pointed, so the player keeps everything they earned. But the Glicko rating deliberately keeps the real user's value rather than importing the anonymous one, because anonymous ratings are noisy and re-importing them would punish people the moment they upgrade. The current streak resets at the anon-to-real boundary on purpose: the next match's identity is genuinely a different user, so pretending the streak continued would be a lie the data shouldn't tell.
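
The merge rules above fit in one pure function. The hook would call it, re-point the match rows, and delete the anonymous shell. The `GameProfile` shape, the field names and "higher score is better" are assumptions, because the real schema is not shown. Only the rules themselves come from the text: sum counters, keep the better solo best per bucket, keep the real user's Glicko rating, reset the current streak.

```typescript
// Hypothetical profile shape for the decision part of an onLinkAccount merge.
// Re-pointing match rows and deleting the anonymous user happen around this.
interface Glicko {
  rating: number;
  rd: number;
}

interface GameProfile {
  matchesPlayed: number;
  wins: number;
  soloBests: Record<string, number>; // bucket -> best score, assuming higher is better
  glicko: Glicko;
  currentStreak: number;
}

function mergeProfiles(anon: GameProfile, real: GameProfile): GameProfile {
  const soloBests: Record<string, number> = { ...real.soloBests };
  for (const [bucket, score] of Object.entries(anon.soloBests)) {
    soloBests[bucket] = Math.max(score, soloBests[bucket] ?? -Infinity);
  }
  return {
    matchesPlayed: real.matchesPlayed + anon.matchesPlayed, // counters are summed
    wins: real.wins + anon.wins,
    soloBests, // per bucket, the better entry survives
    glicko: { ...real.glicko }, // noisy guest rating deliberately not imported
    currentStreak: 0, // identity changes here, so the streak restarts
  };
}
```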

Road not taken

Re-importing the anonymous rating verbatim, rejected in the merge code's own comment because noisy guest ratings would drag down a careful player right as they commit. A clean fresh start was rejected for the opposite reason: it makes upgrading feel like a loss, killing the conversion the whole anonymous-first design exists to earn.

07·Replaying the run to score it

Your solo leaderboard is public and global.

The client knows the questions, the answers, and its own score. Whatever number it sends, you are about to rank the whole world by it.

The trap

Trust the client's reported score. It is the obvious move, the browser already computed it. And it works perfectly until the first person opens the network tab, posts correctCount: 1000, and parks themselves at the top of every board forever.

The decision

The client submits its seed and answers; the server replays the entire run with the same deterministic engine and scores it from scratch. The client's tally is never trusted.

Because the game engine is pure and deterministic, the server can re-run the exact same rounds from the same seed and arrive at the only score it will accept. The client's number is treated as a claim to be verified, not a result to be stored. Two cheap sanity floors back it up: a 1500 ms-per-round minimum that lets fast humans through but flags scripted replays, and a single-use run token persisted to DO storage so an eviction can't be used to redeem the same run twice. The security audit confirmed that this path holds: spamming the board still requires actually playing a correct game.
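
Here is one way the three checks can fit together: a single-use token, a time floor, and a full rescore from the seed. Every name is hypothetical. `seededRandom` (mulberry32) stands in for the real engine's generator, and `MemoryRunTokenStore` stands in for tokens persisted in Durable Object storage, which would be async in practice. The 1500 ms rule is read here as a whole-run floor of 1500 ms per answered round. That reading is an assumption; the real check may apply per round.

```typescript
// Hypothetical replay verifier. Any score the client claims is ignored.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0; // mulberry32, standing in for the real engine's PRNG
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface RunTokenStore {
  // Server-side issue time on first redeem; null if unknown or already used.
  consume(token: string): number | null;
}

class MemoryRunTokenStore implements RunTokenStore {
  private readonly issued = new Map<string, number>();
  issue(token: string, now: number): void {
    this.issued.set(token, now);
  }
  consume(token: string): number | null {
    const issuedAt = this.issued.get(token);
    if (issuedAt === undefined) return null;
    this.issued.delete(token); // single use: a second redeem finds nothing
    return issuedAt;
  }
}

interface Submission {
  runToken: string;
  seed: number;
  answers: number[]; // one per round: 1 = "higher", 0 = "lower"
}

type Verdict = { ok: true; score: number } | { ok: false; reason: string };

const MIN_MS_PER_ROUND = 1500;

function verifyRun(sub: Submission, store: RunTokenStore, receivedAt: number): Verdict {
  const issuedAt = store.consume(sub.runToken);
  if (issuedAt === null) return { ok: false, reason: "unknown or already-used run token" };
  if (receivedAt - issuedAt < sub.answers.length * MIN_MS_PER_ROUND) {
    return { ok: false, reason: "faster than 1500 ms per round" };
  }
  // Replay: regenerate each round from the seed and rescore every answer.
  const rand = seededRandom(sub.seed);
  let score = 0;
  for (const answer of sub.answers) {
    const shown = Math.floor(rand() * 1000);
    let hidden = Math.floor(rand() * 1000);
    if (hidden === shown) hidden = (hidden + 1) % 1000;
    if (answer === (hidden > shown ? 1 : 0)) score++;
  }
  return { ok: true, score };
}
```

Both timestamps come from the server, the issue time stored with the token and the receive time, so the client cannot shorten its apparent run by editing a field.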

Road not taken

Trusting the client tally, rejected because it makes the leaderboard meaningless the day someone reads the docs. Running the full game server-side per player was unnecessary: replay gets the same guarantee at session boundaries only, keeping the per-message path free of the database.

08·Failing loud on a weak secret

Every JWT that binds a player to a match is only as strong as one secret.

Set it to a short dev placeholder by accident, and every token in the system is forgeable. Silently. With everything still appearing to work.

The trap

Check only that the secret exists. A one-byte 'x' passes. A copy-pasted dev placeholder passes. The app boots green, signs tokens with a key an attacker can guess, and nothing tells you until someone forges their way into a match.

The decision

getSecret enforces a 32-character minimum and a blacklist of known leaked dev placeholders, throwing at startup rather than booting with a weak key.

A missing secret is an obvious failure; a weak one is the dangerous case because everything keeps working while every token is forgeable. So the guard rejects both short keys and the specific placeholders that have leaked before, and it throws loudly instead of degrading quietly. A misconfigured deploy fails to start, which is the only safe failure mode for a signing key. The same discipline was later mirrored into the web app's Better Auth secret check, so neither half of the system can boot on a guessable key.
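
Here is a sketch of a guard with that shape. `getSecret` and the 32-character minimum come from the text. The `env` parameter, the default variable name `JWT_SECRET` and both blacklist entries are illustrative stand-ins, not the project's real list of leaked placeholders.

```typescript
// Startup guard sketch: refuse to boot on a missing, short, or known-leaked key.
// The env shape, variable name and placeholder entries are hypothetical.
const MIN_SECRET_LENGTH = 32;
const KNOWN_PLACEHOLDERS = new Set<string>([
  "changeme",
  "dev-secret-do-not-use-in-production-0000", // long enough to pass the length check
]);

function getSecret(env: Record<string, string | undefined>, name = "JWT_SECRET"): string {
  const value = env[name];
  if (!value) throw new Error(`${name} is not set`);
  if (value.length < MIN_SECRET_LENGTH) {
    throw new Error(`${name} must be at least ${MIN_SECRET_LENGTH} characters`);
  }
  if (KNOWN_PLACEHOLDERS.has(value)) {
    throw new Error(`${name} is a known placeholder; generate a fresh random secret`);
  }
  return value; // error messages above never echo the secret itself
}
```

The blacklist matters precisely for placeholders that are long enough to slip past the length check. A short dev value is already rejected by the minimum.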

Road not taken

A bare non-empty check, the default most projects ship, rejected because it lets a three-byte secret through, which is functionally no protection. Trusting the environment to always be set correctly was rejected too: the README documents that an older build leaked a placeholder, so the blacklist exists because the failure already happened once.

09·A live counter without a bottleneck

The homepage needs to show 'players dueling now' to prove the place is alive.

Every visitor wants that number, and at viral peak that is a lot of reads hitting one shared piece of state.

The trap

Point every homepage load at the single object that holds the count. Durable Objects are single-threaded per id, so one global counter read by every visitor becomes the exact bottleneck you built the edge to avoid: the busiest page in the product, funneled through one lock.

Layers: Homepage read (Cache API, 5s TTL) | Presence counter DO (global) | Lobby + Match DOs (HMAC writes)

The decision

Lobby and match objects push state changes into one Presence counter DO; the homepage reads a cached endpoint, so the counter sees writes but almost no reads.

The counter only ever sees the writes it cannot avoid: a lobby heartbeat at about one per second plus match start/end events, which stays far under the per-object request ceiling. Reads are absorbed by the Cache API at roughly one counter hit every five seconds no matter how many people are on the page, so a million viewers and ten viewers cost the counter the same. Staleness is handled honestly: a heartbeat older than thirty seconds is treated as zero in case the lobby hibernated, and a DO alarm sweeps match entries that a crash left behind.
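
The counter's bookkeeping can be modelled as a plain class that a single Durable Object would wrap, with its alarm handler calling `sweep`. The method names, counting two players per live match, and the ten-minute orphan limit are assumptions. The thirty-second heartbeat staleness rule is from the text. The cached homepage read is outside this sketch.

```typescript
// Pure model of the global presence counter's state. A DO would hold one
// instance, persist it, and call sweep() from its alarm.
const HEARTBEAT_STALE_MS = 30_000; // from the text
const MATCH_ORPHAN_MS = 10 * 60_000; // assumed upper bound on a match's length

class PresenceCounter {
  private lobby = { waiting: 0, at: -Infinity };
  private readonly matches = new Map<string, number>(); // matchId -> startedAt

  // Write path: about one lobby heartbeat per second plus match events.
  lobbyHeartbeat(waiting: number, now: number): void {
    this.lobby = { waiting, at: now };
  }
  matchStarted(matchId: string, now: number): void {
    this.matches.set(matchId, now);
  }
  matchEnded(matchId: string): void {
    this.matches.delete(matchId);
  }

  // Read path: hit at most once per cache TTL by the homepage endpoint.
  playersNow(now: number): number {
    const lobbyFresh = now - this.lobby.at <= HEARTBEAT_STALE_MS; // stale => count as zero
    return (lobbyFresh ? this.lobby.waiting : 0) + this.matches.size * 2;
  }

  // Alarm body: drop matches whose end event never arrived (e.g. after a crash).
  sweep(now: number): number {
    let removed = 0;
    for (const [id, startedAt] of this.matches) {
      if (now - startedAt > MATCH_ORPHAN_MS) {
        this.matches.delete(id);
        removed++;
      }
    }
    return removed;
  }
}
```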

Road not taken

Pushing the count to the homepage over WebSocket: the spec rejects it as a different complexity tier and notes that polling is enough for a number that changes slowly. Letting every visitor read the DO directly is the trap itself: it recreates the singleton bottleneck the edge architecture was chosen to escape.

10·Signing the calls between your own services

Your match objects and your web app are separate services on separate origins that still have to talk: to push counts and to persist results.

The counter has a public address. Anyone can POST to it.

The trap

Assume that because these are 'internal' calls, they are safe. But there is no network boundary at the edge. The counter endpoint is reachable from the open internet, so an unsigned 'internal' POST is just an open write endpoint waiting to be flooded with fake match-start events.

The decision

Every service-to-service call carries an HMAC-SHA256 signature over a timestamp plus the body; the receiver recomputes it in constant time and rejects anything stale or unsigned.

There is no private network between these Workers, so trust has to travel in the request itself, not in where it came from. The signature covers the body so it can't be tampered with, and it covers a timestamp rejected beyond a thirty-second skew so a captured request can't be replayed later. The comparison is constant-time to avoid leaking the signature byte by byte through timing. If the shared secret is ever missing, presence calls are skipped rather than sent unsigned: matches keep working and the count just freezes, which is the safe direction to fail.
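
Here is a sketch of both ends of a signed service-to-service call. HMAC-SHA256 over a timestamp plus the body, the thirty-second window, the constant-time compare and skip-when-unconfigured all come from the text. The `timestamp.body` message layout, the function names and the `x-timestamp` / `x-signature` header names are assumptions. `node:crypto` keeps the sketch runnable in Node; Workers could do the same with Web Crypto's HMAC.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical request signing between services that share a secret.
const MAX_SKEW_MS = 30_000;

function signRequest(secret: string, body: string, timestamp: number): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Receiver: reject stale timestamps, then compare signatures in constant time.
function verifyRequest(secret: string, body: string, timestamp: number, signature: string, now = Date.now()): boolean {
  if (!Number.isFinite(timestamp) || Math.abs(now - timestamp) > MAX_SKEW_MS) return false;
  const expected = Buffer.from(signRequest(secret, body, timestamp), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on unequal lengths, so check length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Sender: with no secret configured, skip the call rather than send it unsigned.
function buildPresenceRequest(secret: string | undefined, body: string, now = Date.now()) {
  if (!secret) return null;
  return {
    body,
    headers: { "x-timestamp": String(now), "x-signature": signRequest(secret, body, now) },
  };
}
```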

Road not taken

Trusting internal calls implicitly was rejected because at the edge there is no 'internal'; the endpoint is on the public internet. IP-allowlisting was not viable either, since edge Workers don't present stable origin IPs to each other the way boxes on a VPC would.