Designing systems on a whiteboard
The interview format — and the actual skill. Scoping, capacity, trade-offs, and naming what you don't know without faking it.
Prerequisites
02.8
Stack
- a whiteboard or Excalidraw
- the ability to estimate
- discipline
By the end of this module
- Walk through a system design problem in a 6-step framework that interviewers actually grade on.
- Estimate capacity (QPS, storage, bandwidth) from first principles in under 5 minutes.
- Pick between the eight standard building blocks based on the problem, not the buzzword.
- Spot the four most common failure modes in interview answers and avoid them.
System design interviews are bad at testing system design and good at testing how you think under uncertainty. That’s actually the more useful skill to learn for real engineering work. The mistake most students make is preparing for the interview as if it’s a memorization exercise — “memorize the URL shortener answer, then the Twitter answer, then the chat answer” — and then performing those scripts on the day. Interviewers see through this in roughly 90 seconds and grade accordingly.
The opinion: the interviewer is grading how you think, not what you’ve memorized. The 6-step framework in this module is not a script — it’s the structure that lets you think clearly when you’ve never seen the problem before. Once you internalize it, you’ll find it works equally well for “design a chat app” and for “design our actual production retry queue at work next Tuesday.” That’s the real point.
Set up
You don’t need a software stack for this module. You need a whiteboard or Excalidraw, 8-12 hours of focused practice, and at least one practice partner who will challenge you. Pair practice is non-negotiable: you cannot evaluate your own clarity.
# Practice cadence
# - 4 problems, 45 min each, with a partner
# - Self-record (audio) and listen back
# - Compare your output against the references in "Going deeper"
Read these first
Three sources, in this order, then stop:
- Donne Martin — System Design Primer. repo · 6 hrs (skim first, deep on relevant sections) · the most-used free reference. Don’t try to read all of it — skim the index, then read sections as questions come up.
- Alex Xu — System Design Interview Vol 1, chapters 1-4. book · 4 hrs · the cleanest framework treatment. Worth buying.
- Hussein Nasser — Database Engines talks. channel · pick 2-3 at 30 min each · for the parts of system design that touch databases. Far better than reading docs.
You will be tempted to do dozens of “Top 50 System Design Questions” videos. Don’t. Memorizing answers is what produces the bad interview performances this module exists to prevent.
The 6-step framework
Every system design problem, real or interview, follows this structure. Walk it linearly. Do not skip steps because you “already know what to build.”
| Step | Time | What you do | What the interviewer learns |
|---|---|---|---|
| 1. Clarify | 5 min | Ask scoping questions. Confirm constraints. | You don’t build the wrong thing |
| 2. Estimate | 5 min | QPS, storage, bandwidth, growth | You can reason about scale |
| 3. API | 5 min | Define the public API contract first | You design from the user inward |
| 4. Data | 5 min | Schema and storage choices | You pick storage based on access pattern |
| 5. High-level | 10 min | Boxes and arrows. Major components only. | You can decompose a problem |
| 6. Deep-dive | 15 min | One or two components in detail | You can go all the way down when needed |
The most common failure mode: jumping to step 5 in minute 2. Don’t. Steps 1-4 take twenty minutes and they are the difference between a good answer and a generic one.
Step 1 — Clarify
Ask, in this order:
- “Who’s using this and what’s the primary action?” (Defines the scope.)
- “How many users? How much traffic?” (Defines the scale.)
- “What’s the read/write ratio?” (Defines the architecture shape.)
- “What constraints — latency, consistency, availability?” (Defines the trade-offs you’ll make.)
- “What’s out of scope?” (Stops you from over-building.)
If the interviewer says “you decide,” that’s a test. Make a defensible choice and announce it: “I’ll assume 100M DAU and a 10:1 read/write ratio. Tell me if you want different.”
Step 2 — Estimate
Memorize these numbers; they unlock everything else.
| Quantity | Number |
|---|---|
| Seconds per day | 86,400 ≈ 10⁵ |
| 100M DAU at 10 actions/day | ~12K QPS average, ~36K QPS peak |
| 1 KB JSON per write | 1 GB / 10⁶ writes |
| 1 photo (compressed) | ~100 KB |
| 1 video minute | ~1-10 MB |
| Cross-region RTT | ~100 ms |
| Same-DC RTT | ~1 ms |
| Disk seek (HDD) | ~10 ms |
| SSD random read | ~100 µs |
| RAM access | ~100 ns |
| L1 cache | ~1 ns |
Now estimate live, on the board:
- “100M DAU, 10 actions/day → 10⁹ events/day → ~12K QPS average → ~36K QPS peak (3x).”
- “Each event is 500 bytes → 500 GB/day → 180 TB/year → 540 TB at 3-year retention.”
Show your math, round to one significant figure, and narrate as you go. The interviewer is checking that you can reason at this level; exact numbers don’t matter.
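The arithmetic in those bullets can be scripted once so it becomes second nature. A minimal sketch in Python (the function names and the 3x peak factor are illustrative conventions, not fixed rules):

```python
# Back-of-envelope helpers for step 2. Everything is order-of-magnitude;
# round to one significant figure when you say it out loud.

SECONDS_PER_DAY = 86_400  # ~1e5

def avg_qps(dau: int, actions_per_day: float) -> float:
    """Average queries per second from daily active users."""
    return dau * actions_per_day / SECONDS_PER_DAY

def peak_qps(dau: int, actions_per_day: float, peak_factor: float = 3.0) -> float:
    """Rule-of-thumb peak: 3x the average unless told otherwise."""
    return avg_qps(dau, actions_per_day) * peak_factor

def storage_bytes(dau: int, actions_per_day: float,
                  bytes_per_event: int, retention_days: int) -> float:
    """Total raw storage over the retention window (no replication)."""
    return dau * actions_per_day * bytes_per_event * retention_days

# 100M DAU, 10 actions/day, 500 bytes/event, 3-year retention:
qps = avg_qps(100_000_000, 10)            # ~11,600, say "~12K QPS"
peak = peak_qps(100_000_000, 10)          # ~35K, board-rounds to "~36K"
tb = storage_bytes(100_000_000, 10, 500, 3 * 365) / 1e12  # ~550 TB
```

Replication and indexes typically multiply the storage figure by 3x or more; say that out loud too.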
Step 3 — API
Sketch the public contract. RPC-style or REST-style — be consistent.
POST /messages
body: { conversationId, senderId, text, mediaIds[] }
returns: { messageId, timestamp }
GET /conversations/:id/messages?before=:cursor&limit=20
returns: { messages[], nextCursor }
WS /conversations/:id/subscribe
→ server pushes new messages
This forces you to commit to the user-facing surface before you optimize storage. Most bad designs start with “let’s use Cassandra” before knowing what the actual reads look like.
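The `before` cursor in the GET endpoint is the part candidates most often hand-wave. A minimal in-memory sketch of its semantics (the store here is a plain list; a real implementation would query an index on `(conversation_id, message_id)`):

```python
# Cursor-based pagination for GET /conversations/:id/messages.
# `messages` is a stand-in for the store: dicts with ascending integer ids.

def page_messages(messages, before=None, limit=20):
    """Return (page, next_cursor): a newest-first page of messages
    strictly older than `before`, plus the cursor for the next
    (older) page, or None when this is the last page."""
    older = [m for m in messages if before is None or m["id"] < before]
    page = sorted(older, key=lambda m: m["id"], reverse=True)[:limit]
    next_cursor = page[-1]["id"] if len(page) == limit else None
    return page, next_cursor
```

Cursors beat offset pagination here because new writes at the head don't shift the page boundaries under the reader.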
Step 4 — Data
For each entity, decide:
- Key/index: how is it queried? (Primary key, secondary indexes.)
- Storage class: relational, KV, document, search, blob, queue, time-series?
- Sharding key: when scale demands it.
- Consistency model: strong, eventual, monotonic.
Most interview answers go to NoSQL too fast. Postgres handles a stunning amount of traffic with one read replica and the right indexes. Default to Postgres unless you have a specific reason — and articulate it.
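When scale does demand a sharding key, the routing function itself is simple; what matters is that it is stable across process restarts. A sketch, assuming plain hash-based sharding (names illustrative):

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Stable shard assignment by hashing the shard key.
    Uses hashlib rather than Python's builtin hash(), which is
    randomized per process and would break routing on restart."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards
```

The modulo scheme reshuffles almost every key when `n_shards` changes; mention consistent hashing as the fix if resharding is on the table.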
Step 5 — High-level
Boxes and arrows. Major components only. The eight building blocks below cover almost everything.
Step 6 — Deep-dive
The interviewer will pick a component. Be ready to go deep on:
- The chat write path, including delivery semantics.
- The feed generation strategy (push, pull, hybrid).
- The cache invalidation strategy.
- The rate-limiting algorithm and where it lives.
- The failure mode if your primary database goes down.
If the interviewer doesn’t pick, propose: “I’d like to deep-dive on X because that’s where the hard trade-off is. OK to go there?”
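For the rate-limiting deep-dive, the algorithm most worth having in your fingers is the token bucket. A single-process sketch (in production the per-key state usually lives in Redis, keyed by user or IP):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refill at `rate` tokens/sec up to
    `capacity`; each allowed request spends one token. Capacity is
    the burst size; rate is the sustained limit."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        """True if the request may proceed. `now` is injectable for tests."""
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Be ready for the follow-up on where it lives: per-instance buckets undercount behind a load balancer, so shared state (or a sticky routing key) is usually part of the answer.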
The eight building blocks
These eight cover 90% of any system you’ll design. Know each well enough to use without explaining what it is.
| Block | What it solves | Avoid when |
|---|---|---|
| Load balancer | Spread traffic, health checking | Single-instance internal services |
| CDN | Static asset delivery, edge caching | Dynamic per-user content |
| App server | Stateless request handling | Compute-heavy or batch |
| Cache (Redis/Memcached) | Hot read path, sessions, rate limits | Strong-consistency reads |
| Database (Postgres/MySQL) | Source of truth | Time-series, full-text search |
| Queue (Kafka/SQS/NATS) | Async work, decoupling, backpressure | Sub-millisecond round-trip |
| Search (Elasticsearch/OpenSearch) | Full-text, faceted queries | Source of truth |
| Blob store (S3/GCS) | Files, images, video, dataset storage | Sub-millisecond read latency |
A clean answer composes these. A bad answer reaches for “let’s use Kafka” without ever explaining why.
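The most common composition of the cache and database blocks is cache-aside. A sketch with dicts standing in for Redis and Postgres:

```python
# Cache-aside (look-aside) read path. The dicts are stand-ins;
# a real implementation would set a TTL on the cache entry.

cache = {}                           # stand-in for Redis
db = {"user:1": {"name": "Ada"}}     # stand-in for Postgres

def get(key):
    if key in cache:
        return cache[key]            # hot path: cache hit
    value = db.get(key)              # miss: fall through to source of truth
    if value is not None:
        cache[key] = value           # populate for subsequent reads
    return value
```

The classic deep-dive follow-up is invalidation: on write, delete the cache entry rather than updating it, and let the next read repopulate.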
The four classic problems, walked through
You should be able to walk each of these end-to-end in 45 minutes. Practice each at least twice with a partner.
URL shortener
Clarify: 100M URLs, 10:1 read:write, sub-100ms read latency
Estimate: 100M new URLs/year → write QPS ~3, read QPS ~30; storage ~100 GB/year at ~1 KB/row
API: POST /shorten {url} → {short}; GET /:short → 302 redirect
Data: KV store, key=short_id, value=long_url. Postgres or DynamoDB.
High-level: LB → app → cache → DB. Counter for ID generation.
Deep-dive: ID strategy. Counter? Hash? Range allocation?
Most teams: range-allocate IDs from a counter, base62-encode, cache.
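The range-allocate-then-encode strategy boils down to a few lines. A sketch of the base62 step (the counter allocation itself is omitted):

```python
# Base62-encode a counter-allocated integer ID into a short URL slug.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    """Repeated divmod by 62; digits come out least-significant first."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))
```

Seven base62 characters cover 62⁷ ≈ 3.5 trillion IDs, which is why short slugs are enough even at large scale.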
Twitter timeline
Clarify: 200M DAU, average 200 followers, celebs at 100M
Estimate: post writes ~10K QPS, timeline reads ~1M QPS
API: POST /tweets, GET /timeline
Data: tweets in KV; followers as adjacency list; timelines as cache
High-level: hybrid push-pull
- Push tweets to follower timelines on write (for non-celebs)
- Pull tweets from celebs at read time (don't fan out 100M times)
Deep-dive: the push/pull threshold and how you decide it.
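The hybrid write and read paths fit in a short sketch. Dicts stand in for the timeline cache and the celeb post store; the threshold is configurable, and choosing its value is exactly the deep-dive question:

```python
def on_tweet(author, tweet_id, followers, timelines, celeb_posts,
             threshold=10_000):
    """Write path of the hybrid model (stores are dicts for the sketch).
    Non-celebs: push the tweet id into each follower's cached timeline.
    Celebs: append to a per-celeb list that is pulled at read time."""
    if len(followers[author]) < threshold:
        for f in followers[author]:
            timelines.setdefault(f, []).append(tweet_id)
    else:
        celeb_posts.setdefault(author, []).append(tweet_id)

def read_timeline(user, following, timelines, celeb_posts, limit=50):
    """Read path: merge the precomputed timeline with celeb posts.
    Assumes tweet ids are time-ordered, so sorting gives newest-first."""
    merged = list(timelines.get(user, []))
    for author in following[user]:
        merged.extend(celeb_posts.get(author, []))
    return sorted(merged, reverse=True)[:limit]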
Ride share
Clarify: drivers' positions update every 4s, riders match within ~5s
Estimate: 10M drivers, ~2.5M position updates/sec
API: POST /location, POST /request_ride, WS /driver
Data: positions in geohash-keyed cache (Redis with sorted sets);
rides in Postgres
High-level: location service + matching service + ride state service
Deep-dive: geohash precision and the search-radius algorithm
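In production you would likely lean on Redis geo commands or a geohash library; the underlying idea is just bucketing positions by grid cell and scanning the neighbors. A pure-Python sketch (cell size and names are illustrative):

```python
import math

def cell(lat, lon, size=0.01):
    """Bucket coordinates into a grid cell (~1 km at size=0.01 degrees).
    Geohash does the same job with a prefix-friendly string key."""
    return (math.floor(lat / size), math.floor(lon / size))

def nearby_drivers(index, lat, lon, size=0.01):
    """index: {cell: {driver_id: (lat, lon)}}. Scans the rider's cell
    plus its 8 neighbors; widen the ring if too few candidates."""
    cx, cy = cell(lat, lon, size)
    found = {}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found.update(index.get((cx + dx, cy + dy), {}))
    return found
```

The precision trade-off is visible here: smaller cells mean fewer false candidates per cell but more cells to scan for a given search radius.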
Chat (real-time)
Clarify: 1B users, average 50 messages/day, online status, 1:1 + groups
Estimate: ~580K msgs/sec average write rate (5×10¹⁰ msgs/day), higher at peak; persistent WS connections
API: WS connection, POST /messages, presence
Data: messages in time-series-keyed store, group fanout via queue
High-level: edge WS layer, message bus, persistence, fanout workers
Deep-dive: message delivery semantics (at-least-once + dedup)
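At-least-once delivery plus consumer-side dedup is the standard answer, and it is worth being able to sketch. A minimal version (the `seen` set would be a TTL'd Redis set in production, keyed per conversation):

```python
def make_consumer(deliver):
    """Wrap a delivery callback so redelivered messages are dropped.
    At-least-once transports may redeliver; dedup by message id makes
    delivery effectively exactly-once from the user's point of view."""
    seen = set()

    def handle(msg):
        if msg["id"] in seen:
            return False      # duplicate redelivery: drop silently
        seen.add(msg["id"])
        deliver(msg)
        return True

    return handle
```

The subtle requirement this sketch glosses over is that message ids must be assigned by the sender (client-generated), so retries carry the same id.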
How to spot a bad answer
These four patterns tell you the answer is going to be marked down:
| Anti-pattern | What it sounds like |
|---|---|
| Over-engineering | “I’ll use Kafka, Spark, Flink, Elasticsearch, and Cassandra” — for an MVP |
| No estimates | Skipping step 2; building for unknown scale |
| Jumping to NoSQL | “Definitely DynamoDB” without articulating the access pattern |
| Faking depth | Naming a tech without being able to explain how it solves the problem |
The first three are common. The fourth is the lethal one — it’s the difference between a junior and senior performance.
The thing the framework can’t teach you
The framework gives you structure. The structure gets you to “competent.” The leap to “excellent” is doing this with calibration — naming what you don’t know without panicking.
Bad: "I'll use a circuit breaker." (No idea what that does)
Bad: "Hmm, I'm not sure." (Frozen, no progress)
Good: "I'd want a circuit breaker here. I haven't shipped one in production
myself but the role would be to fail fast when the dependency is down,
with a half-open recovery state. Want me to keep moving and we can
come back if needed?"
That third pattern is the thing senior engineers do that juniors don’t. Practice it. Memorize the meta-pattern: name the gap, propose the role of the missing piece, keep moving.
Going deeper
When you have specific questions, in this order:
- High Scalability — case studies — read Instagram’s, Discord’s, and Stack Overflow’s writeups in particular. Real architectures, real trade-offs.
- Martin Kleppmann — Designing Data-Intensive Applications. book · the bible for the database half of system design. Read after a few interviews.
- ByteByteGo — Alex Xu’s video series. Polished and well-paced.
- Discord engineering blog — How Discord stores trillions of messages — concrete data on the scale problem chat systems actually face.
Skip the YouTube channels that “do system design in 10 minutes.” Real answers take 45.
Checkpoints
If any wobbles, reread the corresponding section.
- Walk through the 6-step framework on a problem you’ve never seen — say “design a feature flag service for 1000 engineers.” Use a timer; aim for 45 minutes.
- From memory: rough QPS for 100M DAU at 10 actions/day. Storage for 1 KB events at that volume over 3 years. Show your math.
- Why is “default to Postgres” usually a better answer than “default to DynamoDB”? Name a specific access pattern that flips the answer.
- Pick the Twitter timeline problem. Why does pure push fan-out break, and why does pure pull break? Describe the hybrid.
- Talk for 60 seconds, out loud, about a system you’ve built or used heavily — APIs, data model, scale, failure mode. If you stumble, that’s where to practice next.
When you can answer all five from memory, move to 05.2 Caching, queues, rate limits. The boxes you drew on the whiteboard are about to become real components in code.