$ yuktics v0.1

T2 — CS Theory You Actually Need · module 02.1 · ~10–14 hrs

Data structures, built not memorized

The DSA module that isn't a LeetCode grind. Build the dozen data structures that matter from scratch — once each — and learn to recognize the situations where each one is the right answer.

Prerequisites

  • 01.1 (Python)
  • comfort with functions, loops, classes

Stack

  • Python 3.12
  • pytest (for verifying your impls)
  • a notebook (paper or digital — for drawing)

By the end of this module

  • Implement each of the dozen core data structures from scratch, in <50 lines, without reference.
  • Know the actual time complexity of every operation — and the constant-factor reality the textbook hides.
  • Recognize the pattern in a problem statement that tells you which DS to reach for.
  • Use the library version correctly in production, instead of re-implementing.

The traditional DSA course has a problem. It teaches you to memorize a hundred algorithms — most of which you will never implement again — without teaching you the small number of habits that actually let you reason about data structures in the wild. This module flips that. You will implement the small set of data structures that matter, by hand, once. Then you will use the library version forever, with confidence.

This is not a LeetCode grind. That comes in module 02.2. This is the foundation under it: knowing what each data structure actually is, what it costs you, and when it is and isn’t the right answer.

Why “build, then never re-implement”

The reason to implement a hash map by hand once is not so you can implement one in production. (You won’t. The standard library’s is faster, more correct, and battle-tested.) The reason is so that when you read code that uses one, you have a mental model of what the machine is actually doing. When dict[key] is slow in your code, you’ll know to look at the hash function, the load factor, or the keys. When the textbook says “amortized O(1),” you’ll know what’s being amortized.

In other words: you are building these once for intuition, not for use. Internalize that frame. It changes how carefully you build them.
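That intuition is easy to check for the amortized-resize half of the story. A quick CPython-specific sketch (`sys.getsizeof` reflects the over-allocated backing array of a `list`): capacity grows in occasional jumps, so most appends never touch the allocator and the rare copy is amortized away.

```python
import sys

# Watch CPython's list over-allocate: the reported size grows in
# occasional jumps, not on every append.
xs = []
sizes = []
for _ in range(64):
    xs.append(0)
    sizes.append(sys.getsizeof(xs))

jumps = len(set(sizes))  # distinct allocation sizes, far fewer than appends
print(jumps, "resizes for", len(xs), "appends")
```

The exact growth schedule is a CPython implementation detail; the point is the shape of the curve, not the numbers.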

The dozen that actually matter

These are the data structures every CS student should be able to draw, implement, and reason about. The order is roughly easiest to hardest.

  1. Dynamic array
     Build it because: the default container, and ~half of all data structures internally
     Use it via: list (Py) / Vec (Rust) / Array (JS)
  2. Linked list
     Build it because: pointer manipulation, and the trade-off vs. arrays
     Use it via: rare in real code, common in interviews
  3. Stack
     Build it because: LIFO; recursion, undo, parsing
     Use it via: list with .append/.pop
  4. Queue
     Build it because: FIFO; BFS, scheduling
     Use it via: collections.deque
  5. Hash map
     Build it because: the single most-used data structure in production
     Use it via: dict (Py) / Map (JS) / HashMap (Rust)
  6. Hash set
     Build it because: hash map’s “I only care if it exists” cousin
     Use it via: set
  7. Binary search tree
     Build it because: ordered data with O(log n) operations
     Use it via: rarely used directly; rebalancing variants are everywhere
  8. Heap (priority queue)
     Build it because: “give me the smallest / largest” in O(log n)
     Use it via: heapq (Py) / BinaryHeap (Rust)
  9. Trie
     Build it because: prefix problems, autocomplete, IP routing
     Use it via: rolled by hand or as a package
  10. Graph (adjacency list)
      Build it because: anything with relationships
      Use it via: rolled by hand using dict[T, list[T]]
  11. Disjoint set / union-find
      Build it because: connectivity, Kruskal’s, online connectivity queries
      Use it via: rolled by hand
  12. Bloom filter
      Build it because: probabilistic membership at huge scale
      Use it via: pybloom_live / DB internals / CDN logic

That’s the whole list. Anything beyond this is a variation, an optimization, or a domain-specific structure you’ll learn in context. If your CS course makes you implement red-black trees from scratch, do it once and then stop — they are an excellent exercise in pointer manipulation and an awful production data structure to roll yourself.
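Number 12 is the least familiar item on the list, so here is the core idea in miniature: k bit positions derived from two digests (the Kirsch–Mitzenmacher trick), no false negatives, a tunable false-positive rate. A sketch only; the sizes and the class name are arbitrary, and real deployments use a library.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: answers 'definitely not present' or 'probably present'."""

    def __init__(self, m_bits: int = 8192, k_hashes: int = 4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # Every bit set -> probably present; any bit clear -> definitely absent.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
for url in ("a.com", "b.com", "c.com"):
    bf.add(url)
```

Note what it cannot do: delete, iterate, or report an exact count. That is the price of the memory savings.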

How to actually build them

For each of the twelve, follow the same loop. It takes 30–90 minutes per structure.

1. Draw it on paper. What does it look like in memory?
   What invariants must hold after every operation?
2. Write the operations as method signatures, with type hints.
   Add cost (Big-O) as a comment beside each.
3. Implement, in plain Python, no fancy tricks.
4. Write a pytest file with at least 5 cases per operation,
   including: empty, single element, duplicates, the operation
   that changes the underlying structure (resize, rebalance).
5. Run a tiny benchmark: insert 1M items, time it.
   Compare to the stdlib equivalent. Note the ratio.

Step 5 is the one most students skip and the one that produces the most insight. Your hand-rolled hash map will be 10–30× slower than dict. That gap is the lesson — that is what years of optimization, cache locality, and CPython internals buy you.
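Step 5 in miniature, against a deliberately minimal chaining hash map (class name, load factor, and the item count are all arbitrary choices for illustration):

```python
import time

class ChainedHashMap:
    """Minimal chaining hash map — illustrative only, far slower than dict."""

    def __init__(self, capacity: int = 16):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def __setitem__(self, key, value) -> None:
        if self._size > 0.75 * len(self._buckets):   # load factor threshold
            self._resize()
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)             # overwrite duplicate key
                return
        bucket.append((key, value))
        self._size += 1

    def __getitem__(self, key):
        for k, v in self._buckets[hash(key) % len(self._buckets)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self) -> None:
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._buckets[hash(k) % len(self._buckets)].append((k, v))

def bench(n: int = 100_000) -> float:
    """Ratio of hand-rolled insert time to dict insert time."""
    t0 = time.perf_counter()
    hm = ChainedHashMap()
    for i in range(n):
        hm[i] = i
    mine = time.perf_counter() - t0

    t0 = time.perf_counter()
    d = {}
    for i in range(n):
        d[i] = i
    return mine / (time.perf_counter() - t0)

print(f"hand-rolled / dict insert ratio: {bench():.1f}x")
```

Run it yourself; the exact ratio depends on your machine and Python version, but it will not be close to 1.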

Build prompt: the dozen, in order

Open a fresh project. Build each one in src/<name>.py, with tests/test_<name>.py next to it.

mkdir dsa && cd dsa
uv venv .venv && source .venv/bin/activate
uv pip install pytest
mkdir src tests

Skeletons to start from. Don’t peek at solutions until you’ve struggled for at least 20 minutes per structure.

# src/hash_map.py — implement this from scratch, no `dict`
class HashMap:
    def __init__(self, capacity: int = 16):
        ...
    def __setitem__(self, key, value) -> None: ...   # O(1) amortized
    def __getitem__(self, key):                       # O(1) avg, O(n) worst
        ...
    def __delitem__(self, key) -> None: ...
    def __contains__(self, key) -> bool: ...
    def __len__(self) -> int: ...
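Step 4's cases can be written against the mapping interface and validated against dict first; if a test fails on dict, the test is wrong, not the structure. A sketch (shown as plain functions so it runs anywhere; in the project, make each a pytest test_* function and parametrize over implementations):

```python
# tests for the mapping interface — run against dict (known good) first,
# then swap in your HashMap.
def check_empty(Map):
    m = Map()
    assert len(m) == 0 and "missing" not in m

def check_duplicates(Map):
    m = Map()
    m["a"] = 1
    m["a"] = 2              # duplicate key overwrites; size stays 1
    assert m["a"] == 2 and len(m) == 1

def check_delete(Map):
    m = Map()
    m["a"] = 1
    del m["a"]
    assert "a" not in m

def check_resize(Map):
    m = Map()
    for i in range(1000):   # push well past any plausible initial capacity
        m[i] = i
    assert all(m[i] == i for i in range(1000))

for check in (check_empty, check_duplicates, check_delete, check_resize):
    check(dict)             # replace dict with HashMap once it exists
```

The resize case is the "operation that changes the underlying structure" from step 4; it is the one most likely to expose a rehashing bug.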

A few specific stretch goals when you finish:

  • Hash map: handle collisions with both chaining and open addressing. Benchmark both.
  • Heap: implement it as a flat array using the 2i+1 / 2i+2 indexing trick. Your bubble_up and bubble_down should each fit in 8 lines.
  • Graph: write BFS, DFS, and Dijkstra on top of your adjacency-list class. Run them on a real dataset (random edges, or scrape Wikipedia categories).
  • Union-find: implement with both path compression and union-by-rank, then turn each off in turn and benchmark on n = 10^6 operations.
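The heap stretch goal, sketched as a min-heap (a hypothetical class; the flat-array layout puts the children of index i at 2i+1 and 2i+2, its parent at (i−1)//2):

```python
class MinHeap:
    # Invariant after every operation: a[(i - 1) // 2] <= a[i] for all i > 0.
    def __init__(self):
        self.a = []

    def push(self, x) -> None:                    # O(log n)
        self.a.append(x)
        self._bubble_up(len(self.a) - 1)

    def pop(self):                                # O(log n)
        a = self.a
        a[0], a[-1] = a[-1], a[0]                 # move min to the end, then remove it
        smallest = a.pop()
        if a:
            self._bubble_down(0)
        return smallest

    def _bubble_up(self, i: int) -> None:
        a = self.a
        while i > 0 and a[i] < a[(i - 1) // 2]:
            a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
            i = (i - 1) // 2

    def _bubble_down(self, i: int) -> None:
        a, n = self.a, len(self.a)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and a[child] < a[smallest]:
                    smallest = child
            if smallest == i:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest
```

No nodes, no pointers: siblings sit next to each other in memory, which is exactly the cache-locality point checkpoint 2 below asks about.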

When all twelve are green and benchmarked, you have done more genuine DSA work than 90% of CS undergrads finishing their degree.
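For the graph stretch goal above, BFS on the dict[T, list[T]] representation is only a few lines; a sketch with a made-up toy graph:

```python
from collections import deque

def bfs(graph: dict, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    q = deque([start])
    while q:
        node = q.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                q.append(neighbor)
    return order

# Toy graph, invented for illustration:
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(g, "a"))  # breadth-first from "a": ['a', 'b', 'c', 'd']
```

Swapping the deque for a plain list used as a stack turns this into DFS; that one-line difference is worth sitting with.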

The five patterns that recur

Once you’ve built the structures, the next thing to internalize is the patterns that tell you which one to reach for. There are roughly five.

  • “Has this been seen before?” / “Group by X”
    Reach for: hash map / hash set (why: O(1) membership)
  • “Always smallest / largest / next”
    Reach for: heap (why: O(log n) extract-min)
  • “Range / ordered queries”
    Reach for: sorted structure / segment tree (why: log-time queries on intervals)
  • “Path / connectivity / dependency”
    Reach for: graph + BFS/DFS (why: most things in real systems)
  • “Prefix / autocomplete / matching”
    Reach for: trie (why: branching by character)
Drill this. When you read a problem, the first thing you should do is name the pattern. Then pick the structure. Most students do this in the wrong order and end up choosing a structure first because they’ve been over-trained to think in code.
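A worked example of the right order. Problem statement: “return the first URL that appears twice in a stream.” Name the pattern first (“has this been seen before?”), then pick the structure (hash set); the code is almost mechanical after that:

```python
def first_repeat(stream):
    # Pattern: "has this been seen before?" -> hash set, O(1) membership per item.
    seen = set()
    for item in stream:
        if item in seen:
            return item
        seen.add(item)
    return None
```

Notice that nothing about the code was hard; the decision was made before a line was written.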

Going deeper

When you have specific questions, in this order:

  1. Sedgewick & Wayne — Algorithms (4th ed.). book · the cleanest reference there is. The companion site has all the code.
  2. MIT 6.006 — Introduction to Algorithms. course · free, well-paced, with proofs when proofs matter.
  3. Visualgo. site · animations of every operation. Useful when the diagrams in your head aren’t matching the code.
  4. CLRS — only after the above. It’s a reference, not a teacher. Most students pick it up too early and get demoralized.

Skip the “Top 10 LeetCode patterns” videos until module 02.2. Patterns mean nothing without the structures under them.

Checkpoints

If any one wobbles, build that structure again from scratch.

  1. Walk through what happens physically in memory when a Python dict exceeds its load factor. Why is the resize amortized O(1)?
  2. Why is a heap typically stored as a flat array, not as a tree of nodes? What does the array layout buy you?
  3. Give the time complexity of: lookup in a hash map, lookup in a balanced BST, lookup in a trie of strings of length L. Now give the constant-factor reality of each.
  4. You’re given a stream of 10 billion URLs and 16 GB of RAM. You need to answer “have we seen this URL before?” with high probability. Which structure do you reach for, and what’s the trade-off?
  5. Pick any two structures from the dozen and explain when one is the right answer and the other is a tempting wrong answer. Be specific about the problem statement that distinguishes them.

If you can answer all five from memory, you’ve earned 02.1. Move on to 02.2 (Algorithms & complexity, the practical view) — that’s where the patterns above turn into solved problems.