PostgreSQL like a pro
Schema design, migrations, indexes, EXPLAIN, full-text search, JSONB. The queries that break in production and how to write the ones that don't.
Prerequisites
02.6
Stack
PostgreSQL 16 · psql · Prisma, SQLAlchemy, or sqlc (pick one) · pgcli · TablePlus or DBeaver
By the end of this module
- Design a schema you can defend, including the denormalizations you chose on purpose.
- Write and review migration files instead of relying on `db push`.
- Read an EXPLAIN ANALYZE plan and tell sequential scans, index scans, and index-only scans apart.
- Pick the right index type (btree, hash, gin, gist, partial, covering) for a given query.
- Use JSONB and full-text search where they earn their weight, and skip them where they don't.
The database is the part of your stack that outlives every framework you’ll ever migrate away from. Spend an order of magnitude more time learning Postgres than learning whichever ORM you happen to be using this year. The ORM will be replaced. The schema will not.
The opinionated take, said upfront so you can fight it: you almost never need a NoSQL database. Postgres has document storage (JSONB), search (tsvector or pg_trgm), key-value (HSTORE or just two columns), geospatial (PostGIS), pub-sub (LISTEN/NOTIFY), queues (SKIP LOCKED), and now vector search (pgvector). A junior team starting on MongoDB or DynamoDB in 2026 is making a 10-year bet against the most successful database of the last decade. Pick Postgres until you have a specific, measured reason not to.
The build for this module is to take the TODO API from 03.3 and make it survive a load test — a thousand users, a million todos, queries that stay under 50ms at the 95th percentile.
Set up
# A real Postgres locally
docker run --name pg --rm -d \
-e POSTGRES_PASSWORD=dev \
-p 5432:5432 \
postgres:16
# A friendlier psql
brew install pgcli # or: pipx install pgcli
# Connect
pgcli postgres://postgres:dev@localhost:5432/postgres
Get comfortable in psql or pgcli. Every GUI eventually betrays you in some interesting way; the CLI does not. Memorize `\d`, `\d+ <table>`, `\di+`, `\timing`, `EXPLAIN ANALYZE`, and `\copy`. That is the whole muscle memory you need for daily work.
Read these first
Five resources, in order, then stop:
- PostgreSQL docs — Performance Tips. docs · 30 min · the canonical short reference, written by people who built it.
- Markus Winand — Use The Index, Luke!. site · 2 hrs · still the best practical writing about indexes on the open web. Chapters 1–3 minimum.
- Crunchy Data — EXPLAIN ANALYZE in Postgres 16. post · 20 min · how to read plans in this specific version.
- Citus — Postgres tips and tricks. post · 15 min · slightly older, still gold.
- Heap — Basic performance analysis saved us 100 hours. post · 10 min · what production-scale Postgres feels like.
Skip the Medium articles titled “Top 10 Postgres tips.” They’re rehashes of the docs above with worse examples.
Step 1 — Schema design beyond textbook normalization
Third normal form is a starting point, not a destination. Real production schemas denormalize on purpose, in specific places, for specific reasons. The decisions you should be able to defend on a whiteboard:
| Decision | When to denormalize | When to stay normalized |
|---|---|---|
| Counts (followers, comments) | high-read, low-write — store the count, update with a trigger | low-traffic — SELECT COUNT(*) is fine |
| Computed columns | the computation is deterministic and queried — generated columns | the computation changes — recompute on read |
| Joining across small tables | never denormalize because of JOIN cost; Postgres is good at this | always |
| Audit / event history | append-only event table, current-state table for fast reads | one table only when history is irrelevant |
Schema for the TODO API extended for scale:
CREATE TABLE users (
id bigserial PRIMARY KEY,
email citext UNIQUE NOT NULL,
password_hash text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE todos (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title text NOT NULL,
body text,
done boolean NOT NULL DEFAULT false,
due timestamptz,
tags text[] NOT NULL DEFAULT '{}',
metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
search tsvector GENERATED ALWAYS AS
(to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''))) STORED,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX todos_user_done_due ON todos (user_id, done, due);
CREATE INDEX todos_search ON todos USING GIN (search);
CREATE INDEX todos_tags ON todos USING GIN (tags);
Things to internalize:
- `bigserial`, not `serial`. Hitting 2 billion rows is a bad day to discover the difference.
- `citext` for emails, so `Foo@Bar.com` and `foo@bar.com` collide on `UNIQUE` automatically.
- `ON DELETE CASCADE` is a real decision. Wrong default for users; right default for child rows of a single owner.
- `timestamptz` always, never `timestamp`. Time zones are not optional.
- A generated `tsvector` column is cheaper than building it on every read, and the `GIN` index makes search sub-millisecond.
Step 2 — Migrations, the safe way
Never use `prisma db push` in production. Never run `python manage.py makemigrations` and immediately `migrate` without reading the SQL. The pattern that survives:
- Write the migration as a versioned file. Alembic, Prisma Migrate, Drizzle Kit, sqlc all support this. Pick one.
- Read the generated SQL. Every migration. Every time. The ORM will sometimes generate something stupid (a full table rewrite when you renamed a column).
- Make it backwards compatible. Old code must work against the new schema, and new code must work against the old schema, for one deploy cycle.
- Run it on a staging copy of production data first. “It worked on the empty dev DB” is not a passing test.
The expand-contract pattern for renaming a column without downtime:
-- Migration 1 (deploy with old code still reading and writing the old name)
ALTER TABLE todos ADD COLUMN title_new text;
UPDATE todos SET title_new = title; -- batch this on large tables
-- no NOT NULL yet: old code never writes title_new,
-- so the constraint would reject its inserts
-- Migration 2 (after deploying code that writes both columns)
UPDATE todos SET title_new = title WHERE title_new IS NULL; -- catch-up backfill
ALTER TABLE todos ALTER COLUMN title_new SET NOT NULL;
-- Migration 3 (after deploying code that reads title_new)
ALTER TABLE todos DROP COLUMN title;
ALTER TABLE todos RENAME COLUMN title_new TO title;
That sequence ships safely. A single migration that renames the column live ships an outage.
Step 3 — Indexes, in depth
The textbook will tell you “btree is the default index type.” Production tells you it’s also the wrong one about a third of the time. The five you should know cold:
| Type | When to reach for it |
|---|---|
| btree | equality and range on ordered scalars. The default for almost everything. |
| hash | equality only, no range. Rarely strictly better than btree in modern Postgres. |
| GIN | arrays, tsvector, jsonb existence/path queries. The big one for search. |
| GiST | geospatial, ranges, fuzzy types. Reach for it when GIN is wrong. |
| BRIN | huge append-only tables sorted by insertion (logs, time series). Tiny, slower lookups. |
Two index modifiers that earn their weight:
- Partial index. `CREATE INDEX active_todos ON todos (user_id, due) WHERE NOT done;` — half the size, twice the speed for the most common query (list active todos).
- Covering index. `CREATE INDEX ON todos (user_id) INCLUDE (title, due);` — Postgres can satisfy the query from the index alone, no heap fetch. Look for “Index Only Scan” in EXPLAIN ANALYZE.
The rule of thumb: every column in your WHERE, ORDER BY, and JOIN ON clauses for hot queries should be in some index. Then look at the actual plan and remove the indexes that aren’t getting used. pg_stat_user_indexes tells you which.
Step 4 — EXPLAIN ANALYZE for real
You will not write a fast query by guessing. You will run EXPLAIN ANALYZE on the slow ones until you understand why they are slow.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT id, title, due
FROM todos
WHERE user_id = 42 AND NOT done
ORDER BY due
LIMIT 20;
A reasonable plan for a million-row table with the right index:
Limit (cost=0.43..16.20 rows=20 width=42) (actual time=0.04..0.18 rows=20 loops=1)
  Buffers: shared hit=23
  -> Index Scan using active_todos on todos
       Index Cond: (user_id = 42)
Five plan nodes to recognize on sight:
| Plan node | What it means | When to worry |
|---|---|---|
| Seq Scan | Full table read | Bad on tables over 10k rows when an index could exist. |
| Index Scan | Use index, then fetch rows from heap | Usually fine. |
| Index Only Scan | Answer entirely from the index | The win. Aim for this on hot paths. |
| Bitmap Heap Scan | Bulk-fetch rows after gathering an index | Fine when many rows match. |
| Nested Loop with high actual rows | Loop is repeating millions of times | Indicates a missing index or a bad join order. |
EXPLAIN ANALYZE runs the query for real (writes too — wrap in a transaction and ROLLBACK if you don’t want them). EXPLAIN alone just plans. BUFFERS shows cache hits vs disk reads — a query that hits disk on cold cache and stays hot in production is a different beast from one that always hits disk.
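Plans are also available as machine-readable JSON via `EXPLAIN (ANALYZE, FORMAT JSON) …`, which makes them scriptable. A sketch that walks a plan tree looking for a given node type — the sample plan below is hand-written, but the key names (`Node Type`, `Plans`, `Relation Name`) match the shape of Postgres's JSON output:

```python
def find_nodes(plan, node_type):
    """Recursively collect all nodes of a given type from an
    EXPLAIN (FORMAT JSON) plan tree."""
    found = []
    if plan.get("Node Type") == node_type:
        found.append(plan)
    for child in plan.get("Plans", []):
        found.extend(find_nodes(child, node_type))
    return found

# Hand-written sample shaped like Postgres's JSON plan output
sample = {
    "Node Type": "Limit",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "todos", "Actual Rows": 1000000}
    ],
}

scans = find_nodes(sample, "Seq Scan")
assert [n["Relation Name"] for n in scans] == ["todos"]
```

One use: a CI check that runs `EXPLAIN` on your hot queries and fails the build if a plan regresses to a Seq Scan.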
Step 5 — Transactions and isolation
Default isolation in Postgres is READ COMMITTED. It is not enough for anything that does read-then-write logic. A canonical bug:
-- two transactions running simultaneously, both at READ COMMITTED:
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- both read 100
UPDATE accounts SET balance = 100 - 30 WHERE id = 1;
COMMIT;
-- result: balance = 70, not 40. Lost update.
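The same interleaving, modeled in plain Python so the clobbered write is explicit (no database needed):

```python
balance = 100

# Under READ COMMITTED, both transactions read before either commits.
read_a = balance          # txn A reads 100
read_b = balance          # txn B reads 100

balance = read_a - 30     # txn A commits: balance is 70
balance = read_b - 30     # txn B commits: balance is 70 — A's write is lost

print(balance)            # 70, not the 40 two serial withdrawals would leave
```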
The fix is to use SERIALIZABLE for any transaction that does compare-then-write, and write your code to retry on serialization_failure:
import time

from sqlalchemy import create_engine, text
from sqlalchemy.exc import OperationalError

engine = create_engine("postgresql://postgres:dev@localhost:5432/postgres")

def transfer(from_id: int, to_id: int, amount: int):
    for attempt in range(5):
        try:
            with engine.begin() as conn:
                # must be the first statement in the transaction
                conn.execute(text("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"))
                # ... read both balances, update both ...
                return
        except OperationalError as e:
            # SQLSTATE 40001 surfaces with "serialization" in the message
            if "serialization" not in str(e) or attempt == 4:
                raise
            time.sleep(0.05 * (2 ** attempt))  # exponential backoff
If you remember one thing: SERIALIZABLE plus a retry loop is the cheapest correct concurrency you can build on a relational database.
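The retry shape generalizes beyond SQLAlchemy. A minimal sketch — `SerializationFailure` and the backoff constants are illustrative stand-ins, not a library API; map your driver's SQLSTATE 40001 error onto the exception:

```python
import time

class SerializationFailure(Exception):
    """Stand-in for SQLSTATE 40001; map your driver's error to this."""

def run_with_retries(txn, attempts=5, base_delay=0.05):
    """Run a transaction callable, retrying on serialization failures
    with exponential backoff. Re-raises after the last attempt."""
    for attempt in range(attempts):
        try:
            return txn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

The callable must be safe to re-run from the top: do all reads inside it, and keep side effects (emails, queue pushes) out until it returns.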
Step 6 — JSONB and full-text search
Postgres’s JSONB is a schema-on-read document store inside a schema-on-write database. Two rules:
- Use JSONB for irregular metadata. Tags that vary per integration, audit payloads, vendor-specific fields. Anything where you’d otherwise have 40 columns half-null.
- Do not use JSONB for everything. If a field has a known shape and is queried in `WHERE` clauses, give it its own column. The `GIN` index on JSONB is real, but a btree on a typed column is faster and self-documents the schema.
Full-text search, in one query:
SELECT id, ts_rank(search, q) AS rank, title
FROM todos, plainto_tsquery('english', 'milk bread') q
WHERE search @@ q AND user_id = 42
ORDER BY rank DESC
LIMIT 20;
That’s good enough for ten million rows. When it isn’t, reach for pg_trgm for fuzzy search, then for an external search engine (Meilisearch, Typesense, OpenSearch) — usually not before tens of millions of rows.
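To build intuition for what pg_trgm matches on: it lowercases, pads each word, and compares sets of three-character substrings. A pure-Python model for single words (real pg_trgm also splits multi-word strings on non-alphanumeric characters, which this sketch ignores):

```python
def trigrams(word: str) -> set[str]:
    """Trigram set the way pg_trgm builds it: lowercase, pad with two
    leading spaces and one trailing space, slide a 3-char window."""
    w = "  " + word.lower() + " "
    return {w[i:i + 3] for i in range(len(w) - 2)}

def similarity(a: str, b: str) -> float:
    """Shared trigrams over total trigrams, as in pg_trgm's similarity()."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

This is why `word` and `words` score high while `word` and `sword` score lower: padding makes the start of the word carry extra trigrams, so prefixes match better than suffixes.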
Step 7 — Survive the load test
Take the TODO API from 03.3. Seed it with a million realistic todos across a thousand users. Then load test:
# k6 (npm or brew) — write a small script
k6 run --vus 100 --duration 60s load.js
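Seeding a million rows one INSERT at a time is slow; `\copy` from CSV is the usual shortcut. A sketch that writes `\copy`-ready rows matching the todos schema from Step 1 — the file name, row counts, and 30-percent-done split are arbitrary choices, not part of the module:

```python
import csv
import random

def seed_rows(n_todos: int = 1_000_000, n_users: int = 1_000):
    """Yield (user_id, title, done) rows for the todos table.
    't'/'f' is Postgres's boolean text format."""
    for i in range(n_todos):
        done = "t" if random.random() < 0.3 else "f"
        yield (random.randint(1, n_users), f"todo {i}", done)

def write_seed(path: str, n_todos: int = 1_000_000):
    """Write a CSV that \\copy can bulk-load."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(seed_rows(n_todos=n_todos))
```

Then, in psql: `\copy todos (user_id, title, done) FROM 'seed.csv' CSV` — columns left out of the list fall back to their defaults.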
Targets that prove your work:
| Metric | Target |
|---|---|
| p95 latency, list todos | under 50ms |
| p95 latency, search todos | under 100ms |
| Error rate | under 0.1 percent |
| CPU on a 1-vCPU instance | under 70 percent at peak |
If you fail any of those, the next step is not “scale up the box.” It’s EXPLAIN ANALYZE the slow query, find the plan, fix the index. You will probably learn more from one missed target than from the previous five sections combined.
Going deeper
- The Internals of PostgreSQL · free, deep, technical. Skim once you have shipped a real schema.
- Postgres Weekly · the only Postgres newsletter worth your time.
- pgvector docs · for the vector-search use case, when it’s right.
- Use The Index, Luke! — chapters 4–8 · query plans and concurrency once the basics click.
- Designing Data-Intensive Applications, chapters 2–3 · Kleppmann · the high-level theory the docs assume you know.
Skip “Postgres vs MongoDB benchmarks 2026.” They benchmark different things and conclude what the author was going to conclude anyway.
Checkpoints
- Why is `serial` a bad default in 2026, and what specifically goes wrong when you hit the limit?
- Walk through the expand-contract migration pattern for renaming a column. Why does a single-step rename cause an outage?
- You have a query: `SELECT * FROM todos WHERE user_id = ? AND NOT done ORDER BY due LIMIT 20`. Design the right index, including any partial or `INCLUDE` clause, and explain why.
- In an `EXPLAIN ANALYZE` plan, what’s the difference between Index Scan and Index Only Scan, and why would you redesign a query to get the second?
- Name a problem that is the right fit for JSONB and one that is the wrong fit. Defend each in one sentence.
When the load test passes the targets above, move to 03.5 Deploying and DevOps lite, where you’ll get the API and frontend onto a real domain that a recruiter can click.