Module 6 — Window Functions

From "what does OVER() even mean" to running totals, gap-and-island detection, sessionization, and retention analysis — the single most tested topic in FAANG SQL interviews.

Foundations

Why window functions exist

Go back to Module 3. You already know how to get "total revenue per customer" — that's GROUP BY. It collapses many rows into one row per group. You lose the individual rows.

But now ask a different question: "show me every single order, and next to each one, what percentage of that customer's total revenue does it represent?"

You can't answer that with GROUP BY. The moment you group by customer, the individual order rows are gone — averaged, summed, or grouped away. You need the detail row and the aggregate, at the same time, on the same line.

That is the one-sentence reason window functions exist: they let you compute an aggregate (or a rank, or a "look at the next row") without collapsing the rows you're computing it over.

Analogy Think of a school report card. GROUP BY is the class average printed once at the bottom of the page — one number, all individual student rows are gone. A window function is a column added next to every student's own row that shows "class average" or "your rank in class" — the student's row survives, and the aggregate rides along beside it. The "window" is the group of rows (the whole class, or just the students in Section A) that the calculation is allowed to "look at" while computing that value for one row.

When to use it

You need detail rows and an aggregate/rank/comparison in the same result set (rank, running total, % of total, comparison to previous row).
Top-N-per-group problems ("top 3 highest paid employees per department").
Time-series analysis: running totals, moving averages, period-over-period change.
Deduplication: keep exactly one row per group based on some rule (latest record, highest value).
Gap detection, streak detection, sessionization — anything where "compare this row to the row before/after it" matters.

When NOT to use it

If you only need one aggregated row per group and don't need the detail rows — plain GROUP BY is simpler and usually cheaper.
If you need to filter on the aggregate directly in the same query without a wrapping query — you cannot put a window function in WHERE (more on this trap later).
Very old database versions without window function support (rare today, but some legacy MySQL < 8.0 setups still exist) — you'd fall back to correlated subqueries or variables.

How it executes internally

⚙️

Conceptually, the engine runs window functions after WHERE, GROUP BY, and HAVING have already produced the row set, but before the final ORDER BY and SELECT list are applied. For each row, it:

Figures out which other rows belong to the same "window" (the PARTITION BY group, or the whole result set if there's no partition).
Within that window, sorts by ORDER BY if one is given.
Applies the frame clause (which subset of the partition — e.g. "from the start up to this row") if one is given.
Computes the function (rank, sum, lag, etc.) over exactly that frame, and attaches the result to the current row — without deleting any other rows.

Common mistake Beginners assume a window function runs once per query, like a plain aggregate. It actually conceptually re-evaluates per row (real engines optimize this with a single sort + streaming pass per window spec, not literally N passes — but mentally model it as "recomputed per row" until you internalize the frame concept).