Window Functions vs GROUP BY
Scenario: A teammate is writing a query for a marketing dashboard. They want each row to show the order along with “this customer’s total spend so far.” They keep getting stuck because GROUP BY collapses the rows. They ask why their query keeps disappearing the order detail.
In the interview, the question is:
Explain when you would reach for a window function instead of GROUP BY. Use an example you would actually draw on a whiteboard.
Your Task:
- Explain GROUP BY in one line.
- Explain a window function in one line.
- Show one small example where each is the right tool.
- Cover the three things window functions can do that GROUP BY cannot.
What a Good Answer Covers:
- GROUP BY collapses rows; window functions keep them.
- PARTITION BY vs GROUP BY.
- Running totals, ranking, lag and lead.
- The performance side: window functions are not free.
- The “use both” pattern.
Try the problem on your own first. Solutions are most valuable after you've struggled with it.
Solution 20: Window Functions vs GROUP BY
Short version you can say out loud
GROUP BY collapses many rows into one row per group. Window functions compute a value across a group but keep every row. If the question is “give me one number per customer,” I use GROUP BY. If the question is “for each order, also show this customer’s running total,” I use a window function. The key tell is the word “for each row also” in the requirement.
The whiteboard picture
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Same data, two shapes.
orders
─────────────────────────────────────────
order_id │ customer_id │ amount │ created_at
1 │ A │ 100 │ Jan 1
2 │ A │ 50 │ Jan 5
3 │ B │ 200 │ Jan 3
4 │ A │ 70 │ Jan 10
GROUP BY WINDOW
──────────────── ──────────────────────────────────
SELECT SELECT
customer_id, order_id, customer_id, amount,
SUM(amount) AS total SUM(amount) OVER (
FROM orders PARTITION BY customer_id
GROUP BY customer_id; ORDER BY created_at
) AS running_total
FROM orders;
Result: Result:
customer │ total order │ customer │ amount │ running
A │ 220 1 │ A │ 100 │ 100
B │ 200 2 │ A │ 50 │ 150
3 │ B │ 200 │ 200
4 │ A │ 70 │ 220
Notice: GROUP BY returned 2 rows. WINDOW returned 4 rows. Same SUM, different result shape.
What each one is, in one line
- GROUP BY: collapse rows that share group-key values into one row per group, with aggregates over the group.
- Window function: compute an aggregate (or ranking, or lag) for each row, looking at a window of related rows around it, without collapsing anything.
When to reach for each
| Need | Use |
|---|---|
| Total revenue per customer (one row per customer) | GROUP BY |
| Count of orders per country | GROUP BY |
| For each order, show running total per customer | Window |
| Rank customers by spend within each country | Window |
| Find each customer’s previous order date | Window (LAG) |
| For each row, compare the value to the group avg | Window |
| Pick the top 1 row per group | Window (ROW_NUMBER) |
The shortcut: if the answer should have more rows than the number of groups, you need a window.
Three things window functions do that GROUP BY cannot
1. Running totals and moving averages.
1
2
3
4
5
6
7
8
9
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS cumulative,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_avg
FROM daily_sales;
GROUP BY cannot do this because it would force one row per group.
2. Ranking inside a group.
1
2
3
4
SELECT
customer_id, country, total_spend,
RANK() OVER (PARTITION BY country ORDER BY total_spend DESC) AS rank_in_country
FROM customer_totals;
ROW_NUMBER, RANK, DENSE_RANK give you the row’s position inside its partition. Indispensable for “top N per group” queries.
3. Lookups to previous / next rows.
1
2
3
4
5
SELECT
meter_id, reading_time, value,
LAG(value) OVER (PARTITION BY meter_id ORDER BY reading_time) AS prev,
value - LAG(value) OVER (PARTITION BY meter_id ORDER BY reading_time) AS delta
FROM meter_reads;
LAG and LEAD make “compare this row to the next/previous one” trivial. Without windows, you would need a self-join.
The “top N per group” pattern
This is so common it is worth memorizing:
1
2
3
4
5
6
7
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY total_spend DESC) AS rn
FROM customer_totals
)
SELECT * FROM ranked WHERE rn <= 3;
This gives you the top 3 customers in each country. With GROUP BY alone, you would have to do tricks with self-joins or LATERAL queries.
The PARTITION BY vs GROUP BY confusion
A common bug: writing GROUP BY country in a query that also has window functions, and being surprised the windows behave oddly.
GROUP BY countrycollapses to one row per country. Window functions then operate on those collapsed rows.PARTITION BY countryinside an OVER clause divides the rows into groups without collapsing.
These are different. They can coexist, but you have to be deliberate.
Performance notes
Window functions are not free. They typically require a sort by the PARTITION BY columns, then by the ORDER BY columns. For very large tables, that sort can be the bottleneck.
Two practical things:
- Filter early. Run the window only on the rows you need.
- Use the right window.
ROWS BETWEEN... AND CURRENT ROWis much cheaper thanRANGEfor time-based windows on some engines.
Common mistakes interviewers want you to name
- Trying to GROUP BY when the answer needs per-row detail. “Show me each order with the customer’s lifetime total.” GROUP BY cannot keep the order row.
- Forgetting
ORDER BYinside the window.SUM(...) OVER (PARTITION BY x)without ORDER BY is the same total on every row. Often someone meant a running total but got the grand total. - Filtering on a window column in WHERE. You cannot. WHERE runs before windows. Use a subquery or CTE and filter on the alias in the outer SELECT.
- PARTITION BY too coarse. “Top 5 products overall” needs no partition. “Top 5 products in each country” needs
PARTITION BY country. Easy to mix up.
Bonus follow-up the interviewer might throw
“What is the difference between RANK, DENSE_RANK and ROW_NUMBER?”
ROW_NUMBER: every row gets a unique number, ties broken arbitrarily.RANK: ties share a number, then it skips.1, 2, 2, 4.DENSE_RANK: ties share a number, no skip.1, 2, 2, 3.
Use ROW_NUMBER when you want exactly one row per “first place” (top-N-per-group). Use RANK or DENSE_RANK when ties matter, like leaderboards.