Explaining Fact Table Grain
Scenario: You are mentoring an analyst who keeps producing dashboards that don’t quite match. The cause is the same every time: they don’t have a clear sense of what each row in the fact table represents. They ask you to explain “grain” in a way they can use the next time they build a model.
In the interview, the question is:
What is the grain of a fact table, and how would you explain it to a non technical stakeholder?
Your Task:
- Define grain in one sentence.
- Give three different grains for the same business.
- Show why getting it wrong produces bad numbers.
- Walk through how to choose grain for a new fact.
What a Good Answer Covers:
- Grain as “one row means one X.”
- Picking grain by what the dashboard asks.
- Mixing grains is the bug.
- Coarser is faster; finer is more flexible.
- Bridge tables for many-to-many.
Try the problem on your own first. Solutions are most valuable after you've struggled with it.
Solution 44: Explaining Fact Table Grain
Short version you can say out loud
Grain is the answer to the question “what does one row in this table mean?” If you cannot finish the sentence “one row in this table is one __,” you do not have grain pinned down yet. Every aggregation, every join, every dashboard depends on it. Get grain wrong and your numbers are off by exactly the right amount to look plausible but be misleading.
Explaining grain to a non technical stakeholder
The way I would say it in a meeting:
“Imagine each table is a stack of paper. Grain is what is printed on one sheet. If the sheet is ‘one customer,’ then there is one sheet per customer and we can count sheets to count customers. If the sheet is ‘one order,’ there is one per order. If the sheet is ‘one item on one order,’ there are many more sheets, because each order has multiple items. We have to know what one sheet is before we know what summing them up means.”
A stakeholder who understands “one row = one X” is suddenly ready to ask the right questions.
Three grains for the same business
Imagine a coffee shop chain.
Grain: one sale.
1
2
3
sale_id │ store │ date │ total_amount
1 │ A │ 2025-05-14 │ 12.50
2 │ A │ 2025-05-14 │ 7.00
Useful for “total revenue today per store.” Not useful for “how many large lattes were sold.”
Grain: one item on one sale.
1
2
3
4
sale_id │ store │ date │ product │ qty │ amount
1 │ A │ 2025-05-14 │ Latte L │ 1 │ 6.50
1 │ A │ 2025-05-14 │ Croissant│ 2 │ 6.00
2 │ A │ 2025-05-14 │ Coffee │ 2 │ 7.00
Useful for everything the first grain can do, plus “how many lattes did we sell.” More rows, more flexibility.
Grain: one minute of sales per store.
1
2
3
store │ minute │ revenue │ items_sold
A │ 2025-05-14 09:01:00 │ 23.50 │ 4
A │ 2025-05-14 09:02:00 │ 0.00 │ 0
Useful for “what minute are we busiest.” Not useful for “what was sale 1234.”
Same business, three legitimate grains. Each one answers a different question.
How wrong grain produces wrong numbers
Imagine the team mixes grains accidentally:
1
2
3
4
5
6
SELECT
s.store,
SUM(s.total_amount) AS revenue,
COUNT(*) AS line_items_sold
FROM fact_sale_lines s
GROUP BY s.store;
fact_sale_lines is at the line item grain. So SUM(total_amount) may be double-counting the sale’s total if every line carries the order’s total. The COUNT is right (it counts line items), but the SUM is wrong.
This is the most common modeling bug. The reason it goes unnoticed: the numbers are usually “in the right ballpark,” so nothing screams. You only catch it by reconciling against the source system.
The fix: each fact table should carry only measurements at its own grain. Order total goes on fact_orders, not fact_order_lines. Line price goes on fact_order_lines, not on fact_orders.
Choosing grain for a new fact
Three questions:
- What does the dashboard or report want? This is the only honest answer. Build to the finest grain you actually need.
- Is there a natural “thing that happens once” in the business? That is usually your grain. A sale, an order, a payment, a shipment, a meter reading.
- What is the right time grain? Per-event is most flexible. Per-minute or per-day is a pre-aggregation, useful when the finer grain is too big.
The default for transactional facts is “one row per event.” Pre-aggregations sit on top for performance.
Mixed grain in one table is the bug
Sometimes a fact table tries to be both order-grain and line-grain. The smell: most columns are filled in, but a few columns (like line_id or quantity) are null on some rows.
The fix is to split into two tables, each with one clear grain:
1
2
fact_orders : one row per order (order_total, customer_key, ...)
fact_order_lines : one row per line (order_id, product_key, qty, price)
If you need both in one query, join them. Do not collapse them.
Coarser is faster, finer is more flexible
The trade-off:
- Coarser grain (one row per day per store) means fewer rows, faster queries. But you can never answer questions below that grain (“what was sold at 9:42 AM”).
- Finer grain (one row per second, or per individual event) keeps every question open. But the table is huge.
The common pattern: keep the finest grain in fact_*, build pre-aggregates agg_* for the questions you ask most.
Bridge tables for many-to-many
What if one fact has many “many-to-many” relationships? An order can have many promotions applied, a promotion can apply to many orders.
You cannot put both on the same row without violating grain. The fix:
1
2
fact_orders one row per order
fact_order_promotions one row per (order, promotion) pair
The bridge table has its own grain. You join through it when you want “promotions per order” or “orders per promotion.”
Common mistakes interviewers want you to name
- No grain documented. The new team member has to figure it out from the data.
- Mixed grain. Some rows are orders, some are line items, in the same table.
- SUM across the wrong grain. Counts items but sums order totals; gets a multiplied number.
- Building one giant table with every possible measurement at every grain. Always wrong somewhere.
- Confusing time grain with entity grain. “Per day” is a time grain. “Per order” is an entity grain. Both can apply at once.
How I would teach this
When mentoring, I would write the grain in the first line of the table’s docs:
fact_orders: one row per order. An order is everything a single customer placed in one checkout, including all its line items.
And on fact_order_lines:
fact_order_lines: one row per item on an order. A single product appears once per order, with a quantity column.
Two sentences. The whole team now reads tables the same way.
Bonus follow-up the interviewer might throw
“What if the business asks ‘show me both order count and line item count in one query’? Won’t joining force a grain choice?”
You join, but you have to aggregate to one grain on each side before joining. For example:
1
2
3
4
5
6
7
8
9
10
11
WITH orders_per_day AS (
SELECT order_date, COUNT(*) AS orders
FROM fact_orders GROUP BY order_date
),
lines_per_day AS (
SELECT order_date, SUM(qty) AS items
FROM fact_order_lines GROUP BY order_date
)
SELECT o.order_date, o.orders, l.items
FROM orders_per_day o
JOIN lines_per_day l USING (order_date);
Each CTE has its own grain (per-day), then you join at the common grain. No row-multiplication.