Problem #25 Hard System Design

Smart Meter to Monthly Bill PDF

billingSCD2idempotencyaudit

Scenario: You work for an energy retailer. Every month you produce a bill PDF for each of 200,000 customers. The bill has to reflect actual consumption (one reading every 15 minutes), the tariff in effect for each period (which can change mid-month), taxes, fees, and any account adjustments. Regulators want the bill to be reproducible and auditable. If a customer disputes their bill, you need to show exactly how each number was computed.

In the interview, the question is:

Design the pipeline that turns raw smart meter readings into a monthly bill PDF.

Your Task:

Sketch the end-to-end flow.
Cover billing correctness when readings arrive late, are missing, or get corrected.
Show how mid-month tariff changes are handled.
Explain the audit trail.
Cover what happens when generation goes wrong.

What a good answer covers:

The fact and dimension shape (Problem 10 SCD2 feeds into this).
An idempotent monthly job, keyed by billing period.
How estimated readings are handled.
PDF rendering as a separate, deterministic step.
Sealed bills and adjustments.

Try the problem on your own first. Solutions are most valuable after you've struggled with it.

Solution 25: Smart Meter to Monthly Bill PDF

The whole flow at a glance

flowchart TB
    M([Meter readings, 15-min<br/>s3://meter-raw/...])
    C([Curated 15-min fact<br/>reads_15min<br/>partitioned by date])
    B([Billing calc job<br/>per customer per period<br/>uses reads_15min + tariffs (SCD2)<br/>+ customers + adjustments])
    S([bills (sealed) + bill_lines<br/>immutable once issued])
    P([PDF rendering service<br/>deterministic, pure function of bill_id])
    D([Customer portal + email delivery])

    M --> C --> B --> S --> P --> D

    style M fill:#fef3c7,stroke:#a16207,color:#713f12
    style C fill:#fef3c7,stroke:#a16207,color:#713f12
    style B fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
    style S fill:#fed7aa,stroke:#c2410c,color:#7c2d12
    style P fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
    style D fill:#dcfce7,stroke:#15803d,color:#14532d

The principle: each step is a function of the layer beneath it. Re-run the same step on the same inputs and you get the same output. The PDF is the last and least interesting step.

The data model

reads_15min  (fact, partition by date)
─────────────────────────────────────────
meter_id, read_at, kwh, quality_flag, ingested_at

customers  (dim)
─────────────────────────────────────────
customer_id, meter_id, plan_id, billing_address, ...

tariffs  (SCD Type 2 dim)
─────────────────────────────────────────
plan_id, energy_rate_per_kwh, fixed_daily_charge,
valid_from, valid_to, is_current

account_adjustments  (fact)
─────────────────────────────────────────
adjustment_id, customer_id, amount, reason, applied_on

bills  (mart, sealed)
─────────────────────────────────────────
bill_id (PK), customer_id, period_start, period_end,
kwh_total, energy_charge, fixed_charge, adjustments,
tax, total_due, issued_at, status

bill_lines  (mart)
─────────────────────────────────────────
bill_id, line_no, description, quantity, unit_price, amount

The bill_lines table is the audit trail. Every cent on the bill has a corresponding row showing exactly how it was computed.

The billing calculation, step by step

For one customer for one billing period:

1. Pull readings.

  
SELECT read_at, kwh, quality_flag
FROM reads_15min
WHERE meter_id = @meter
  AND read_at >= @period_start
  AND read_at <  @period_end
ORDER BY read_at;

Expect 4 × 24 × 30 = 2880 rows for a 30-day month.

2. Fill gaps.

If any interval has no reading or a missing flag, fill with an estimate. The estimation rule is usually defined by the regulator: typically “average of the same interval over the previous N days.” Mark each filled row with quality_flag = 'estimated'.

3. Join to the tariff in effect at each interval.

  
SELECT r.read_at, r.kwh, t.energy_rate_per_kwh
FROM readings r
JOIN tariffs t
  ON  t.plan_id = @plan_id
  AND r.read_at >= t.valid_from
  AND r.read_at <  t.valid_to;

This is the SCD2 “as-of” join from Problem 10. If the tariff changed mid-month, half the rows match the old rate and half match the new one, automatically.

4. Compute the energy charge.

energy_charge = sum( kwh * rate ) over all 15-min intervals

5. Add the fixed daily charge.

fixed_charge = days_in_period * fixed_daily_charge

(If the fixed daily charge changed mid-month, the same SCD2 join handles it per day.)

6. Apply adjustments.

  
SELECT SUM(amount) FROM account_adjustments
WHERE customer_id = @cust
  AND applied_on BETWEEN @period_start AND @period_end;

Examples: refunds, goodwill credits, late fees.

7. Apply tax.

Usually a flat percentage of the subtotal. Some markets have different rates for residential vs commercial.

8. Write bill and bill_lines, sealed.

One bills row plus one bill_lines row per category (energy, fixed, each adjustment, tax). After this, the bill is frozen. Any later correction is a new adjustment in the next billing period, not an edit.

Idempotency

The job is keyed by (customer_id, period_start, period_end). Re-running it for the same key:

DELETEs any draft bill_lines for that key.
Recomputes from current reads_15min and tariffs.
If a bill with that key is status = 'issued', the job refuses to overwrite. You can only correct via an adjustment.

Pseudocode:

  
def compute_bill(customer_id, period_start, period_end):
    existing = bills.get(customer_id, period_start)
    if existing and existing.status == 'issued':
        raise BillAlreadyIssued(existing.bill_id)

    # idempotent re-compute
    delete_lines_for(customer_id, period_start)
    lines = build_lines(customer_id, period_start, period_end)
    write_lines(lines)
    write_bill(summary_of(lines), status='draft')

Late and corrected readings

This is where the design pays off.

If readings for May 10 arrive late on May 14, the curated layer rebuilds the May 10 partition.
If May’s bill has not been issued yet, the next run of the billing job picks up the new numbers naturally. No special code.
If the bill was already issued, the change creates an account_adjustment row to apply in the next billing cycle. The regulator sees a clean trail: “your May bill was X. On June 5 we received corrected readings for May 10. We credited Y dollars on your June bill.”

Much cleaner than editing a sealed bill, which would break audits.

Mid-month tariff changes

These happen often (price reviews, customer plan switches). Handled by:

The tariffs table is SCD Type 2.
The as-of join in step 3 matches each 15-minute reading to the rate that was in force at that exact moment.
Bill lines show “Energy 1-15 May at 0.18/kWh” and “Energy 16-31 May at 0.22/kWh” separately, so the customer can see what changed.

Auditability

The audit trail is the bill_lines table plus a snapshot of the inputs. Three pieces:

The bill itself (issued, sealed).
bill_lines showing every component.
A reading snapshot. At bill time we record a reads_snapshot_id pointing to the version of the curated reads we used. If a later correction changes the readings, the snapshot still shows what we billed on.

A dispute conversation looks like: “Your May energy charge of $XX comes from these 2,880 readings, totaling YYY kWh, at the rates shown in lines 1 and 2. The estimated readings are flagged on line 3.”

PDF rendering

The PDF service is a pure function of the bill_id. It reads the bill and bill_lines and renders. It computes nothing. If we change the PDF template, every old bill can be re-rendered to the new look without changing any numbers.

Two practical patterns:

PDFs are written to S3 with a deterministic name: bills/{customer_id}/{period}/{bill_id}.pdf.
The PDF carries a small footer like “Bill ID: B-2025-05-12345” so the support team can find the source data instantly.

What happens when generation fails

Failure modes I’d design for:

Missing readings for the period. Job pauses, alerts ops, no bill is generated.
PDF rendering crashes. Bill row exists, no PDF. The PDF service retries; if it never works, ops gets paged.
Wrong tariff applied (config error). All bills for the period get re-generated. Since they are not yet issued, the job can do this idempotently.
Adjustment job double-fires. Each adjustment has a unique id; the bill query SUMs over DISTINCT ids.

Why split the job per customer

Two reasons:

Parallelism. 200,000 customers at ~1 second per bill is 56 hours sequentially. With 200 workers in parallel, ~17 minutes.
Isolation. One customer’s bug doesn’t poison the rest. If the bill job for customer 12345 fails, the other 199,999 still go out.

A simple orchestrator (Airflow with dynamic task mapping, or a queue of customer ids) is enough.

Common mistakes interviewers want you to name

Editing a sealed bill when a correction arrives. Use adjustments.
Not handling SCD2 tariffs. Mid-month tariff changes get billed wrong.
Computing during PDF rendering. A template change later silently changes old totals.
No quality flag for estimates. Auditors require “what was measured vs estimated.”
Ignoring per-customer time zones when defining the billing period.

Bonus follow-up the interviewer might throw

“How would you parallelize this for 5 million customers?”

Same architecture, more parallelism. Two changes worth mentioning:

Run the billing job on Spark or BigQuery instead of Python-per-customer. The “per customer” loop becomes SQL with GROUP BY customer.
The PDF render service stays per-customer but moves behind a queue with a thousand workers.

Five million bills at ~0.5 seconds each, 1,000 workers, is roughly 40 minutes of wall clock.