Concept
Cloud Comparisons

Data warehouses: Redshift vs BigQuery vs Synapse

Storage model, pricing, concurrency.

A cloud data warehouse runs SQL across huge analytical datasets, fast, at a price-per-byte-scanned that would have been impossible ten years ago. Redshift, BigQuery, and Synapse are the major-cloud answers. They all expose SQL, all use columnar storage, and all scale horizontally. The big difference is the pricing model (which shapes how you use it), the separation of storage and compute (which affects performance and cost), and the surrounding ecosystem. The right answer depends as much on how you want to be billed as on raw capability.

The three at a glance

flowchart TB
    subgraph RS["AWS Redshift"]
        direction LR
        R1[("Postgres-flavoured SQL")]:::server
        R2[("provisioned (RA3) or serverless")]:::server
        R3[("storage + compute now separable<br/>(RA3 nodes, RMS storage)")]:::server
    end

    subgraph BQ["GCP BigQuery"]
        direction LR
        B1[("standard SQL")]:::server
        B2[("fully serverless from day one")]:::server
        B3[("storage and compute decoupled by design")]:::server
    end

    subgraph SY["Azure Synapse / Microsoft Fabric"]
        direction LR
        SY1[("T-SQL based")]:::server
        SY2[("dedicated SQL pools (provisioned)<br/>or serverless")]:::server
        SY3[("integrates with Data Lake Gen2,<br/>Spark, Power BI")]:::server
    end

    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

Pricing: the biggest practical difference

The pricing model shapes how the warehouse feels to use.

flowchart TB
    subgraph PROV["Provisioned (pay for nodes/hours regardless of usage)"]
        direction LR
        P1["Redshift provisioned (RA3 nodes)"]:::infra
        P2["Synapse dedicated SQL pools"]:::infra
    end

    subgraph SERV["Serverless / on-demand (pay per query)"]
        direction LR
        S1["BigQuery on-demand: $X per TB scanned"]:::strong
        S2["Redshift Serverless: pay per RPU-hour"]:::strong
        S3["Synapse Serverless: pay per TB processed"]:::strong
    end

    subgraph FLAT["Flat-rate / reservation (predictable monthly)"]
        direction LR
        F1["BigQuery editions (Standard, Enterprise, Enterprise Plus)<br/>buy slots, predictable bill"]:::infra
        F2["Redshift reserved nodes"]:::infra
    end

    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

The headline change of the last few years is that all three offer serverless modes now. BigQuery was always serverless; Redshift Serverless and Synapse Serverless caught up.

Cost intuition: at low query volumes, serverless is cheapest. At sustained high volumes, provisioned/reservation tiers win. The crossover happens at the “is this our main analytics engine running all day?” point.

Storage vs compute: the architectural divide

flowchart TB
    subgraph COUPLED["Old: storage and compute together"]
        direction LR
        OLD1[("Redshift dense storage nodes<br/>(historical model)")]:::store
        OLD2["scale up = bigger node, more storage AND compute"]:::weak
    end

    subgraph DECOUPLED["New: storage and compute separate"]
        direction LR
        NEW1[("storage: cheap, durable, separate")]:::store
        NEW2[("compute: scales independently,<br/>can shrink to zero")]:::strong
        NEW3["BigQuery, Redshift RA3 + RMS,<br/>Synapse serverless"]:::strong
    end

    classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
    classDef weak fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
    classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

Modern data warehouses all separate storage from compute. Storage is cheap and grows; compute scales up and down per workload. The whole industry converged on this around 2020.

What actually differs

flowchart TB
    F1["Concurrency<br/>BigQuery: implicit, effectively unlimited<br/>Redshift: limited by cluster size or concurrency scaling<br/>Synapse: depends on DWU/SLO"]:::infra
    F2["Streaming ingest<br/>BigQuery: streaming inserts, native<br/>Redshift: federated queries on Kinesis, less native<br/>Synapse: integrates with Event Hubs"]:::infra
    F3["External tables / lake-querying<br/>BigQuery: native (external tables on GCS, BigLake)<br/>Redshift: Spectrum (queries S3 directly)<br/>Synapse: serverless pools query Data Lake directly"]:::infra
    F4["ML integration<br/>BigQuery: BQML built-in<br/>Redshift: Redshift ML (SageMaker integration)<br/>Synapse: tied to Azure ML"]:::infra
    F5["Ecosystem tools (dbt, Looker, Metabase, Tableau)<br/>All three are well-supported; BigQuery has historically been the most-loved by data engineers"]:::infra

    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px

When to pick which

flowchart TB
    Q1{"Which cloud?"}:::query
    Q2{"Steady-state heavy use<br/>or spiky low-volume?"}:::query

    A1["BigQuery.<br/>Serverless by default. Lowest operational burden.<br/>BQML for in-warehouse ML."]:::strong
    A2["Redshift Serverless.<br/>If you are on AWS and want serverless with deep AWS integration."]:::strong
    A3["Redshift provisioned.<br/>Sustained high volume, predictable workload."]:::mid
    A4["Synapse / Fabric.<br/>Microsoft ecosystem, T-SQL, Power BI."]:::strong

    Q1 -->|"GCP"| A1
    Q1 -->|"AWS"| Q2
    Q2 -->|"spiky"| A2
    Q2 -->|"steady"| A3
    Q1 -->|"Azure / Microsoft"| A4

    classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
    classDef mid fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px

For new projects on any cloud, the serverless tiers are usually the right starting point. Move to provisioned only when sustained costs justify the commitment.

Snowflake: the cross-cloud option

Snowflake (the company) runs on all three clouds and is many teams’ answer when they want a warehouse that does not lock them in. The performance is competitive with the native services; the pricing is usually higher; the operational simplicity and feature consistency across clouds is the win.

Common mistakes

  • Querying the warehouse like an OLTP database. Joins on terabyte tables are expensive; pay for the query plan you actually wrote.
  • No partitioning / clustering. Both clustering (Redshift, Snowflake) and partitioning (BigQuery, Synapse) drastically reduce scanned bytes. Always design for it.
  • SELECT * on a wide table. Columnar storage means you pay for the columns you read. Be specific.
  • Warehouse for transactional workloads. Warehouses are OLAP. Use a proper OLTP database (see OLTP vs OLAP).
  • Streaming small inserts. Warehouses prefer batches. Either use the streaming API (BigQuery streaming inserts) or batch into reasonable chunks.
  • Cost surprises. BigQuery’s per-TB-scanned model can produce surprising bills if dashboards are inefficient. Set query cost limits.
  • One warehouse for everything. Production analytics, ad-hoc exploration, and ML feature pipelines have different access patterns. Often separate workloads or workspaces are right.

Quick recap

  • All three offer columnar SQL warehouses with separated storage and compute and serverless options.
  • BigQuery: serverless from day one, lowest operational burden, per-TB pricing.
  • Redshift: deepest AWS integration, both serverless and provisioned tiers.
  • Synapse / Fabric: Microsoft ecosystem, T-SQL, Power BI integration.
  • Pick by cloud first, by usage pattern (steady vs spiky) second.
  • Partitioning, clustering, and column selection are the universal “make queries cheap” levers.

This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.