Slowly Changing Dimensions (SCD) are one of those concepts that quietly decide whether your analytics will be trusted – or endlessly questioned in meetings. Any time a "stable" business attribute changes – a customer's region, a product's category, a sales rep's team – you face a choice: overwrite the old value, or preserve the history of what it used to be.
SCDs are the patterns that formalize those choices in your data warehouse. Getting them right means your dashboards, BI tools, and AI models see the same consistent history, no matter how many times your source systems change their mind.

In this article, you'll see what SCDs are, why they matter, and how SCD types 0–6 map to specific business needs in modern cloud warehouses like BigQuery, Snowflake, Redshift, Databricks, and Athena.
In dimensional modeling, a dimension is a table that describes the who, what, where, when, and how of your business: customers, products, campaigns, locations, employees, and more. A Slowly Changing Dimension is a dimension where those descriptive attributes:
Typical examples:
The word "slowly" doesn't mean "rarely" – it means "not on every row." And each time something changes, you must decide:
SCD types 0–6 are standardized modeling patterns that answer those questions explicitly.

In classic Kimball-style dimensional modeling, a data warehouse is organized into:
An SCD defines:
Each SCD type (0–6) prescribes a specific structure and update behavior so analysts can reason about "current" vs "as-of" attributes consistently.
Incorrect handling of changing attributes usually shows up as business confusion, not SQL errors. Typical symptoms are:
SCDs matter because they:
When you apply SCD patterns consistently, analysts can reliably answer questions like:
A useful way to frame this is the classic facts vs dimensions distinction:
Facts tend to be immutable once recorded ("the order was placed for $120"), but dimensions are not. A customer can move, products can be re-categorized, and account managers can change. SCDs are about how you store and query those changing dimension attributes so your facts remain analytically meaningful over time.
You need SCDs anywhere the business cares about both what the attribute is now and what it used to be when past events occurred.
In each case, simple overwrites make either historical views or current-state views incorrect. SCDs allow you to support both in a controlled way.
If you ignore SCDs and simply overwrite dimension values, you introduce subtle but serious issues:
For example, if all past orders are joined to a customer's current region (because you overwrote the region in the dimension table), last year's "North America" revenue may suddenly appear larger or smaller, depending on today's assignments.
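To make that drift concrete, here is a sketch of the problematic query (table and column names are illustrative, following a simple Type 1 overwrite design):

```sql
-- Every order inherits the customer's CURRENT region,
-- so last year's revenue shifts whenever a customer relocates.
SELECT
  c.region,
  SUM(o.order_amount) AS revenue_2023
FROM fact_order AS o
JOIN dim_customer AS c          -- Type 1: region was overwritten
  ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
GROUP BY c.region;
```

Nothing in this query is wrong syntactically – the error lives in the dimension design, which makes it especially hard to spot in review.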
Once stakeholders notice this kind of drift, trust in the data warehouse declines quickly. Typical business-side consequences:
Every SCD decision translates into table structure and ETL/ELT logic in your warehouse:
In modern cloud warehouses like BigQuery, Snowflake, Redshift, Databricks, and Athena, this typically means:
This is why SCD logic belongs in the warehouse layer, not in BI tools (where logic is duplicated and fragile) or application code (where it's invisible to analysts and difficult to audit). You define SCD logic once in your transformation layer, and everything downstream – dashboards, reports, experiments, and AI models – reads the same consistent history.
If you're not ready to design all of this from scratch, OWOX Data Marts can help you implement SCD logic directly on top of your warehouse, with ready-to-use data models for marketing and product analytics. You can explore it hands-on by starting free here: OWOX Data Marts.
Instead of treating SCD types as academic labels, think of them as patterns along three axes:
This table is intentionally simplified. Real implementations will add details like surrogate keys, flags, and validity timestamps, which we'll cover in later sections.
Each SCD type balances accuracy, complexity, and storage differently.
In modern cloud warehouses, storage is usually cheap. Complexity and maintainability tend to matter more than raw disk usage.
You don't have to pick a single SCD type for your entire warehouse. Instead, decide per attribute or per dimension:
A practical rule of thumb:
Most real-world warehouses rely heavily on three patterns: Type 0, Type 1, and Type 2. Understanding these clearly is essential if you want to make the right call in situations like SCD Type 1 vs Type 2, design a robust customer dimension SCD, or implement a correct SCD Type 2 surrogate key strategy.
This section walks through each type with concrete business examples and practical SQL/dbt-style patterns you can adapt in BigQuery, Snowflake, Redshift, Databricks, or Athena.

Definition: Type 0 assumes that once a dimension row is created, specific attributes never change. If the source system changes them, you ignore those changes in the warehouse.
Typical attributes:
These attributes are baked into historical facts – if you later reclassify them, you risk corrupting past analysis.
Example: Product reference data
CREATE TABLE dim_product (
  product_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  product_id STRING,    -- business key
  sku_code STRING,      -- fixed, Type 0
  created_date DATE,
  -- other descriptive attributes...
  PRIMARY KEY (product_sk)
);
Your load logic might:
This keeps a stable reference backbone for joins and ensures analyses like "profit by original SKU" remain reproducible.
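One way to enforce Type 0 behavior is an insert-only load that never touches existing rows. A sketch, assuming a staging table named stg_product_updates:

```sql
-- Insert rows for new products only; existing rows are left untouched,
-- so sku_code and created_date can never be rewritten by the source.
INSERT INTO dim_product (product_id, sku_code, created_date)
SELECT s.product_id, s.sku_code, s.created_date
FROM stg_product_updates AS s
LEFT JOIN dim_product AS p
  ON s.product_id = p.product_id
WHERE p.product_id IS NULL;
```

Because there is no UPDATE branch at all, upstream changes to protected attributes are ignored by construction.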
When to use Type 0:
Definition: Type 1 simply overwrites attributes when they change. You keep a single row per business key; no historical versions.
This is ideal for attributes where:
Common scenarios:
Example: Correcting customer contact info
CREATE TABLE dim_customer (
  customer_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  customer_id STRING,   -- business key
  customer_name STRING,
  email STRING,
  phone STRING,
  signup_date DATE,
  PRIMARY KEY (customer_sk)
);
Upserts (simplified) in dbt-style pseudo-SQL:
MERGE INTO dim_customer AS t
USING stg_customer_updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET
    customer_name = s.customer_name,
    email = s.email,
    phone = s.phone
WHEN NOT MATCHED THEN
  INSERT (customer_id, customer_name, email, phone, signup_date)
  VALUES (s.customer_id, s.customer_name, s.email, s.phone, s.signup_date);
No additional columns, no history logic. Facts always join to the latest attribute values.
When to use Type 1:
Definition: Type 2 stores a new row for each change of relevant attributes and uses:
This is the go-to pattern when you need reliable historical analysis.
Key principles:
Example: Customer lifecycle and region changes
CREATE TABLE dim_customer_scd2 (
  customer_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  customer_id STRING,      -- business key
  customer_name STRING,
  lifecycle_stage STRING,  -- e.g., 'Trial', 'Active', 'Churned'
  region STRING,           -- e.g., 'EMEA', 'NA'
  valid_from DATE,
  valid_to DATE,
  is_current BOOLEAN,
  PRIMARY KEY (customer_sk)
);
A single customer_id might have rows like:
Basic SCD Type 2 upsert (simplified logic)
-- 1. End-date existing current rows where something changed
UPDATE dim_customer_scd2 AS t
SET
  valid_to = s.change_date - INTERVAL '1 DAY',
  is_current = FALSE
FROM stg_customer_changes AS s
WHERE t.customer_id = s.customer_id
  AND t.is_current = TRUE
  AND (
    t.lifecycle_stage != s.lifecycle_stage
    OR t.region != s.region
  );

-- 2. Insert new "current" version rows
INSERT INTO dim_customer_scd2 (
  customer_id, customer_name, lifecycle_stage, region,
  valid_from, valid_to, is_current
)
SELECT
  s.customer_id,
  s.customer_name,
  s.lifecycle_stage,
  s.region,
  s.change_date AS valid_from,
  DATE '9999-12-31' AS valid_to,
  TRUE AS is_current
FROM stg_customer_changes AS s;
Querying with SCD Type 2
You join facts (e.g., orders) to the dimension using the surrogate key already resolved in a previous step or by time-based logic:
SELECT
  o.order_id,
  o.order_date,
  c.lifecycle_stage,
  c.region
FROM fact_order AS o
JOIN dim_customer_scd2 AS c
  ON o.customer_sk = c.customer_sk;
Or, if you only have customer_id and order_date:
JOIN dim_customer_scd2 AS c
  ON o.customer_id = c.customer_id
  AND o.order_date BETWEEN c.valid_from AND c.valid_to
When to use Type 2:
This pattern is crucial for a robust customer dimension SCD in any subscription or B2B context.
Example question: "Was this customer in 'Trial' or 'Active' when they made their first purchase?" → Requires Type 2; Type 1 can't answer this reliably.
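That question can be answered with an as-of join against the Type 2 dimension. A sketch, using the fact_order table and columns from the earlier examples:

```sql
-- Lifecycle stage that was valid on each customer's first order date
WITH first_orders AS (
  SELECT customer_id, MIN(order_date) AS first_order_date
  FROM fact_order
  GROUP BY customer_id
)
SELECT
  f.customer_id,
  f.first_order_date,
  c.lifecycle_stage AS stage_at_first_purchase
FROM first_orders AS f
JOIN dim_customer_scd2 AS c
  ON f.customer_id = c.customer_id
  AND f.first_order_date BETWEEN c.valid_from AND c.valid_to;
```

With a Type 1 dimension, this query is impossible: the row that held the stage at purchase time no longer exists.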
Example (Type 2 price plan dimension):
CREATE TABLE dim_pricing_plan_scd2 (
  plan_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  plan_id STRING,
  plan_name STRING,
  monthly_price NUMERIC,
  valid_from DATE,
  valid_to DATE,
  is_current BOOLEAN
);
Regions and territories are often redrawn:
Example question: "What was Q1 revenue under the region layout at that time?" → Type 2. "How would Q1 have looked under today's regions?" → Type 1 (or Type 6 hybrid, discussed later).
In practice, a single dimension often mixes approaches: some attributes are Type 0 (immutable), some Type 1 (corrections only), and others Type 2 (full history). The right balance depends on which questions stakeholders ask, regulatory/audit requirements, and tolerance for complexity vs. need for accuracy.
Once you've mastered Types 0, 1, and 2, you'll occasionally run into more complex situations where those basic patterns don't quite fit. That's where SCD types 3, 4, 5, 6 come in – the "advanced slowly changing dimensions" patterns used for nuanced, high-scale, or hybrid requirements.

Core idea: Keep current value + a fixed number of previous values as separate columns on the same row. You don't add new rows; you rotate values between columns.
Example: tracking a customer's current and previous segments:
CREATE TABLE dim_customer_type3 (
  customer_id STRING PRIMARY KEY,
  current_segment STRING,
  previous_segment STRING,
  segment_assigned_at DATE
);
When current_segment changes, you copy its old value into previous_segment, overwrite current_segment with the new value, and update segment_assigned_at. No new rows are added.
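That rotation can be sketched as a single UPDATE, using the same UPDATE ... FROM style as the earlier examples and assuming a staging table stg_segment_changes with the new values:

```sql
UPDATE dim_customer_type3 AS t
SET
  previous_segment = t.current_segment,  -- shift old value over
  current_segment = s.new_segment,       -- overwrite with new value
  segment_assigned_at = s.change_date
FROM stg_segment_changes AS s
WHERE t.customer_id = s.customer_id
  AND t.current_segment != s.new_segment;
```

Note that a second change in the same period silently discards the oldest value – Type 3 only ever remembers one step back.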
Use cases:
Pros:
Cons:
Both Types 4 and 5 split concerns between the current state and history, but they do it differently.
Core idea: Keep a slim current dimension table and move historical versions into a separate history table.
Example:
CREATE TABLE dim_customer_current (
  customer_id STRING PRIMARY KEY,
  current_segment STRING,
  region STRING
  -- no history columns
);

CREATE TABLE dim_customer_history (
  customer_hist_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  customer_id STRING,
  segment STRING,
  region STRING,
  valid_from DATE,
  valid_to DATE
);
Operational systems and most BI reports join to dim_customer_current. More advanced analytics, audits, or time-travel analyses query dim_customer_history.
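An as-of lookup against the history table might look like this (a sketch using the tables above; the customer ID is illustrative):

```sql
-- What segment did customer 'C-1001' have on 2024-06-15?
SELECT segment, region
FROM dim_customer_history
WHERE customer_id = 'C-1001'
  AND DATE '2024-06-15' BETWEEN valid_from AND valid_to;
```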
Pros:
Cons:
Core idea: Move highly volatile, often-used attributes into a separate mini-dimension, while retaining a Type 1 "current snapshot" in the main dimension for convenience.
Example:
-- Mini-dimension for fast-changing attributes
CREATE TABLE dim_customer_profile_mini (
  profile_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  segment STRING,
  risk_band STRING,
  valid_from DATE,
  valid_to DATE
);

-- Main dimension referencing the mini-dimension
CREATE TABLE dim_customer_type5 (
  customer_id STRING PRIMARY KEY,
  profile_sk BIGINT,        -- historical link
  current_segment STRING,   -- Type 1 override
  current_risk STRING
);
Facts can join via profile_sk to get historical segmentation or use current_segment for quick, high-level reporting.
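Both access paths can be sketched as follows, assuming the fact table captured profile_sk at load time (an assumption, not shown in the DDL above):

```sql
-- Historical segmentation: join through the mini-dimension key on the fact
SELECT m.segment, SUM(o.order_amount) AS revenue
FROM fact_order AS o
JOIN dim_customer_profile_mini AS m
  ON o.profile_sk = m.profile_sk
GROUP BY m.segment;

-- Quick current-state reporting: read the Type 1 override directly
SELECT c.current_segment, COUNT(*) AS customers
FROM dim_customer_type5 AS c
GROUP BY c.current_segment;
```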
Pros:
Cons:
Type 6 is the "Swiss Army knife" of SCDs – a hybrid of Type 1 (overwrite), Type 2 (row-versioned history), and Type 3 (previous value columns).
Core idea: Keep full row-level history and convenience columns for current and, sometimes, previous values.
Example:
CREATE TABLE dim_customer_type6 (
  customer_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  customer_id STRING,       -- business key
  segment STRING,           -- historical (Type 2)
  region STRING,            -- historical (Type 2)
  current_segment STRING,   -- Type 1-style current snapshot
  previous_segment STRING,  -- Type 3-style previous
  valid_from DATE,
  valid_to DATE,
  is_current BOOLEAN
);
When a customer's segment changes, you end-date the current row and insert a new version (the Type 2 part), carry the old value into previous_segment (the Type 3 part), and overwrite current_segment across all of that customer's rows (the Type 1 part).
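A sketch of that update, assuming a staging table stg_segment_changes with new_segment, old_segment, region, and change_date columns:

```sql
-- 1. Type 2: end-date the current version
UPDATE dim_customer_type6 AS t
SET valid_to = s.change_date, is_current = FALSE
FROM stg_segment_changes AS s
WHERE t.customer_id = s.customer_id
  AND t.is_current = TRUE;

-- 2. Type 2 + Type 3: insert the new version, carrying the previous value
INSERT INTO dim_customer_type6 (
  customer_id, segment, region, current_segment, previous_segment,
  valid_from, valid_to, is_current
)
SELECT
  s.customer_id, s.new_segment, s.region,
  s.new_segment, s.old_segment,
  s.change_date, DATE '9999-12-31', TRUE
FROM stg_segment_changes AS s;

-- 3. Type 1: refresh the current snapshot on ALL historical rows
UPDATE dim_customer_type6 AS t
SET current_segment = s.new_segment
FROM stg_segment_changes AS s
WHERE t.customer_id = s.customer_id;
```

Step 3 is what makes Type 6 expensive: one change touches every version row for that customer.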
Use cases:
Pros:
Cons:
Not every team needs advanced slowly changing dimensions from day one. Overengineering SCDs can slow down adoption and confuse analysts.
Use advanced types when:
Keep it simple when:
A practical approach:
For most teams, SCD Type 2 is where slowly changing dimensions move from theory to real engineering. Whether you're building dbt slowly changing dimensions, designing an SCD Type 2 surrogate key strategy, or implementing BigQuery slowly changing dimensions, the same fundamentals apply:

An SCD Type 2 dimension must be crystal clear about grain and keys.
Grain definition: Before modeling, answer: "What does one row in this dimension represent?"
For a typical customer dimension SCD Type 2:
Schema sketch:
CREATE TABLE dim_customer_scd2 (
  customer_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
  customer_id STRING,   -- natural key
  customer_name STRING,
  lifecycle_stage STRING,
  region STRING,
  -- SCD metadata
  valid_from TIMESTAMP,
  valid_to TIMESTAMP,
  is_current BOOLEAN,
  PRIMARY KEY (customer_sk)
);
Key rule: Facts should point to customer_sk where possible, so each event is tied to the exact historical version that was valid at that time. If you can't do that (e.g., due to system constraints), you'll need time-based joins via customer_id + date, which are more complex and slower.
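Resolving the surrogate key at fact load time can be sketched as follows (assuming a staging table stg_order with customer_id and order_date, and a closed-open interval convention):

```sql
-- Attach the dimension version that was valid when the order happened
INSERT INTO fact_order (order_id, order_date, order_amount, customer_sk)
SELECT
  s.order_id,
  s.order_date,
  s.order_amount,
  c.customer_sk
FROM stg_order AS s
JOIN dim_customer_scd2 AS c
  ON s.customer_id = c.customer_id
  AND s.order_date >= c.valid_from
  AND s.order_date < c.valid_to;   -- closed-open: no double matches at boundaries
```

Paying this join cost once at load time means every downstream query gets history "for free" through a plain key join.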
SCD Type 2 works because every row knows when it was valid.
Typical pattern:
Design considerations:
Example (closed-open with sentinel):
1valid_from valid_to is_current
2------------------- ------------------- ----------
32024-01-01 00:00:00 2024-03-10 00:00:00 FALSE
42024-03-10 00:00:00 9999-12-31 00:00:00 TRUE
Why the is_current flag?
Basic dbt tests (YAML-style example):
1tests:
2 - dbt_utils.unique_combination_of_columns:
3 combination_of_columns: ['customer_id', 'valid_from']
4 - dbt_utils.expression_is_true:
5 expression: "is_current = TRUE"
6 where: "valid_to = '9999-12-31'"
Storage is rarely the limiting factor anymore, but scan cost and join performance still matter. Behavior varies slightly by platform:
Across all platforms:
A practical dbt slowly changing dimensions implementation follows a repeatable pattern:
Below is a simplified pattern you can adapt to BigQuery, Snowflake, Redshift, or Databricks.
WITH latest_dim AS (
  SELECT *
  FROM dim_customer_scd2
  WHERE is_current = TRUE
),

staged AS (
  SELECT
    customer_id,
    customer_name,
    lifecycle_stage,
    region,
    updated_at  -- from source
  FROM stg_customer
),

diff AS (
  SELECT
    s.*,
    d.customer_sk,
    d.lifecycle_stage AS old_lifecycle_stage,
    d.region AS old_region
  FROM staged s
  LEFT JOIN latest_dim d
    ON s.customer_id = d.customer_id
)

SELECT *
FROM diff
WHERE
  old_lifecycle_stage IS NULL
  OR old_lifecycle_stage != lifecycle_stage
  OR old_region != region;
This diff identifies new customers (no customer_sk) and existing customers whose tracked attributes changed.
MERGE INTO dim_customer_scd2 AS t
USING diff AS s
  ON t.customer_id = s.customer_id
  AND t.is_current = TRUE

WHEN MATCHED THEN
  -- End-date existing current row
  UPDATE SET
    valid_to = s.updated_at,
    is_current = FALSE

WHEN NOT MATCHED THEN
  -- Insert new version row (only fires for brand-new customers)
  INSERT (
    customer_id,
    customer_name,
    lifecycle_stage,
    region,
    valid_from,
    valid_to,
    is_current
  )
  VALUES (
    s.customer_id,
    s.customer_name,
    s.lifecycle_stage,
    s.region,
    s.updated_at,
    TIMESTAMP '9999-12-31 00:00:00',
    TRUE
  );

-- Note: this MERGE only end-dates the matched rows. For changed customers,
-- run a second INSERT to add the new current version row, as in the
-- two-step Type 2 example earlier.
Adapter notes:
You can wrap this pattern into a reusable macro, e.g., scd_type_2, that accepts table name, natural key columns, SCD-tracked columns, and valid_from/valid_to columns. This keeps your SCD logic consistent across dimensions.
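If you use dbt, its built-in snapshots implement this same Type 2 pattern for you. A minimal example in the classic snapshot-block syntax (model and source names are illustrative):

```sql
{% snapshot dim_customer_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
  )
}}

SELECT customer_id, customer_name, lifecycle_stage, region, updated_at
FROM {{ ref('stg_customer') }}

{% endsnapshot %}
```

dbt maintains dbt_valid_from and dbt_valid_to columns automatically, which play the role of valid_from and valid_to in the hand-rolled version above.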
Common pitfalls:
Best practices:

By this point, you've seen how different SCD types work and how easy it is to create reporting inconsistencies when SCD logic is scattered across tools. The missing piece is how to make all that careful modeling reusable and dependable at scale.
That's where data mart slowly changing dimensions come in. Instead of letting every report and notebook reinvent history, you expose a small set of governed data marts that encode SCD rules once and serve them everywhere.
A data mart is a curated, business-friendly layer on top of your raw and modeled tables. For SCDs, this means:
For example:
Self-service users don't need to understand how dbt slowly changing dimensions are implemented. They just need clear table names, stable schemas, and confidence that "revenue by region" means the same thing everywhere.
Once your SCD-aware dimensions are centralized in data marts, you can plug them into:
The key outcome: every consumer reads from the same governed models, whether it's a CEO's dashboard, a retention cohort in a notebook, or a feature pipeline for an ML model.
If you'd rather not hand-roll all of this, OWOX Data Marts helps you implement and maintain SCD logic directly in your cloud warehouse. It provides:
Instead of rebuilding SCD patterns across every project, you get tested, centralized queries that every BI tool, spreadsheet, and AI workflow can consume as-is. You can explore it here: OWOX Data Marts.