Fabric | dbt – Slowly Changing Dimension (SCD 2) – Snapshots and Check Strategy in dbt

Slowly Changing Dimensions (SCD) represent a way to store and manage historical changes of dimensions over time in a data warehouse. To just recap the theory of what facts and dimensions are, I recommend to check the article – Facts and Dimensions – Tables in a Data Warehouse before continuing.

In the context of a medallion architecture, in which we work in Fabric, generating SCDs falls into the Silver layer, with the Gold layer further handling edge cases of validity intervals. We will use Type 2 dimensions (see Theory below). The goal is to ensure full history of changes for all key dimensions, allowing back-in-time analysis (time travel). Within Fabric, we have SCD implemented in the dbt (Data Build Tool) environment, which allows managing data transformations declaratively – so often we don’t need to exert too much effort to correctly build dimensions, as dbt handles it for us. But step by step…

Theory – What is a Slowly Changing Dimension (SCD) and Its Goals

The Slowly Changing Dimensions concept was first systematically described by Ralph Kimball in the book “The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses”, published in 1996. 1

In simple terms, SCD describes a set of techniques and strategies to capture and manage changes in dimension data. Historically, several types of Slowly Changing Dimensions have evolved, with Type 1 and 2 being the most commonly used in practice. The choice of a specific SCD type in a data solution affects two main areas:

  • The primary goal of SCD is to ensure accurate interpretation of historical and current data in relation to proper analysis of facts and current and future reporting needs of enterprise stakeholders – considering the secondary goal:
  • The secondary goal in terms of SCD type selection is consideration of performance, technical aspects, and sustainability of the data solution. SCD 2 is technically more demanding in terms of development and operational costs because it generates more records. Moreover, it naturally increases the cost of the entire data model and can have significant financial implications, especially when the data environment operates in Pay-As-You-Go environments (Azure, Snowflake, AWS, etc.).

Types of Slowly Changing Dimensions – Significance and Time Validity of Records

Several types of SCD exist:

  • SCD Type 0 – No changes are stored >> data is static.
  • SCD Type 1 – Changes overwrite the original value >> history is not preserved, only the latest state is kept.
  • SCD Type 2 – Each change creates a new record >> full history of changes is preserved. This is our approach for the Silver/Stage layer in Fabric.
  • SCD Type 3 – Limited history is stored, e.g., previous and current value >> less common.
  • Others – Some sources describe additional SCD types (4-7), but they are more relevant for specific scenarios, such as very large dimension tables with performance optimization needs. 2

In practice, SCD Type 2 is the most commonly used because it allows maintaining the complete history of dimension changes, including validity intervals of individual records – typically represented as:

  • valid_from – when the historical version of the record starts being valid
  • valid_to – until when the historical version of the record remains valid

valid_from and valid_to should be discrete across shared business keys (primary key of the source table) – i.e., they must be consecutive without time gaps.

SCD 2 and Edge Values of Validity – Sentinel Value

In practice, data architectures often need to handle edge intervals of validity. In the table below, filtered by AddressId=659, only specific attributes are selected.

  • The first historical version of the address is valid from 2025-01-01 to 2025-10-19
  • The second historical version is valid from 2025-10-19 to NULL (infinity) – current record

dbt-validity-scd2-slowly-changing-dimension

Here’s the core issue. This method of handling edge values is not ideal for the Gold layer, as it complicates joins:

  • Replacing NULL with a date in every join
  • The initial value may theoretically not align with the driving date in the fact table (the fact date may precede the dimension start date)

For these reasons, a common practice is to implement sentinel values:

  • The first occurrence of a dimension is assigned Technical Start = 1900-01-01 (or any old date)
  • The current value is assigned Technical End = 3000-12-31 (or any other distant date)

How I implement SCD in a Fabric project – using dbt, we generate SCD 2 in the Silver layer (example below) as full history; sentinel values or other adjustments are handled at the Gold layer during preparation of dimension and fact tables.

Implementing SCD 2 in dbt using Snapshots – Example

In dbt, SCD Type 2 is typically realized using snapshots. Snapshots capture table states over time and automatically manage historical changes based on a defined strategy (strategy), typically timestamp or check.

In this Fabric project, the implementation is based on the check strategy, which compares the current data state with the state stored in the previous snapshot. If any of the defined columns (check_cols) change (can be defined or set to all), dbt creates a new record and closes the previous one (using dbt_valid_from and dbt_valid_to).

{% snapshot st_aw_address %}

{{
  config(
    target_schema='02_silver',
    strategy='check',
    check_cols='all',
    unique_key='source_businesskey',
    updated_at='ingestion_date',
    invalidate_hard_deletes=True,
    schema='02_silver',
    alias='st_aw_address'
  )
}}

WITH [ranked_source] AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [source_businesskey],[source_system] ORDER BY [ingestion_date] DESC,[source_updated_at] DESC) AS [dbt_row_num]
    FROM {{ source('bronze_data_source_fabricdatalake', 'bt_aw_address') }}
    
    /* DATA CLEANSING, FORMATTING, DEDUPLICATION */
)
SELECT *
FROM [ranked_source]
WHERE [dbt_row_num] = 1

{% endsnapshot %}

Similarly, all snapshot table definitions are prepared for historical tracking. The screenshot below shows how this looks in dbt. Execution is simply via the terminal command “dbt snapshot“.

dbp-snapshot-example-sql-script

This straightforward approach allows automatic versioning of dimensions whenever source values change. Using the parameter invalidate_hard_deletes=True ensures that physically deleted source records are also properly closed in the snapshot. The result is a fully historized dimension in the Silver layer, providing a consistent data source for the Gold layer and final preparation of dimensions.

Disadvantage of the dbt snapshot approach may be higher computational cost, especially for large dimension tables. However, dimension tables are generally not as large as fact tables, and the approach is acceptable due to significant time savings.

Summary of SCD 2 via dbt

Implementing SCD Type 2 with dbt snapshots provides a robust, automated method for managing historical changes of dimensions. Combined with Fabric architecture and medallion approach, it is an effective tool for creating transparent and auditable data models with full historical tracking.

5/5 - (1 vote)

Reference

  1. Wikipedia, Ralph Kimball [online]. [cited 2025-10-31]. Available from: https://en.wikipedia.org/wiki/Ralph_Kimball
  2. hevodata.com, Slowly Changing Dimensions: 5 Key Types and Examples [online]. [cited 2025-10-31]. Available from: https://hevodata.com/learn/slowly-changing-dimensions/
Category: Fabric

About Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zednicek, and I have been working as a freelance Data Engineer for roughly 10 years. During this time, I have been publishing case studies and technical guides on this website, targeting professionals, students, and enthusiasts interested in Data Engineering particularly on Microsoft technologies as well as corporate finance and reporting solutions. 🔥 If you found this article helpful, please share it or mention me on your website or Community forum

Leave a Reply

Your email address will not be published. Required fields are marked *