Implementing SCD Type 2 in Microsoft Fabric Data Warehouse with T-SQL Stored Procedure
Implementing SCD Type 2 with UPSERT for Dimension Tables
Slowly Changing Dimension (SCD) describes situations in data warehousing where the attributes of a dimension member change over time, requiring a strategy to manage and record these modifications effectively.
SCD Type 2 is a method that preserves the history of changes by adding a new row to the dimension table for each change. This approach relies on three key columns in the dimension table:
Surrogate Key: A unique identifier for each record.
FromDate: The date when the record became effective.
ToDate: The date when the record is no longer valid.
The dimension table must include these three columns to implement SCD Type 2 effectively, ensuring that all historical data is retained and enabling accurate tracking of changes over time.
Implementing SCD Type 2 with UPSERT for Dimension Tables:
In data warehousing, maintaining accurate and up-to-date records in dimension tables is crucial. To achieve this, I tested an example of SCD Type 2 using a T-SQL stored procedure, inspired by Reza from Radacad article.
This following stored procedure 👇👇 performs an UPSERT operation on the Dim_Customer table by automating the update and insertion of customer data.
The procedure will detect changes in the customer data from the source table (stored in the ‘stg’ schema). Specifically, if a change is detected in the '“City” field ( the SCD Type 2 field in this case) then it will perform an UPSERT operation by:
Updating the existing record to set the “ToDate” attribute
Inserting a new record with the updated information.
How the “UpsertDimCustomer” Stored
Procedure Works:
Here’s the stored procedure code that handles UPSERT where there new records in the source table ([stg].[Customer])
Here's a breakdown of how it works:
Source Data: The source table contains the latest customer data exactly as-is, without any modifications. This table serves as the basis for detecting changes or new records.
Update Logic: If a customer's details have changed—such as a move to a new city, the stored procedure updates the existing record in Dim_Customer by setting its ToDate to indicate the end of that record's validity.
Insert Logic: New customer records are then inserted into the Dim_Customer table with:
A unique CustomerKey (surrogate key).
A FromDate that reflects the current date.
A ToDate that remains null until a future change occurs.
CREATE PROC [dbo].[UpsertDimCustomer]
AS
BEGIN
-- Variable to store the maximum CustomerKey
DECLARE @MaxCustomerKey INT
-- Update existing records: Set "ToDate" to current date for records with changed City
UPDATE [dbo].[Dim_Customer]
SET ToDate = CONVERT(DATE, GETDATE())
WHERE CustomerKey IN (
SELECT dim.CustomerKey
FROM stg.Customer stg
LEFT OUTER JOIN [dbo].[Dim_Customer] dim
ON stg.CustomerID = dim.CustomerID AND dim.ToDate IS NULL
AND stg.City <> dim.City
)
-- Recalculate the max CustomerKey after updates
SELECT @MaxCustomerKey = MAX(CustomerKey) FROM [dbo].[Dim_Customer]
-- Insert new records for customers not present in Dim_Customer
INSERT INTO [dbo].[Dim_Customer] (CustomerID, Fullname, City, CustomerKey, FromDate)
SELECT
stg.CustomerID,
stg.Fullname,
stg.City,
@MaxCustomerKey + 1 AS CustomerKey,
CONVERT(DATE, GETDATE()) AS FromDate
FROM stg.Customer stg
LEFT OUTER JOIN [dbo].[Dim_Customer] dim
ON stg.CustomerID = dim.CustomerID AND dim.ToDate IS NULL
WHERE dim.CustomerKey IS NULL
END
GO
By automating these tasks, the UpsertDimCustomer stored procedure ensures that your dimension tables remain consistent and that historical data is preserved, enabling accurate reporting and analysis over time. This approach guarantees that your data warehouse reflects the most current state of your business while retaining a complete history of changes.
Orchestrating SCD Type 2 Data Loading for Dimension tables via Fabric Data Pipelines:
To implement this in a Microsoft Fabric Data Warehouse environment, you can integrate this stored procedure into a data pipeline. The pipeline would typically:
Execute a data copy activity to bring in the latest data from the source.
Run the UpsertDimCustomer stored procedure to track and record changes in the dimension table using the SCD Type 2 methodology.
This automated process simplifies the management of historical data changes and ensures that your dimension tables always contain the most accurate and relevant information.