Use Slowly changing dimensions in Dataform

This document you how to use the open-source Slowly changing dimensions package in Dataform as an example of using an open-source package.

The Slowly changing dimensions package contains common data models for creating type 2 slowly changing dimensions tables from mutable data sources in Dataform.

Slowly changing dimensions tables are incremental tables that contain data that can change unpredictably, not on a regular schedule, such as customers or products. In a type 2 slowly changing dimensions table, new data is appended in a new row without overwriting existing table rows. Table history is preserved in multiple records for a given key in the slowly changing dimension key. Each record has a unique key.

The Slowly changing dimensions package creates the following relations in BigQuery for a given NAME :

  • NAME - a view with scd_valid_from and scd_valid_to fields
  • NAME _updates - an incremental table that stores the change history of the source table

Dataform updates slowly changing dimensions every time it executes a slowly changing dimensions incremental table. You might want to schedule slowly changing dimensions table to run every day or every hour, depending on the granularity of changes you want to capture.

To learn how to schedule Dataform executions, see Schedule executions with Cloud Composer or Schedule executions with Workflows and Cloud Scheduler .

Before you begin

  1. On the Slowly changing dimensions releases page , copy the .tar.gz URL of the newest release.
  2. Create a Dataform repository .
  3. Create and initialize a workspace in the repository.
  4. Install the Slowly changing dimensions package in your Dataform repository .

Required roles

To get the permissions that you need to configure a package, ask your administrator to grant you the Dataform Editor ( roles/dataform.editor ) IAM role on workspaces. For more information about granting roles, see Manage access to projects, folders, and organizations .

You might also be able to get the required permissions through custom roles or other predefined roles .

Create a slowly changing dimensions table with the Slowly changing dimensions package

To create a slowly changing dimensions table with the Slowly changing dimensions package package in Dataform, follow these steps:

  1. In the Filespane, next to definitions/ , click the Moremenu.

  2. Click Create file.

  3. In the Create new filepane, do the following:

    1. In the Add a file pathfield, after definitions/ , enter the name of the file followed by .js . For example, definitions/definitions.js .

    2. Click Create file.

  4. In the Filespane, select the newly created .js . file.

  5. Import the package to the file in the following format:

       
     const 
      
      CONSTANT 
     - 
     NAME 
     
      
     = 
      
     require 
     ( 
     "dataform-scd" 
     ); 
     
    

    Replace CONSTANT-NAME with a name for the constant, for example, scd .

  6. Create slowly changing dimensions tables in the following format:

      scd 
     ( 
     "source_data_scd" 
     , 
      
     { 
      
     uniqueKey 
     : 
      
     " UNIQUE_ID 
    " 
     , 
      
     timestamp 
     : 
      
     " UPDATED_AT 
    " 
     , 
      
     // 
      
     A 
      
     field 
      
     that 
      
     stores 
      
     a 
      
     timestamp 
      
     or 
      
     date 
      
     of 
      
     when 
      
     the 
      
     row 
      
     was 
      
     last 
      
     changed 
     . 
      
     source 
     : 
      
     { 
      
     schema 
     : 
      
     " SOURCE_SCHEMA 
    " 
     , 
      
     // 
      
     The 
      
     source 
      
     table 
      
     to 
      
     build 
      
     slowly 
      
     changing 
      
     dimensions 
      
     from 
     . 
      
     name 
     : 
      
     " SOURCE_SCHEMA_NAME 
    " 
     , 
      
     } 
     , 
      
     incrementalConfig 
     : 
      
     { 
      
     // 
      
     Any 
      
     configuration 
      
     parameters 
      
     to 
      
     apply 
      
     to 
      
     the 
      
     incremental 
      
     table 
      
     that 
      
     will 
      
     be 
      
     created. 
      
     bigquery 
     : 
      
     { 
      
     partitionBy 
     : 
      
     " UPDATED_AT 
    " 
     , 
      
     } 
     , 
      
     } 
     , 
     } 
     ); 
     
    

    Replace the following:

    • UNIQUE_ID : a unique identifier for rows in the table
    • UPDATED_AT : a name for the field that stores a timestamp or date of when the row was last changed, for example, updated_at
    • SOURCE_SCHEMA : the schema of the source table, for example, dataform_scd_example
    • SOURCE_SCHEMA_NAME : the name of the source table, for example, source_data
  7. Optional: Click Format.

The following code sample shows a slowly changing dimensions table definition created with the Slowly changing dimensions package:

  const 
  
 scd 
  
 = 
  
 require 
 ( 
 "dataform-scd" 
 ); 
 /** 
  
 * 
  
 Create 
  
 an 
  
 SCD 
  
 table 
  
 on 
  
 top 
  
 of 
  
 the 
  
 table 
  
 defined 
  
 in 
  
 source_data 
 . 
 sqlx 
 . 
  
 */ 
 const 
  
 { 
  
 updates 
 , 
  
 view 
  
 } 
  
 = 
  
 scd 
 ( 
 "source_data_scd" 
 , 
  
 { 
  
 // 
  
 A 
  
 unique 
  
 identifier 
  
 for 
  
 rows 
  
 in 
  
 the 
  
 table 
 . 
  
 uniqueKey 
 : 
  
 "user_id" 
 , 
  
 // 
  
 A 
  
 field 
  
 that 
  
 stores 
  
 a 
  
 timestamp 
  
 or 
  
 date 
  
 of 
  
 when 
  
 the 
  
 row 
  
 was 
  
 last 
  
 changed 
 . 
  
 timestamp 
 : 
  
 "updated_at" 
 , 
  
 // 
  
 The 
  
 source 
  
 table 
  
 to 
  
 build 
  
 slowly 
  
 changing 
  
 dimensions 
  
 from 
 . 
  
 source 
 : 
  
 { 
  
 schema 
 : 
  
 "dataform_scd_example" 
 , 
  
 name 
 : 
  
 "source_data" 
 , 
  
 }, 
  
 // 
  
 Any 
  
 tags 
  
 that 
  
 will 
  
 be 
  
 added 
  
 to 
  
 actions 
 . 
  
 tags 
 : 
  
 [ 
 "slowly-changing-dimensions" 
 ], 
  
 // 
  
 Documentation 
  
 of 
  
 table 
  
 columns 
  
 columns 
 : 
  
 { 
 user_id 
 : 
  
 "User ID" 
 , 
  
 some_field 
 : 
  
 "Data Field" 
 , 
  
 updated_at 
 : 
  
 "Timestamp for updates" 
 }, 
  
 // 
  
 Configuration 
  
 parameters 
  
 to 
  
 apply 
  
 to 
  
 the 
  
 incremental 
  
 table 
  
 that 
  
 will 
  
 be 
  
 created 
 . 
  
 incrementalConfig 
 : 
  
 { 
  
 bigquery 
 : 
  
 { 
  
 partitionBy 
 : 
  
 "updated_at" 
 , 
  
 }, 
  
 }, 
 }); 
 // 
  
 Additional 
  
 customization 
  
 of 
  
 the 
  
 created 
  
 models 
  
 can 
  
 be 
  
 done 
  
 by 
  
 using 
  
 the 
  
 returned 
  
 actions 
  
 objects 
 . 
 updates 
 . 
 config 
 ({ 
  
 description 
 : 
  
 "Updates table for SCD" 
 , 
 }); 
 

What's next