The Slowly changing dimensions package contains common data models for creating type 2 slowly changing dimensions tables from mutable data sources in Dataform.
Slowly changing dimensions tables are incremental tables that contain data that can change unpredictably, not on a regular schedule, such as customers or products. In a type 2 slowly changing dimensions table, new data is appended in a new row without overwriting existing table rows. Table history is preserved in multiple records for a given key in the slowly changing dimension key. Each record has a unique key.
The Slowly changing dimensions package creates the following relations in
BigQuery for a given NAME
:
-
NAME
- a view withscd_valid_from
andscd_valid_to
fields -
NAME _updates
- an incremental table that stores the change history of the source table
Dataform updates slowly changing dimensions every time it executes a slowly changing dimensions incremental table. You might want to schedule slowly changing dimensions table to run every day or every hour, depending on the granularity of changes you want to capture.
To learn how to schedule Dataform executions, see Schedule executions with Cloud Composer or Schedule executions with Workflows and Cloud Scheduler .
Before you begin
- On the Slowly changing dimensions releases page
,
copy the
.tar.gz
URL of the newest release. - Create a Dataform repository .
- Create and initialize a workspace in the repository.
- Install the Slowly changing dimensions package in your Dataform repository .
Required roles
To get the permissions that you need to configure a package,
ask your administrator to grant you the Dataform Editor
( roles/dataform.editor
) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations
.
You might also be able to get the required permissions through custom roles or other predefined roles .
Create a slowly changing dimensions table with the Slowly changing dimensions package
To create a slowly changing dimensions table with the Slowly changing dimensions package package in Dataform, follow these steps:
-
In the Filespane, next to
definitions/
, click the Moremenu. -
Click Create file.
-
In the Create new filepane, do the following:
-
In the Add a file pathfield, after
definitions/
, enter the name of the file followed by.js
. For example,definitions/definitions.js
. -
Click Create file.
-
-
In the Filespane, select the newly created
.js
. file. -
Import the package to the file in the following format:
const CONSTANT - NAME = require ( "dataform-scd" );
Replace CONSTANT-NAME with a name for the constant, for example,
scd
. -
Create slowly changing dimensions tables in the following format:
scd ( "source_data_scd" , { uniqueKey : " UNIQUE_ID " , timestamp : " UPDATED_AT " , // A field that stores a timestamp or date of when the row was last changed . source : { schema : " SOURCE_SCHEMA " , // The source table to build slowly changing dimensions from . name : " SOURCE_SCHEMA_NAME " , } , incrementalConfig : { // Any configuration parameters to apply to the incremental table that will be created. bigquery : { partitionBy : " UPDATED_AT " , } , } , } );
Replace the following:
- UNIQUE_ID : a unique identifier for rows in the table
- UPDATED_AT
: a name for the field that
stores a timestamp or date of when the row was last changed, for example,
updated_at
- SOURCE_SCHEMA
: the schema of the source
table, for example,
dataform_scd_example
- SOURCE_SCHEMA_NAME
: the name of the source
table, for example,
source_data
-
Optional: Click Format.
The following code sample shows a slowly changing dimensions table definition created with the Slowly changing dimensions package:
const
scd
=
require
(
"dataform-scd"
);
/**
*
Create
an
SCD
table
on
top
of
the
table
defined
in
source_data
.
sqlx
.
*/
const
{
updates
,
view
}
=
scd
(
"source_data_scd"
,
{
//
A
unique
identifier
for
rows
in
the
table
.
uniqueKey
:
"user_id"
,
//
A
field
that
stores
a
timestamp
or
date
of
when
the
row
was
last
changed
.
timestamp
:
"updated_at"
,
//
The
source
table
to
build
slowly
changing
dimensions
from
.
source
:
{
schema
:
"dataform_scd_example"
,
name
:
"source_data"
,
},
//
Any
tags
that
will
be
added
to
actions
.
tags
:
[
"slowly-changing-dimensions"
],
//
Documentation
of
table
columns
columns
:
{
user_id
:
"User ID"
,
some_field
:
"Data Field"
,
updated_at
:
"Timestamp for updates"
},
//
Configuration
parameters
to
apply
to
the
incremental
table
that
will
be
created
.
incrementalConfig
:
{
bigquery
:
{
partitionBy
:
"updated_at"
,
},
},
});
//
Additional
customization
of
the
created
models
can
be
done
by
using
the
returned
actions
objects
.
updates
.
config
({
description
:
"Updates table for SCD"
,
});
What's next
- To learn how to create incremental tables in Dataform, see Configure an incremental table .
- To learn more about packages in Dataform with JavaScript includes, see Introduction to JavaScript in Dataform .
- To learn how to create your own package and use it in Dataform, see Create a package in Dataform .