This document shows you how to define tables with Dataform core in Dataform.
About table definitions
To define a table, you define the table type and write a SELECT
statement in
a type: "table"
SQLX file. Dataform then compiles your
Dataform core code into SQL, executes the SQL code, and creates your
defined tables in BigQuery.
In a Dataform core SELECT
statement, you define table structure and
reference other objects of your SQL workflow.
In addition to defining tables in a type: "table"
SLQX file, you can create
empty tables by defining a custom SQL query in a type: "operations"
SQLX file.
For more information, see Create an empty table
.
Reference dependencies with ref
To reference a SQL workflow object in a SELECT
statement and automatically
add it as a dependency, use the ref
function. Dataform executes
dependencies before tables that depend on them to ensure correct pipeline ordering.
The ref
function is a Dataform core built-in function that is
critical to dependency management in Dataform. The ref
function lets
you reference and automatically depend on the following objects defined in your
Dataform SQL workflow instead of hard coding the schema and table names:
- tables of all supported table types
- data source declarations
- custom SQL operations
with the
hasOutput
property set totrue
Dataform uses the ref
function to build a dependency tree of all the
tables to be created or updated.
After compiling, Dataform adds boilerplate statements like CREATE
, REPLACE
, INSERT
, or MERGE
to the SQL statement.
The following code sample shows a table definition with the use of
the ref
function:
config
{
type:
"table"
}
SELECT
order_date
AS
date,
order_id
AS
order_id,
order_status
AS
order_status,
SUM(item_count)
AS
item_count,
SUM(amount)
AS
revenue
FROM
${
ref
(
"store_clean"
)
}
GROUP
BY
1,
2
In the ref
function, you provide the name of the table or data source
declaration that you want to depend on. This is typically the filename of the
SQLX file in which that table or data source declaration is defined.
If a table name is overridden, use the overridden name in the ref
function.
For example, reference a table with config { name: "overridden_name" }
as ref("overridden_name")
. For more information about overriding table names,
see Configure additional table settings
.
When you have multiple tables of the same name in different schemas, you can
reference a specific table by providing two arguments to the ref
function:
schema name and table name.
The following code sample shows the ref
function with two arguments to
specify a table within a specific schema:
config
{
type:
"table"
}
SELECT
*
FROM
${
ref
(
"schema"
,
"store_clean"
)
}
You can also add table dependencies manually to the config
block for tables,
assertions, data source declarations, or custom SQL operations that are not
referenced in a ref
function in the SELECT
statement. Dataform
executes these dependencies before dependent tables.
The following code sample shows a table dependency in the config
block:
config { dependencies: [ "unreferenced_table" ] }
SELECT * FROM ...
For more information on dependency management in your SQL workflow, see Declare dependencies .
Reference other tables with resolve
The resolve
function
lets you reference a table or data source declaration in a SELECT
statement
like the ref
function, but does not add the reference as a dependency. This
means that the object referenced using resolve
does not affect execution of
the table that uses resolve
.
For more information on built-in Dataform core functions, see the Dataform core reference .
Before you begin
- Create and initialize a development workspace in your repository .
- Optional: Declare a data source .
Required roles
To get the permissions that you need to define a table,
ask your administrator to grant you the Dataform Editor
( roles/dataform.editor
) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations
.
You might also be able to get the required permissions through custom roles or other predefined roles .
Create a SQLX file for table definition
Store table definition SQLX files in the definitions/
directory. To create a
new SQLX file in the definitions/
directory, follow these steps:
-
In the Cloud Console, go to the Dataformpage.
-
To open a repository, click the repository name.
-
To open a development workspace, click the workspace name.
-
In the Filespane, next to
definitions/
, click the Moremenu. -
Click Create file.
-
In the Add a file pathfield, enter the name of the file followed by
.sqlx
afterdefinitions/
. For example,definitions/my-table.sqlx
.Filenames can only include numbers, letters, hyphens, and underscores.
-
Click Create file.
Define the table type
To create a new table type definition, follow these steps:
- In your development workspace, in the Filespane, expand the
definitions/
directory. - Select the table definition SQLX file that you want to edit.
-
In the file, enter the following code snippet:
config { type : " TABLE_TYPE " }
Replace TABLE_TYPE with one of the following table types:
-
table
-
incremental
-
view
-
-
Optional: To define a materialized view, enter the
materialized
property undertype: "view"
in the following format:config { type : "view" , materialized : true }
For more information, see
ITableConfig
Dataform core reference . -
Optional: Click Format.
Define table structure and dependencies
To write a table definition SELECT
statement and define table structure and
dependencies, follow these steps:
- In your development workspace, in the Filespane, expand
the
definitions/
directory. - Select the table definition SQLX file that you want to edit.
- Below the
config
block, write aSELECT
statement. - Optional: Click Format.
The following code sample shows a table definition with a SELECT
statement
and the ref
function:
config
{
type:
"table"
}
SELECT
customers.id
AS
id,
customers.first_name
AS
first_name,
customers.last_name
AS
last_name,
customers.email
AS
email,
customers.country
AS
country,
COUNT(orders.id)
AS
order_count,
SUM(orders.amount)
AS
total_spent
FROM
dataform-samples.dataform_sample.crm_customers
AS
customers
LEFT
JOIN
${
ref
(
'order_stats'
)
}
orders
ON
customers.id
=
orders.customer_id
WHERE
customers.id
IS
NOT
NULL
AND
customers.first_name <>
'Internal
account'
AND
country
IN
('UK',
'US',
'FR',
'ES',
'NG',
'JP')
GROUP
BY
1,
2,
3,
4,
5
Add manual table dependencies
To add table dependencies that are not referenced in the SELECT
statement,
but need to be executed before the current table, follow these steps:
- In your development workspace, in the Filespane, expand
the
definitions/
directory. - Select the table definition SQLX file that you want to edit.
-
In the
config
block of the table, enter the following code snippet:dependencies : [ " DEPENDENCY_TABLE " , ]
Replace DEPENDENCY_TABLE with the filename of the table you want to add as a dependency. You can enter multiple filenames.
-
Optional: Click Format.
The following code sample shows two tables added as manual table dependencies
to the config
block of a table definition file:
config { dependencies: [ "some_table", "some_other_table" ] }
What's next
- To learn how configure incremental tables, see Configure incremental tables .
- To learn how to configure table partitions and clusters, see Create table partitions and clusters .
- To learn how to add documentation to tables, see Add table documentation .
- To learn how to test table data with assertions, see Test tables with assertions .
- To learn how to define tables with JavaScript, see Create SQL workflows with JavaScript .
- To learn how to configure additional table settings, see Configure additional table settings .