Usage
view: my_view { derived_table: { distribution : "customer_id" ... } }
distribution
- or -
distribution
distribution
is supported only for Redshift and Aster databases.Definition
distribution
lets you specify the column from a persistent derived table
(PDT) or an aggregate table
on which to apply a distribution key to spread the data around a cluster. When two tables are joined by the column specified in the distribution
parameter, the database can find the joined data on the same node to minimize internode I/O. Currently, distribution
works only with Redshift and Aster databases. For other SQL dialects (such as MySQL and Postgres), use indexes
instead.
The
distributionparameter works only with tables that are persistent , such as PDTs and aggregate tables.distributionis not supported for derived tables without a persistence strategy.In addition, the
distributionparameter is not supported for derived tables that are defined usingcreate_processorsql_create.
Generally speaking, a distribution key should be applied to the column that will be acting as a foreign key (the column you use to join your table to others). Refer to the documentation for your dialect for best practices.
Examples
Create a customer_order_facts
persistent derived table with a distribution key on customer_id
:
view: customer_order_facts {
derived_table: {
explore_source: order {
column: customer_id { field: order.customer_id }
column: lifetime_orders { field: order.lifetime_orders }
}
datagroup_trigger: order_datagroup
distribution: "customer_id"
}
}
Create a customer_order_facts
derived table based on a SQL query with a distribution key on customer_id
:
view: customer_order_facts {
derived_table: {
sql:
SELECT
customer_id,
COUNT(*) AS lifetime_orders
FROM
order ;;
persist_for: "24 hours"
distribution: "customer_id"
}
}

