Usage
view: my_view { derived_table: { distribution : "customer_id" ... } }
distribution
- or -
distribution
distribution
is supported only for Redshift and Aster databases.Definition
The distribution
parameter lets you specify the column from a persistent derived table
(PDT) or an aggregate table
on which to apply a distribution key. The distribution key spreads the data around a cluster. When two tables are joined by the column specified in the distribution
parameter, the database can find the joined data on the same node to minimize internode I/O. The distribution
parameter works only with Redshift and Aster databases. For other SQL dialects (such as MySQL and Postgres), use indexes
instead.
The
distributionparameter works only with tables that are persistent , such as PDTs and aggregate tables.distributionis not supported for derived tables without a persistence strategy.In addition, the
distributionparameter is not supported for derived tables that are defined usingcreate_processorsql_create.
Generally speaking, a distribution key should be applied to the column that will be acting as a foreign key (the column you use to join your table to others). Refer to the documentation for your dialect for best practices.
Examples
Create a customer_order_facts
persistent derived table with a distribution key on customer_id
:
view: customer_order_facts {
derived_table: {
explore_source: order {
column: customer_id { field: order.customer_id }
column: lifetime_orders { field: order.lifetime_orders }
}
datagroup_trigger: order_datagroup
distribution: "customer_id"
}
}
Create a customer_order_facts
derived table based on a SQL query with a distribution key on customer_id
:
view: customer_order_facts {
derived_table: {
sql:
SELECT
customer_id,
COUNT(*) AS lifetime_orders
FROM
order ;;
persist_for: "24 hours"
distribution: "customer_id"
}
}

