distribution

Usage

view: my_view { 
  derived_table: { 
 distribution 
 : 
 "customer_id" 
 ... 
 } 
 }

Hierarchy

View File

view

derived_table

distribution

- or -

Model File

explore

aggregate_table

materialization

distribution

Default Value

None

Accepts

The name of a column in a derived table or aggregate table

Special Rules

distribution is supported only for Redshift and Aster databases.

Definition

distribution lets you specify the column from a persistent derived table (PDT) or an aggregate table on which to apply a distribution key to spread the data around a cluster. When two tables are joined by the column specified in the distribution parameter, the database can find the joined data on the same node to minimize internode I/O. Currently, distribution works only with Redshift and Aster databases. For other SQL dialects (such as MySQL and Postgres), use indexes instead.

The distribution parameter works only with tables that are persistent , such as PDTs and aggregate tables. distribution is not supported for derived tables without a persistence strategy.

In addition, the distribution parameter is not supported for derived tables that are defined using create_process or sql_create .

Generally speaking, a distribution key should be applied to the column that will be acting as a foreign key (the column you use to join your table to others). Refer to the documentation for your dialect for best practices.

Examples

Create a customer_order_facts persistent derived table with a distribution key on customer_id :

 view: customer_order_facts {
  derived_table: {
    explore_source: order {
      column: customer_id { field: order.customer_id }
      column: lifetime_orders { field: order.lifetime_orders }
    }
    datagroup_trigger: order_datagroup
    distribution: "customer_id"
  }
}

Create a customer_order_facts derived table based on a SQL query with a distribution key on customer_id :

 view: customer_order_facts {
  derived_table: {
    sql:
      SELECT
        customer_id,
        COUNT(*) AS lifetime_orders
      FROM
        order ;;
    persist_for: "24 hours"
    distribution: "customer_id"
  }
}