Use primary and foreign keys

Primary keys and foreign keys are table constraints that can help with query optimization. This document explains how to create, view, and manage constraints, and use them to optimize your queries.

BigQuery supports the following key constraints:

  • Primary key: A primary key for a table is a combination of one or more columns that is unique for each row and not NULL .
  • Foreign key: A foreign key for a table is a combination of one or more columns that is present in the primary key column of a referenced table, or is NULL .

Primary and foreign keys are typically used to ensure data integrity and enable query optimization. BigQuery doesn't enforce primary and foreign key constraints. When you declare constraints on your tables, you must ensure that your data conforms to them. BigQuery can use table constraints to optimize your queries.

Manage constraints

Primary and foreign key relationships can be created and managed through the following DDL statements:

You can also manage table constraints through the BigQuery API by updating the TableConstraints object .

View constraints

The following views give you information about your table constraints:

Optimize queries

When you create and enforce primary and foreign keys on your tables, BigQuery can use that information to eliminate or optimize certain query joins. While it's possible to mimic these optimizations by rewriting your queries, such rewrites aren't always practical.

In a production environment, you might create views that join many fact and dimension tables. Developers can query the views instead of querying the underlying tables and manually rewriting the joins each time. If you define the proper constraints, join optimizations happen automatically for any queries they apply to.

The examples in the following sections reference the store_sales and customer tables with constraints:

  CREATE 
  
 TABLE 
  
 mydataset 
 . 
 customer 
  
 ( 
 customer_name 
  
 STRING 
  
 PRIMARY 
  
 KEY 
  
 NOT 
  
 ENFORCED 
 ); 
 CREATE 
  
 TABLE 
  
 mydataset 
 . 
 store_sales 
  
 ( 
  
 item 
  
 STRING 
  
 PRIMARY 
  
 KEY 
  
 NOT 
  
 ENFORCED 
 , 
  
 sales_customer 
  
 STRING 
  
 REFERENCES 
  
 mydataset 
 . 
 customer 
 ( 
 customer_name 
 ) 
  
 NOT 
  
 ENFORCED 
 , 
  
 category 
  
 STRING 
 ); 
 

Eliminate inner joins

Consider the following query that contains an INNER JOIN :

  SELECT 
  
 ss 
 . 
 * 
 FROM 
  
 mydataset 
 . 
 store_sales 
  
 AS 
  
 ss 
  
 INNER 
  
 JOIN 
  
 mydataset 
 . 
 customer 
  
 AS 
  
 c 
  
 ON 
  
 ss 
 . 
 sales_customer 
  
 = 
  
 c 
 . 
 customer_name 
 ; 
 

The customer_name column is a primary key on the customer table, so each row from the store_sales table has either a single match, or no match if sales_customer is NULL . Since the query only selects columns from the store_sales table, the query optimizer can eliminate the join and rewrite the query as the following:

  SELECT 
  
 * 
 FROM 
  
 mydataset 
 . 
 store_sales 
 WHERE 
  
 sales_customer 
  
 IS 
  
 NOT 
  
 NULL 
 ; 
 

Eliminate outer joins

To remove a LEFT OUTER JOIN , the join keys on the right side must be unique and only columns from the left side are selected. Consider the following query:

  SELECT 
  
 ss 
 . 
 * 
 FROM 
  
 mydataset 
 . 
 store_sales 
  
 ss 
  
 LEFT 
  
 OUTER 
  
 JOIN 
  
 mydataset 
 . 
 customer 
  
 c 
  
 ON 
  
 ss 
 . 
 category 
  
 = 
  
 c 
 . 
 customer_name 
 ; 
 

In this example, there is no relationship between category and customer_name . The selected columns only come from the store_sales table and the join key customer_name is a primary key on the customer table, so each value is unique. This means that there is exactly one (possibly NULL ) match in the customer table for each row in the store_sales table and the LEFT OUTER JOIN can be eliminated:

  SELECT 
  
 ss 
 . 
 * 
 FROM 
  
 mydataset 
 . 
 store_sales 
 ; 
 

Reorder joins

When BigQuery can't eliminate a join, it can use table constraints to get information about join cardinalities and optimize the order in which to perform joins.

Limitations

Primary keys and foreign keys are subject to the following limitations:

  • Key constraints are unenforced in BigQuery. You are responsible for maintaining the constraints at all times. Queries over tables with violated constraints might return incorrect results.
  • Primary keys can't exceed 16 columns.
  • Foreign keys must have values that are present in the referenced table column. These values can be NULL .
  • Primary keys and foreign keys must be of one of the following types: BIGNUMERIC , BOOLEAN , DATE , DATETIME , INT64 , NUMERIC , STRING , or TIMESTAMP .
  • Primary keys and foreign keys can only be set on top-level columns.
  • Primary keys can't be named.
  • Tables with primary key constraints can't be renamed.
  • A table can have up to 64 foreign keys.
  • A foreign key can't refer to a column in the same table.
  • Fields that are part of primary key constraints or foreign key constraints can't be renamed, or have their type changed.
  • If you copy , clone , restore , or snapshot a table without the -a or --append_table option, the source table constraints are copied and overwritten to the destination table. If you use the -a or --append_table option, only the source table records are added to the destination table without the table constraints.

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: