Use primary and foreign keys
Primary keys and foreign keys are table constraints that can help with query optimization. This document explains how to create, view, and manage constraints, and use them to optimize your queries.
BigQuery supports the following key constraints:
- Primary key: A primary key for a
table is a combination of one or more columns that is unique for each row and
not
NULL
. - Foreign key: A foreign key for a table is a combination of one or more
columns that is present in the primary key column of a referenced table, or
is
NULL
.
Primary and foreign keys are typically used to ensure data integrity and enable query optimization. BigQuery doesn't enforce primary and foreign key constraints. When you declare constraints on your tables, you must ensure that your data conforms to them. BigQuery can use table constraints to optimize your queries.
Manage constraints
Primary and foreign key relationships can be created and managed through the following DDL statements:
- Create primary and foreign key constraints when you create a table by using
the
CREATE TABLE
statement . - Add a primary key constraint to an existing table by using the
ALTER TABLE ADD PRIMARY KEY
statement . - Add a foreign key constraint to an existing table by using the
ALTER TABLE ADD FOREIGN KEY
statement . - Drop a primary key constraint from a table by using the
ALTER TABLE DROP PRIMARY KEY
statement . - Drop a foreign key constraint from a table by using the
ALTER TABLE DROP CONSTRAINT
statement .
You can also manage table constraints through the BigQuery API
by updating the TableConstraints
object
.
View constraints
The following views give you information about your table constraints:
- The
INFORMATION_SCHEMA.TABLE_CONSTRAINTS
view contains information about all of the primary and foreign key constraints on tables within a dataset. - The
INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
view contains information about each table's primary key columns and columns referenced by foreign keys from other tables within a dataset. - The
INFORMATION_SCHEMA.KEY_COLUMN_USAGE
view contains information about each table's columns that are constrained as primary or foreign keys.
Optimize queries
When you create and enforce primary and foreign keys on your tables, BigQuery can use that information to eliminate or optimize certain query joins. While it's possible to mimic these optimizations by rewriting your queries, such rewrites aren't always practical.
In a production environment, you might create views that join many fact and dimension tables. Developers can query the views instead of querying the underlying tables and manually rewriting the joins each time. If you define the proper constraints, join optimizations happen automatically for any queries they apply to.
The examples in the following sections reference the store_sales
and customer
tables with constraints:
CREATE
TABLE
mydataset
.
customer
(
customer_name
STRING
PRIMARY
KEY
NOT
ENFORCED
);
CREATE
TABLE
mydataset
.
store_sales
(
item
STRING
PRIMARY
KEY
NOT
ENFORCED
,
sales_customer
STRING
REFERENCES
mydataset
.
customer
(
customer_name
)
NOT
ENFORCED
,
category
STRING
);
Eliminate inner joins
Consider the following query that contains an INNER JOIN
:
SELECT
ss
.
*
FROM
mydataset
.
store_sales
AS
ss
INNER
JOIN
mydataset
.
customer
AS
c
ON
ss
.
sales_customer
=
c
.
customer_name
;
The customer_name
column is a primary key on the customer
table, so
each row from the store_sales
table has either a single match, or no match
if sales_customer
is NULL
. Since the query only selects columns from the store_sales
table, the query optimizer can eliminate the join and rewrite the
query as the following:
SELECT
*
FROM
mydataset
.
store_sales
WHERE
sales_customer
IS
NOT
NULL
;
Eliminate outer joins
To remove a LEFT OUTER JOIN
, the join keys on the right side must be unique
and only columns from the left side are selected. Consider the following query:
SELECT
ss
.
*
FROM
mydataset
.
store_sales
ss
LEFT
OUTER
JOIN
mydataset
.
customer
c
ON
ss
.
category
=
c
.
customer_name
;
In this example, there is no relationship between category
and customer_name
. The selected columns only come from
the store_sales
table and the join key customer_name
is a primary key on the customer
table, so each value is
unique. This means that there is exactly one (possibly NULL
) match in the customer
table for each row in the store_sales
table and the LEFT OUTER JOIN
can be eliminated:
SELECT
ss
.
*
FROM
mydataset
.
store_sales
;
Reorder joins
When BigQuery can't eliminate a join, it can use table constraints to get information about join cardinalities and optimize the order in which to perform joins.
Limitations
Primary keys and foreign keys are subject to the following limitations:
- Key constraints are unenforced in BigQuery. You are responsible for maintaining the constraints at all times. Queries over tables with violated constraints might return incorrect results.
- Primary keys can't exceed 16 columns.
- Foreign keys must have values that are present in the referenced table
column. These values can be
NULL
. - Primary keys and foreign keys must be of one of the following types:
BIGNUMERIC
,BOOLEAN
,DATE
,DATETIME
,INT64
,NUMERIC
,STRING
, orTIMESTAMP
. - Primary keys and foreign keys can only be set on top-level columns.
- Primary keys can't be named.
- Tables with primary key constraints can't be renamed.
- A table can have up to 64 foreign keys.
- A foreign key can't refer to a column in the same table.
- Fields that are part of primary key constraints or foreign key constraints can't be renamed, or have their type changed.
- If you copy
, clone
, restore
,
or snapshot
a table without the
-a
or--append_table
option, the source table constraints are copied and overwritten to the destination table. If you use the-a
or--append_table
option, only the source table records are added to the destination table without the table constraints.
What's next
- Learn more about how to Optimize query computation .