Create and use clustered tables
This document describes how to create and use clustered tables in BigQuery. For an overview of clustered table support in BigQuery, see Introduction to clustered tables .
Create clustered tables
You can create a clustered table by using the following methods:
-
Create a table from a query result :
- Run a DDL
CREATE TABLE AS SELECT
statement. - Run a query that creates a clustered destination table .
- Run a DDL
-
Use a DDL
CREATE TABLE
statement with aCLUSTER BY
clause containing aclustering_column_list
. -
Run the bq command-line tool
bq mk
command. -
Make calls to the
tables.insert
API method. -
Load data into BigQuery.
-
Use the client libraries.
Table naming
When you create a table in BigQuery, the table name must be unique per dataset. The table name can:
- Contain characters with a total of up to 1,024 UTF-8 bytes.
- Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category .
The following are all examples of valid table names: table 01
, ग्राहक
, 00_お客様
, étudiant-01
.
Caveats:
- Table names are case-sensitive by default.
mytable
andMyTable
can coexist in the same dataset, unless they are part of a dataset with case-sensitivity turned off . - Some table names and table name prefixes are reserved. If you receive an error saying that your table name or prefix is reserved, then select a different name and try again.
-
If you include multiple dot operators (
.
) in a sequence, the duplicate operators are implicitly stripped.For example, this:
project_name....dataset_name..table_name
Becomes this:
project_name.dataset_name.table_name
Required permissions
To create a table, you need the following IAM permissions:
-
bigquery.tables.create
-
bigquery.tables.updateData
-
bigquery.jobs.create
Additionally, you might require the bigquery.tables.getData
permission to
access the data that you write to the table.
Each of the following predefined IAM roles includes the permissions that you need in order to create a table:
-
roles/bigquery.dataEditor
-
roles/bigquery.dataOwner
-
roles/bigquery.admin
(includes thebigquery.jobs.create
permission) -
roles/bigquery.user
(includes thebigquery.jobs.create
permission) -
roles/bigquery.jobUser
(includes thebigquery.jobs.create
permission)
Additionally, if you have the bigquery.datasets.create
permission, you can
create and update tables in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions .
Create an empty clustered table with a schema definition
You specify clustering columns when you create a table in BigQuery. After the table is created, you can modify the clustering columns; see Modifying clustering specification for details.
Clustering columns must be top-level, non-repeated columns, and they must be one of the following simple data types:
-
DATE
-
BOOLEAN
-
GEOGRAPHY
-
INTEGER
-
NUMERIC
-
BIGNUMERIC
-
STRING
-
TIMESTAMP
-
RANGE
You can specify up to four clustering columns. When you specify multiple columns, the order of the columns determines how the data is sorted. For example, if the table is clustered by columns a, b and c, the data is sorted in the same order: first by column a, then by column b, and then by column c. As a best practice, place the most frequently filtered or aggregated column first.
The order of your clustering columns also affects query performance and pricing. For more information about query best practices for clustered tables, see Querying clustered tables .
To create an empty clustered table with a schema definition:
Console
-
In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project, and then select a dataset.
- In the Dataset info section, click Create table .
- In the Create table panel, specify the following details:
- In the Source section, select Empty table in the Create table from list.
- In the Destination
section, specify the following details:
- For Dataset , select the dataset in which you want to create the table.
- In the Table field, enter the name of the table that you want to create.
- Verify that the Table type field is set to Native table .
- In the Schema
section, enter the schema
definition.
You can enter schema information manually by using one of
the following methods:
- Option 1: Click Edit as text
and paste the schema in the form of a
JSON array. When you use a JSON array, you generate the schema using the
same process as creating a JSON schema file
.
You can view the schema of an existing table in JSON format by entering the following
command:
bq show --format = prettyjson dataset.table
- Option 2: Click Add field and enter the table schema. Specify each field's Name , Type , and Mode .
- Option 1: Click Edit as text
and paste the schema in the form of a
JSON array. When you use a JSON array, you generate the schema using the
same process as creating a JSON schema file
.
You can view the schema of an existing table in JSON format by entering the following
command:
- For Clustering order , enter between one and four comma-separated column names.
- Optional: In the Advanced options section, if you want to use a customer-managed encryption key, then select the Use a customer-managed encryption key (CMEK) option. By default, BigQuery encrypts customer content stored at rest by using a Google-owned and Google-managed key.
- Click Create table .
SQL
Use the CREATE TABLE
DDL statement
command with the CLUSTER BY
option. The following example creates a
clustered table named myclusteredtable
in mydataset
:
-
In the Google Cloud console, go to the BigQuerypage.
-
In the query editor, enter the following statement:
CREATE TABLE mydataset . myclusteredtable ( customer_id STRING , transaction_amount NUMERIC ) CLUSTER BY customer_id OPTIONS ( description = ' a table clustered by customer_id ' );
-
Click Run.
For more information about how to run queries, see Run an interactive query .
bq
Use the bq mk
command
with the following flags:
-
--table
(or the-t
shortcut). -
--schema
. You can supply the table's schema definition inline or use a JSON schema file. -
--clustering_fields
. You can specify up to four clustering columns.
Optional parameters include --expiration
, --description
, --time_partitioning_type
, --time_partitioning_field
, --time_partitioning_expiration
, --destination_kms_key
, and --label
.
If you are creating a table in a project other than your default project,
add the project ID to the dataset in the following format: project_id:dataset
.
--destination_kms_key
is not demonstrated here. For information about
using --destination_kms_key
, see customer-managed encryption keys
.
Enter the following command to create an empty clustered table with a schema definition:
bq mk \ --table \ --expiration INTEGER1 \ --schema SCHEMA \ --clustering_fields CLUSTER_COLUMNS \ --description " DESCRIPTION " \ --label KEY:VALUE,KEY:VALUE \ PROJECT_ID : DATASET . TABLE
Replace the following:
-
INTEGER1
: the default lifetime, in seconds, for the table. The minimum value is 3,600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. If you set the table's expiration time when you create a table, the dataset's default table expiration setting is ignored. Setting this value deletes the table after the specified time. -
SCHEMA
: an inline schema definition in the formatCOLUMN:DATA_TYPE,COLUMN:DATA_TYPE
or the path to the JSON schema file on your local machine. -
CLUSTER_COLUMNS
: a comma-separated list of up to four clustering columns. The list cannot contain any spaces. -
DESCRIPTION
: a description of the table, in quotes. -
KEY:VALUE
: the key-value pair that represents a label . You can enter multiple labels using a comma-separated list. -
PROJECT_ID
: your project ID. -
DATASET
: a dataset in your project. -
TABLE
: the name of the table you're creating.
When you specify the schema on the command line, you cannot include a RECORD
( STRUCT
)
type, you cannot include a column description, and you
cannot specify the column's mode. All modes default to NULLABLE
. To
include descriptions, modes, and RECORD
types, supply a JSON schema
file
instead.
Examples:
Enter the following command to create a clustered table
named myclusteredtable
in mydataset
in your default project. The table's
expiration is set to 2,592,000 (1 30-day month), the description is set to This is my clustered table
, and the label is set to organization:development
. The command uses the -t
shortcut instead of --table
.
The schema is specified inline as: timestamp:timestamp,customer_id:string,transaction_amount:float
. The
specified clustering field customer_id
is used to cluster the table.
bq mk \
-t \
--expiration 2592000 \
--schema 'timestamp:timestamp,customer_id:string,transaction_amount:float' \
--clustering_fields customer_id \
--description "This is my clustered table" \
--label org:dev \
mydataset.myclusteredtable
Enter the following command to create a clustered table named myclusteredtable
in myotherproject
, not your default project. The
description is set to This is my clustered table
, and the label is set
to organization:development
. The command uses the -t
shortcut instead of --table
. This command does not specify a table expiration. If the dataset
has a default table expiration, it is applied. If the dataset has no default
table expiration, the table never expires.
The schema is specified in a local JSON file: /tmp/myschema.json
. The customer_id
field is used to cluster the table.
bq mk \
-t \
--expiration 2592000 \
--schema /tmp/myschema.json \
--clustering_fields=customer_id \
--description "This is my clustered table" \
--label org:dev \
myotherproject:mydataset.myclusteredtable
After the table is created, you can update the table's description and labels .
Terraform
Use the google_bigquery_table
resource.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
The following example creates a table named mytable
that is clustered
on the ID
and Created
columns:
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
- Launch Cloud Shell .
-
Set the default Google Cloud project where you want to apply your Terraform configurations.
You only need to run this command once per project, and you can run it in any directory.
export GOOGLE_CLOUD_PROJECT= PROJECT_ID
Environment variables are overridden if you set explicit values in the Terraform configuration file.
Prepare the directory
Each Terraform configuration file must have its own directory (also called a root module ).
- In Cloud Shell
, create a directory and a new
file within that directory. The filename must have the
.tf
extension—for examplemain.tf
. In this tutorial, the file is referred to asmain.tf
.mkdir DIRECTORY && cd DIRECTORY && touch main.tf
-
If you are following a tutorial, you can copy the sample code in each section or step.
Copy the sample code into the newly created
main.tf
.Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
- Review and modify the sample parameters to apply to your environment.
- Save your changes.
- Initialize Terraform. You only need to do this once per directory.
terraform init
Optionally, to use the latest Google provider version, include the
-upgrade
option:terraform init -upgrade
Apply the changes
- Review the configuration and verify that the resources that Terraform is going to create or
update match your expectations:
terraform plan
Make corrections to the configuration as necessary.
- Apply the Terraform configuration by running the following command and entering
yes
at the prompt:terraform apply
Wait until Terraform displays the "Apply complete!" message.
- Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
API
Call the tables.insert
method with a defined table resource
that specifies the clustering.fields
property and the schema
property.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Python API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Go API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .