To create a job that runs a DDL statement, you must have the bigquery.jobs.create permission for the project where you are running the job.
Each DDL statement also requires specific permissions on the affected resources,
which are documented under each statement.
IAM roles
The predefined IAM roles bigquery.user, bigquery.jobUser, and bigquery.admin include the required bigquery.jobs.create permission.
You can run DDL statements by using the Google Cloud console, by using the
bq command-line tool, by calling the jobs.query REST API, or
programmatically using theBigQuery API client libraries.
Console
Go to the BigQuery page in the Google Cloud console.
Enter the DDL statement into the Query editor text area. For example:
CREATE TABLE mydataset.newtable (x INT64)
Click Run.
bq
Enter the bq query command and supply the DDL statement as the query parameter. Set the use_legacy_sql flag to false.
bq query --use_legacy_sql=false \
'CREATE TABLE mydataset.newtable ( x INT64 )'
API
Call the jobs.query method and supply the DDL statement in the request body's query property.
DDL functionality extends the information returned by a Jobs resource. statistics.query.statementType includes the following additional values for DDL support:
CREATE_TABLE
CREATE_TABLE_AS_SELECT
DROP_TABLE
CREATE_VIEW
DROP_VIEW
statistics.query has two additional fields:
ddlOperationPerformed: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:
CREATE: The query created the DDL target.
SKIP: No-op. Examples: CREATE TABLE IF NOT EXISTS was submitted, and the table exists; or DROP TABLE IF EXISTS was submitted, and the table does not exist.
REPLACE: The query replaced the DDL target. Example: CREATE OR REPLACE TABLE was submitted, and the table already exists.
DROP: The query deleted the DDL target.
ddlTargetTable: When you submit a CREATE TABLE/VIEW statement or a DROP TABLE/VIEW statement, the target table is returned as an object with 3 fields: projectId, datasetId, and tableId.
Java

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to create a view using DDL
public class DDLCreateView {

  public static void runDDLCreateView() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetId = "MY_DATASET_ID";
    String tableId = "MY_VIEW_ID";
    String ddl =
        "CREATE VIEW "
            + "`" + projectId + "." + datasetId + "." + tableId + "`"
            + " OPTIONS("
            + " expiration_timestamp=TIMESTAMP_ADD("
            + " CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),"
            + " friendly_name=\"new_view\","
            + " description=\"a view that expires in 2 days\","
            + " labels=[(\"org_unit\", \"development\")]"
            + " )"
            + " AS SELECT name, state, year, number"
            + " FROM `bigquery-public-data.usa_names.usa_1910_current`"
            + " WHERE state LIKE 'W%'";
    ddlCreateView(ddl);
  }

  public static void ddlCreateView(String ddl) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration config = QueryJobConfiguration.newBuilder(ddl).build();

      // Create the view and wait for the job to complete.
      Job job = bigquery.create(JobInfo.of(config));
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("View created successfully");
      } else {
        System.out.println("View was not created");
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("View was not created. \n" + e.toString());
    }
  }
}
Node.js

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function ddlCreateView() {
  // Creates a view via a DDL query

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = "my_project"
  // const datasetId = "my_dataset"
  // const tableId = "my_new_view"

  const query = `
  CREATE VIEW \`${projectId}.${datasetId}.${tableId}\`
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(
          CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
      friendly_name="new_view",
      description="a view that expires in 2 days",
      labels=[("org_unit", "development")]
  )
  AS SELECT name, state, year, number
      FROM \`bigquery-public-data.usa_names.usa_1910_current\`
      WHERE state LIKE 'W%'`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  job.on('complete', metadata => {
    console.log(`Created new view ${tableId} via job ${metadata.id}`);
  });
}
Python

# from google.cloud import bigquery
# project = 'my-project'
# dataset_id = 'my_dataset'
# table_id = 'new_view'
# client = bigquery.Client(project=project)

sql = """
CREATE VIEW `{}.{}.{}`
OPTIONS(
    expiration_timestamp=TIMESTAMP_ADD(
        CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
    friendly_name="new_view",
    description="a view that expires in 2 days",
    labels=[("org_unit", "development")]
)
AS SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state LIKE 'W%'
""".format(project, dataset_id, table_id)

job = client.query(sql)  # API request.
job.result()  # Waits for the query to finish.

print(
    'Created new view "{}.{}.{}".'.format(
        job.destination.project,
        job.destination.dataset_id,
        job.destination.table_id,
    )
)
On-demand query size calculation
If you use on-demand billing, BigQuery charges for data definition
language (DDL) queries based on the number of bytes processed by the query.
DDL statement: Bytes processed
CREATE TABLE: None.
CREATE TABLE ... AS SELECT ...: The sum of bytes processed for all the columns referenced from the tables scanned by the query.
IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement has no effect.
DEFAULT COLLATE collate_specification: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a table or a column.
If you remove or change this collation specification later with the ALTER SCHEMA statement, this does not change existing collation specifications in this dataset. If you want to update an existing collation specification in a dataset, you must alter the column that contains the specification.
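For instance, here is a minimal sketch of a dataset-level default collation, assuming the enclosing statement is CREATE SCHEMA and using a hypothetical dataset name:

CREATE SCHEMA mydataset
DEFAULT COLLATE 'und:ci'

Tables created in this dataset then compare STRING values case-insensitively unless a table or column specifies its own collation.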
project_name: The name of the project where you are creating the dataset.
Defaults to the project that runs this DDL statement.
The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:
NAME
VALUE
Details
default_kms_key_name
STRING
Specifies the default Cloud KMS key for encrypting table data
in this dataset. You can override this value when you create a table.
default_partition_expiration_days
FLOAT64
Specifies the default expiration time, in days, for table partitions in
this dataset. You can override this value when you create a table.
default_rounding_mode
STRING
Example: default_rounding_mode = "ROUND_HALF_EVEN"
This specifies the defaultRoundingMode that is used for new tables created in this dataset. It does not impact existing tables.
The following values are supported:
"ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are
rounded away from zero. For example, 2.25 is rounded to 2.3, and
-2.25 is rounded to -2.3.
"ROUND_HALF_EVEN": Halfway cases are rounded towards
the nearest even digit. For example, 2.25 is rounded to 2.2 and
-2.25 is rounded to -2.2.
default_table_expiration_days
FLOAT64
Specifies the default expiration time, in days, for tables in this
dataset. You can override this value when you create a table.
description
STRING
The description of the dataset.
failover_reservation
STRING
Associates the dataset with a reservation in the case of a failover scenario.
friendly_name
STRING
A descriptive name for the dataset.
is_case_insensitive
BOOL
TRUE if the dataset and its table names are case-insensitive, otherwise FALSE. By default, this is FALSE, which means the dataset and its table names are case-sensitive.
Datasets: mydataset and MyDataset can coexist in the same project, unless one of them has case-sensitivity turned off.
Tables: mytable and MyTable can coexist in the same dataset if case-sensitivity for the dataset is turned on.
is_primary
BOOLEAN
Declares if the dataset is the primary replica.
labels
ARRAY<STRUCT<STRING, STRING>>
An array of labels for the dataset, expressed as key-value pairs.
location
STRING
The location in which to create the dataset. If you don't specify this
option, the dataset is created in the location where the query runs. If
you specify this option and also explicitly set the location for the query
job, the two values must match; otherwise the query fails.
max_time_travel_hours
SMALLINT
Specifies the duration in hours of the time travel window for the dataset. The max_time_travel_hours value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168 hours is the default if this option isn't specified.
storage_billing_model
STRING
Alters the storage billing model for the dataset. Set the storage_billing_model value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default.
The storage_billing_model option is only available for datasets that have been updated after December 1, 2022. For datasets that were last updated before that date, the storage billing model is LOGICAL.
When you change a dataset's billing model, it takes 24 hours for the
change to take effect.
Once you change a dataset's storage billing model, you must wait 14 days
before you can change the storage billing model again.
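As an illustration, here is a hedged sketch that combines several of these options in one CREATE SCHEMA statement (the dataset name and option values are hypothetical):

CREATE SCHEMA IF NOT EXISTS mydataset
OPTIONS(
  default_table_expiration_days=3.75,
  max_time_travel_hours=96,
  description="development dataset",
  labels=[("org_unit", "development")]
)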
If you remove or change this collation specification later with the ALTER TABLE statement, this does not change existing collation specifications in this table. If you want to update an existing collation specification in a table, you must alter the column that contains the specification.
If the table is part of a dataset, the default collation specification for
this table overrides the default collation specification for the dataset.
clustering_column_list: A comma-separated list
of column references that determine how to cluster the table. You cannot have
collation on columns in this list.
connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`).
query_statement: The query from which the table should be created. For the query syntax, see SQL syntax reference.
If a collation specification is used on this table, collation passes through this query statement.
primary_key: An expression that defines a primary key table constraint.
foreign_key: An expression that defines a foreign key table constraint.
Details
CREATE TABLE statements must comply with the following rules:
Only one CREATE statement is allowed.
Either the column list, the AS query_statement clause, or both must be present.
When both the column list and the AS query_statement clause are present, BigQuery ignores the names in the AS query_statement clause and matches the columns with the column list by position.
When the AS query_statement clause is present and the column list is absent, BigQuery determines the column names and types from the AS query_statement clause.
Column names must be specified either through the column list, the AS query_statement clause, or the schema of the table in the LIKE clause.
Duplicate column names are not allowed.
When both the LIKE and the AS query_statement clause are present, the column list in the query statement must match the columns of the table referenced by the LIKE clause.
It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
It is not possible to use the OR REPLACE modifier to replace a table with a different kind of partitioning. Instead, DROP the table, and then use a CREATE TABLE ... AS SELECT ... statement to recreate it.
This statement supports the following variants, which have the same limitations:
CREATE TABLE LIKE: Create a table with the
same schema as an existing table.
CREATE TABLE COPY: Create a table by copying
schema and data from an existing table.
column
(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.
column_name is the name of the column.
A column name:
Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
Must start with a letter or underscore
Can be up to 300 characters
column_schema: Similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields.
column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions.
If simple_type is a STRING, it supports an additional clause for collation, which defines how a resulting STRING can be compared and sorted.
The syntax looks like this:
STRING COLLATE collate_specification
If you have DEFAULT COLLATE collate_specification assigned to the table, the collation specification for a column overrides the specification for the table.
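For example, here is a minimal sketch of a per-column collation overriding a table-level default (table and column names are hypothetical):

CREATE TABLE mydataset.collated_table
(
  case_sensitive_col STRING COLLATE '',
  inherited_col STRING
)
DEFAULT COLLATE 'und:ci'

In this sketch, inherited_col uses the table default 'und:ci', while case_sensitive_col explicitly opts back into binary (case-sensitive) comparison.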
default_expression: The default value assigned to the column.
field_list: Represents the fields in a struct.
field_name: The name of the struct field. Struct field names have the
same restrictions as column names.
NOT NULL: When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.
Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.
The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.
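A short sketch of this behavior, using hypothetical table names:

CREATE TABLE mydataset.t (x INT64 NOT NULL);
-- x is REQUIRED in mydataset.t, but NULLABLE in the table created below:
CREATE TABLE mydataset.newtable AS SELECT x FROM mydataset.t;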
partition_expression
PARTITION BY is an optional clause that controls table partitioning. partition_expression is an expression that determines how to partition the table. The partition expression can contain the following values:
_PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.
DATE(_PARTITIONTIME). Equivalent to _PARTITIONDATE. This syntax cannot be used with the AS query_statement clause.
<date_column>. Partition by a DATE column with daily partitions.
DATE({ <timestamp_column> | <datetime_column> }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR }). Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the AS query_statement clause.
DATE_TRUNC(<date_column>, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>])). Partition by an integer column with the specified range, where:
start is the start of range partitioning, inclusive.
end is the end of range partitioning, exclusive.
interval is the width of each range within the partition. Defaults to 1.
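For example, a hedged sketch of integer-range partitioning that creates the partitions [0, 10), [10, 20), ..., [90, 100) on a hypothetical customer_id column:

CREATE TABLE mydataset.range_partitioned
(
  customer_id INT64,
  amount NUMERIC
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))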
clustering_column_list
CLUSTER BY is an optional clause that controls table clustering. clustering_column_list is a comma-separated list that determines how to cluster the table. The clustering column list can contain a list of up to four clustering columns.
table_option_list
The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
NAME and VALUE must be one of the following combinations:
expiration_timestamp
TIMESTAMP
This property is equivalent to the expirationTime table resource property.
partition_expiration_days
FLOAT64
Example: partition_expiration_days=7
Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire.
This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.
This property can only be set if the table is partitioned.
require_partition_filter
BOOL
Example: require_partition_filter=true
Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false.
friendly_name
STRING
This property is equivalent to the friendlyName table resource property.
description
STRING
Example: description="a table that expires in 2025"
This property is equivalent to the description table resource property.
labels
ARRAY<STRUCT<STRING, STRING>>
Example: labels=[("org_unit", "development")]
This property is equivalent to the labels table resource property.
default_rounding_mode
STRING
Example: default_rounding_mode = "ROUND_HALF_EVEN"
This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table.
The following values are supported:
"ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are
rounded away from zero. For example, 2.5 is rounded to 3.0, and
-2.5 is rounded to -3.
"ROUND_HALF_EVEN": Halfway cases are rounded towards
the nearest even digit. For example, 2.5 is rounded to 2.0 and
-2.5 is rounded to -2.0.
This property is equivalent to the defaultRoundingMode table resource property.
enable_change_history
BOOL
Set this property to TRUE to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information, see Pricing and costs.
The default is FALSE.
max_staleness
INTERVAL
Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND
The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.
rounding_mode
STRING
This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field.
The following values are supported:
"ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are
rounded away from zero. For example, 2.25 is rounded to 2.3, and
-2.25 is rounded to -2.3.
"ROUND_HALF_EVEN": Halfway cases are rounded towards
the nearest even digit. For example, 2.25 is rounded to 2.2 and
-2.25 is rounded to -2.2.
This property is equivalent to the roundingMode table resource property.
VALUE is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
A reference to a table
Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
User-defined functions, aggregate functions, or analytic functions
The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.
In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.
If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
Examples
Creating a new table
The following example creates a partitioned table named newtable in mydataset:

CREATE TABLE mydataset.newtable
(
  x INT64 OPTIONS(description="An optional INTEGER field"),
  y STRUCT<
    a ARRAY<STRING> OPTIONS(description="A repeated STRING field"),
    b BOOL
  >
)
PARTITION BY _PARTITIONDATE
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  partition_expiration_days=1,
  description="a table that expires in 2025, with each partition living for 24 hours",
  labels=[("org_unit", "development")]
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table uses the following partition_expression to partition the table: PARTITION BY _PARTITIONDATE. This expression partitions the table using the date in the _PARTITIONDATE pseudocolumn.
The table schema contains two columns:
x: An integer, with description "An optional INTEGER field"
y: A STRUCT containing two columns:
a: An array of strings, with description "A repeated STRING field"
b: A boolean
The table option list specifies the:
Table expiration time: January 1, 2025 at 00:00:00 UTC
Partition expiration time: 1 day
Description: A table that expires in 2025
Label: org_unit = development
Creating a new table from an existing table
The following example creates a table named top_words in mydataset from a query:

CREATE TABLE mydataset.top_words
OPTIONS(
  description="Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.top_words, your table qualifier might be `myproject.mydataset.top_words`.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 2 columns:
corpus: Name of a Shakespeare corpus
top_words: An ARRAY of STRUCTs containing 2 fields: word (a STRING) and word_count (an INT64 with the word count)
The table option list specifies the:
Description: Top ten words per Shakespeare corpus
Creating a table only if the table doesn't exist
The following example creates a table named newtable in mydataset only if no table named newtable exists in mydataset. If the table name exists in the
dataset, no error is returned, and no action is taken.
CREATE TABLE IF NOT EXISTS mydataset.newtable
(
  x INT64,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL
  >
)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.
The table schema contains 2 columns:
x: An integer
y: A STRUCT containing a (an array of strings) and b (a boolean)
The table option list specifies the:
Expiration time: January 1, 2025 at 00:00:00 UTC
Description: A table that expires in 2025
Label: org_unit = development
Creating or replacing a table
The following example creates a table named newtable in mydataset, and if newtable exists in mydataset, it is overwritten with an empty table.

CREATE OR REPLACE TABLE mydataset.newtable
(
  x INT64,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL
  >
)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.
The table schema contains 2 columns:
x: An integer
y: A STRUCT containing a (an array of strings) and b (a boolean)
The table option list specifies the:
Expiration time: January 1, 2025 at 00:00:00 UTC
Description: A table that expires in 2025
Label: org_unit = development
Creating a table with REQUIRED columns
The following example creates a table named newtable in mydataset. The NOT NULL modifier in the column definition list of a CREATE TABLE statement specifies that a column or field is created in REQUIRED mode.
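The statement itself does not appear in this text; a sketch consistent with the three-column schema described below might look like the following:

CREATE TABLE mydataset.newtable
(
  x INT64 NOT NULL,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL NOT NULL,
    c FLOAT64
  > NOT NULL,
  z STRING
)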
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 3 columns:
x: A REQUIRED integer
y: A REQUIRED STRUCT containing a (an array of strings), b (a REQUIRED boolean), and c (a NULLABLE float)
z: A NULLABLE string
Creating a table with collation support
The following examples create a table named newtable in mydataset with columns a, b, c, and a struct with fields x and y. All STRING column schemas in this table are collated with 'und:ci':
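The statement itself is missing here; a sketch matching that description, with the collation applied through a table-level default, might be:

CREATE TABLE mydataset.newtable
(
  a STRING,
  b STRING,
  c STRING,
  s STRUCT<x STRING, y STRING>
)
DEFAULT COLLATE 'und:ci'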
The following example creates a table named newtable in mydataset. The parameters in parentheses specify that the column contains a parameterized data type. See Parameterized Data Types for more information about parameterized types.
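This statement is likewise omitted from the text; a sketch consistent with the schema described below:

CREATE TABLE mydataset.newtable
(
  x STRING(10),
  y STRUCT<
    a ARRAY<BYTES(5)>,
    b NUMERIC(15, 2) OPTIONS(rounding_mode = 'ROUND_HALF_EVEN'),
    c FLOAT64
  >,
  z BIGNUMERIC(35)
)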
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. Instead of mydataset.newtable, your table qualifier should be `myproject.mydataset.newtable`.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 3 columns:
x: A parameterized string with a maximum length of 10
y: A STRUCT containing a (an array of parameterized bytes with a maximum length of 5), b (a parameterized NUMERIC with a maximum precision of 15, maximum scale of 2, and rounding mode set to 'ROUND_HALF_EVEN'), and c (a float)
z: A parameterized BIGNUMERIC with a maximum precision of 35 and maximum scale of 0
Creating a partitioned table
The following example creates a partitioned table named newtable in mydataset using a DATE column:

CREATE TABLE mydataset.newtable
(
  transaction_id INT64,
  transaction_date DATE
)
PARTITION BY transaction_date
OPTIONS(
  partition_expiration_days=3,
  description="a table partitioned by transaction_date"
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.
The table schema contains 2 columns:
transaction_id: An integer
transaction_date: A date
The table option list specifies the:
Partition expiration: Three days
Description: A table partitioned by transaction_date
Creating a partitioned table from the result of a query
The following example creates a partitioned table named days_with_rain in mydataset using a DATE column:

CREATE TABLE mydataset.days_with_rain
PARTITION BY date
OPTIONS(
  partition_expiration_days=365,
  description="weather stations with precipitation, partitioned by day"
) AS
SELECT
  DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
  (SELECT ANY_VALUE(name) FROM `bigquery-public-data.noaa_gsod.stations` AS stations
   WHERE stations.usaf = stn) AS station_name,  -- Stations can have multiple names
  prcp
FROM `bigquery-public-data.noaa_gsod.gsod2017` AS weather
WHERE prcp != 99.9  -- Filter unknown values
  AND prcp > 0      -- Filter stations/days with no precipitation
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.days_with_rain, your table qualifier might be `myproject.mydataset.days_with_rain`.
The table schema contains 3 columns:
date: The DATE of data collection
station_name: The name of the weather station as a STRING
prcp: The amount of precipitation in inches as a FLOAT64
The table option list specifies the:
Partition expiration: One year
Description: Weather stations with precipitation, partitioned by day
Creating a clustered table
Example 1
The following example creates a clustered table named myclusteredtable in mydataset. The table is a partitioned table, partitioned by a TIMESTAMP column and clustered by a STRING column named customer_id.

CREATE TABLE mydataset.myclusteredtable
(
  timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(timestamp)
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.
The table schema contains 3 columns:
timestamp: The time of data collection as a TIMESTAMP
customer_id: The customer ID as a STRING
transaction_amount: The transaction amount as a NUMERIC
Example 2
The following example creates a clustered table named myclusteredtable in mydataset. The table is partitioned by ingestion time.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.
Example 3
The following example creates a clustered table named myclusteredtable in mydataset. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS(
  description="a table clustered by customer_id"
)
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.
Creating a clustered table from the result of a query
Example 1
The following example creates a clustered table named myclusteredtable in mydataset using the result of a query. The table is a partitioned table, partitioned by a TIMESTAMP column.

CREATE TABLE mydataset.myclusteredtable
(
  timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(timestamp)
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3,
  description="a table clustered by customer_id"
) AS
SELECT * FROM mydataset.myothertable
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.
The table schema contains 3 columns:
timestamp: The time of data collection as a TIMESTAMP
customer_id: The customer ID as a STRING
transaction_amount: The transaction amount as a NUMERIC
Example 2
The following example creates a clustered table named myclusteredtable in mydataset using the result of a query. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS(
  description="a table clustered by customer_id"
) AS
SELECT * FROM mydataset.myothertable
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.
Suppose you have a BigLake table named myawsdataset.orders that references data from Amazon S3. You want to transfer data from that table to a BigQuery table myotherdataset.shipments in the US multi-region.
First, display information about the myawsdataset.orders table:

bq show myawsdataset.orders;
The output is similar to the following:
Last modified Schema Type Total URIs Expiration
----------------- -------------------------- ---------- ------------ -----------
31 Oct 17:40:28 |- l_orderkey: integer EXTERNAL 1
|- l_partkey: integer
|- l_suppkey: integer
|- l_linenumber: integer
|- l_returnflag: string
|- l_linestatus: string
|- l_commitdate: date
Next, display information about the myotherdataset.shipments table:

bq show myotherdataset.shipments
The output is similar to the following. Some columns are omitted to simplify the
output.
Last modified Schema Total Rows Total Bytes Expiration Time Partitioning Clustered Fields Total Logical
----------------- --------------------------- ------------ ------------- ------------ ------------------- ------------------ ---------------
31 Oct 17:34:31 |- l_orderkey: integer 3086653 210767042 210767042
|- l_partkey: integer
|- l_suppkey: integer
|- l_commitdate: date
|- l_shipdate: date
|- l_receiptdate: date
|- l_shipinstruct: string
|- l_shipmode: string
Now, using the CREATE TABLE AS SELECT statement you can selectively load data to the myotherdataset.orders table in the US multi-region:
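The load statement itself is not shown above; a hedged sketch that copies one year of data (the date filter is illustrative):

CREATE OR REPLACE TABLE myotherdataset.orders
AS SELECT * FROM myawsdataset.orders
WHERE l_commitdate >= '1992-01-01' AND l_commitdate <= '1992-12-31';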
You can then perform a join operation with the newly created table:
SELECT
  orders.l_orderkey,
  orders.l_partkey,
  orders.l_suppkey,
  orders.l_commitdate,
  orders.l_returnflag,
  shipments.l_shipmode,
  shipments.l_shipinstruct
FROM myotherdataset.shipments
JOIN `myotherdataset.orders` AS orders
  ON orders.l_orderkey = shipments.l_orderkey
  AND orders.l_partkey = shipments.l_partkey
  AND orders.l_suppkey = shipments.l_suppkey
WHERE orders.l_returnflag = 'R';  -- 'R' means refunded.

When new data is available, append data for the year 1993 to the destination table using the INSERT INTO SELECT statement:
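A sketch of that append, mirroring the previous filter but for 1993:

INSERT INTO myotherdataset.orders
SELECT * FROM myawsdataset.orders
WHERE l_commitdate >= '1993-01-01' AND l_commitdate <= '1993-12-31';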
This statement is a variant of the CREATE TABLE statement and has the same limitations. Other than the use of the LIKE clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.
The CREATE TABLE LIKE statement copies only the metadata of the source table. You can use the AS query_statement clause to include data in the new table.
The new table has no relationship to the source table after creation; thus modifications to the source table will not propagate to the new table.
By default, the new table inherits partitioning, clustering, and options metadata from the source table. You can customize metadata in the new table by using the optional clauses in the SQL statement. For example, if you want to specify a different set of options for the new table, then include the OPTIONS clause with a list of options and values. This behavior matches that of ALTER TABLE SET OPTIONS.
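For example, a minimal sketch (source and destination names are hypothetical):

-- Copy only the schema of sourcetable:
CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable

-- Copy the schema and also populate the new table with data:
CREATE TABLE mydataset.newtable2
LIKE mydataset.sourcetable
AS SELECT * FROM mydataset.sourcetable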
This statement is a variant of the CREATE TABLE statement and has the same limitations. Other than the use of the COPY clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.
TheCREATE TABLE COPYstatement copies both the metadata and data from the
source table.
The new table inherits partitioning and clustering from the source table. By
default, the table options metadata from the source table are also inherited,
but you can override table options by using the OPTIONS clause. The behavior is equivalent to running ALTER TABLE SET OPTIONS after the table is copied.
The new table has no relationship to the source table after creation;
modifications to the source table are not propagated to the new table.
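A minimal sketch, with an OPTIONS override (table names are hypothetical):

CREATE TABLE mydataset.newtable
COPY mydataset.sourcetable
OPTIONS(description="a copy of sourcetable")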
In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.
If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
CREATE SNAPSHOT TABLE statement
Creates a table snapshot based on a source table. The source table can be a table, a table clone, or a table snapshot.
Syntax
CREATE SNAPSHOT TABLE [ IF NOT EXISTS ] table_snapshot_name
CLONE source_table_name
[FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(snapshot_option_list)]
Arguments
IF NOT EXISTS: If a table snapshot or other table resource exists with the same name, the CREATE statement has no effect.
table_snapshot_name: The name of the table snapshot that you want to create. The table snapshot name must be unique per dataset. See Table path syntax.
source_table_name: The name of the table that you want to snapshot or the table snapshot that you want to copy. See Table path syntax.
If the source table is a standard table, then BigQuery creates
a table snapshot of the source table. If the source table is a table snapshot,
then BigQuery creates a copy of the table snapshot.
FOR SYSTEM_TIME AS OF: Lets you select the version of the table that was current at the time specified by time_expression. It can only be used when creating a snapshot of a table; it can't be used when making a copy of a table snapshot.
snapshot_option_list: Additional table snapshot creation options such as a label and an expiration time.
Details
CREATE SNAPSHOT TABLE statements must comply with the following rules:
Only one CREATE statement is allowed.
The source table must be one of the following:
A table
A table clone
A table snapshot
The FOR SYSTEM_TIME AS OF clause can only be used when creating a snapshot of a table or table clone; it can't be used when making a copy of a table snapshot.
snapshot_option_list
The option list lets you set table snapshot options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table snapshot option list in the following format:
NAME=VALUE, ...
NAME and VALUE must be one of the following combinations:
Create a table snapshot: fail if it already exists
The following example creates a table snapshot of the table myproject.mydataset.mytable. The table snapshot is created in the dataset mydataset and is named mytablesnapshot:

CREATE SNAPSHOT TABLE `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="my_table_snapshot",
  description="A table snapshot that expires in 2 days",
  labels=[("org_unit", "development")]
)
If the table snapshot name already exists in the dataset, then the following error is returned:
Already Exists: myproject.mydataset.mytablesnapshot
The table snapshot option list specifies the following:
Expiration time: 48 hours after the time the table snapshot is created
Friendly name: my_table_snapshot
Description: A table snapshot that expires in 2 days
Label: org_unit = development
Create a table snapshot: ignore if it already exists
The following example creates a table snapshot of the table myproject.mydataset.mytable. The table snapshot is created in the dataset mydataset and is named mytablesnapshot:

CREATE SNAPSHOT TABLE IF NOT EXISTS `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="my_table_snapshot",
  description="A table snapshot that expires in 2 days",
  labels=[("org_unit", "development")]
)
The table snapshot option list specifies the following:
Expiration time: 48 hours after the time the table snapshot is created
Friendly name: my_table_snapshot
Description: A table snapshot that expires in 2 days
Label: org_unit = development
If the table snapshot name already
exists in the dataset, then no action is taken, and no error is returned.
CREATE TABLE CLONE statement
Creates a table clone based on a source table. The source table can be a table, a table clone, or a table snapshot.
Syntax
CREATE TABLE [ IF NOT EXISTS ]
destination_table_name
CLONE source_table_name [FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(table_option_list)]
Details
Other than the use of the CLONE clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.
Arguments
IF NOT EXISTS: If the specified destination table name already exists, the CREATE statement has no effect.
destination_table_name: The name of the table that you want to create.
The table name must
be unique per dataset. The table name can contain the following:
Up to 1,024 characters
Letters (upper or lower case), numbers, and underscores
OPTIONS(table_option_list): Lets you specify additional table creation options such as a label and an expiration time.
source_table_name: The name of the source table.
CREATE TABLE CLONE statements must comply with the following rules:
Only one CREATE statement is allowed.
The table that is being cloned must be a table, a table clone, or
a table snapshot.
In addition, cloning a table snapshot requires the bigquery.tables.restoreSnapshot permission on the source table.
If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
Examples
Restore a table snapshot: fail if destination table already exists
The following example creates the table myproject.mydataset.mytable from the table snapshot myproject.mydataset.mytablesnapshot:

CREATE TABLE `myproject.mydataset.mytable`
CLONE `myproject.mydataset.mytablesnapshot`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name="my_table",
  description="A table that expires in 1 year",
  labels=[("org_unit", "development")]
)
If the table name exists in the dataset, then the following error is
returned:
Already Exists: myproject.mydataset.mytable.
The table option list specifies the following:
Expiration time: 365 days after the time that the table is created
Friendly name: my_table
Description: A table that expires in 1 year
Label: org_unit = development
Create a clone of a table: ignore if the destination table already exists
The following example creates the table clone myproject.mydataset.mytableclone based on the table myproject.mydataset.mytable:

CREATE TABLE IF NOT EXISTS `myproject.mydataset.mytableclone`
CLONE `myproject.mydataset.mytable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name="my_table",
  description="A table that expires in 1 year",
  labels=[("org_unit", "development")]
)
The table option list specifies the following:
Expiration time: 365 days after the time the table is created
Friendly name: my_table
Description: A table that expires in 1 year
Label: org_unit = development
If the table name exists in the dataset, then no action is taken, and no error
is returned.
For information about creating a copy of a table, see CREATE TABLE COPY.
OR REPLACE: Replaces any view with the same name if it exists. Cannot appear with IF NOT EXISTS.
IF NOT EXISTS: If a view or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.
view_name: The name of the view you're creating. See Table path syntax.
view_column_name_list: Lets you explicitly
specify the column names of the view, which may be aliases to the column names
in the underlying SQL query.
view_option_list: Additional view creation options such as a label and an expiration time.
query_expression: The GoogleSQL query expression used to define the
view.
Details
CREATE VIEW statements must comply with the following rules:
Only one CREATE statement is allowed.
view_column_name_list
The view's column name list is optional. The names must be unique but do not have to be the same as the column names of the underlying SQL query. For example, if your view is created with the following statement:
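The statement referenced here is missing from this text; it presumably resembles the following sketch, consistent with the mydataset.people example discussed next:

CREATE VIEW mydataset.age_groups(age, count)
AS SELECT age, COUNT(*) FROM mydataset.people GROUP BY age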
The number of columns in the column name list must match the number of columns in the underlying SQL query. If columns in the table of the underlying SQL query are added or dropped, the view becomes invalid and must be recreated. For example, if the age column is dropped from the mydataset.people table, then the view created in the previous example becomes invalid.
view_column_option_list
The view_column_option_list lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME and VALUE fields:
NAME
VALUE
Details
description
STRING
Example: description="a unique id"
view_option_list
The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a view option list in the following format:
NAME=VALUE, ...
NAME and VALUE must be one of the following combinations:
expiration_timestamp
TIMESTAMP
This property is equivalent to the expirationTime table resource property.
friendly_name
STRING
Example: friendly_name="my_view"
This property is equivalent to the friendlyName table resource property.
description
STRING
Example: description="a view that expires in 2025"
This property is equivalent to the description table resource property.
labels
ARRAY<STRUCT<STRING, STRING>>
Example: labels=[("org_unit", "development")]
This property is equivalent to the labels table resource property.
privacy_policy
JSON-formatted STRING
The policies to enforce when anyone queries the view.
To learn more about the policies available for a view, see the privacy_policy view option.
VALUE is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
A reference to a table
Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
User-defined functions, aggregate functions, or analytic functions
The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
If VALUE evaluates to NULL, the corresponding option NAME in the CREATE VIEW statement is ignored.
privacy_policy
The following policies are available in the privacy_policy view option to create analysis rules. A policy represents a condition that needs to be met before a query can be run.
Policy
Details
aggregation_threshold_policy
The aggregation threshold policy to enforce when a view is
queried.
Syntax:
'{
"aggregation_threshold_policy": {
"threshold": value,
"privacy_unit_columns": value
}
}'
Parameters:
aggregation_threshold_policy: An aggregation threshold policy for the view. When this parameter is included, a minimum number of distinct entities must be present in a set of data in the view.
threshold: The minimum number of distinct privacy units (privacy unit column values) that need to contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results. value is a positive JSON integer.
privacy_unit_columns: The columns that represent the privacy unit columns in a view. At this time, a view can have only one privacy unit column. value is a JSON string.
differential_privacy_policy: The differential privacy policy for the view.
privacy_unit_column: The column that represents the privacy unit column for differentially private queries on the view. value is a JSON string.
max_epsilon_per_query: The maximum amount of epsilon that can be specified for a differentially private query on the view. value is a JSON number from 0.001 to 1e+15.
epsilon_budget: The amount of epsilon that can be used in totality for all differentially private queries on the view. value is a JSON number from 0.001 to 1e+15.
delta_per_query: The maximum amount of delta that can be specified for a differentially private query on the view. value is a JSON number from 1e-15 to 1.
delta_budget: The amount of delta that can be used in totality for all differentially private queries on the view. The budget must be larger than the delta for any differentially private query on the view. value is a JSON number from 1e-15 to 1000.
max_groups_contributed: The maximum number of groups to which each protected entity can contribute in a differentially private query. value is a non-negative JSON integer.
join_restriction_policy
A join restriction policy for the view. When this parameter is included, only the specified joins can be run on the specified columns in the view.
This policy can be used alone or with other policies, such as the aggregation threshold or differential privacy policy.
Syntax:
'{
"join_restriction_policy": {
"join_condition": value,
"join_allowed_columns": value
}
}'
Parameters:
join_restriction_policy: The join restriction policy for the view.
join_condition: The type of join condition to enforce on the view. value can be one of the following JSON strings:
JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.
JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.
JOIN_BLOCKED: This view can't be joined along any column. Don't set join_allowed_columns in this case. This can be used with all analysis rules except for the list overlap analysis rule.
JOIN_NOT_REQUIRED: A join is not required to query this view. If a join is used, only the columns in join_allowed_columns can be used. This can be used with all analysis rules except for the list overlap analysis rule.
join_allowed_columns: A list of columns that can be part of a join operation. value is a JSON array.
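As an illustration, a hedged sketch of a view with an aggregation threshold policy (the table, columns, and threshold value are hypothetical):

CREATE OR REPLACE VIEW mydataset.protected_view
OPTIONS(
  privacy_policy='{"aggregation_threshold_policy": {"threshold": 50, "privacy_unit_columns": "id"}}'
)
AS SELECT id, state, year FROM mydataset.source_table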
If the view is created in the same project used to run the CREATE VIEW statement, the view body query_expression can reference entities without specifying the project; the default project is the project which owns the view. Consider the sample query below.
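The sample itself is missing from this text; it presumably resembles the following sketch, consistent with the names referenced in the next paragraph:

CREATE VIEW myProject.myDataset.myView AS
SELECT * FROM anotherDataset.myTable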
After running the above CREATE VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.
If the view is not created in the same project used to run the CREATE VIEW statement, then all references in the view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE VIEW query is invalid if it runs in a project different from myProject.
In addition, the OR REPLACE clause requires bigquery.tables.update permission.
If the OPTIONS clause includes an expiration time, then the bigquery.tables.delete permission is also required.
Examples
Creating a new view
The following example creates a view named newview in mydataset:

CREATE VIEW `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
If the view name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
Expiration time: 48 hours from the time the view is created
Friendly name: newview
Description: A view that expires in 2 days
Label: org_unit = development
Creating a view only if the view doesn't exist
The following example creates a view named newview in mydataset only if no view named newview exists in mydataset. If the view name exists in the dataset, no error is returned, and no action is taken.

CREATE VIEW IF NOT EXISTS `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
Expiration time: 48 hours from the time the view is created
Friendly name: newview
Description: A view that expires in 2 days
Label: org_unit = development
Creating or replacing a view
The following example creates a view named newview in mydataset, and if newview exists in mydataset, it is overwritten using the specified query expression.

CREATE OR REPLACE VIEW `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
Expiration time: 48 hours from the time the view is created
Friendly name: newview
Description: A view that expires in 2 days
Label: org_unit = development
Creating a view with column descriptions
The following example creates a view named newview in mydataset. This view definition provides the column description for each column in mytable. You can rename columns from the original query.

CREATE VIEW `myproject.mydataset.newview` (
  column_1_new_name OPTIONS (DESCRIPTION='Description of the column 1 contents'),
  column_2_new_name OPTIONS (DESCRIPTION='Description of the column 2 contents'),
  column_3_new_name OPTIONS (DESCRIPTION='Description of the column 3 contents')
) AS
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
OR REPLACE: Replaces a materialized view with the same name if it exists. Cannot appear with IF NOT EXISTS.
IF NOT EXISTS: If a materialized view or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.
materialized_view_name: The name of the materialized view you're creating. See Table path syntax.
If the project_name is omitted from the materialized view name, or it is the same as the project that runs this DDL query, then the latter is also used as the default project for references to tables, functions, and other resources in query_expression. The default project of the references is fixed and does not depend on the future queries that invoke the new materialized view. Otherwise, all references in query_expression must be qualified with project names.
The materialized view name must be unique per dataset.
partition_expression: An expression that determines how to partition the table. A materialized view can only be partitioned in the same way as the table in query_expression (the base table) is partitioned.
clustering_column_list: A comma-separated list
of column references that determine how to cluster the materialized view.
materialized_view_option_list: Allows you to specify additional materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time.
query_expression: The GoogleSQL query expression used to define the
materialized view.
Details
CREATE MATERIALIZED VIEW statements must comply with the following rules:
Only one CREATE statement is allowed.
Default project in materialized view body
If the materialized view is created in the same project used to run the CREATE MATERIALIZED VIEW statement, the materialized view body query_expression can reference entities without specifying the project; the default project is the project which owns the materialized view. Consider the sample query below.
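Again, the sample itself is missing from this text; a sketch consistent with the discussion that follows:

CREATE MATERIALIZED VIEW myProject.myDataset.myView AS
SELECT * FROM anotherDataset.myTable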
After running the above CREATE MATERIALIZED VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.
If the materialized view is not created in the same project used to run the CREATE MATERIALIZED VIEW statement, then all references in the materialized view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE MATERIALIZED VIEW query is invalid if it runs in a project different from myProject.
materialized_view_option_list
The option list allows you to set materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time. You can include multiple options using a comma-separated list.
Specify a materialized view option list in the following format:
NAME=VALUE, ...
NAME and VALUE must be one of the following combinations:
expiration_timestamp
TIMESTAMP
This property is equivalent to the expirationTime table resource property. expiration_timestamp is optional and not used by default.
max_staleness
INTERVAL
Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND
The max_staleness property provides consistently high performance with controlled costs when processing large, frequently changing datasets. max_staleness is disabled by default.
allow_non_incremental_definition
BOOLEAN
Example: allow_non_incremental_definition=true
The allow_non_incremental_definition property supports an expanded range of SQL queries to create materialized views. allow_non_incremental_definition=true is disabled by default.
Creating a materialized view requires the bigquery.tables.create permission on the dataset where you create the materialized view.
In addition, the OR REPLACE clause requires bigquery.tables.update permission.
If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
Examples
Creating a new materialized view
The following example creates a materialized view named new_mv in mydataset:
CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
OPTIONS(
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "new_mv",
  description = "a materialized view that expires in 2 days",
  labels = [("org_unit", "development")],
  enable_refresh = true,
  refresh_interval_minutes = 20
)
AS SELECT
  column_1,
  SUM(column_2) AS sum_2,
  AVG(column_3) AS avg_3
FROM `myproject.mydataset.mytable`
GROUP BY column_1
If the materialized view name exists in the dataset, the following error is
returned:
When you use a DDL statement to create a materialized view, you must specify
the project, dataset, and materialized view in the following format:
`project_id.dataset.materialized_view` (including the backticks if project_id
contains special characters); for example, `myproject.mydataset.new_mv`.
The materialized view is defined using the following GoogleSQL query:
SELECT column_1, SUM(column_2) AS sum_2, AVG(column_3) AS avg_3
FROM `myproject.mydataset.mytable` GROUP BY column_1
The materialized view option list specifies the:
Expiration time: 48 hours from the time the materialized view is created
Friendly name: new_mv
Description: A materialized view that expires in 2 days
Label: org_unit = development
Refresh enabled: true
Refresh interval: 20 minutes
Creating a materialized view only if the materialized view doesn't exist
The following example creates a materialized view named new_mv in mydataset
only if no materialized view named new_mv exists in mydataset. If the
materialized view name exists in the dataset, no error is returned, and no
action is taken.
CREATE MATERIALIZED VIEW IF NOT EXISTS `myproject.mydataset.new_mv`
OPTIONS(
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "new_mv",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")],
  enable_refresh = false
)
AS SELECT column_1, column_2, column_3
FROM `myproject.mydataset.mytable`
The materialized view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The materialized view option list specifies the:
Expiration time: 48 hours from the time the view is created
Friendly name: new_mv
Description: A view that expires in 2 days
Label: org_unit = development
Refresh enabled: false
Creating a materialized view with partitioning and clustering
The following example creates a materialized view named new_mv in mydataset,
partitioned by the col_datetime column and clustered by the col_int column.
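A sketch of what that statement might look like, assuming the base table is
itself partitioned by DATE(col_datetime):

CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
-- The partitioning must match the base table's partitioning.
PARTITION BY DATE(col_datetime)
CLUSTER BY col_int
AS SELECT
  col_int,
  col_datetime,
  COUNT(1) AS cnt
FROM `myproject.mydataset.mytable`
GROUP BY col_int, col_datetime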
Creates a replica of a materialized view.
The source materialized view must be over an Amazon Simple Storage Service
(Amazon S3) BigLake table. You can use the materialized view replica to make
Amazon S3 data available locally for joins.
replica_name: The name of the materialized view replica you're creating, in
table path syntax. If the project name is omitted from the materialized view
replica name, the current project is used as the default.
The materialized view replica name must be unique for each dataset.
source_materialized_view_name: The name of the materialized view you are
replicating, in table path syntax. The source materialized view must be over
an Amazon S3 BigLake table, and must be authorized on the dataset that
contains that table.
materialized_view_replica_option_list
The option list lets you set materialized view replica options.
Specify a materialized view replica option list in the following format:
NAME=VALUE, ...
NAME
VALUE
Details
replication_interval_seconds
INT64
Specifies how often to replicate the data from the source materialized view
to the replica. Must be a value between 60 and 3,600, inclusive. Defaults to
300 (5 minutes).
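A sketch of the full statement, assuming a hypothetical source materialized
view myproject.s3_dataset.my_s3_mv that is authorized on its dataset:

CREATE MATERIALIZED VIEW `myproject.mydataset.mv_replica`
-- Replicate from the source every 10 minutes instead of the default 5.
OPTIONS(replication_interval_seconds = 600)
AS REPLICA OF `myproject.s3_dataset.my_s3_mv`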
A federated dataset is a connection between BigQuery and an external data
source at the dataset level. For an example, see Create AWS Glue federated
datasets.
Syntax
CREATE EXTERNAL SCHEMA [ IF NOT EXISTS ] dataset_name
[WITH CONNECTION connection_name]
[OPTIONS(external_schema_option_list)]
Arguments
IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement
has no effect.
dataset_name: The name of the dataset to create.
connection_name: Specifies a connection resource that has credentials for
accessing the external data. Specify the connection name in the form
PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a
dash, enclose the connection name in backticks (`).
The dataset is created in the location that you specify in the query
settings. For more information, see Specify locations.
The location must support the kind of federated dataset that you are
creating; for example, you can only create AWS Glue federated datasets in AWS
locations.
The option list specifies options for the federated dataset. Specify the
options in the following format: NAME=VALUE, ...
The following options are supported:
NAME
VALUE
Details
description
STRING
The description of the dataset.
friendly_name
STRING
A descriptive name for the dataset.
labels
ARRAY<STRUCT<STRING, STRING>>
An array of labels for the dataset, expressed as key-value pairs.
location
STRING
The location in which to create the dataset. If you don't specify this
option, the dataset is created in the location where the query runs. If
you specify this option and also explicitly set the location for the query
job, the two values must match; otherwise the query fails. The location
must support the kind of federated dataset that you are creating,
for example, you can only create AWS Glue federated datasets in
AWS locations.
external_source
STRING
The source of the external dataset, such as the Amazon Resource Name (ARN),
with a prefix identifying the source, such as aws-glue://.
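For example, a sketch of a statement that creates an AWS Glue federated
dataset, assuming a hypothetical Glue database ARN and an existing connection
myproject.aws-us-east-1.myconnection:

CREATE EXTERNAL SCHEMA mydataset
WITH CONNECTION `myproject.aws-us-east-1.myconnection`
OPTIONS(
  -- The aws-glue:// prefix identifies the source type.
  external_source = 'aws-glue://arn:aws:glue:us-east-1:123456789012:database/mydatabase',
  location = 'aws-us-east-1'
)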
External tables let BigQuery query data that is stored outside of
BigQuery storage. For more information about external tables, see
Introduction to external data sources.
column_schema: Specifies the schema of the column. It uses the same syntax
as the column_schema definition in the CREATE TABLE statement. If you don't
include this clause, BigQuery detects the schema automatically.
connection_name: Specifies a connection resource that has credentials for
accessing the external data. Specify the connection name in the form
PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a
dash, enclose the connection name in backticks (`).
partition_column_name: The name of a partition column. Include this field if
your external data uses a hive-partitioned layout. For more information, see
Supported data layouts.
The CREATE EXTERNAL TABLE statement does not support creating temporary
external tables.
To create an externally partitioned table, use the WITH PARTITION COLUMNS
clause to specify the partition schema details. BigQuery validates the
column definitions against the external data location. The schema
declaration must strictly follow the ordering of the fields in the external
path. For more information about external partitioning, see Querying
externally partitioned data.
external_table_option_list
The option list specifies options for creating the external table. The
format and uris options are required. Specify the option list in the
following format: NAME=VALUE, ...
Options
allow_jagged_rows
BOOL
If true, allow rows that are missing trailing optional columns.
Applies to CSV data.
allow_quoted_newlines
BOOL
If true, allow quoted data sections that contain newline characters in the
file.
Applies to CSV data.
bigtable_options
STRING
Only required when creating a Bigtable external table.
Specifies the schema of the Bigtable external table in JSON format.
For a list of Bigtable table definition options, see BigtableOptions in the
REST API reference.
compression
STRING
The compression type of the data source. Supported values include: GZIP. If
not specified, the data source is uncompressed.
enable_list_inference
BOOL
If true, use schema inference specifically for the Parquet LIST logical
type.
Applies to Parquet data.
enable_logical_types
BOOL
If true, convert Avro logical types into their corresponding SQL types. For
more information, see Logical types.
Applies to Avro data.
encoding
STRING
The character encoding of the data. Supported values include: UTF8 (or
UTF-8), ISO_8859_1 (or ISO-8859-1).
Applies to CSV data.
enum_as_string
BOOL
If true, infer Parquet ENUM logical type as STRING instead of BYTES by
default.
Applies to Parquet data.
expiration_timestamp
TIMESTAMP
The time when this table expires. If not specified, the table does
not expire.
Example:"2025-01-01 00:00:00 UTC".
field_delimiter
STRING
The separator for fields in a CSV file.
Applies to CSV data.
format
STRING
The format of the external data.
Supported values for CREATE EXTERNAL TABLE include: AVRO, CLOUD_BIGTABLE,
CSV, DATASTORE_BACKUP, DELTA_LAKE (preview), GOOGLE_SHEETS,
NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET.
The value JSON is equivalent to NEWLINE_DELIMITED_JSON.
hive_partition_uri_prefix
STRING
A common prefix for all source URIs before the partition key encoding
begins. Applies only to hive-partitioned external tables.
Applies to Avro, CSV, JSON, Parquet, and ORC data.
Example:"gs://bucket/path".
file_set_spec_type
STRING
Specifies how to interpret source URIs for load jobs and external tables.
Supported values include:
FILE_SYSTEM_MATCH. Expands source URIs by listing files from the object
store. This is the default behavior if FileSetSpecType is not set.
NEW_LINE_DELIMITED_MANIFEST. Indicates that the provided URIs are
newline-delimited manifest files, with one URI per line. Wildcard URIs are
not supported in the manifest files, and all referenced data files must be
in the same bucket as the manifest file.
For example, if you have a source URI of "gs://bucket/path/file" and the
file_set_spec_type is FILE_SYSTEM_MATCH, then the file is used directly as a
data file. If the file_set_spec_type is NEW_LINE_DELIMITED_MANIFEST, then
each line in the file is interpreted as a URI that points to a data file.
ignore_unknown_values
BOOL
If true, ignore extra values that are not represented in the table schema,
without returning an error.
Applies to CSV and JSON data.
json_extension
STRING
For JSON data, indicates a particular JSON interchange format. If
not specified, BigQuery reads the data as generic JSON
records.
max_staleness
INTERVAL
Specifies whether cached metadata is used by operations against the table,
and how fresh the cached metadata must be in order for the operation to use
it.
To disable metadata caching, specify 0. This is the default.
To enable metadata caching, specify an interval literal value between 30
minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour
staleness interval. With this value, operations against the table use cached
metadata if it has been refreshed within the past 4 hours. If the cached
metadata is older than that, the operation falls back to retrieving metadata
from Cloud Storage instead.
null_marker
STRING
The string that represents NULL values in a CSV file.
object_metadata
STRING
Set the value of this option to SIMPLE when creating an object table.
preserve_ascii_control_characters
BOOL
If true, then the embedded ASCII control characters, which are the first 32
characters in the ASCII table, ranging from '\x00' to '\x1F', are preserved.
Applies to CSV data.
projection_fields
STRING
A list of entity properties to load.
Applies to Datastore data.
quote
STRING
The string used to quote data sections in a CSV file. If your data contains
quoted newline characters, also set the allow_quoted_newlines property to
true.
Applies to CSV data.
reference_file_schema_uri
STRING
User provided reference file with the table schema.
require_hive_partition_filter
BOOL
If true, all queries over this table require a partition filter that can be
used to eliminate partitions when reading data.
Applies only to hive-partitioned external tables.
Applies to Avro, CSV, JSON, Parquet, and ORC data.
sheet_range
STRING
Range of a Google Sheets spreadsheet to query from.
Applies to Google Sheets data.
Example:"sheet1!A1:B20",
skip_leading_rows
INT64
The number of rows at the top of a file to skip when reading the
data.
Applies to CSV and Google Sheets data.
uris
For external tables, including object tables, that aren't Bigtable tables:
ARRAY<STRING>
An array of fully qualified URIs for the external data locations. Each URI
can contain one asterisk (*) wildcard character, which must come after the
bucket name. When you specify uris values that target multiple files, all of
those files must share a compatible schema.
In addition, the OR REPLACE clause requires bigquery.tables.update
permission.
If the OPTIONS clause includes an expiration time, then the
bigquery.tables.delete permission is also required.
Examples
The following example creates a BigLake table and explicitly specifies the
schema. It also specifies refreshing the metadata cache automatically at a
system-defined interval.
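A sketch of such a statement, assuming a connection myproject.us.myconnection
and hypothetical Cloud Storage URIs; metadata_cache_mode = 'AUTOMATIC'
requests automatic refresh at a system-defined interval:

CREATE EXTERNAL TABLE dataset.BigLakeTable
(
  x INT64,
  y STRING
)
WITH CONNECTION `myproject.us.myconnection`
OPTIONS(
  format = 'PARQUET',
  uris = ['gs://bucket/path/*.parquet'],
  -- Cached metadata may be up to one day stale.
  max_staleness = INTERVAL 1 DAY,
  metadata_cache_mode = 'AUTOMATIC'
)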
The following example creates an external table from a CSV file and explicitly
specifies the schema. It also specifies the field delimiter ('|') and sets the
maximum number of bad records allowed.
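A sketch of that statement, assuming hypothetical URIs and the
max_bad_records option for the allowed bad records:

CREATE EXTERNAL TABLE dataset.CsvTable
(
  x INT64,
  y STRING
)
OPTIONS(
  format = 'CSV',
  uris = ['gs://bucket/path1.csv', 'gs://bucket/path2.csv'],
  field_delimiter = '|',  -- fields are separated by '|'
  max_bad_records = 5     -- tolerate up to five bad records
)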
The following example creates an externally partitioned table. It uses
schema auto-detection to detect both the file schema and the hive
partitioning layout. If the external path is
gs://bucket/path/field_1=first/field_2=1/data.parquet, the partition columns
are detected as field_1 (STRING) and field_2 (INT64).
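A sketch of that statement:

CREATE EXTERNAL TABLE dataset.AutoHivePartitionedTable
WITH PARTITION COLUMNS
OPTIONS(
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  -- Partition keys are encoded after this prefix.
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false
)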
The following example creates an externally partitioned table by explicitly
specifying the partition columns. This example assumes that the external file
path has the pattern gs://bucket/path/field_1=first/field_2=1/data.parquet.
CREATE EXTERNAL TABLE dataset.CustomHivePartitionedTable
WITH PARTITION COLUMNS (
  field_1 STRING,  -- column order must match the external path
  field_2 INT64
)
OPTIONS(
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false
);
CREATE FUNCTION statement
Creates a new user-defined function (UDF). BigQuery supports UDFs written in
either SQL or JavaScript.
Routine names must contain only letters, numbers, and underscores, and be at
most 256 characters long.
Arguments
OR REPLACE: Replaces any function with the same name if it exists. Cannot
appear with IF NOT EXISTS.
IF NOT EXISTS: If any function exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE.
TEMP or TEMPORARY: Creates a temporary function. If the clause is not
present, the statement creates a persistent UDF. You can reuse persistent
UDFs across multiple queries, whereas you can only use temporary UDFs in a
single query, script, session, or procedure.
project_name: For persistent functions, the name of the project where you
are creating the function. Defaults to the project that runs the DDL query.
Do not include the project name for temporary functions.
dataset_name: For persistent functions, the name of the dataset where you
are creating the function. Defaults to the defaultDataset in the request. Do
not include the dataset name for temporary functions.
function_name: The name of the function.
named_parameter: A comma-separated param_name and param_type pair. The value
of param_type is a BigQuery data type. For a SQL UDF, the value of
param_type can also be ANY TYPE.
determinism_specifier: Applies only to JavaScript UDFs. Provides a hint to
BigQuery as to whether the query result can be cached. Can be one of the
following values:
DETERMINISTIC: The function always returns the same result when passed the
same arguments. The query result is potentially cacheable. For example, if
the function add_one(i) always returns i + 1, the function is deterministic.
NOT DETERMINISTIC: The function does not always return the same result when
passed the same arguments, and therefore is not cacheable. For example, if
the function add_random(i) returns i + rand(), the function is not
deterministic and BigQuery does not use cached results.
If all of the invoked functions are DETERMINISTIC, BigQuery tries to cache
the result, unless the results can't be cached for other reasons. For more
information, see Using cached query results.
data_type: The data type that the function returns.
If the function is defined in SQL, then the RETURNS clause is optional. If
the RETURNS clause is omitted, then BigQuery infers the result type of the
function from the SQL function body when a query calls the function.
If the function is defined in JavaScript, then the RETURNS clause is
required. For more information about allowed values for data_type, see
Supported JavaScript UDF data types.
sql_expression: The SQL expression that defines the function.
javascript_code: The definition of a JavaScript function. The value is a
string literal.
If the code includes quotes and backslashes, it must be either escaped or
represented as a raw string. For example, the code return "\n"; can be
represented as one of the following:
Quoted string: "return \"\\n\";". Both quotes and backslashes need to be
escaped.
Triple quoted string: """return "\\n";""". Backslashes need to be escaped
while quotes do not.
Raw string: r"""return "\n";""". No escaping is needed.
connection_name: Specifies a connection resource that has credentials for
accessing the remote endpoint. Specify the connection name in the form
project_name.location.connection_id. If the project name or location
contains a dash, enclose the connection name in backticks (`).
function_option_list
The option list specifies options for creating a UDF. The following options are
supported:
NAME
VALUE
Details
description
STRING
A description of the UDF. This option isn't supported when creating a temporary function.
library
ARRAY<STRING>
An array of JavaScript libraries to include in the function
definition. Applies only to JavaScript UDFs. For more information, see
Including JavaScript libraries.
user_defined_context
ARRAY<STRUCT<STRING, STRING>>
A list of key-value pairs that will be sent with every HTTP request when the
function is invoked. Applies only to remote functions.
Example: [("key1","value1"),("key2", "value2")]
max_batching_rows
INT64
The maximum number of rows in each HTTP request. If not specified, BigQuery
decides how many rows are included in an HTTP request. Applies only to
remote functions.
The following example creates a persistent remote function named
remoteMultiplyInputs in a dataset named mydataset, assuming mydataset is in
the US location and there is a connection myconnection in the same location
and same project.
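A sketch of that statement, assuming a hypothetical HTTP endpoint behind the
connection:

CREATE FUNCTION mydataset.remoteMultiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
REMOTE WITH CONNECTION `myproject.us.myconnection`
OPTIONS(
  -- Hypothetical endpoint that multiplies its inputs.
  endpoint = 'https://us-central1-myproject.cloudfunctions.net/multiply'
)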
OR REPLACE: Replaces any function with the same name if it exists. OR
REPLACE can't appear with IF NOT EXISTS.
IF NOT EXISTS: If any function exists with the same name, the CREATE
statement has no effect. IF NOT EXISTS can't appear with OR REPLACE.
TEMP or TEMPORARY: The function is temporary; that is, it exists for the
lifetime of a single query, script, session, or procedure. A temporary
function can't have the same name as a built-in function. If the names
match, an error is produced.
If TEMP or TEMPORARY is not included, a persistent function is created. You
can reuse persistent functions across multiple queries.
function_path: The path where the function must be created and the name of
the function.
project_name: For persistent functions, the name of the project where you
are creating the function. Defaults to the project that runs the DDL query.
Don't include the project name for temporary functions.
dataset_name: For persistent functions, the name of the dataset where you
are creating the function. Defaults to defaultDataset in the request. Don't
include the dataset name for temporary functions.
function_name: The name of the function. Function names must contain only
letters, numbers, and underscores, and be at most 256 characters long.
function_parameter: A parameter for the function.
parameter_name: The name of the function parameter.
parameter_data_type: The GoogleSQL data type for the function parameter.
NOT AGGREGATE: The function parameter is not an aggregate. A non-aggregate
function parameter can appear anywhere in the function definition.
return_data_type: The GoogleSQL data type that the function should return.
GoogleSQL infers the result data type of the function from the function body
when the RETURNS clause is omitted.
function_body: The SQL expression that defines the function body.
function_option_list: A list of options for creating the function. For more
information, see function_option_list.
function_option_list
The option list specifies options for creating a SQL UDAF. The following
options are supported:
In addition, the OR REPLACE clause requires the bigquery.routines.update
permission.
Examples
Create and call a SQL UDAF
The following example shows a persistent SQL UDAF that includes a
non-aggregate function parameter. Inside the function definition, the
aggregate SUM method takes the aggregate function parameter dividend, while
the non-aggregate division operator (/) takes the non-aggregate function
parameter divisor.
CREATE AGGREGATE FUNCTION myProject.myDataset.ScaledSum(
  dividend FLOAT64,
  divisor FLOAT64 NOT AGGREGATE
)
RETURNS FLOAT64
AS (
  SUM(dividend) / divisor
);

-- Call the SQL UDAF.
SELECT ScaledSum(col1, 2) AS scaled_sum
FROM (
  SELECT 1 AS col1 UNION ALL
  SELECT 3 AS col1 UNION ALL
  SELECT 5 AS col1
);

/*------------*
 | scaled_sum |
 +------------+
 | 4.5        |
 *------------*/
OR REPLACE: Replaces any function with the same name if it exists. OR
REPLACE can't appear with IF NOT EXISTS.
IF NOT EXISTS: If any function exists with the same name, the CREATE
statement has no effect. IF NOT EXISTS can't appear with OR REPLACE.
TEMP or TEMPORARY: The function is temporary; that is, it exists for the
lifetime of a single query, script, session, or procedure. A temporary
function can't have the same name as a built-in function. If the names
match, an error is produced.
If TEMP or TEMPORARY is not included, a persistent function is created. You
can reuse persistent functions across multiple queries.
function_path: The path where the function must be created and the name of
the function.
project_name: For persistent functions, the name of the project where you
are creating the function. Defaults to the project that runs the DDL query.
Don't include the project name for temporary functions.
dataset_name: For persistent functions, the name of the dataset where you
are creating the function. Defaults to defaultDataset in the request. Don't
include the dataset name for temporary functions.
function_name: The name of the function. Function names must contain only
letters, numbers, and underscores, and be at most 256 characters long.
function_parameter: A parameter for the function.
parameter_name: The name of the function parameter.
parameter_data_type: The GoogleSQL data type for the function parameter.
NOT AGGREGATE: The function parameter is not an aggregate. Only one
non-aggregate function parameter is allowed per JavaScript UDAF, and it must
be the last parameter in the list.
return_data_type: The GoogleSQL data type that the function should return.
function_body: The JavaScript expression that defines the function body.
For more information, see function_body.
function_option_list: A list of options for creating the function. For more
information, see function_option_list.
function_body
The body of the JavaScript function must be a quoted string literal that
represents the JavaScript code. To learn more about the different types of
quoted string literals you can use, see Formats for quoted literals.
The JavaScript function body must include four JavaScript functions that
initialize, aggregate, merge, and finalize the results for the JavaScript
UDAF. To learn more about the initialState, aggregate, merge, and finalize
JavaScript functions, see Required aggregate functions in a JavaScript UDAF.
Only serialized data can be passed into the JavaScript aggregate functions.
If you need to serialize data such as functions or symbols to pass them into
the aggregate functions, use the JavaScript serialization functions. For
more information, see Serialization functions for a JavaScript UDAF.
function_option_list
The option list specifies options for creating a JavaScript UDAF. The following
options are supported:
NAME
VALUE
Details
description
STRING
A description of the UDAF.
library
ARRAY<STRING>
An array of JavaScript libraries to include in the
JavaScript UDAF function body.
ARRAY
Array
An array of arrays is not supported. To get around this limitation, use the
Array<Object<Array>> (JavaScript) and ARRAY<STRUCT<ARRAY>> (GoogleSQL) data
types.
BIGNUMERIC
Number or String
Same as NUMERIC.
BOOL
Boolean
BYTES
Uint8Array
DATE
Date
FLOAT64
Number
INT64
BigInt
JSON
Various types
The GoogleSQL JSON data type can be converted into a JavaScript Object,
Array, or other GoogleSQL-supported JavaScript data type.
NUMERIC
Number or String
If a NUMERIC value can be represented exactly as an IEEE 754 floating-point
value (range [-2^53, 2^53]), and has no fractional part, it is encoded as a
Number data type; otherwise it is encoded as a String data type.
STRING
String
STRUCT
Object
Each STRUCT field is a named property in the Object data type. An unnamed
STRUCT field is not supported.
TIMESTAMP
Date
Date contains a microsecond field with the microsecond fraction of
TIMESTAMP.
Required aggregation functions in a JavaScript UDAF
The JavaScript function body must include the following exportable
JavaScript functions:
initialState function: Sets up the initial aggregation state of the UDAF and
then returns the initial aggregation state.
partialState: The second aggregation state to merge.
nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.
Details:
Depending on the size and organization of the underlying data being queried,
the merge function might or might not be called. For example, if a
particular set of data is small, or the data is partitioned in a way that
results in small sets of data, the merge function won't be called.
nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.
The final aggregation state is returned by the merge function (or the
aggregate function if merge is never invoked). If the input is empty after
NULL filtering, the final aggregation state is initialState.
Example:
export function finalize(finalState, initialSum)
Serialization functions for a JavaScript UDAF
If you want to work with non-serializable aggregation states, the JavaScript
UDAF must provide the serialize and deserialize functions:
serialize function: Converts an aggregation state into a
BigQuery-serializable object. An object in JavaScript is
BigQuery-serializable if all fields are a JavaScript primitive data type
(for example, String, Number, null, undefined), another
BigQuery-serializable object, or a JavaScript Array where all elements are
either primitives or BigQuery-serializable objects.
deserialize function: Converts a serialized state into an aggregation state.
An aggregated state can be passed into the serialize, aggregate, merge, and
finalize functions.
In addition, the OR REPLACE clause requires the bigquery.routines.update
permission.
Examples
Calculate the positive sum of all rows
A JavaScript UDAF is similar to a JavaScript UDF, but defines an
aggregate function instead of a scalar function. In the following example,
a temporary JavaScript UDAF calculates the sum of all rows that have a
positive value. The JavaScript UDAF body is quoted within a raw string:
CREATE TEMP AGGREGATE FUNCTION SumPositive(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS r'''
  export function initialState() {
    return {sum: 0}
  }
  export function aggregate(state, x) {
    if (x > 0) {
      state.sum += x;
    }
  }
  export function merge(state, partialState) {
    state.sum += partialState.sum;
  }
  export function finalize(state) {
    return state.sum;
  }
''';

-- Call the JavaScript UDAF.
WITH numbers AS (
  SELECT * FROM UNNEST([1.0, -1.0, 3.0, -3.0, 5.0, -5.0]) AS x
)
SELECT SumPositive(x) AS sum FROM numbers;

/*-----*
 | sum |
 +-----+
 | 9.0 |
 *-----*/
Get the weighted average of all rows
A JavaScript UDAF can have aggregate and non-aggregate parameters. In the
following example, the JavaScript UDAF calculates the weighted average for x
after starting with an initial sum (initialSum). x and weight are aggregate
parameters, and initialSum is a non-aggregate parameter:
CREATE OR REPLACE AGGREGATE FUNCTION my_project.my_dataset.WeightedAverage(
  x INT64,
  weight FLOAT64,
  initialSum FLOAT64 NOT AGGREGATE
)
RETURNS INT64
LANGUAGE js
AS '''
  export function initialState(initialSum) {
    return {count: 0, sum: initialSum}
  }
  export function aggregate(state, x, weight) {
    state.count += 1;
    state.sum += Number(x) * weight;
  }
  export function merge(state, partialState) {
    state.sum += partialState.sum;
    state.count += partialState.count;
  }
  export function finalize(state) {
    return state.sum / state.count;
  }
''';

SELECT my_project.my_dataset.WeightedAverage(item, weight, 2) AS weighted_average
FROM (
  SELECT 1 AS item, 2.45 AS weight UNION ALL
  SELECT 3 AS item, 0.11 AS weight UNION ALL
  SELECT 5 AS item, 7.02 AS weight
);

/*------------------*
 | weighted_average |
 +------------------+
 | 13               |
 *------------------*/
CREATE TABLE FUNCTION statement
Creates a new table function, also called a table-valued function (TVF).
OR REPLACE: Replaces any table function with the same name if it exists.
Cannot appear with IF NOT EXISTS.
IF NOT EXISTS: If any table function exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE.
project_name: The name of the project where you are creating the function.
Defaults to the project that runs this DDL statement.
dataset_name: The name of the dataset where you are creating the function.
function_name: The name of the function to create.
function_parameter: A parameter for the function, specified as a parameter
name and a data type. The value of data_type is a scalar BigQuery data type
or ANY TYPE.
RETURNS TABLE: The schema of the table that the function returns, specified
as a comma-separated list of column name and data type pairs. If RETURNS
TABLE is absent, BigQuery infers the output schema from the query statement
in the function body. If RETURNS TABLE is included, the names in the
returned table type must match column names from the SQL query.
sql_query: Specifies the SQL query to run. The SQL query must include names
for all columns.
table_function_options_list
The table_function_options_list lets you specify table function options.
Table function options have the same syntax and requirements as table
options but with a different list of NAMEs and VALUEs:
NAME
VALUE
Details
description
STRING
The description of the table function.
Details
BigQuery coerces argument types when possible. For example, if the parameter
type is FLOAT64 and you pass an INT64 value, then BigQuery coerces it to a
FLOAT64.
If a parameter type is ANY TYPE, the function accepts an input of any type
for this argument. The type that you pass to the function must be compatible
with the function definition. If you pass an argument with an incompatible
type, the query returns an error. If more than one parameter has type ANY
TYPE, BigQuery does not enforce any type relationship between them.
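For example, a minimal table function (a sketch, assuming the public table
bigquery-public-data.usa_names.usa_1910_current):

CREATE OR REPLACE TABLE FUNCTION mydataset.names_by_year(y INT64)
-- The returned column names must match the names in the query.
RETURNS TABLE<name STRING, year INT64, total INT64>
AS
  SELECT name, year, SUM(number) AS total
  FROM `bigquery-public-data.usa_names.usa_1910_current`
  WHERE year = y
  GROUP BY name, year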
CREATE [OR REPLACE] PROCEDURE [IF NOT EXISTS]
[[project_name.]dataset_name.]procedure_name(procedure_argument[, ...])
[EXTERNAL SECURITY external_security]
WITH CONNECTION connection_project_id.connection_region.connection_id
[OPTIONS(procedure_option_list)]
LANGUAGE language [AS pyspark_code]

procedure_argument: [procedure_argument_mode] argument_name argument_type

procedure_argument_mode: IN | OUT | INOUT

external_security: INVOKER
Arguments
OR REPLACE: Replaces any procedure with the same name if it exists. Cannot
appear with IF NOT EXISTS.
IF NOT EXISTS: If any procedure exists with the same name, the CREATE
statement has no effect. Cannot appear with OR REPLACE.
project_name: The name of the project where you are creating the procedure.
Defaults to the project that runs this DDL query. If the project name
contains special characters such as colons, quote it in backticks (example:
`google.com:my_project`).
dataset_name: The name of the dataset where you are creating the procedure.
Defaults to the defaultDataset in the request.
procedure_name: The name of the procedure to create.
external_security: Causes the procedure to be executed with the privileges
of the user that calls it.
connection_project_id: the project that contains the connection to run
Spark procedures, for example, myproject.
connection_region: the region that contains the connection to run Spark
procedures, for example, us.
connection_id: the connection ID, for example, myconnection.
When you view the connection details in the Google Cloud console, the
connection ID is the value in the last section of the fully qualified
connection ID that is shown in Connection ID, for example,
projects/myproject/locations/connection_location/connections/myconnection.
language: The language in which the stored procedure for Apache Spark is
written. BigQuery supports stored procedures for Apache Spark that are
written in Python, Java, or Scala.
pyspark_code: The PySpark code for the stored procedure for Apache Spark if
you want to pass the body of the procedure inline. Cannot appear with
main_file_uri in procedure_option_list.
procedure_argument_mode: Specifies whether an argument is an input, an
output, or both.
procedure_option_list
The procedure_option_list lets you specify procedure options. Procedure
options have the same syntax and requirements as table options but with a
different list of NAMEs and VALUEs:
NAME
VALUE
Details
strict_mode
BOOL
If strict_mode is TRUE, the procedure body undergoes additional checks for
errors such as non-existent tables or columns. The CREATE PROCEDURE
statement fails if the body fails any of these checks.
If strict_mode is FALSE, the procedure body is checked only for syntax.
Procedures which invoke themselves recursively should be created with
strict_mode=FALSE to avoid errors caused by the procedure not yet existing
while it is being validated.
Strict mode is useful for catching many common types of errors, but the
checks are not exhaustive, and successful creation of a procedure with
strict_mode doesn't guarantee that the procedure will successfully execute
at runtime.
Default value is TRUE.
Example: strict_mode=FALSE
description
STRING
A description of the procedure. Example: description="A procedure that runs
a query."
engine
STRING
The engine type for processing stored procedures for Apache Spark. Must be specified for
stored procedures for Spark.
Valid value: engine="SPARK"
runtime_version
STRING
The runtime version of stored procedures for Spark. If not specified, the
system default runtime version is used. Stored procedures for Spark support
the same list of runtime versions as Dataproc Serverless. However, we
recommend specifying a runtime version. For more information, see Dataproc
Serverless Spark runtime releases.
Example: runtime_version="1.1"
container_image
STRING
Custom container image for the runtime environment of the stored procedure for Spark.
If not specified, the system default container image that includes the
default Spark, Java, and Python packages associated with a runtime
version is used.
You can provide a custom container Docker image that includes your own
built Java or Python dependencies. As Spark is mounted into your custom
container at runtime, you must omit Spark in your custom container
image.
properties
ARRAY<STRUCT<STRING, STRING>>
A key-value pair to include properties for stored procedures for Spark.
Stored procedures for Spark support most of the Spark properties and a list
of custom Dataproc Serverless properties. If you specify unsupported Spark
properties such as YARN-related Spark properties, BigQuery fails to create
the stored procedure. You can add Spark properties using the following
format: [("key1","value1"),("key2", "value2")]
main_file_uri
STRING
The Cloud Storage URI of the main Python, Scala, or Java JAR file of the
Spark application. Applies only to stored procedures for Spark.
Alternatively, if you want to add the body of the stored procedure that's
written in Python in the CREATE PROCEDURE statement, add the code after
LANGUAGE PYTHON AS as shown in the example in Use inline code.
main_class
STRING
Applies only to stored procedures for Spark written in Java and Scala.
Specify a fully-qualified class name in a JAR set with the jar_uris option.
You can set only one of main_file_uri and main_class.
Example: main_class="com.example.wordcount"
py_file_uris
ARRAY<STRING>
Python files to be placed on the PYTHONPATH for a PySpark application.
Applies only to stored procedures for Apache Spark written in Python.
Optional. Cloud Storage URIs of Python files to pass to the PySpark
framework. Supported file formats include the following: .py, .egg, and
.zip.
file_uris
ARRAY<STRING>
Files to be placed in the working directory of each executor. Applies only
to stored procedures for Apache Spark.
Optional. Cloud Storage URIs of files to be placed in the working directory
of each executor. Example: file_uris=["gs://my-bucket/my-file1",
"gs://my-bucket/my-file2"]
archive_uris
ARRAY<STRING>
Archive files to be extracted into the working directory of each
executor. Applies only to stored procedures for Apache Spark.
Optional. Cloud Storage URIs of archives to be extracted into
the working directory of each executor. Supported file formats include
the following: .jar, .tar, .tar.gz, .tgz, and .zip.
IN indicates that the argument is only an input to the procedure. You can
specify either a variable or a value expression for IN arguments.
OUT indicates that the argument is an output of the procedure. An OUT
argument is initialized to NULL when the procedure starts. You must specify
a variable for OUT arguments.
INOUT indicates that the argument is both an input to and an output from
the procedure. You must specify a variable for INOUT arguments. An INOUT
argument can be referenced in the body of a procedure as a variable and
assigned new values.
If neither IN, OUT, nor INOUT is specified, the argument is treated as an
IN argument.
Variable scope
If a variable is declared outside a procedure, passed as an INOUT or OUT argument to a procedure, and the procedure assigns a new value to that variable, that new value is visible outside of the procedure.
Variables declared in a procedure are not visible outside of the procedure,
and vice versa.
An OUT or INOUT argument can be assigned a value using SET, in which case
the modified value is visible outside of the procedure. If the procedure
exits successfully, then the value of the OUT or INOUT argument is the final
value assigned to that variable.
Temporary tables exist for the duration of the script, so if a procedure
creates a temporary table, the caller of the procedure can reference the
temporary table as well.
Default project in procedure body
Procedure bodies can reference entities without specifying the project; the
default project is the project which owns the procedure, not necessarily the
project used to run the CREATE PROCEDURE statement. Consider the sample
query below.
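A minimal version of such a procedure (a sketch, assuming
anotherDataset.myTable exists in myProject):

-- anotherDataset.myTable resolves against myProject, the project
-- that owns the procedure.
CREATE PROCEDURE myProject.myDataset.QueryTable()
BEGIN
  SELECT * FROM anotherDataset.myTable;
END;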
After creating the above procedure, you can run the query
CALL myProject.myDataset.QueryTable(). Regardless of the project you choose
to run this CALL query, the referenced table anotherDataset.myTable is
always resolved against project myProject.
The following example creates a SQL procedure that both takes x as an input
argument and returns x as output; because no argument mode is present for
the argument delta, it is an input argument. The procedure consists of a
block containing a single statement, which assigns the sum of the two input
arguments to x.
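A sketch of that procedure:

CREATE PROCEDURE mydataset.AddDelta(INOUT x INT64, delta INT64)
BEGIN
  -- delta has no argument mode, so it is an IN argument.
  SET x = x + delta;
END;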
The following example calls the AddDelta procedure from the example above,
passing it the variable accumulator both times; because the changes to x
within AddDelta are visible outside of AddDelta, these procedure calls
increment accumulator by a total of 8.
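A sketch of those calls:

DECLARE accumulator INT64 DEFAULT 0;
CALL mydataset.AddDelta(accumulator, 5);
CALL mydataset.AddDelta(accumulator, 3);
SELECT accumulator;  -- returns 8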
The following example creates the procedure SelectFromTablesAndAppend, which
takes target_date as an input argument and returns rows_added as an output.
The procedure creates a temporary table DataForTargetDate from a query;
then, it calculates the number of rows in DataForTargetDate and assigns the
result to rows_added. Next, it inserts a new row into TargetTable, passing
the value of target_date as one of the column names. Finally, it drops the
table DataForTargetDate and returns rows_added.
CREATE PROCEDURE mydataset.SelectFromTablesAndAppend(
  target_date DATE, OUT rows_added INT64)
BEGIN
  CREATE TEMP TABLE DataForTargetDate AS
  SELECT t1.id, t1.x, t2.y
  FROM dataset.partitioned_table1 AS t1
  JOIN dataset.partitioned_table2 AS t2
  ON t1.id = t2.id
  WHERE t1.date = target_date
    AND t2.date = target_date;

  SET rows_added = (SELECT COUNT(*) FROM DataForTargetDate);

  SELECT id, x, y, target_date  -- note that target_date is a parameter
  FROM DataForTargetDate;

  DROP TABLE DataForTargetDate;
END;
The following example declares a variable rows_added, then passes it as an
argument to the SelectFromTablesAndAppend procedure from the previous
example, along with the value of CURRENT_DATE; then it returns a message
stating how many rows were added.
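A sketch of that script, assuming a hypothetical message format:

DECLARE rows_added INT64;
CALL mydataset.SelectFromTablesAndAppend(CURRENT_DATE(), rows_added);
SELECT FORMAT('Added %d rows', rows_added);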
IF NOT EXISTS: If any row-level access policy exists with the same name, the
CREATE statement has no effect. Cannot appear with OR REPLACE.
row_access_policy_name: The name of the row-level access policy that you
are creating. The row-level access policy name must be unique for each
table. The row-level access policy name can contain the following:
Up to 256 characters.
Letters (upper or lowercase), numbers, and underscores. Must start with a
letter.
table_name: The name of the table that you want to create a row-level
access policy for. The table must already exist.
GRANT TO grantee_list: An optional clause that specifies the initial
members that the row-level access policy should be created with.
grantee_list is a list of iam_member users or groups. Strings must be valid
IAM principals, or members, following the format of an IAM Policy Binding
member, and must be quoted. The following types are supported:
grantee_listtypes
user:{emailid}
An email address that represents a specific Google
account.
Example: user:alice@example.com
serviceAccount:{emailid}
An email address that represents a service account.
domain:{domain}
The Google Workspace domain (primary) that represents all the users of that
domain.
Example: domain:example.com
allAuthenticatedUsers
A special identifier that represents all service
accounts and all users on the internet who have authenticated with a
Google Account. This identifier includes accounts that aren't connected to a
Google Workspace or Cloud Identity domain, such as personal Gmail accounts.
Users who aren't authenticated, such as anonymous visitors, aren't included.
allUsers
A special identifier that represents anyone who is on the
internet, including authenticated and unauthenticated users. Because
BigQuery requires authentication before a user can access the
service, allUsers includes only authenticated users.
You can combine a series of iam_member values, if they are comma-separated
and quoted separately. For example:
"user:alice@example.com","group:admins@example.com","user:sales@example.com"
filter_expression: Defines the subset of table rows to show only to the
members of the grantee_list. The filter_expression is similar to the WHERE
clause in a SELECT query.
The following are valid filter expressions:
GoogleSQL scalar functions.
SESSION_USER(), to restrict access only to rows that belong to the user
running the query. If none of the row-level access policies are applicable
to the querying user, then the user has no access to the data in the table.
TRUE. Grants the principals in the grantee_list field access to all rows of
the table.
The filter expression cannot contain the following:
commitment_id: The ID of the commitment. The value must be unique to the
project and location. It must start and end with a lowercase letter or a
number and contain only lowercase letters, numbers and dashes.
The option list specifies options for the reservation. Specify the options
in the following format: NAME=VALUE, ...
The following options are supported:
NAME
TYPE
Details
ignore_idle_slots
BOOLEAN
If the value is true, then the reservation uses only the slots that are
provisioned to it. The default value is false.
For more information, see Idle slots.
slot_capacity
INTEGER
The number of slots to allocate to the reservation. If this reservation was
created with an edition, this is equivalent to the amount of baseline slots.
target_job_concurrency
INTEGER
A soft upper bound on the number of jobs that can run concurrently in this reservation.
assignment_id: The ID of the assignment. The value must be unique to the
project and location. It must start and end with a lowercase letter or a
number and contain only lowercase letters, numbers and dashes.
To remove a project from any reservations and use on-demand billing instead,
set reservation_id to none.
assignment_option_list
The option list specifies options for the assignment. Specify the options in
the following format: NAME=VALUE, ...
The following options are supported:
NAME
TYPE
Details
assignee
String
The ID of the project, folder, or organization to assign to the
reservation.
job_type
String
The type of job to assign to this reservation. Supported values include
QUERY, PIPELINE, ML_EXTERNAL, CONTINUOUS, and BACKGROUND.
For more information, seeAssignments.
IF NOT EXISTS: If there is already a search index by that name on the table,
do nothing. If the table has a search index by a different name, then return
an error.
index_name: The name of the search index you're creating. Since the search
index is always created in the same project and dataset as the base table,
there is no need to specify these in the name.
ALL COLUMNS: If data types are not specified, creates a search index on
every column in the table which contains a STRING field. If data types are
specified, creates a search index on every column in the table which
matches any of the data types specified.
column_name: The name of a top-level column in the table which is one of
the following supported data types or contains a field with one of the
supported data types:
You can create only one search index per base table. You cannot create a
search index on a view or materialized view. To modify which columns are
indexed, DROP the current index and create a new one.
BigQuery returns an error if any column_name is not a STRING or does not
contain a STRING field, or if you call CREATE SEARCH INDEX on ALL COLUMNS
of a table which contains no STRING fields.
Creating a search index fails on a table which has column ACLs or row
filters; however, these may all be added to the table after creation of the
index.
index_option_list
The option list specifies options for the search index. Specify the options in
the following format:NAME=VALUE, ...
The following options are supported:
NAME
VALUE
Details
analyzer
STRING
Example: analyzer='LOG_ANALYZER'
The text analyzer to use to generate tokens for the search index. The
supported values are 'LOG_ANALYZER', 'NO_OP_ANALYZER', and
'PATTERN_ANALYZER'.
analyzer_options
JSON-formatted STRING
The text analyzer configurations to set when creating a search index.
Supported when analyzer is equal to 'LOG_ANALYZER' or 'PATTERN_ANALYZER'.
For examples of JSON-formatted strings with different text analyzers, see
Work with text analyzers.
data_types
ARRAY<STRING>
An array of data types to set when creating a search index. Supported data
types are STRING, INT64, and TIMESTAMP. If data_types is not set, STRING
fields are indexed by default.
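For example, the following sketch creates a search index on all string
columns of a hypothetical table my_dataset.my_table:

CREATE SEARCH INDEX my_index
ON my_dataset.my_table(ALL COLUMNS)
OPTIONS(analyzer = 'LOG_ANALYZER')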
OR REPLACE: Replaces any vector index with the same name if it exists. Can't
appear with IF NOT EXISTS.
IF NOT EXISTS: If there is already a vector index by that name on the table,
do nothing. If the table has a vector index by a different name, then return
an error.
index_name: The name of the vector index you're creating. Since the index
is always created in the same project and dataset as the base table, there
is no need to specify these in the name.
column_name: The name of a column with a type of ARRAY<FLOAT64>. The column
can't have any child fields. All elements in the array must be non-NULL,
and all values in the column must have the same array dimensions.
stored_column_name: The name of a top-level column in the table to store in
the vector index. The column type can't be RANGE. Stored columns are not
used if the table has a row-level access policy or the column has a policy
tag. To learn more, see Store columns and pre-filter.
You can create only one vector index per table. You can't create a vector
index on a table that already has a search index with the same index name.
To modify which column is indexed, DROP the current index and create a new
one.
index_option_list
The option list specifies options for the vector index. Specify the options in
the following format:NAME=VALUE, ...
The following options are supported:
NAME
VALUE
Details
index_type
STRING
Required. The algorithm to use to build the vector index. The supported
values are IVF and TREE_AH.
IVF: Specifying IVF builds the vector index as an inverted file index
(IVF). An IVF uses a k-means algorithm to cluster the vector data, and then
partitions the vector data based on those clusters. When you use the
VECTOR_SEARCH function to search the vector data, it can use these
partitions to reduce the amount of data it needs to read in order to
determine a result.
TREE_AH (Preview): Uses Google's ScaNN algorithm. TREE_AH is a
tree-quantization based index, leveraging k-means clustering for
partitioning and asymmetric hashing (product quantization) for fast
approximate distance computation. For more information, see TreeAH index.
distance_type
STRING
Specifies the default distance type to use when performing a vector search
using this index. The supported values are EUCLIDEAN, COSINE, and
DOT_PRODUCT. EUCLIDEAN is the default.
The index creation itself always uses EUCLIDEAN distance for training, but
the distance used in the VECTOR_SEARCH function can be different.
If you specify a value for the distance_type argument of the VECTOR_SEARCH
function, that value is used instead of the vector index's distance_type
value.
ivf_options
JSON-formatted STRING
The options to use with the IVF algorithm. Defaults to '{}' to denote that
all underlying options use their corresponding default values.
The only supported option is num_lists. Specify an INT64 less than or equal
to 5,000 that determines how many lists the IVF algorithm creates.
For example, ivf_options = '{"num_lists":1000}'.
The IVF algorithm divides the whole data space into a number of lists equal
to the num_lists value, with data points that are closer to each other
being more likely to be put on the same list.
If the num_lists value is small, you have fewer lists with more data
points, while a larger value creates more lists with fewer data points.
You can use num_lists in combination with the fraction_lists_to_search
argument of the VECTOR_SEARCH function to create an efficient vector
search. When you have data that is distributed in many small groups in the
embedding space, specify a high num_lists value to create an index with
more lists, and specify a lower fraction_lists_to_search value to scan
fewer of those lists in vector search. Use a lower num_lists value and a
higher fraction_lists_to_search value when your data is distributed in
fewer, larger groups. Using a high num_lists value might make the vector
index take longer to build.
If you don't specify a value for num_lists, BigQuery calculates an
appropriate value.
The statement fails if ivf_options is specified and index_type is not IVF.
tree_ah_options
JSON-formatted STRING
The options to use with the TREE_AH algorithm. Defaults to '{}' to denote
that all underlying options use their corresponding default values.
Two options are supported: leaf_node_embedding_count and normalization_type.
leaf_node_embedding_count: the approximate number of vectors in each leaf
node. The minimum value is 500. This is the inverse of num_lists in
ivf_options. Defaults to 1000.
normalization_type: the type of normalization performed on each base table
and query vector prior to any processing. The supported values are NONE and
L2. L2 is also referred to as the Euclidean norm. Defaults to NONE.
For example: tree_ah_options = '{"leaf_node_embedding_count": 1000,
"normalization_type": "L2"}'
The statement fails if tree_ah_options is specified and index_type is not
TREE_AH.
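For example, the following sketch creates an IVF vector index on a
hypothetical ARRAY<FLOAT64> column named embedding:

CREATE VECTOR INDEX my_vector_index
ON my_dataset.my_table(embedding)
OPTIONS(
  index_type = 'IVF',
  distance_type = 'COSINE',
  -- Cluster the vectors into 1,000 lists.
  ivf_options = '{"num_lists": 1000}'
)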
ALTER SCHEMA [IF EXISTS]
[project_name.]dataset_name
SET DEFAULT COLLATE collate_specification
Arguments
IF EXISTS: If no dataset exists with that name, the statement has no effect.
DEFAULT COLLATE collate_specification: When a new table is created in the
dataset, the table inherits a default collation specification unless a
collation specification is explicitly specified for a column.
The updated collation specification only applies to tables created
afterwards.
project_name: The name of the project that contains the dataset. Defaults
to the project that runs this DDL statement.
+---------------------+
| mydataset.mytable_a |
| number INT64        |
| word STRING         |
+---------------------+
At a later time, you decide to add a collation specification to your
dataset. For example:
ALTER SCHEMA mydataset
SET DEFAULT COLLATE 'und:ci'
If you create a new table for your dataset, it inherits COLLATE 'und:ci' for
all STRING columns. For example, collation is added to the characters column
when you create the mytable_b table in the mydataset dataset.
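For instance, a sketch of that table creation:

CREATE TABLE mydataset.mytable_b
(
  number INT64,
  characters STRING
)

The resulting schema includes the inherited collation:

+------------------------------------+
| mydataset.mytable_b                |
| number INT64                       |
| characters STRING COLLATE 'und:ci' |
+------------------------------------+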
However, although you have updated the collation specification for the
dataset, your existing table, mytable_a, continues to use the previous
collation specification. For example:
+---------------------+
| mydataset.mytable_a |
| number INT64        |
| word STRING         |
+---------------------+
ALTER SCHEMA SET OPTIONS statement
Sets options on a dataset.
The statement runs in the location of the dataset if the dataset exists,
unless you specify the location in the query settings. For more information,
see Specifying your location.
The option list specifies options for the dataset. Specify the options in
the following format: NAME=VALUE, ...
The following options are supported:
NAME
VALUE
Details
default_kms_key_name
STRING
Specifies the default Cloud KMS key for encrypting table data
in this dataset. You can override this value when you create a table.
default_partition_expiration_days
FLOAT64
Specifies the default expiration time, in days, for table partitions in
this dataset. You can override this value when you create a table.
default_rounding_mode
STRING
Example:default_rounding_mode = "ROUND_HALF_EVEN"
This specifies the defaultRoundingMode that is used for new tables created
in this dataset. It does not impact existing tables.
The following values are supported:
"ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are
rounded away from zero. For example, 2.25 is rounded to 2.3, and
-2.25 is rounded to -2.3.
"ROUND_HALF_EVEN": Halfway cases are rounded towards
the nearest even digit. For example, 2.25 is rounded to 2.2 and
-2.25 is rounded to -2.2.
default_table_expiration_days
FLOAT64
Specifies the default expiration time, in days, for tables in this
dataset. You can override this value when you create a table.
description
STRING
The description of the dataset.
failover_reservation
STRING
Associates the dataset to a reservation in the case of a failover scenario.
friendly_name
STRING
A descriptive name for the dataset.
is_case_insensitive
BOOL
TRUE if the dataset and its table names are case-insensitive, otherwise
FALSE. By default, this is FALSE, which means the dataset and its table
names are case-sensitive.
Datasets: mydataset and MyDataset can coexist in the same project, unless
one of them has case-sensitivity turned off.
Tables: mytable and MyTable can coexist in the same dataset if
case-sensitivity for the dataset is turned on.
is_primary
BOOLEAN
Declares if the dataset is the primary replica.
labels
<ARRAY<STRUCT<STRING, STRING>>>
An array of labels for the dataset, expressed as key-value pairs.
max_time_travel_hours
SMALLINT
Specifies the duration in hours of the time travel window for the dataset.
The max_time_travel_hours value must be an integer expressed in multiples
of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168
hours is the default if this option isn't specified.
storage_billing_model
STRING
Alters the storage billing model for the dataset. Set the
storage_billing_model value to PHYSICAL to use physical bytes when
calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is
the default.
The storage_billing_model option is only available for datasets that have
been updated after December 1, 2022. For datasets that were last updated
before that date, the storage billing model is LOGICAL.
When you change a dataset's billing model, it takes 24 hours for the change
to take effect.
Once you change a dataset's storage billing model, you must wait 14 days
before you can change the storage billing model again.
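For example, a sketch that sets a default table expiration on mydataset:

ALTER SCHEMA mydataset
SET OPTIONS(
  default_table_expiration_days = 3.75
)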
The option list specifies options for the replica. Specify the options in
the following format: NAME=VALUE, ...
The following options are supported:
NAME
VALUE
Details
location
STRING
The location in which to create the replica.
replica_kms_key
STRING
The Cloud Key Management Service key set in the destination region.
replica_kms_key is used as a substitute encryption key in the destination
region for any keys used in the source region. Any table in the source
region that's encrypted with a Cloud KMS key is encrypted with the
replica_kms_key. This value must be a Cloud KMS key created in the replica
dataset's region, not the source dataset's region. For more information
about setting up a Cloud KMS key, see Grant encryption and decryption
permission.
expiration_timestamp (TIMESTAMP): This property is equivalent to the expirationTime table resource property.

partition_expiration_days (FLOAT64): Example: partition_expiration_days=7. Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned.

require_partition_filter (BOOL): Example: require_partition_filter=true. Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false.

friendly_name (STRING): This property is equivalent to the friendlyName table resource property.

description (STRING): Example: description="a table that expires in 2025". This property is equivalent to the description table resource property.

labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.

default_rounding_mode (STRING): Example: default_rounding_mode = "ROUND_HALF_EVEN". Specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not affect existing fields in the table. The following values are supported:
"ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.5 is rounded to 3.0, and -2.5 is rounded to -3.0.
"ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.5 is rounded to 2.0, and -2.5 is rounded to -2.0.
This property is equivalent to the defaultRoundingMode table resource property.

enable_change_history (BOOL): Set this property to TRUE to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information, see Pricing and costs. The default is FALSE.

max_staleness (INTERVAL): Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND. The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.
VALUE is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
A reference to a table
Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
User-defined functions, aggregate functions, or analytic functions
The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the value replaces the existing value of that option for the table, if there was one. Setting the value to NULL clears the table's value for that option.
Setting the expiration timestamp and description on a table
The following example sets the expiration timestamp on a table to seven days from the execution time of the ALTER TABLE statement, and sets the description as well:

ALTER TABLE mydataset.mytable
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description = "Table that expires seven days from now"
)
Setting the require partition filter attribute on a partitioned table
Queries that reference this table must use a filter on the partitioning column, or else BigQuery returns an error. Setting this option to true can help prevent mistakes in querying more data than intended.
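A minimal sketch of the statement this describes, assuming a partitioned table mydataset.mypartitionedtable:

ALTER TABLE mydataset.mypartitionedtable
SET OPTIONS (require_partition_filter = true)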
Clearing the expiration timestamp on a table
The following example clears the expiration timestamp on a table so that it will
not expire:
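A sketch of the statement; setting the option to NULL clears it:

ALTER TABLE mydataset.mytable
SET OPTIONS (expiration_timestamp = NULL)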
IF NOT EXISTS: If the column name already exists, the statement has no effect.
column: The column to add. This includes the name of the column and the schema to add. The column name and schema use the same syntax used in the CREATE TABLE statement.
Details
You cannot use this statement to create:
Partitioned columns.
Clustered columns.
Nested columns inside existing RECORD fields.
You cannot add a REQUIRED column to an existing table schema. However, you can create a nested REQUIRED column as part of a new RECORD field.
Without the IF NOT EXISTS clause, if the table already contains a column with that name, the statement returns an error. If the IF NOT EXISTS clause is included and the column name already exists, no error is returned, and no action is taken.
The value of the new column for existing rows is set to one of the following:
NULL if the new column was added with NULLABLE mode. This is the default mode.
An empty ARRAY if the new column was added with REPEATED mode.
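For example, a sketch of a multi-column ADD COLUMN statement (the column types here are illustrative, not from the original example):

ALTER TABLE mydataset.mytable
  ADD COLUMN A STRING,
  ADD COLUMN IF NOT EXISTS B GEOGRAPHY,
  ADD COLUMN C ARRAY<NUMERIC>,
  ADD COLUMN D DATE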
If any of the columns named A, C, or D already exist, the statement fails. If column B already exists, the statement succeeds because of the IF NOT EXISTS clause.
Adding a RECORD column
The following example adds a column named A of type STRUCT that contains the following nested columns:
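A sketch, assuming two hypothetical nested columns B and C:

ALTER TABLE mydataset.mytable
ADD COLUMN A STRUCT<
  B GEOGRAPHY,
  C ARRAY<INT64>
>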
The following example adds the fk and fk2 foreign key constraints to the fk_table table in a single statement. This example depends on an existing table, pk_table.
Add a primary key to the pk_table table:

ALTER TABLE pk_table ADD PRIMARY KEY (x, y) NOT ENFORCED;

Create a table named fk_table for multiple foreign key constraints.
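A sketch of the two steps, assuming fk_table has columns x, y, i, and j that reference the pk_table primary key:

CREATE TABLE fk_table (x INT64, y INT64, i INT64, j INT64);

ALTER TABLE fk_table
  ADD CONSTRAINT fk FOREIGN KEY (x, y) REFERENCES pk_table (x, y) NOT ENFORCED,
  ADD CONSTRAINT fk2 FOREIGN KEY (i, j) REFERENCES pk_table (x, y) NOT ENFORCED;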
new_table_name: The new name of the table. The new name cannot be an
existing table name.
Details
If you want to rename a table that has data streaming into it, you must stop
the streaming, commit any pending streams, and wait
for BigQuery to indicate that streaming
is not in use.
While a table can usually be renamed 5 hours after the last streaming
operation, it might take longer.
Existing table ACLs and row access policies are preserved, but table ACL and
row access policy updates made during the table rename are not preserved.
You can't concurrently rename a table and run a DML statement on that table.
If the table to be modified has active row-level access policies, the statement
returns an error.
Without the IF EXISTS clause, if the table does not contain a column with that name, then the statement returns an error. If the IF EXISTS clause is included and the column name does not exist, then no error is returned, and no action is taken.
This statement only renames the column in the table. Any objects that refer to the column, such as views or materialized views, must be updated or recreated separately.
You cannot use this statement to rename the following:
Subfields, such as nested columns in a STRUCT
Partitioning columns
Clustering columns
Fields that are part of primary key constraints or foreign key constraints
Columns in a table that has row access policies
After one or more columns in a table are renamed, you cannot do the following:
Query the table with legacy SQL.
Query the table as a wildcard table.
Renaming the columns back to their original names removes these restrictions.
Multiple RENAME COLUMN statements in one ALTER TABLE statement are supported. The sequence of renames is interpreted and validated in order. Each column_name must refer to a column name that exists after all preceding renames have been applied. RENAME COLUMN cannot be used with other ALTER TABLE actions in one statement.
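For example, a sketch of a multi-rename statement that swaps two hypothetical columns, columnA and columnB, through a temporary name; each rename is validated against the schema produced by the preceding renames:

ALTER TABLE mydataset.mytable
  RENAME COLUMN columnA TO temp,
  RENAME COLUMN columnB TO columnA,
  RENAME COLUMN temp TO columnB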
table_name: The name of the table to alter. See Table path syntax. The table must already exist and have a schema.
IF EXISTS: If the specified column does not exist, the statement has no effect.
column_name: The name of the column to drop.
Details
Dropping a column is a metadata-only operation and does not immediately free up the storage that is associated with the dropped column. The storage is freed up the next time the table is written to, typically when you perform a DML operation on it or when a background optimization job happens. Since DROP COLUMN is not a data cleanup operation, there is no guaranteed time window within which the data will be deleted.
There are two options for immediately reclaiming storage (the first is sketched after this list):
Overwrite the table with a SELECT * EXCEPT query.
Export the data to Cloud Storage, delete the unwanted columns, and then load the data into a new table with the correct schema.
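A sketch of the first option, rewriting the table without a hypothetical column column_to_drop:

CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT * EXCEPT (column_to_drop)
FROM mydataset.mytable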
You can restore a dropped column in a table using time travel.
You cannot use this statement to drop the following:
Partitioned columns
Clustered columns
Fields that are part of primary key constraints or foreign key constraints
Nested columns inside existing RECORD fields
Columns in a table that has row access policies
Without the IF EXISTS clause, if the table does not contain a column with that name, then the statement returns an error. If the IF EXISTS clause is included and the column name does not exist, then no error is returned, and no action is taken.
This statement only removes the column from the table. Any objects that refer to the column, such as views or materialized views, must be updated or recreated separately.
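For example, a sketch dropping two columns:

ALTER TABLE mydataset.mytable
  DROP COLUMN A,
  DROP COLUMN IF EXISTS B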
If the column named A does not exist, then the statement fails. If column B does not exist, then the statement still succeeds because of the IF EXISTS clause.
After one or more columns in a table are dropped, you cannot do the following:
Query the table with legacy SQL.
Accelerate queries on the table with BigQuery BI Engine.
ALTER TABLE table_name
SET DEFAULT COLLATE collate_specification
Arguments
table_name: The name of the table to alter. See Table path syntax. The table must already exist and have a schema.
SET DEFAULT COLLATE collate_specification: When a new column is created in the schema, and if the column does not have an explicit collation specification, the column inherits this collation specification for STRING types. The updated collation specification only applies to columns added afterwards.
If you want to add a collation specification on a new column in an existing table, you can do this when you add the column. If you add a collation specification directly on a column, the collation specification for the column has precedence over a table's default collation specification.
You cannot update an existing collation specification on a column.
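For the walkthrough that follows, a minimal sketch of the assumed CREATE TABLE statement with a default collation:

CREATE TABLE mydataset.mytable (
  number INT64,
  word STRING
) DEFAULT COLLATE 'und:ci'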
When you create mytable, all STRING columns inherit COLLATE 'und:ci'. The resulting table has this structure:
+--------------------------------+
| mydataset.mytable |
| number INT64 |
| word STRING COLLATE 'und:ci' |
+--------------------------------+
At a later time, you decide to change the collation specification for your
table.
ALTER TABLE mydataset.mytable
SET DEFAULT COLLATE ''
Although you have updated the collation specification, your existing column, word, continues to use the previous collation specification.
+--------------------------------+
| mydataset.mytable |
| number INT64 |
| word STRING COLLATE 'und:ci' |
+--------------------------------+
However, if you create a new column for your table, the new column includes the new collation specification. In the following example, a column called name is added. Because the new collation specification is empty, the default collation specification is used.

ALTER TABLE mydataset.mytable
ADD COLUMN name STRING
+--------------------------------+
| mydataset.mytable |
| number INT64 |
| word STRING COLLATE 'und:ci' |
| name STRING                    |
+--------------------------------+
ALTER COLUMN SET OPTIONS statement
Sets options, such as the column description, on a column in a table or view in BigQuery.
rounding_mode (STRING): Specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. The following values are supported:
"ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
"ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2, and -2.25 is rounded to -2.2.
This property is equivalent to the roundingMode table resource property.
VALUE is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
A reference to a table
Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
User-defined functions, aggregate functions, or analytic functions
The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.
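A minimal sketch of setting a column option, assuming a hypothetical column price in mydataset.mytable:

ALTER TABLE mydataset.mytable
ALTER COLUMN price
SET OPTIONS (description = "Price in US dollars")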
view_column_set_options_list
The view_column_option_list lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME and VALUE fields:
Changes the data type of a column in a table in BigQuery to a less restrictive data type. For example, a NUMERIC data type can be changed to a BIGNUMERIC type, but not the reverse.
(ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.
column_name: The name of the top-level column you're altering. Modifying subfields is not supported.
column_schema: The schema that you're converting the column to. This schema uses the same syntax used in the CREATE TABLE statement.
Details
The following data type conversions are supported:
INT64 to NUMERIC, BIGNUMERIC, FLOAT64
NUMERIC to BIGNUMERIC, FLOAT64
You can also convert data types from more restrictive to less restrictive parameterized data types. For example, you can increase the maximum length of a string type or increase the precision or scale of a numeric type.
The following are examples of valid parameterized data type conversions:
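A sketch of one such conversion, assuming a hypothetical NUMERIC(10, 2) column named price; both the precision and the scale increase, so the conversion is valid:

ALTER TABLE mydataset.mytable
ALTER COLUMN price
SET DATA TYPE NUMERIC(12, 4)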
Without the IF EXISTS clause, if the table does not contain a column with that name, the statement returns an error. If the IF EXISTS clause is included and the column name does not exist, no error is returned, and no action is taken.
Setting the default value for a column only affects future inserts to the table. It does not change any existing table data.
The type of the default value must match the type of the column.
A STRUCT type can only have a default value set for the entire STRUCT field. You cannot set the default value for a subset of the fields. You cannot set the default value of an array to NULL or set an element within an array to NULL.
If the default value is a function, it is evaluated at the time that the value is written to the table, not the time the table is created.
You can't set default values on columns that are primary keys.
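A minimal sketch of setting a default value, assuming a hypothetical DATE column order_date; the function is evaluated at write time:

ALTER TABLE mydataset.mytable
ALTER COLUMN order_date
SET DEFAULT CURRENT_DATE()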
(ALTER COLUMN) IF EXISTS: If the specified column does not exist, the
statement has no effect.
column_name: The name of the top-level column to remove the default value
from. If you drop the default value from a column that does not have
a default set, an error is returned.
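A matching sketch for removing that default:

ALTER TABLE mydataset.mytable
ALTER COLUMN order_date
DROP DEFAULT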
expiration_timestamp (TIMESTAMP): This property is equivalent to the expirationTime table resource property.

friendly_name (STRING): Example: friendly_name="my_view". This property is equivalent to the friendlyName table resource property.

description (STRING): Example: description="a view that expires in 2025". This property is equivalent to the description table resource property.

labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.

privacy_policy (JSON-formatted STRING): The policies to enforce when anyone queries the view. To learn more about the policies available for a view, see the privacy_policy view option.
VALUE is a constant expression containing only literals, query parameters, and scalar functions.
The constant expression cannot contain:
A reference to a table
Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
User-defined functions, aggregate functions, or analytic functions
The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the value replaces the existing value of that option for the view, if there was one. Setting the value to NULL clears the view's value for that option.
Setting the expiration timestamp and description on a view
The following example sets the expiration timestamp on a view to seven days from the execution time of the ALTER VIEW statement, and sets the description as well:

ALTER VIEW mydataset.myview
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description = "View that expires seven days from now"
)
The option list allows you to set materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time. You can include multiple options using a comma-separated list.
Specify a materialized view option list in the following format:
NAME=VALUE, ...
NAME and VALUE must be one of the following combinations:
expiration_timestamp (TIMESTAMP): This property is equivalent to the expirationTime table resource property. expiration_timestamp is optional and not used by default.

max_staleness (INTERVAL): Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND. The max_staleness property provides consistently high performance with controlled costs when processing large, frequently changing datasets. max_staleness is disabled by default.

allow_non_incremental_definition (BOOLEAN): Example: allow_non_incremental_definition=true. The allow_non_incremental_definition property supports an expanded range of SQL queries to create materialized views. allow_non_incremental_definition=true is disabled by default.

friendly_name (STRING): This property is equivalent to the friendlyName table resource property.

description (STRING): Example: description="a materialized view that expires in 2025". This property is equivalent to the description table resource property.

labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.
Setting the value replaces the existing value of that option for the materialized view, if there was one. Setting the value to NULL clears the materialized view's value for that option.
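A minimal sketch of setting a materialized view option, assuming a materialized view mydataset.my_mv:

ALTER MATERIALIZED VIEW mydataset.my_mv
SET OPTIONS (description = "Daily rollup of sales data")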
The option list specifies options for the organization. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:

default_kms_key_name (STRING): The default Cloud Key Management Service key for encrypting table data, including temporary or anonymous tables. For more information, see Customer-managed Cloud KMS keys.

default_time_zone (STRING): The default time zone to use in time zone-dependent SQL functions, when a time zone is not specified as an argument. For more information, see time zones. Example: `region-us.default_time_zone` = "America/Los_Angeles" sets the default time zone to America/Los_Angeles in the us region.
default_query_job_timeout_ms (INT64): The default time after which a query job times out. The timeout period must be between 10 minutes and 6 hours. Example: `region-us.default_query_job_timeout_ms` = 1800000 sets the default query job timeout to 30 minutes for all jobs in the us region.

default_interactive_query_queue_timeout_ms (INT64): The default amount of time that an interactive query is queued. If unset, the default is 6 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable interactive query queueing, set the value to -1. Example: `region-us.default_interactive_query_queue_timeout_ms` = 1800000 sets the default queue timeout for interactive queries in the us region to 30 minutes.

default_batch_query_queue_timeout_ms (INT64): The default amount of time that a batch query is queued. If unset, the default is 24 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable batch query queueing, set the value to -1. Example: `region-us.default_batch_query_queue_timeout_ms` = 1800000 sets the default queue timeout for batch queries in the us region to 30 minutes.

default_query_optimizer_options (STRING): The history-based query optimizations. This option can be one of the following:
'adaptive=on': Use history-based query optimizations.
'adaptive=off': Don't use history-based query optimizations.
NULL (default): Use the default history-based query optimizations setting, which is equivalent to 'adaptive=off'.
Setting the value replaces the existing value of that option for the organization, if there is one. Setting the value to NULL clears the organization's value for that option.
Required permissions
The ALTER ORGANIZATION SET OPTIONS statement requires the following IAM permission:
bigquery.config.update on the organization to alter.
Examples
The following example sets the default time zone to America/Chicago and the default query job timeout to one hour for an organization in the US region:
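A sketch of that statement; one hour is 3600000 milliseconds:

ALTER ORGANIZATION
SET OPTIONS (
  `region-us.default_time_zone` = "America/Chicago",
  `region-us.default_query_job_timeout_ms` = 3600000
)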
The following example sets the default time zone, the default query job timeout, the default interactive and batch queue timeouts, and the default Cloud KMS key to NULL, clearing the organization-level default settings:
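A sketch of clearing those defaults:

ALTER ORGANIZATION
SET OPTIONS (
  `region-us.default_time_zone` = NULL,
  `region-us.default_query_job_timeout_ms` = NULL,
  `region-us.default_interactive_query_queue_timeout_ms` = NULL,
  `region-us.default_batch_query_queue_timeout_ms` = NULL,
  `region-us.default_kms_key_name` = NULL
)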
The option list specifies options for the project. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:

default_kms_key_name (STRING): The default Cloud Key Management Service key for encrypting table data, including temporary or anonymous tables. For more information, see Customer-managed Cloud KMS keys.

default_time_zone (STRING): The default time zone to use in time zone-dependent SQL functions, when a time zone is not specified as an argument. For more information, see time zones. Example: `region-us.default_time_zone` = "America/Los_Angeles" sets the default time zone to America/Los_Angeles in the us region.
default_query_job_timeout_ms (INT64): The default time after which a query job times out. The timeout period must be between 10 minutes and 6 hours. Example: `region-us.default_query_job_timeout_ms` = 1800000 sets the default query job timeout to 30 minutes for jobs run in the us region.

default_interactive_query_queue_timeout_ms (INT64): The default amount of time that an interactive query is queued. If unset, the default is 6 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable interactive query queueing, set the value to -1. Example: `region-us.default_interactive_query_queue_timeout_ms` = 1800000 sets the default queue timeout for interactive queries in the us region to 30 minutes.

default_batch_query_queue_timeout_ms (INT64): The default amount of time that a batch query is queued. If unset, the default is 24 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable batch query queueing, set the value to -1. Example: `region-us.default_batch_query_queue_timeout_ms` = 1800000 sets the default queue timeout for batch queries in the us region to 30 minutes.

default_query_optimizer_options (STRING): The history-based query optimizations. This option can be one of the following:
'adaptive=on': Use history-based query optimizations.
'adaptive=off': Don't use history-based query optimizations.
NULL (default): Use the default history-based query optimizations setting, which is equivalent to 'adaptive=off'.
Setting the value replaces the existing value of that option for the project, if there was one. Setting the value to NULL clears the project's value for that option.
The following example sets the default time zone, the default query job timeout, the default Cloud KMS key, and the default interactive and batch queue timeouts to NULL, clearing the project-level default settings:
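A sketch, assuming a hypothetical project named my_project:

ALTER PROJECT my_project
SET OPTIONS (
  `region-us.default_time_zone` = NULL,
  `region-us.default_query_job_timeout_ms` = NULL,
  `region-us.default_interactive_query_queue_timeout_ms` = NULL,
  `region-us.default_batch_query_queue_timeout_ms` = NULL,
  `region-us.default_kms_key_name` = NULL
)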
The option list specifies a set of options for BigQuery BI Engine capacity. Specify the option list in the following format:
NAME=VALUE, ...
The following options are supported:

size_gb (INT64): Specifies the size of the reservation in gigabytes.

preferred_tables (ARRAY<STRING>): The list of tables that acceleration should be applied to. Format: project.dataset.table or dataset.table. If the project is omitted, the query project is used.

Setting VALUE replaces the existing value of that option for the BI Engine capacity, if there is one. Setting VALUE to NULL clears the value for that option.
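A minimal sketch of updating BI Engine capacity, assuming a hypothetical project my_project in the US region and one preferred table:

ALTER BI_CAPACITY `my_project.region-us.default`
SET OPTIONS (
  size_gb = 250,
  preferred_tables = ["mydataset.mytable"]
)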
commitment_id: The ID of the commitment. The value must be unique to the
project and location. It must start and end with a lowercase letter or a
number and contain only lowercase letters, numbers and dashes.
The option list specifies options for the capacity commitment. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:

plan (STRING): The commitment plan to purchase. Supported values include: ANNUAL, THREE_YEAR, and TRIAL. For more information, see Commitment plans.

renewal_plan (STRING): The plan this capacity commitment is converted to after commitment_end_time passes. Once the plan is changed, the committed period is extended according to the commitment plan. Applicable for ANNUAL, THREE_YEAR, and TRIAL commitments.
reservation_id: The ID of the reservation. The value must be unique to the
project and location. It must start and end with a lowercase letter or a
number and contain only lowercase letters, numbers and dashes.
The option list specifies options for the reservation. Specify the options in the following format: NAME=VALUE, ...
The following options are supported:

ignore_idle_slots (BOOLEAN): If the value is true, then the reservation uses only the slots that are provisioned to it. The default value is false. For more information, see Idle slots.

slot_capacity (INTEGER): The number of slots to allocate to the reservation. If this reservation was created with an edition, this is equivalent to the amount of baseline slots.

target_job_concurrency (INTEGER): A soft upper bound on the number of jobs that can run concurrently in this reservation.

autoscale_max_slots (INTEGER): The maximum number of slots that can be added to the reservation by autoscaling.

secondary_location (STRING): The secondary location to use in the case of disaster recovery.

is_primary (BOOLEAN): If the value is true, the reservation is set to be the primary reservation.

admin_project: The administration project that maintains ownership of the commitments.
Examples
Autoscaling example
The following example changes an autoscaling reservation to 300 baseline slots and 400 autoscaling slots, for a maximum reservation size of 700. These slots are located in the region-us region and managed by a project admin_project:
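A sketch of that statement, assuming a hypothetical reservation named prod:

ALTER RESERVATION `admin_project.region-us.prod`
SET OPTIONS (
  slot_capacity = 300,
  autoscale_max_slots = 400
)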
EXTERNAL: Specifies that the dataset is a federated dataset. The DROP EXTERNAL statement only removes the external definition from BigQuery. The data stored in the external location is not affected.
IF EXISTS: If no dataset exists with that name, the statement has no effect.
project_name: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.
dataset_name: The name of the dataset to delete.
CASCADE: Deletes the dataset and all resources within the dataset, such as tables, views, and functions. You must have permission to delete the resources, or else the statement returns an error. For a list of BigQuery permissions, see Predefined roles and permissions.
RESTRICT: Deletes the dataset only if it's empty. Otherwise, returns an error. If you don't specify either CASCADE or RESTRICT, then the default behavior is RESTRICT.
Details
The statement runs in the location of the dataset if it exists, unless you specify the location in the query settings. For more information, see Specifying your location.
The following example undeletes the dataset named mydataset. If the dataset already exists or has passed the time travel window, then the statement returns an error.
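A sketch of the statement this describes:

UNDROP SCHEMA mydataset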
The following example deletes a table named mytable in the dataset mydataset:

DROP TABLE mydataset.mytable

If the table name does not exist in the dataset, the following error is returned:
Error: Not found: Table myproject:mydataset.mytable
Deleting a table only if the table exists
The following example deletes a table named mytable in mydataset only if the table exists. If the table name does not exist in the dataset, no error is returned, and no action is taken.
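The corresponding statement:

DROP TABLE IF EXISTS mydataset.mytable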
The following example drops the external table named external_table from the dataset mydataset. It returns an error if the external table does not exist.

DROP EXTERNAL TABLE mydataset.external_table

The following example drops the external table named external_table from the dataset mydataset. If the external table does not exist, no error is returned.

DROP EXTERNAL TABLE IF EXISTS mydataset.external_table
DROP VIEW statement
Deletes a view.
Syntax
DROP VIEW [IF EXISTS] view_name
Arguments
IF EXISTS: If no view exists with that name, the statement has no effect.
The following example deletes a view named myview in mydataset:

DROP VIEW mydataset.myview

If the view name does not exist in the dataset, the following error is returned:
Error: Not found: Table myproject:mydataset.myview
Deleting a view only if the view exists
The following example deletes a view named myview in mydataset only if the view exists. If the view name does not exist in the dataset, no error is returned, and no action is taken.

DROP VIEW IF EXISTS mydataset.myview
DROP MATERIALIZED VIEW statement
Deletes a materialized view.
Syntax
DROP MATERIALIZED VIEW [IF EXISTS] mv_name
Arguments
IF EXISTS: If no materialized view exists with that name, the statement has no effect.
mv_name: The name of the materialized view to delete. See Table path syntax.
The following example deletes a materialized view named my_mv in mydataset:

DROP MATERIALIZED VIEW mydataset.my_mv

If the materialized view name does not exist in the dataset, the following error is returned:
Error: Not found: Table myproject:mydataset.my_mv
If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.
Deleting a materialized view only if it exists
The following example deletes a materialized view named my_mv in mydataset only if the materialized view exists. If the materialized view name does not exist in the dataset, no error is returned, and no action is taken.

DROP MATERIALIZED VIEW IF EXISTS mydataset.my_mv

If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.
DROP FUNCTION statement
Deletes a persistent user-defined function (UDF) or user-defined aggregate function (UDAF).
IF EXISTS: If no function exists with that name, the statement has no effect.
project_name: The name of the project containing the function to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).
dataset_name: The name of the dataset containing the function to delete. Defaults to the defaultDataset in the request.
function_name: The name of the function you're deleting.
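A minimal sketch, assuming a hypothetical persistent function named parse_json in mydataset:

DROP FUNCTION IF EXISTS mydataset.parse_json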
IF EXISTS: If no procedure exists with that name, the statement has no effect.
project_name: The name of the project containing the procedure to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).
dataset_name: The name of the dataset containing the procedure to delete. Defaults to the defaultDataset in the request.
procedure_name: The name of the procedure you're deleting.
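A matching sketch for a hypothetical procedure:

DROP PROCEDURE IF EXISTS mydataset.my_procedure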
The following example deletes a vector index my_index from my_table:

DROP VECTOR INDEX my_index ON dataset.my_table;
Table path syntax
Use the following syntax when specifying the path of a table resource, including standard tables, views, materialized views, external tables, and table snapshots.
project_name: The name of the project that contains the table resource. Defaults to the project that runs the DDL query. If the project name contains special characters such as colons, quote the name in backticks ` (example: `google.com:my_project`).
dataset_name: The name of the dataset that contains the table resource. Defaults to the defaultDataset in the request.
table_name: The name of the table resource.
When you create a table in BigQuery, the table name must
be unique per dataset. The table name can:
Contain characters with a total of up to 1,024 UTF-8 bytes.
Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.
The following are all examples of valid table names: table 01, ग्राहक, 00_お客様, étudiant-01.
Some table names and table name prefixes are reserved. If
you receive an error saying that your table name or prefix is
reserved, then select a different name and try again.
If you include multiple dot operators (.) in a sequence, the duplicate operators are implicitly stripped.
For example, this: project_name....dataset_name..table_name
becomes this: project_name.dataset_name.table_name
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-11-06 UTC."],[],[]]