Manage datasets
This document describes how to copy datasets, recreate datasets in another location, secure datasets, delete datasets, restore tables from deleted datasets, and undelete datasets in BigQuery.
As a BigQuery administrator, you can organize and control access to tables and views that analysts use. For more information about datasets, see Introduction to datasets .
You cannot change the name of an existing dataset or relocate a dataset after it's created. As a workaround for changing the dataset name, you can copy a dataset and change the destination dataset's name. To relocate a dataset, you can follow one of the following methods:
Required roles
This section describes the roles and permissions that you need to manage datasets. If your source or destination dataset is in the same project as the one you are using to copy, then you don't need extra permissions or roles on that dataset.
To get the permissions that you need to manage datasets, ask your administrator to grant you the following IAM roles:
- Copy a dataset ( Beta
):
- BigQuery Admin
(
roles/bigquery.admin
) on the destination project - BigQuery Data Viewer
(
roles/bigquery.dataViewer
) on the source dataset - BigQuery Data Editor
(
roles/bigquery.dataEditor
) on the destination dataset
- BigQuery Admin
(
- Delete a dataset: BigQuery Data Owner
(
roles/bigquery.dataOwner
) on the project - Restore a deleted dataset: BigQuery User
(
roles/bigquery.user
) on the project
For more information about granting roles, see Manage access .
These predefined roles contain the permissions required to manage datasets. To see the exact permissions that are required, expand the Required permissionssection:
Required permissions
The following permissions are required to manage datasets:
- Copy a dataset:
-
bigquery.transfers.update
on the destination project -
bigquery.jobs.create
on the destination project -
bigquery.datasets.get
on the source and destination dataset -
bigquery.tables.list
on the source and destination dataset -
bigquery.datasets.update
on the destination dataset -
bigquery.tables.create
on the destination dataset
-
- Delete a dataset:
-
bigquery.datasets.delete
on the project -
bigquery.tables.delete
on the project
-
- Restore a deleted dataset:
-
bigquery.datasets.create
on the project -
bigquery.datasets.get
on the dataset
-
You might also be able to get these permissions with custom roles or other predefined roles .
For more information about IAM roles and permissions in BigQuery, see Introduction to IAM .
Copy datasets
You can copy a dataset, including partitioned data within a region or across regions, without extracting, moving, or reloading data into BigQuery. BigQuery uses the BigQuery Data Transfer Service in the backend to copy datasets. For location considerations when you transfer data, see Data location and transfers .
For each dataset copy configuration, you can have one transfer run active at a time. Additional transfer runs are queued. If you are using the Google Cloud console, you can schedule recurring copies, and configure an email or Pub/Sub notifications with the BigQuery Data Transfer Service.
Limitations
The following limitations apply when you copy datasets:
-
You can't copy the following resources from a source dataset:
- Views.
- Routines, including UDFs.
- External tables.
- Change data capture (CDC) tables if the copy job is across regions. Copying CDC tables within the same region is supported.
-
Tables encrypted with customer-managed encrypted keys (CMEK) if the copy job is across regions. Copying tables with default encryption across regions is supported.
You can copy all encrypted tables within the same region, including tables encrypted with CMEK.
-
You can't use the following resources as destination datasets for copy jobs:
- Write-optimized storage.
-
Datasets encrypted with CMEK if the copy job is across regions.
However, tables encrypted with CMEK are allowed as destination tables when copying within the same region.
-
The minimum frequency between copy jobs is 12 hours.
-
Appending data to a partitioned table in the destination dataset isn't supported.
-
If a table exists in the source dataset and the destination dataset, and the source table has not changed since the last successful copy, it's skipped. The source table is skipped even if the Overwrite destination tablescheckbox is selected.
-
When truncating tables in the destination dataset, the dataset copy job doesn't detect any changes made to resources in the destination dataset before it begins the copy job. The dataset copy job overwrites all of the data in the destination dataset, including both the tables and schema.
-
The destination table might not reflect changes made to the source tables after a copy job starts.
-
Copying a dataset is not supported in BigQuery Omni regions .
-
To copy a dataset to a project in another VPC Service Controls service perimeter , you need to set the following egress rules:
-
In the destination project's VPC Service Controls service perimeter configuration, the IAM principal must have the following methods:
-
bigquery.datasets.get
-
bigquery.tables.list
-
bigquery.tables.get
, -
bigquery.tables.getData
-
-
In the source project's VPC Service Controls service perimeter configuration, the IAM principal being used must have the method set to
All Methods
.
-
Copy a dataset
Select one of the following options:
Console
-
Enable the BigQuery Data Transfer Service for your destination dataset.
-
Ensure that you have the required roles .
If you intend to set up transfer run notifications for Pub/Sub ( Option 2later in these steps), then you must have the
pubsub.topics.setIamPolicy
permission.If you only set up email notifications, then Pub/Sub permissions are not required. For more information, see the BigQuery Data Transfer Service run notifications .
-
Create a BigQuery dataset in the same region or a different region from your source dataset.
Option 1: Use the BigQuery copy function
To create a one-time transfer, use the BigQuery copy function:
-
Go to the BigQuerypage.
-
In the Explorerpanel, expand your project and select a dataset.
-
In the Dataset infosection, click Copy, and then do the following:
-
In the Datasetfield, either create a new dataset or select an existing dataset ID from the list.
Dataset names within a project must be unique. The project and dataset can be in different regions, but not all regions are supported for cross-region dataset copying.
In the Locationfield, the location of the source dataset is displayed.
-
Optional: To overwrite both the data and schema of the destination tables with the source tables, select the Overwrite destination tablescheckbox. Both the source and destination tables must have the same partitioning schema.
-
To copy the dataset, click Copy.
-
Option 2: Use the BigQuery Data Transfer Service
To schedule recurring copies and configure email or Pub/Sub notifications, use the BigQuery Data Transfer Service in the Google Cloud console of the destination project:
-
Go to the Data transferspage.
-
Click Create a transfer.
-
In the Sourcelist, select Dataset Copy.
-
In the Display namefield, enter a name for your transfer run.
-
In the Schedule optionssection, do the following:
-
For Repeat frequency, choose an option for how often to run the transfer:
If you select Custom, enter a custom frequency—for example,
every day 00:00
. For more information, see Formatting the schedule . -
For Start date and run time, enter the date and time to start the transfer. If you choose Start now, this option is disabled.
-
-
In the Destination settingssection, select a destination dataset to store your transfer data. You can also click CREATE NEW DATASETto create a new dataset before you select it for this transfer.
-
In the Data source detailssection, enter the following information:
- For Source dataset, enter the dataset ID that you want to copy.
- For Source project, enter the project ID of your source dataset.
-
To overwrite both the data and schema of the destination tables with the source tables, select the Overwrite destination tablescheckbox. Both the source and destination tables must have the same partitioning schema.
-
In the Service Accountmenu, select a service account from the service accounts associated with your Google Cloud project. You can associate a service account with your transfer instead of using your user credentials. For more information about using service accounts with data transfers, see Use service accounts .
- If you signed in with a federated identity , then a service account is required to create a transfer. If you signed in with a Google Account , then a service account for the transfer is optional.
- The service account must have the required roles .
-
Optional: In the Notification optionssection, do the following:
- To enable email notifications, click the toggle. When you enable this option, the owner of the transfer configuration receives an email notification when a transfer run fails.
- To enable Pub/Sub notifications, click the toggle, and then either select a topic name from the list or click Create a topic. This option configures Pub/Sub run notifications for your transfer.
-
Click Save.
bq
-
Enable the BigQuery Data Transfer Service for your destination dataset.
-
Ensure that you have the required roles .
-
To create a BigQuery dataset , use the
bq mk
command with the dataset creation flag--dataset
and thelocation
flag:bq mk \ --dataset \ --location= LOCATION \ PROJECT : DATASET
Replace the following:
-
LOCATION
: the location where you want to copy the dataset -
PROJECT
: the project ID of your target dataset -
DATASET
: the name of the target dataset
-
-
To copy a dataset, use the
bq mk
command with the transfer creation flag--transfer_config
and the--data_source
flag. You must set the--data_source
flag tocross_region_copy
. For a complete list of valid values for the--data_source
flag, see the transfer-config flags in the bq command-line tool reference.bq mk \ --transfer_config \ --project_id= PROJECT \ --data_source=cross_region_copy \ --target_dataset= DATASET \ --display_name= NAME \ --service_account_name= SERCICE_ACCOUNT \ --params=' PARAMETERS '
Replace the following:
-
NAME
: the display name for the copy job or the transfer configuration -
SERVICE_ACCOUNT
: the service account name used to authenticate your transfer. The service account should be owned by the sameproject_id
used to create the transfer and it should have all of the required permissions . -
PARAMETERS
: the parameters for the transfer configuration in the JSON formatParameters for a dataset copy configuration include the following:
-
source_dataset_id
: the ID of the source dataset that you want to copy -
source_project_id
: the ID of the project that your source dataset is in -
overwrite_destination_table
: an optional flag that lets you truncate the tables of a previous copy and refresh all the data
Both the source and destination tables must have the same partitioning schema.
-
The following examples show the formatting of the parameters, based on your system's environment:
-
Linux:use single quotes to enclose the JSON string–for example:
'{"source_dataset_id":"mydataset","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'
-
Windows command line:use double quotes to enclose the JSON string, and escape double quotes in the string with a backslash–for example:
"{\"source_dataset_id\":\"mydataset\",\"source_project_id\":\"mysourceproject\",\"overwrite_destination_table\":\"true\"}"
-
PowerShell:use single quotes to enclose the JSON string, and escape double quotes in the string with a backslash–for example:
'{\"source_dataset_id\":\"mydataset\",\"source_project_id\":\"mysourceproject\",\"overwrite_destination_table\":\"true\"}'
For example, the following command creates a dataset copy configuration that's named
My Transfer
with a target dataset that's namedmydataset
and a project with the ID ofmyproject
.bq mk \ --transfer_config \ --project_id=myproject \ --data_source=cross_region_copy \ --target_dataset=mydataset \ --display_name='My Transfer' \ --params='{ "source_dataset_id":"123_demo_eu", "source_project_id":"mysourceproject", "overwrite_destination_table":"true" }'
-
API
-
Enable the BigQuery Data Transfer Service for your destination dataset.
-
Ensure that you have the required roles .
-
To create a BigQuery dataset , call the
datasets.insert
method with a defined dataset resource . -
To copy a dataset, use the
projects.locations.transferConfigs.create
method and supply an instance of theTransferConfig
resource.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Java API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Python API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Install the Python client for the BigQuery Data Transfer API withpip install google-cloud-bigquery-datatransfer
. Then create a transfer configuration to copy the dataset.