This page explains how to use Spanner Data Boost when you run federated queries from BigQuery to a Spanner database. With Data Boost, federated queries run with minimal impact on existing workloads in the provisioned Spanner instance. Data Boost queries from BigQuery can also join BigQuery data with Spanner data.
You can run federated queries from BigQuery to Spanner using Data Boost with one of the following methods:
- Run a federated Data Boost query, which uses the `EXTERNAL_QUERY` federated query function.
- Use Data Boost with external datasets.
Spanner federation lets BigQuery query data residing in Spanner in real time, without copying or moving data. To learn more, see Spanner federated queries. To learn about Data Boost, see Data Boost overview.
Before you begin
Before you can run federated queries with Data Boost, you need to complete the following tasks:
Create a Spanner instance and database
If you don't have a Spanner instance and database, follow the steps in Create and query a database using the Google Cloud console to create them.
Enable the BigQuery connection API
The BigQuery connection API lets you manage BigQuery connections to external data sources such as a Spanner database.
- Enable the BigQuery connection API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (`roles/serviceusage.serviceUsageAdmin`), which contains the `serviceusage.services.enable` permission. Learn how to grant roles.
For more information, see BigQuery connection API in the BigQuery documentation.
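As a sketch of one way to enable the API from the command line (assuming you have the gcloud CLI installed and the role described above; replace PROJECT_ID with your project ID):

```
gcloud services enable bigqueryconnection.googleapis.com --project=PROJECT_ID
```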
Grant IAM permissions for Data Boost to principals
A principal must be granted the following permissions to run federated queries with Data Boost:
- `spanner.instances.get`: lets you get the configuration of an instance.
- `spanner.databases.useDataBoost`: lets you use Spanner Data Boost compute resources to process partitioned queries.
For more information about Spanner permissions, see Identity and Access Management (IAM) permissions .
To grant these required permissions, we recommend that you use the Cloud Spanner Database Reader with DataBoost (`roles/spanner.databaseReaderWithDataBoost`) IAM role. You can add that role to any principal that needs to run federated queries with Data Boost. To learn more about predefined roles in Spanner, see Predefined roles. To learn how to create a custom IAM role, see Create a custom role.
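As a sketch of how that grant might look from the command line (the instance ID and principal address are placeholders), you can add the role at the instance level with the gcloud CLI:

```
gcloud spanner instances add-iam-policy-binding INSTANCE_ID \
    --member="user:USER_EMAIL" \
    --role="roles/spanner.databaseReaderWithDataBoost"
```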
Run a federated Data Boost query
To run a Data Boost query from BigQuery to an external source, you need a BigQuery connection to that source and the ID of the connection. When you run a federated Spanner query with Data Boost, the external source is a Spanner database. After you create the connection, BigQuery uses its connection ID to run Data Boost queries against the Spanner database.
Use one of the following options to create a BigQuery connection ID, and then use the connection ID to run a Data Boost query from BigQuery:
- Start in Spanner - Create the BigQuery external connection ID in the Spanner console. After your connection ID is created in the Spanner console, you're redirected to the BigQuery console to run a federated Data Boost query to a Spanner database.
- Start in BigQuery - Create the Data Boost external connection ID in the BigQuery console or by using the `bq` command-line tool. After you create the connection ID, you stay in the BigQuery console to run a federated Data Boost query to a Spanner database.
Start in Spanner to run a Data Boost query
To run a federated Data Boost query starting in the Spanner Studio, do the following:
- Go to the Spanner Instances page in the Google Cloud console. The console shows a list of your Spanner instances.
- Select a Spanner instance, and then select a database.
- On the Database overview page, in the navigation menu, click Spanner Studio.
- Click View in BigQuery.
- In the View in BigQuery dialog, enter a connection ID. The connection ID is used to create a new BigQuery external connection to your Spanner database. You reference your external connection using the following pattern: `PROJECT-ID.LOCATION.CONNECTION-ID`. An error occurs if the ID already exists.
- Fill in the rest of the dialog and do the following:
  - Select Read data in parallel.
  - Select Use Spanner Data Boost.
- Click View in BigQuery. BigQuery Studio opens with the following query:

  ```sql
  SELECT *
  FROM EXTERNAL_QUERY(
    "PROJECT-ID.LOCATION.CONNECTION-ID",
    "SELECT * FROM INFORMATION_SCHEMA.TABLES;");
  ```

  You can replace this with your own federated query. For example, the following query reads from a table named `orders` in a Spanner database and joins the results with a BigQuery table named `mydataset.customers`:

  ```sql
  SELECT c.customer_id, c.name, rq.first_order_date
  FROM mydataset.customers AS c
  LEFT OUTER JOIN EXTERNAL_QUERY(
    'my-project.us.example-db',
    '''SELECT customer_id, MIN(order_date) AS first_order_date
    FROM orders
    GROUP BY customer_id''') AS rq
  ON rq.customer_id = c.customer_id
  GROUP BY c.customer_id, c.name, rq.first_order_date;
  ```
Start in BigQuery to run a Data Boost query
To create an external data connection from BigQuery to a Spanner database and use that connection to run a federated Data Boost query from BigQuery, select one of the following options:
Console
- Go to Create Spanner connections in the BigQuery documentation and follow the instructions on the Console tab.
- In the External data source pane, do the following:
  - Select Read data in parallel.
  - Select Use Spanner Data Boost.
bq
- Go to Create Spanner connections in the BigQuery documentation and follow the instructions on the bq tab.
- Set the following connection properties to `true`:
  - `useParallelism`
  - `useDataBoost`
- The following example uses the `bq mk` command to create a new connection named `my_connection` with the two required properties for Data Boost:

  ```
  bq mk --connection --connection_type='CLOUD_SPANNER' --location='us' \
    --properties='{"database":"projects/my-project/instances/my-instance/databases/my-database", "useParallelism":true, "useDataBoost": true}' \
    my_connection
  ```
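After the connection exists, you can reference it in an `EXTERNAL_QUERY` call. The following is a sketch that assumes the `my_connection` connection above (created in project `my-project`, location `us`) and a hypothetical Spanner table named `orders`:

```sql
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my_connection',
  'SELECT order_id, order_date FROM orders LIMIT 10;');
```

Because `useDataBoost` is set to `true` on the connection, the Spanner portion of this query is processed by Data Boost compute resources instead of the provisioned instance.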
Use Data Boost with external datasets
To run a Data Boost query from BigQuery to Spanner as an external source, you can create an external dataset (also known as a federated dataset) in BigQuery that's linked to an existing GoogleSQL or PostgreSQL database in Spanner.
Use a CLOUD_RESOURCE connection
By default, Spanner external datasets use end-user credentials (EUC), which requires that users have direct access to their Spanner databases. Users can query these datasets if they have access granted in Spanner.
Optionally, Spanner external datasets can use a CLOUD_RESOURCE connection to interact with your Spanner database, so that you can give users access to Spanner data through BigQuery without giving them direct access to the Spanner database. Because the service account of the CLOUD_RESOURCE connection retrieves the data from Spanner, you only have to grant users access to the Spanner external dataset. This access delegation decouples access to the external dataset from direct access to the underlying Spanner tables, so users can query the Spanner tables through the external dataset even if they don't have access granted in Spanner.
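As an illustrative sketch (the dataset and table names below are hypothetical), a user who has been granted access only to the external dataset can query its tables like any other BigQuery tables:

```sql
-- The connection's service account reads from Spanner;
-- the querying user needs access only to the BigQuery external dataset.
SELECT singer_id, first_name
FROM my_spanner_dataset.singers
LIMIT 10;
```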
Before you create Spanner external datasets with a CLOUD_RESOURCE connection, do the following:
Create a connection
You can create a new CLOUD_RESOURCE connection or use an existing one to connect to Spanner. Make sure to create the connection in the same location where you plan to create your Spanner external dataset.
Console
- Go to the BigQuery page.
- In the left pane, click Explorer. If you don't see the left pane, click Expand left pane to open the pane.
- In the Explorer pane, expand your project name, and then click Connections.
- On the Connections page, click Create connection.
- For Connection type, choose Vertex AI remote models, remote functions, BigLake and Spanner (Cloud Resource).
- In the Connection ID field, enter a name for your connection.
- For Location type, select a location for your connection. The connection should be colocated with your other resources, such as datasets.
- Click Create connection.
- Click Go to connection.
- In the Connection info pane, copy the service account ID for use in a later step.
bq
- In a command-line environment, create a connection:

  ```
  bq mk --connection --location=REGION --project_id=PROJECT_ID \
      --connection_type=CLOUD_RESOURCE CONNECTION_ID
  ```

  The `--project_id` parameter overrides the default project. Replace the following:
  - REGION: your connection region
  - PROJECT_ID: your Google Cloud project ID
  - CONNECTION_ID: an ID for your connection

  When you create a connection resource, BigQuery creates a unique system service account and associates it with the connection.

  Troubleshooting: If you get the following connection error, update the Google Cloud SDK:

  ```
  Flags parsing error: flag --connection_type=CLOUD_RESOURCE: value should be one of...
  ```
- Retrieve and copy the service account ID for use in a later step:

  ```
  bq show --connection PROJECT_ID.REGION.CONNECTION_ID
  ```

  The output is similar to the following:

  ```
  name                       properties
  1234.REGION.CONNECTION_ID  {"serviceAccountId": "connection-1234-9u56h9@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}
  ```
Python
To learn how to install and use the client library for Spanner, see Spanner client libraries .
To authenticate to Spanner, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Node.js
To learn how to install and use the client library for Spanner, see Spanner client libraries .
To authenticate to Spanner, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Terraform
Use the `google_bigquery_connection` resource.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
The following example creates a Cloud resource connection named `my_cloud_resource_connection` in the US region:
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
- Launch Cloud Shell.
- Set the default Google Cloud project where you want to apply your Terraform configurations. You only need to run this command once per project, and you can run it in any directory.

  ```
  export GOOGLE_CLOUD_PROJECT=PROJECT_ID
  ```

  Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (also called a root module).

- In Cloud Shell, create a directory and a new file within that directory. The filename must have the `.tf` extension—for example, `main.tf`. In this tutorial, the file is referred to as `main.tf`.

  ```
  mkdir DIRECTORY && cd DIRECTORY && touch main.tf
  ```

- If you are following a tutorial, you can copy the sample code in each section or step. Copy the sample code into the newly created `main.tf`. Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
- Review and modify the sample parameters to apply to your environment.
- Save your changes.
- Initialize Terraform. You only need to do this once per directory.

  ```
  terraform init
  ```

  Optionally, to use the latest Google provider version, include the `-upgrade` option:

  ```
  terraform init -upgrade
  ```

Apply the changes

- Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:

  ```
  terraform plan
  ```

  Make corrections to the configuration as necessary.

- Apply the Terraform configuration by running the following command and entering `yes` at the prompt:

  ```
  terraform apply
  ```

  Wait until Terraform displays the "Apply complete!" message.

- Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
After you create the connection, open it, and in the Connection info pane, copy the service account ID. You need this ID when you configure permissions for the connection. When you create a connection resource, BigQuery creates a unique system service account and associates it with the connection.
Set up access
You must give the service account that is associated with the new connection read access to your Spanner instance or database. We recommend using the Cloud Spanner Database Reader with DataBoost (`roles/spanner.databaseReaderWithDataBoost`) predefined IAM role.
Follow these steps to grant access to database-level roles for the service account that you copied earlier from the connection:
- Go to the Spanner Instances page.
- Click the name of the instance that contains your database to go to the Instance details page.
- In the Overview tab, select the checkbox for your database. The Info panel appears.
- Click Add principal.
- In the Add principals panel, in New principals, enter the service account ID that you copied earlier.
- In the Select a role field, select the Cloud Spanner Database Reader with DataBoost role.
- Click Save.
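Alternatively, as a sketch using the gcloud CLI (the database, instance, and service account values are placeholders), you can grant the same role at the database level:

```
gcloud spanner databases add-iam-policy-binding DATABASE_ID \
    --instance=INSTANCE_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_ID" \
    --role="roles/spanner.databaseReaderWithDataBoost"
```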
Create an external dataset
To create an external dataset, do the following:
Console
- Open the BigQuery page in the Google Cloud console.
- In the left pane, click Explorer. If you don't see the left pane, click Expand left pane to open the pane.
- In the Explorer pane, select the project where you want to create the dataset.
- Click View actions, and then click Create dataset.
- On the Create dataset page, do the following:
  - For Dataset ID, enter a unique dataset name.
  - For Location type, choose a location for the dataset, such as `us-central1` or the `us` multi-region. After you create a dataset, the location can't be changed.
  - For External Dataset, do the following:
    - Check the box next to Link to an external dataset.
    - For External dataset type, select Spanner.
    - For External source, enter the full identifier of your Spanner database in the following format: `projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE`. For example: `projects/my_project/instances/my_instance/databases/my_database`.
    - Optionally, for Database role, enter the name of a Spanner database role. For more information, read about database roles used for creating Spanner connections.
    - Optionally, check the box next to Use a Cloud Resource connection to create the external dataset with a connection.
  - Leave the other default settings as they are.
- Click Create dataset.
SQL
Use the `CREATE EXTERNAL SCHEMA` data definition language (DDL) statement.
- In the Google Cloud console, go to the BigQuery page.
- In the query editor, enter the following statement:

  ```sql
  CREATE EXTERNAL SCHEMA DATASET_NAME
  OPTIONS (
    external_source = 'SPANNER_EXTERNAL_SOURCE',
    location = 'LOCATION');

  /* Alternatively, create with a connection: */
  CREATE EXTERNAL SCHEMA DATASET_NAME
  WITH CONNECTION PROJECT_ID.LOCATION.CONNECTION_NAME
  OPTIONS (
    external_source = 'SPANNER_EXTERNAL_SOURCE',
    location = 'LOCATION');
  ```

  Replace the following:
  - DATASET_NAME: the name of your new dataset in BigQuery.
  - SPANNER_EXTERNAL_SOURCE: the fully qualified Spanner database name, with a prefix identifying the source, in the following format: `google-cloudspanner://[DATABASE_ROLE@]/projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE`. For example: `google-cloudspanner://admin@/projects/my_project/instances/my_instance/databases/my_database` or `google-cloudspanner:/projects/my_project/instances/my_instance/databases/my_database`.
  - LOCATION: the location of your new dataset in BigQuery—for example, `us-central1`. After you create a dataset, you can't change its location.
  - CONNECTION_NAME (optional): the name of your Cloud resource connection.
- Click Run.

For more information about how to run queries, see Run an interactive query.
bq
In a command-line environment, create an external dataset by using the `bq mk` command:

```
bq --location=LOCATION mk --dataset \
    --external_source SPANNER_EXTERNAL_SOURCE \
    DATASET_NAME
```

Alternatively, create it with a connection:

```
bq --location=LOCATION mk --dataset \
    --external_source SPANNER_EXTERNAL_SOURCE \
    --connection_id PROJECT_ID.LOCATION.CONNECTION_NAME \
    DATASET_NAME
```

Replace the following:
- LOCATION: the location of your new dataset in BigQuery—for example, `us-central1`. After you create a dataset, you can't change its location. You can set a default location value by using the `.bigqueryrc` file.
- SPANNER_EXTERNAL_SOURCE: the fully qualified Spanner database name, with a prefix identifying the source, in the following format: `google-cloudspanner://[DATABASE_ROLE@]/projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE`. For example: `google-cloudspanner://admin@/projects/my_project/instances/my_instance/databases/my_database` or `google-cloudspanner:/projects/my_project/instances/my_instance/databases/my_database`.
- DATASET_NAME: the name of your new dataset in BigQuery. To create a dataset in a project other than your default project, add the project ID to the dataset name in the following format: `PROJECT_ID:DATASET_NAME`.
- CONNECTION_NAME (optional): the name of your Cloud resource connection.
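For example, with hypothetical project, instance, and database names filled in, the command might look like the following:

```
bq --location=us mk --dataset \
    --external_source google-cloudspanner:/projects/my_project/instances/my_instance/databases/my_database \
    my_spanner_dataset
```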
Terraform
Use the `google_bigquery_dataset` resource.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
The following example creates a Spanner external dataset:
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
- Launch Cloud Shell.
- Set the default Google Cloud project where you want to apply your Terraform configurations. You only need to run this command once per project, and you can run it in any directory.

  ```
  export GOOGLE_CLOUD_PROJECT=PROJECT_ID
  ```

  Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (also called a root module).

- In Cloud Shell, create a directory and a new file within that directory. The filename must have the `.tf` extension—for example, `main.tf`. In this tutorial, the file is referred to as `main.tf`.

  ```
  mkdir DIRECTORY && cd DIRECTORY && touch main.tf
  ```

- If you are following a tutorial, you can copy the sample code in each section or step. Copy the sample code into the newly created `main.tf`. Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
- Review and modify the sample parameters to apply to your environment.
- Save your changes.
- Initialize Terraform. You only need to do this once per directory.

  ```
  terraform init
  ```

  Optionally, to use the latest Google provider version, include the `-upgrade` option:

  ```
  terraform init -upgrade
  ```

Apply the changes

- Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:

  ```
  terraform plan
  ```

  Make corrections to the configuration as necessary.

- Apply the Terraform configuration by running the following command and entering `yes` at the prompt:

  ```
  terraform apply
  ```

  Wait until Terraform displays the "Apply complete!" message.

- Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
API
Call the `datasets.insert` method with a defined dataset resource and `externalDatasetReference` field for your Spanner database.

Note that table names in external datasets are case-insensitive.

When you create external datasets with a CLOUD_RESOURCE connection, you need the `bigquery.connections.delegate` permission (available from the BigQuery Connection Admin role) on the connection that is used by the external datasets.
Create a non-incremental materialized view based on tables from an external dataset
Before you proceed, you must create the underlying Spanner external dataset using a CLOUD_RESOURCE connection.

You can create non-incremental materialized views that reference Spanner external dataset tables by using the `allow_non_incremental_definition` option. The following example uses a base Spanner external dataset table:

```sql
/* You must create the spanner_external_dataset with a CLOUD_RESOURCE connection. */
CREATE MATERIALIZED VIEW sample_dataset.sample_spanner_mv
OPTIONS (
  enable_refresh = true,
  refresh_interval_minutes = 60,
  max_staleness = INTERVAL "24" HOUR,
  allow_non_incremental_definition = true)
AS SELECT COUNT(*) cnt FROM spanner_external_dataset.spanner_table;
```
Only BigQuery non-incremental materialized views can have Spanner external dataset tables as base tables. If a non-incremental materialized view's last refresh occurred outside the `max_staleness` interval, then the query reads the base Spanner external dataset tables. Learn more about BigQuery non-incremental materialized views.
What's next
- Learn about Data Boost in Data Boost overview.
- Use Data Boost in your applications.
- Monitor Data Boost usage.
- Monitor and manage Data Boost quota usage.
- Learn more about Spanner external datasets.
- Learn more about creating materialized views over Spanner external datasets.

