The BigQuery source plugin lets you connect to and load data from BigQuery tables. Data from a BigQuery table is exported to a temporary location in Cloud Storage, and then read into the pipeline from there.
Before you begin
Cloud Data Fusion typically has two service accounts:
- Design-time service account: Cloud Data Fusion API Service Agent
- Execution-time service account: Compute Engine Service Account
Before using the BigQuery batch source plugin, grant the following roles or permissions to each service account.
Cloud Data Fusion API Service Agent
This service account already has all the required permissions and you don't need to add additional permissions. For reference, it has the following permissions:
- bigquery.datasets.get
- bigquery.tables.create
- bigquery.tables.get
- bigquery.tables.updateData
- bigquery.tables.update
- bigquery.tables.export
If you're using a namespace service account in addition to the default design-time service account, add the permissions from the preceding list to it.
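If you script permission grants, one minimal sketch is to bundle the preceding permissions into a custom role and grant it to the namespace service account. The role ID dataFusionBigQuerySource, PROJECT_ID, and the service account email below are hypothetical placeholders, not values from this guide:

```bash
# Create a custom role bundling the design-time permissions listed above
# (hypothetical role ID and project ID).
gcloud iam roles create dataFusionBigQuerySource \
    --project=PROJECT_ID \
    --title="Data Fusion BigQuery Source" \
    --permissions=bigquery.datasets.get,bigquery.tables.create,bigquery.tables.get,bigquery.tables.updateData,bigquery.tables.update,bigquery.tables.export

# Grant the custom role to the namespace service account (hypothetical email).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:NAMESPACE_SA@PROJECT_ID.iam.gserviceaccount.com" \
    --role="projects/PROJECT_ID/roles/dataFusionBigQuerySource"
```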
Compute Engine Service Account
In your Google Cloud project, grant the following IAM roles or permissions to the Compute Engine Service Account:
- BigQuery Job User (roles/bigquery.jobUser). This predefined role contains the required bigquery.jobs.create permission.
- BigQuery Data Editor (roles/bigquery.dataEditor). This predefined role contains the following required permissions:
  - bigquery.datasets.get
  - bigquery.tables.create
  - bigquery.tables.get
  - bigquery.tables.updateData
  - bigquery.tables.update
  - bigquery.tables.export

  These roles and permissions can also be assigned on the BigQuery dataset or table, depending on your use case.

- Storage Legacy Bucket Writer (roles/storage.legacyBucketWriter). This predefined role contains the following required permissions:
  - storage.buckets.get
  - storage.objects.get
  - storage.objects.list

  This role and these permissions can also be assigned on the Cloud Storage bucket, depending on your use case.
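If you manage IAM with the gcloud CLI, the following is a minimal sketch of granting these roles; PROJECT_ID, PROJECT_NUMBER, and BUCKET_NAME are placeholders to replace with your own values:

```bash
# Grant the BigQuery roles on the project to the Compute Engine default service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/bigquery.dataEditor"

# Grant the legacy bucket role on the specific Cloud Storage bucket.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/storage.legacyBucketWriter"
```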
Configure the plugin
- Go to the Cloud Data Fusion web interface and click Studio.
- Check that Data Pipeline - Batch is selected (not Realtime).
- In the Source menu, click BigQuery. The BigQuery node appears in your pipeline.
- To configure the source, go to the BigQuery node and click Properties.
- Enter the following properties. For a complete list, see Properties.
- Enter a Label for the BigQuery node (for example, BigQuery tables).
- Enter the connection details. You can set up a new, one-time connection, or an existing, reusable connection.
New connection
To add a one-time connection to BigQuery, follow these steps:
- In the Project ID field, leave the value as auto-detect.
- If the BigQuery dataset is in a different project, in the Dataset Project ID field, enter the ID.
- In the Service account type field, choose one of the following and enter the content in the next field:
  - File path
  - JSON
Reusable connection
To reuse an existing connection, follow these steps:
- Turn on Use connection.
- Click Browse connections.
- Click the connection name (for example, BigQuery Default).
- Optional: If a connection doesn't exist and you want to create a new reusable connection, click Add connection and refer to the steps in the New connection tab on this page.
- In the Reference name field, enter a name to use for lineage.
- Optional: If your dataset is already available in your instance, click Browse and select the data to read.
- In the Dataset field, enter the name of the dataset that contains the table.
- In the Table field, enter the name of the table.
- To test connectivity, click Get schema.
- Optional: In the Partition start date field, enter the inclusive start date string (for example, 2021-01-11).
- Optional: In the Partition end date field, enter the inclusive end date string (for example, 2024-01-11).
- Optional: In the Filter field, enter a BigQuery WHERE clause.
- Optional: In the Temporary bucket name field, enter a name for the Cloud Storage bucket (see the command-line sketch after these steps).
- Optional: In the Encryption Key Name field, enter the Cloud Key Management Service (Cloud KMS) encryption key name. For more information, see Get the resource name for the key, or the sketch after these steps.
- Optional: Turn on Enable querying views. If you enable it, do the following:
  - In the Temporary table creation project field, enter the project name where the temporary table is created.
  - In the Temporary table creation dataset field, enter the dataset name where the temporary table is created.
- Optional: Click Validate and address any errors found.
- Click Close. Properties are saved and you can continue to build your data pipeline in the Cloud Data Fusion web interface.
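For the optional Temporary bucket name and Encryption Key Name fields, the following is a minimal command-line sketch; the bucket name, location, and key ring are placeholder values:

```bash
# Create a Cloud Storage bucket to serve as the temporary bucket.
gcloud storage buckets create gs://BUCKET_NAME --location=us-central1

# List keys in a key ring. The NAME column shows the full resource name,
# in the form projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY,
# which you can paste into the Encryption Key Name field.
gcloud kms keys list --location=us-central1 --keyring=KEY_RING
```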
Properties
The following table describes the source properties used in the preceding steps:

| Property | Description |
|---|---|
| Project ID | Default is auto-detect. |
| Service account type | File path: the path where the service account is located. JSON: JSON content of the service account. |
| Dataset project ID | Default is auto-detect. |
| Partition start date | Inclusive start date in the format yyyy-mm-dd. If no value is given, all partitions up to the partition end date are read. |
| Partition end date | Inclusive end date in the format yyyy-mm-dd. If no value is given, all partitions from the partition start date onward are read. |
| Filter | A BigQuery WHERE clause. For example, to select rows from the Roster table where the SchoolID column has a value greater than 52, use SELECT * FROM Roster WHERE SchoolID > 52;. This is the same as the WHERE clause in BigQuery. |
| Enable querying views | Default is No. |
Data type mappings
The following table lists BigQuery data types and the corresponding CDAP schema data types.
| BigQuery data type | CDAP schema data type |
|---|---|
| BOOL | boolean |
| BYTES | bytes |
| DATE | date |
| DATETIME | datetime, string |
| FLOAT64 | double |
| GEO | unsupported |
| INT64 | long |
| NUMERIC | decimal |
| BIGNUMERIC | decimal |
| REPEATED | array |
| STRING | string, datetime (ISO 8601 format) |
| STRUCT | record |
| TIME | time (microseconds) |
| TIMESTAMP | timestamp (microseconds) |
| JSON | unsupported |
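To see how a given table will map, you can inspect its BigQuery schema before running the pipeline. This is a minimal sketch using the bq command-line tool; PROJECT_ID, DATASET, and TABLE are placeholders:

```bash
# Print the table's schema as JSON. Compare each column's BigQuery type
# against the mapping table above to predict the resulting CDAP type.
bq show --schema --format=prettyjson PROJECT_ID:DATASET.TABLE
```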
What's next
- Learn more about plugins in Cloud Data Fusion.

