This page provides guidance about configuring the BigQuery batch source
plugin in Cloud Data Fusion.
The BigQuery source plugin lets you connect to and load data from
BigQuery tables. Data from a BigQuery table is exported to a temporary
location in Cloud Storage and then read into the pipeline from there.
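The plugin handles this export for you. For orientation only, the following is a minimal sketch of the same export-then-read pattern using the google-cloud-bigquery Python client library. The table name, bucket, and file format are placeholder assumptions, not the plugin's actual internals.

```python
# Illustrative only: an export-to-Cloud Storage step similar in spirit to
# what the BigQuery batch source performs before data is read into a pipeline.
# All names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",          # hypothetical source table
    "gs://my-temp-bucket/export/part-*.avro",  # hypothetical temporary location
    job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
)
extract_job.result()  # block until the export job finishes
```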
Before you begin
Cloud Data Fusion typically has two service accounts:
Design-time service account: Cloud Data Fusion API Service Agent
Execution-time service account: Compute Engine Service Account
Before using the BigQuery batch source plugin, grant the
following roles or permissions to each service account.
Cloud Data Fusion API Service Agent
This service account already has all the required permissions and you don't need
to add additional permissions. For reference, it has the following permissions:
bigquery.datasets.get
bigquery.tables.create
bigquery.tables.get
bigquery.tables.updateData
bigquery.tables.update
bigquery.tables.export
If you're using a namespace service account in addition to the default
design-time service account, add the permissions from the preceding list
to it.
Compute Engine Service Account
In your Google Cloud project, grant the following IAM roles or
permissions to the Compute Engine Service Account:
BigQuery Job User (roles/bigquery.jobUser). This
predefined role contains the required bigquery.jobs.create permission.
BigQuery Data Editor (roles/bigquery.dataEditor). This
predefined role contains the following required permissions:
bigquery.datasets.get
bigquery.tables.create
bigquery.tables.get
bigquery.tables.updateData
bigquery.tables.update
bigquery.tables.export
These roles and permissions can also be assigned on the BigQuery
dataset or table, depending on your use case.
Storage Legacy Bucket Writer (roles/storage.legacyBucketWriter).
This predefined role contains the following required permissions:
storage.buckets.get
storage.objects.get
storage.objects.list
This role and these permissions can also be assigned on the
Cloud Storage bucket, depending on your use case.
Note: If you don't provide a temporary bucket name when you configure the
plugin properties, you need the storage.buckets.create and
storage.buckets.delete permissions, which can be assigned using the
Storage Admin role (roles/storage.admin) on the project.
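If you prefer to grant access at the dataset level rather than the project level, the following is a minimal sketch using the google-cloud-bigquery Python client library. The dataset ID and service account email are placeholder assumptions; the Google Cloud console or gcloud work equally well for this step.

```python
# Illustrative sketch: add a dataset-level access entry for the execution
# service account. Dataset ID and service account email are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",  # dataset-level equivalent of BigQuery Data Editor
        entity_type="userByEmail",
        entity_id="my-compute-sa@my-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # persist the new entry
```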
Configure the plugin
Go to the Cloud Data Fusion web interface and click Studio.
Check that Data Pipeline - Batch is selected (not Realtime).
In the Source menu, click BigQuery. The BigQuery node
appears in your pipeline.
To configure the source, go to the BigQuery node and click Properties.
Enter the following properties. For a complete list, see Properties.
Enter a Label for the BigQuery node, for example, BigQuery tables.
Enter the connection details. You can set up a new, one-time connection,
or an existing, reusable connection.
New connection
To add a one-time connection to BigQuery, follow these
steps:
In the Project ID field, leave the value as auto-detect.
If the BigQuery dataset is in a different project,
in the Dataset Project ID field, enter the ID.
In the Service account type field, choose one of the following
and enter the content in the next field:
File path
JSON
Reusable connection
To reuse an existing connection, follow these steps:
Turn on Use connection.
Click Browse connections.
Click the connection name, for example, BigQuery Default.
Optional: If a connection doesn't exist and you want to create a
new reusable connection, click Add connection and refer to the
steps in the New connection tab on this page.
In the Reference name field, enter a name to use for lineage.
Optional: If your dataset is already available in your instance,
click Browse and select the data to read.
In the Dataset field, enter the name of the dataset that
contains the table.
In the Table field, enter the name of the table.
To test connectivity, click Get schema.
Optional: In the Partition start date field, enter the
inclusive start date string, for example, 2021-01-11.
Optional: In the Partition end date field, enter the
inclusive end date string, for example, 2024-01-11.
Optional: In the Filter field, enter a BigQuery WHERE clause.
(A query sketch for checking a filter appears after these steps.)
Optional: In the Temporary bucket name field, enter a name
for the Cloud Storage bucket.
Optional: In the Encryption Key Name field, enter the Cloud Key Management Service
(Cloud KMS) encryption key name. For more information, see Get the
resource name for the key. (A sketch for building the key resource name
appears after these steps.)
Optional: Turn on Enable querying views. If you enable this option, do the
following:
In the Temporary table creation project field, enter the project
name where the temporary table is created.
In the Temporary table creation dataset field, enter the dataset
name where the temporary table is created.
Optional: Click Validate and address any errors found.
Click Close. The properties are saved, and you can continue to build your
data pipeline in the Cloud Data Fusion web interface.
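The Filter value is a standard SQL WHERE clause applied to the table. If you want to sanity-check a filter before pasting it into the plugin, one option is to run it directly against the table. This is a minimal sketch with placeholder project, dataset, table, and column names.

```python
# Illustrative sketch: count the rows that a candidate filter would match.
# The filter text is what you would enter in the plugin's Filter field.
from google.cloud import bigquery

client = bigquery.Client()
filter_clause = "SchoolID > 52"  # example filter, matching the Properties list
query = (
    "SELECT COUNT(*) AS n "
    "FROM `my-project.my_dataset.Roster` "  # placeholder table
    f"WHERE {filter_clause}"
)
row = list(client.query(query).result())[0]
print(f"{row.n} rows match the filter")
```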
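The Encryption Key Name field expects the full Cloud KMS resource name of the key. If you have the google-cloud-kms Python library installed, one way to assemble that name is the client's path helper; the project, location, key ring, and key names below are placeholders.

```python
# Illustrative sketch: build the Cloud KMS key resource name expected by
# the Encryption Key Name field. All names are placeholders.
from google.cloud import kms

key_name = kms.KeyManagementServiceClient.crypto_key_path(
    "my-project", "us-central1", "my-key-ring", "my-key"
)
print(key_name)
# projects/my-project/locations/us-central1/keyRings/my-key-ring/cryptoKeys/my-key
```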
Properties
For each property, the entry notes whether it supports macros and whether it's required, followed by its description.

Label (Macro enabled: No; Required: Yes)
The name of the node in your data pipeline.

Use connection (Macro enabled: No; Required: No)
Browse for a reusable connection to the source. For more information
about adding, importing, and editing the connections that appear when
you browse connections, see Manage connections.

Connection (Macro enabled: Yes; Required: Yes)
If Use connection is turned on, the name of the reusable connection you
select appears in this field.

Project ID (Macro enabled: Yes; Required: No)
Used only when Use connection is turned off. A globally unique identifier
for the project where the BigQuery job runs. Default is auto-detect.

Dataset project ID (Macro enabled: Yes; Required: No)
Used only when Use connection is turned off. If the dataset is in a
different project than the one where the BigQuery job runs, this value is
the globally unique identifier for the project that contains the
BigQuery dataset. If no value is given, the field defaults to the
Project ID value. The BigQuery Data Viewer role must be granted to the
specified service account to read BigQuery data in that project.

Service account type (Macro enabled: Yes; Required: No)
Select one of the following options:
File path: the path where the service account key is located.
JSON: the JSON content of the service account key.

Service account file path (Macro enabled: Yes; Required: No)
Used only when the Service account type value is File path. The path on
the local file system of the service account key used for authorization.
If jobs run on Dataproc clusters, set the value to auto-detect. If jobs
run on other types of clusters, the file must be present on every node in
the cluster. Default is auto-detect.

Service account JSON (Macro enabled: Yes; Required: No)
Used only when the Service account type value is JSON. The JSON file
content of the service account key.

Reference name (Macro enabled: No; Required: Yes)
Name that uniquely identifies this source for other services, such as
lineage and annotating metadata.

Dataset (Macro enabled: Yes; Required: Yes)
The dataset that the table belongs to. A dataset is contained within a
specific project. Datasets are top-level containers that organize and
control access to tables and views.

Table (Macro enabled: Yes; Required: Yes)
The table to read from. A table contains individual records organized in
rows. Each record is composed of columns (also called fields). Every table
is defined by a schema that describes the column names, data types, and
other information.

Partition start date (Macro enabled: Yes; Required: No)
Inclusive partition start date, specified as yyyy-mm-dd. If no value is
given, all partitions up to the partition end date are read.

Partition end date (Macro enabled: Yes; Required: No)
Exclusive partition end date, specified as yyyy-mm-dd. If no value is
given, all partitions from the partition start date onward are read.

Filter (Macro enabled: Yes; Required: No)
A SQL query that filters for the given conditions. For example, this query
returns all rows from the Roster table where the SchoolID column has a
value greater than 52: SELECT * FROM Roster WHERE SchoolID > 52;. This is
the same as the WHERE clause in BigQuery.

Temporary bucket name (Macro enabled: Yes; Required: No)
The Cloud Storage bucket for temporary data storage. It is created
automatically if it doesn't exist. Temporary data is deleted after it has
been read. If a name isn't provided, a unique bucket is created and then
deleted after the run finishes.

Encryption key name (Macro enabled: Yes; Required: No)
Key that encrypts data written to any bucket created by the plugin. If the
bucket exists, this value is ignored. For more information, see CMEK.

Enable querying views (Macro enabled: Yes; Required: No)
Whether to allow querying BigQuery logical and materialized views. Because
BigQuery views aren't enabled by default, querying them can add
performance overhead. Default is No.

Temporary table creation project (Macro enabled: Yes; Required: No)
Used only when Enable querying views is turned on. The project where the
temporary table should be created. Defaults to the same project the table
is in.

Temporary table creation dataset (Macro enabled: Yes; Required: No)
The dataset in the specified project where the temporary table should be
created. Defaults to the same dataset the table is in.

Output schema (Macro enabled: Yes; Required: Yes)
Schema of the table to read. You can fetch it by clicking Get schema.
Data type mappings
The following table is a list of BigQuery data types with
corresponding CDAP types.
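If you want to check which BigQuery types your table uses before reviewing the mappings, a minimal sketch with the google-cloud-bigquery Python client library follows; the table name is a placeholder.

```python
# Illustrative sketch: list the BigQuery column names, types, and modes of
# a table. The table ID is a placeholder.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")
for field in table.schema:
    print(field.name, field.field_type, field.mode)
```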
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[[["\u003cp\u003eThe BigQuery source plugin in Cloud Data Fusion enables connecting to and loading data from BigQuery tables, by first exporting the data to a temporary Cloud Storage location before reading it into the pipeline.\u003c/p\u003e\n"],["\u003cp\u003eConfiguring the BigQuery batch source plugin involves granting specific IAM roles and permissions to both the Cloud Data Fusion API Service Agent and the Compute Engine Service Account, such as \u003ccode\u003ebigquery.jobUser\u003c/code\u003e, \u003ccode\u003ebigquery.dataEditor\u003c/code\u003e, and \u003ccode\u003estorage.legacyBucketWriter\u003c/code\u003e.\u003c/p\u003e\n"],["\u003cp\u003eWhen configuring the plugin properties, users can choose between creating a new one-time connection or reusing an existing connection to BigQuery, and they must also specify the dataset and table to read from, along with optional settings like partition dates and filter clauses.\u003c/p\u003e\n"],["\u003cp\u003eThe plugin supports various BigQuery data types, and the provided table outlines their corresponding CDAP schema data types, such as mapping \u003ccode\u003eBOOL\u003c/code\u003e to \u003ccode\u003eboolean\u003c/code\u003e and \u003ccode\u003eINT64\u003c/code\u003e to \u003ccode\u003elong\u003c/code\u003e.\u003c/p\u003e\n"],["\u003cp\u003eUsers have the option to utilize a temporary Cloud Storage bucket for data, or have one automatically created and deleted during the process, which would require \u003ccode\u003estorage.buckets.create\u003c/code\u003e and \u003ccode\u003estorage.buckets.delete\u003c/code\u003e permissions.\u003c/p\u003e\n"]]],[],null,["# BigQuery batch source\n\nThis page provides guidance about configuring the BigQuery batch source plugin in Cloud Data Fusion.\n\n\u003cbr /\u003e\n\nThe BigQuery source plugin lets you connect and load data from\nBigQuery tables. Data from a BigQuery table is\nexported into a temporary location in Cloud Storage, and then gets read into\nthe pipeline from there.\n\nBefore you begin\n----------------\n\nCloud Data Fusion typically has two service accounts:\n\n- Design-time service account: [Cloud Data Fusion API Service Agent](/data-fusion/docs/concepts/service-accounts)\n- Execution-time service account: [Compute Engine Service Account](/data-fusion/docs/concepts/service-accounts)\n\nBefore using the BigQuery batch source plugin, grant the\nfollowing roles or permissions to each service account.\n\n#### Cloud Data Fusion API Service Agent\n\nThis service account already has all the required permissions and you don't need\nto add additional permissions. 
For reference, it has the following permissions:\n\n- `bigquery.datasets.get`\n- `bigquery.tables.create`\n- `bigquery.tables.get`\n- `bigquery.tables.updateData`\n- `bigquery.tables.update`\n- `bigquery.tables.export`\n\nIf you're using a [namespace service\naccount](/data-fusion/docs/how-to/control-access-in-namespace)\nin addition to the default design-time service account, add the permissions from\nthe preceding list to it.\n\n#### Compute Engine Service Account\n\nIn your Google Cloud project, grant the following IAM roles or\npermissions to the Compute Engine Service Account:\n\n- [BigQuery Job User](/iam/docs/understanding-roles#bigquery.jobUser) (`roles/bigquery.jobUser`). This predefined role contains the required `bigquery.jobs.create` permission.\n- [BigQuery Data\n Editor](/iam/docs/understanding-roles#bigquery.dataEditor) (`roles/bigquery.dataEditor`). This\n predefined role contains the following required permissions:\n\n - `bigquery.datasets.get`\n - `bigquery.tables.create`\n - `bigquery.tables.get`\n - `bigquery.tables.updateData`\n - `bigquery.tables.update`\n - `bigquery.tables.export`\n\nThese roles and permissions can also be assigned on the BigQuery\ndataset or table, depending on your use case.\n\n- [Storage Legacy Bucket Writer](/iam/docs/understanding-roles#storage.legacyBucketWriter) (`roles/storage.legacyBucketWriter`).\n This predefined role contains the following required permissions:\n\n - `storage.buckets.get`\n - `storage.objects.get`\n - `storage.objects.list`\n\nThis role and these permissions can also be assigned on the\nCloud Storage bucket, depending on your use case.\n| **Note:** If you don't provide a temporary bucket name when you configure the plugin properties, you need `storage.buckets.create` and `storage.buckets.delete` permissions, which can be assigned using the [Storage\n| Admin](/iam/docs/understanding-roles#storage.admin) role (`roles/storage.admin`) on the project.\n\nConfigure the plugin\n--------------------\n\n1. [Go to the Cloud Data Fusion web interface](/data-fusion/docs/create-data-pipeline#navigate-web-interface) and click **Studio**.\n2. Check that **Data Pipeline - Batch** is selected (not **Realtime**).\n3. In the **Source** menu, click **BigQuery**. The BigQuery node appears in your pipeline.\n4. To configure the source, go to the BigQuery node and click **Properties**.\n5. Enter the following properties. For a complete list, see\n [Properties](#properties).\n\n 1. Enter a **Label** for the BigQuery node---for example, `BigQuery tables`.\n 2. Enter the connection details. You can set up a new, one-time connection,\n or an existing, reusable connection.\n\n ### New connection\n\n\n To add a one-time connection to BigQuery, follow these\n steps:\n 1. In the **Project ID** field, leave the value as auto-detect.\n 2. If the BigQuery dataset is in a different project, in the **Dataset Project ID** field, enter the ID.\n 3. In the **Service account type** field, choose one of the following\n and enter the content in the next field:\n\n - **File path**\n - **JSON**\n\n ### Reusable connection\n\n\n To reuse an existing connection, follow these steps:\n 1. Turn on **Use connection**.\n 2. Click **Browse connections**.\n 3. Click the connection name---for example,\n **BigQuery Default**.\n\n | **Note:** For more information about adding, importing, and editing the connections that appear when you browse connections, see [Manage connections](/data-fusion/docs/how-to/managing-connections).\n 4. 
Optional: If a connection doesn't exist and you want to create a\n new reusable connection, click **Add connection** and refer to the\n steps in the [**New connection**](#configure) tab on this page.\n\n 3. In the **Reference name** field, enter a name to use for lineage.\n\n 4. Optional: If your dataset is already available in your instance,\n click **Browse** and select the data to read.\n\n 5. In the **Dataset** field, enter the name of the dataset that\n contains the table.\n\n 6. In the **Table** field, enter the name of the table.\n\n 7. To test connectivity, click **Get schema**.\n\n 8. Optional: In the **Partition start date** field, enter the\n inclusive start date string---for example, `2021-01-11`.\n\n 9. Optional: In the **Partition end date** field, enter the\n inclusive end date string---for example, `2024-01-11`.\n\n 10. Optional: In the **Filter** field, enter a BigQuery\n [`WHERE` clause](/bigquery/docs/reference/standard-sql/query-syntax#where_clause).\n\n 11. Optional: In the **Temporary bucket name** field, enter a name\n for the Cloud Storage bucket.\n\n 12. Optional: In the **Encryption Key Name** field, enter the Cloud Key Management Service\n (Cloud KMS) encryption key name. For more information, see [Get\n the resource name for the\n key](/data-fusion/docs/how-to/customer-managed-encryption-keys#get-the-resource-name).\n\n 13. Optional: Turn on **Enable querying views**. If you enable them, do the\n following:\n\n - In the **Temporary table creation project** field, enter the project name where the temporary table is created.\n - In the **Temporary table creation dataset** field, enter the dataset name where the temporary table is created.\n 14. Optional: Click **Validate** and address any errors found.\n\n 15. Click close **Close**. Properties\n are saved and you can continue to build your data pipeline in the\n Cloud Data Fusion web interface.\n\n### Properties\n\n### Data type mappings\n\nThe following table is a list of BigQuery data types with\ncorresponding CDAP types.\n\nRelease notes\n-------------\n\n- [February 8, 2024](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1280901131/CDAP+Hub+Release+Log#Feb-8%2C-2024)\n- [January 16, 2024](/data-fusion/docs/release-notes#January_16_2024)\n- [September 6, 2023](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1280901131/CDAP+Hub+Release+Log#September-6%2C-2023)\n- [June 14, 2023](/data-fusion/docs/release-notes#June_14_2023)\n- [December 6, 2022](/data-fusion/docs/release-notes#December_06_2022)\n\nWhat's next\n-----------\n\n- Learn more about [plugins in Cloud Data Fusion](/data-fusion/docs/concepts/plugins)."]]