Schedule pipelines

This document describes how to schedule BigQuery pipelines, including how to create pipeline schedules and inspect scheduled pipeline runs.

Pipelines are powered by Dataform. Each pipeline schedule is run using your Google Account user credentials or a custom service account that you select when you configure the schedule.

Changes you make to a pipeline are automatically saved, but are available only to you and to users granted the Dataform Admin role on the project. To update the schedule with a new version of the pipeline, you need to deploy the pipeline. Deploying updates the schedule to use your current version of the pipeline. Schedules always run the latest deployed version.

Schedules of pipelines that contain notebooks use a default runtime specification. During a scheduled run of a pipeline containing notebooks, BigQuery writes notebook output to the Cloud Storage bucket selected during schedule creation.

Before you begin

Before you begin, create a pipeline.

Enable pipeline scheduling

To schedule pipelines, you must grant the following role to the custom service account that you plan to use for pipeline schedules:

Service Account User (roles/iam.serviceAccountUser): Follow Grant a single role on a service account to add your service account as a principal on itself, that is, on the same service account. Then, grant the Service Account User role to this principal.

If your pipeline contains SQL queries, you must grant the following roles to the service account that you plan to use for pipeline schedules:

BigQuery Job User (roles/bigquery.jobUser): Follow Grant a single role on a project to grant the BigQuery Job User role to your service account on projects from which your pipelines read data.
BigQuery Data Viewer (roles/bigquery.dataViewer): Follow Grant a single role on a project to grant the BigQuery Data Viewer role to your service account on projects from which your pipelines read data.
BigQuery Data Editor (roles/bigquery.dataEditor): Follow Grant a single role on a project to grant the BigQuery Data Editor role to your service account on projects to which your pipelines write data.

If your pipeline contains notebooks, you must grant the following roles to the service account that you plan to use for pipeline schedules:

Notebook Executor User (roles/aiplatform.notebookExecutorUser): Follow Grant a single role on a project to grant the Notebook Executor User role to your service account on the selected project.
Storage Admin (roles/storage.admin): Follow Add a principal to a bucket-level policy to add your service account as a principal to the Cloud Storage bucket that you plan to use for storing output of notebooks executed in scheduled pipeline runs, and grant the Storage Admin role to this principal.

Additionally, you must grant the following roles to the default Dataform service agent:

Service Account Token Creator (roles/iam.serviceAccountTokenCreator): Follow Grant token creation access to a service account to add the default Dataform service agent as a principal to your service account, and grant the Service Account Token Creator role to this principal.
Service Account User (roles/iam.serviceAccountUser): Follow Grant or revoke multiple IAM roles using Google Cloud console to grant the Service Account User role to the default Dataform service agent on the custom service account.

To learn more about service accounts in Dataform, see About service accounts in Dataform.
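The role requirements in this section can be summarized as a checklist. The following Python sketch is illustrative only: it doesn't call any Google Cloud API, and the RoleGrant type, the required_grants function, and the member and resource labels are hypothetical names introduced for this example. The role IDs are the ones listed in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoleGrant:
    role: str      # IAM role ID, as listed in this section
    member: str    # who receives the role (descriptive label, not a real principal)
    resource: str  # where the role is granted (descriptive label)


def required_grants(has_sql: bool, has_notebooks: bool) -> List[RoleGrant]:
    """Return the grants this section asks for, given what the pipeline contains."""
    sa = "custom service account"
    agent = "default Dataform service agent"
    grants = [
        # Always required when scheduling with a custom service account.
        RoleGrant("roles/iam.serviceAccountUser", sa, sa),
        RoleGrant("roles/iam.serviceAccountTokenCreator", agent, sa),
        RoleGrant("roles/iam.serviceAccountUser", agent, sa),
    ]
    if has_sql:
        grants += [
            RoleGrant("roles/bigquery.jobUser", sa, "projects the pipeline reads from"),
            RoleGrant("roles/bigquery.dataViewer", sa, "projects the pipeline reads from"),
            RoleGrant("roles/bigquery.dataEditor", sa, "projects the pipeline writes to"),
        ]
    if has_notebooks:
        grants += [
            RoleGrant("roles/aiplatform.notebookExecutorUser", sa, "selected project"),
            RoleGrant("roles/storage.admin", sa, "notebook output bucket"),
        ]
    return grants


if __name__ == "__main__":
    for grant in required_grants(has_sql=True, has_notebooks=False):
        print(f"{grant.role}: {grant.member} on {grant.resource}")
```

A checklist like this can help you review a service account before you create a schedule. It doesn't grant or verify any role.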

VPC Service Controls requirements

If you use VPC Service Controls to protect your pipelines, you should be aware that scheduled runs are powered by Dataform. When you configure VPC Service Controls for scheduled runs, ensure that the following requirements are met:

You must set the dataform.restrictGitRemotes organization policy.
Dataform and BigQuery must be restricted by the same VPC Service Controls service perimeter.
To allow users to authenticate with the user credentials for their Google Account when scheduling or manually triggering runs, you must add their user identities to your ingress rules. For more information, see Updating ingress and egress policies for a service perimeter and Ingress rules reference.

For detailed configuration steps and security considerations, see Configure VPC Service Controls for Dataform.

Required roles

To get the permissions that you need to manage pipelines, ask your administrator to grant you the following IAM roles:

Delete pipelines: Dataform Admin (roles/dataform.Admin) on the pipeline
Create, edit, run, and delete pipeline schedules:
- Dataform Admin (roles/dataform.Admin) on the pipeline
- Service Account User (roles/iam.serviceAccountUser) on the custom service account
View and run pipelines: Dataform Viewer (roles/dataform.Viewer) on the project
View pipeline schedules: Dataform Editor (roles/dataform.Editor) on the project

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

To enhance security for scheduling, see Implement enhanced scheduling permissions.

For more information about Dataform IAM, see Control access with IAM.

To use Colab notebook runtime templates when scheduling pipelines, you need the Notebook Runtime User role (roles/aiplatform.notebookRuntimeUser).

Create a pipeline schedule

To create a pipeline schedule, follow these steps:

Explorer pane

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.

If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click Schedule.
In the Schedule pipeline pane, in the Schedule name field, enter a name for the schedule.
In the Authentication section, authorize the pipeline with your Google Account user credentials or a service account.
- To use your Google Account user credentials (Preview), select Execute with my user credentials.
- To use a service account, select Execute with selected service account, and then select a service account.
If your pipeline contains a notebook, in the Notebook options section, in the Runtime template field, select a Colaboratory notebook runtime template or the default runtime specifications. For details on creating a Colab notebook runtime template, see Create a runtime template.

Note: A notebook runtime template must be in the same region as the pipeline.

Note: If you don't have the required role for using Colab notebook runtime templates, you can still run and schedule pipelines with the default runtime specifications.
If your pipeline contains a notebook, in the Notebook options section, in the Cloud Storage bucket field, click Browse and select or create a Cloud Storage bucket for storing the output of notebooks in your pipeline.

Your selected service account must be granted the Storage Admin IAM role on the selected bucket. For more information, see Enable pipeline scheduling.
Under Configuration Type, select Schedule (time-based recurrence).
Under Schedule frequency, do the following:
1. In the Repeats menu, select the frequency of scheduled pipeline runs.
2. In the At time field, enter the time for scheduled pipeline runs.
3. In the Timezone menu, select the timezone for the schedule.
Set the BigQuery query job priority with the Execute as interactive job with high priority (default) option. By default, BigQuery runs queries as interactive query jobs, which are intended to start running as quickly as possible. Clearing this option runs the queries as batch query jobs, which have lower priority.
Click Create schedule. If you selected Execute with my user credentials for your authentication method, you must authorize your Google Account (Preview).

When you create the schedule, the current version of the pipeline is automatically deployed. To update the schedule with a new version of the pipeline, deploy the pipeline.

The latest deployed version of the pipeline runs at the selected time and frequency.

Scheduling page

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click Create, and then select Pipeline schedule from the menu.
In the Schedule pipeline pane, select a pipeline to schedule.
In the Schedule name field, enter a name for the schedule.
In the Authentication section, authorize the pipeline with your Google Account user credentials or a service account.
- To use your Google Account user credentials (Preview), select Execute with my user credentials.
- To use a service account, select Execute with selected service account, and then select a service account.
If your pipeline contains a notebook, in the Notebook options section, in the Runtime template field, select a Colab notebook runtime template or the default runtime specifications. For details on creating a Colab notebook runtime template, see Create a runtime template.

Note: A notebook runtime template must be in the same region as the pipeline.

Note: If you don't have the required role for using Colab notebook runtime templates, you can still run and schedule pipelines with the default runtime specifications.
If your pipeline contains a notebook, in the Cloud Storage bucket field, click Browse and select or create a Cloud Storage bucket for storing the output of notebooks in your pipeline.

Your selected service account must be granted the Storage Admin IAM role on the selected bucket. For more information, see Enable pipeline scheduling.
Under Configuration Type, select Schedule (time-based recurrence).
Under Schedule frequency, do the following:
1. In the Repeats menu, select the frequency of scheduled pipeline runs.
2. In the At time field, enter the time for scheduled pipeline runs.
3. In the Timezone menu, select the timezone for the schedule.
Set the BigQuery query job priority with the Execute as interactive job with high priority (default) option. By default, BigQuery runs queries as interactive query jobs, which are intended to start running as quickly as possible. Clearing this option runs the queries as batch query jobs, which have lower priority.
Click Create schedule. If you selected Execute with my user credentials for your authentication method, you must authorize your Google Account (Preview).

Authorize your Google Account

To authenticate the resource with your Google Account user credentials, you must manually grant permission for BigQuery pipelines to get the access token for your Google Account and access the source data on your behalf. You can grant manual approval with the OAuth dialog interface.

You only need to give permission to BigQuery pipelines once.

To revoke the permission that you granted, follow these steps:

Go to your Google Account page.
Click BigQuery Pipelines.
Click Remove access.

Changing the pipeline schedule owner by updating credentials also requires manual approval if the new Google Account owner has never created a schedule before.

If your pipeline contains a notebook, you must also manually grant permission for Colab Enterprise to get the access token for your Google Account and access the source data on your behalf. You only need to give permission once. You can revoke this permission on the Google Account page.

Trigger-based scheduling

You can configure BigQuery pipelines to automatically trigger executions based on updates to specified BigQuery tables. You can create trigger-based schedules to automate pipeline executions in response to changes to your BigQuery data, rather than on a fixed schedule.

When the pipeline detects changes to the specified table or tables, it triggers a new execution of the associated workflow. You can define conditions based on updates to a single table, to all of a set of tables, or to any of a set of tables.

You can also adjust the optional settings of your trigger-based schedules to control the minimum interval between pipeline triggers. For example, adjust the Min Execution Duration value to ensure that trigger-based schedules aren't activated more frequently than intended. You can also adjust the Max Wait Duration value to ensure that the trigger-based schedule is forced to activate once within that duration, even if no table updates were detected.
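To make the interaction between the trigger condition, Min Execution Duration, and Max Wait Duration concrete, the following Python sketch models the decision described above. It is a simplified model and not Dataform's implementation. The should_trigger function and its parameters are hypothetical, table updates are compared against the time of the last run, and the sketch assumes that the minimum execution duration takes precedence over the maximum wait duration.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def should_trigger(
    now: datetime,
    last_run: datetime,
    table_last_modified: Dict[str, datetime],
    condition: str,  # "ALL" or "ANY"
    min_execution: timedelta = timedelta(minutes=3),
    max_wait: Optional[timedelta] = None,
) -> bool:
    """Decide whether a trigger-based schedule would start a run (simplified model)."""
    elapsed = now - last_run
    # Min Execution Duration: never run more often than this (assumed to win over max_wait).
    if elapsed < min_execution:
        return False
    # Max Wait Duration: force a run even if no table updates were detected.
    if max_wait is not None and elapsed >= max_wait:
        return True
    updated = [modified > last_run for modified in table_last_modified.values()]
    if condition == "ALL":
        return bool(updated) and all(updated)
    if condition == "ANY":
        return any(updated)
    raise ValueError(f"Unknown trigger condition: {condition!r}")


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    tables = {"orders": t0 + timedelta(minutes=1), "customers": t0 - timedelta(minutes=5)}
    print(should_trigger(t0 + timedelta(minutes=5), t0, tables, "ANY"))
    print(should_trigger(t0 + timedelta(minutes=5), t0, tables, "ALL"))
```

In this model, only one of the two example tables changed after the last run, so the ANY condition is satisfied and the ALL condition is not.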

Limitations

Trigger-based schedules are subject to the following limitations:

Trigger-based schedules aren't instantaneous. When you configure a trigger-based schedule, the pipeline checks the status of the BigQuery table approximately once every 3 minutes. This time period is called the polling interval and can result in a delay between a table modification and the trigger activation.
Each monitored table results in API calls to BigQuery during every polling interval. Monitoring a very large number of tables can contribute to BigQuery API quota consumption.
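As a rough illustration of these limitations, the following Python sketch estimates how many table status checks a trigger generates per day. The estimated_checks_per_day function is a hypothetical helper. It assumes a polling interval of exactly 3 minutes and, by default, one API call per monitored table per interval. The actual interval is approximate, and the real number of calls per table isn't stated in this document.

```python
POLLING_INTERVAL_SECONDS = 3 * 60  # "approximately once every 3 minutes"
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_checks_per_day(table_count: int, calls_per_table: int = 1) -> int:
    """Estimate daily status checks for a trigger that monitors table_count tables."""
    if table_count < 0 or calls_per_table < 0:
        raise ValueError("Counts must be non-negative.")
    intervals_per_day = SECONDS_PER_DAY // POLLING_INTERVAL_SECONDS
    return table_count * calls_per_table * intervals_per_day


if __name__ == "__main__":
    for tables in (1, 10, 100):
        print(tables, estimated_checks_per_day(tables))
```

Under the same assumptions, a table modification made just after a status check can wait up to about one full polling interval, roughly 3 minutes, before the trigger detects it.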

Create a trigger

To create a trigger, follow these steps:

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.

If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click Trigger.
In the Trigger field, enter a name for the trigger.
In the Authentication section, authorize the pipeline with your Google Account user credentials or a service account.
- To use your Google Account user credentials (Preview), select Execute with my user credentials.
- To use a service account, select Execute with selected service account, and then select a service account.
If your pipeline contains a notebook, in the Notebook options section, in the Runtime template field, select a Colaboratory notebook runtime template or the default runtime specifications. For details on creating a Colab notebook runtime template, see Create a runtime template.

Note: A notebook runtime template must be in the same region as the pipeline.

Note: If you don't have the required role for using Colab notebook runtime templates, you can still run and schedule pipelines with the default runtime specifications.
If your pipeline contains a notebook, in the Notebook options section, in the Cloud Storage bucket field, click Browse and select or create a Cloud Storage bucket for storing the output of notebooks in your pipeline.

Your selected service account must be granted the Storage Admin IAM role on the selected bucket. For more information, see Enable pipeline scheduling.
Under Configuration Type, select Trigger (event-based execution).
In the Search tables field, add a table or tables to be monitored for the trigger.
Under Trigger Condition, select one of the following options:
- Wait for ALL tables to update: trigger the workflow only when all listed tables have been updated since the last check.
- Trigger if ANY table updates: trigger the workflow if any of the listed tables has been updated since the last check.
(Optional) For Max Wait Duration, enter a duration to force the activation of a trigger if no table updates are detected within this duration. Supports values from 1 second to 7 days. If not specified, the workflow runs only if a monitored table is updated and the minimum execution duration is satisfied.
(Optional) For Min Execution Duration, select a duration to prevent triggers from activating more frequently than this minimum duration. Supports values from 3 minutes to 24 hours. If not specified, the default value is 3 minutes.
Click Create schedule. If you selected Execute with my user credentials for your authentication method, you must authorize your Google Account (Preview).
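The supported ranges for the two optional durations can be checked before you enter them in the console. The following Python sketch is a hypothetical local helper, not part of any Google Cloud client library. It applies the ranges and the default stated in the steps above and assumes that the range boundaries are inclusive.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Ranges and default as stated in the steps above; inclusive bounds are an assumption.
MIN_EXECUTION_RANGE = (timedelta(minutes=3), timedelta(hours=24))
MAX_WAIT_RANGE = (timedelta(seconds=1), timedelta(days=7))
DEFAULT_MIN_EXECUTION = timedelta(minutes=3)


def normalize_trigger_durations(
    min_execution: Optional[timedelta] = None,
    max_wait: Optional[timedelta] = None,
) -> Tuple[timedelta, Optional[timedelta]]:
    """Apply the default and reject values outside the documented ranges."""
    if min_execution is None:
        min_execution = DEFAULT_MIN_EXECUTION
    low, high = MIN_EXECUTION_RANGE
    if not low <= min_execution <= high:
        raise ValueError("Min Execution Duration must be from 3 minutes to 24 hours.")
    if max_wait is not None:
        low, high = MAX_WAIT_RANGE
        if not low <= max_wait <= high:
            raise ValueError("Max Wait Duration must be from 1 second to 7 days.")
    return min_execution, max_wait


if __name__ == "__main__":
    print(normalize_trigger_durations(max_wait=timedelta(hours=12)))
```

When Max Wait Duration is left unset, the helper returns None for it, which mirrors the behavior where the workflow runs only after a monitored table is updated.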

Troubleshooting trigger-based schedules

This section describes common issues with trigger-based schedules and how to resolve them.

Issue: The trigger isn't activating

Resolution: Try one of the following steps:

Verify that the user credentials or the service account has all the required permissions.
Verify that the specified BigQuery table is being modified.
Check that the trigger isn't being affected by the polling interval.
Check if the minimum execution duration, or the Min Execution Duration value, is preventing more frequent runs. You can decrease this value to increase the frequency of the trigger activation.
Check if the trigger condition option (ALL or ANY) is affecting the trigger activation.
Examine the audit logs to check for errors when Dataform attempts to call the BigQuery API to check the status of the monitored table.

Issue: The trigger is activating too often

Resolution: Adjust the minimum execution duration, or the Min Execution Duration value. You can increase this value to decrease the frequency of the trigger activation.

Deploy a pipeline

Deploying a pipeline updates its schedule with the current version of the pipeline. Schedules run the latest deployed version of the pipeline.

To deploy a pipeline, follow these steps:

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click Deploy.

The corresponding schedule is updated with the current version of the pipeline. The latest deployed version of the pipeline runs at the scheduled time.

Disable a schedule

To pause the scheduled runs of a selected pipeline without deleting the schedule, you can disable the schedule.

To disable a schedule for a selected pipeline, follow these steps:

Explorer pane

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click View schedule.
In the Schedule details table, in the Schedule state row, click the Schedule is enabled toggle.

Scheduling page

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click the name of the selected pipeline.
On the Schedule details page, click Disable.

Enable a schedule

To resume scheduled runs of a disabled pipeline schedule, follow these steps:

Explorer pane

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click View schedule.
In the Schedule details table, in the Schedule state row, click the Schedule is disabled toggle.

Scheduling page

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click the name of the selected pipeline.
On the Schedule details page, click Enable.

Manually run a deployed pipeline

When you manually run a pipeline deployed in a selected schedule, BigQuery executes the deployed pipeline once, independently from the schedule.

To manually run a deployed pipeline, follow these steps:

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click the name of the selected pipeline schedule.
On the Schedule details page, click Run.

View all pipeline schedules

To view all pipeline schedules in your Google Cloud project, follow these steps:

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Optional: To display additional columns with pipeline schedule details, click Column display options, select the columns, and then click OK.

View pipeline schedule details

To view details for a selected pipeline schedule, follow these steps:

Explorer pane

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click View schedule.

Scheduling page

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click the name of the selected pipeline schedule.

View past scheduled runs

To view past runs of a selected pipeline schedule, follow these steps:

Explorer pane

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click Executions.
Optional: To refresh the list of past runs, click Refresh.

Scheduling page

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click the name of the selected pipeline.
On the Schedule details page, in the Past executions section, inspect past runs.
Optional: To refresh the list of past runs, click Refresh.

Edit a pipeline schedule

To edit a pipeline schedule, follow these steps:

Explorer pane

In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the left pane, click Explorer.
In the Explorer pane, expand your project, click Pipelines, and then select a pipeline.
Click View schedule, and then click Edit.
In the Schedule pipeline dialog, edit the schedule, and then click Update schedule.

Scheduling page

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Click the name of the selected pipeline.
On the Schedule details page, click Edit.
Click View schedule, and then click Edit.
In the Schedule pipeline dialog, edit the schedule, and then click Update schedule.

Delete a pipeline schedule

To permanently delete a pipeline schedule, follow these steps:

In the Google Cloud console, go to the Scheduling page.

Go to Scheduling
Do either of the following:
- Click the name of the selected pipeline schedule, and then on the Schedule details page, click Delete.
- In the row that contains the selected pipeline schedule, click View actions in the Actions column, and then click Delete.
In the dialog that appears, click Delete.

What's next

Learn more about pipelines in BigQuery.
Learn how to create pipelines.