Stream from Pub/Sub to BigQuery
This tutorial uses the Pub/Sub Subscription to BigQuery template
to create and run a Dataflow template job using the Google Cloud console or Google Cloud CLI. The tutorial
walks you through a streaming pipeline example that reads JSON-encoded
messages from Pub/Sub and writes them to a BigQuery table.
Streaming analytics and data integration
pipelines use Pub/Sub to ingest and distribute data.
Pub/Sub enables you to create systems of event producers and
consumers, called publishers and subscribers. Publishers send events to
the Pub/Sub service asynchronously, and Pub/Sub delivers the events to all
services that need to react to them.
Dataflow is a fully-managed service for transforming and
enriching data in stream (real-time) and batch modes. It provides a simplified pipeline
development environment that uses the Apache Beam SDK to transform incoming data and
then output the transformed data.
The benefit of this workflow is that you can use UDFs to transform the message
data before it is written to BigQuery.
Before running a Dataflow pipeline for this scenario, consider
whether a Pub/Sub BigQuery subscription with a UDF meets your requirements.
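As a point of comparison, a BigQuery subscription writes messages from a topic directly to a table without running a Dataflow job. A minimal gcloud sketch, assuming the topic and table already exist (all uppercase names are placeholders):

```shell
# Hypothetical sketch: a Pub/Sub BigQuery subscription delivers messages
# straight to a BigQuery table, with no Dataflow job in between.
# TOPIC_ID, SUBSCRIPTION_ID, PROJECT_ID, DATASET_NAME, and TABLE_NAME
# are placeholders.
gcloud pubsub subscriptions create SUBSCRIPTION_ID \
  --topic=TOPIC_ID \
  --bigquery-table=PROJECT_ID:DATASET_NAME.TABLE_NAME
```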
Objectives
Create a Pub/Sub topic.
Create a BigQuery dataset with a table and schema.
Use a Google-provided streaming template to stream data from your
Pub/Sub subscription to BigQuery by using Dataflow.
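For example, the dataset and table from the second objective can be created ahead of time with the bq CLI; a minimal sketch, where the dataset, table, and schema fields are placeholder assumptions, not values from this tutorial:

```shell
# Hypothetical sketch: create a dataset and a table whose schema matches
# the JSON messages you plan to publish. All names are placeholders.
bq mk --dataset PROJECT_ID:DATASET_NAME
bq mk --table PROJECT_ID:DATASET_NAME.TABLE_NAME \
  name:STRING,customer_id:INTEGER
```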
Costs
In this document, you use the following billable components of Google Cloud Platform:
Dataflow
Pub/Sub
Cloud Storage
BigQuery
To generate a cost estimate based on your projected usage,
use the pricing calculator.
New Google Cloud users might be eligible for a free trial.
When you finish the tasks that are described in this document, you can avoid
continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
This section shows you how to select a project, enable APIs, and grant the
appropriate roles to your user account and to the worker service account.
Console
Sign in to your Google Cloud Platform account. If you're new to
Google Cloud, create an account to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
Roles required to select or create a project
Select a project: Selecting a project doesn't require a specific
IAM role—you can select any project that you've been
granted a role on.
Create a project: To create a project, you need the Project Creator role
(roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant
roles.
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant
roles.
gcloud
To initialize the gcloud CLI, run the following command:
gcloud init
Create a Cloud Storage bucket
Begin by creating a Cloud Storage bucket using the Google Cloud console or
Google Cloud CLI. The Dataflow pipeline uses this bucket as a
temporary storage location.
Console
In the Google Cloud console, go to the Cloud Storage Buckets page.
On the Create a bucket page, for Name your bucket, enter a name
that meets the bucket naming requirements.
Cloud Storage bucket names must be globally unique.
Don't select the other options.
gcloud
Replace BUCKET_NAME with a name for your Cloud Storage bucket
that meets the bucket naming requirements.
Cloud Storage bucket names must be globally unique.
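A bucket with these properties can be created with the gcloud CLI; a minimal sketch, where BUCKET_NAME is a placeholder:

```shell
# Hypothetical sketch: create the temporary-storage bucket for the pipeline.
# BUCKET_NAME is a placeholder and must be globally unique.
gcloud storage buckets create gs://BUCKET_NAME
```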
Create a Pub/Sub topic and subscription
Create a Pub/Sub topic and then create a subscription to that topic.
Console
To create a topic, complete the following steps.
In the Google Cloud console, go to the Pub/Sub Topics page.
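Alternatively, the topic and its subscription can be created with the gcloud CLI; a sketch where TOPIC_ID and SUBSCRIPTION_ID are placeholders:

```shell
# Hypothetical sketch: create the topic, then a pull subscription
# attached to it. TOPIC_ID and SUBSCRIPTION_ID are placeholders.
gcloud pubsub topics create TOPIC_ID
gcloud pubsub subscriptions create SUBSCRIPTION_ID --topic=TOPIC_ID
```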
Run the pipeline
Run a streaming pipeline using the Google-provided
Pub/Sub Subscription to BigQuery template.
The pipeline gets incoming data from the Pub/Sub topic and outputs the data to
your BigQuery dataset.
Console
In the Google Cloud console, go to the Dataflow Jobs page.
PROJECT_ID: the ID of your Google Cloud
project
DATASET_NAME: the name of your
BigQuery dataset
TABLE_NAME: the name of your
BigQuery table
gcloud
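The run command itself can be sketched as follows, assuming the Google-provided classic template path and placeholder values for the job name, region, bucket, and resource names:

```shell
# Hypothetical sketch: run the Pub/Sub Subscription to BigQuery template.
# JOB_NAME, REGION, BUCKET_NAME, PROJECT_ID, SUBSCRIPTION_ID, DATASET_NAME,
# and TABLE_NAME are placeholders.
gcloud dataflow jobs run JOB_NAME \
  --gcs-location=gs://dataflow-templates-REGION/latest/PubSub_Subscription_to_BigQuery \
  --region=REGION \
  --staging-location=gs://BUCKET_NAME/temp \
  --parameters=\
inputSubscription=projects/PROJECT_ID/subscriptions/SUBSCRIPTION_ID,\
outputTableSpec=PROJECT_ID:DATASET_NAME.TABLE_NAME
```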
Check the results in BigQuery by running the following query:
bq query --use_legacy_sql=false 'SELECT * FROM `PROJECT_ID.DATASET_NAME.TABLE_NAME`'
Replace the following variables:
PROJECT_ID: the ID of your Google Cloud
project
DATASET_NAME: the name of your
BigQuery dataset
TABLE_NAME: the name of your
BigQuery table
Use a UDF to transform the data
This tutorial assumes that the Pub/Sub messages are formatted as
JSON, and that the BigQuery table schema matches the JSON data.
Optionally, you can provide a JavaScript user-defined function (UDF) that
transforms the data before it is written to BigQuery.
The UDF can perform additional processing, such as filtering, removing personally
identifiable information (PII), or enriching the data with additional fields.
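As a sketch of what such a UDF can look like: the template passes each message payload to a named JavaScript function as a JSON string and writes the returned JSON string to the output table. The function name and the fields below are illustrative assumptions, not part of this tutorial:

```javascript
// Illustrative UDF sketch. The template invokes the function you name with
// each Pub/Sub message payload serialized as a JSON string, and expects a
// JSON string back. Field names here are assumptions.
function process(inJson) {
  var row = JSON.parse(inJson);
  // Enrich: record when the message was transformed.
  row.processed_at = new Date().toISOString();
  // Redact: drop a hypothetical PII field if it is present.
  delete row.email;
  return JSON.stringify(row);
}
```

When you run the template, you point it at the function with the javascriptTextTransformGcsPath and javascriptTextTransformFunctionName template parameters.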
While the job is running, the pipeline might fail to write individual messages
to BigQuery. Possible errors include:
Serialization errors, including badly-formatted JSON.
Type conversion errors, caused by a mismatch in the table schema and the JSON
data.
Extra fields in the JSON data that are not present in the table schema.
The pipeline writes these errors to a dead-letter table in
BigQuery. By default, the pipeline automatically creates a
dead-letter table named TABLE_NAME_error_records,
where TABLE_NAME is the name of the output table.
To use a different name, set the outputDeadletterTable template parameter.
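You can inspect failed messages by querying the dead-letter table like any other table; a sketch assuming the default table name, with placeholder project, dataset, and table names:

```shell
# Hypothetical sketch: look at rows that failed to load, assuming the
# default dead-letter table name. PROJECT_ID, DATASET_NAME, and TABLE_NAME
# are placeholders.
bq query --use_legacy_sql=false \
  'SELECT * FROM `PROJECT_ID.DATASET_NAME.TABLE_NAME_error_records` LIMIT 10'
```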
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this
tutorial, either delete the project that contains the resources, or keep the project and
delete the individual resources.
Delete the project
The easiest way to eliminate billing is to delete the Google Cloud project that you created
for the tutorial.
Console
In the Google Cloud console, go to the Manage resources page.
Find the row containing the principal whose access you want to revoke.
In that row, click Edit principal.
Click the Delete button for
each role you want to revoke, and then click Save.
gcloud
If you keep your project, revoke the roles that you granted to the
Compute Engine default service account. Run the following command one
time for each of the following IAM roles:
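The revoke step above can be sketched with gcloud as follows, where PROJECT_ID, PROJECT_NUMBER, and ROLE are placeholders for your project ID, your project number, and one of the roles to revoke:

```shell
# Hypothetical sketch: revoke one role from the Compute Engine default
# service account. Run once per role. PROJECT_ID, PROJECT_NUMBER, and
# ROLE are placeholders.
gcloud projects remove-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --role=ROLE
```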
Last updated 2025-11-13 UTC.