Cloud Data Fusion supports Pub/Sub sources in streaming data pipelines.
Before you begin
Roles and permissions
To get the permissions that you need to read from a Pub/Sub streaming source, ask your administrator to grant you the Pub/Sub Editor (roles/pubsub.editor) IAM role on the service account used to access the Pub/Sub subscription.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to read from a Pub/Sub streaming source. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to read from a Pub/Sub streaming source:
- pubsub.snapshots.create
- pubsub.snapshots.delete
- pubsub.snapshots.seek
- pubsub.subscriptions.consume
- pubsub.topics.attachSubscription
You might also be able to get these permissions with custom roles or other predefined roles.
You grant the role on the service account you specified in the plugin properties for accessing Pub/Sub. If none is specified, grant the role on the Dataproc service account.
For more information about granting roles, see Manage access.
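If you use a custom role instead of the predefined one, it can help to compare its permissions with the list above. The following is a minimal sketch, not part of Cloud Data Fusion: `missing_permissions` and `example_role` are hypothetical names, and the granted permissions are assumed to be copied by hand from the role definition.

```python
# Permissions this page lists as required to read from a
# Pub/Sub streaming source.
REQUIRED_PERMISSIONS = frozenset({
    "pubsub.snapshots.create",
    "pubsub.snapshots.delete",
    "pubsub.snapshots.seek",
    "pubsub.subscriptions.consume",
    "pubsub.topics.attachSubscription",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# Hypothetical custom role that lacks the snapshot permissions.
example_role = [
    "pubsub.subscriptions.consume",
    "pubsub.topics.attachSubscription",
]
print(missing_permissions(example_role))
```

An empty result means the role covers every permission in the list. It does not confirm that the role is granted on the correct service account.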
Add a Pub/Sub source to your streaming data pipeline
- Go to your instance:
  - In the Google Cloud console, go to the Cloud Data Fusion page.
  - To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.
- In the Cloud Data Fusion web interface, click Studio.
- Select Data Pipeline - Realtime.
- In the Source menu, select Pub/Sub. A Pub/Sub streaming source node appears in the pipeline.
- On the Pub/Sub node, click Properties to configure the source. For more information, see Pub/Sub Streaming Source.
Support for a single Pub/Sub source with no Windower plugins
Cloud Data Fusion version 6.9.1 supports real-time pipelines with a single Pub/Sub streaming source and no Windower plugins.
- The Pub/Sub streaming source has built-in support, and data is processed at least once. Enabling Spark checkpointing isn't required.
- The Pub/Sub streaming source creates a Pub/Sub snapshot at the beginning of each batch and removes it at the end of each batch.
- Creating Pub/Sub snapshots has a cost associated with it. For more information, see Pub/Sub pricing.
- You can monitor snapshot creation in Cloud Audit Logs.
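The snapshot-per-batch behavior described above can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the plugin's implementation: `FakeSubscription` and `run_batch` are hypothetical names, no Pub/Sub client library is used, and seeking back to the snapshot after a failed batch is an assumption suggested by the pubsub.snapshots.seek permission rather than something this page states.

```python
class FakeSubscription:
    """In-memory stand-in for a Pub/Sub subscription (not the real client)."""

    def __init__(self, messages):
        self.backlog = list(messages)  # messages not yet acknowledged
        self.snapshots = {}            # snapshot name -> saved backlog

    def create_snapshot(self, name):
        self.snapshots[name] = list(self.backlog)

    def seek(self, name):
        # Restore the backlog to the state captured by the snapshot.
        self.backlog = list(self.snapshots[name])

    def delete_snapshot(self, name):
        del self.snapshots[name]

    def pull_and_ack(self, max_messages):
        batch = self.backlog[:max_messages]
        self.backlog = self.backlog[max_messages:]
        return batch


def run_batch(sub, batch_id, process, batch_size=2):
    """Run one batch; return True on success, False if it was rewound."""
    name = f"batch-{batch_id}"
    sub.create_snapshot(name)          # beginning of the batch
    try:
        for msg in sub.pull_and_ack(batch_size):
            process(msg)
        return True
    except Exception:
        sub.seek(name)                 # assumed recovery: redeliver the batch
        return False
    finally:
        sub.delete_snapshot(name)      # end of the batch


seen = []
state = {"failed": False}


def flaky(msg):
    """Record each delivery and fail once on message 'b'."""
    seen.append(msg)
    if msg == "b" and not state["failed"]:
        state["failed"] = True
        raise RuntimeError("transient failure")


sub = FakeSubscription(["a", "b", "c"])
batch_id = 0
while sub.backlog:
    run_batch(sub, batch_id, flaky)
    batch_id += 1
print(seen)
```

In this simulation the first batch fails and is rewound, so "a" and "b" are delivered twice while "c" is delivered once. Every message is processed at least once, and some more than once, so downstream stages should tolerate duplicates.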
Upgrade a pipeline with a Pub/Sub streaming source
Cloud Data Fusion supports direct application upgrades for streaming pipelines with a Pub/Sub streaming source created in 6.9.1 or later.
Cloud Data Fusion doesn't support direct upgrades for data pipelines with a Pub/Sub streaming source that were created in version 6.9.0 or earlier. Instead, upgrade those pipelines to 6.9.1 by following these steps:
- When an instance upgrade is planned, stop publishing data to the topic.
- Wait for the pipeline to finish processing the published data.
- After the data is processed completely, stop the pipeline.
- Upgrade the instance.
- Duplicate the existing pipeline and update it to the newest plugins.
- Deploy the pipeline.
- Run the new pipeline to read data. The new version automatically uses snapshots instead of Spark checkpointing.
- Delete the old pipeline.
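The version rule in this section can be expressed as a small helper. This is a minimal sketch: `parse_version` and `upgrade_path` are hypothetical names, and the version in which a pipeline was created is assumed to be a plain dotted string such as "6.9.0".

```python
def parse_version(version):
    """Turn '6.9.1' into (6, 9, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# First version whose Pub/Sub streaming pipelines can be upgraded directly.
DIRECT_UPGRADE_MIN = parse_version("6.9.1")


def upgrade_path(created_in):
    """Return 'direct' or 'manual' for a pipeline with a Pub/Sub streaming source."""
    if parse_version(created_in) >= DIRECT_UPGRADE_MIN:
        return "direct"
    return "manual"


print(upgrade_path("6.9.0"))
print(upgrade_path("6.9.1"))
```

Comparing tuples of integers rather than raw strings avoids misordering versions such as 6.10.0, which sorts before 6.9.1 as text. A result of "manual" corresponds to the step-by-step procedure listed above.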
What's next
- Refer to the CDAP Pub/Sub Streaming Source.

