Create a stream using the automated flow

This page describes how to create a stream using the automated setup flow.

The automated stream setup simplifies the process of moving data from Cloud SQL for PostgreSQL managed databases to BigQuery by reducing the number of steps that you need perform. The flow lets you create a stream directly from the overview page of your Cloud SQL for PostgreSQL instance. Datastream automates securing the VPC connection between the stream and the source database, creating database configurations and stream connection resources.

Before you begin

  1. Enable the Datastream, Network Connectivity, and Compute Engine APIs.
  2. Ensure that you have the required Identity and Access Management (IAM) permissions to create and manage Datastream resources. See the section that follows for more details.
  3. Create and configure a source Cloud SQL for PostgreSQL database for replication. For more information, see Configure a Cloud SQL for PostgreSQL database for CDC .
  4. We recommend that you enable logical replication for your instance prior to creating the stream. If you don't enable logical replication, Datastream enables it for you, which restarts your source instance.

    For information about enabling logical replication for a Cloud SQL for PostgreSQL instance, see Configure an Cloud SQL for PostgreSQL database for CDC .

  5. Make sure that your source database is configured to use private services access .

Required permissions

To use the automated stream creation flow, you need the following Identity and Access Management (IAM) roles or permissions:

  • serviceusage.services.enable , compute.networkAdmin for enabling the required APIs and performing network configuration tasks.
  • cloudsql.admin for instance configuration tasks.
  • datastream.admin for administration tasks that Datastream performs on your behalf.

Your database administrator user also needs GRANT permissions for the schema that you want to replicate. Connect to your source database and run the following:

  
 GRANT 
  
 cloudsqlsuperuser 
  
 TO 
  
 " USER_NAME 
" 
 ; 
  
 ALTER 
  
 ROLE 
  
 " USER_NAME 
" 
  
 CREATEROLE 
 ; 
  
 GRANT 
  
 SELECT 
  
 on 
  
 ALL 
  
 TABLES 
  
 IN 
  
 SCHEMA 
  
 " SCHEMA_NAME 
" 
  
 to 
  
  
 " USER_NAME 
" 
  
 WITH 
  
 GRANT 
  
 OPTION 
 ; 
  
 ALTER 
  
 DEFAULT 
  
 PRIVILEGES 
  
 IN 
  
 SCHEMA 
  
 " SCHEMA_NAME 
" 
  
 GRANT 
  
 SELECT 
  
 ON 
  
 TABLES 
  
 TO 
  
 " USER_NAME 
" 
  
 WITH 
  
 GRANT 
  
 OPTION 
 ; 
  

Create and start the stream

To create and start a stream, perform the following steps:

Console

Get started

  1. Go to your source Cloud SQL for PostgreSQL database instance overview page.

    Go to Cloud SQL Instances

  2. In the Stream data into BigQuerysection, click Create stream.

  3. On the Get startedpage, provide the name for your stream in the Stream namefield. A unique identifier is auto-populated.

  4. In the Provide the instance user account detailssection, select the authentication method:

    • IAM database authentication: this option is available if your user has the Identity and Access Management (IAM) principal identity assigned. For more information, see IAM principals . If you select this option, you must manually grant the cloudsqlsuperuser role and the CREATEROLE permission to your user:

      1. In Google Cloud console, go to the Cloud SQL Instances page.

      Go to Cloud SQL Instances

      1. Select your Cloud SQL instance.
      2. In the navigation menu, click Cloud SQL Studioand sign in.
      3. In the Explorerpane, run the following query for your user:
     GRANT 
      
     cloudsqlsuperuser 
      
     TO 
      
     " USER_NAME 
    " 
     ; 
     ALTER 
      
     ROLE 
      
     " USER_NAME 
    " 
      
     CREATEROLE 
     ; 
    
    • Built-in database authentication: provide the username and the password of a user with the cloudsqlsuperuser role. If you select this option, make sure that the user has GRANT permissions on the tables that they want to replicate.
  5. Review additional stream details, such as region, encryption and labels. Expand the Additional stream detailssection to apply changes, if required.

  6. Click Continue.

Configure source

  1. On the Configure stream sourcepage, select the database from which you want to replicate data.
  2. The Objects to includelist selects all available objects by default. The objects on the list are the objects for which you have the necessary permissions to stream. To change the selected objects, modify the selections in the list.
  3. Review advanced stream configurations, such as backfill mode and the maximum number of concurrent backfill connections. Expand the Advanced stream configurationssection to apply changes, if required.
  4. Click Continue.

Configure destination

  1. On the Configure destinationpage, adjust the BigQuery destination settings, if required. For more information, see Configure information about the destination for the stream .

Create and start the stream

  1. Click Create and start laterto create your stream and start it later in Datastream, or Startto create and start your stream right away.

  2. You're notified about the tasks that are automatically performed for you:

    • Datastream creates the required Virtual Private Cloud resources, such as internal IP address range, subnetwork and network attachment.
    • Datastream configures tables for CDC, sets up replication slots, sets up a publication for all tables in the database, and creates a dedicated Datastream user.
    • Datastream creates a private connectivity configuration and source and destination connection profiles.
  3. Confirm that you want to create or create and start your stream.

Monitor the stream

You can monitor basic stream details from your source instance overview page in the Google Cloud console. The page displays information such as stream status, stream name, destination BigQuery dataset, and destination project identifier.

From the overview page, you can also perform actions such as starting, stopping, or pausing the stream. For more granular monitoring information, click the stream name to navigate to Datastream.

Delete the stream

When you delete a stream that you created using the automated flow, some resources, such as the PostgreSQL replication slot are deleted automatically. However, there are resources that you need to manually delete:

  • The publication. The publication is created using the database administrator user, and can be deleted only by its owner, so by default, the user who created it.
  • The Datastream reader user.
  • The Datastream source and destination connection profiles
  • The private connectivity resources.
  • All the network resources created during the automated flow, such as the subnetwork and the network attachment.

What's next

Design a Mobile Site
View Site in Mobile | Classic
Share by: