Create an AlloyDB stream using the automated flow

This page describes how to create an AlloyDB for PostgreSQL stream using the automated setup flow.

The automated stream setup simplifies the process of moving data from AlloyDB for PostgreSQL clusters to BigQuery by reducing the number of steps that you need to perform. The flow lets you create a stream directly from the overview page of your AlloyDB for PostgreSQL cluster.

An AlloyDB for PostgreSQL deployment consists of a primary cluster, which contains a primary instance. It can also include secondary clusters and secondary instances. Datastream always creates the stream from the primary instance in the primary cluster.

Datastream automates securing the VPC connection between the stream and the source cluster, creating database configurations and stream connection resources.

Before you begin

  1. Enable the Datastream, Network Connectivity, and Compute Engine APIs.
  2. Ensure that you have the required Identity and Access Management (IAM) permissions to create and manage Datastream resources. See the section that follows for more details.
  3. Create and configure a source AlloyDB for PostgreSQL cluster for replication. For more information, see Configure a AlloyDB for PostgreSQL database for CDC .
  4. We recommend that you enable logical replication for your cluster prior to creating the stream. If you don't enable logical replication, Datastream enables it for you, which restarts your source instance.

    For information about enabling logical replication for an AlloyDB for PostgreSQL cluster, see Configure an AlloyDB for PostgreSQL database for CDC .

  5. Make sure that your source cluster is configured to use private services access .

Required permissions

To use the automated stream creation flow, you need the following Identity and Access Management (IAM) roles or permissions:

  • serviceusage.services.enable , compute.networkAdmin for enabling the required APIs and performing network configuration tasks.
  • alloydb.admin for cluster and instance configuration tasks.
  • datastream.admin for administration tasks that Datastream performs on your behalf.

If you choose to customize the stream settings, your database administrator user also needs GRANT permissions for the schema that you want to replicate and some additional permissions for administrative and structural tasks within the source database. Connect to your source database and run the following:

  
 GRANT 
  
 alloydbsuperuser 
  
 TO 
  
 " USER_NAME 
" 
 ; 
  
 ALTER 
  
 ROLE 
  
 " USER_NAME 
" 
  
 CREATEROLE 
 ; 
  
 GRANT 
  
 SELECT 
  
 on 
  
 ALL 
  
 TABLES 
  
 IN 
  
 SCHEMA 
  
 " SCHEMA_NAME 
" 
  
 to 
  
  
 " USER_NAME 
" 
  
 WITH 
  
 GRANT 
  
 OPTION 
 ; 
  
 ALTER 
  
 DEFAULT 
  
 PRIVILEGES 
  
 IN 
  
 SCHEMA 
  
 " SCHEMA_NAME 
" 
  
 GRANT 
  
 SELECT 
  
 ON 
  
 TABLES 
  
 TO 
  
 " USER_NAME 
" 
  
 WITH 
  
 GRANT 
  
 OPTION 
 ; 
  
 GRANT 
  
 CREATE 
  
 ON 
  
 DATABASE 
  
 " DATABASE_NAME 
" 
  
 TO 
  
 " USER_NAME 
" 
 ; 
  

Future tables

Future tables are automatically added to the stream only if they're created by the user whose account details you provide to authenticate during the automated flow. If a future table is created by a different user, then that user must explicitly grant the Datastream reader user SELECT permission on the table.

Create and start the stream

To create and start a stream, perform the following steps:

Console

  1. Go to your source AlloyDB for PostgreSQL clusters overview page.

    Go to AlloyDB for PostgreSQL

  2. Click the primary cluster from which you want to stream data.

  3. Create and start your stream automatically, in one of the following ways:

    1. In the Cluster detailspage, under Data replication, click Start stream.
    2. In the Integrationssection, click Start streamunder Replicate data to BigQuery.
    3. On the Launchpadtab, under Seamlessly integrate with other Google Cloud products, click Replicate data to BigQuery.
  4. The Start stream to replicate datapane opens.

  5. Under Stream settings, review the default settings for the stream that you create.

  6. You can change the source database by expanding the corresponding drop-down and selecting a different value.

  7. If you're ready to create and start your stream, click Start stream.

If you need to change the stream settings, click Customize.

Customize stream settings

When you click Customize, the Customize stream settingspage appears. Make sure that you have the required permissions to customize the stream settings.

Get started

  1. On the Get startedpage, provide the name for your stream in the Stream namefield. The Stream IDfield gets populated automatically.
  2. In the Provide the instance user account detailssection, select the authentication method:

    • IAM database authentication: this option is available if your user has the Identity and Access Management (IAM) principal identity assigned. For more information, see IAM principals . If you select this option, you must manually grant the alloydbsuperuser role and the CREATEROLE permission to your user:

      1. In Google Cloud console, go to the AlloyDB for PostgreSQL Clusters page.

      Go to AlloyDB for PostgreSQL

      1. Select your AlloyDB for PostgreSQL cluster.
      2. In the navigation menu, click AlloyDB for PostgreSQL Studioand sign in.
      3. In the Explorerpane, run the following query for your user:
     GRANT 
      
     alloydbsuperuser 
      
     TO 
      
     " USER_NAME 
    " 
     ; 
     ALTER 
      
     ROLE 
      
     " USER_NAME 
    " 
      
     CREATEROLE 
     ; 
    
    • Built-in database authentication: provide the username and the password of a user with the alloydbsuperuser role. If your user doesn't have the alloydbsuperuser role, follow the steps described in the IAM database authentication section to grant it manually. If you select built-in database authentication, also make sure that the user has GRANT permissions on the tables that they want to replicate.
  3. Review additional stream details, such as region, encryption and labels. Expand the Additional stream detailssection to apply changes, if required.

  4. Click Continue.

Configure source

  1. On the Configure sourcepage, select the database from which you want to replicate data.

  2. The Selected objectslist selects all available objects by default. The objects on the list are the objects for which you have the necessary permissions to stream. To change the selected objects, modify the selections in the list.

  3. Review advanced stream configurations, such as backfill mode and the maximum number of concurrent backfill connections. Expand the Advanced stream configurationssection to apply changes, if required.

  4. Click Continue.

Configure destination

  1. On the Configure destinationpage, adjust the BigQuery destination settings, if required. For more information, see Configure information about the destination for the stream .

Create and start the stream

  1. Click Create and startto create and start your stream, or Create without startingto create your stream and start it later in Datastream.

  2. You're notified about the tasks that are automatically performed for you:

    • Datastream creates the required Virtual Private Cloud resources, such as an internal IP address range, subnetwork and network attachment.
    • Datastream configures tables for CDC, sets up replication slots, sets up a publication for all tables in the database, and creates a dedicated Datastream user.
    • Datastream creates a private connectivity configuration and source and destination connection profiles.
  3. Confirm that you want to create and start your stream.

Monitor the stream

You can monitor basic stream details from your source instance overview page in the Google Cloud console. The page displays information such as stream status, stream name, destination BigQuery dataset, and destination project identifier.

From the overview page, you can also perform actions such as starting or pausing the stream. For more granular monitoring information, click the stream name to navigate to Datastream.

Delete the stream

When you delete a stream that you created using the automated flow, some resources, such as the PostgreSQL replication slot are deleted automatically. However, there are resources that you need to manually delete:

  • The publication. The publication is created using the database administrator user, and can be deleted only by its owner, so by default, the user who created it.
  • The Datastream reader user.
  • The Datastream source and destination connection profiles
  • The private connectivity resources.
  • All the network resources created during the automated flow, such as the subnetwork and the network attachment.

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: