Deploy a highly-available PostgreSQL database on GKE


PostgreSQL is an open source object-relational database known for reliability and data integrity. It is ACID-compliant, and supports foreign keys, joins, views, triggers, and stored procedures.

This document is intended for database administrators, cloud architects, and operations professionals interested in deploying a highly-available PostgreSQL topology on Google Kubernetes Engine (GKE).

Objectives

In this tutorial, you will learn how to:
  • Use Terraform to create a regional GKE cluster.
  • Deploy a highly-available PostgreSQL database.
  • Set up monitoring for the PostgreSQL application.
  • Perform PostgreSQL database and GKE cluster upgrades.
  • Simulate cluster disruption and PostgreSQL replica failover.
  • Perform backup and restore of the PostgreSQL database.

Architecture

This section describes the architecture of the solution you'll build in this tutorial.

You'll provision two GKE clusters in different regions: a primary cluster and a backup cluster. For this tutorial, the primary cluster is in the us-central1 region and the backup cluster is in the us-west1 region. This architecture lets you provision a highly-available PostgreSQL database and test for disaster recovery, as described later in this tutorial.

For the source cluster, you'll use a Helm chart (bitnami/postgresql-ha) to set up a high-availability PostgreSQL cluster.

Figure 1: Example architecture of a highly-available PostgreSQL cluster.

Costs

In this document, you use billable components of Google Cloud, including Google Kubernetes Engine, Compute Engine, Artifact Registry, and Backup for GKE.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Set up your project

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Google Kubernetes Engine, Backup for GKE, Artifact Registry, Compute Engine, and IAM APIs.

    Enable the APIs


Set up roles

  1. Make sure that you have the following role or roles on the project: roles/storage.objectViewer, roles/logging.logWriter, roles/artifactregistry.admin, roles/container.clusterAdmin, roles/container.serviceAgent, roles/serviceusage.serviceUsageAdmin, roles/iam.serviceAccountAdmin

    Check for the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

    4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

    Grant the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. Click Grant access.
    4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.

    5. In the Select a role list, select a role.
    6. To grant additional roles, click Add another role and add each additional role.
    7. Click Save.

Set up your environment

In this tutorial, you use Cloud Shell to manage resources hosted on Google Cloud. Cloud Shell comes preinstalled with the software you'll need for this tutorial, including Docker, kubectl, the gcloud CLI, Helm, and Terraform.

To use Cloud Shell to set up your environment:

  1. Launch a Cloud Shell session from the Google Cloud console by clicking Activate Cloud Shell. This launches a session in the bottom pane of the Google Cloud console.

  2. Set environment variables.

      export PROJECT_ID=PROJECT_ID
      export SOURCE_CLUSTER=cluster-db1
      export REGION=us-central1

    Replace the following values:

    • PROJECT_ID: your Google Cloud project ID.

  3. Set the default project for the gcloud CLI.

      gcloud config set project PROJECT_ID
    
  4. Clone the code repository.

      git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    
  5. Change to the working directory.

      cd kubernetes-engine-samples/databases/gke-stateful-postgres
    

Create your cluster infrastructure

In this section, you'll run a Terraform script to create a custom Virtual Private Cloud (VPC), an Artifact Registry repository to store PostgreSQL images, and two regional GKE clusters. One cluster will be deployed in us-central1 and the second cluster, for backup, will be deployed in us-west1.

To create the cluster, follow these steps:

Autopilot

In Cloud Shell, run the following commands:

 terraform -chdir=terraform/gke-autopilot init
 terraform -chdir=terraform/gke-autopilot apply -var project_id=$PROJECT_ID

When prompted, type yes.

Understand the Terraform configuration

The Terraform configuration files create the following resources to deploy your infrastructure:

  • Create an Artifact Registry repository to store the Docker images.

     resource "google_artifact_registry_repository" "main" {
       location      = "us"
       repository_id = "main"
       format        = "DOCKER"
       project       = var.project_id
     }
  • Create the VPC network and subnet for the VM's network interface.

     module "gcp-network" {
       source  = "terraform-google-modules/network/google"
       version = "< 8.0.0"

       project_id   = var.project_id
       network_name = "vpc-gke-postgresql"

       subnets = [
         {
           subnet_name           = "snet-gke-postgresql-us-central1"
           subnet_ip             = "10.0.0.0/17"
           subnet_region         = "us-central1"
           subnet_private_access = true
         },
         {
           subnet_name           = "snet-gke-postgresql-us-west1"
           subnet_ip             = "10.0.128.0/17"
           subnet_region         = "us-west1"
           subnet_private_access = true
         },
       ]

       secondary_ranges = {
         ("snet-gke-postgresql-us-central1") = [
           {
             range_name    = "ip-range-pods-db1"
             ip_cidr_range = "192.168.0.0/18"
           },
           {
             range_name    = "ip-range-svc-db1"
             ip_cidr_range = "192.168.64.0/18"
           },
         ],
         ("snet-gke-postgresql-us-west1") = [
           {
             range_name    = "ip-range-pods-db2"
             ip_cidr_range = "192.168.128.0/18"
           },
           {
             range_name    = "ip-range-svc-db2"
             ip_cidr_range = "192.168.192.0/18"
           },
         ]
       }
     }

     output "network_name" {
       value = module.gcp-network.network_name
     }

     output "primary_subnet_name" {
       value = module.gcp-network.subnets_names[0]
     }

     output "secondary_subnet_name" {
       value = module.gcp-network.subnets_names[1]
     }
  • Create a primary GKE cluster.

    Terraform creates a private cluster in the us-central1 region, and enables Backup for GKE for disaster recovery and Managed Service for Prometheus for cluster monitoring.

    Managed Service for Prometheus is only supported on Autopilot clusters running GKE version 1.25 or later.

     module "gke-db1-autopilot" {
       source                          = "../modules/beta-autopilot-private-cluster"
       project_id                      = var.project_id
       name                            = "cluster-db1"
       kubernetes_version              = "1.25" # Will be ignored if use "REGULAR" release_channel
       region                          = "us-central1"
       regional                        = true
       zones                           = ["us-central1-a", "us-central1-b", "us-central1-c"]
       network                         = module.network.network_name
       subnetwork                      = module.network.primary_subnet_name
       ip_range_pods                   = "ip-range-pods-db1"
       ip_range_services               = "ip-range-svc-db1"
       horizontal_pod_autoscaling      = true
       release_channel                 = "RAPID" # Default version is 1.22 in REGULAR. GMP on Autopilot requires V1.25 via var.kubernetes_version
       enable_vertical_pod_autoscaling = true
       enable_private_endpoint         = false
       enable_private_nodes            = true
       master_ipv4_cidr_block          = "172.16.0.0/28"
       create_service_account          = false
     }
    
  • Create a backup cluster in the us-west1 region for disaster recovery.

     module "gke-db2-autopilot" {
       source                          = "../modules/beta-autopilot-private-cluster"
       project_id                      = var.project_id
       name                            = "cluster-db2"
       kubernetes_version              = "1.25" # Will be ignored if use "REGULAR" release_channel
       region                          = "us-west1"
       regional                        = true
       zones                           = ["us-west1-a", "us-west1-b", "us-west1-c"]
       network                         = module.network.network_name
       subnetwork                      = module.network.secondary_subnet_name
       ip_range_pods                   = "ip-range-pods-db2"
       ip_range_services               = "ip-range-svc-db2"
       horizontal_pod_autoscaling      = true
       release_channel                 = "RAPID" # Default version is 1.22 in REGULAR. GMP on Autopilot requires V1.25 via var.kubernetes_version
       enable_vertical_pod_autoscaling = true
       enable_private_endpoint         = false
       enable_private_nodes            = true
       master_ipv4_cidr_block          = "172.16.0.16/28"
       create_service_account          = false
     }

Standard

In Cloud Shell, run the following commands:

 terraform -chdir=terraform/gke-standard init
 terraform -chdir=terraform/gke-standard apply -var project_id=$PROJECT_ID

When prompted, type yes.

Understand the Terraform configuration

The Terraform configuration files create the following resources to deploy your infrastructure:

  • Create an Artifact Registry repository to store the Docker images.

     resource "google_artifact_registry_repository" "main" {
       location      = "us"
       repository_id = "main"
       format        = "DOCKER"
       project       = var.project_id
     }

     resource "google_artifact_registry_repository_iam_binding" "binding" {
       provider   = google-beta
       project    = google_artifact_registry_repository.main.project
       location   = google_artifact_registry_repository.main.location
       repository = google_artifact_registry_repository.main.name
       role       = "roles/artifactregistry.reader"
       members = [
         "serviceAccount:${module.gke-db1.service_account}",
       ]
     }
  • Create the VPC network and subnet for the VM's network interface.

     module "gcp-network" {
       source  = "terraform-google-modules/network/google"
       version = "< 8.0.0"

       project_id   = var.project_id
       network_name = "vpc-gke-postgresql"

       subnets = [
         {
           subnet_name           = "snet-gke-postgresql-us-central1"
           subnet_ip             = "10.0.0.0/17"
           subnet_region         = "us-central1"
           subnet_private_access = true
         },
         {
           subnet_name           = "snet-gke-postgresql-us-west1"
           subnet_ip             = "10.0.128.0/17"
           subnet_region         = "us-west1"
           subnet_private_access = true
         },
       ]

       secondary_ranges = {
         ("snet-gke-postgresql-us-central1") = [
           {
             range_name    = "ip-range-pods-db1"
             ip_cidr_range = "192.168.0.0/18"
           },
           {
             range_name    = "ip-range-svc-db1"
             ip_cidr_range = "192.168.64.0/18"
           },
         ],
         ("snet-gke-postgresql-us-west1") = [
           {
             range_name    = "ip-range-pods-db2"
             ip_cidr_range = "192.168.128.0/18"
           },
           {
             range_name    = "ip-range-svc-db2"
             ip_cidr_range = "192.168.192.0/18"
           },
         ]
       }
     }

     output "network_name" {
       value = module.gcp-network.network_name
     }

     output "primary_subnet_name" {
       value = module.gcp-network.subnets_names[0]
     }

     output "secondary_subnet_name" {
       value = module.gcp-network.subnets_names[1]
     }
  • Create a primary GKE cluster.

    Terraform creates a private cluster in the us-central1 region, and enables Backup for GKE for disaster recovery and Managed Service for Prometheus for cluster monitoring.

     module "gke-db1" {
       source                  = "../modules/beta-private-cluster"
       project_id              = var.project_id
       name                    = "cluster-db1"
       regional                = true
       region                  = "us-central1"
       network                 = module.network.network_name
       subnetwork              = module.network.primary_subnet_name
       ip_range_pods           = "ip-range-pods-db1"
       ip_range_services       = "ip-range-svc-db1"
       create_service_account  = true
       enable_private_endpoint = false
       enable_private_nodes    = true
       master_ipv4_cidr_block  = "172.16.0.0/28"
       network_policy          = true
       cluster_autoscaling = {
         "autoscaling_profile" : "OPTIMIZE_UTILIZATION",
         "enabled" : true,
         "gpu_resources" : [],
         "min_cpu_cores" : 36,
         "min_memory_gb" : 144,
         "max_cpu_cores" : 48,
         "max_memory_gb" : 192,
       }
       monitoring_enable_managed_prometheus = true
       gke_backup_agent_config              = true
       node_pools = [
         {
           name            = "pool-sys"
           autoscaling     = true
           min_count       = 1
           max_count       = 3
           max_surge       = 1
           max_unavailable = 0
           machine_type    = "e2-standard-4"
           node_locations  = "us-central1-a,us-central1-b,us-central1-c"
           auto_repair     = true
         },
         {
           name            = "pool-db"
           autoscaling     = true
           max_surge       = 1
           max_unavailable = 0
           machine_type    = "e2-standard-8"
           node_locations  = "us-central1-a,us-central1-b,us-central1-c"
           auto_repair     = true
         },
       ]
       node_pools_labels = {
         all = {}
         pool-db = {
           "app.stateful/component" = "postgresql"
         }
         pool-sys = {
           "app.stateful/component" = "postgresql-pgpool"
         }
       }
       node_pools_taints = {
         all = []
         pool-db = [
           {
             key    = "app.stateful/component"
             value  = "postgresql"
             effect = "NO_SCHEDULE"
           },
         ],
         pool-sys = [
           {
             key    = "app.stateful/component"
             value  = "postgresql-pgpool"
             effect = "NO_SCHEDULE"
           },
         ],
       }
       gce_pd_csi_driver = true
     }
  • Create a backup cluster in the us-west1 region for disaster recovery.

     module "gke-db2" {
       source                  = "../modules/beta-private-cluster"
       project_id              = var.project_id
       name                    = "cluster-db2"
       regional                = true
       region                  = "us-west1"
       network                 = module.network.network_name
       subnetwork              = module.network.secondary_subnet_name
       ip_range_pods           = "ip-range-pods-db2"
       ip_range_services       = "ip-range-svc-db2"
       create_service_account  = false
       service_account         = module.gke-db1.service_account
       enable_private_endpoint = false
       enable_private_nodes    = true
       master_ipv4_cidr_block  = "172.16.0.16/28"
       network_policy          = true
       cluster_autoscaling = {
         "autoscaling_profile" : "OPTIMIZE_UTILIZATION",
         "enabled" : true,
         "gpu_resources" : [],
         "min_cpu_cores" : 10,
         "min_memory_gb" : 144,
         "max_cpu_cores" : 48,
         "max_memory_gb" : 192,
       }
       monitoring_enable_managed_prometheus = true
       gke_backup_agent_config              = true
       node_pools = [
         {
           name            = "pool-sys"
           autoscaling     = true
           min_count       = 1
           max_count       = 3
           max_surge       = 1
           max_unavailable = 0
           machine_type    = "e2-standard-4"
           node_locations  = "us-west1-a,us-west1-b,us-west1-c"
           auto_repair     = true
         },
         {
           name            = "pool-db"
           autoscaling     = true
           max_surge       = 1
           max_unavailable = 0
           machine_type    = "e2-standard-8"
           node_locations  = "us-west1-a,us-west1-b,us-west1-c"
           auto_repair     = true
         },
       ]
       node_pools_labels = {
         all = {}
         pool-db = {
           "app.stateful/component" = "postgresql"
         }
         pool-sys = {
           "app.stateful/component" = "postgresql-pgpool"
         }
       }
       node_pools_taints = {
         all = []
         pool-db = [
           {
             key    = "app.stateful/component"
             value  = "postgresql"
             effect = "NO_SCHEDULE"
           },
         ],
         pool-sys = [
           {
             key    = "app.stateful/component"
             value  = "postgresql-pgpool"
             effect = "NO_SCHEDULE"
           },
         ],
       }
       gce_pd_csi_driver = true
     }

Deploy PostgreSQL on your cluster

In this section, you'll deploy a PostgreSQL database instance to run on GKE by using a Helm chart.

Install PostgreSQL

To install PostgreSQL on your cluster, follow these steps.

  1. Configure Docker access.

     gcloud auth configure-docker us-docker.pkg.dev
  2. Populate Artifact Registry with the required PostgreSQL Docker images.

     ./scripts/gcr.sh bitnami/postgresql-repmgr 15.1.0-debian-11-r0
     ./scripts/gcr.sh bitnami/postgres-exporter 0.11.1-debian-11-r27
     ./scripts/gcr.sh bitnami/pgpool 4.3.3-debian-11-r28

    The script pushes the following Bitnami images to Artifact Registry for Helm to install:

    • postgresql-repmgr
    • postgres-exporter
    • pgpool

  3. Verify that the correct images are stored in the repo.

     gcloud artifacts docker images list us-docker.pkg.dev/$PROJECT_ID/main \
       --format="flattened(package)"

    The output is similar to the following:

     ---
    image: us-docker.pkg.dev/[PROJECT_ID]/main/bitnami/pgpool
    ---
    image: us-docker.pkg.dev/[PROJECT_ID]/main/bitnami/postgres-exporter
    ---
    image: us-docker.pkg.dev/[PROJECT_ID]/main/bitnami/postgresql-repmgr
    
  4. Configure kubectl command line access to the primary cluster.

     gcloud container clusters get-credentials $SOURCE_CLUSTER \
       --location=$REGION --project=$PROJECT_ID
  5. Create a namespace.

      export NAMESPACE=postgresql
     kubectl create namespace $NAMESPACE
  6. If you are deploying to an Autopilot cluster, configure node provisioning across three zones. You can skip this step if you are deploying to a Standard cluster.

    By default, Autopilot provisions resources in only two zones. The deployment defined in prepareforha.yaml ensures that Autopilot provisions nodes across three zones in your cluster, by setting these values:

    • replicas: 3
    • podAntiAffinity with requiredDuringSchedulingIgnoredDuringExecution and topologyKey: "topology.kubernetes.io/zone"

     kubectl -n $NAMESPACE apply -f scripts/prepareforha.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: prepare-three-zone-ha
        labels:
          app: prepare-three-zone-ha
          app.kubernetes.io/name: postgresql-ha
      spec:
        replicas: 3
        selector:
          matchLabels:
            app: prepare-three-zone-ha
            app.kubernetes.io/name: postgresql-ha
        template:
          metadata:
            labels:
              app: prepare-three-zone-ha
              app.kubernetes.io/name: postgresql-ha
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - prepare-three-zone-ha
                  topologyKey: "topology.kubernetes.io/zone"
              nodeAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - preference:
                    matchExpressions:
                    - key: cloud.google.com/compute-class
                      operator: In
                      values:
                      - "Scale-Out"
                  weight: 1
            nodeSelector:
              app.stateful/component: postgresql
            tolerations:
            - effect: NoSchedule
              key: app.stateful/component
              operator: Equal
              value: postgresql
            containers:
            - name: prepare-three-zone-ha
              image: busybox:latest
              command:
              - "/bin/sh"
              - "-c"
              - "while true; do sleep 3600; done"
              resources:
                limits:
                  cpu: "500m"
                  ephemeral-storage: "10Mi"
                  memory: "0.5Gi"
                requests:
                  cpu: "500m"
                  ephemeral-storage: "10Mi"
                  memory: "0.5Gi"
  7. Update the Helm dependency.

      cd helm/postgresql-bootstrap
     helm dependency update
  8. Inspect and verify the charts that Helm will install.

     helm -n postgresql template postgresql . \
       --set global.imageRegistry="us-docker.pkg.dev/$PROJECT_ID/main"
  9. Install the Helm chart.

     helm -n postgresql upgrade --install postgresql . \
       --set global.imageRegistry="us-docker.pkg.dev/$PROJECT_ID/main"

    The output is similar to the following:

     NAMESPACE: postgresql
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None 
    
  10. Verify that the PostgreSQL replicas are running.

     kubectl get all -n $NAMESPACE

    The output is similar to the following:

     NAME                                                          READY   STATUS    RESTARTS   AGE
    pod/postgresql-postgresql-bootstrap-pgpool-75664444cb-dkl24   1/1     Running   0          8m39s
    pod/postgresql-postgresql-ha-pgpool-6d86bf9b58-ff2bg          1/1     Running   0          8m39s
    pod/postgresql-postgresql-ha-postgresql-0                     2/2     Running   0          8m39s
    pod/postgresql-postgresql-ha-postgresql-1                     2/2     Running   0          8m39s
    pod/postgresql-postgresql-ha-postgresql-2                     2/2     Running   0          8m38s
    
    NAME                                                   TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
    service/postgresql-postgresql-ha-pgpool                ClusterIP   192.168.99.236    <none>        5432/TCP   8m39s
    service/postgresql-postgresql-ha-postgresql            ClusterIP   192.168.90.20     <none>        5432/TCP   8m39s
    service/postgresql-postgresql-ha-postgresql-headless   ClusterIP   None              <none>        5432/TCP   8m39s
    service/postgresql-postgresql-ha-postgresql-metrics    ClusterIP   192.168.127.198   <none>        9187/TCP   8m39s
    
    NAME                                                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/postgresql-postgresql-bootstrap-pgpool   1/1     1            1           8m39s
    deployment.apps/postgresql-postgresql-ha-pgpool          1/1     1            1           8m39s
    
    NAME                                                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/postgresql-postgresql-bootstrap-pgpool-75664444cb   1         1         1       8m39s
    replicaset.apps/postgresql-postgresql-ha-pgpool-6d86bf9b58          1         1         1       8m39s
    
    NAME                                                   READY   AGE
    statefulset.apps/postgresql-postgresql-ha-postgresql   3/3     8m39s 
    

Create a test dataset

In this section, you'll create a database and a table with sample values. The database serves as a test dataset for the failover process you'll test later in this tutorial.

  1. Connect to your PostgreSQL instance.

      cd ../../
     ./scripts/launch-client.sh

    The output is similar to the following:

     Launching Pod pg-client in the namespace postgresql ...
    pod/pg-client created
    waiting for the Pod to be ready
    Copying script files to the target Pod pg-client ...
    Pod: pg-client is healthy 
    
  2. Start a shell session.

     kubectl exec -it pg-client -n postgresql -- /bin/bash
  3. Create a database and a table, and then insert some test rows.

     psql -h $HOST_PGPOOL -U postgres -a -q -f /tmp/scripts/generate-db.sql
  4. Verify the number of rows for each table.

     psql -h $HOST_PGPOOL -U postgres -a -q -f /tmp/scripts/count-rows.sql
    

    The output is similar to the following:

     select COUNT(*) from tb01;
     count
    --------
     300000
    (1 row)
    
    select COUNT(*) from tb02;
     count
    --------
     300000
    (1 row) 
    
  5. Generate test data.

      export DB=postgres
     pgbench -i -h $HOST_PGPOOL -U postgres $DB -s 50
    

    The output is similar to the following:

     dropping old tables...
    creating tables...
    generating data (client-side)...
    5000000 of 5000000 tuples (100%) done (elapsed 29.85 s, remaining 0.00 s)
    vacuuming...
    creating primary keys...
    done in 36.86 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 31.10 s, vacuum 1.88 s, primary keys 3.86 s). 
    
  6. Exit the postgres client Pod.

      exit

Monitor PostgreSQL

In this section, you'll view metrics and set up alerts for your PostgreSQL instance. You'll use Google Cloud Managed Service for Prometheus to perform monitoring and alerting.

View metrics

Your PostgreSQL deployment includes a postgresql-exporter sidecar container. This container exposes a /metrics endpoint. Google Cloud Managed Service for Prometheus is configured to monitor the PostgreSQL Pods on this endpoint. You can view these metrics through Google Cloud console dashboards.
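
The scrape configuration is already handled for you in this tutorial. For orientation only, a Managed Service for Prometheus PodMonitoring resource that scrapes the exporter would look roughly like the following sketch; the resource name and label selector are assumptions for illustration, and port 9187 matches the metrics Service shown in the deployment output earlier.

 apiVersion: monitoring.googleapis.com/v1
 kind: PodMonitoring
 metadata:
   name: postgresql-exporter   # hypothetical name, for illustration only
   namespace: postgresql
 spec:
   selector:
     matchLabels:
       app.kubernetes.io/name: postgresql-ha   # assumed label; match your PostgreSQL Pods
   endpoints:
   - port: 9187        # postgres-exporter metrics port
     interval: 30s
     path: /metrics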

The Google Cloud console provides a few ways to create and save dashboard configuration:

  • Creation and Export: You can create dashboards directly in Google Cloud console, then export and store them in a code repository. To do this, in the dashboard toolbar, open the JSON editor and download the dashboard JSON file.
  • Storage and Import: You can import a dashboard from a JSON file by clicking +Create Dashboard and uploading the dashboard's JSON content using the JSON editor menu.

To visualize data from your PostgreSQL application and GKE cluster, follow these steps:

  1. Create the following dashboards.

      cd monitoring
     gcloud monitoring dashboards create \
       --config-from-file=dashboard/postgresql-overview.json \
       --project=$PROJECT_ID
     gcloud monitoring dashboards create \
       --config-from-file dashboard/gke-postgresql.json \
       --project $PROJECT_ID
  2. In the Google Cloud console, navigate to the Cloud Monitoring Dashboard page.

    Go to the Cloud Monitoring Dashboard

  3. Select Custom from the dashboard list. The following dashboards appear:

    • PostgreSQL Overview: Displays metrics from the PostgreSQL application, including database uptime, database size, and transaction latency.
    • GKE PostgreSQL Cluster: Displays metrics from the GKE cluster that PostgreSQL is running on, including CPU usage, memory usage, and volume utilization.
  4. Click on each link to examine the dashboards generated.

Set up alerts

Alerting gives you timely awareness of problems in your applications so you can resolve the problems quickly. You can create an alerting policy to specify the circumstances under which you want to be alerted and how you want to be notified. You can also create notification channels that let you select where alerts are sent.

In this section, you'll use Terraform to configure the following example alerts:

  • db_max_transaction: Monitors the max lag of transactions in seconds; an alert is triggered if the value is greater than 10.
  • db_node_up: Monitors the status of database Pods; 0 means a Pod is down and triggers an alert.

To set up alerts, follow these steps:

  1. Configure alerts with Terraform.

      EMAIL=YOUR_EMAIL
      cd alerting/terraform
     terraform init
     terraform plan -var project_id=$PROJECT_ID -var email_address=$EMAIL
     terraform apply -var project_id=$PROJECT_ID -var email_address=$EMAIL

    Replace the following values:

    • YOUR_EMAIL: your email address.

    The output is similar to the following:

     Apply complete! Resources: 3 added, 0 changed, 0 destroyed. 
    
  2. Connect to the client Pod.

      cd ../../../
     kubectl exec -it --namespace postgresql pg-client -- /bin/bash
  3. Generate a load test to test the db_max_transaction alert.

     pgbench -i -h $HOST_PGPOOL -U postgres -s 200 postgres

    The output is similar to the following:

     dropping old tables...
    creating tables...
    generating data (client-side)...
    20000000 of 20000000 tuples (100%) done (elapsed 163.22 s, remaining 0.00 s)
    vacuuming...
    creating primary keys...
    done in 191.30 s (drop tables 0.14 s, create tables 0.01 s, client-side generate 165.62 s, vacuum 4.52 s, primary keys 21.00 s). 
    

    The alert triggers and sends an email to YOUR_EMAIL with a subject line that starts with "[ALERT] Max Lag of transaction".

  4. In the Google Cloud console, navigate to the Alert Policy page.

    Go to Alert Policy

  5. Select db_max_transaction from the listed policies. From the chart, you should see a spike from the load test that exceeds the threshold of 10 for the Prometheus metric pg_stat_activity_max_tx_duration/gauge.

  6. Exit the postgres client Pod.

      exit

Manage PostgreSQL and GKE upgrades

Version updates for both PostgreSQL and Kubernetes are released on a regular schedule. Follow operational best practices to update your software environment regularly. By default, GKE manages cluster and node pool upgrades for you.

Upgrade PostgreSQL

This section shows how you can perform a version upgrade for PostgreSQL. For this tutorial, you'll use a rolling update strategy for upgrading your Pods, so that your Pods are never all down at the same time.
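
The rolling update behavior comes from the StatefulSet's update strategy, which the Helm chart manages for you. For orientation only, the relevant portion of a StatefulSet spec looks like the following sketch:

 spec:
   updateStrategy:
     type: RollingUpdate    # Pods are replaced one at a time, from the highest ordinal to the lowest
     rollingUpdate:
       partition: 0         # 0 updates every replica; a higher value stages a partial rollout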

To perform a version upgrade, follow these steps:

  1. Push an updated version of the postgresql-repmgr image to Artifact Registry. Define the new version (for example, postgresql-repmgr 15.1.0-debian-11-r1).

      NEW_IMAGE=us-docker.pkg.dev/$PROJECT_ID/main/bitnami/postgresql-repmgr:15.1.0-debian-11-r1
     ./scripts/gcr.sh bitnami/postgresql-repmgr 15.1.0-debian-11-r1
  2. Trigger a rolling update using kubectl .

     kubectl set image statefulset -n postgresql postgresql-postgresql-ha-postgresql postgresql=$NEW_IMAGE
     kubectl rollout restart statefulsets -n postgresql postgresql-postgresql-ha-postgresql
     kubectl rollout status statefulset -n postgresql postgresql-postgresql-ha-postgresql

    You will see the StatefulSet complete a rolling update, proceeding from the highest ordinal replica to the lowest.

    The output is similar to the following:

     Waiting for 1 pods to be ready...
    waiting for statefulset rolling update to complete 1 pods at revision postgresql-postgresql-ha-postgresql-5c566ccf49...
    Waiting for 1 pods to be ready...
    Waiting for 1 pods to be ready...
    waiting for statefulset rolling update to complete 2 pods at revision postgresql-postgresql-ha-postgresql-5c566ccf49...
    Waiting for 1 pods to be ready...
    Waiting for 1 pods to be ready...
    statefulset rolling update complete 3 pods at revision postgresql-postgresql-ha-postgresql-5c566ccf49... 
    

Plan for GKE upgrades on Standard clusters

This section is applicable if you are running Standard clusters. You can take proactive steps and set configurations to mitigate risk and facilitate a smoother cluster upgrade when you are running stateful services, including:

  • Follow GKE best practices for upgrading clusters. Choose an appropriate upgrade strategy to ensure that upgrades happen during the maintenance window:

    • Choose surge upgrades if cost optimization is important and if your workloads can tolerate a graceful shutdown in less than 60 minutes.
    • Choose blue-green upgrades if your workloads are less tolerant of disruptions, and a temporary cost increase due to higher resource usage is acceptable.

    To learn more, see Upgrade a cluster running a stateful workload.

  • Use the Recommender service to check for deprecation insights and recommendations to avoid service interruptions.

  • Use maintenance windows to ensure upgrades happen when you intend them. Before the maintenance window, ensure your database backups are successful.

  • Before allowing traffic to the upgraded nodes, use readiness and liveness probes to ensure they are ready for traffic.

  • Create probes that assess whether replication is in sync before accepting traffic. You can do this through custom scripts, depending on the complexity and scale of your database; a sketch of such a probe follows this list.
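
For example, a readiness probe similar to the following sketch could call a custom script that fails while replication lag exceeds a limit. The script path and flag are hypothetical placeholders for your own tooling.

 # Illustrative readinessProbe stanza for a PostgreSQL replica container.
 readinessProbe:
   exec:
     command:
     - /bin/sh
     - -c
     - /scripts/check-replication-lag.sh --max-lag-seconds=10   # hypothetical custom script
   initialDelaySeconds: 30
   periodSeconds: 10
   failureThreshold: 3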

Verify database availability during Standard cluster upgrades

This section is applicable if you are running Standard clusters. To verify PostgreSQL availability during upgrades, the general process is to generate traffic against the PostgreSQL database during the upgrade process. Then, use pgbench to check that the database can handle a baseline level of traffic during an upgrade, compared to when the database is fully available.

  1. Connect to your PostgreSQL instance.

     ./scripts/launch-client.sh 
    

    The output is similar to the following:

     Launching Pod pg-client in the namespace postgresql ...
    pod/pg-client created
    waiting for the Pod to be ready
    Copying script files to the target Pod pg-client ...
    Pod: pg-client is healthy 
    
  2. In Cloud Shell, shell into the client Pod.

     kubectl exec -it -n postgresql pg-client -- /bin/bash
    
  3. Initialize pgbench .

     pgbench -i -h $HOST_PGPOOL -U postgres postgres
  4. Use the following command to get baseline results for confirming that your PostgreSQL application stays highly-available during the time window for an upgrade. To get a baseline result, run a test with multiple connections across multiple jobs (threads) for 30 seconds.

     pgbench -h $HOST_PGPOOL -U postgres postgres -c10 -j4 -T 30 -R 200

    The output looks similar to the following:

     pgbench (14.5)
    starting vacuum...end.
    transaction type: <builtin: TPC-B (sort of)>
    scaling factor: 1
    query mode: simple
    number of clients: 10
    number of threads: 4
    duration: 30 s
    number of transactions actually processed: 5980
    latency average = 7.613 ms
    latency stddev = 2.898 ms
    rate limit schedule lag: avg 0.256 (max 36.613) ms
    initial connection time = 397.804 ms
    tps = 201.955497 (without initial connection time) 
    
  5. To ensure availability during upgrades, you can generate some load against your database, and ensure that the PostgreSQL application provides a consistent response rate during the upgrade. To perform this test, generate some traffic against the database, using the pgbench command. The following command will run pgbench for one hour, targeting 200 TPS (transactions per second), and listing the request rate every 2 seconds.

     pgbench -h $HOST_PGPOOL -U postgres postgres --client=10 --jobs=4 --rate=200 --time=3600 --progress=2 --select-only

    Where:

    • --client: Number of clients simulated, that is, the number of concurrent database sessions.
    • --jobs: Number of worker threads within pgbench. Using more than one thread can be helpful on multi-CPU machines. Clients are distributed as evenly as possible among available threads. The default is 1.
    • --rate: The rate given in transactions per second.
    • --progress: Show a progress report every sec seconds.

    The output is similar to the following:

     pgbench (14.5)
    starting vacuum...end.
    progress: 5.0 s, 354.8 tps, lat 25.222 ms stddev 15.038
    progress: 10.0 s, 393.8 tps, lat 25.396 ms stddev 16.459
    progress: 15.0 s, 412.8 tps, lat 24.216 ms stddev 14.548
    progress: 20.0 s, 405.0 tps, lat 24.656 ms stddev 14.066
  6. In the Google Cloud console, navigate back to the PostgreSQL Overview dashboard in Cloud Monitoring. Notice the spike on the Connection per DB and Connection per Pod graphs.

  7. Exit the client Pod.

      exit
  8. Delete the client Pod.

     kubectl delete pod -n postgresql pg-client

Simulate a PostgreSQL service disruption

In this section, you'll simulate a service disruption in one of the PostgreSQL replicas by stopping the replication manager service. This prevents the Pod from serving traffic to its peer replicas and causes its liveness probes to fail.

  1. Open a new Cloud Shell session and configure kubectl command line access to the primary cluster.

     gcloud container clusters get-credentials $SOURCE_CLUSTER \
       --location=$REGION --project=$PROJECT_ID
  2. View the PostgreSQL events emitted in Kubernetes.

     kubectl get events -n postgresql --field-selector=involvedObject.name=postgresql-postgresql-ha-postgresql-0 --watch
  3. In the earlier Cloud Shell session, simulate a service failure by stopping PostgreSQL repmgr.

    1. Attach your session to the database container.

       kubectl exec -it -n $NAMESPACE postgresql-postgresql-ha-postgresql-0 -c postgresql -- /bin/bash
    2. Stop the service using repmgr, and remove the checkpoint and the dry-run argument.

        export ENTRY='/opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh'
        export RCONF='/opt/bitnami/repmgr/conf/repmgr.conf'
        $ENTRY repmgr -f $RCONF node service --action=stop --checkpoint

The liveness probe configured for the PostgreSQL container starts to fail within five seconds. It repeats every ten seconds until the failure threshold of six failures is reached. When the failureThreshold value is reached, the container is restarted. You can tune these parameters to decrease the liveness probe tolerance and match the SLO requirements of your deployment.
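
For reference, that behavior maps to probe fields like the ones in the following sketch; the exact probe command and values are set by the Helm chart and may differ.

 livenessProbe:
   exec:
     command:
     - /bin/sh
     - -c
     - PGDATABASE=postgres pg_isready -h 127.0.0.1 -p 5432   # assumed health check command
   timeoutSeconds: 5      # a probe attempt that takes longer than 5 seconds counts as a failure
   periodSeconds: 10      # probes run every 10 seconds
   failureThreshold: 6    # after 6 consecutive failures, the container is restarted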

From the event stream, you will see the Pod's liveness and readiness probes fail, and a message that the container needs to be restarted. The output is similar to the following:

 0s          Normal    Killing                pod/postgresql-postgresql-ha-postgresql-0   Container postgresql failed liveness probe, will be restarted
0s          Warning   Unhealthy              pod/postgresql-postgresql-ha-postgresql-0   Readiness probe failed: psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused...
0s          Normal    Pulled                 pod/postgresql-postgresql-ha-postgresql-0   Container image "us-docker.pkg.dev/psch-gke-dev/main/bitnami/postgresql-repmgr:14.5.0-debian-11-r10" already present on machine
0s          Normal    Created                pod/postgresql-postgresql-ha-postgresql-0   Created container postgresql
0s          Normal    Started                pod/postgresql-postgresql-ha-postgresql-0   Started container postgresql 

Prepare for disaster recovery

To ensure that your production workloads remain available in the event of a service-interrupting event, you should prepare a disaster recovery (DR) plan. To learn more about DR planning, see the Disaster recovery planning guide.

Disaster recovery for Kubernetes can be implemented in two phases:

  • Backup involves creating a point-in-time snapshot of your state or data before a service-interrupting event occurs.
  • Recovery involves restoring your state or data from a backup copy after the occurrence of a disaster.

To back up and restore your workloads on GKE clusters, you can use Backup for GKE. You can enable this service on new and existing clusters. This deploys a Backup for GKE agent that runs in your clusters; the agent is responsible for capturing configuration and volume backup data and orchestrating recovery.

Backups and restores can be scoped to an entire cluster, a namespace, or an application (defined by selectors such as matchLabels).

Example PostgreSQL backup and restore scenario

The example in this section shows how you can perform a backup and restore operation at the application scope, using the ProtectedApplication Custom Resource.

The following diagram shows the component resources in the ProtectedApplication, namely a StatefulSet representing the postgresql-ha application and a Deployment of pgpool, which use the same label (app.kubernetes.io/name: postgresql-ha).

Figure 2: Example backup-and-recovery solution for a highly-available PostgreSQL cluster.
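
The Helm chart used in this tutorial already ships a ProtectedApplication for this workload (you verify it in the next procedure). As a hedged sketch of what such a resource can look like, assuming the resource names shown in the earlier output and the BackupAllRestoreAll strategy:

 apiVersion: gkebackup.gke.io/v1
 kind: ProtectedApplication
 metadata:
   name: postgresql-ha
   namespace: postgresql
 spec:
   resourceSelection:
     type: Selector
     selector:
       matchLabels:
         app.kubernetes.io/name: postgresql-ha
   components:
   - name: postgresql
     resourceKind: StatefulSet
     resourceNames: ["postgresql-postgresql-ha-postgresql"]   # name as shown in earlier output
     strategy:
       type: BackupAllRestoreAll
   - name: pgpool
     resourceKind: Deployment
     resourceNames: ["postgresql-postgresql-ha-pgpool"]       # name as shown in earlier output
     strategy:
       type: BackupAllRestoreAll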

To prepare to back up and restore your PostgreSQL workload, follow these steps:

  1. Set up the environment variables. In this example, you'll use a ProtectedApplication to back up the PostgreSQL workload and its volumes from the source GKE cluster (us-central1) and restore them to a GKE cluster in a different region (us-west1).

      export SOURCE_CLUSTER=cluster-db1
      export TARGET_CLUSTER=cluster-db2
      export REGION=us-central1
      export DR_REGION=us-west1
      export NAME_PREFIX=g-db-protected-app
      export BACKUP_PLAN_NAME=$NAME_PREFIX-bkp-plan-01
      export BACKUP_NAME=bkp-$BACKUP_PLAN_NAME
      export RESTORE_PLAN_NAME=$NAME_PREFIX-rest-plan-01
      export RESTORE_NAME=rest-$RESTORE_PLAN_NAME
  2. Verify that Backup for GKE is enabled on your clusters. It should already be enabled as part of the Terraform setup you performed earlier.

     gcloud container clusters describe $SOURCE_CLUSTER \
       --project=$PROJECT_ID \
       --location=$REGION \
       --format='value(addonsConfig.gkeBackupAgentConfig)'

    If Backup for GKE is enabled, the output of the command shows enabled=True.

Set up a backup plan and perform a restore

Backup for GKE allows you to create a backup plan as a cron job. A backup plan contains a backup configuration including the source cluster, the selection of which workloads to back up, and the region in which backup artifacts produced under this plan are stored.

To perform a backup and restore, follow these steps:

  1. Verify the status of ProtectedApplication on cluster-db1.

     kubectl get ProtectedApplication -A

    The output looks similar to the following:

     NAMESPACE    NAME            READY TO BACKUP
    postgresql   postgresql-ha   true 
    
  2. Create a backup plan for the ProtectedApplication.

      export NAMESPACE=postgresql
      export PROTECTED_APP=$(kubectl get ProtectedApplication -n $NAMESPACE | grep -v 'NAME' | awk '{ print $1 }')

     gcloud beta container backup-restore backup-plans create $BACKUP_PLAN_NAME \
       --project=$PROJECT_ID \
       --location=$DR_REGION \
       --cluster=projects/$PROJECT_ID/locations/$REGION/clusters/$SOURCE_CLUSTER \
       --selected-applications=$NAMESPACE/$PROTECTED_APP \
       --include-secrets \
       --include-volume-data \
       --cron-schedule="0 3 * * *" \
       --backup-retain-days=7 \
       --backup-delete-lock-days=0
  3. Manually create a backup.

     gcloud beta container backup-restore backups create $BACKUP_NAME \
       --project=$PROJECT_ID \
       --location=$DR_REGION \
       --backup-plan=$BACKUP_PLAN_NAME \
       --wait-for-completion
  4. Set up a restore plan.

     gcloud beta container backup-restore restore-plans create $RESTORE_PLAN_NAME \
       --project=$PROJECT_ID \
       --location=$DR_REGION \
       --backup-plan=projects/$PROJECT_ID/locations/$DR_REGION/backupPlans/$BACKUP_PLAN_NAME \
       --cluster=projects/$PROJECT_ID/locations/$DR_REGION/clusters/$TARGET_CLUSTER \
       --cluster-resource-conflict-policy=use-existing-version \
       --namespaced-resource-restore-mode=delete-and-restore \
       --volume-data-restore-policy=restore-volume-data-from-backup \
       --selected-applications=$NAMESPACE/$PROTECTED_APP \
       --cluster-resource-scope-selected-group-kinds="storage.k8s.io/StorageClass","scheduling.k8s.io/PriorityClass"
  5. Restore from the backup.

     gcloud beta container backup-restore restores create $RESTORE_NAME \
       --project=$PROJECT_ID \
       --location=$DR_REGION \
       --restore-plan=$RESTORE_PLAN_NAME \
       --backup=projects/$PROJECT_ID/locations/$DR_REGION/backupPlans/$BACKUP_PLAN_NAME/backups/$BACKUP_NAME \
       --wait-for-completion

Verify that your cluster is restored

To verify that the restored cluster has all the expected Pods, PersistentVolume, and StorageClass resources, follow these steps:

  1. Configure kubectl command line access to the backup cluster cluster-db2.

     gcloud container clusters get-credentials $TARGET_CLUSTER \
       --location $DR_REGION --project $PROJECT_ID
  2. Verify that the StatefulSet is ready with 3/3 Pods.

     kubectl get all -n $NAMESPACE

    The output is similar to the following:

     NAME                                                   READY   STATUS    RESTARTS        AGE
    pod/postgresql-postgresql-ha-pgpool-778798b5bd-k2q4b   1/1     Running   0               4m49s
    pod/postgresql-postgresql-ha-postgresql-0              2/2     Running   2 (4m13s ago)   4m49s
    pod/postgresql-postgresql-ha-postgresql-1              2/2     Running   0               4m49s
    pod/postgresql-postgresql-ha-postgresql-2              2/2     Running   0               4m49s
    
    NAME                                                   TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
    service/postgresql-postgresql-ha-pgpool                ClusterIP   192.168.241.46    <none>        5432/TCP   4m49s
    service/postgresql-postgresql-ha-postgresql            ClusterIP   192.168.220.20    <none>        5432/TCP   4m49s
    service/postgresql-postgresql-ha-postgresql-headless   ClusterIP   None              <none>        5432/TCP   4m49s
    service/postgresql-postgresql-ha-postgresql-metrics    ClusterIP   192.168.226.235   <none>        9187/TCP   4m49s
    
    NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/postgresql-postgresql-ha-pgpool   1/1     1            1           4m49s
    
    NAME                                                         DESIRED   CURRENT   READY   AGE
    replicaset.apps/postgresql-postgresql-ha-pgpool-778798b5bd   1         1         1       4m49s
    
    NAME                                                   READY   AGE
    statefulset.apps/postgresql-postgresql-ha-postgresql   3/3     4m49s 
    
  3. Verify that all Pods in the postgresql namespace are running.

     kubectl get pods -n $NAMESPACE

    The output is similar to the following:

     postgresql-postgresql-ha-pgpool-569d7b8dfc-2f9zx   1/1     Running   0          7m56s
    postgresql-postgresql-ha-postgresql-0              2/2     Running   0          7m56s
    postgresql-postgresql-ha-postgresql-1              2/2     Running   0          7m56s
    postgresql-postgresql-ha-postgresql-2              2/2     Running   0          7m56s 
    
  4. Verify the PersistentVolumes and StorageClass. During the restore process, Backup for GKE creates a proxy StorageClass in the target workload to replace the StorageClass provisioned in the source workload (gce-pd-gkebackup-dn in the example output).

     kubectl get pvc -n $NAMESPACE

    The output is similar to the following:

     NAME                                         STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS          AGE
    data-postgresql-postgresql-ha-postgresql-0   Bound    pvc-be91c361e9303f96   8Gi        RWO            gce-pd-gkebackup-dn   10m
    data-postgresql-postgresql-ha-postgresql-1   Bound    pvc-6523044f8ce927d3   8Gi        RWO            gce-pd-gkebackup-dn   10m
    data-postgresql-postgresql-ha-postgresql-2   Bound    pvc-c9e71a99ccb99a4c   8Gi        RWO            gce-pd-gkebackup-dn   10m 
    

Validate that the expected data is restored

To validate that the expected data is restored, follow these steps:

  1. Connect to your PostgreSQL instance.

     ./scripts/launch-client.sh
     kubectl exec -it pg-client -n postgresql -- /bin/bash
    
  2. Verify the number of rows for each table.

     psql -h $HOST_PGPOOL -U postgres -a -q -f /tmp/scripts/count-rows.sql
    

    You should see a result similar to the data you wrote earlier in Create a test dataset. The output is similar to the following:

     select COUNT(*) from tb01;
      300000
     (1 row)
    
  3. Exit the client Pod.

      exit

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

The easiest way to avoid billing is to delete the project you created for the tutorial.

  • In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  • In the project list, select the project that you want to delete, and then click Delete.
  • In the dialog, type the project ID, and then click Shut down to delete the project.