Manage high availability

Manage high availability for your AlloyDB Omni clusters to add resilience to outages and failures through automated failovers and recovery mechanisms.

Limitations

  • Standby nodes can't be used as readable replicas.

  • If the AlloyDB Omni cluster manager is down for more than 90 seconds (the default), an automatic failover occurs even if the data plane is healthy. You can configure this duration in Configure high availability specification using the HEALTHCHECK_PERIOD and AUTOFAILOVER_TRIGGER_THRESHOLD variables.

Configure high availability specification

To configure high availability, fill out the following information in your DBCluster specification:

    DBCluster:
      metadata:
        ...
      spec:
        ...
        availability:
          numberOfStandbys: NUMBER_OF_STANDBYS
          enableAutoFailover: true
          enableAutoHeal: true
          replayReplicationSlotsOnStandbys: false
          healthcheckPeriodSeconds: HEALTHCHECK_PERIOD
          autoFailoverTriggerThreshold: AUTOFAILOVER_TRIGGER_THRESHOLD
          autoHealTriggerThreshold: AUTOHEAL_TRIGGER_THRESHOLD

Replace the following variables:

  • NUMBER_OF_STANDBYS : number of standby nodes to set up. Setting this value to 0 disables high availability. The maximum value is 5 . If you're not sure how many standby nodes you need, start with 2 for high resiliency.

  • (Optional) HEALTHCHECK_PERIOD : number of seconds to wait between each health check. The default value is 30 . The minimum value is 1 . The maximum value is 86400 (one day).

  • (Optional) AUTOFAILOVER_TRIGGER_THRESHOLD : number of times the health check can fail before a failover occurs. The default value is 3 . The minimum value is 0 , but if the value is set to 0 , AlloyDB Omni uses the default value.

    An automatic failover occurs if the healthcheck fails AUTOFAILOVER_TRIGGER_THRESHOLD times or for HEALTHCHECK_PERIOD * AUTOFAILOVER_TRIGGER_THRESHOLD seconds.

  • (Optional) AUTOHEAL_TRIGGER_THRESHOLD : number of times the health check can fail before auto-heal begins. The default value is 3 . The minimum value is 0 , but if the value is set to 0 , AlloyDB Omni uses the default value.

    An automatic recovery occurs if the healthcheck fails AUTOHEAL_TRIGGER_THRESHOLD times or for HEALTHCHECK_PERIOD * AUTOHEAL_TRIGGER_THRESHOLD seconds.
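As an illustration, the following availability block uses the default health-check timing with two standbys. With these illustrative values, an automatic failover triggers after the health check fails three times, that is, after roughly 30 * 3 = 90 seconds:

```yaml
# Illustrative values only; adjust for your environment.
availability:
  numberOfStandbys: 2               # start with 2 for high resiliency
  enableAutoFailover: true
  enableAutoHeal: true
  replayReplicationSlotsOnStandbys: false
  healthcheckPeriodSeconds: 30      # default health-check interval
  autoFailoverTriggerThreshold: 3   # failover after ~30 * 3 = 90 seconds of failed checks
  autoHealTriggerThreshold: 3       # auto-heal after the same window
```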

Apply your DBCluster specification

To apply your configured DBCluster specification, run one of the following commands:

alloydbctl

    alloydbctl apply -d "DEPLOYMENT_SPEC" -r "DBCLUSTER_SPECIFICATION"

Replace the following variables:

  • DEPLOYMENT_SPEC : path to the deployment specification you created in Install AlloyDB Omni components .

  • DBCLUSTER_SPECIFICATION : path to the DBCluster specification you created in Create a cluster .

Ansible

    ansible-playbook DBCLUSTER_PLAYBOOK -i "DEPLOYMENT_SPEC" \
      -e resource_spec="DBCLUSTER_SPECIFICATION"

Replace the following variables:

  • DBCLUSTER_PLAYBOOK : path to the playbook that you created for your DBCluster CRD.

  • DEPLOYMENT_SPEC : path to the deployment specification you created in Install AlloyDB Omni components .

  • DBCLUSTER_SPECIFICATION : path to the DBCluster specification you created in Create a cluster .

Switchover to a standby instance

You can perform a switchover to test your high availability setup or to carry out other planned maintenance activities that require swapping the primary and standby roles. After the switchover, the direction of replication and the roles of the primary and standby are reversed.

A switchover performs the following actions:

  1. AlloyDB Omni orchestrator takes the primary offline.

  2. AlloyDB Omni orchestrator promotes the standby to be the new primary.

  3. AlloyDB Omni orchestrator converts the former primary into a standby.

  4. AlloyDB Omni starts the newly-converted standby.

Perform a switchover

To perform a switchover, complete the following steps:

  1. Verify that your primary and standby instances are healthy.

  2. Verify that the high availability status.phase is Ready .

    alloydbctl

      alloydbctl get -d "DEPLOYMENT_SPEC" -t DBCluster -n DBCLUSTER_SPECIFICATION -o yaml

    Replace the following variables:

    • DEPLOYMENT_SPEC : path to the deployment specification you created in Install AlloyDB Omni components .

    • DBCLUSTER_SPECIFICATION : the DBCluster specification you created in Create a cluster .

    Ansible

      ansible-playbook status.yaml -i DEPLOYMENT_SPEC -e resource_type=DBCluster \
        -e resource_name=DBCLUSTER_SPECIFICATION

    Replace the following variables:

    • DEPLOYMENT_SPEC : path to the deployment specification you created in Install AlloyDB Omni components .

    • DBCLUSTER_SPECIFICATION : the DBCluster specification you created in Create a cluster .

  3. Create a Switchover specification using the following format:

      Switchover:
        metadata:
          name: SWITCHOVER_NAME
        spec:
          dbClusterRef: DBCLUSTER_NAME
          newPrimary: NEW_PRIMARY_NAME

    Replace the following variables:

    • SWITCHOVER_NAME : name for this Switchover specification. For example, my-switchover-1 . This name must be unique every time a switchover is performed.

    • DBCLUSTER_NAME : name of your database cluster that you defined in Create a cluster .

    • (Optional) NEW_PRIMARY_NAME : the standby instance that becomes the new primary. To map the instance name to a host, see the instanceList field in the status of the referenced DBCluster .
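    For illustration, a Switchover specification using hypothetical names (a cluster named my-cluster and a standby host my-standby-host) might look like the following; substitute your own names:

    ```yaml
    # Hypothetical names for illustration only.
    Switchover:
      metadata:
        name: my-switchover-1        # must be unique for each switchover
      spec:
        dbClusterRef: my-cluster     # your DBCluster name
        newPrimary: my-standby-host  # a standby from the DBCluster status instanceList
    ```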

  4. If you're using Ansible, create a playbook for your Switchover specification.

      - name: SWITCHOVER_PLAYBOOK_NAME
        hosts: localhost
        vars:
          ansible_become: true
          ansible_user: ANSIBLE_USER
          ansible_ssh_private_key_file: ANSIBLE_SSH_PRIVATE_KEY_FILE
        roles:
          - role: google.alloydbomni_orchestrator.switchover

    Replace the following variables:

    • SWITCHOVER_PLAYBOOK_NAME : name of your Ansible playbook. For example, My Switchover .

    • ANSIBLE_USER : OS user that Ansible uses to log into your AlloyDB Omni nodes.

    • ANSIBLE_SSH_PRIVATE_KEY_FILE : private key Ansible uses to connect to your AlloyDB Omni nodes using SSH.
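    For example, assuming a hypothetical OS user alloydb-admin and a key file at /home/alloydb-admin/.ssh/id_rsa, the playbook might look like the following:

    ```yaml
    # Hypothetical user and key path; replace with your own values.
    - name: My Switchover
      hosts: localhost
      vars:
        ansible_become: true
        ansible_user: alloydb-admin
        ansible_ssh_private_key_file: /home/alloydb-admin/.ssh/id_rsa
      roles:
        - role: google.alloydbomni_orchestrator.switchover
    ```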

  5. Apply your Switchover specification.

    alloydbctl

      alloydbctl apply -d "DEPLOYMENT_SPEC" -r "SWITCHOVER_SPECIFICATION"

    Replace the following variables:

    • DEPLOYMENT_SPEC : path to the deployment specification you created in Install AlloyDB Omni components .

    • SWITCHOVER_SPECIFICATION : path to the Switchover specification you created in step three.

    Ansible

      ansible-playbook SWITCHOVER_PLAYBOOK -i "DEPLOYMENT_SPEC" \
        -e resource_spec="SWITCHOVER_SPECIFICATION"

    Replace the following variables:

    • SWITCHOVER_PLAYBOOK : path to the playbook that you created for your Switchover CRD in step four.

    • DEPLOYMENT_SPEC : path to the deployment specification you created in Install AlloyDB Omni components .

    • SWITCHOVER_SPECIFICATION : path to the Switchover specification you created in step three.

Load balancer for high availability

The load balancer (HAProxy) achieves high availability by pairing its nodes with Keepalived and a virtual IP. Keepalived utilizes the Virtual Router Redundancy Protocol (VRRP) to control a floating, virtual IP. Database client applications connect to this virtual IP instead of the database node's IP address.

In configurations where a dedicated load balancer isn't used, Keepalived is installed directly on the database nodes. In this scenario, high availability is achieved by dynamically assigning the virtual IP to the current primary node, ensuring seamless failover if the primary becomes unavailable.

To establish a stable election, Keepalived assigns VRRP priorities to the database cluster nodes. The first load balancer node assumes the primary role with a higher Keepalived priority— for example, 110 . Subsequent nodes act as secondaries with a lower priority— for example, 100 .

To ensure that the virtual IP points to a healthy node, Keepalived runs continuous health checks every two seconds, verifying the state of the systemd HAProxy process. If the HAProxy service on the primary fails, Keepalived migrates the virtual IP to a healthy secondary node.

If the database cluster's membership changes, HAProxy and Keepalived automatically point to the new active database nodes. The underlying routing configuration updates without dropping live client connections.

Configure the load balancer

To configure the virtual IP for the load balancer nodes, add the following dbLoadBalancerOptions field to the primarySpec field in your DBCluster specification:

    DBCluster:
      spec:
        primarySpec:
          ...
          dbLoadBalancerOptions:
            onprem:
              loadBalancerIP: "VIRTUAL_IP"
              loadBalancerType: "internal"
              loadBalancerInterface: "VIRTUAL_IP_INTERFACE"

Replace the following variables:

  • VIRTUAL_IP : static IP address used for the floating, virtual IP. Database client applications use the IP address defined here. To ensure that Keepalived can broadcast gratuitous ARPs successfully, this IP address must be available, must not be a loopback address, and, for on-premises deployments, must belong to the same subnet as your primary node interfaces.

  • VIRTUAL_IP_INTERFACE : network interface where VIRTUAL_IP is configured. The default value is eth0 .
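For illustration, assuming a hypothetical unused address 10.0.0.100 on the primary subnet and the default eth0 interface, the block might look like the following:

```yaml
# Hypothetical address and interface; the IP must be unused and on the
# same subnet as the primary node interfaces.
dbLoadBalancerOptions:
  onprem:
    loadBalancerIP: "10.0.0.100"
    loadBalancerType: "internal"
    loadBalancerInterface: "eth0"
```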
