Force-removing broken nodes in Google Distributed Cloud

When a node is broken and needs to be removed from a cluster for repair or replacement, you can force its removal from the cluster.

Force-removing nodes in Google Distributed Cloud 1.6.0

In the 1.6.0 release of Anthos on bare metal, removing a node from its parent node pool results in the following actions:

  • Draining the node
  • Removing the node from the Kubernetes cluster
  • Resetting the machine to the state prior to installing Anthos on bare metal

However, if the node is inaccessible, node draining does not complete. The controller repeatedly attempts to drain the node but never makes progress.

As a temporary workaround, you can apply the following changes to bypass the draining and resetting steps.

CAUTION: This operation requires you to carefully update specific implementation fields in Kubernetes custom resources. Do not proceed unless you are certain that the node is unrecoverable.

Perform these steps only after you have removed the broken node from its node pool and applied the change in the admin cluster.
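If the node has not yet been removed from its node pool, you can typically do so by deleting the node's address entry from the corresponding NodePool resource in the admin cluster. The node pool name node-pool-1 below is a placeholder for illustration; adjust the name and addresses to match your configuration:

# Open the NodePool that contains the broken node (node-pool-1 is a placeholder name).
kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE edit nodepool node-pool-1

# In the editor, delete the broken node's entry from spec.nodes and save, for example:
#   nodes:
#   - address: 10.200.0.7
#   - address: 10.200.0.8   <- remove this entry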

  1. Look up the Cluster API machine custom resource in the admin cluster, where ADMIN_KUBECONFIG is the path to the kubeconfig file and CLUSTER_NAMESPACE is the namespace of the affected cluster:

    kubectl --kubeconfig ADMIN_KUBECONFIG \
        -n CLUSTER_NAMESPACE get ma 10.200.0.8

    The command returns results similar to the following:

     NAME         PROVIDERID               PHASE
     10.200.0.8   baremetal://10.200.0.8   Deleting
    

    In this example, 10.200.0.8 is the IP address of the node that is stuck in the Deleting phase.

  2. Edit the machine custom resource and add the machine.cluster.x-k8s.io/exclude-node-draining annotation. The annotation value itself does not matter; as long as the key is present, draining is skipped:

    kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE \
        annotate ma 10.200.0.8 machine.cluster.x-k8s.io/exclude-node-draining=true
  3. Look up the bare metal machine custom resource in the admin cluster:

    kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE \
        get baremetalmachine 10.200.0.8

    The command returns results similar to the following:

     NAME         CLUSTER    READY   INSTANCEID               MACHINE
     10.200.0.8   cluster1   true    baremetal://10.200.0.8   10.200.0.8
    
  4. Remove the finalizer to skip the resetting step and unblock the node removal:

    kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE \
        patch baremetalmachine 10.200.0.8 --type json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'

    After a few seconds, the node is removed from the cluster.
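
    To confirm that the removal completed, you can query the machine resource again. Once the custom resource has been deleted, the command returns a NotFound error:

    kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE get ma 10.200.0.8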

Force-removing nodes in Google Distributed Cloud 1.6.1

In Google Distributed Cloud 1.6.1, you can add an annotation to mark a node for force removal.

After removing the node from its parent node pool, run the following command to annotate the corresponding failing machine with the baremetal.cluster.gke.io/force-remove annotation. The value of the annotation itself does not matter:

kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE \
    annotate machine 10.200.0.8 baremetal.cluster.gke.io/force-remove=true

Google Distributed Cloud removes the node successfully.
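
To verify, you can list the machine resources in the cluster namespace and confirm that the entry for 10.200.0.8 no longer appears:

kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE get machines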

Force-removing control plane nodes

Force-removing a control plane node is similar to performing a kubeadm reset on the node and requires additional steps.

To force-remove a control plane node from its node pool, take the following actions against the cluster that contains the failing control plane node:

  • Remove the failing etcd member running on the failing node from the etcd cluster
  • Update the ClusterStatus in the kubeadm-config config map to remove the corresponding apiEndpoint

Removing a failing etcd member

To remove the failing control plane node, first run etcdctl on the remaining healthy etcd pods. For more general information about this operation, see the Kubernetes documentation.

In the following procedure, CLUSTER_KUBECONFIG is the path to the kubeconfig file of the cluster.

  1. Look up the etcd pod with the following command:

    kubectl --kubeconfig CLUSTER_KUBECONFIG get pod \
        -n kube-system -l component=etcd -o wide

    The command returns the following list of etcd pods. For this example, assume that node 10.200.0.8 is inaccessible and unrecoverable:

     NAME                READY   STATUS    RESTARTS   AGE     IP           NODE
     etcd-357b68f4ecf0   1/1     Running   0          9m2s    10.200.0.6   357b68f4ecf0
     etcd-7d7c21db88b3   1/1     Running   0          33m     10.200.0.7   7d7c21db88b3
     etcd-b049141e0802   1/1     Running   0          8m22s   10.200.0.8   b049141e0802
    
  2. Exec into one of the remaining healthy etcd pods:

    kubectl --kubeconfig CLUSTER_KUBECONFIG exec -it \
        -n kube-system etcd-357b68f4ecf0 -- /bin/sh
  3. Look up the current members to find the ID of the failing member:

    etcdctl --endpoints=https://10.200.0.6:2379,https://10.200.0.7:2379 --key=/etc/kubernetes/pki/etcd/peer.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt  member list

    This command returns, for example:

     23da9c3f2594532a, started, 7d7c21db88b3, https://10.200.0.6:2380, https://10.200.0.6:2379, false
     772c1a54956b7f51, started, 357b68f4ecf0, https://10.200.0.7:2380, https://10.200.0.7:2379, false
     f64f66ad8d3e7960, started, b049141e0802, https://10.200.0.8:2380, https://10.200.0.8:2379, false
    
  4. Remove the failing member:

    etcdctl --endpoints=https://10.200.0.6:2379,https://10.200.0.7:2379 --key=/etc/kubernetes/pki/etcd/peer.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
     member remove f64f66ad8d3e7960
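
    To verify that the member was removed, you can run the member list command again from the same shell; the entry for b049141e0802 should no longer appear:

    etcdctl --endpoints=https://10.200.0.6:2379,https://10.200.0.7:2379 --key=/etc/kubernetes/pki/etcd/peer.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt member list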

Updating ClusterStatus and removing the failing apiEndpoint

In the following procedure, CLUSTER_KUBECONFIG is the path to the kubeconfig file of the cluster.

  1. Look up the ClusterStatus section inside the kubeadm-config config map:

    kubectl --kubeconfig CLUSTER_KUBECONFIG describe configmap \
        -n kube-system kubeadm-config

    The command returns results similar to those shown below:

     ...
     ClusterStatus:
     ----
     apiEndpoints:
       7d7c21db88b3:
         advertiseAddress: 10.200.0.6
         bindPort: 6444
       357b68f4ecf0:
         advertiseAddress: 10.200.0.7
         bindPort: 6444
       b049141e0802:
         advertiseAddress: 10.200.0.8
         bindPort: 6444
     apiVersion: kubeadm.k8s.io/v1beta2
     kind: ClusterStatus
     ...
    
  2. Edit the config map to remove the section that contains the failing IP (this example shows the results of removing 10.200.0.8 using the kubectl edit command):

    kubectl --kubeconfig CLUSTER_KUBECONFIG edit configmap \
        -n kube-system kubeadm-config

    After editing, the config map looks similar to the following:

     ...
     ClusterStatus: |
       apiEndpoints:
         7d7c21db88b3:
           advertiseAddress: 10.200.0.6
           bindPort: 6444
         357b68f4ecf0:
           advertiseAddress: 10.200.0.7
           bindPort: 6444
       apiVersion: kubeadm.k8s.io/v1beta2
       kind: ClusterStatus
     ...
    
  3. When you save the edited config map, the failing node is removed from the cluster.
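
    As a final check, you can confirm that the removed control plane node no longer appears in the cluster's node list:

    kubectl --kubeconfig CLUSTER_KUBECONFIG get nodes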
