Version 1.11. This version is no longer supported. For information about how to upgrade to version 1.12, see Upgrading Anthos on bare metal in the 1.12 documentation. For more information about supported and unsupported versions, see the Version history page in the latest documentation.
When you need to repair or maintain nodes, you should first put the nodes into
maintenance mode. Putting nodes into maintenance mode safely drains their
pods/workloads and excludes the nodes from pod scheduling. In maintenance mode,
you can work on your nodes without a risk of disrupting pod traffic.
How it works
Google Distributed Cloud provides a way to place nodes into maintenance mode. This
approach lets other cluster components correctly know that the node is in
maintenance mode. When you place a node in maintenance mode, no additional pods
can be scheduled on the node, and existing pods are stopped.
Instead of using maintenance mode, you can manually use Kubernetes commands such
as kubectl cordon and kubectl drain on a specific node. If you run
Google Distributed Cloud version 1.12.0 (anthosBareMetalVersion: 1.12.0) or
lower, see the known issue on Nodes uncordoned if you don't use the maintenance mode procedure.
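If you choose the manual approach, the standard kubectl workflow looks roughly like the following sketch. NODE_NAME is a placeholder, and the drain flags shown are the ones commonly needed for DaemonSet-managed pods and pods with emptyDir volumes; adjust them to your workloads.
# Mark the node as unschedulable.
kubectl cordon NODE_NAME
# Evict the pods running on the node.
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data
# Make the node schedulable again when your maintenance work is done.
kubectl uncordon NODE_NAME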
When you use the maintenance mode process, Google Distributed Cloud does the
following:
Node taints are added to specified nodes to indicate that no pods can be scheduled or
executed on the nodes.
A 20-minute timeout is enforced to ensure nodes don't get stuck waiting for
pods to stop. Pods might not stop if they are configured to tolerate all taints or they have finalizers (see the toleration sketch after this list).
Google Distributed Cloud attempts to stop all pods, but if the timeout is
exceeded, the node is put into maintenance mode. This timeout prevents running
pods from blocking upgrades.
If you have a VM-based workload running
on the node, Google Distributed Cloud applies a NodeSelector to the virtual
machine instance (VMI) Pod, then stops the Pod. The NodeSelector ensures
that the VMI Pod is restarted on the same node when the node is removed from
maintenance mode.
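As a minimal sketch of the "tolerate all taints" case mentioned above, a pod whose spec carries a toleration with operator: Exists and no key matches every taint, including the maintenance taints, so it isn't evicted within the timeout. The pod name, container name, and image below are placeholders for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: tolerate-all-example      # hypothetical pod name
spec:
  containers:
  - name: app                     # hypothetical container name
    image: example.com/app:1.0    # placeholder image
  tolerations:
  - operator: Exists              # an empty key with Exists matches all taints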
Put a node into maintenance mode
Choose the nodes you want to put into maintenance mode by specifying IP ranges
for the selected nodes under maintenanceBlocks in your cluster configuration
file. The nodes you choose must be in a ready state, and functioning in the
cluster.
To put nodes into maintenance mode:
Edit the cluster configuration file to select the nodes you want to put into
maintenance mode.
You can edit the configuration file with an editor of your choice, or you
can edit the cluster custom resource directly by running the following
command:
kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME
Replace the following:
CLUSTER_NAMESPACE: the namespace of the cluster.
CLUSTER_NAME: the name of the cluster.
Add the maintenanceBlocks section to the cluster configuration file to
specify either a single IP address, or an address range, for nodes you want
to put into maintenance mode.
The following sample shows how to select multiple nodes by specifying a
range of IP addresses:
metadata:
  name: my-cluster
  namespace: cluster-my-cluster
spec:
  maintenanceBlocks:
    cidrBlocks:
    - 172.16.128.1-172.16.128.64
Save and apply the updated cluster configuration.
Google Distributed Cloud starts putting the nodes into maintenance mode.
Run the following command to get the status of the nodes in your cluster:
kubectl get nodes -n CLUSTER_NAME
The response is something like the following:
NAME                       STATUS                     ROLES    AGE     VERSION
user-anthos-baremetal-01   Ready                      master   2d22h   v1.17.8-gke.16
user-anthos-baremetal-04   Ready                      <none>   2d22h   v1.17.8-gke.16
user-anthos-baremetal-05   Ready,SchedulingDisabled   <none>   2d22h   v1.17.8-gke.16
user-anthos-baremetal-06   Ready                      <none>   2d22h   v1.17.8-gke.16
A status of SchedulingDisabled indicates that a node is in maintenance mode.
Run the following command to get the number of nodes in maintenance mode:
kubectl get nodepools
The response should look something like the following output:
NAME   READY   RECONCILING   STALLED   UNDERMAINTENANCE   UNKNOWN
np1    3       0             0         1                  0
The UNDERMAINTENANCE column in this sample shows that one node is in
maintenance mode.
Google Distributed Cloud also adds the following taints to nodes when they are
put into maintenance mode:
baremetal.cluster.gke.io/maintenance:NoExecute
baremetal.cluster.gke.io/maintenance:NoSchedule
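To confirm that a node carries these taints, you can inspect the node directly. NODE_NAME is a placeholder for the node you placed in maintenance mode; either of the following commands works.
# Show the Taints line from the node description.
kubectl describe node NODE_NAME | grep Taints
# Print the taints from the node spec.
kubectl get node NODE_NAME -o jsonpath='{.spec.taints}'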
Remove a node from maintenance mode
To remove nodes from maintenance mode:
Edit the cluster configuration file to clear the nodes you want to remove
from maintenance mode.
You can edit the configuration file with an editor of your choice, or you
can edit the cluster custom resource directly by running the following
command:
kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME
Replace the following:
CLUSTER_NAMESPACE: the namespace of the cluster.
CLUSTER_NAME: the name of the cluster.
Either edit the IP addresses to remove specific nodes from maintenance mode,
or remove the maintenanceBlocks section to remove all nodes from maintenance
mode.
Save and apply the updated cluster configuration.
Use kubectl commands to check the status of your nodes.
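For example, using the same placeholders as earlier on this page, the following commands let you confirm that scheduling is re-enabled and that the node pool no longer reports nodes under maintenance.
# The SchedulingDisabled status should no longer appear next to the node.
kubectl get nodes -n CLUSTER_NAME
# The UNDERMAINTENANCE count for the node pool should return to 0.
kubectl get nodepools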
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[[["\u003cp\u003ePlacing nodes in maintenance mode is crucial for safe node repair or maintenance, as it prevents new pods from being scheduled and stops existing ones, reducing the risk of disrupting pod traffic.\u003c/p\u003e\n"],["\u003cp\u003eGoogle Distributed Cloud provides a structured method for entering maintenance mode, which automatically adds node taints and enforces a 20-minute timeout for stopping pods, preventing potential upgrade blockages.\u003c/p\u003e\n"],["\u003cp\u003eThe process involves editing the cluster configuration file to specify IP addresses or ranges under \u003ccode\u003emaintenanceBlocks\u003c/code\u003e to designate which nodes should enter maintenance mode, and then saving the configuration to have Google Distributed Cloud perform the needed actions.\u003c/p\u003e\n"],["\u003cp\u003eNodes in maintenance mode have a \u003ccode\u003eSchedulingDisabled\u003c/code\u003e status and are marked with specific taints (\u003ccode\u003ebaremetal.cluster.gke.io/maintenance:NoExecute\u003c/code\u003e and \u003ccode\u003ebaremetal.cluster.gke.io/maintenance:NoSchedule\u003c/code\u003e), and nodes can be removed from this mode by editing the configuration to remove them from \u003ccode\u003emaintenanceBlocks\u003c/code\u003e.\u003c/p\u003e\n"]]],[],null,["# Put nodes into maintenance mode\n\n\u003cbr /\u003e\n\nWhen you need to repair or maintain nodes, you should first put the nodes into\nmaintenance mode. Putting nodes into maintenance mode safely drains their\npods/workloads and excludes the nodes from pod scheduling. In maintenance mode,\nyou can work on your nodes without a risk of disrupting pod traffic.\n\nHow it works\n------------\n\nGoogle Distributed Cloud provides a way to place nodes into maintenance mode. This\napproach lets other cluster components correctly know that the node is in\nmaintenance mode. When you place a node in maintenance mode, no additional pods\ncan be scheduled on the node, and existing pods are stopped.\n\nInstead of using maintenance mode, you can manually use Kubernetes commands such\nas `kubectl cordon` and `kubectl drain` on a specific node. If you run\nGoogle Distributed Cloud version 1.12.0 (`anthosBareMetalVersion: 1.12.0`) or\nlower, see the known issue on\n[Nodes uncordoned if you don't use the maintenance mode procedure](/anthos/clusters/docs/bare-metal/1.11/troubleshooting/known-issues#uncordoned-nodes).\n\nWhen you use the maintenance mode process, Google Distributed Cloud does the\nfollowing:\n\n- [Node taints](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/)\n are added to specified nodes to indicate that no pods can be scheduled or\n executed on the nodes.\n\n- A 20-minute timeout is enforced to ensure nodes don't get stuck waiting for\n pods to stop. 
Pods might not stop if they are configured to\n [tolerate all taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)\n or they have\n [finalizers](https://kubernetes.io/docs/concepts/overview/working-with-objects/finalizers/).\n Google Distributed Cloud attempts to stop all pods, but if the timeout is\n exceeded, the node is put into maintenance mode. This timeout prevents running\n pods from blocking upgrades.\n\n- If you have a [VM-based workload](/anthos/clusters/docs/bare-metal/1.11/how-to/vm-workloads) running\n on the node, Google Distributed Cloud applies a `NodeSelector` to the virtual\n machine instance (VMI) Pod, then stops the Pod. The `NodeSelector` ensures\n that the VMI Pod is restarted on the same node when the node is removed from\n maintenance mode.\n\nPut a node into maintenance mode\n--------------------------------\n\nChoose the nodes you want to put into maintenance mode by specifying IP ranges\nfor the selected nodes under `maintenanceBlocks` in your cluster configuration\nfile. The nodes you choose must be in a ready state, and functioning in the\ncluster.\n\nTo put nodes into maintenance mode:\n\n1. Edit the cluster configuration file to select the nodes you want to put into\n maintenance mode.\n\n You can edit the configuration file with an editor of your choice, or you\n can edit the cluster custom resource directly by running the following\n command: \n\n kubectl -n \u003cvar translate=\"no\"\u003eCLUSTER_NAMESPACE\u003c/var\u003e edit cluster \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCLUSTER_NAMESPACE\u003c/var\u003e: the namespace of the cluster.\n - \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e: the name of the cluster.\n2. Add the `maintenanceBlocks` section to the cluster configuration file to\n specify either a single IP address, or an address range, for nodes you want\n to put into maintenance mode.\n\n The following sample shows how to select multiple nodes by specifying a\n range of IP addresses: \n\n metadata:\n name: my-cluster\n namespace: cluster-my-cluster\n spec:\n maintenanceBlocks:\n cidrBlocks:\n - 172.16.128.1-172.16.128.64\n\n3. Save and apply the updated cluster configuration.\n\n Google Distributed Cloud starts putting the nodes into maintenance mode.\n4. Run the following command to get the status of the nodes in your cluster:\n\n kubectl get nodes -n \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e\n\n The response is something like the following: \n\n NAME STATUS ROLES AGE VERSION\n user-anthos-baremetal-01 Ready master 2d22h v1.17.8-gke.16\n user-anthos-baremetal-04 Ready <none> 2d22h v1.17.8-gke.16\n user-anthos-baremetal-05 Ready,SchedulingDisabled <none> 2d22h v1.17.8-gke.16\n user-anthos-baremetal-06 Ready <none> 2d22h v1.17.8-gke.16\n\n A status of `SchedulingDisabled` indicates that a node is in maintenance\n mode.\n5. 
Run the following command to get the number of nodes in maintenance mode:\n\n kubectl get nodepools\n\n The response should look something like the following output: \n\n NAME READY RECONCILING STALLED UNDERMAINTENANCE UNKNOWN\n np1 3 0 0 1 0\n\n This `UNDERMAINTENANCE` column in this sample shows that one node is in\n maintenance mode.\n\n Google Distributed Cloud also adds the following taints to nodes when they are\n put into maintenance mode:\n - `baremetal.cluster.gke.io/maintenance:NoExecute`\n - `baremetal.cluster.gke.io/maintenance:NoSchedule`\n\nRemove a node from maintenance mode\n-----------------------------------\n\nTo remove nodes from maintenance mode:\n\n1. Edit the cluster configuration file to clear the nodes you want to remove\n from maintenance mode.\n\n You can edit the configuration file with an editor of your choice, or you\n can edit the cluster custom resource directly by running the following\n command: \n\n kubectl -n \u003cvar translate=\"no\"\u003eCLUSTER_NAMESPACE\u003c/var\u003e edit cluster \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCLUSTER_NAMESPACE\u003c/var\u003e: the namespace of the cluster.\n - \u003cvar translate=\"no\"\u003eCLUSTER_NAME\u003c/var\u003e: the name of the cluster.\n2. Either edit the IP addresses to remove specific nodes from maintenance mode\n or remove the `maintenanceBlocks` section remove all does from maintenance\n mode.\n\n3. Save and apply the updated cluster configuration.\n\n4. Use `kubectl` commands to check the status of your nodes."]]