In a Google Distributed Cloud implementation, the control-plane VM for an admin cluster has two attached disks:
The boot disk has the operating system for the VM.
The data disk has credentials and the etcd database, which stores the state of
the admin cluster. That is, the data disk stores all of the Kubernetes objects for
the admin cluster.
This page shows how to recover when the control-plane VM is lost or the boot
disk is compromised. For example:
The boot disk becomes read-only because of journal log spam.
The Docker overlay filesystem gets corrupted.
This page does not cover recovery of the data disk. For instructions on how to
recover the data disk, see Restoring an admin cluster.
Repairing the control-plane VM
Warning: Don't run gkectl repair admin-master after a failed admin upgrade
attempt. Instead, resume the admin upgrade.
The steps that you do to repair the admin cluster's control-plane VM differ
slightly depending on whether you have a high-availability (HA) admin cluster
or a non-HA admin cluster.
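Non-HA
Run the following command:
gkectl repair admin-master --config ADMIN_CLUSTER_CONFIG --kubeconfig ADMIN_CLUSTER_KUBECONFIG
Replace:
ADMIN_CLUSTER_CONFIG with the path of your admin cluster
configuration file.
ADMIN_CLUSTER_KUBECONFIG with the path of your admin cluster's
kubeconfig file.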
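As a minimal sketch, the non-HA repair command, gkectl repair admin-master --config ADMIN_CLUSTER_CONFIG --kubeconfig ADMIN_CLUSTER_KUBECONFIG, could be wrapped in a small shell function that checks that both files exist before it calls gkectl. The function name repair_admin_master and the GKECTL override variable are hypothetical and exist only for illustration; the command and its two flags are the ones documented on this page.

```shell
# Hypothetical wrapper (not part of gkectl): validate the two input files,
# then run the documented repair command.
# GKECTL can be overridden (for example with "echo") to preview the command
# instead of running it.
repair_admin_master() {
  local config="$1" kubeconfig="$2"
  local gkectl_bin="${GKECTL:-gkectl}"

  # Fail early if either path is missing or is not a regular file.
  if [ ! -f "$config" ]; then
    echo "admin cluster config not found: $config" >&2
    return 1
  fi
  if [ ! -f "$kubeconfig" ]; then
    echo "admin cluster kubeconfig not found: $kubeconfig" >&2
    return 1
  fi

  "$gkectl_bin" repair admin-master --config "$config" --kubeconfig "$kubeconfig"
}
```

Calling the function with GKECTL=echo and two existing files prints the command that would run, which can be a convenient way to double-check the paths first.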
HA
An HA admin cluster has three control-plane VMs. At least two VMs must be
running to bring up the cluster control plane. If all three VMs have failed,
repair the failed VMs one at a time. After the second VM is repaired and
running, the cluster control plane should come back up.
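The two-of-three requirement matches a majority quorum, which is the rule etcd uses to decide whether a cluster can make progress. The arithmetic can be sketched as follows; the function names quorum_size and has_quorum are hypothetical, and the formula is the general majority rule rather than anything gkectl exposes.

```shell
# Majority quorum for n members: floor(n / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) when the number of healthy VMs is enough to
# form a quorum among the total number of control-plane VMs.
has_quorum() {
  local healthy="$1" total="$2"
  [ "$healthy" -ge "$(quorum_size "$total")" ]
}
```

For three control-plane VMs, quorum_size 3 prints 2, so has_quorum 1 3 fails and has_quorum 2 3 succeeds, which is why the control plane is expected to return once the second VM is running.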
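Run the following command:
gkectl repair admin-master --config ADMIN_CLUSTER_CONFIG --kubeconfig ADMIN_CLUSTER_KUBECONFIG
Replace:
ADMIN_CLUSTER_CONFIG with the path of your admin cluster
configuration file.
ADMIN_CLUSTER_KUBECONFIG with the path of your admin cluster's
kubeconfig file.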
The output of the command is similar to the following:
Please select the control plane VM template to be used for re-creating the admin cluster's control plane VM.
[1] VM template: /atl-qual-vc07/vm/gke-admin-57f8g-fx9f4c729448z2v8-2-tmpl
GKE on-prem version: 1.16.0-gke.550
Creation time: 2023-07-25 01:52:51.815518 +0000 UTC
CPU: 4 CPU(s)
Memory: 16384 MB
Data disk: [vsanDatastore] 37a73d64-b823-47cd-2e0c-00620b9189a0/gke-admin-57f8g/default/gke-admin-57f8g-2-data.vmdk
[2] VM template: /atl-qual-vc07/vm/gke-admin-57f8g-fx9f4c729448z2v8-0-tmpl
GKE on-prem version: 1.16.0-gke.550
Creation time: 2023-07-25 01:52:54.228252 +0000 UTC
CPU: 4 CPU(s)
Memory: 16384 MB
Data disk: [vsanDatastore] 37a73d64-b823-47cd-2e0c-00620b9189a0/gke-admin-57f8g/default/gke-admin-57f8g-0-data.vmdk
[3] VM template: /atl-qual-vc07/vm/gke-admin-57f8g-fx9f4c729448z2v8-1-tmpl
GKE on-prem version: 1.16.0-gke.550
Creation time: 2023-07-25 01:52:54.210705 +0000 UTC
CPU: 4 CPU(s)
Memory: 16384 MB
Data disk: [vsanDatastore] 37a73d64-b823-47cd-2e0c-00620b9189a0/gke-admin-57f8g/default/gke-admin-57f8g-1-data.vmdk
Please enter your numeric choice:
Enter the number for the VM that you want to repair. If you don't see
the VM in the output, contact Google Cloud Support.
If all three VMs need to be repaired, gkectl repair admin-master outputs
an error message similar to the following after it repairs the first VM:
If you are repairing admin control plane VM for HA admin cluster,
it's possible that the API server is still down after repairing one
of the VMs. Try continue fixing other control plane VMs listed to
recover the quorum of control plane.
In this case, re-run the command to repair the second VM.
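After each repair, it can help to check whether the API server responds before deciding whether to run the command again. The following is a minimal sketch, assuming kubectl is installed on the machine where you run gkectl. The function name api_server_ready and the KUBECTL override variable are hypothetical; /readyz is the standard Kubernetes API server readiness endpoint.

```shell
# Hypothetical helper: succeeds (exit status 0) when the admin cluster's
# API server answers its readiness endpoint.
# KUBECTL can be overridden to point at a different kubectl binary.
api_server_ready() {
  local kubeconfig="$1"
  local kubectl_bin="${KUBECTL:-kubectl}"

  # A non-zero exit status means the server did not answer within the
  # timeout or reported that it is not ready.
  "$kubectl_bin" --kubeconfig "$kubeconfig" get --raw='/readyz' \
    --request-timeout=10s >/dev/null 2>&1
}
```

If api_server_ready ADMIN_CLUSTER_KUBECONFIG fails after the first repair, that is consistent with the message above, and the next step is to repair another VM.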
Notes
The admin cluster's control-plane VM is cloned into a VM template, which has
all the information needed to re-create the VM. The gkectl repair admin-master
command uses the VM template to create a new VM. Then it attaches a new
boot disk and the existing data disk.
If your cluster nodes get their addresses from a DHCP server, the new VM might
have a different IP address from the original VM.
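To see which address the re-created VM received, one option is to list the admin cluster's nodes along with their internal IP addresses. This sketch assumes that kubectl is available and that the control-plane VM appears as a node in the admin cluster. The function name list_node_ips and the KUBECTL override variable are hypothetical.

```shell
# Hypothetical helper: list the admin cluster's nodes in wide format,
# which includes an INTERNAL-IP column for each node.
list_node_ips() {
  local kubeconfig="$1"
  local kubectl_bin="${KUBECTL:-kubectl}"

  "$kubectl_bin" --kubeconfig "$kubeconfig" get nodes -o wide
}
```

Running list_node_ips ADMIN_CLUSTER_KUBECONFIG before and after the repair makes it possible to compare the addresses.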
Last updated 2025-09-04 UTC.