This page explains how to upgrade Google Distributed Cloud, covering the steps for upgrading your admin workstation, user clusters, and admin cluster. For user clusters, it provides the steps to upgrade the control plane and node pools at the same time, or separately.
Before you proceed, we recommend that you review the following documentation:
- Upgrade overview: Among other things, this document describes the supported version skew and version rules for upgrades, which have changed for 1.28 and later.
- Upgrade best practices: This document provides checklists and best practices for upgrading clusters.
IAM requirements for upgrading user clusters
Skip this section if you plan to use gkectl for the user cluster upgrade.
If you want to use the Google Cloud console, the Google Cloud CLI, or Terraform to
upgrade a user cluster, and you aren't a project owner, you must be granted the
Identity and Access Management role roles/gkeonprem.admin
on the Google Cloud project that the
cluster was created in. For details on the permissions included in this role,
see GKE On-Prem API roles
in the IAM documentation.
To use the console to upgrade the cluster, at a minimum, you also need the following:

- roles/container.viewer. This role lets users view the GKE Clusters page and other container resources in the console. For details about the permissions included in this role, or to grant a role with read/write permissions, see Kubernetes Engine roles in the IAM documentation.
- roles/gkehub.viewer. This role lets users view clusters in the console. For details about the permissions included in this role, or to grant a role with read/write permissions, see GKE Hub roles in the IAM documentation.
Upgrade your admin workstation
You need to upgrade your admin workstation if you plan to use gkectl to upgrade a user cluster.
If you plan to use the console, the gcloud CLI, or
Terraform to upgrade a user cluster, you can skip upgrading the admin
workstation for now. But you will need to upgrade the admin workstation when you
are ready to upgrade the admin cluster because only gkectl
supports admin
cluster upgrades.
Locate required files
Before you created your admin workstation, you filled in an admin workstation configuration file that was generated by gkeadm create config. The default name for this file is admin-ws-config.yaml.
In addition, your workstation has an information file. The default name of this file is the same as the name of your admin workstation.
Locate your admin workstation configuration file and your information file. You
need them to do the upgrade steps. If these files are in your current
directory and they have their default names, then you won't need to specify
them when you run the upgrade commands. If these files are in
another directory, or if you have changed the filenames, then you specify them
by using the --config
and --info-file
flags.
If your output information file is missing, you can re-create it. See Re-create an information file if missing .
To upgrade the admin workstation:
- Download gkeadm:

  gkeadm upgrade gkeadm --target-version TARGET_VERSION

  Replace TARGET_VERSION with the target version of your upgrade. You need to specify a complete version number in the form X.Y.Z-gke.N. For a list of the Google Distributed Cloud versions, see Version history.

- Upgrade your admin workstation:

  gkeadm upgrade admin-workstation --config AW_CONFIG_FILE \
    --info-file INFO_FILE

  Replace the following:

  - AW_CONFIG_FILE: the path of your admin workstation configuration file. You can omit this flag if the file is in your current directory and has the name admin-ws-config.yaml.
  - INFO_FILE: the path of your information file. You can omit this flag if the file is in your current directory. The default name of this file is the same as the name of your admin workstation.
The preceding command performs the following tasks:
- Backs up all files in the home directory of your current admin workstation. These include:

  - Your admin cluster configuration file. The default name is admin-cluster.yaml.
  - Your user cluster configuration file. The default name is user-cluster.yaml.
  - The kubeconfig files for your admin cluster and your user clusters.
  - The root certificate for your vCenter server. Note that this file must have owner read and owner write permission.
  - The JSON key file for your component access service account. Note that this file must have owner read and owner write permission.
  - The JSON key files for your connect-register and logging-monitoring service accounts.

- Creates a new admin workstation, and copies all the backed-up files to the new admin workstation.

- Deletes the old admin workstation.
Check available versions for cluster upgrades
Run the following command to see which versions are available for upgrade:
gkectl version --kubeconfig ADMIN_CLUSTER_KUBECONFIG
The output shows the current version and the versions available for upgrade.
If you plan to use the console, the gcloud CLI, or Terraform for the upgrade, it takes about 7 to 10 days after a release for the version to be available in the GKE On-Prem API in all Google Cloud regions. The console lists only the available versions for the user cluster upgrade. The steps for upgrading a user cluster using the gcloud CLI or Terraform include a step to run gcloud container vmware clusters query-version-config to get available versions for the upgrade.
Upgrade a user cluster
You can use gkectl, the console, the gcloud CLI, or Terraform to upgrade a user cluster. For information on deciding which tool to use, see Choose a tool to upgrade user clusters.
gkectl
Prepare to upgrade a user cluster
Do the following steps on your admin workstation:
- Run gkectl prepare to import OS images to vSphere:

  gkectl prepare \
    --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-TARGET_VERSION.tgz \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG

- If your cluster has a Windows node pool, run gkectl prepare windows, and update the osImage field for the node pool. For detailed instructions, see Upgrade user cluster with Windows node pools.

- In the user cluster configuration file, set gkeOnPremVersion to the target version of your upgrade.

- Ubuntu and COS node pools only: Specify which node pools you want to upgrade. Upgrading node pools separately from the control plane is supported for Ubuntu and COS node pools, but not for Windows node pools.

  In your user cluster configuration file, indicate which node pools you want to upgrade, as follows:

  - For each node pool that you want to upgrade, remove the nodePools.nodePool[i].gkeOnPremVersion field, or set it to the empty string.
  - For each node pool that you don't want to upgrade, set nodePools.nodePool[i].gkeOnPremVersion to the current version.

  For example, suppose your user cluster is at version 1.15.5-gke.41 and has two node pools: pool-1 and pool-2. Also suppose that you want to upgrade the control plane and pool-1 to 1.16.3-gke.45, but you want pool-2 to remain at version 1.15.5-gke.41. The following portion of a user cluster configuration file shows how to specify this example:

  gkeOnPremVersion: 1.16.3-gke.45
  nodePools:
  - name: pool-1
    gkeOnPremVersion: ""
    cpus: 4
    memoryMB: 8192
    replicas: 3
    osImageType: ubuntu_containerd
  - name: pool-2
    gkeOnPremVersion: 1.15.5-gke.41
    cpus: 4
    memoryMB: 8192
    replicas: 5
    osImageType: ubuntu_containerd
Run gkectl upgrade cluster
There are two variations of the gkectl upgrade cluster command:

- Asynchronous (recommended): With the asynchronous variation, the command starts the upgrade and then completes. You don't need to watch the output of the command for the entire duration of the upgrade. Instead, you can periodically check on the upgrade progress by running gkectl list clusters and gkectl describe clusters. To use the asynchronous variation, include the --async flag in the command.

- Synchronous: With the synchronous variation, the gkectl upgrade cluster command outputs status messages to the admin workstation as the upgrade progresses.
Asynchronous upgrade
- If you are using prepared credentials and a private registry for the user cluster, make sure the private registry credential is prepared before upgrading the user cluster. For information on how to prepare the private registry credential, see Configure prepared credentials for user clusters.

- On your admin workstation, start an asynchronous upgrade:

  gkectl upgrade cluster \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --config USER_CLUSTER_CONFIG \
    --async
The preceding command completes, and you can continue to use your admin workstation while the upgrade is in progress.
- To see the status of the upgrade:

  gkectl list clusters --kubeconfig ADMIN_CLUSTER_KUBECONFIG

  The output shows a value for the cluster STATE. If the cluster is still upgrading, the value of STATE is UPGRADING. For example:

  NAMESPACE              NAME    READY   STATE       AGE   VERSION
  my-uc-gkeonprem-mgmt   my-uc   False   UPGRADING   9h    1.16.0-gke.1

  The possible values for STATE are PROVISIONING, UPGRADING, DELETING, UPDATING, RUNNING, RECONCILING, ERROR, and UNKNOWN.
To get more details about the upgrade progress and cluster events:
gkectl describe clusters --kubeconfig ADMIN_CLUSTER_KUBECONFIG \ --cluster USER_CLUSTER_NAME -v 5
The output shows the OnPremUserCluster custom resource for the specified user cluster, which includes cluster status, conditions, and events.
We record events for the start and end of each critical upgrade phase, including:
- ControlPlaneUpgrade
- MasterNodeUpgrade
- AddonsUpgrade
- NodePoolsUpgrade
Example output:
Events:
  Type     Reason                       Age    From                             Message
  ----     ------                       ----   ----                             -------
  Normal   NodePoolsUpgradeStarted      22m    onprem-user-cluster-controller   Creating or updating node pools: pool-2: Creating or updating node pool
  Normal   AddonsUpgradeStarted         22m    onprem-user-cluster-controller   Creating or updating addon workloads
  Normal   ControlPlaneUpgradeStarted   25m    onprem-user-cluster-controller   Creating or updating cluster control plane workloads: deploying user-kube-apiserver-base, ...: 14/15 pods are ready
  Normal   ControlPlaneUpgradeFinished  23m    onprem-user-cluster-controller   Control plane is running
- When the upgrade is complete, gkectl list clusters shows a STATE of RUNNING:

  NAMESPACE              NAME    READY   STATE     AGE   VERSION
  my-uc-gkeonprem-mgmt   my-uc   True    RUNNING   9h    1.16.0-gke.1

  Also, when the upgrade is complete, gkectl describe clusters shows a LastGKEOnPremVersion field under Status. For example:

  Status:
    Cluster State:  RUNNING
    LastGKEOnPremVersion:  1.16.0-gke.1
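The status check above is easy to script if you want to poll until the cluster leaves the UPGRADING state. The following sketch parses the STATE column from gkectl list clusters output; the sample output is inlined here for illustration, and the cluster name my-uc comes from the example above. In practice you would pipe the real command's output into the helper instead.

```shell
# Extract the STATE column for a named cluster from `gkectl list clusters`
# output. In practice, replace the inlined sample with:
#   gkectl list clusters --kubeconfig ADMIN_CLUSTER_KUBECONFIG | get_cluster_state my-uc
get_cluster_state() {
  local cluster_name="$1"
  # Column 2 is NAME, column 4 is STATE in the list output.
  awk -v name="$cluster_name" '$2 == name { print $4 }'
}

# Sample output, taken from the example above.
sample_output='NAMESPACE NAME READY STATE AGE VERSION
my-uc-gkeonprem-mgmt my-uc False UPGRADING 9h 1.16.0-gke.1'

state=$(printf '%s\n' "$sample_output" | get_cluster_state my-uc)
echo "$state"   # prints UPGRADING
```

A loop that sleeps between calls and exits when the state is RUNNING or ERROR gives you a simple unattended wait.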
Troubleshoot asynchronous upgrade
For an asynchronous upgrade, the timeout duration is based on the
number of nodes in the cluster. If the upgrade takes longer than the timeout
duration, the cluster state is changed from UPGRADING
to ERROR
, with an
event saying that the upgrade operation timed out. Note that the ERROR
state
here means the upgrade is taking longer than expected, but has not been
terminated. The controller continues the reconciliation and keeps retrying the
operation.
Usually a timeout is the result of a deadlock caused by a
PodDisruptionBudget (PDB). In that case, Pods cannot be evicted from old
nodes, and the old nodes cannot be drained. If the Pod eviction takes longer
than 10 minutes, we write an event to the OnPremUserCluster object. You can
capture the event by running gkectl describe clusters
. Then you can adjust
the PDB to allow the node to drain. After that, the upgrade can proceed and
eventually complete.
Example event:
Warning PodEvictionTooLong 96s (x2 over 4m7s) onprem-user-cluster-controller Waiting too long(>10m0.00000003s) for (kube-system/coredns-856d6dbfdf-dl6nz) eviction.
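A PDB blocks eviction when it allows zero voluntary disruptions, for example maxUnavailable: 0 or minAvailable equal to the replica count. As an illustrative sketch (the name, namespace, and label below are hypothetical, not from your cluster), a PDB that permits at least one disruption lets the node drain proceed:

```yaml
# Hypothetical PodDisruptionBudget that allows draining to proceed.
# maxUnavailable: 1 permits one Pod to be evicted at a time; a value of 0
# (or minAvailable equal to the replica count) deadlocks the node drain.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb        # illustrative name
  namespace: my-namespace  # illustrative namespace
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: example-app     # illustrative label
```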
In addition, when an upgrade is blocked or fails, you can run gkectl diagnose
to
check for common cluster issues. Based on the result, you can decide whether to
perform a manual fix or contact the Anthos support team for further assistance.
Synchronous upgrade
The gkectl upgrade command runs preflight checks. If the preflight checks fail, the command is blocked. You must fix the failures, or use the --skip-preflight-check-blocking flag. Skip the preflight checks only if you are confident that there are no critical failures.
Proceed with these steps on your admin workstation:
- If you are using prepared credentials and a private registry for the user cluster, make sure the private registry credential is prepared before upgrading the user cluster. For information on how to prepare the private registry credential, see Configure prepared credentials for user clusters.

- Upgrade the cluster:

  gkectl upgrade cluster \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --config USER_CLUSTER_CONFIG_FILE

- If you are upgrading to version 1.14.0 or higher, a new kubeconfig file is generated for the user cluster that overwrites any existing file. To view cluster details in the file, run the following command:

  kubectl config view --kubeconfig USER_CLUSTER_KUBECONFIG
Upgrade additional node pools
If you upgraded only the user cluster's control plane, or you upgraded the control plane and some but not all node pools, do the following steps to upgrade node pools:
- Edit your user cluster configuration file. For each node pool that you want to upgrade, remove the nodePools.nodePool[i].gkeOnPremVersion field, or set it to the empty string, as shown in the following example:

  gkeOnPremVersion: 1.16.3-gke.45
  nodePools:
  - name: pool-1
    gkeOnPremVersion: ""
    cpus: 4
    memoryMB: 8192
    replicas: 3
    osImageType: ubuntu_containerd
  - name: pool-2
    gkeOnPremVersion: ""
    cpus: 4
    memoryMB: 8192
    replicas: 5
    osImageType: ubuntu_containerd

- Run gkectl update cluster to apply the change:

  gkectl update cluster --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --config USER_CLUSTER_CONFIG

  Replace the following:

  - ADMIN_CLUSTER_KUBECONFIG: the path of your admin cluster kubeconfig file
  - USER_CLUSTER_CONFIG: the path of your user cluster configuration file

- If you encounter an issue after upgrading a node pool, you can roll back to the previous version. For more information, see Rolling back a node pool after an upgrade.
Resume an upgrade
If a user cluster upgrade is interrupted, you can resume the user cluster
upgrade by running the same upgrade command with the --skip-validation-all
flag:
gkectl upgrade cluster \
  --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
  --config USER_CLUSTER_CONFIG_FILE \
  --skip-validation-all
Console
Upgrading a user cluster requires some changes to the admin cluster. The console automatically does the following:
- Enrolls the admin cluster in the GKE On-Prem API if it isn't already enrolled.
- Downloads and deploys a bundle of components to the admin cluster. The version of the components matches the version you specify for the upgrade. These components let the admin cluster manage user clusters at that version.
To upgrade a user cluster:
- In the console, go to the Google Kubernetes Engine clusters overview page.
- Select the Google Cloud project, and then select the cluster that you want to upgrade.
- In the Details panel, click More details.
- In the Cluster basics section, click Upgrade.
- In the Choose target version list, select the version that you want to upgrade to. The curated list contains only the latest patch releases.
- Click Upgrade.

Before the cluster is upgraded, preflight checks run to validate cluster status and node health. If the preflight checks pass, the user cluster is upgraded. It takes about 30 minutes for the upgrade to complete.

To view the status of the upgrade, click Show Details on the Cluster Details tab.
gcloud CLI
Upgrading a user cluster requires some changes to the admin cluster. The gcloud container vmware clusters upgrade command automatically does the following:
- Enrolls the admin cluster in the GKE On-Prem API if it isn't already enrolled.
- Downloads and deploys a bundle of components to the admin cluster. The version of the components matches the version you specify for the upgrade. These components let the admin cluster manage user clusters at that version.
To upgrade a user cluster:
- Update the Google Cloud CLI components:

  gcloud components update
- Ubuntu and COS node pools only: If you want to upgrade only the user cluster's control plane and leave all the node pools at the current version, change the upgrade policy on the cluster:

  gcloud container vmware clusters update USER_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION \
    --upgrade-policy control-plane-only=True

  Replace the following:

  - USER_CLUSTER_NAME: The name of the user cluster to upgrade.
  - PROJECT_ID: The ID of the fleet host project in which the user cluster is a member. This is the project that you specified when the cluster was created. If you created the cluster using gkectl, this is the project ID in the gkeConnect.projectID field in the cluster configuration file.
  - REGION: The Google Cloud region in which the GKE On-Prem API runs and stores its metadata. If you created the cluster using a GKE On-Prem API client, this is the region that you selected when creating the cluster. If you created the cluster using gkectl, this is the region that you specified when you enrolled the cluster in the GKE On-Prem API.
- Get a list of available versions to upgrade to:

  gcloud container vmware clusters query-version-config \
    --cluster=USER_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION

  The output of the command is similar to the following:

  versions:
  - version: 1.16.3-gke.45
  - version: 1.16.2-gke.28
  - version: 1.16.1-gke.45
  - version: 1.16.0-gke.669
  - version: 1.15.6-gke.25
  - version: 1.15.5-gke.41
  An Anthos version must be made available on the admin cluster ahead of the user
  cluster creation or upgrade. Versions annotated with isInstalled=true are
  installed on the admin cluster for the purpose of user cluster creation or
  upgrade whereas other version are released and will be available for upgrade
  once dependencies are resolved.
  To install the version in the admin cluster, run:
  $ gcloud container vmware admin-clusters update my-admin-cluster --required-platform-version=VERSION

  You can ignore the message after the list of versions. It doesn't matter if the version that you are upgrading to is installed on the admin cluster. The upgrade command downloads and deploys a bundle of the components that matches the version you specify in the upgrade command.

- Upgrade the cluster. If you ran the update command to change the upgrade policy to control-plane-only=True, only the cluster's control plane is upgraded. Otherwise, the cluster's control plane and all node pools are upgraded.

  gcloud container vmware clusters upgrade USER_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION \
    --version=VERSION
Replace VERSION with the Google Distributed Cloud version that you want to upgrade to. Specify a version from the output of the previous command. We recommend that you upgrade to the most recent patch version.
The output from the command is similar to the following:
Waiting for operation [projects/example-project-12345/locations/us-west1/operations/operation-1679543737105-5f7893fd5bae9-942b3f97-75e59179] to complete.
In the example output, the string operation-1679543737105-5f7893fd5bae9-942b3f97-75e59179 is the OPERATION_ID of the long-running operation. You can find out the status of the operation by running the following command in another terminal window:

  gcloud container vmware operations describe OPERATION_ID \
    --project=PROJECT_ID \
    --location=REGION
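If you script the upgrade, you can extract the operation ID from the "Waiting for operation" line instead of copying it by hand. The following sketch parses the sample line shown above; in practice you would capture the upgrade command's output instead of inlining it.

```shell
# Extract the operation ID from the "Waiting for operation [...]" line
# printed by `gcloud container vmware clusters upgrade`. The sample line
# is the example output above; in a script, capture the real command's
# output instead.
sample_line='Waiting for operation [projects/example-project-12345/locations/us-west1/operations/operation-1679543737105-5f7893fd5bae9-942b3f97-75e59179] to complete.'

# The operation ID is the last path segment inside the brackets.
operation_id=$(printf '%s\n' "$sample_line" \
  | sed -n 's|.*/operations/\([^]]*\)\].*|\1|p')
echo "$operation_id"   # prints operation-1679543737105-5f7893fd5bae9-942b3f97-75e59179
```

You can then pass the extracted ID straight to gcloud container vmware operations describe.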
Upgrade node pools
If you chose to upgrade only the user cluster's control plane, do the following steps to upgrade the node pools after the user cluster's control plane has been upgraded:
- Get a list of node pools on the user cluster:

  gcloud container vmware node-pools list --cluster=USER_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION

- For each node pool that you want to upgrade, run the following command:

  gcloud container vmware node-pools update NODE_POOL_NAME \
    --cluster=USER_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION \
    --version=VERSION
Terraform
- Update the Google Cloud CLI components:

  gcloud components update

- If you haven't already, enroll the admin cluster in the GKE On-Prem API. After the cluster is enrolled in the GKE On-Prem API, you don't need to do this step again.
- Get a list of available versions to upgrade to:

  gcloud container vmware clusters query-version-config \
    --cluster=USER_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION

  Replace the following:

  - USER_CLUSTER_NAME: The name of the user cluster.
  - PROJECT_ID: The ID of the fleet project in which the user cluster is a member. This is the project that you specified when the cluster was created. If you created the cluster using gkectl, this is the project ID in the gkeConnect.projectID field in the cluster configuration file.
  - REGION: The Google Cloud region in which the GKE On-Prem API runs and stores its metadata. In the main.tf file that you used to create the user cluster, the region is in the location field of the cluster resource.

  The output of the command is similar to the following:

  versions:
  - version: 1.16.3-gke.45
  - version: 1.16.2-gke.28
  - version: 1.16.1-gke.45
  - version: 1.16.0-gke.669
  - version: 1.15.6-gke.25
  - version: 1.15.5-gke.41
  An Anthos version must be made available on the admin cluster ahead of the user
  cluster creation or upgrade. Versions annotated with isInstalled=true are
  installed on the admin cluster for the purpose of user cluster creation or
  upgrade whereas other version are released and will be available for upgrade
  once dependencies are resolved.
  To install the version in the admin cluster, run:
  $ gcloud container vmware admin-clusters update my-admin-cluster --required-platform-version=VERSION
- Download the new version of the components and deploy them in the admin cluster:

  gcloud container vmware admin-clusters update ADMIN_CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=REGION \
    --required-platform-version=VERSION

  This command downloads the version of the components that you specify in --required-platform-version to the admin cluster, and then deploys the components. These components let the admin cluster manage user clusters at that version.
In the
main.tf
file that you used to create the user cluster, changeon_prem_version
in the cluster resource to the new version. -
Ubuntu and COS node pools only: If you want to upgrade only the user cluster's control plane and leave all the node pools at the current version, add the following to the cluster resource:
upgrade_policy { control_plane_only = true }
- Initialize and create the Terraform plan:

  terraform init

  Terraform installs any needed libraries, such as the Google Cloud provider.

- Review the configuration and make changes if needed:

  terraform plan

- Apply the Terraform plan to upgrade the user cluster:

  terraform apply
Upgrade node pools
If you chose to upgrade only the user cluster's control plane, do the following steps to upgrade additional node pools after the user cluster's control plane has been upgraded:
- In main.tf, in the resource for each node pool that you want to upgrade, add the following:

  on_prem_version = "VERSION"

  For example:

  resource "google_gkeonprem_vmware_node_pool" "nodepool-basic" {
    name = "my-nodepool"
    location = "us-west1"
    vmware_cluster = google_gkeonprem_vmware_cluster.default-basic.name
    config {
      replicas = 3
      image_type = "ubuntu_containerd"
      enable_load_balancer = true
    }
    on_prem_version = "1.16.0-gke.0"
  }
- Initialize and create the Terraform plan:

  terraform init

- Review the configuration and make changes if needed:

  terraform plan

- Apply the Terraform plan to upgrade the user cluster:

  terraform apply
Upgrade the admin cluster
Before you begin:
- Determine if your certificates are up to date, and renew them if necessary.

- If you are upgrading to version 1.13 or higher, you must first register the admin cluster by filling out the gkeConnect section in the admin cluster configuration file. Run the update cluster command with the configuration file changes.
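For reference, a filled-in gkeConnect section looks roughly like the following sketch. The project ID and key path are placeholders, and the field names assume the standard admin cluster configuration schema; check the generated admin-cluster.yaml on your admin workstation for the exact fields.

```yaml
# Sketch of the gkeConnect section in the admin cluster configuration
# file (default name admin-cluster.yaml). All values are placeholders.
gkeConnect:
  projectID: "my-fleet-host-project"            # fleet host project ID
  registerServiceAccountKeyPath: "my-key.json"  # connect-register service account key
```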
Do the steps in this section on your new admin workstation. Make sure your gkectl and clusters are the appropriate version for an upgrade, and that you have downloaded the appropriate bundle.
- Make sure the bundlePath field in the admin cluster configuration file matches the path of the bundle to which you want to upgrade.

  If you make any other changes to the fields in the admin cluster configuration file, these changes are ignored during the upgrade. To make those changes take effect, you must first upgrade the cluster, and then run an update cluster command with the configuration file changes.
- Run the following command:

  gkectl upgrade admin \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --config ADMIN_CLUSTER_CONFIG_FILE \
    FLAGS

  Replace the following:

  - ADMIN_CLUSTER_KUBECONFIG: the admin cluster's kubeconfig file.
  - ADMIN_CLUSTER_CONFIG_FILE: the Google Distributed Cloud admin cluster configuration file on your new admin workstation.
  - FLAGS: an optional set of flags. For example, you could include the --skip-validation-infra flag to skip checking of your vSphere infrastructure.
- If you are upgrading to version 1.14.0 or higher, a new kubeconfig file is generated for the admin cluster that overwrites any existing file. To view cluster details in the file, run the following command:

  kubectl config view --kubeconfig ADMIN_CLUSTER_KUBECONFIG