This page explains how to enable permissive mode on a backup plan.
During backup execution, if Backup for GKE detects conditions that are likely to cause restore to fail, the backup itself fails. The reason for the failure is provided in the backup's state_reason field. In the Google Cloud console, this field is labeled Status reason.
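You can also read this field from the command line with the backups describe command. The following is a minimal sketch; the stateReason output field name is assumed from the Backup for GKE API, and BACKUP, BACKUP_PLAN, PROJECT_ID, and LOCATION are placeholders for your own values:

    # Print the state and failure reason of a backup (field names assumed
    # from the Backup for GKE API; adjust if your gcloud version differs)
    gcloud beta container backup-restore backups describe BACKUP \
        --backup-plan=BACKUP_PLAN \
        --project=PROJECT_ID \
        --location=LOCATION \
        --format="value(state,stateReason)"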
About permissive mode
When backup failures aren't acceptable and it's not possible to address the underlying issues, you can enable permissive mode. Permissive mode ensures that backups complete successfully, even if GKE resources that could potentially cause restore failures are detected during the backup process. Details about the issues are provided in the backup's Status reason field.
We recommend using this option only if you understand the issues and can implement workarounds during the restoration process. For a list of potential error messages in the backup's Status reason field with recommended actions, see Troubleshoot backup failures.
Enable permissive mode
Use the following instructions to enable permissive mode:
gcloud
To enable permissive mode, run the gcloud beta container backup-restore backup-plans update command:

    gcloud beta container backup-restore backup-plans update BACKUP_PLAN \
        --project=PROJECT_ID \
        --location=LOCATION \
        --permissive-mode
Replace the following:

- BACKUP_PLAN: the name of the backup plan that you want to update.
- PROJECT_ID: the ID of your Google Cloud project.
- LOCATION: the compute region for the resource, for example us-central1. See About resource locations.

For a full list of options, refer to the gcloud beta container backup-restore backup-plans update documentation.
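To confirm that the update took effect, you can describe the plan and check its backup configuration. A sketch; the backupConfig.permissiveMode field name is assumed from the Backup for GKE API:

    # Verify that permissive mode is now enabled on the plan
    gcloud beta container backup-restore backup-plans describe BACKUP_PLAN \
        --project=PROJECT_ID \
        --location=LOCATION \
        --format="value(backupConfig.permissiveMode)"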
Console
Use the following instructions to enable permissive mode in the Google Cloud console:
- In the Google Cloud console, go to the Google Kubernetes Engine page.
- In the navigation menu, click Backup for GKE.
- Click the Backup plans tab.
- Expand the cluster and click the plan name.
- Click the Details tab to edit the plan details.
- Click Edit to edit the section with Backup mode.
- Click the Permissive mode checkbox and click Save changes.
Terraform
Update the existing google_gke_backup_backup_plan resource:

    resource "google_gke_backup_backup_plan" "NAME" {
      ...
      backup_config {
        permissive_mode = true
        ...
      }
    }
Replace the following:

- NAME: the name of the google_gke_backup_backup_plan resource that you want to update.

For more information, see gke_backup_backup_plan.
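For context, a complete resource might look like the following minimal sketch. The resource name, cluster reference, location, and the other backup_config fields shown here are illustrative assumptions, not required values:

    # Illustrative backup plan with permissive mode enabled. All names and
    # values other than permissive_mode are placeholders.
    resource "google_gke_backup_backup_plan" "example" {
      name     = "my-backup-plan"
      cluster  = "projects/PROJECT_ID/locations/LOCATION/clusters/CLUSTER"
      location = "us-central1"

      backup_config {
        include_volume_data = true
        include_secrets     = true
        all_namespaces      = true
        permissive_mode     = true
      }
    }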
Troubleshoot backup failures
The following table provides explanations and recommended actions for the backup failure messages displayed in the backup's Status reason field.
CustomResourceDefinitions "..." have invalid schemas

Reason: The CRD was created with apiextensions.k8s.io/v1beta1 and lacks a structural schema required in apiextensions.k8s.io/v1. Backup for GKE cannot automatically define the structural schema. Restoring the CRD in Kubernetes v1.22+ clusters, where apiextensions.k8s.io/v1beta1 is not available, causes the restore to fail. This failure happens when restoring custom resources defined by the CRD.

Recommended actions:

- If you manage the CRD, follow the steps in the Kubernetes documentation to specify a structural schema for your CRD.
- If it's a GKE-managed CRD, you can run kubectl delete crd if there are no existing resources served by the CRD. If there are existing resources served by the CRD, you can enable permissive mode with an understanding of the restore behavior. For recommendations on common CRDs, see the documentation.
- If it's a third-party CRD, consult the relevant documentation to migrate to apiextensions.k8s.io/v1.
When permissive mode is enabled, the CRD without a structural schema won't be backed up in a Kubernetes v1.22+ cluster. To successfully restore such a backup, you need to exclude the resources served by the CRD from restore or create the CRD in the target cluster before starting the restore.
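To find affected CRDs before running a backup, you can look for the NonStructuralSchema condition that the Kubernetes API server sets on such CRDs. A sketch, assuming jq is installed:

    # List CRDs flagged by the API server as lacking a structural schema
    kubectl get crds -o json | jq -r \
      '.items[]
       | select(.status.conditions[]?
                | select(.type == "NonStructuralSchema" and .status == "True"))
       | .metadata.name'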
PersistentVolumeClaims "..." are bound to PersistentVolumes of unsupported types "..." and cannot be backed up

Reason: Backup for GKE only supports backing up Persistent Disk volume data. Non-Persistent Disk PVCs restored using the Provision new volumes and restore volume data from backup policy will not have any volume data restored. However, the Reuse existing volumes containing your data policy allows PVCs to be reconnected to the original volume handle. This is useful for volume types that are backed by an external server, like NFS.
When permissive mode is enabled, the PVC configuration is backed up, but the volume data is not.
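To see which volumes in the cluster are not backed by Persistent Disk, you can list each PersistentVolume's CSI driver. A sketch; the column names are arbitrary:

    # Persistent Disk volumes use the pd.csi.storage.gke.io CSI driver;
    # other drivers, or an empty value for in-tree volume types, indicate
    # volumes whose data Backup for GKE can't back up
    kubectl get pv -o custom-columns='NAME:.metadata.name,CSI_DRIVER:.spec.csi.driver,CLAIM:.spec.claimRef.name'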
PersistentVolumeClaims "..." are not bound to PersistentVolumes and cannot be backed up
Reason: Backup for GKE can back up the PVC, but there is no volume data to back up. This situation might indicate a misconfiguration or a mismatch between requested and available storage.
When permissive mode is enabled, the PVC configuration is backed up, but there is no volume data to be backed up.
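You can list unbound claims across all namespaces to investigate. A sketch, assuming jq is installed:

    # Show PVCs whose phase is not Bound, with their current phase
    kubectl get pvc --all-namespaces -o json | jq -r \
      '.items[]
       | select(.status.phase != "Bound")
       | "\(.metadata.namespace)/\(.metadata.name): \(.status.phase)"'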
Failed to query API resources ...
Reason: Backup for GKE is unable to back up any resources served by the unavailable API.
Check the aggregated API server referenced in the APIService's spec.service field to make sure it is ready.

When permissive mode is enabled, resources from the API groups that failed to load won't be backed up.
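To identify the unavailable API, you can inspect the cluster's APIService objects, which report availability for aggregated APIs. The APIService name below is only a hypothetical example:

    # List APIServices and their availability; False entries point to the
    # API group that failed to load
    kubectl get apiservices
    # Inspect the backing service of an unavailable aggregated API
    # (v1beta1.metrics.k8s.io is only an example name)
    kubectl get apiservice v1beta1.metrics.k8s.io -o jsonpath='{.spec.service}{"\n"}'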
Secret ... is an auto-generated token from ServiceAccount ... referenced in Pod specs
Reason: If Backup for GKE attempts to restore a service account along with its auto-generated secret and a Pod that mounts the secret volume, the restore appears to be successful. However, Kubernetes removes the secret, which causes the Pod to get stuck in container creation and fail to start.
Remove the volume that mounts the secret from the Pod spec and instead reference the service account through the spec.serviceAccountName field in the Pod. This action ensures that the token is automatically mounted on /var/run/secrets/kubernetes.io/serviceaccount in the containers. For more information, refer to the Configure Service Accounts for Pods documentation.

When permissive mode is enabled, the secret is backed up but can't be mounted in Pods in Kubernetes v1.24+ clusters.
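As an illustration of the recommended fix, the following hypothetical Pod relies on spec.serviceAccountName instead of mounting the auto-generated token secret explicitly; all names are placeholders:

    # Hypothetical Pod: the token is mounted automatically because the
    # service account is referenced by name, so no secret volume is needed
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: sa-token-example
    spec:
      serviceAccountName: my-service-account
      containers:
      - name: app
        image: nginx
    EOF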
Common Custom Resource Definitions (CRDs) with issues and recommended actions
Here are some common CRDs that have backup issues and the actions we recommend to address the issues:
- capacityrequests.internal.autoscaling.k8s.io: This CRD was used temporarily in v1.21 clusters. Run kubectl delete crd capacityrequests.internal.autoscaling.k8s.io to remove the CRD.
- scalingpolicies.scalingpolicy.kope.io: This CRD was used to control fluentd resources, but GKE has migrated to fluentbit. Run kubectl delete crd scalingpolicies.scalingpolicy.kope.io to remove the CRD.
- memberships.hub.gke.io: Run kubectl delete crd memberships.hub.gke.io to remove the CRD if there are no membership resources. Enable permissive mode if there are membership resources. A quick check is shown after this list.
- applications.app.k8s.io: Enable permissive mode with an understanding of the restore behavior.
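For the memberships.hub.gke.io case, you can check for existing Membership resources before deleting the CRD. A minimal sketch:

    # If this returns "No resources found", there are no Membership resources
    kubectl get memberships.hub.gke.io

    # Safe only when no Membership resources exist; otherwise enable
    # permissive mode instead
    kubectl delete crd memberships.hub.gke.io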