Version 1.8. This version is no longer supported. For more information, see the version support policy.

Known issues

This document describes known issues for version 1.8 of Google Distributed Cloud.

/var/log/audit/ filling up disk space

Identified Versions

1.8.0+, 1.9.0+, 1.10.0+, 1.11.0+, 1.12.0+, 1.13.0+

Symptoms

/var/log/audit/ is filled with audit logs. You can check the disk usage by running `sudo du -h -d 1 /var/log/audit`.

Cause

Since Anthos 1.8, the Ubuntu image is hardened with the CIS Level 2 Benchmark. One of its compliance rules, 4.1.2.2 "Ensure audit logs are not automatically deleted", requires the auditd setting max_log_file_action = keep_logs. As a result, all audit logs are kept on the disk.

Workaround

Admin workstation

For the admin workstation, you can manually change the auditd settings to rotate the logs automatically, and then restart the auditd service:

```shell
sudo sed -i 's/max_log_file_action = keep_logs/max_log_file_action = rotate/g' /etc/audit/auditd.conf
sudo sed -i 's/num_logs = .*/num_logs = 250/g' /etc/audit/auditd.conf
sudo systemctl restart auditd
```

With these settings, auditd rotates its logs once it has generated more than 250 files (each 8 MB in size), so the oldest logs are discarded instead of accumulating on the disk.
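To try the change safely, the same two sed edits can be wrapped in a guard that only rewrites the file while keep_logs is still set (the same check the DaemonSet below performs), together with a rough estimate of the resulting disk cap. This is a minimal sketch: rotate_auditd_conf and auditd_disk_cap_mb are hypothetical helper names, and the estimate assumes max_log_file is set explicitly in the file and is measured in megabytes, which is auditd's unit for that setting.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; pass a copy of /etc/audit/auditd.conf to try them without root.

# Switch auditd from keep_logs to rotate with 250 files, only if the change is still needed.
rotate_auditd_conf() {
  local conf="$1"
  if grep -q "max_log_file_action = keep_logs" "$conf"; then
    sed -i 's/max_log_file_action = keep_logs/max_log_file_action = rotate/g' "$conf"
    sed -i 's/num_logs = .*/num_logs = 250/g' "$conf"
    echo "updated"
  else
    echo "unchanged"
  fi
}

# Approximate upper bound on audit log disk usage in MB: num_logs * max_log_file.
auditd_disk_cap_mb() {
  local conf="$1" num size
  num=$(awk -F' *= *' '$1 == "num_logs" {print $2}' "$conf")
  size=$(awk -F' *= *' '$1 == "max_log_file" {print $2}' "$conf")
  echo $(( num * size ))
}
```

With num_logs = 250 and 8 MB files, the estimate is 250 × 8 = 2000 MB, so audit logs stay at roughly 2 GB. To apply the change for real, source the functions in a root shell, run rotate_auditd_conf /etc/audit/auditd.conf, and then restart auditd.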

Cluster nodes

For cluster nodes, apply the following DaemonSet to your cluster to prevent potential issues:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: change-auditd-log-action
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: change-auditd-log-action
  template:
    metadata:
      labels:
        app: change-auditd-log-action
    spec:
      hostIPC: true
      hostPID: true
      containers:
      - name: update-audit-rule
        image: ubuntu
        command: ["chroot", "/host", "bash", "-c"]
        args:
        - |
          while true; do
            if grep -q "max_log_file_action = keep_logs" /etc/audit/auditd.conf; then
              echo "updating auditd max_log_file_action to rotate with a max of 250 files"
              sed -i 's/max_log_file_action = keep_logs/max_log_file_action = rotate/g' /etc/audit/auditd.conf
              sed -i 's/num_logs = .*/num_logs = 250/g' /etc/audit/auditd.conf
              echo "restarting auditd"
              systemctl restart auditd
            else
              echo "auditd setting is expected, skip update"
            fi
            sleep 600
          done
        volumeMounts:
        - name: host
          mountPath: /host
        securityContext:
          privileged: true
      volumes:
      - name: host
        hostPath:
          path: /
```

Note that this auditd configuration change violates CIS Level 2 rule 4.1.2.2 "Ensure audit logs are not automatically deleted".

User cluster upgrade/update fails due to 'failed to register user cluster'

Identified Versions

1.7.0+, 1.8.0+

Symptoms

A gkectl command can time out in the following cases:

Upgrading user clusters with GKE connect enabled to 1.8 versions.
Running gkectl update cluster on 1.8 user clusters with GKE connect enabled.
Running gkectl update cluster to enable GKE connect on 1.8 user clusters.

If you then run gkectl diagnose cluster, it reports the following error:

```
$ gkectl diagnose cluster --kubeconfig kubeconfig --cluster-name foo-cluster
…
Unhealthy Resources:
OnPremUserCluster foo-cluster: not ready: ready condition is not true: ClusterCreateOrUpdate: failed to register user cluster "foo-cluster": failed to register cluster: ...
...
```

Note that the functionality of GKE connect should not be affected. In other words, if GKE connect was functional before the command, it should remain functional.

Cause

The Connect Agent version 20210514-00-00 used in 1.8 versions is out of support.

Workaround

Please contact Google support to mitigate the issue.

systemd-timesyncd not running after reboot on Ubuntu Node

Identified Versions

1.7.1-1.7.5, 1.8.0-1.8.4, 1.9.0+

Symptoms

Running systemctl status systemd-timesyncd shows that the service is dead:

```
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
```

This can cause the node's clock to drift out of sync.

Cause

chrony was incorrectly installed on the Ubuntu OS image, and it conflicts with systemd-timesyncd: every time an Ubuntu VM reboots, systemd-timesyncd becomes inactive and chrony becomes active. However, systemd-timesyncd should be the default NTP client for the VM.

Workaround

Option 1: Manually run `sudo systemctl restart systemd-timesyncd` every time the VM reboots.
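Option 1 can be wrapped in a small check so that the service is only restarted when it is not already running. This sketch assumes it is run as root on the affected node; ensure_timesyncd is a hypothetical name, and it relies on systemctl is-active --quiet, which prints nothing and exits with a non-zero status when the unit is not active.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Option 1; run as root on the affected node after a reboot.
ensure_timesyncd() {
  # is-active --quiet exits non-zero unless the unit is currently active.
  if systemctl is-active --quiet systemd-timesyncd; then
    echo "systemd-timesyncd is running."
  else
    echo "Restarting systemd-timesyncd..."
    systemctl restart systemd-timesyncd
  fi
}
```

Calling ensure_timesyncd after each reboot has the same effect as the manual command; Option 2 automates a similar check on every node.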

Option 2: Deploy the following DaemonSet, which checks systemd-timesyncd every 60 seconds and starts it again whenever it's dead.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ensure-systemd-timesyncd
spec:
  selector:
    matchLabels:
      name: ensure-systemd-timesyncd
  template:
    metadata:
      labels:
        name: ensure-systemd-timesyncd
    spec:
      hostIPC: true
      hostPID: true
      containers:
      - name: ensure-systemd-timesyncd
        # Use your preferred image.
        image: ubuntu
        command:
        - /bin/bash
        - -c
        - |
          while true; do
            echo $(date -u)
            echo "Checking systemd-timesyncd status..."
            chroot /host systemctl status systemd-timesyncd
            if (( $? != 0 )) ; then
              echo "Restarting systemd-timesyncd..."
              chroot /host systemctl start systemd-timesyncd
            else
              echo "systemd-timesyncd is running."
            fi;
            sleep 60
          done
        volumeMounts:
        - name: host
          mountPath: /host
        resources:
          requests:
            memory: "10Mi"
            cpu: "10m"
        securityContext:
          privileged: true
      volumes:
      - name: host
        hostPath:
          path: /
```
ClientConfig custom resource

`gkectl update` reverts any manual changes that you have made to the ClientConfig custom resource. We strongly recommend that you back up the ClientConfig resource after every manual change.
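A backup can be as simple as saving the resource as YAML with kubectl after each edit. This sketch assumes the ClientConfig is the resource named default in the kube-public namespace, which is where it is usually stored; check your cluster if yours differs. The helper name backup_clientconfig and the timestamped file naming are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a timestamped YAML copy of the ClientConfig.
# Assumption: the resource is named "default" in the "kube-public" namespace.
backup_clientconfig() {
  local kubeconfig="$1" dest_dir="${2:-.}"
  local out
  out="${dest_dir}/clientconfig-$(date -u +%Y%m%dT%H%M%SZ).yaml"
  if ! kubectl --kubeconfig "$kubeconfig" get clientconfig default \
      -n kube-public -o yaml > "$out"; then
    rm -f "$out"   # do not leave an empty or partial backup behind
    return 1
  fi
  echo "$out"      # print the path of the saved copy
}
```

For example, backup_clientconfig USER_CLUSTER_KUBECONFIG ./backups prints the path of the new file; if gkectl update later reverts your edits, the saved YAML records what you had changed.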
gkectl check-config validation fails: can't find F5 BIG-IP partitions

Symptoms

Validation fails because F5 BIG-IP partitions can't be found, even though they exist.

Potential causes

An issue with the F5 BIG-IP API can cause validation to fail.

Resolution

Try running gkectl check-config again.

Disruption for workloads with PodDisruptionBudgets

Upgrading clusters can cause disruption or downtime for workloads that use [PodDisruptionBudgets](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) (PDBs).
Nodes fail to complete their upgrade process

If you have `PodDisruptionBudget` objects configured that can't allow any additional disruptions, nodes might fail to upgrade to the control plane version even after repeated attempts. To prevent this failure, we recommend that you scale up the `Deployment` or `HorizontalPodAutoscaler` so that the node can drain while still respecting the `PodDisruptionBudget` configuration.
To see all `PodDisruptionBudget` objects that do not allow any disruptions, run:

```shell
kubectl get poddisruptionbudget --all-namespaces -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.name}/{.metadata.namespace}{"\n"}{end}'
```
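The disruptionsAllowed field that this command filters on explains why scaling up helps. The following is a simplified model of how Kubernetes computes it for a PodDisruptionBudget with an integer minAvailable (it ignores maxUnavailable and percentage values); disruptions_allowed is a hypothetical helper, not part of kubectl.

```shell
#!/usr/bin/env bash
# Simplified PDB accounting for an integer minAvailable:
#   disruptionsAllowed = max(0, currentHealthy - minAvailable)
disruptions_allowed() {
  local current_healthy="$1" min_available="$2"
  local allowed=$(( current_healthy - min_available ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  echo "$allowed"
}

# 3 healthy replicas with minAvailable: 3 -> no eviction allowed, so a node drain stalls.
disruptions_allowed 3 3   # prints 0
# Scaling the Deployment to 4 replicas leaves room for one eviction.
disruptions_allowed 4 3   # prints 1
```

This is why scaling up the Deployment, as recommended above, lets the node drain without violating the budget.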

User cluster installation failed because of cert-manager/ca-injector's leader election issue in Anthos 1.8.2 and 1.8.3

You might see an installation failure caused by cert-manager-cainjector crashlooping when the apiserver or etcd is slow. The following command,

```shell
kubectl logs --kubeconfig USER_CLUSTER_KUBECONFIG \
    -n kube-system deployments/cert-manager-cainjector
```
