Version 1.9. This version is no longer supported. For more information see the version support policy .

Known issues

This document describes known issues for version 1.9 of Google Distributed Cloud.

/var/log/audit/ filling up disk space

Identified Versions

1.8.0+, 1.9.0+, 1.10.0+, 1.11.0+, 1.12.0+, 1.13.0+

Symptoms

/var/log/audit/ is filled with audit logs. You can check the disk usage by running sudo du -h -d 1 /var/log/audit .

Cause

Since Anthos v1.8, the Ubuntu image is hardened with CIS Level2 Benchmark. And one of the compliance rules, 4.1.2.2 Ensure audit logs are not automatically deleted , ensures the auditd setting max_log_file_action = keep_logs . This results in all the audit rules kept on the disk.

Workaround

Admin workstation

For the admin workstation, you can manually change the auditd settings to rotate the logs automatically, and then restart the auditd service:

 sed -i 's/max_log_file_action = keep_logs/max_log_file_action = rotate/g' /etc/audit/auditd.conf
sed -i 's/num_logs = .*/num_logs = 250/g' /etc/audit/auditd.conf
systemctl restart auditd

The above setting would make auditd automatically rotate its logs once it has generated more than 250 files (each with 8M size).

Cluster nodes

For cluster nodes, apply the following DaemonSet to your cluster to prevent potential issues:

  apiVersion 
 : 
  
 apps 
 / 
 v1 
 kind 
 : 
  
 DaemonSet 
 metadata 
 : 
  
 name 
 : 
  
 change 
 - 
 auditd 
 - 
 log 
 - 
 action 
  
 namespace 
 : 
  
 kube 
 - 
 system 
 spec 
 : 
  
 selector 
 : 
  
 matchLabels 
 : 
  
 app 
 : 
  
 change 
 - 
 auditd 
 - 
 log 
 - 
 action 
  
 template 
 : 
  
 metadata 
 : 
  
 labels 
 : 
  
 app 
 : 
  
 change 
 - 
 auditd 
 - 
 log 
 - 
 action 
  
 spec 
 : 
  
 hostIPC 
 : 
  
 true 
  
 hostPID 
 : 
  
 true 
  
 containers 
 : 
  
 - 
  
 name 
 : 
  
 update 
 - 
 audit 
 - 
 rule 
  
 image 
 : 
  
 ubuntu 
  
 command 
 : 
  
 [ 
 "chroot" 
 , 
  
 "/host" 
 , 
  
 "bash" 
 , 
  
 "-c" 
 ] 
  
 args 
 : 
  
 - 
  
 | 
  
 while 
  
 true 
 ; 
  
 do 
  
 if 
  
 $ 
 ( 
 grep 
  
 - 
 q 
  
 "max_log_file_action = keep_logs" 
  
 /etc/audit/ 
 auditd 
 . 
 conf 
 ); 
  
 then 
  
 echo 
  
 "updating auditd max_log_file_action to rotate with a max of 250 files" 
  
 sed 
  
 - 
 i 
  
 's/max_log_file_action = keep_logs/max_log_file_action = rotate/g' 
  
 /etc/audit/ 
 auditd 
 . 
 conf 
  
 sed 
  
 - 
 i 
  
 's/num_logs = .*/num_logs = 250/g' 
  
 /etc/audit/ 
 auditd 
 . 
 conf 
  
 echo 
  
 "restarting auditd" 
  
 systemctl 
  
 restart 
  
 auditd 
  
 else 
  
 echo 
  
 "auditd setting is expected, skip update" 
  
 fi 
  
 sleep 
  
 600 
  
 done 
  
 volumeMounts 
 : 
  
 - 
  
 name 
 : 
  
 host 
  
 mountPath 
 : 
  
 / 
 host 
  
 securityContext 
 : 
  
 privileged 
 : 
  
 true 
  
 volumes 
 : 
  
 - 
  
 name 
 : 
  
 host 
  
 hostPath 
 : 
  
 path 
 : 
  
 /

Note that making this auditd config change would violate CIS Level2 rule 4.1.2.2 Ensure audit logs are not automatically deleted .

systemd-timesyncd not running after reboot on Ubuntu Node

Identified Versions

1.7.1-1.7.5, 1.8.0-1.8.4, 1.9.0+

Symptoms

systemctl status systemd-timesyncd should show that the service is dead:

  ● 
  
 systemd 
 - 
 timesyncd 
 . 
 service 
  
 - 
  
 Network 
  
 Time 
  
 Synchronization 
 Loaded 
 : 
  
 loaded 
  
 ( 
 / 
 lib 
 / 
 systemd 
 / 
 system 
 / 
 systemd 
 - 
 timesyncd 
 . 
 service 
 ; 
  
 enabled 
 ; 
  
 vendor 
  
 preset 
 : 
  
 enabled 
 ) 
 Active 
 : 
  
 inactive 
  
 ( 
 dead 
 )

This could cause time out of sync issues.

Cause

chrony was incorrectly installed on Ubuntu OS image, and there's conflict between chrony and systemd-timesyncd , where systemd-timesyncd would become inactive and chrony become active everytime Ubuntu VM got rebooted. However, systemd-timesyncd should be the default ntp client for the VM.

Workaround

Option 1: Manually run restart systemd-timesyncd every time when VM got rebooted.

Option 2: Deploy the following Daemonset so that systemd-timesyncd will always be restarted if it's dead.

  apiVersion 
 : 
  
 apps/v1 
 kind 
 : 
  
 DaemonSet 
 metadata 
 : 
  
 name 
 : 
  
 ensure-systemd-timesyncd 
 spec 
 : 
  
 selector 
 : 
  
 matchLabels 
 : 
  
 name 
 : 
  
 ensure-systemd-timesyncd 
  
 template 
 : 
  
 metadata 
 : 
  
 labels 
 : 
  
 name 
 : 
  
 ensure-systemd-timesyncd 
  
 spec 
 : 
  
 hostIPC 
 : 
  
 true 
  
 hostPID 
 : 
  
 true 
  
 containers 
 : 
  
 - 
  
 name 
 : 
  
 ensure-systemd-timesyncd 
  
 # Use your preferred image. 
  
 image 
 : 
  
 ubuntu 
  
 command 
 : 
  
 - 
  
 /bin/bash 
  
 - 
  
 -c 
  
 - 
  
 | 
  
 while true; do 
  
 echo $(date -u) 
  
 echo "Checking systemd-timesyncd status..." 
  
 chroot /host systemctl status systemd-timesyncd 
  
 if (( $? != 0 )) ; then 
  
 echo "Restarting systemd-timesyncd..." 
  
 chroot /host systemctl start systemd-timesyncd 
  
 else 
  
 echo "systemd-timesyncd is running." 
  
 fi; 
  
 sleep 60 
  
 done 
  
 volumeMounts 
 : 
  
 - 
  
 name 
 : 
  
 host 
  
 mountPath 
 : 
  
 /host 
  
 resources 
 : 
  
 requests 
 : 
  
 memory 
 : 
  
 "10Mi" 
  
 cpu 
 : 
  
 "10m" 
  
 securityContext 
 : 
  
 privileged 
 : 
  
 true 
  
 volumes 
 : 
  
 - 
  
 name 
 : 
  
 host 
  
 hostPath 
 : 
  
 path 
 : 
  
 /

ClientConfig custom resource

gkectl update reverts any manual changes that you have made to the ClientConfig custom resource. We strongly recommend that you back up the ClientConfig resource after every manual change.

gkectl check-config validation fails: can't find F5 BIG-IP partitions

Symptoms: Validation fails because F5 BIG-IP partitions can't be found, even though they exist.
Potential causes: An issue with the F5 BIG-IP API can cause validation to fail.
Resolution: Try running gkectl check-config again.

Disruption for workloads with PodDisruptionBudgets

Upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).

Nodes fail to complete their upgrade process

If you have PodDisruptionBudget objects configured that are unable to allow any additional disruptions, node upgrades might fail to upgrade to the control plane version after repeated attempts. To prevent this failure, we recommend that you scale up the Deployment or HorizontalPodAutoscaler to allow the node to drain while still respecting the PodDisruptionBudget configuration.

To see all PodDisruptionBudget objects that do not allow any disruptions:

 kubectl get poddisruptionbudget --all-namespaces -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.name}/{.metadata.namespace}{"\n"}{end}'

User cluster installation failed because of cert-manager/ca-injector's leader election issue in Anthos 1.9.0

You might see an installation failure due to cert-manager-cainjector in crashloop, when the apiserver/etcd is slow. The following command,

kubectl logs --kubeconfig USER_CLUSTER_KUBECONFIG 
-n kube-system deployments/cert-manager-cainjector

Known issues Stay organized with collections Save and categorize content based on your preferences.

/var/log/audit/ filling up disk space

Category

Identified Versions

Symptoms

Cause

Workaround

Admin workstation

Cluster nodes

systemd-timesyncd not running after reboot on Ubuntu Node

Category

Identified Versions

Symptoms

Cause

Workaround

ClientConfig custom resource

gkectl check-config validation fails: can't find F5 BIG-IP partitions

Disruption for workloads with PodDisruptionBudgets

Nodes fail to complete their upgrade process

User cluster installation failed because of cert-manager/ca-injector's leader election issue in Anthos 1.9.0

Renewal of certificates might be required before an admin cluster upgrade

Admin cluster certificate renewal process

Restarting or upgrading vCenter for versions lower than 7.0U2

SSH connection closed by remote host

Conflict with cert-manager when upgrading to version 1.9.0 or 1.9.1

Avoid conflicts during upgrade

Restore your own cert-manager in user clusters

Restore your own cert-manager in admin clusters

Conflict with cert-manager when upgrading to version 1.9.2 or above

Avoid conflicts during upgrade

Restore your own cert-manager in user clusters

Restore your own cert-manager in admin clusters

False positives in docker, containerd, and runc vulnerability scanning

Unhealthy konnectivity server Pods when using the Seesaw or manual mode load balancer

/etc/cron.daily/aide CPU and memory spike issue

Load balancers and NSX-T stateful distributed firewall rules interact unpredictably

Failure to register admin cluster during creation

Using Anthos Identity Service can cause the Connect Agent to restart unpredictably

High network traffic to monitoring.googleapis.com

Missing metrics on some nodes

Cisco ACI doesn't work with Direct Server Return (DSR)

gkectl diagnose checking certificates failure

Known issues

Conflict with `cert-manager` when upgrading to version 1.9.0 or 1.9.1

Conflict with `cert-manager` when upgrading to version 1.9.2 or above

`/etc/cron.daily/aide` CPU and memory spike issue