Hotfix 14
Backup:
- Fixes an issue where an imported backup without the `createTime` field couldn't be deleted.
Lifecycle management:
- Fixes an issue where subcomponents can fail to upgrade or reconcile.
- Adds a metric for tracking subcomponents in a non-error state.
- Updates SLOs for lifecycle management.
- Rewrites the OCLCM error runbooks and corresponding service manuals for accuracy.
- Fixes the `OCLCM-A0101` alert that fires incorrectly during monitoring setup.
Hotfix 12
Cloud DNS:
- Fixes the issue where wildcard DNS records aren't created at the zone apex.
- Fixes the issue where it takes over 25 minutes to create a managed DNS zone.
Hotfix 11
Hotfix 10
Networking:
- Fixes the Cloud NAT reconciler to be more resilient to node health fluctuations.
- Fixes the issue where a project incorrectly keeps the `networking.gdc.goog/egress-nat-ip` annotation even when egress NAT is disabled.
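To illustrate the annotation fix above, here is a minimal sketch of the stale state before the hotfix. The project name, IP value, and `apiVersion` are placeholders; only the annotation key comes from the release note, and where the annotation actually sits may vary by deployment:

```yaml
# Hypothetical project metadata before the fix: the annotation
# lingers even though egress NAT has been disabled.
apiVersion: resourcemanager.gdc.goog/v1   # assumed API group, for illustration
kind: Project
metadata:
  name: example-project                   # placeholder name
  annotations:
    networking.gdc.goog/egress-nat-ip: "203.0.113.10"  # stale after disabling egress NAT
# After this hotfix, disabling egress NAT removes the annotation.
```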
Hotfix 9
Networking:
- Fixes the `managed-dns-project-admin` role to exclude permissions that aren't required to manage DNS custom resources.
Hotfix 8
Networking:
- Fixes a permissions issue during the bootstrap request.
Hotfix 7
Networking:
- Fixes an issue with the `enable-idps` annotation on organizations during upgrades between versions 1.14.x and 1.15.x.
- Fixes unhealthy management aggregation switches due to incorrect port configurations.
Hotfix 6
Networking:
- Namespace termination prevents Managed Harbor Service project network policies from being ready, which blocks new Harbor instances.
- An incorrect switch model configuration causes unhealthy management aggregation switches.
Hotfix 5
Database service:
- Unable to update or delete a database cluster that has the `FailoverInProgress` status.
Monitoring:
- The Managed Harbor Service monitoring target takes a long time to load.
Hotfix 4
GDC console:
- Added performance improvements for the GDC console.
Object storage:
- Addressed the 50,000-metric cardinality limit by removing events other than the total, which keeps metrics flowing.
- Decreased range read latency by removing unnecessary calls to StorageGRID and KMS.
Hotfix 3
Cloud DNS:
- Addressed DNS issues related to managed DNS splits, incorrect handling of PKI certificates, and the inability to delete legacy Cloud DNS records.
- Updated DNS SLOs so they rely on actual data across both org infrastructure and root admin clusters.
- Increased the alert firing time for legacy DNS alerts to reduce alert frequency.
- Fixed TLS scrape issues for DNS metrics.
- Added the Istio service and FRE watcher for the global `DNSRegistration` resource.
- Addressed resource addition problems for IAM roles related to the infrastructure operator group.
IAM:
- Increased memory and replicas for AIS, and added more granular operable parameters to adjust them per container.
- During Grafana API queries, a 302 redirect occurs, causing usability issues.
Virtual machines:
- GPU resources on a node can't be repopulated after a kubelet restart, resulting in GPU VMs getting stuck in the `Scheduling` state.
Hotfix 2
Cluster:
- There's an NPC upscaling issue within a cluster when using an OS version annotation. This applies to both bare metal and VM nodes.
Database service:
- There are conflicts during specification updates related to the health check process for highly available PostgreSQL databases.
Google Distributed Cloud for bare metal:
- There are security issues with containerd related to CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881.
- A cluster can become unresponsive during deletion, which is triggered by a busy perimeter when the cluster is reconciling.
- There's a recovery issue with the remote cluster watch.
- A cluster cache client can become unresponsive with GET and LIST operations.
- Node deletion causes data leakage during cluster deprovisioning.
- A race condition occurs with a config map and secrets referenced in API server arguments.
Monitoring:
- There are issues with exposing operable parameters for the tuning stack.
- There are suboptimal CPU and memory limits related to the Cortex query stack and cache.
- Added safeguards for cardinality and label counts.
- Added monitoring diagnostic dashboards.
- Added additional monitoring SLOs.
- There are issues with the `AuditLogInventory` resource.
Object storage:
- There's a missing runbook for a `BucketLocationConfig` permission issue.
- The default replica value for the syslog-server is too low.
- Added S3 permissions to fetch various bucket metadata attributes.
- Added a claim-by-force annotation to the `obj-system/allow-obs-system-ingress-traffic` NetworkPolicy resource.
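The NetworkPolicy change above can be pictured as follows. Only the namespace/name pair comes from the release note; the annotation key and the policy spec shown are illustrative placeholders, since the note doesn't spell out the exact claim-by-force key:

```yaml
# Hypothetical sketch of the patched NetworkPolicy resource.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-obs-system-ingress-traffic
  namespace: obj-system
  annotations:
    claim-by-force: "true"   # placeholder key for the claim-by-force annotation
spec:
  podSelector: {}            # actual policy spec isn't described in the note
  policyTypes:
  - Ingress
```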
Operations Suite Infrastructure (OI):
- Added concurrency and retry logic to the knowledge base sync.
Operating system:
- There are unexpected preflight job deletions during the OS controller restart.
- Stale inventory machines and `OSPolicy` resource deletion are not handled correctly.
- Added metrics, dashboards, and events for OS policy controllers.
- The `OSPolicyReconciler` event can degrade over time.
- The preflight job deadline and retry aren't configurable and don't persist.
- A node target policy race condition occurs that reduces the API load.
- Added a reason for `OSPolicy` job creation events.
- Modified predefined roles for OS monitoring and debugging.
Resource Manager:
- Unable to create an organization when using a hotpatch version.
Ticketing system:
- Inaccurate firewall signature update steps in the runbook SECOps-P0024.
Upgrade:
- Unable to complete a policy-based Google Distributed Cloud for bare metal upgrade when using a hotpatch version.
Vertex AI:
- Enabled any-to-any language translation. Previously, languages could only be translated to English or German.
Virtual machines:
- The VM metadata server certificate isn't removed during VM deletion.
- There is an issue when cloning a VM disk.
- Added extra validation during a VM image import to check whether the import process succeeds.
- VM external access showed the egress IP address for Cloud NAT-enabled projects.
Hotfix 1
Backup and restore:
- When a backup job fails due to a NetApp ONTAP caching issue, all subsequent backup jobs fail.
Database service:
- An instance of a highly available PostgreSQL database cluster could be incorrectly deleted.
- Specification updates for highly available PostgreSQL databases cause conflicts.
Identity and access management:
- There are conflicting IAM and DNS role names.
Vertex AI:
- Long-running Vertex AI Translation operation APIs return a 404 error.
- When deploying an Online Prediction model, there's a missing Vertex AI Prediction Developer role.

