This page describes best practices to follow when planning and designing very large clusters.
Why plan for large GKE clusters
Every computer system, including Kubernetes, has architectural limits. Exceeding these limits can affect the performance of your cluster or, in some cases, even cause downtime. Follow these best practices and execute the recommended actions to ensure that your clusters run your workloads reliably at scale.
Limitations of large GKE clusters
When GKE scales a cluster to a large number of nodes, GKE adjusts the amount of available resources to match your system needs while staying within its service-level objectives (SLOs). Google Cloud supports large clusters. However, depending on your use case, you must consider the limitations of large clusters to better respond to your infrastructure scale requirements.
This section describes the limitations and considerations when designing large GKE clusters based on the expected number of nodes.
Clusters with up to 5,000 nodes
When designing your cluster architecture to scale up to 5,000 nodes, consider the following conditions:
- Only available for regional clusters.
- Only available for clusters that use Private Service Connect.
- Migrating from zonal to regional clusters requires you to recreate the cluster to unlock the higher node quota level.
If you expect to scale your cluster beyond 5,000 nodes, contact Cloud Customer Care to increase the cluster size and quota limit.
Clusters with more than 5,000 nodes
GKE supports large Standard clusters of up to 15,000 nodes. In version 1.31 and later, GKE supports large clusters of up to 65,000 nodes. The 65,000-node limit is intended for running large-scale AI workloads.
If you expect to scale your cluster to either 15,000 or 65,000 nodes, complete the following tasks:
- Consider the following limitations:
- Cluster autoscaler is not supported. Instead, scale your node pools up or down using the GKE API.
- Multi-network is not supported.
- Services with more than 100 Pods must be headless (see the example after this list).
- Every Pod should run on its own node, with the exception of system DaemonSets. To constrain Pod scheduling to specific nodes, you can use Kubernetes Pod affinity or anti-affinity.
- Migrating from zonal to regional clusters requires you to recreate the cluster to unlock the higher node quota level.
- Migrating to clusters that use Private Service Connect requires you to recreate the cluster to unlock the higher node quota level.
- Contact Cloud Customer Care to increase the cluster size and quota limit to either 15,000 nodes or 65,000 nodes, depending on your scaling needs.
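The headless Service and one-Pod-per-node requirements above can be expressed directly in your manifests. The following is a minimal, hypothetical sketch (the `large-app` names, port, and image are placeholders, not values from this page): a Service with `clusterIP` set to `None`, and a Pod anti-affinity rule that keeps Pods of the same app on separate nodes.

```yaml
# Hypothetical example: a headless Service (no cluster IP allocated) for a
# workload whose Service selects more than 100 Pods.
apiVersion: v1
kind: Service
metadata:
  name: large-app-headless      # placeholder name
spec:
  clusterIP: None               # makes the Service headless
  selector:
    app: large-app
  ports:
  - port: 8080
---
# Hypothetical Pod: anti-affinity prevents two Pods with the same app label
# from being scheduled on the same node, so each Pod gets its own node.
apiVersion: v1
kind: Pod
metadata:
  name: large-app-pod           # placeholder name
  labels:
    app: large-app
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: large-app
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: us-docker.pkg.dev/example/app:latest   # placeholder image
```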
Best practices for splitting workloads between multiple clusters
You can run your workloads on a single, large cluster. This approach is easier to manage, more cost efficient, and provides better resource utilization than multiple clusters. However, in some cases you might need to consider splitting your workload into multiple clusters:
- Review Multi-cluster use cases to learn more about general requirements and scenarios for using multiple clusters.
- In addition, from a scalability point of view, split your cluster when it might exceed one of the limits described in the following section or one of the GKE quotas. Lowering the risk of reaching GKE limits reduces the risk of downtime or other reliability issues.
If you decide to split your cluster, use Fleet management to simplify management of a multi-cluster fleet.
Limits and best practices
To ensure that your architecture supports large-scale GKE clusters, review the following limits and related best practices. Exceeding these limits might degrade cluster performance or cause reliability issues.
These best practices apply to any default Kubernetes cluster with no extensions installed. Extending Kubernetes clusters with webhooks or custom resource definitions (CRDs) is common but can constrain your ability to scale the cluster.
The following table extends the main GKE quotas and limits. You should also familiarize yourself with the open-source Kubernetes limits for large-scale clusters.
The GKE version requirements mentioned in the table apply to both the nodes and the control plane.
You can use the following resources to help monitor your use:
- To view your current usage, go to the Quotas page, which shows a pre-filtered list of GKE quotas.
- Use insights and recommendations to get alerts for clusters at 80%, 90%, and 95% consumption levels.
For more information about how to respond when you approach the limit, see Identify clusters where etcd usage is approaching the limit .
Keep the total size of all objects of each type stored in etcd below 800 MB. This is especially relevant for clusters that use many large Secrets or ConfigMaps, or a high volume of CRDs.
The performance of the iptables rules used by kube-proxy degrades if either of the following is true:
- There are too many Services.
- The number of backends behind a Service is high.
This limit is eliminated when GKE Dataplane V2 is enabled.
Keep the number of Services in the cluster below 10,000.
To learn more, see Exposing applications using Services.
Keep the number of Services per namespace below 5,000.
Kubernetes populates environment variables for each Service into Pods in the same namespace, so a large number of Services per namespace can cause Pods to exceed limits on environment size. You can opt out of having those environment variables populated by setting enableServiceLinks in the PodSpec to false.
To learn more, see Exposing applications using Services.
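As a hedged illustration of the opt-out described above, the following sketch shows a Pod with enableServiceLinks set to false; the Pod name and image are hypothetical placeholders.

```yaml
# Hypothetical Pod that opts out of Service environment variable injection.
apiVersion: v1
kind: Pod
metadata:
  name: no-service-links        # placeholder name
spec:
  enableServiceLinks: false     # do not inject environment variables for Services in the namespace
  containers:
  - name: app
    image: us-docker.pkg.dev/example/app:latest   # placeholder image
```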
Every node runs a kube-proxy that uses watches for monitoring any Service change. The larger a cluster, the more change-related data the agent processes. This is especially visible in clusters with more than 500 nodes.
Information about the endpoints is split between separate EndpointSlices. This split reduces the amount of data transferred on each change.
Endpoints objects are still available for components, but any Endpoints object with more than 1,000 Pods is automatically truncated.
Keep the number of Pods behind a single Service lower than 10,000.
To learn more, see Exposing applications using Services.
GKE Dataplane V2 contains limits on the number of Pods exposed by a single Service.
The same limit is applicable to Autopilot clusters as they use GKE Dataplane V2.
In GKE 1.23 and earlier, keep the number of Pods behind a single Service lower than 1,000.
In GKE 1.24 and later, keep the number of Pods behind a single Service lower than 10,000.
To learn more, see Exposing applications using Services.
Keep the number of DNS records per headless Service below 1,000 for kube-dns, and below 3,500 (IPv4) or 2,000 (IPv6) for Cloud DNS.
Keep the number of all endpoints across all Services below 260,000.
GKE Dataplane V2, which is the default dataplane for GKE Autopilot, relies on eBPF maps that are currently limited to 260,000 endpoints across all Services.
Each Horizontal Pod Autoscaler (HPA) is processed every 15 seconds.
More than 300 HPA objects can cause linear degradation of performance.
Keep the number of HPA objects within this limit; otherwise, the frequency of HPA processing degrades linearly. For example, in GKE 1.22 with 2,000 HPAs, a single HPA is reprocessed only every 1 minute and 40 seconds (2,000 / 300 × 15 seconds = 100 seconds).
To learn more, see autoscaling based on resource utilization and horizontal pod autoscaling scalability.
We recommend that you use worker nodes with at least one vCPU for every 10 Pods.
To learn more, see manually upgrading a cluster or node pool .
Kubernetes has internal limits that affect the rate of creating or deleting Pods (Pod churn) in response to scaling requests. Additional factors, such as deleting a Pod that is part of a Service, can also affect the Pod churn rate.
For clusters with up to 500 nodes, you can expect an average rate of 20 pods created per second and 20 pods deleted per second.
For clusters larger than 500 nodes, you can expect an average rate of 100 pods created per second and 100 pods deleted per second.
Take the Pod creation and deletion rate limit under consideration when planning how to scale your workloads.
Pods share deletion throughput with other resource types (for example, EndpointSlices), so defining Pods as part of a Service can reduce the deletion throughput available to Pods.
To allow the cluster autoscaler to effectively remove Pods from underutilized nodes, avoid overly restrictive PodDisruptionBudgets and long termination grace periods.
Wildcard tolerations are also discouraged, because they can cause workloads to be scheduled on nodes that are in the process of being removed.
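As a hedged sketch of a PodDisruptionBudget that does not block node drain (the web-frontend name and the maxUnavailable value are assumptions for illustration, not recommendations from this page):

```yaml
# Hypothetical PodDisruptionBudget: permits voluntary evictions instead of
# blocking all disruptions, so underutilized nodes can still be drained.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb        # placeholder name
spec:
  maxUnavailable: 1             # always leaves room to evict at least one Pod
  selector:
    matchLabels:
      app: web-frontend         # placeholder label
```

With maxUnavailable set to 1, the cluster autoscaler can always evict at least one Pod at a time while draining an underutilized node.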
Nodes create a watch for every Secret and ConfigMap that you configure for Pods. The combined number of watches created by all nodes can generate substantial load on the cluster control plane.
Having more than 200,000 watches per cluster might affect the initialization time of the cluster. This issue can cause the control plane to restart frequently.
Define larger nodes to decrease the likelihood and severity of issues caused by a large number of watches. Higher Pod density (fewer, larger nodes) might reduce the number of watches and mitigate the severity of the issue.
To learn more, see the machine series comparison .
Store fewer than 30,000 Secrets when using application-layer secrets encryption.
To learn more, see Encrypt secrets at the application layer.
There is a limit on the maximum log throughput that each node can send to the Cloud Logging API. The default limit is between 100 Kbps and 500 Kbps, depending on the load. For Standard clusters, you can raise the limit to 10 MiB per second by deploying a high-throughput Logging agent configuration. Exceeding this limit might cause log entries to be dropped.
Configure your logging to stay within the default limits, or deploy a high-throughput Logging agent configuration.
To learn more, see Adjusting log throughput.
You can use Backup for GKE to back up and restore your GKE workloads.
Backup for GKE is subject to limits that you need to keep in mind when defining your backup plans.
Review the limits of Backup for GKE .
If your workloads might exceed these limits, we recommend creating multiple backup plans to partition your backups and stay within the limits.
You can use Config Connector to manage Google Cloud resources through Kubernetes. Config Connector has two modes of operation:
- Cluster mode, where there is a single Config Connector instance per GKE cluster. In this mode, a single Config Connector instance loads all the resources.
- Namespaced mode, where each namespace within a cluster has a separate Config Connector instance. In this mode, you can partition managed resources by namespace. This setup reduces the number of resources that a single Config Connector instance needs to manage, lowering its CPU and memory usage.
For details on resource limits, see Config Controller scalability guidelines. For information on managing a large number of resources, see Config Connector best practices.
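For illustration, the following hedged sketch shows how namespaced mode is typically enabled with the Config Connector operator; the team-a namespace and the service account email are placeholder assumptions, so check the Config Connector documentation for the exact resource definitions.

```yaml
# Hypothetical configuration: run Config Connector in namespaced mode so that
# each namespace gets its own controller instance.
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  # The operator expects this exact name.
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: namespaced
---
# Hypothetical per-namespace context: creates a Config Connector instance that
# manages only the resources in the team-a namespace.
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnectorContext
metadata:
  name: configconnectorcontext.core.cnrm.cloud.google.com
  namespace: team-a             # placeholder namespace
spec:
  googleServiceAccount: cnrm-team-a@example-project.iam.gserviceaccount.com  # placeholder
```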