This page describes best practices to follow when planning and designing very large clusters.
Why plan for large GKE clusters
Every computer system, including Kubernetes, has architectural limits. Exceeding these limits can affect the performance of your cluster or, in some cases, even cause downtime. Follow these best practices and execute the recommended actions to ensure that your clusters run your workloads reliably at scale.
Limitations of large GKE clusters
When GKE scales a cluster to a large number of nodes, GKE adjusts the amount of available resources to match your system needs while staying within its service-level objectives (SLOs). Google Cloud supports large clusters. However, depending on your use case, you must consider the limitations of large clusters to better respond to your infrastructure scale requirements.
This section describes the limitations and considerations when designing large GKE clusters based on the expected number of nodes.
Clusters with up to 5,000 nodes
When designing your cluster architecture to scale up to 5,000 nodes, consider the following conditions:
- Only available for regional clusters.
- Only available for clusters that use Private Service Connect.
- Migrating from zonal to regional clusters requires you to recreate the cluster to unlock the higher node quota level.
If you expect to scale your cluster beyond 5,000 nodes, contact Cloud Customer Care to increase the cluster size and quota limit.
Clusters with more than 5,000 nodes
GKE supports large Standard clusters of up to 15,000 nodes. In version 1.31 and later, GKE supports large clusters of up to 65,000 nodes. The 65,000-node limit is intended for running large-scale AI workloads.
If you expect to scale your cluster to either 15,000 or 65,000 nodes, complete the following tasks:
- Consider the following limitations:
- Cluster autoscaler is not supported. Instead, scale your node pools up or down using the GKE API.
- Multi-network is not supported.
- Services with more than 100 Pods must be headless (see the example after this list).
- Every Pod should run on its own node, with the exception of system DaemonSets. To constrain Pod scheduling to specific nodes, you can use Kubernetes Pod affinity or anti-affinity.
- Migrating from zonal to regional clusters requires you to recreate the cluster to unlock the higher node quota level.
- Migrating to clusters that use Private Service Connect requires you to recreate the cluster to unlock the higher node quota level.
- Contact Cloud Customer Care to increase the cluster size and quota limit to either 15,000 nodes or 65,000 nodes, depending on your scaling needs.
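The headless Service and one-Pod-per-node requirements above can be expressed directly in your manifests. The following is a minimal, hypothetical sketch (the `large-app` names, port, and image are placeholders, not values from this page): a Service with `clusterIP` set to `None`, and a Pod anti-affinity rule that keeps Pods of the same app on separate nodes.

```yaml
# Hypothetical example: a headless Service (no cluster IP allocated) for a
# workload whose Service selects more than 100 Pods.
apiVersion: v1
kind: Service
metadata:
  name: large-app-headless      # placeholder name
spec:
  clusterIP: None               # makes the Service headless
  selector:
    app: large-app
  ports:
  - port: 8080
---
# Hypothetical Pod: anti-affinity prevents two Pods with the same app label
# from being scheduled on the same node, so each Pod gets its own node.
apiVersion: v1
kind: Pod
metadata:
  name: large-app-pod           # placeholder name
  labels:
    app: large-app
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: large-app
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: us-docker.pkg.dev/example/app:latest   # placeholder image
```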
Best practices for splitting workloads between multiple clusters
You can run your workloads on a single, large cluster. This approach is easier to manage, more cost efficient, and provides better resource utilization than multiple clusters. However, in some cases you might need to consider splitting your workload into multiple clusters:
- Review Multi-cluster use cases to learn more about general requirements and scenarios for using multiple clusters.
- In addition, from a scalability point of view, split your cluster when it might exceed one of the limits described in the following section or one of the GKE quotas. Lowering the risk of reaching GKE limits reduces the risk of downtime or other reliability issues.
If you decide to split your cluster, use Fleet management to simplify management of a multi-cluster fleet.
Limits and best practices
To ensure that your architecture supports large-scale GKE clusters, review the following limits and related best practices. Exceeding these limits might degrade cluster performance or cause reliability issues.
These best practices apply to any default Kubernetes cluster with no extensions installed. Extending Kubernetes clusters with webhooks or custom resource definitions (CRDs) is common but can constrain your ability to scale the cluster.
The following table extends the main GKE quotas and limits. You should also familiarize yourself with the open-source Kubernetes limits for large-scale clusters.
The GKE version requirements mentioned in the table apply to both the nodes and the control plane.
You can use the following resources to help monitor your use:
- To view your current usage, go to the Quotas page, which shows a pre-filtered list of GKE quotas.
- Use insights and recommendations to get alerts for clusters at 80%, 90%, and 95% consumption levels.
For more information about how to respond when you approach the limit, see Identify clusters where etcd usage is approaching the limit .
Keep the total size of all objects of each type stored in etcd below 800 MB. This is especially relevant for clusters that use many large Secrets or ConfigMaps, or a high volume of CRDs.
The performance of the iptables rules used by kube-proxy degrades if either of the following is true:
- There are too many Services.
- The number of backends behind a Service is high.
This limit is eliminated when GKE Dataplane V2 is enabled.
Keep the number of Services in the cluster below 10,000.
To learn more, see Exposing applications using Services.
Keep the number of Services per namespace below 5,000.
Kubernetes populates environment variables for each Service into Pods in the same namespace, so a large number of Services per namespace can cause Pods to exceed limits on environment size. You can opt out of having those environment variables populated by setting enableServiceLinks in the PodSpec to false.
To learn more, see Exposing applications using Services.
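As a hedged illustration of the opt-out described above, the following sketch shows a Pod with enableServiceLinks set to false; the Pod name and image are hypothetical placeholders.

```yaml
# Hypothetical Pod that opts out of Service environment variable injection.
apiVersion: v1
kind: Pod
metadata:
  name: no-service-links        # placeholder name
spec:
  enableServiceLinks: false     # do not inject environment variables for Services in the namespace
  containers:
  - name: app
    image: us-docker.pkg.dev/example/app:latest   # placeholder image
```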
Every node runs a kube-proxy that uses watches for monitoring any Service change. The larger a cluster, the more change-related data the agent processes. This is especially visible in clusters with more than 500 nodes.
Information about the endpoints is split between separate EndpointSlices. This split reduces the amount of data transferred on each change.
Endpoints objects are still available for components, but any Endpoints object with more than 1,000 Pods is automatically truncated.
Keep the number of Pods behind a single Service lower than 10,000.
To learn more, see Exposing applications using Services.
GKE Dataplane V2 contains limits on the number of Pods exposed by a single Service.
The same limit is applicable to Autopilot clusters as they use GKE Dataplane V2.
In GKE 1.23 and earlier, keep the number of Pods behind a single Service lower than 1,000.
In GKE 1.24 and later, keep the number of Pods behind a single Service lower than 10,000.
To learn more, see Exposing applications using Services.
Keep the number of DNS records per headless Service below 1,000 for kube-dns, and below 3,500 (IPv4) or 2,000 (IPv6) for Cloud DNS.
Keep the number of all endpoints across all Services below 260,000.
GKE Dataplane V2, which is the default dataplane for GKE Autopilot, relies on eBPF maps that are currently limited to 260,000 endpoints across all Services.
Each Horizontal Pod Autoscaler (HPA) is processed every 15 seconds.
More than 300 HPA objects can cause linear degradation of performance.
Keep the number of HPA objects within this limit; otherwise, the frequency of HPA processing degrades linearly. For example, in GKE 1.22 with 2,000 HPAs, a single HPA is reprocessed only every 1 minute and 40 seconds (2,000 / 300 × 15 seconds = 100 seconds).
To learn more, see autoscaling based on resource utilization and horizontal pod autoscaling scalability.
We recommend that you use worker nodes with at least one vCPU for every 10 Pods.
To learn more, see manually upgrading a cluster or node pool .
Kubernetes has internal limits that affect the rate of creating or deleting Pods (Pod churn) in response to scaling requests. Additional factors, such as deleting a Pod that is part of a Service, can also affect the Pod churn rate.
For clusters with up to 500 nodes, you can expect an average rate of 20 pods created per second and 20 pods deleted per second.
For clusters larger than 500 nodes, you can expect an average rate of 100 pods created per second and 100 pods deleted per second.
Take the Pod creation and deletion rate limit under consideration when planning how to scale your workloads.
Pods share deletion throughput with other resource types (for example, EndpointSlices), so defining Pods as part of a Service can reduce the deletion throughput available to Pods.
To allow the cluster autoscaler to effectively remove Pods from underutilized nodes, avoid overly restrictive PodDisruptionBudgets and long termination grace periods.
Wildcard tolerations are also discouraged, because they can cause workloads to be scheduled on nodes that are in the process of being removed.
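As a hedged sketch of a PodDisruptionBudget that does not block node drain (the web-frontend name and the maxUnavailable value are assumptions for illustration, not recommendations from this page):

```yaml
# Hypothetical PodDisruptionBudget: permits voluntary evictions instead of
# blocking all disruptions, so underutilized nodes can still be drained.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb        # placeholder name
spec:
  maxUnavailable: 1             # always leaves room to evict at least one Pod
  selector:
    matchLabels:
      app: web-frontend         # placeholder label
```

With maxUnavailable set to 1, the cluster autoscaler can always evict at least one Pod at a time while draining an underutilized node.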
Nodes create a watch for every Secret and ConfigMap that you configure for Pods. The combined number of watches created by all nodes can generate substantial load on the cluster control plane.
Having more than 200,000 watches per cluster might affect the initialization time of the cluster. This issue can cause the control plane to restart frequently.
Define larger nodes to decrease the likelihood and severity of issues caused by a large number of watches. Higher Pod density (fewer, larger nodes) might reduce the number of watches and mitigate the severity of the issue.
To learn more, see the machine series comparison .
Store fewer than 30,000 Secrets when using application-layer secrets encryption.
To learn more, see Encrypt secrets at the application layer.
There is a limit on the maximum log throughput that each node can send to the Cloud Logging API. The default limit is between 100 Kbps and 500 Kbps, depending on the load. For Standard clusters, you can raise the limit to 10 MiB per second by deploying a high-throughput Logging agent configuration. Exceeding this limit might cause log entries to be dropped.
Configure your logging to stay within the default limits, or deploy a high-throughput Logging agent configuration.
To learn more, see Adjusting log throughput.
You can use Backup for GKE to back up and restore your GKE workloads.
Backup for GKE is subject to limits that you need to keep in mind when defining your backup plans.
Review the limits of Backup for GKE .
If your workloads might exceed these limits, we recommend creating multiple backup plans to partition your backups and stay within the limits.
You can use Config Connector to manage Google Cloud resources through Kubernetes. Config Connector has two modes of operation:
- Cluster mode, where there is a single Config Connector instance per GKE cluster. In this mode, a single Config Connector instance loads all the resources.
- Namespaced mode, where each namespace within a cluster has a separate Config Connector instance. In this mode, you can partition managed resources by namespace. This setup reduces the number of resources that a single Config Connector instance needs to manage, lowering its CPU and memory usage.
For details on resource limits, see Config Controller scalability guidelines. For information on managing a large number of resources, see Config Connector best practices.
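For illustration, the following hedged sketch shows how namespaced mode is typically enabled with the Config Connector operator; the team-a namespace and the service account email are placeholder assumptions, so check the Config Connector documentation for the exact resource definitions.

```yaml
# Hypothetical configuration: run Config Connector in namespaced mode so that
# each namespace gets its own controller instance.
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  # The operator expects this exact name.
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: namespaced
---
# Hypothetical per-namespace context: creates a Config Connector instance that
# manages only the resources in the team-a namespace.
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnectorContext
metadata:
  name: configconnectorcontext.core.cnrm.cloud.google.com
  namespace: team-a             # placeholder namespace
spec:
  googleServiceAccount: cnrm-team-a@example-project.iam.gserviceaccount.com  # placeholder
```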