This page shows you how to deploy and scale workloads more quickly in Google Kubernetes Engine (GKE) clusters using fast-starting nodes. Fast-starting nodes are used in GKE with Autopilot mode on a best-effort basis when workloads use compatible configurations.
Fast-starting GKE nodes have significantly lower startup time for compatible machine families. The accelerated startup time provides you with the following benefits:
- Faster cold start
- Faster autoscaling
- Improved long-tail Pod scheduling latency
- Improved infrastructure cost efficiency
With fast-starting nodes, GKE pre-initializes hardware resources to accelerate startup time. The pre-initialized resources are available on a best-effort basis. Surge requests might only be partially served. Without fast-starting nodes, resources are initialized on-demand, and nodes are served at normal startup time.
Requirements
Fast-starting nodes require no additional configuration. GKE automatically uses fast-starting nodes if your workloads use compatible configurations. You must meet all of the following requirements to use fast-starting nodes:
- Use Autopilot clusters.
- Use any version in the Rapid release channel (one way to create such a cluster is sketched after this list).
- Use any of the following compatible compute resources, with a maximum compatible boot disk size of 500 GiB:
  - NVIDIA L4 GPUs (G2 machine series)
- Use the `pd-balanced` boot disk type.
- Don't use any features that are incompatible with fast-starting nodes. For more information, see Limitations.
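If you don't already have a cluster that meets the first two requirements, the following commands are a minimal sketch of one way to create such a cluster and confirm its release channel. The cluster name and location are placeholders, not values from this page:

```sh
# Create an Autopilot cluster enrolled in the Rapid release channel.
# CLUSTER_NAME and LOCATION are placeholders; replace them with your own values.
gcloud container clusters create-auto CLUSTER_NAME \
    --location=LOCATION \
    --release-channel=rapid

# Confirm which release channel an existing cluster is enrolled in.
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(releaseChannel.channel)"
```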
Limitations
The following features aren't compatible with fast-starting GKE nodes. If you use any of these features, GKE provisions nodes with the typical startup time:
- Customer-managed encryption keys (CMEK)
- Spot VMs
- Local SSDs
- Placement policies
- Multi-network support
Autopilot GPU workloads
Requesting compatible GPUs in your Autopilot clusters results in up to four times faster node startup time and up to two times faster Pod scheduling time than similar requests in GKE Standard clusters, because the Autopilot GPU workloads can use fast-starting nodes.
The following are some example use cases. However, any Pod that meets the conditions in the Requirements section is compatible with fast-starting nodes.
ComputeClass
Request a compatible accelerator type and count in a ComputeClass, like in the following example:
```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: ACCELERATOR_COMPUTE_CLASS_NAME
spec:
  priorities:
  - gpu:
      type: ACCELERATOR_TYPE
      count: ACCELERATOR_COUNT
  nodePoolAutoCreation:
    enabled: true
```
When you select this ComputeClass in a Pod, like in the following example, GKE uses fast-starting nodes:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: POD_NAME
spec:
  nodeSelector:
    # Select a ComputeClass that requests compatible GPUs
    cloud.google.com/compute-class: ACCELERATOR_COMPUTE_CLASS_NAME
  containers:
  - name: my-container
    image: gcr.io/google_containers/pause
    resources:
      limits:
        nvidia.com/gpu: ACCELERATOR_COUNT
```
Replace the following values:
- `ACCELERATOR_COMPUTE_CLASS_NAME`: the name of the ComputeClass that requests the accelerators.
- `ACCELERATOR_TYPE`: the type of accelerator.
- `ACCELERATOR_COUNT`: the number of accelerators required by the Pod. This value must be less than or equal to the value in the `spec.priorities.gpu.count` field in the ComputeClass.
- `POD_NAME`: the name of your Pod.
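As a usage sketch, you can save the two manifests to files and apply them with kubectl. The file names below are placeholders, not files provided by GKE:

```sh
# Apply the ComputeClass, then the Pod that selects it.
# The file names are placeholders for wherever you saved the manifests.
kubectl apply -f accelerator-compute-class.yaml
kubectl apply -f gpu-pod.yaml

# Check that the Pod is scheduled onto an auto-created GPU node.
kubectl get pod POD_NAME -o wide
```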
For more information about ComputeClass, see About custom compute classes.
Pod specification
Select a compatible accelerator type and count in your Pod specification, like in the following example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: POD_NAME
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: ACCELERATOR_NAME
  containers:
  - name: my-container
    image: gcr.io/google_containers/pause
    resources:
      limits:
        nvidia.com/gpu: ACCELERATOR_COUNT
```
Replace the following values:
- `POD_NAME`: the name of your Pod.
- `ACCELERATOR_NAME`: the name of the accelerator required by the Pod.
- `ACCELERATOR_COUNT`: the number of accelerators required by the Pod.
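To see the effect on startup latency, one option (a sketch, assuming the Pod manifest is saved to a placeholder file and the Pod is named POD_NAME) is to apply the manifest and inspect the Pod's events, which record node provisioning and scheduling times:

```sh
# Apply the Pod manifest; the file name is a placeholder.
kubectl apply -f gpu-pod.yaml

# Events show node scale-up, image pulls, and scheduling,
# which you can use to compare startup latency.
kubectl describe pod POD_NAME

# Alternatively, list only the events that reference this Pod.
kubectl get events --field-selector involvedObject.name=POD_NAME
```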
Pricing
Fast-starting nodes are available in GKE Autopilot at no extra charge. For more information about GKE Autopilot pricing, see the Autopilot mode section in Google Kubernetes Engine pricing.