This page describes GPUs in Google Kubernetes Engine (GKE) to help you to select the optimal GPU configuration for your workloads. If you want to deploy GPU workloads that use Slurm, see Create an AI-optimized Slurm cluster instead.
You can use GPUs to accelerate resource-intensive tasks, such as machine learning and data processing. The information on this page can help you to do the following:
- Ensure GPU availability when needed.
- Decide whether to use GPUs in GKE Autopilot mode or GKE Standard mode clusters.
- Choose GPU-related features to efficiently use your GPU capacity.
- Monitor GPU node metrics.
- Improve GPU workload reliability by handling disruptions more effectively.
This page is intended for Platform admins and operators and Machine learning (ML) engineers who want to ensure that accelerator infrastructure is optimized for their workloads.
Before reading this page, ensure that you're familiar with the following:
GPU selection in GKE
In GKE, the way that you request GPU hardware depends on whether you're using Autopilot or Standard mode. In Autopilot, you request GPU hardware by specifying GPU resources in your workloads. In Standard mode, you attach GPU hardware to nodes in your clusters, and then allocate GPU resources to containerized workloads running on those nodes. For detailed instructions on how to attach and use GPUs in your workloads, refer to Deploy GPU workloads on Autopilot or Run GPUs on Standard node pools.
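For example, an Autopilot workload requests GPU hardware directly in its Pod specification, using a node selector for the accelerator type and a GPU resource limit. This is a minimal sketch: the Pod name, container image, and the `nvidia-tesla-t4` accelerator type are illustrative choices, not requirements.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example  # illustrative name
spec:
  nodeSelector:
    # Ask GKE for nodes with the chosen accelerator (T4 used here as an example).
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04  # illustrative CUDA base image
    command: ["sleep", "infinity"]
    resources:
      limits:
        # Request one GPU; GKE provisions and schedules onto a matching GPU node.
        nvidia.com/gpu: 1
```

In Standard mode, a similar Pod spec allocates GPUs from node pools that you created with attached GPU hardware.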
GKE offers GPU-specific features that improve how efficiently workloads running on your nodes use GPU resources, including time-sharing, multi-instance GPUs, and multi-instance GPUs with NVIDIA MPS.
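As an illustration of one such feature, a Standard node pool can be created with GPU time-sharing enabled so that multiple containers share a single physical GPU. This is a sketch, not a definitive command: the cluster name, pool name, machine type, and client count are placeholder values you would replace for your environment.

```
# Sketch: create a node pool whose T4 GPUs are time-shared by up to 4 clients.
gcloud container node-pools create example-gpu-pool \
    --cluster=example-cluster \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=4
```

With time-sharing enabled, Pods that each request one shared GPU can be scheduled onto the same physical device, which helps reduce underutilization for bursty or low-intensity GPU workloads.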
This page helps you to consider choices for requesting GPUs in GKE, including the following:
- Choosing your GPU quota, the maximum number of GPUs that can run in your project
- Deciding between Autopilot and Standard modes
- Managing the GPU stack through GKE or the NVIDIA GPU Operator on GKE
- Choosing features to reduce the amount of underutilized GPU resources
- Accessing NVIDIA CUDA-X libraries for CUDA applications
- Monitoring GPU node metrics
- Handling disruption due to node maintenance
- Using GKE Sandbox to secure GPU workloads
Available GPU models
The GPU hardware that's available for use in GKE is a subset of the GPU models available on Compute Engine. The specific hardware that's available depends on the Compute Engine region or zone of your cluster. For more information about specific availability, see GPU regions and zones.
For information about GPU pricing, see the Google Cloud SKUs and the GPU pricing page.
Plan GPU quota
Your GPU quota is the maximum number of GPUs that can run in your Google Cloud project. To use GPUs in your GKE clusters, your project must have enough GPU quota. Check the Quotas page to ensure that you have enough GPUs available in your project.
Your GPU quota should be at least equal to the total number of GPUs you intend to run in your cluster. If you enable cluster autoscaling , you should request GPU quota at least equivalent to your cluster's maximum number of nodes multiplied by the number of GPUs per node.
For example, if you expect to use three nodes with two GPUs each, then your project requires a GPU quota of six.
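The calculation above can be sketched as a quick shell computation, using the example values from this page (three nodes, two GPUs per node):

```shell
# Required GPU quota = maximum number of nodes x GPUs per node.
max_nodes=3
gpus_per_node=2
required_quota=$(( max_nodes * gpus_per_node ))
echo "$required_quota"   # prints 6
```

With cluster autoscaling enabled, substitute the node pool's configured maximum node count for `max_nodes`, since the autoscaler can scale up to that limit.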
To request additional GPU quota, follow the instructions to request a quota adjustment, using gpus as the metric.
Choose GPU support using Autopilot or Standard
GPUs are available in Autopilot and Standard clusters.
Use Autopilot clusters for a fully managed Kubernetes experience. In Autopilot, GKE manages driver installation, node scaling, Pod isolation, and node provisioning.