
Manage GPU devices with dynamic resource allocation

This page describes how to configure your GPU workloads to use dynamic resource allocation in your Google Distributed Cloud bare metal clusters. Dynamic resource allocation is a Kubernetes API that lets you request and share generic resources, such as GPUs, among Pods and containers. Third-party drivers manage these resources.

With dynamic resource allocation, Kubernetes schedules Pods based on the referenced device configuration. App operators don't need to select specific nodes in their workloads and don't need to ensure that each Pod requests exactly the number of devices that are attached to those nodes. This process is similar to allocating volumes for storage.

This capability helps you run AI workloads by dynamically and precisely allocating the GPU resources within your bare metal clusters, improving resource utilization and performance for demanding workloads.

This page is for Admins, architects, and Operators who manage the lifecycle of the underlying tech infrastructure. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.

Before you begin

Before you configure your GPU workloads to use dynamic resource allocation, verify that the following prerequisites are met:

  • Your bare metal cluster is at version 1.33.0 or later.
  • Your operating system is either Ubuntu 22.04 or Red Hat Enterprise Linux (RHEL) 9.4.
  • You have updated your cluster to enable dynamic resource allocation as described in Enable dynamic resource allocation.
  • You have at least one node machine with a GPU attached and the NVIDIA GPU driver installed. For more information, see Install or uninstall the bundled NVIDIA GPU Operator.
  • You have followed the instructions in NVIDIA DRA Driver for GPUs to install the NVIDIA DRA driver on all GPU-attached nodes. You can optionally verify the driver installation as shown after this list.
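
Optionally, to confirm that the NVIDIA DRA driver is installed and is advertising GPU devices before you continue, you can list the dynamic resource allocation objects that the driver creates. The following commands are a minimal sketch: they assume that the driver registers the gpu.nvidia.com DeviceClass referenced later on this page, and the exact output depends on your driver version and nodes.

# List the DeviceClasses registered in the cluster (expect gpu.nvidia.com).
kubectl get deviceclasses --kubeconfig=CLUSTER_KUBECONFIG

# List the ResourceSlices that advertise the GPUs attached to each node.
kubectl get resourceslices --kubeconfig=CLUSTER_KUBECONFIG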

Create GPU workloads that use dynamic resource allocation

For your GPU workloads to use dynamic resource allocation to request GPUs, they must be in the same namespace as a ResourceClaim that describes the GPU device allocation request, and they must reference that ResourceClaim so that Kubernetes can assign GPU resources to them.

The following steps set up an environment in which your workloads use dynamic resource allocation to request GPU resources:

  1. To create resources related to dynamic resource allocation, create a new Namespace in your cluster:

cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: v1
kind: Namespace
metadata:
  name: NAMESPACE_NAME
EOF

    Replace the following:

    • CLUSTER_KUBECONFIG: the path of the user cluster kubeconfig file.

    • NAMESPACE_NAME: the name for your dynamic resource allocation namespace.

  2. Create a ResourceClaim to describe the request for GPU access:

cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  namespace: NAMESPACE_NAME
  name: RESOURCE_CLAIM_NAME
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
EOF

    Replace RESOURCE_CLAIM_NAME with the name of your resource claim for GPU requests.

  3. Create workloads that reference the ResourceClaim created in the preceding step.

    The following workload examples show how to reference a ResourceClaim named gpu-claim in the dra-test namespace. The containers in the pod1 Pod run NVIDIA Compute Unified Device Architecture (CUDA) samples that exercise the allocated GPUs. When the pod1 Pod completes successfully, dynamic resource allocation is working properly and is ready to manage GPU resources in your cluster. A verification sketch follows the OS-specific steps.

    Ubuntu

    1. Use the following command to apply the manifest to your cluster:

cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  namespace: dra-test
spec:
  restartPolicy: OnFailure
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim
  containers:
  - name: ctr0
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  - name: ctr1
    image: nvcr.io/nvidia/k8s/cuda-sample:devicequery
    resources:
      claims:
      - name: gpu
EOF

    RHEL

    1. Download and install the SELinux policy module nvidia_container_t, which is required to access GPUs.

      For more information, refer to the NVIDIA dgx-selinux repository.

    2. Use the following command to apply the manifest to your cluster:

cat <<EOF | kubectl apply --kubeconfig=CLUSTER_KUBECONFIG -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  namespace: dra-test
spec:
  restartPolicy: OnFailure
  securityContext:
    seLinuxOptions:
      type: nvidia_container_t
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim
  containers:
  - name: ctr0
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  - name: ctr1
    image: nvcr.io/nvidia/k8s/cuda-sample:devicequery
    resources:
      claims:
      - name: gpu
EOF

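To confirm that dynamic resource allocation allocated GPUs to the claim and that the sample workload ran, you can check the ResourceClaim and Pod status and read the container logs. The following commands are a minimal sketch that assumes the gpu-claim ResourceClaim and the pod1 Pod from the preceding examples in the dra-test namespace:

# Check the ResourceClaim state (allocated and reserved once pod1 is scheduled).
kubectl get resourceclaim gpu-claim --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG

# Confirm that pod1 ran to completion.
kubectl get pod pod1 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG

# Inspect the output of the CUDA samples in each container.
kubectl logs pod1 --container ctr0 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG
kubectl logs pod1 --container ctr1 --namespace dra-test --kubeconfig=CLUSTER_KUBECONFIG
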
Limitations

Consider the following limitations when you use dynamic resource allocation:

  • When you use RHEL OS, SELinux policy can interfere with containers that try to access GPUs. For more information, see How to use GPUs in containers on bare metal RHEL 8.

  • This feature uses the resource.k8s.io/v1beta1 API group, which differs from the open source Kubernetes API group for this feature, resource.k8s.io/v1. The v1 open source API group provides more features and better stability than the v1beta1 API group. You can check which versions your cluster serves, as shown after this list.
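
To check which versions of the resource.k8s.io API group your cluster serves, you can list the served API versions. This is a minimal sketch:

# List the served versions of the dynamic resource allocation API group.
kubectl api-versions --kubeconfig=CLUSTER_KUBECONFIG | grep resource.k8s.io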
