Dynamically allocate devices to workloads with DRA

You can flexibly request devices for your Google Kubernetes Engine (GKE) workloads by using dynamic resource allocation (DRA). This document shows you how to create a ResourceClaimTemplate to request devices, and then create a workload to observe how Kubernetes dynamically allocates the devices to your Pods.

This document is intended for Application operators and Data engineers who run workloads such as AI/ML or high-performance computing (HPC).

About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.
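
To see which device categories are available to request, you can list the DeviceClass objects in your cluster with standard kubectl:

    kubectl get deviceclasses

On clusters that are set up for DRA, the output typically includes classes such as gpu.nvidia.com or tpu.google.com, depending on which DRA drivers are installed.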

To request devices within a DeviceClass, you create one of the following objects:

  • ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
  • ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaims and ResourceClaimTemplates, see When to use ResourceClaims and ResourceClaimTemplates.
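
For example, a standalone ResourceClaim that requests a single GPU, and that any Pod can reference by name, might look like the following minimal sketch (the object name is illustrative):

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaim
    metadata:
      name: single-gpu-claim   # illustrative name
    spec:
      devices:
        requests:
        - name: single-gpu
          exactly:
            deviceClassName: gpu.nvidia.com
            allocationMode: ExactCount
            count: 1

Unlike a ResourceClaimTemplate, a ResourceClaim like this one is a shared object: every Pod that references it by name uses the same device allocation.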

The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more information about all of the fields that you can specify, see the ResourceClaimTemplate API reference.

Limitations

  • Node auto-provisioning isn't supported.
  • Autopilot clusters don't support DRA.
  • You can't use the following GPU sharing features:
    • Time-sharing GPUs
    • Multi-instance GPUs
    • Multi-process Service (MPS)

Requirements

To use DRA, your GKE clusters must run version 1.34 or later.

You should also be familiar with the requirements and limitations that apply when you prepare your GKE infrastructure for DRA, including the items in the Limitations section of this page.
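
To check the version that your cluster runs, you can use the gcloud CLI. For example:

    gcloud container clusters describe CLUSTER_NAME \
        --location=LOCATION \
        --format="value(currentMasterVersion)"

Replace CLUSTER_NAME and LOCATION with your cluster's name and location.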

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

Use DRA to deploy workloads

To request per-Pod device allocation, you create a ResourceClaimTemplate that has your requested device configuration, such as GPUs of a specific type. When you deploy a workload that references the ResourceClaimTemplate, Kubernetes creates a ResourceClaim for each Pod in the workload based on the template. Kubernetes then allocates the requested devices and schedules the Pods on nodes that provide them.

To request devices in a workload with DRA, select one of the following options:

GPU

  1. Save the following manifest as claim-template.yaml:

     apiVersion: resource.k8s.io/v1
     kind: ResourceClaimTemplate
     metadata:
       name: gpu-claim-template
     spec:
       spec:
         devices:
           requests:
           - name: single-gpu
             exactly:
               deviceClassName: gpu.nvidia.com
               allocationMode: ExactCount
               count: 1

  2. Create the ResourceClaimTemplate:

     kubectl create -f claim-template.yaml
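
     Optionally, confirm that the template exists by listing the ResourceClaimTemplates in the namespace:

     kubectl get resourceclaimtemplates
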
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: dra-gpu-example
     spec:
       replicas: 1
       selector:
         matchLabels:
           app: dra-gpu-example
       template:
         metadata:
           labels:
             app: dra-gpu-example
         spec:
           containers:
           - name: ctr
             image: ubuntu:22.04
             command: ["bash", "-c"]
             args: ["echo $(nvidia-smi -L || echo Waiting...)"]
             resources:
               claims:
               - name: single-gpu
           resourceClaims:
           - name: single-gpu
             resourceClaimTemplateName: gpu-claim-template
           tolerations:
           - key: "nvidia.com/gpu"
             operator: "Exists"
             effect: "NoSchedule"

  4. Deploy the workload:

     kubectl create -f dra-gpu-example.yaml
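
If you need a specific GPU model rather than any available GPU, a ResourceClaimTemplate can also filter devices with a CEL selector. The following variant is only a sketch: the productName attribute and its value are assumptions that depend on the DRA driver on your nodes, so check the attributes that your driver publishes before using them:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-model-claim-template   # illustrative name
    spec:
      spec:
        devices:
          requests:
          - name: single-gpu
            exactly:
              deviceClassName: gpu.nvidia.com
              allocationMode: ExactCount
              count: 1
              selectors:
              - cel:
                  # Hypothetical attribute name and value; verify what your driver exposes.
                  expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA L4"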

TPU

  1. Save the following manifest as claim-template.yaml:

     apiVersion: resource.k8s.io/v1
     kind: ResourceClaimTemplate
     metadata:
       name: tpu-claim-template
     spec:
       spec:
         devices:
           requests:
           - name: all-tpus
             exactly:
               deviceClassName: tpu.google.com
               allocationMode: All

    This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool to every ResourceClaim.
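
    If you want a fixed number of TPUs for each claim instead of all of the TPUs on a node, you could set allocationMode: ExactCount with a count, mirroring the GPU example. This is a sketch; whether a given count is schedulable depends on your TPU slice topology:

     apiVersion: resource.k8s.io/v1
     kind: ResourceClaimTemplate
     metadata:
       name: tpu-count-claim-template   # illustrative name
     spec:
       spec:
         devices:
           requests:
           - name: some-tpus
             exactly:
               deviceClassName: tpu.google.com
               allocationMode: ExactCount
               count: 4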

  2. Create the ResourceClaimTemplate:

     kubectl create -f claim-template.yaml
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:

     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: dra-tpu-example
     spec:
       replicas: 1
       selector:
         matchLabels:
           app: dra-tpu-example
       template:
         metadata:
           labels:
             app: dra-tpu-example
         spec:
           containers:
           - name: ctr
             image: ubuntu:22.04
             command:
             - /bin/sh
             - -c
             - |
               echo "Environment Variables:"
               env
               echo "Sleeping indefinitely..."
               sleep infinity
             resources:
               claims:
               - name: all-tpus
           resourceClaims:
           - name: all-tpus
             resourceClaimTemplateName: tpu-claim-template
           tolerations:
           - key: "google.com/tpu"
             operator: "Exists"
             effect: "NoSchedule"

  4. Deploy the workload:

     kubectl create -f dra-tpu-example.yaml

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod. To verify the allocation for GPUs or TPUs, select one of the following options:

GPU

  1. Get the ResourceClaim associated with the workload that you deployed:

     kubectl get resourceclaims

    The output is similar to the following:

     NAME                                               STATE                AGE
    dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s 
    
  2. Get more details about the hardware assigned to the Pod:

     kubectl describe resourceclaims RESOURCECLAIM

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output is similar to the following:

     Name:         dra-gpu-example-68f595d7dc-prv27-single-gpu-qgjq5
     Namespace:    default
     Labels:       <none>
     Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
     API Version:  resource.k8s.io/v1
     Kind:         ResourceClaim
     Metadata:
       # Multiple lines are omitted here.
     Spec:
       Devices:
         Requests:
           Exactly:
             Allocation Mode:    ExactCount
             Count:              1
             Device Class Name:  gpu.nvidia.com
           Name:                 single-gpu
     Status:
       Allocation:
         Devices:
           Results:
             Device:   gpu-0
             Driver:   gpu.nvidia.com
             Pool:     gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
             Request:  single-gpu
         Node Selector:
           Node Selector Terms:
             Match Fields:
               Key:       metadata.name
               Operator:  In
               Values:
                 gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
       Reserved For:
         Name:      dra-gpu-example-68f595d7dc-prv27
         Resource:  pods
         UID:       e16c2813-08ef-411b-8d92-a72f27ebf5ef
     Events:        <none>
    
  3. Get logs for the workload that you deployed:

     kubectl logs deployment/dra-gpu-example --all-pods=true

    The output is similar to the following:

     [pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef) 
    

    The output of these steps shows that GKE allocated one GPU to the container.
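
    You can also list ResourceSlices to see the individual devices that the DRA drivers advertise for each node, which can help when an allocation doesn't succeed:

     kubectl get resourceslices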

TPU

  1. Get the ResourceClaim associated with the workload that you deployed:

     kubectl get resourceclaims | grep dra-tpu-example

    The output is similar to the following:

     NAME                                               STATE                AGE
    dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh     allocated,reserved   9s 
    
  2. Get more details about the hardware assigned to the Pod:

     kubectl get resourceclaims RESOURCECLAIM -o yaml

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output is similar to the following:

     apiVersion: resource.k8s.io/v1
     kind: ResourceClaim
     metadata:
       annotations:
         resource.kubernetes.io/pod-claim-name: all-tpus
       creationTimestamp: "2025-03-04T21:00:54Z"
       finalizers:
       - resource.kubernetes.io/delete-protection
       generateName: dra-tpu-example-59b8785697-k9kzd-all-tpus-
       name: dra-tpu-example-59b8785697-k9kzd-all-tpus-gnr7z
       namespace: default
       ownerReferences:
       - apiVersion: v1
         blockOwnerDeletion: true
         controller: true
         kind: Pod
         name: dra-tpu-example-59b8785697-k9kzd
         uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
       resourceVersion: "12189603"
       uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
     spec:
       devices:
         requests:
         - name: all-tpus
           exactly:
             allocationMode: All
             deviceClassName: tpu.google.com
     status:
       allocation:
         devices:
           results:
           - adminAccess: null
             device: "0"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "1"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "2"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "3"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "4"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "5"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "6"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
           - adminAccess: null
             device: "7"
             driver: tpu.google.com
             pool: gke-tpu-2ec29193-bcc0
             request: all-tpus
         nodeSelector:
           nodeSelectorTerms:
           - matchFields:
             - key: metadata.name
               operator: In
               values:
               - gke-tpu-2ec29193-bcc0
       reservedFor:
       - name: dra-tpu-example-59b8785697-k9kzd
         resource: pods
         uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
     
    
  3. Get logs for the workload that you deployed:

     kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"

    The output is similar to the following:

     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
     [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3

    The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.
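
    To re-check these TPU environment variables on a running Pod without searching the logs, you can, for example, run printenv in the container:

     kubectl exec deployment/dra-tpu-example -- printenv | grep "^TPU_"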
