Run NCCL on custom GKE clusters that use A3 Mega or A3 High

This page describes how to run NVIDIA Collective Communications Library (NCCL) tests on custom GKE clusters that use GPUDirect-TCPXO and GPUDirect-TCPX networking protocols. A custom GKE cluster is a cluster that you create by using gcloud commands.

You can use the tests that are described on this page to validate GPU-to-GPU network performance on these clusters.

Before you begin

The tests on this page use JobSet and Kueue with Topology Aware Scheduling (TAS). Before running any tests, you must set up your cluster and do the following:

  1. Install JobSet.

  2. Install Kueue.

      kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.16.5/manifests.yaml
    

Set up your cluster with JobSet and Kueue

After you install JobSet and Kueue, take the following steps:

  1. Save the following manifest as kueue-config.yaml:

    A3 High

      apiVersion: kueue.x-k8s.io/v1beta2
      kind: Topology
      metadata:
        name: "gke-default"
      spec:
        levels:
        - nodeLabel: "cloud.google.com/gce-topology-block"
        - nodeLabel: "cloud.google.com/gce-topology-subblock"
        - nodeLabel: "cloud.google.com/gce-topology-host"
        - nodeLabel: "kubernetes.io/hostname"
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ResourceFlavor
      metadata:
        name: a3-high-flavor
      spec:
        nodeLabels:
          cloud.google.com/gke-accelerator: nvidia-h100-80gb
        topologyName: "gke-default"
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ResourceFlavor
      metadata:
        name: a3-high-dws-flavor
      spec:
        nodeLabels:
          cloud.google.com/gke-accelerator: nvidia-h100-80gb
        topologyName: "gke-default"
        tolerations:
        - key: "cloud.google.com/gke-queued"
          operator: "Exists"
          effect: NoSchedule
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: AdmissionCheck
      metadata:
        name: dws-prov
      spec:
        controllerName: kueue.x-k8s.io/provisioning-request
        parameters:
          apiGroup: kueue.x-k8s.io
          kind: ProvisioningRequestConfig
          name: dws-config
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ProvisioningRequestConfig
      metadata:
        name: dws-config
      spec:
        provisioningClassName: queued-provisioning.gke.io
        podSetUpdates:
        - key: autoscaling.gke.io/provisioning-request
          valueFromProvisioningClassDetail: ResizeRequestName
        managedResources:
        - nvidia.com/gpu
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ClusterQueue
      metadata:
        name: cq-tas
      spec:
        namespaceSelector: {}
        clusterQueueingStrategy: BestEffortFIFO
        resourceGroups:
        - flavors:
          - name: a3-high-flavor
            resources:
            - name: "cpu"
              nominalQuota: 1000
            - name: "memory"
              nominalQuota: 1000Ti
            - name: "nvidia.com/gpu"
              nominalQuota: 1000
          - name: a3-high-dws-flavor
            resources:
            - name: "cpu"
              nominalQuota: 1000
            - name: "memory"
              nominalQuota: 1000Ti
            - name: "nvidia.com/gpu"
              nominalQuota: 1000
        admissionChecksStrategy:
          admissionChecks:
          - name: "dws-prov"
            onFlavors: [a3-high-dws-flavor]
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: LocalQueue
      metadata:
        namespace: default
        name: lq-tas
      spec:
        clusterQueue: cq-tas
    

    A3 Mega

      apiVersion: kueue.x-k8s.io/v1beta2
      kind: Topology
      metadata:
        name: "gke-default"
      spec:
        levels:
        - nodeLabel: "cloud.google.com/gce-topology-block"
        - nodeLabel: "cloud.google.com/gce-topology-subblock"
        - nodeLabel: "cloud.google.com/gce-topology-host"
        - nodeLabel: "kubernetes.io/hostname"
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ResourceFlavor
      metadata:
        name: a3-mega-flavor
      spec:
        nodeLabels:
          cloud.google.com/gke-accelerator: nvidia-h100-mega-80gb
        topologyName: "gke-default"
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ResourceFlavor
      metadata:
        name: a3-mega-dws-flavor
      spec:
        nodeLabels:
          cloud.google.com/gke-accelerator: nvidia-h100-mega-80gb
        topologyName: "gke-default"
        tolerations:
        - key: "cloud.google.com/gke-queued"
          operator: "Exists"
          effect: NoSchedule
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: AdmissionCheck
      metadata:
        name: dws-prov
      spec:
        controllerName: kueue.x-k8s.io/provisioning-request
        parameters:
          apiGroup: kueue.x-k8s.io
          kind: ProvisioningRequestConfig
          name: dws-config
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ProvisioningRequestConfig
      metadata:
        name: dws-config
      spec:
        provisioningClassName: queued-provisioning.gke.io
        podSetUpdates:
        - key: autoscaling.gke.io/provisioning-request
          valueFromProvisioningClassDetail: ResizeRequestName
        managedResources:
        - nvidia.com/gpu
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: ClusterQueue
      metadata:
        name: cq-tas
      spec:
        namespaceSelector: {}
        clusterQueueingStrategy: BestEffortFIFO
        resourceGroups:
        - flavors:
          - name: a3-mega-flavor
            resources:
            - name: "cpu"
              nominalQuota: 1000
            - name: "memory"
              nominalQuota: 1000Ti
            - name: "nvidia.com/gpu"
              nominalQuota: 1000
          - name: a3-mega-dws-flavor
            resources:
            - name: "cpu"
              nominalQuota: 1000
            - name: "memory"
              nominalQuota: 1000Ti
            - name: "nvidia.com/gpu"
              nominalQuota: 1000
        admissionChecksStrategy:
          admissionChecks:
          - name: "dws-prov"
            onFlavors: [a3-mega-dws-flavor]
      ---
      apiVersion: kueue.x-k8s.io/v1beta2
      kind: LocalQueue
      metadata:
        namespace: default
        name: lq-tas
      spec:
        clusterQueue: cq-tas
    
  2. Apply the manifest:

      kubectl apply -f kueue-config.yaml
    

When running workloads with TAS enabled, you can specify how strictly topology constraints are enforced by using one of the following annotations in your workload manifest:

  • kueue.x-k8s.io/podset-required-topology: If you use this annotation, Kueue blocks scheduling until the workload can be scheduled within the requested topology constraint. Use this annotation to ensure that pods are placed together for optimal performance.

  • kueue.x-k8s.io/podset-preferred-topology: If you use this annotation, Kueue attempts to schedule pods within the requested topology constraint, but if that's not possible, it admits the workload without meeting topology constraints.

Note: Avoid using the required mode with DWS Flex-start. Because Flex-start provisions nodes dynamically, the resulting nodes might not satisfy strict topology requirements, which can result in unschedulable workloads. For these configurations, use podset-preferred-topology instead.

For either annotation, specify one of the following values as the topology constraint:

  • cloud.google.com/gce-topology-block: Schedules pods within the same network block.
  • cloud.google.com/gce-topology-subblock: Schedules pods within the same rack.
  • cloud.google.com/gce-topology-host: Schedules pods on the same physical host.
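
For example, a pod template that asks Kueue to keep the podset within one rack carries the annotation in its pod metadata. The following fragment is a sketch only; the surrounding JobSet or Job fields are omitted:

    template:
      metadata:
        annotations:
          kueue.x-k8s.io/podset-preferred-topology: "cloud.google.com/gce-topology-subblock"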

Test on two Flex-start nodes

To run NCCL tests on a GKE cluster that uses A3 Mega or A3 High Flex-start VMs, use the following procedure. This procedure uses a JobSet manifest to run an NCCL test on two nodes.

  1. Save the following manifest as nccl-tas-jobset.yaml:

    A3 Mega

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: nccl-configmap
      data:
        allgather.sh: |
          #!/bin/bash
          service ssh restart;
          /scripts/init_ssh.sh ${@};
          pushd /scripts;
          /scripts/gen_hostfiles.sh ${@};
          popd;
          # Set up environment variables for GPUDirect-TCPXO
          export LD_LIBRARY_PATH=/usr/local/nvidia/lib64
          export NCCL_FASTRAK_CTRL_DEV=eth0
          export NCCL_FASTRAK_IFNAME=eth1,eth2,eth3,eth4,eth5,eth6,eth7,eth8
          export NCCL_SOCKET_IFNAME=eth0
          export NCCL_CROSS_NIC=0
          export NCCL_ALGO=Ring,Tree
          export NCCL_PROTO=Simple
          export NCCL_NET_GDR_LEVEL=PIX
          # Run the benchmark
          /scripts/demo-run-nccl-test-tcpxo-via-mpi.sh
      ---
      apiVersion: jobset.x-k8s.io/v1alpha2
      kind: JobSet
      metadata:
        name: nccl-tas-test
        labels:
          kueue.x-k8s.io/queue-name: lq-tas
      spec:
        ttlSecondsAfterFinished: 1200
        suspend: true
        network:
          enableDNSHostnames: true
        replicatedJobs:
        - name: worker
          replicas: 2
          template:
            spec:
              parallelism: 1
              completions: 1
              template:
                metadata:
                  annotations:
                    kueue.x-k8s.io/podset-preferred-topology: "cloud.google.com/gce-topology-block"
                    networking.gke.io/default-interface: 'eth0'
                    networking.gke.io/interfaces: |
                      [
                        {"interfaceName":"eth0","network":"default"},
                        {"interfaceName":"eth1","network":"vpc0"},
                        {"interfaceName":"eth2","network":"vpc1"},
                        {"interfaceName":"eth3","network":"vpc2"},
                        {"interfaceName":"eth4","network":"vpc3"},
                        {"interfaceName":"eth5","network":"vpc4"},
                        {"interfaceName":"eth6","network":"vpc5"},
                        {"interfaceName":"eth7","network":"vpc6"},
                        {"interfaceName":"eth8","network":"vpc7"}
                      ]
                spec:
                  activeDeadlineSeconds: 3600
                  restartPolicy: Never
                  nodeSelector:
                    cloud.google.com/gke-accelerator: nvidia-h100-mega-80gb
                  tolerations:
                  - key: cloud.google.com/gke-queued
                    effect: NoSchedule
                    value: "true"
                  - key: "nvidia.com/gpu"
                    operator: "Exists"
                    effect: "NoSchedule"
                  setHostnameAsFQDN: true
                  volumes:
                  - name: nvidia
                    hostPath:
                      path: /home/kubernetes/bin/nvidia
                  - name: lib64
                    hostPath:
                      path: /lib64
                  - name: proc
                    hostPath:
                      path: /proc
                  - name: shared-memory
                    emptyDir:
                      medium: "Memory"
                      sizeLimit: 250Gi
                  - name: nccl-config
                    configMap:
                      name: nccl-configmap
                      defaultMode: 0755
                  containers:
                  - name: nccl-test
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.15
                    stdin: true
                    tty: true
                    securityContext:
                      privileged: true
                    env:
                    - name: LD_LIBRARY_PATH
                      value: /usr/local/nvidia/lib64
                    volumeMounts:
                    - name: nvidia
                      mountPath: /usr/local/nvidia
                    - name: shared-memory
                      mountPath: /dev/shm
                    - name: nccl-config
                      mountPath: /configs
                    resources:
                      limits:
                        cpu: "200"
                        memory: "3700Gi"
                        nvidia.com/gpu: 8
                      requests:
                        cpu: "200"
                        memory: "3700Gi"
                        nvidia.com/gpu: 8
                  - name: tcpxo-daemon
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/tcpgpudmarxd-dev:v1.0.21
                    imagePullPolicy: Always
                    command: ["/bin/sh", "-c"]
                    args:
                    - |
                      set -ex
                      chmod 755 /fts/entrypoint_rxdm_container.sh
                      /fts/entrypoint_rxdm_container.sh --num_hops=2 --num_nics=8 --uid= --alsologtostderr
                    securityContext:
                      privileged: true
                      capabilities:
                        add:
                        - NET_ADMIN
                        - NET_BIND_SERVICE
                    volumeMounts:
                    - name: nvidia
                      mountPath: /usr/local/nvidia/lib64
                    - name: proc
                      mountPath: /proc
                    env:
                    - name: LD_LIBRARY_PATH
                      value: /usr/local/nvidia/lib64
     
    

    A3 High

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: nccl-config
      data:
        allgather.sh: |
          #!/bin/bash
          for script in /configs/*; do
            name=$(basename $script)
            cp $script "/scripts/$name"
            chmod +x "/scripts/$name"
          done
          /scripts/init_ssh.sh ${@};
          pushd /scripts;
          /scripts/gen_hostfiles.sh ${@};
          popd;
          /scripts/run-allgather.sh 8 eth1,eth2,eth3,eth4 1M 512M ${#};
      ---
      apiVersion: jobset.x-k8s.io/v1alpha2
      kind: JobSet
      metadata:
        name: nccl-tas-test
        labels:
          kueue.x-k8s.io/queue-name: lq-tas
      spec:
        suspend: true
        network:
          enableDNSHostnames: true
        replicatedJobs:
        - name: worker
          replicas: 2
          template:
            spec:
              parallelism: 1
              completions: 1
              template:
                metadata:
                  annotations:
                    kueue.x-k8s.io/podset-preferred-topology: "cloud.google.com/gce-topology-block"
                    networking.gke.io/default-interface: 'eth0'
                    networking.gke.io/interfaces: |
                      [
                        {"interfaceName":"eth0","network":"default"},
                        {"interfaceName":"eth1","network":"vpc0"},
                        {"interfaceName":"eth2","network":"vpc1"},
                        {"interfaceName":"eth3","network":"vpc2"},
                        {"interfaceName":"eth4","network":"vpc3"}
                      ]
                spec:
                  terminationGracePeriodSeconds: 0
                  nodeSelector:
                    cloud.google.com/gke-accelerator: nvidia-h100-80gb
                  tolerations:
                  - key: cloud.google.com/gke-queued
                    effect: NoSchedule
                    value: "true"
                  - key: "nvidia.com/gpu"
                    operator: "Exists"
                    effect: "NoSchedule"
                  setHostnameAsFQDN: true
                  containers:
                  - name: tcpx-daemon
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/tcpgpudmarxd-dev:v2.0.11
                    command:
                    - /tcpgpudmarxd/build/app/tcpgpudmarxd
                    - --gpu_nic_preset
                    - a3vm
                    - --gpu_shmem_type
                    - fd
                    - --uds_path
                    - /run/tcpx
                    - --setup_param
                    - "--verbose 128 2 0 "
                    securityContext:
                      privileged: true
                      capabilities:
                        add:
                        - NET_ADMIN
                    volumeMounts:
                    - name: libraries
                      mountPath: /usr/local/nvidia/lib64
                    - name: tcpx-socket
                      mountPath: /run/tcpx
                    - name: sys
                      mountPath: /hostsysfs
                    - name: proc-sys
                      mountPath: /hostprocsysfs
                    env:
                    - name: LD_LIBRARY_PATH
                      value: /usr/local/nvidia/lib64
                  - name: nccl-test
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/nccl-plugin-gpudirecttcpx-dev:v3.1.8
                    command:
                    - bash
                    - -c
                    - |
                      /scripts/container_entry.sh daemon;
                      sleep infinity;
                    securityContext:
                      privileged: true
                    volumeMounts:
                    - name: tcpx-socket
                      mountPath: /tmp
                    - name: libraries
                      mountPath: /usr/local/nvidia/lib64
                    - name: nccl-config
                      mountPath: /configs
                    - name: shared-memory
                      mountPath: /dev/shm
                    resources:
                      limits:
                        cpu: "200"
                        memory: "1800Gi"
                        nvidia.com/gpu: 8
                      requests:
                        cpu: "200"
                        memory: "1800Gi"
                        nvidia.com/gpu: 8
                  volumes:
                  - name: libraries
                    hostPath:
                      path: /home/kubernetes/bin/nvidia/lib64
                  - name: tcpx-socket
                    emptyDir: {}
                  - name: sys
                    hostPath:
                      path: /sys
                  - name: proc-sys
                    hostPath:
                      path: /proc/sys
                  - name: shared-memory
                    emptyDir:
                      medium: Memory
                      sizeLimit: 250Gi
                  - name: nccl-config
                    configMap:
                      name: nccl-config
                      defaultMode: 0777
     
    
  2. Apply the manifest to your cluster:

      kubectl apply -f nccl-tas-jobset.yaml
    
  3. Check that the JobSet is admitted and running:

      kubectl get jobset nccl-tas-test
    

    Wait for the JobSet to be unsuspended and Pods to reach the Running status.

  4. Trigger the NCCL test by executing the allgather.sh script from the first worker Pod:

      kubectl exec --stdin --tty --container=nccl-test nccl-tas-test-worker-0-0 -- /configs/allgather.sh nccl-tas-test-worker-0-0 nccl-tas-test-worker-1-0
    

    The output for a two-node test is similar to the following:

    A3 Mega

     #                                                              out-of-place                       in-place
    #        size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
    #         (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
        0                 0         float    none      -1     0.24    0.00    0.00      0     0.18    0.00    0.00      0
        ...
        8589934592     134217728    float    none      -1    42603  201.63  189.03      0    42670  201.31  188.73      0
    # Out of bounds values : 0 OK
    # Avg bus bandwidth    : 45.7587 
    

    A3 High

     #                                                              out-of-place                       in-place
    #       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
    #        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
        1048576         16384     float    none      -1    696.8    1.50    1.41      0    729.0    1.44    1.35      0
        ...
        536870912       8388608     float    none      -1   7101.7   75.60   70.87      0   7060.9   76.03   71.28      0
    # Out of bounds values : 0 OK
    # Avg bus bandwidth    : 29.8293 
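
As a sanity check on these numbers: for the all_gather pattern, nccl-tests derives the busbw column from algbw with the correction factor (n - 1) / n, where n is the total number of GPU ranks. With two 8-GPU A3 nodes, n = 16. A minimal sketch that reproduces the final row of each table:

```python
def allgather_busbw(algbw_gbps: float, n_ranks: int) -> float:
    """Bus bandwidth for all_gather: busbw = algbw * (n - 1) / n."""
    return algbw_gbps * (n_ranks - 1) / n_ranks

# Two nodes x 8 GPUs = 16 ranks.
print(round(allgather_busbw(201.63, 16), 2))  # A3 Mega final row: 189.03
print(round(allgather_busbw(75.60, 16), 2))   # A3 High final row: ~70.87
```

If your measured busbw deviates sharply from this relation, the benchmark output itself is suspect, independent of the network.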
    

Deploy an NCCL test workload with TAS

If you have more than two nodes, we recommend using the following test, which uses Topology Aware Scheduling (TAS). To run NCCL tests with TAS on a GKE cluster that uses A3 Mega or A3 High Flex-start VMs, use the following procedure.
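Because the manifest uses a literal NUM_NODES placeholder, one way to substitute it is a sed pass before applying. The snippet below is a sketch: it uses a small stand-in file so it stays self-contained; in practice, run the same sed against your saved nccl-jobset-test.yaml and pipe the result to `kubectl apply -f -`.

```shell
# Substitute the NUM_NODES placeholder in a saved manifest.
# The two-line stand-in file keeps this example self-contained.
NUM_NODES=4
printf 'parallelism: NUM_NODES\ncompletions: NUM_NODES\n' > /tmp/nccl-jobset-template.yaml
sed "s/NUM_NODES/${NUM_NODES}/g" /tmp/nccl-jobset-template.yaml > /tmp/nccl-jobset-test.yaml
cat /tmp/nccl-jobset-test.yaml
```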

  1. Save the following manifest as nccl-jobset-test.yaml. Replace NUM_NODES with the number of nodes in the node pool:

    A3 Mega

      apiVersion: jobset.x-k8s.io/v1alpha2
      kind: JobSet
      metadata:
        name: nccl-ag
        labels:
          kueue.x-k8s.io/queue-name: lq-tas
      spec:
        ttlSecondsAfterFinished: 1200
        suspend: true
        network:
          enableDNSHostnames: true
        replicatedJobs:
        - name: worker
          template:
            spec:
              parallelism: NUM_NODES
              completions: NUM_NODES
              template:
                metadata:
                  annotations:
                    kueue.x-k8s.io/podset-preferred-topology: "cloud.google.com/gce-topology-subblock"
                    networking.gke.io/default-interface: 'eth0'
                    networking.gke.io/interfaces: |
                      [
                        {"interfaceName":"eth0","network":"default"},
                        {"interfaceName":"eth1","network":"vpc0"},
                        {"interfaceName":"eth2","network":"vpc1"},
                        {"interfaceName":"eth3","network":"vpc2"},
                        {"interfaceName":"eth4","network":"vpc3"},
                        {"interfaceName":"eth5","network":"vpc4"},
                        {"interfaceName":"eth6","network":"vpc5"},
                        {"interfaceName":"eth7","network":"vpc6"},
                        {"interfaceName":"eth8","network":"vpc7"}
                      ]
                spec:
                  activeDeadlineSeconds: 3600
                  restartPolicy: Never
                  nodeSelector:
                    cloud.google.com/gke-accelerator: nvidia-h100-mega-80gb
                  tolerations:
                  - key: "nvidia.com/gpu"
                    operator: "Exists"
                    effect: "NoSchedule"
                  setHostnameAsFQDN: true
                  volumes:
                  - name: proc
                    hostPath:
                      path: /proc
                  - name: nvidia
                    hostPath:
                      path: /home/kubernetes/bin/nvidia
                  - name: lib64
                    hostPath:
                      path: /lib64
                  - name: shared-memory
                    emptyDir:
                      medium: "Memory"
                      sizeLimit: 250Gi
                  containers:
                  - name: nccl-test
                    stdin: true
                    tty: true
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-tcpxo-diagnostic:v1.0.6
                    securityContext:
                      privileged: true
                    env:
                    - name: MY_NODE_NAME
                      valueFrom:
                        fieldRef:
                          fieldPath: spec.nodeName
                    - name: OMPI_ALLOW_RUN_AS_ROOT
                      value: "1"
                    - name: OMPI_ALLOW_RUN_AS_ROOT_CONFIRM
                      value: "1"
                    - name: N_NODES
                      value: "NUM_NODES"
                    - name: NCCL_SOCKET_IFNAME
                      value: eth0
                    - name: NCCL_FASTRAK_CTRL_DEV
                      value: eth0
                    - name: NCCL_FASTRAK_IFNAME
                      value: eth1,eth2,eth3,eth4,eth5,eth6,eth7,eth8
                    - name: NCCL_CROSS_NIC
                      value: "0"
                    - name: NCCL_ALGO
                      value: Ring,Tree
                    - name: NCCL_PROTO
                      value: Simple
                    - name: NCCL_NET_GDR_LEVEL
                      value: PIX
                    - name: LD_LIBRARY_PATH
                      value: /usr/local/nvidia/lib64
                    command:
                    - bash
                    - -c
                    - |
                      set -x
                      /scripts/container_entry.sh daemon &
                      export POSTFIX=$(hostname | cut -d . -f 2-)
                      export WORKERS_BASENAME=$(hostname | cut -d . -f 1 | rev | cut -d - -f 2- | rev )
                      export NODE_RANK=$JOB_COMPLETION_INDEX
                      for i in `seq 0 $(($N_NODES-1))`; do
                        OTHER=${WORKERS_BASENAME}-${i}.${POSTFIX}
      
     until ssh -p 222 -o StrictHostKeyChecking=no $OTHER hostname; do 
      
     sleep 10 
      
     done 
      
     echo ${OTHER} port=222 slots=8 | tee -a /tmp/hostfile; 
      
     done 
      
     if [[ "${NODE_RANK}" -eq "0" ]]; then 
      
     export NCCL_TESTS_SPLIT_MASK="0x0"; 
      
     ENV_VARS=$(echo ${!NCCL*} ${!OMPI*} LD_LIBRARY_PATH PATH | sed 's/ / -x /g') 
      
     mpirun --hostfile /tmp/hostfile \ 
      
     -x $ENV_VARS  \ 
      
     -mca plm_rsh_no_tree_spawn 1 \ 
      
     --mca orte_keep_fqdn_hostnames 1 \ 
      
     --mca btl self,tcp \ 
      
     --mca btl_tcp_if_include eth0 \ 
      
     --bind-to none \ 
      
     --mca plm_rsh_agent "ssh -q -o LogLevel=ERROR -o StrictHostKeyChecking=no -p 222" \ 
      
     /third_party/nccl-tests/build/all_gather_perf -b 1K -e 8G -f 2 -g 1 -w 5 --iters 100 -c 1 
      
     else 
      
     while ping -c 1 <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">W</span><span class="mord mathnormal" style="margin-right:0.00773em;">OR</span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="mord mathnormal" style="margin-right:0.00773em;">ER</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.05764em;">S</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em;"><span style="top:-2.55em;margin-left:-0.0576em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05017em;">B</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathnormal">A</span><span class="mord mathnormal" style="margin-right:0.10903em;">SEN</span><span class="mord mathnormal">A</span><span class="mord mathnormal" style="margin-right:0.05764em;">ME</span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.</span></span></span></span>{POSTFIX}; do 
      
     sleep 5 
      
     done 
      
     fi 
      
     exit 0 
      
     volumeMounts 
     : 
      
     - 
      
     name 
     : 
      
     nvidia 
      
     mountPath 
     : 
      
     /usr/local/nvidia 
      
     - 
      
     name 
     : 
      
     lib64 
      
     mountPath 
     : 
      
     /lib64 
      
     - 
      
     name 
     : 
      
     shared-memory 
      
     mountPath 
     : 
      
     /dev/shm 
      
     resources 
     : 
      
     limits 
     : 
      
     cpu 
     : 
      
     "200" 
      
     memory 
     : 
      
     "3700Gi" 
      
     nvidia.com/gpu 
     : 
      
     8 
      
     requests 
     : 
      
     cpu 
     : 
      
     "200" 
      
     memory 
     : 
      
     "3700Gi" 
      
     nvidia.com/gpu 
     : 
      
     8 
      
     - 
      
     name 
     : 
      
     tcpxo-daemon 
      
     image 
     : 
      
     us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/tcpxo-daemon:v1.0.1 
      
     imagePullPolicy 
     : 
      
     Always 
      
     command 
     : 
      
     - 
      
     bash 
      
     - 
      
     -c 
      
     - 
      
     | 
      
     /usr/bin/tcpxo_daemon 
      
     securityContext 
     : 
      
     privileged 
     : 
      
     true 
      
     volumeMounts 
     : 
      
     - 
      
     name 
     : 
      
     nvidia 
      
     mountPath 
     : 
      
     /usr/local/nvidia 
      
     - 
      
     name 
     : 
      
     proc 
      
     mountPath 
     : 
      
     /proc 
      
     env 
     : 
      
     - 
      
     name 
     : 
      
     LD_LIBRARY_PATH 
      
     value 
     : 
      
     /usr/local/nvidia/lib64 
     
    

    A3 High

      apiVersion: jobset.x-k8s.io/v1alpha2
      kind: JobSet
      metadata:
        name: nccl-ag
        labels:
          kueue.x-k8s.io/queue-name: lq-tas
      spec:
        ttlSecondsAfterFinished: 1200
        suspend: true
        network:
          enableDNSHostnames: true
        replicatedJobs:
        - name: worker
          template:
            spec:
              parallelism: NUM_NODES
              completions: NUM_NODES
              template:
                metadata:
                  annotations:
                    kueue.x-k8s.io/podset-preferred-topology: "cloud.google.com/gce-topology-subblock"
                    networking.gke.io/default-interface: 'eth0'
                    networking.gke.io/interfaces: |
                      [
                        {"interfaceName":"eth0","network":"default"},
                        {"interfaceName":"eth1","network":"vpc0"},
                        {"interfaceName":"eth2","network":"vpc1"},
                        {"interfaceName":"eth3","network":"vpc2"},
                        {"interfaceName":"eth4","network":"vpc3"}
                      ]
                spec:
                  activeDeadlineSeconds: 3600
                  restartPolicy: Never
                  nodeSelector:
                    cloud.google.com/gke-accelerator: nvidia-h100-80gb
                  tolerations:
                  - key: "nvidia.com/gpu"
                    operator: "Exists"
                    effect: "NoSchedule"
                  setHostnameAsFQDN: true
                  volumes:
                  - name: proc
                    hostPath:
                      path: /proc
                  - name: nvidia
                    hostPath:
                      path: /home/kubernetes/bin/nvidia
                  - name: libraries
                    hostPath:
                      path: /home/kubernetes/bin/nvidia/lib64
                  - name: tcpx-socket
                    emptyDir: {}
                  - name: shared-memory
                    emptyDir:
                      medium: "Memory"
                      sizeLimit: 250Gi
                  containers:
                  - name: tcpx-daemon
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/tcpgpudmarxd-dev:v2.0.11
                    command:
                    - /tcpgpudmarxd/build/app/tcpgpudmarxd
                    - --gpu_nic_preset
                    - a3vm
                    - --uds_path
                    - /run/tcpx
                    securityContext:
                      privileged: true
                    volumeMounts:
                    - name: tcpx-socket
                      mountPath: /run/tcpx
                    - name: libraries
                      mountPath: /usr/local/nvidia/lib64
                  - name: nccl-test
                    stdin: true
                    tty: true
                    image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/nccl-plugin-gpudirecttcpx-dev:v3.1.8
                    securityContext:
                      privileged: true
                    env:
                    - name: MY_NODE_NAME
                      valueFrom:
                        fieldRef:
                          fieldPath: spec.nodeName
                    - name: OMPI_ALLOW_RUN_AS_ROOT
                      value: "1"
                    - name: OMPI_ALLOW_RUN_AS_ROOT_CONFIRM
                      value: "1"
                    - name: N_NODES
                      value: "NUM_NODES"
                    - name: LD_LIBRARY_PATH
                      value: /usr/local/nvidia/lib64
                    command:
                    - bash
                    - -c
                    - |
                      /scripts/container_entry.sh daemon &
                      export POSTFIX=$(hostname | cut -d . -f 2-)
                      export WORKERS_BASENAME=$(hostname | cut -d . -f 1 | rev | cut -d - -f 2- | rev)
                      export NODE_RANK=$JOB_COMPLETION_INDEX
                      for i in `seq 0 $(($N_NODES-1))`; do
                        OTHER=${WORKERS_BASENAME}-${i}.${POSTFIX}
                        until ssh -p 222 -o StrictHostKeyChecking=no $OTHER hostname; do
                          sleep 10
                        done
                        echo ${OTHER} port=222 slots=8 | tee -a /tmp/hostfile;
                      done
                      if [[ "${NODE_RANK}" -eq "0" ]]; then
                        /scripts/run-allgather.sh 8 eth1,eth2,eth3,eth4 1M 512M ${N_NODES}
                      else
                        while ping -c 1 ${WORKERS_BASENAME}-0.${POSTFIX}; do
                          sleep 5
                        done
                      fi
                      exit 0
                    volumeMounts:
                    - name: nvidia
                      mountPath: /usr/local/nvidia
                    - name: tcpx-socket
                      mountPath: /tmp
                    - name: libraries
                      mountPath: /usr/local/nvidia/lib64
                    - name: shared-memory
                      mountPath: /dev/shm
                    resources:
                      limits:
                        cpu: "200"
                        memory: "1800Gi"
                        nvidia.com/gpu: 8
                      requests:
                        cpu: "200"
                        memory: "1800Gi"
                        nvidia.com/gpu: 8
    
  2. Apply the manifest:

      kubectl apply -f nccl-jobset-test.yaml
    
  3. Check that the workload is admitted by Kueue and that the JobSet reaches the Completed state.
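     For example, you can watch admission and completion status with commands like the following. The resource names assume the `nccl-ag` manifest above, applied in the default namespace; adjust them to match your setup.

     ```shell
     # Kueue workload status: the ADMITTED column should become True
     kubectl get workloads

     # JobSet status: the JobSet is complete when all of its Jobs finish
     kubectl get jobset nccl-ag

     # The Jobs that the JobSet created, selected by the label JobSet applies
     kubectl get jobs -l jobset.sigs.k8s.io/jobset-name=nccl-ag
     ```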

  4. Fetch logs for the Pod matching nccl-ag-worker-0-0-.* to see the results:

      kubectl logs $(kubectl get pods -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep nccl-ag-worker-0-0)
    

What's next
