Run NCCL on Compute Engine VMs

This page describes how to install NCCL/gIB from either a Debian software package (.deb) or a Red Hat Package Manager package (.rpm). After installation, you can run NCCL tests on A3 Ultra, A4, and A4X VMs. The examples on this page use two-node tests.

If you are using Google's 1P schedulers such as GKE and Cluster Toolkit (with Slurm and GKE support), then you don't need to follow the steps on this page. Instead, follow the instructions on the page that is appropriate for your scenario:

Install nccl-gib

Depending on where you run your workloads, you install NCCL/gIB in either the guest VM or the container image.

The nccl-gib package bundles an unmodified NVIDIA NCCL library (libnccl2.so) and its headers. All NCCL/gIB content is installed in the /usr/local/gib directory. Some dependencies are also fetched from the distribution's repository.

Debian 12+/Ubuntu 20.04+ (.deb package)

# If not using an image from Google, trust the GCP signing key
curl http://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/cloud.google.gpg

# Add gpudirect-gib-apt repo
echo 'deb https://packages.cloud.google.com/apt gpudirect-gib-apt main' | sudo tee /etc/apt/sources.list.d/nccl-gib.list

sudo apt update
sudo apt install nccl-gib

RockyLinux/CentOS/RHEL 9+ (.rpm package)

# Add gpudirect-gib-rpm repo
sudo tee -a /etc/yum.repos.d/nccl-gib.repo << EOL
[gpudirect-gib-rpm]
name=NCCL/gIB
baseurl=https://packages.cloud.google.com/yum/repos/gpudirect-gib-rpm
enabled=1
repo_gpgcheck=0
gpgcheck=0
EOL

sudo dnf makecache
sudo dnf install nccl-gib

If you are using standard OS images, you must also install the latest NVIDIA DOCA-OFED driver. You don't need to install this driver if you are using Google's A* optimized images, such as Container OS or Guest Accelerator Ubuntu/RockyLinux OS Images.

To avoid VMs running different versions of the nccl-gib package, we recommend that you update nccl-gib before you run your NCCL workloads or disable unattended-upgrades.
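
One way to spot version skew before a run is to collect the installed nccl-gib version from each VM and compare the results. The following is a minimal sketch, not part of the nccl-gib package: the helper names check_same_version and collect_versions are illustrative, the hostnames are taken from the example later on this page, and it assumes Debian or Ubuntu VMs (dpkg-query) with non-interactive SSH already set up.

```shell
#!/usr/bin/env bash
# Sketch: detect nccl-gib version skew across VMs (helper names are illustrative).

# Reads "host version" pairs on stdin; succeeds only if every version matches.
check_same_version() {
  awk '
    NF < 2      { printf "no version reported for %s\n", $1; bad = 1; next }
    !seen       { first = $2; seen = 1 }
    $2 != first { printf "version mismatch on %s: %s (expected %s)\n", $1, $2, first; bad = 1 }
    END         { if (bad || !seen) exit 1 }
  '
}

# Prints "host version" for each host given as an argument (Debian/Ubuntu only).
# On RPM-based images, a query such as rpm -q --qf '%{VERSION}' nccl-gib would be needed instead.
collect_versions() {
  local host
  for host in "$@"; do
    printf '%s %s\n' "$host" \
      "$(ssh "$host" "dpkg-query -W -f='\${Version}' nccl-gib" 2>/dev/null)"
  done
}

# Usage (assumes non-interactive SSH to both VMs):
#   collect_versions a4-vm-1 a4-vm-2 | check_same_version
```

A non-zero exit status from check_same_version indicates that at least one VM reported a different version or no version at all.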

Use NCCL/gIB

To enable NCCL/gIB in your workloads, ensure the following:

  • /usr/local/gib/scripts/set_nccl_env.sh is sourced in your runtime environment. This script sets all the environment variables that NCCL/gIB needs, and Google expects to update them in future NCCL/gIB releases.
  • The /usr/local/gib/lib64 directory is in your LD_LIBRARY_PATH.
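
The two requirements above can be met in a shell startup file or job script. The following is a minimal sketch under stated assumptions: GIB_ROOT is an illustrative variable name (not something the package defines), and the snippet only warns if the environment script is missing.

```shell
#!/usr/bin/env bash
# Sketch: enable NCCL/gIB in the current shell (GIB_ROOT is an illustrative variable).
GIB_ROOT="${GIB_ROOT:-/usr/local/gib}"

# Source the environment script shipped with the package, if it is installed.
if [ -f "$GIB_ROOT/scripts/set_nccl_env.sh" ]; then
  . "$GIB_ROOT/scripts/set_nccl_env.sh"
else
  echo "warning: $GIB_ROOT/scripts/set_nccl_env.sh not found" >&2
fi

# Prepend lib64 to LD_LIBRARY_PATH unless it is already listed.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$GIB_ROOT/lib64:"*) ;;
  *) export LD_LIBRARY_PATH="$GIB_ROOT/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
```

Sourcing the script on every start, rather than copying its variables, picks up any changes that a later nccl-gib release makes to them.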

To verify that NCCL/gIB is enabled, check that the following NCCL INFO-level log entries are present:

# A sample log entry from NCCL core
vm-0:606:642 [6] NCCL INFO Using network gIB

# A sample log entry from the gIB network plugin
vm-0:606:642 [6] NCCL INFO NET/gIB : Initializing gIB v1.0.5
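
If the workload's output is saved to a file, the check for these two entries can be scripted. The following is a minimal sketch: check_gib_log is an illustrative helper (not part of the nccl-gib package), the log file name in the usage comment is hypothetical, and it assumes the workload ran with NCCL_DEBUG=INFO so that INFO-level entries are emitted.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a captured NCCL log shows the gIB plugin in use.
# check_gib_log is an illustrative helper, not part of the nccl-gib package.
check_gib_log() {
  local log_file="$1"
  # Both the NCCL core entry and the plugin entry must be present.
  grep -q 'NCCL INFO Using network gIB' "$log_file" &&
    grep -q 'NCCL INFO NET/gIB : Initializing gIB' "$log_file"
}

# Usage: run the workload with NCCL_DEBUG=INFO, save its output, then:
#   check_gib_log nccl.log && echo "NCCL/gIB is enabled"
```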

Run NCCL tests

To learn how to run NCCL tests in a scheduled environment, see the following:

We also publish a diagnostic container image with everything included at http://us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib-diagnostic:latest.

To run NCCL tests in a non-scheduled environment:

  1. Install cuda-12.8 (or newer) and openmpi.
  2. Set up non-interactive SSH logins among the VMs.
  3. Build nccl-tests with MPI enabled. When building nccl-tests, set NCCL_HOME=/usr/local/gib.
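
Step 3 can be sketched as follows. This is an illustration under stated assumptions rather than a definitive procedure: the CUDA_HOME and MPI_HOME defaults are guesses that depend on your image and architecture, SRC_DIR is chosen so that the binaries land where the script in the next section expects them, and build_nccl_tests is an illustrative helper name. The make variables (MPI, MPI_HOME, CUDA_HOME, NCCL_HOME) are the ones that the nccl-tests Makefile accepts.

```shell
#!/usr/bin/env bash
# Sketch: build nccl-tests with MPI against the NCCL shipped in /usr/local/gib.
# CUDA_HOME, MPI_HOME, and SRC_DIR defaults are assumptions; adjust them for your image.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
MPI_HOME="${MPI_HOME:-/usr/lib/x86_64-linux-gnu/openmpi}"
SRC_DIR="${SRC_DIR:-/opt/nccl-tests}"

build_nccl_tests() {
  # Fetch the sources once, then build with MPI support enabled.
  [ -d "$SRC_DIR" ] || git clone https://github.com/NVIDIA/nccl-tests.git "$SRC_DIR"
  make -C "$SRC_DIR" MPI=1 MPI_HOME="$MPI_HOME" CUDA_HOME="$CUDA_HOME" NCCL_HOME=/usr/local/gib
}

# Usage: build_nccl_tests   (binaries land in $SRC_DIR/build/)
```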

To run the script shipped with the NCCL/gIB package:

# The script assumes binaries at /opt/nccl-tests/build/
$ /usr/local/gib/scripts/run_nccl_tests.sh -d /opt/nccl-tests/build/ -p 22 -t all_gather -m 0x0 -b 4K -e 16G a4-vm-1 a4-vm-2

Example output on two A4 VMs:

 NCCL version 2.25.1+cuda12.8
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
        4096            64     float    none      -1    59.97    0.07    0.06      0    57.49    0.07    0.07      0
        8192           128     float    none      -1    58.17    0.14    0.13      0    58.36    0.14    0.13      0
       16384           256     float    none      -1    59.07    0.28    0.26      0    59.03    0.28    0.26      0
       32768           512     float    none      -1    60.93    0.54    0.50      0    60.79    0.54    0.51      0
       65536          1024     float    none      -1    61.93    1.06    0.99      0    62.17    1.05    0.99      0
      131072          2048     float    none      -1    64.62    2.03    1.90      0    64.48    2.03    1.91      0
      262144          4096     float    none      -1    66.50    3.94    3.70      0    67.05    3.91    3.67      0
      524288          8192     float    none      -1    69.37    7.56    7.09      0    67.83    7.73    7.25      0
     1048576         16384     float    none      -1    117.2    8.95    8.39      0    113.7    9.22    8.64      0
     2097152         32768     float    none      -1    118.8   17.65   16.55      0    118.1   17.75   16.64      0
     4194304         65536     float    none      -1    122.2   34.32   32.17      0    122.6   34.22   32.08      0
     8388608        131072     float    none      -1    132.2   63.44   59.48      0    130.7   64.20   60.18      0
    16777216        262144     float    none      -1    139.2  120.49  112.96      0    139.7  120.07  112.56      0
    33554432        524288     float    none      -1    152.0  220.81  207.01      0    152.1  220.59  206.81      0
    67108864       1048576     float    none      -1    227.6  294.87  276.44      0    225.9  297.08  278.51      0
   134217728       2097152     float    none      -1    431.7  310.87  291.44      0    438.0  306.41  287.26      0
   268435456       4194304     float    none      -1    728.6  368.44  345.41      0    735.9  364.79  341.99      0
   536870912       8388608     float    none      -1   1404.2  382.33  358.44      0   1418.4  378.51  354.85      0
  1073741824      16777216     float    none      -1   2795.8  384.06  360.05      0   2768.9  387.79  363.55      0
  2147483648      33554432     float    none      -1   5440.1  394.75  370.08      0   5418.7  396.31  371.54      0
  4294967296      67108864     float    none      -1    10754  399.40  374.43      0    10746  399.67  374.69      0
  8589934592     134217728     float    none      -1    21434  400.77  375.72      0    21421  401.01  375.95      0
 17179869184     268435456     float    none      -1    42679  402.53  377.38      0    42792  401.48  376.38      0 
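
To compare runs, it can help to pull a single figure out of this table. The following is a minimal sketch that reports the message size and value of the peak out-of-place bus bandwidth from saved output. peak_busbw is an illustrative helper, and it assumes the 13-column layout shown above, where column 8 is the out-of-place busbw in GB/s.

```shell
#!/usr/bin/env bash
# Sketch: report the peak out-of-place bus bandwidth from saved nccl-tests output.
# peak_busbw is an illustrative helper; column 8 is the out-of-place busbw (GB/s).
peak_busbw() {
  awk '
    # Data rows start with a numeric size and have 13 columns; headers start with "#".
    $1 ~ /^[0-9]+$/ && NF >= 13 && $8 + 0 > max { max = $8 + 0; size = $1 }
    END { if (size != "") printf "%s %.2f\n", size, max }
  ' "$@"
}

# Usage: peak_busbw results.txt   (prints "<size in bytes> <busbw in GB/s>")
```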

What's next
