Test performance

The examples in this section show common commands that we recommend for evaluating performance with the IOR benchmark tool (GitHub).

Before you install IOR, install MPI, which IOR uses to synchronize the benchmarking processes. We recommend using the HPC image for client VMs, which includes tooling to install Intel MPI 2021. For Ubuntu clients, we recommend Open MPI.

Check network performance

Before running IOR, it can help to confirm that your network delivers the expected throughput. If you have two client VMs, you can use a tool called iperf to test the network between them.

Install iperf on both VMs:

HPC Rocky 8

```
sudo dnf -y install iperf
```

Ubuntu

```
sudo apt install -y iperf
```

Start an iperf server on one of your VMs:

```
iperf -s -w 100m -P 30
```

Start an iperf client on the other VM:

```
iperf -c <IP ADDRESS OF iperf server VM> -w 100m -t 30s -P 30
```

Observe the network throughput number between the VMs. For the highest single-client performance, ensure that Tier_1 networking is used.
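You may want to record the aggregate figure instead of reading it off the screen. In that case, save the client output and extract the summary line that iperf prints when -P runs several parallel streams. The following is a minimal sketch. It assumes the default iperf 2 report format, in which the summary line starts with [SUM] and ends with the bandwidth value and its unit. The function name iperf_sum is hypothetical, and the sample line is made up, not a measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the aggregate bandwidth from iperf 2 client output.
# Assumes the default report format, where the summary line (printed when -P
# starts several parallel streams) begins with "[SUM" and ends with
# "<value> <unit>/sec".
iperf_sum() {
  # Read iperf output on stdin; remember the bandwidth of the last [SUM] line.
  awk '/^\[ *SUM/ { bw = $(NF-1) " " $NF } END { if (bw != "") print bw }'
}

# Illustrative input only: made-up numbers in the assumed format.
sample='[SUM] 0.0000-30.0000 sec  86.0 GBytes  24.6 Gbits/sec'
echo "$sample" | iperf_sum
```

For a real run, you could pipe the client command through it, for example: iperf -c <IP ADDRESS OF iperf server VM> -w 100m -t 30s -P 30 | tee iperf.log | iperf_sum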

Single VM performance

The following instructions provide steps and benchmarks to measure single VM performance. The tests run multiple I/O processes into and out of Parallelstore with the intention of saturating the network interface card (NIC).

Install Intel MPI

HPC Rocky 8

```
sudo google_install_intelmpi --impi_2021
```

To specify the correct libfabric networking stack, set the following variable in your environment:

```
export I_MPI_OFI_LIBRARY_INTERNAL=0
```

Then:

```
source /opt/intel/setvars.sh
```

Ubuntu

```
sudo apt install -y autoconf
sudo apt install -y pkg-config
sudo apt install -y libopenmpi-dev
sudo apt install -y make
```

Install IOR

To install IOR:

```
git clone https://github.com/hpc/ior.git
cd ior
./bootstrap
./configure
make
sudo make install
```
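The IOR commands that follow expect mpirun and ior to be on your PATH. As a quick pre-flight check, the following sketch reports any of those commands that cannot be found. It relies only on the bash builtin command -v. The function name require_cmds is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report every argument that is not a command
# found on PATH. Returns 0 only if all of them are available.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the tools used by the IOR commands in this section.
require_cmds mpirun ior || echo "Install MPI and IOR, or add them to PATH."
```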

Run the IOR commands

Run the following IOR commands. To view expected performance numbers, see the Parallelstore overview.

Max performance from a single client VM

HPC Rocky 8

```
mpirun -genv LD_PRELOAD="/usr/lib64/libioil.so" -ppn 1 \
    --bind-to socket ior \
    -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "1m" -b "8g"
```

Ubuntu

```
mpirun --oversubscribe -x LD_PRELOAD="/usr/lib64/libioil.so" -n 1 \
    ior -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "1m" -b "8g"
```

Where:

ior: the benchmark binary. Make sure it is on your PATH, or provide the full path.
-ppn (Intel MPI) or -n (Open MPI): the number of processes to run on the VM. We recommend starting with 1 and then increasing up to the number of vCPUs to achieve maximum aggregate performance.
-O useO_DIRECT=1: force the use of direct I/O to bypass the page cache and avoid reading cached data.
-genv LD_PRELOAD="/usr/lib64/libioil.so" (-x with Open MPI): use the DAOS interception library. This option delivers the highest raw performance but bypasses the Linux page cache for data. Metadata is still cached.
-w: perform writes.
-r: perform reads.
-e: perform fsync when writes complete.
-F: use one file per process.
-t "1m": read and write data in chunks of the specified size. Larger chunk sizes result in better single-thread streaming I/O performance.
-b "8g": the size of each file.
-z: access random, rather than sequential, offsets within each file (used in the small I/O latency test).
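Because -F gives each process its own file, the data a run writes (and then reads back) is the number of processes multiplied by -b. Each process also issues -b divided by -t transfers in each phase. The following sketch works through that arithmetic for the single-VM IOPS example (80 processes, -t 4k, -b 1g). It assumes IOR treats the k, m, and g suffixes as powers of 1024, and it assumes the default of one segment per file. The helper name to_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an IOR-style size ("4k", "1m", "8g") to bytes.
# Assumes k/m/g are powers of 1024; a plain number is taken as bytes.
to_bytes() {
  local value=$1 num unit
  num=${value%[kKmMgG]}      # strip one trailing unit letter, if present
  unit=${value#"$num"}       # the stripped letter (empty if there was none)
  case "$unit" in
    k|K) echo $(( num * 1024 )) ;;
    m|M) echo $(( num * 1024 * 1024 )) ;;
    g|G) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)   echo $(( num )) ;;
  esac
}

# Values from the single-VM IOPS example.
procs=80
block=$(to_bytes 1g)         # -b: bytes per file (one file per process, -F)
xfer=$(to_bytes 4k)          # -t: bytes per I/O call

transfers_per_proc=$(( block / xfer ))
total_bytes=$(( procs * block ))

echo "transfers per process per phase: ${transfers_per_proc}"
echo "data written (then read): $(( total_bytes / 1024 / 1024 / 1024 )) GiB"
```

With these values, the sketch prints 262144 transfers per process and 80 GiB in total.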

Max IOPS from a single client VM

HPC Rocky 8

```
mpirun -genv LD_PRELOAD="/usr/lib64/libioil.so" -ppn 80 \
    --bind-to socket ior \
    -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "4k" -b "1g"
```

Ubuntu

```
mpirun --oversubscribe -x LD_PRELOAD="/usr/lib64/libioil.so" -n 80 \
    ior -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "4k" -b "1g"
```

Max performance from a single application thread

HPC Rocky 8

```
mpirun -genv LD_PRELOAD="/usr/lib64/libioil.so" -ppn 1 \
    --bind-to socket ior \
    -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "32m" -b "64g"
```

Ubuntu

```
mpirun -x LD_PRELOAD="/usr/lib64/libioil.so" -n 1 \
    ior -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "32m" -b "64g"
```

Small I/O latency from a single application thread

HPC Rocky 8

```
mpirun -genv LD_PRELOAD="/usr/lib64/libioil.so" -ppn 1 \
    --bind-to socket ior \
    -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -z -w -r -e -F -t "4k" -b "100m"
```

Ubuntu

```
mpirun -x LD_PRELOAD="/usr/lib64/libioil.so" -n 1 \
    ior -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -z -w -r -e -F -t "4k" -b "100m"
```

Multi-VM performance tests

To reach the limits of a Parallelstore instance, you need to test the aggregate I/O that parallel I/O from multiple VMs can achieve. This section provides instructions and commands for doing this with mpirun and ior.

See the IOR guide for the full set of options that are useful for testing on a larger set of nodes. There are several ways to launch client VMs for multi-client testing: schedulers such as Batch or Slurm, or the Compute Engine bulk commands. The HPC Toolkit can also help you build templates to deploy compute nodes.

This guide uses the following steps to deploy multiple client instances configured to use Parallelstore:

Create an SSH key to use to set up a user on each client VM. You must disable the OS Login requirement on the project if it has been enabled.
Get the access points of the Parallelstore instance.
Create a startup script to deploy to all client instances.
Bulk create the Compute Engine VMs using the startup script and key.
Copy the keys and host files needed to run the tests.

Details for each step are in the following sections.

Set environment variables

The following environment variables are used in the example commands in this document:

```
export SSH_USER="daos-user"
export CLIENT_PREFIX="daos-client-vm"
export NUM_CLIENTS=10
```

Update these to your desired values.
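If one of these variables is empty, later commands can fail in confusing ways. For example, an empty NUM_CLIENTS would reach the --count flag of the bulk create command. A minimal guard can stop early instead, using bash's ${VAR:?message} expansion plus an integer check. The function name check_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail fast if a required variable is unset or empty,
# and make sure NUM_CLIENTS is a positive integer.
# Note: in a non-interactive shell, a failed ${VAR:?} expansion exits the shell.
check_env() {
  : "${SSH_USER:?SSH_USER is not set}"
  : "${CLIENT_PREFIX:?CLIENT_PREFIX is not set}"
  : "${NUM_CLIENTS:?NUM_CLIENTS is not set}"
  if ! [[ "$NUM_CLIENTS" =~ ^[1-9][0-9]*$ ]]; then
    echo "NUM_CLIENTS must be a positive integer, got '$NUM_CLIENTS'" >&2
    return 1
  fi
}

export SSH_USER="daos-user"
export CLIENT_PREFIX="daos-client-vm"
export NUM_CLIENTS=10
check_env && echo "environment OK"
```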

Create an SSH key

Create an SSH key and save it locally so that it can be distributed to the client VMs. The key is associated with the SSH user specified in the environment variables, and that user is created on each VM:

```
# Generate an SSH key for the specified user
ssh-keygen -t rsa -b 4096 -C "${SSH_USER}" -N '' -f "./id_rsa"
chmod 600 "./id_rsa"

# Create a new file in the format [user]:[public key] user
echo "${SSH_USER}:$(cat "./id_rsa.pub") ${SSH_USER}" > "./keys.txt"
```

Get Parallelstore network details

Get the Parallelstore server IP addresses in a format that the DAOS agent can consume:

```
export ACCESS_POINTS=$(gcloud beta parallelstore instances describe INSTANCE_NAME \
  --location LOCATION \
  --format "value[delimiter=', '](format("{0}", accessPoints))")
```

Get the network name associated with the Parallelstore instance:

```
export NETWORK=$(gcloud beta parallelstore instances describe INSTANCE_NAME \
  --location LOCATION \
  --format "value[delimiter=', '](format('{0}', network))" | awk -F '/' '{print $NF}')
```
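The describe command appears to return the network as a resource path, which is why the command splits on '/'. The awk -F '/' '{print $NF}' step keeps only the last path component, and the bulk create command later passes that value as subnet=${NETWORK}. The following sketch applies the same transformation to a hypothetical path (projects/my-project/global/networks/my-network) and adds a guard for an empty result.

```shell
#!/usr/bin/env bash
# Hypothetical example value; the real one comes from the describe command above.
network_path="projects/my-project/global/networks/my-network"

# Same extraction as above: split on "/" and print the last field.
NETWORK=$(printf '%s\n' "$network_path" | awk -F '/' '{print $NF}')
echo "NETWORK=${NETWORK}"

# Guard against an empty result (for example, a wrong INSTANCE_NAME or LOCATION).
if [ -z "$NETWORK" ]; then
  echo "NETWORK is empty; check the describe command" >&2
fi
```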

Create the startup script

The startup script is attached to the VM and runs every time the system starts. The startup script does the following:

Configures the DAOS agent
Installs required libraries
Mounts your Parallelstore instance to /tmp/parallelstore/ on each VM
Installs performance testing tools

You can also use this script to deploy your custom applications to multiple machines. To do so, edit the application-specific section of the script.

The following script works on VMs running HPC Rocky 8.

```
# Create a startup script that configures the VM
cat > ./startup-script << EOF
# 1) Add the Parallelstore package repository
sudo tee /etc/yum.repos.d/parallelstore-v2-6-el8.repo << INNEREOF
[parallelstore-v2-6-el8]
name=Parallelstore EL8 v2.6
baseurl=https://us-central1-yum.pkg.dev/projects/parallelstore-packages/v2-6-el8
enabled=1
repo_gpgcheck=0
gpgcheck=0
INNEREOF
sudo dnf makecache

# 2) Install daos-client
dnf install -y epel-release # needed for capstone
dnf install -y daos-client

# 3) Upgrade libfabric
dnf upgrade -y libfabric

systemctl stop daos_agent

mkdir -p /etc/daos
cat > /etc/daos/daos_agent.yml << INNEREOF
access_points: ${ACCESS_POINTS}

transport_config:
  allow_insecure: true

fabric_ifaces:
- numa_node: 0
  devices:
  - iface: eth0
    domain: eth0
INNEREOF

echo -e "Host *\n\tStrictHostKeyChecking no\n\tUserKnownHostsFile /dev/null" > /home/${SSH_USER}/.ssh/config
chmod 600 /home/${SSH_USER}/.ssh/config

usermod -u 2000 ${SSH_USER}
groupmod -g 2000 ${SSH_USER}
chown -R ${SSH_USER}:${SSH_USER} /home/${SSH_USER}
chown -R daos_agent:daos_agent /etc/daos/

systemctl enable daos_agent
systemctl start daos_agent

mkdir -p /tmp/parallelstore
dfuse -m /tmp/parallelstore --pool default-pool --container default-container --disable-wb-cache --thread-count=16 --eq-count=8 --multi-user
chmod 777 /tmp/parallelstore

# Application specific code
# Install Intel MPI:
sudo google_install_intelmpi --impi_2021
export I_MPI_OFI_LIBRARY_INTERNAL=0
source /opt/intel/setvars.sh

# Install IOR
git clone https://github.com/hpc/ior.git
cd ior
./bootstrap
./configure
make
make install
EOF
```
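The outer heredoc delimiter (EOF) is unquoted, so ${ACCESS_POINTS} and ${SSH_USER} are expanded when you create the file on your workstation, not when each VM boots. Before you attach the file to VMs, it may be worth confirming that those values were filled in and that the script still parses. The following is a minimal sketch. The function name check_startup_script is hypothetical, and the checks are heuristics rather than a full validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the generated startup script.
check_startup_script() {
  local file=$1 ok=0
  # bash -n parses the script without running any of its commands.
  if ! bash -n "$file"; then
    echo "syntax error in $file" >&2; ok=1
  fi
  # An empty ACCESS_POINTS at generation time leaves "access_points:" bare.
  if grep -Eq '^access_points:[[:space:]]*$' "$file"; then
    echo "access_points is empty in $file" >&2; ok=1
  fi
  # An empty SSH_USER at generation time leaves paths like "/home//.ssh/config".
  if grep -q '/home//' "$file"; then
    echo "SSH_USER was empty when $file was generated" >&2; ok=1
  fi
  return "$ok"
}

# Real usage, after running the cat command above:
#   check_startup_script ./startup-script
```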

Create the client VMs

The overall performance of your workloads depends on the client machine types. The following example uses c2-standard-30 VMs; to increase performance with faster NICs, change the machine-type value. See the Machine families resource and comparison guide for details of the available machine types.

To create VM instances in bulk, use the gcloud compute instances bulk create command:

```
gcloud compute instances bulk create \
  --name-pattern="${CLIENT_PREFIX}-####" \
  --zone="LOCATION" \
  --machine-type="c2-standard-30" \
  --network-interface=subnet=${NETWORK},nic-type=GVNIC \
  --network-performance-configs=total-egress-bandwidth-tier=TIER_1 \
  --create-disk=auto-delete=yes,boot=yes,device-name=client-vm1,image=projects/cloud-hpc-image-public/global/images/hpc-rocky-linux-8-v20240126,mode=rw,size=100,type=pd-balanced \
  --metadata=enable-oslogin=FALSE \
  --metadata-from-file=ssh-keys=./keys.txt,startup-script=./startup-script \
  --count ${NUM_CLIENTS}
```

Copy keys and files

Retrieve and save the private and public IP addresses for all VMs.

Private IPs:

```
gcloud compute instances list --filter="name ~ '^${CLIENT_PREFIX}*'" --format="csv[no-heading](INTERNAL_IP)" > hosts.txt
```

Public IPs:

```
gcloud compute instances list --filter="name ~ '^${CLIENT_PREFIX}*'" --format="csv[no-heading](EXTERNAL_IP)" > external_ips.txt
```
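Instances that are still being created, or that have no external IP, may leave these files short. Also, the filter matches by name prefix, so unrelated VMs that share the prefix would be included. A small check that each file has NUM_CLIENTS non-empty lines can catch both cases before you copy keys. The function names below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count non-empty lines in a host list file.
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
count_hosts() {
  grep -c '[^[:space:]]' "$1" || true
}

# Hypothetical check: the file must list exactly the expected number of hosts.
check_host_file() {
  local file=$1 expected=$2 found
  found=$(count_hosts "$file")
  found=${found:-0}          # a missing file yields no output
  if [ "$found" -ne "$expected" ]; then
    echo "$file: expected $expected hosts, found $found" >&2
    return 1
  fi
}

# Real usage:
#   check_host_file hosts.txt "$NUM_CLIENTS" && check_host_file external_ips.txt "$NUM_CLIENTS"
```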

Copy the private key to allow passwordless SSH between nodes. This is required because the IOR test uses SSH to orchestrate the machines.

```
while IFS= read -r IP
do
    echo "Copying id_rsa to ${SSH_USER}@$IP"
    scp -i ./id_rsa -o StrictHostKeyChecking=no ./id_rsa ${SSH_USER}@$IP:~/.ssh/
done < "./external_ips.txt"
```

Retrieve the IP of the first node, and copy the list of internal IPs to that node. This will be the head node for the test run.

```
export HEAD_NODE=$(head -n 1 ./external_ips.txt)
scp -i ./id_rsa -o "StrictHostKeyChecking=no" -o UserKnownHostsFile=/dev/null ./hosts.txt ${SSH_USER}@${HEAD_NODE}:~
```

Run IOR commands on multiple VMs

Connect to the head node with the specified user:

```
ssh -i ./id_rsa -o "StrictHostKeyChecking=no" -o UserKnownHostsFile=/dev/null ${SSH_USER}@${HEAD_NODE}
```

Then:

```
export I_MPI_OFI_LIBRARY_INTERNAL=0
source /opt/intel/setvars.sh
export D_LOG_MASK=INFO
export D_LOG_FILE_APPEND_PID=1
rm -f /tmp/client.log.*
export D_LOG_FILE=/tmp/client.log
```

Max performance from multiple client VMs

Test performance in a multi-process, maximum throughput scenario.

```
mpirun -f hosts.txt -genv LD_PRELOAD="/usr/lib64/libioil.so" -ppn 30 \
    --bind-to socket ior \
    -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "1m" -b "8g"
```

Max IOPS from multiple client VMs

Test performance in a multi-process, maximum IOPS scenario.

```
mpirun -f hosts.txt -genv LD_PRELOAD="/usr/lib64/libioil.so" -ppn 30 \
    --bind-to socket ior \
    -o "/tmp/parallelstore/test" -O useO_DIRECT=1 \
    -w -r -e -F -t "4k" -b "1g"
```

Cleanup

Unmount the DAOS container:
```
sudo umount /tmp/parallelstore/
```

Delete the Parallelstore instance:

gcloud CLI

```
gcloud beta parallelstore instances delete INSTANCE_NAME --location=LOCATION
```

REST

```
curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://parallelstore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_NAME
```

Delete the Compute Engine VMs:
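```
gcloud compute instances delete $(gcloud compute instances list --filter="name ~ '^${CLIENT_PREFIX}*'" --format="value(name)") --zone=LOCATION
```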
