This document explains how to create a cluster in Cluster Director where you fully customize the compute, networking, and storage resources for your specific artificial intelligence (AI), machine learning (ML), or high performance computing (HPC) workloads.
This process lets you design a fault-tolerant and highly scalable Slurm environment with your own custom specifications, helping your cluster meet the needs of your workloads. To create a cluster based on a template that is optimized for running AI and ML workloads, see Create an AI-optimized cluster based on a template instead.
Limitations
When you create a cluster in Cluster Director, the following limitations apply:
- Regional scope: clusters are regional resources. You can only create or use compute resources, storage resources, and subnetworks that exist within the same region as your cluster.
- Compute resource configuration per nodeset:
- You can only specify one compute resource configuration for each nodeset in your cluster.
- To create A4X VMs in a cluster, the total number of VMs for each nodeset with A4X VMs must be a multiple of 18. This total is the sum of the static node count and the dynamic node count.
- Storage classes for new Cloud Storage buckets: if you plan to create one or more buckets when creating a cluster, then you can only specify the Standard storage class or Autoclass. To use other storage classes, update the bucket after you create the cluster.
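The A4X sizing rule above can be checked before you submit a create request. The following sketch is a hypothetical pre-flight helper (not part of any Google SDK) that applies the multiple-of-18 rule to a nodeset's static and dynamic node counts:

```python
# Hypothetical pre-flight check for the A4X nodeset limitation described
# above: the total number of VMs per nodeset (static node count plus
# dynamic node count) must be a multiple of 18.

def a4x_nodeset_is_valid(static_count: int, dynamic_count: int) -> bool:
    """Return True if an A4X nodeset's total VM count is a multiple of 18."""
    total = static_count + dynamic_count
    return total > 0 and total % 18 == 0
```

For example, a nodeset with 18 static and 18 dynamic nodes is valid, while one with 10 static and 4 dynamic nodes (total 14) is not.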
Before you begin
Before you create a cluster in Cluster Director, do the following:
-
Choose consumption options. If you haven't already, then you must choose the consumption options for the virtual machine (VM) instances that you want to use in each partition for your cluster. Each consumption option determines the availability, obtainability, and pricing for your VMs.
To learn more, see Choose a consumption option .
-
Obtain capacity and quota. Based on your chosen consumption option, review the quota requirements for the VMs that you want to create in the cluster. If you lack sufficient quota, then creating your cluster fails.
To learn more, see Capacity and quota overview .
-
Verify usable reservations. If you want to create your cluster by using one or more reservations, then verify that the reservations have enough available resources to create your chosen number of VMs in the cluster. Otherwise, skip this step.
To learn more, see Consumable VMs in a reservation .
-
Verify trusted image policy. If the organization in which your project exists has a trusted image policy (constraints/compute.trustedImageProjects), then verify that the clusterdirector-public-images project is included in the list of allowed projects. To learn more, see Setting up trusted image policies.
-
Verify existing resource requirements. If you plan to use existing storage or networking resources in your cluster instead of creating new ones, then you must verify that those resources are correctly configured. Otherwise, skip this step.
To learn more, see Cluster creation process overview .
-
Authenticate. To use the samples on this page, you might need to authenticate to Google.
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity .
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permissions that you need to create a custom cluster from scratch, ask your administrator to grant you the following IAM roles:
- To create and manage a cluster:
  - Cluster Director Editor (roles/hypercomputecluster.editor) on your project
  - Service Account User (roles/iam.serviceAccountUser) on the Compute Engine default service account
- To manage resources in a cluster:
  - Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) on your project
  - Logs Writer (roles/logging.logWriter) on the Compute Engine default service account
  - Monitoring Metric Writer (roles/monitoring.metricWriter) on the Compute Engine default service account
  - Storage Object Viewer (roles/storage.objectViewer) on the Compute Engine default service account

For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to create a custom cluster from scratch. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create a custom cluster from scratch:
- To create a cluster: hypercomputecluster.clusters.create
You might also be able to get these permissions with custom roles or other predefined roles .
Create a custom cluster from scratch
To create a custom cluster from scratch, select one of the following options:
Console
-
In the Google Cloud console, go to the Cluster Director page.
-
Click Create cluster.
-
In the dialog that appears, click Step-by-step configuration. The Create cluster page appears.
-
In the Cluster name field, enter a name for your cluster. The name can contain up to 10 characters and must only use lowercase letters.
-
In the Compute section, click Configure resources. The Add resource configuration pane appears.
-
To configure a compute resource configuration, complete the following steps:
-
In the Machine configuration section, select the machine series and type that you want to use.
-
In the Number of instances field, enter the number of Compute Engine instances to use for the configuration.
-
In the Consumption options section, specify the consumption option that you want to use to obtain resources:
-
To create GPU instances by using a reservation, do the following:
-
Click the Use reservation tab.
-
Click Select reservation. The Choose a reservation pane appears. If you want to use a reservation of A4X VMs, then you can optionally choose the block or sub-block to control the placement of your VMs.
-
Select the reservation that you want to use. Then, click Choose. This action automatically sets the Region and Zone fields to the region and zone of your reservation.
-
-
To create GPU Flex-start VMs, do the following:
-
Click the Flex start tab.
-
In the Time limit for the VM section, specify the run duration for the Flex-start VMs. The value must be between 10 minutes and 7 days.
-
In the Location section, select the region where you want to create Flex-start VMs. The Google Cloud console automatically filters the available regions to only show those regions that support Flex-start VMs for your selected machine type.
-
-
To create GPU or N2 Spot VMs, do the following:
-
Click the Use spot tab.
-
In the On VM termination list, select one of the following options:
-
To delete Spot VMs on preemption, select Delete.
-
To stop Spot VMs on preemption, select Stop.
-
-
In the Location section, select the Region and Zone where you want to create Spot VMs. The Google Cloud console automatically filters the available regions to only show those regions that support Spot VMs for your selected machine type.
-
-
To create N2 instances, do the following:
-
Click the Use on-demand tab.
-
In the Location section, select the region where you want to create instances.
-
-
-
Click Done.
-
Optional: To create additional compute resource configurations, click Add resource configuration, and then follow the prompts to specify the compute resources.
-
-
Click Continue.
-
In the Choose a Virtual Private Cloud (VPC) network section, do one of the following:
-
Recommended: To let Cluster Director automatically create a pre-configured VPC network for your cluster, do the following:
-
Select Create a new VPC network.
-
In the Network name field, enter a name for the VPC network.
-
-
To use an existing VPC or Shared VPC network, do the following:
-
Select Use a VPC network in the current project or Use a Shared VPC network hosted in another project.
-
In the Select VPC network or Shared VPC network list, select a VPC or Shared VPC network that meets the required configurations.
-
In the Select subnetwork list, select an existing subnetwork.
-
-
-
Click Continue.
-
Optional: To edit a storage resource, in the Storage section, click Edit storage plan, and then do one of the following:
-
To specify a Filestore instance, do the following:
-
Click the Filestore tab.
-
In the Instance provisioning section, select one of the following options:
-
To use an existing Filestore instance that uses the same network as your cluster, select Select existing Filestore instance, and then select the instance.
-
To create a new Filestore instance, select Create new instance. Then, follow the prompts to create your instance. For more information about the configurations that you can specify in the instance, see Create an instance .
-
-
-
To specify a Google Cloud Managed Lustre instance, do the following:
-
Click the Managed Lustre tab.
-
In the Instance provisioning section, select one of the following options:
-
To use an existing Managed Lustre instance, select Select an existing Managed Lustre instance, and then select the instance.
-
To create a new Managed Lustre instance, select Create new instance. Then, follow the prompts to create your instance. For more information about the configurations that you can specify in the instance, see Create a Managed Lustre instance .
-
-
-
To specify a Cloud Storage bucket, do the following:
-
Click the Cloud Storage tab.
-
In the Bucket provisioning section, select one of the following options:
-
To use an existing Cloud Storage bucket, select Select an existing bucket, and then select the bucket.
-
To create a new Cloud Storage bucket, select Create a new bucket, and then follow the prompts to create your bucket. For more information about the configurations that you can specify in the bucket, see Create a bucket .
-
-
-
-
Optional: To add storage resources to your cluster, click Add storage configuration, and then follow the prompts to specify the configuration for the storage resource.
-
Click Continue.
-
Optional: To edit the number and type of compute instances that the login node uses, expand the Login node section, and then complete the following steps:
-
In the Machine type field, select an N2 standard machine type of 32 vCPUs or fewer.
-
Optional: To specify a custom OS image instead of the one that Cluster Director automatically configures for the login node, in the Source image field, select one of the supported OS images for Cluster Director.
-
-
In the Node count field, enter the number of compute instances to use in the login node.
-
Optional: To specify a startup script for the compute instances, in the Startup script field, enter your script. For more information about this type of script, see About startup scripts.
-
In the Boot disk type and Boot disk size fields, select a boot disk type and size for the compute instances in the login node. For more information about the boot disks that your compute instances can use, see Choose a disk type.
-
Optional: To specify more advanced configurations for the login node, expand Advanced login node settings. Then, follow the prompts to manage OS login, manage public IPs, or add or remove labels to the compute instances in the login node.
-
Optional: To edit the partitions of your cluster to organize your compute resources, expand the Partitions section, and then do one of the following:
-
To add a partition, click Add partition, and then do the following:
-
In the Partition name field, enter a name for the partition.
-
To edit a nodeset, click Toggle nodeset. To add a nodeset, click Add nodeset.
-
In the Nodeset name field, enter a name for your nodeset.
-
In the Resource configuration field, select a compute resource configuration that you created in the previous steps.
-
In the Source image field, select an OS image to use for the compute nodes in the nodeset.
-
In the Static node count field, enter the minimum number of compute instances that must always be running in the nodeset.
-
In the Dynamic node count field, enter the maximum number of compute instances that Cluster Director can add to the nodeset during increases in traffic.
-
In the Boot disk type list and Boot disk size field, select the type and size of the boot disk for the compute instances in the nodeset to use.
-
Optional: To specify more advanced configurations for the compute node, expand Advanced nodeset settings. Then, follow the prompts to add or remove startup scripts, or add or remove labels.
-
Click Done.
-
-
To remove a partition, click Delete partition.
-
-
Optional: To add a partition to your cluster, click Add partition, and then follow the prompts to specify the compute resources for the partition.
-
Optional: To add prolog or epilog scripts to your Slurm cluster, do the following:
-
Expand the Advanced orchestration settings section.
-
In the Scripts section, follow the prompts to add prolog or epilog scripts.
-
-
Click Create.
The Clusters page appears. Creating the cluster can take some time to complete. The completion time depends on the number of compute instances that you request and resource availability in the compute instances' zone. If your requested resources are unavailable, then Cluster Director maintains the creation request until resources become available. To view the status of the cluster create operation, view your cluster's details.
When Cluster Director creates your login node, the cluster state changes to Ready. You can then connect to your cluster; however, you can run workloads only after Cluster Director creates the compute nodes in the cluster.
gcloud
To create a cluster from scratch, use the gcloud alpha cluster-director clusters create command.
Based on how you want to specify the cluster configuration, use one of the following methods:
-
Specify a configuration file: to create a cluster by specifying the cluster configuration in a JSON file, use the --config flag. To run the command, select one of the following options:

Bash

gcloud alpha cluster-director clusters create CLUSTER_NAME \
    --location=REGION \
    --config=CONFIGURATION_FILE

PowerShell

gcloud alpha cluster-director clusters create CLUSTER_NAME `
    --location=REGION `
    --config=CONFIGURATION_FILE

cmd.exe

gcloud alpha cluster-director clusters create CLUSTER_NAME ^
    --location=REGION ^
    --config=CONFIGURATION_FILE

Replace the following:
- CLUSTER_NAME: the name of the cluster. The name can contain up to 10 characters, and it can only use numbers or lowercase letters.
- REGION: the region where you want to create your cluster.
- CONFIGURATION_FILE: the path to the JSON file that contains the configuration details for the cluster. To review the configuration details that you can specify, review the request body for creating a cluster by using REST.
Specify cluster properties directly: to create a cluster by specifying each configuration property directly, use the following flags:
- To specify a network, use one of the following flags:
  - To create a new network: --create-network
  - To use an existing network and subnetwork: --network and --subnet
- To specify a Filestore instance, use one of the following flags:
  - To create a new instance: --create-filestores
  - To use an existing instance: --filestores
- Optionally, to specify a Cloud Storage bucket, use one of the following flags:
  - To create a new bucket: --create-buckets
  - To use an existing bucket: --buckets
- Optionally, to specify a Google Cloud Managed Lustre instance, use one of the following flags:
  - To create a new instance: --create-lustres
  - To use an existing instance: --lustres
- To specify a compute resource configuration, use one of the following flags for each resource configuration that you want to create in the cluster:
  - To create compute instances by using a reservation: --reserved-instances
  - To create Flex-start VMs: --dws-flex-instances
  - To create Spot VMs: --spot-instances
  - To create N2 on-demand instances: --on-demand-instances
- To specify the configuration for the login node, use the --slurm-login-node flag.
- To specify the configuration for a compute nodeset, use the --slurm-node-sets flag. You can repeat this flag for each nodeset in the cluster.
- To specify the cluster partitions, use the --slurm-partitions flag. You can repeat this flag for each partition in the cluster.
- To specify the default partition for the cluster, use the --slurm-default-partition flag.
For example, assume that you want to create a cluster with one partition that uses reserved compute instances, one partition that uses Spot VMs, a new Filestore instance, and a new network. To create the example cluster, select one of the following options:
Bash

gcloud alpha cluster-director clusters create CLUSTER_NAME \
    --location=REGION \
    --create-network=name=NETWORK_NAME \
    --create-filestores=name="locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",tier=TIER,capacityGb=CAPACITY,fileshare=SHARE_NAME,protocol=PROTOCOL \
    --reserved-instances=id=COMPUTE_RESOURCE_NAME_1,reservation="projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME",machineType=RESERVATION_MACHINE_TYPE \
    --spot-instances=id=COMPUTE_RESOURCE_NAME_2,zone=SPOT_VMS_ZONE,machineType=SPOT_MACHINE_TYPE \
    --slurm-login-node=machineType=LOGIN_NODE_MACHINE_TYPE,zone=LOGIN_NODE_ZONE,count=LOGIN_NODES_COUNT \
    --slurm-node-sets=id=NODESET_NAME_1,computeId=COMPUTE_RESOURCE_NAME_1,staticNodeCount=NODESET_1_STATIC_COUNT,maxDynamicNodeCount=NODESET_1_MAX_DYNAMIC_COUNT \
    --slurm-node-sets=id=NODESET_NAME_2,computeId=COMPUTE_RESOURCE_NAME_2,staticNodeCount=NODESET_2_STATIC_COUNT,maxDynamicNodeCount=NODESET_2_MAX_DYNAMIC_COUNT \
    --slurm-partitions=id=PARTITION_NAME_1,nodesetIds=[NODESET_NAME_1] \
    --slurm-partitions=id=PARTITION_NAME_2,nodesetIds=[NODESET_NAME_2] \
    --slurm-default-partition=PARTITION_NAME_1

PowerShell

gcloud alpha cluster-director clusters create CLUSTER_NAME `
    --location=REGION `
    --create-network=name=NETWORK_NAME `
    --create-filestores=name="locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",tier=TIER,capacityGb=CAPACITY,fileshare=SHARE_NAME,protocol=PROTOCOL `
    --reserved-instances=id=COMPUTE_RESOURCE_NAME_1,reservation="projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME",machineType=RESERVATION_MACHINE_TYPE `
    --spot-instances=id=COMPUTE_RESOURCE_NAME_2,zone=SPOT_VMS_ZONE,machineType=SPOT_MACHINE_TYPE `
    --slurm-login-node=machineType=LOGIN_NODE_MACHINE_TYPE,zone=LOGIN_NODE_ZONE,count=LOGIN_NODES_COUNT `
    --slurm-node-sets=id=NODESET_NAME_1,computeId=COMPUTE_RESOURCE_NAME_1,staticNodeCount=NODESET_1_STATIC_COUNT,maxDynamicNodeCount=NODESET_1_MAX_DYNAMIC_COUNT `
    --slurm-node-sets=id=NODESET_NAME_2,computeId=COMPUTE_RESOURCE_NAME_2,staticNodeCount=NODESET_2_STATIC_COUNT,maxDynamicNodeCount=NODESET_2_MAX_DYNAMIC_COUNT `
    --slurm-partitions=id=PARTITION_NAME_1,nodesetIds=[NODESET_NAME_1] `
    --slurm-partitions=id=PARTITION_NAME_2,nodesetIds=[NODESET_NAME_2] `
    --slurm-default-partition=PARTITION_NAME_1

cmd.exe

gcloud alpha cluster-director clusters create CLUSTER_NAME ^
    --location=REGION ^
    --create-network=name=NETWORK_NAME ^
    --create-filestores=name="locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",tier=TIER,capacityGb=CAPACITY,fileshare=SHARE_NAME,protocol=PROTOCOL ^
    --reserved-instances=id=COMPUTE_RESOURCE_NAME_1,reservation="projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME",machineType=RESERVATION_MACHINE_TYPE ^
    --spot-instances=id=COMPUTE_RESOURCE_NAME_2,zone=SPOT_VMS_ZONE,machineType=SPOT_MACHINE_TYPE ^
    --slurm-login-node=machineType=LOGIN_NODE_MACHINE_TYPE,zone=LOGIN_NODE_ZONE,count=LOGIN_NODES_COUNT ^
    --slurm-node-sets=id=NODESET_NAME_1,computeId=COMPUTE_RESOURCE_NAME_1,staticNodeCount=NODESET_1_STATIC_COUNT,maxDynamicNodeCount=NODESET_1_MAX_DYNAMIC_COUNT ^
    --slurm-node-sets=id=NODESET_NAME_2,computeId=COMPUTE_RESOURCE_NAME_2,staticNodeCount=NODESET_2_STATIC_COUNT,maxDynamicNodeCount=NODESET_2_MAX_DYNAMIC_COUNT ^
    --slurm-partitions=id=PARTITION_NAME_1,nodesetIds=[NODESET_NAME_1] ^
    --slurm-partitions=id=PARTITION_NAME_2,nodesetIds=[NODESET_NAME_2] ^
    --slurm-default-partition=PARTITION_NAME_1

Replace the following:
- CLUSTER_NAME: the name of the cluster. The name can contain up to 10 characters, and it can only use numbers or lowercase letters (a-z). Spaces and special characters aren't allowed.
- REGION: the region where you want to create your cluster.
- NETWORK_NAME: the name of the network that you want to create.
- FILESTORE_INSTANCE_ZONE: the zone where you want to create your Filestore instance.
- FILESTORE_INSTANCE_NAME: the name for your Filestore instance.
- TIER: the service tier that you want to use for the instance and that Cluster Director supports. Specify one of the following values:
  - For the zonal tier: ZONAL
  - For the regional tier: REGIONAL
- CAPACITY: the size, in GiB, that you want to allocate for the instance. The value must be between 1,024 GiB (1024) and 102,400 GiB (102400), and it must be in 256 GiB (256) increments.
- SHARE_NAME: the name for the NFS file share that is served from the instance.
- PROTOCOL: the file system protocol for the instance. Specify one of the following values:
  - For NFSv3: NFSV3
  - For NFSv4.1: NFSV41
- COMPUTE_RESOURCE_NAME_1 and COMPUTE_RESOURCE_NAME_2: the names of the two compute resource configurations.
- RESERVATION_PROJECT_ID: the ID of the project where the reservation exists. If you want to use a reservation from a different project, then verify that your project is allowed to consume the reservation. For more information, see Allow and restrict projects from creating and modifying shared reservations.
- RESERVATION_ZONE: the zone where the reservation exists.
- RESERVATION_NAME: the name of the reservation that you want to use to create VMs. If you use a reservation of A4X VMs, then you can optionally specify the block or sub-block to control the placement of your VMs:
  - Block: RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
  - Sub-block: RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/reservationSubBlocks/RESERVATION_SUB_BLOCK_NAME
- RESERVATION_MACHINE_TYPE: the machine type that is specified in the reservation.
- SPOT_VMS_ZONE: the zone where you want to create your Spot VMs. To review the regions and zones where the machine type that you want to use is available, see Available regions and zones.
- SPOT_MACHINE_TYPE: the machine type to use for the Spot VMs. Specify one of the following machine types:
  - For an A4 machine type: a4-highgpu-8g
  - For an A3 Ultra machine type: a3-ultragpu-8g
  - For an A3 Mega machine type: a3-megagpu-8g
  - For an N2 machine type, see N2 machine series.
- LOGIN_NODE_MACHINE_TYPE: the machine type that you want the compute instances in the login nodeset to use. Specify an N2 standard machine type with 32 or fewer vCPUs.
- LOGIN_NODE_ZONE: the zone where you want to create the compute instances in the login nodeset.
- LOGIN_NODES_COUNT: the number of compute instances to use for the login nodeset.
- NODESET_NAME_1 and NODESET_NAME_2: the names of the two nodesets.
- NODESET_1_STATIC_COUNT and NODESET_2_STATIC_COUNT: the minimum number of compute instances that must always be running in each nodeset.
- NODESET_1_MAX_DYNAMIC_COUNT and NODESET_2_MAX_DYNAMIC_COUNT: the maximum number of compute instances that Cluster Director can add to each nodeset during increases in traffic.
- PARTITION_NAME_1 and PARTITION_NAME_2: the names of the partitions for your cluster.
The output is similar to the following:
Create request issued for: [cluster000]
Waiting for operation [projects/example-project/locations/us-central1/operations/operation-1759856594716-640948b2f058e-f403bef9-1a08178a] to complete...working...
Creating the cluster can take some time to complete. The completion time depends on the number of compute instances that you request and resource availability in the compute instances' zone. If your requested resources are unavailable, then Cluster Director maintains the creation request until resources become available. When Cluster Director creates your login node, the output is similar to the following. You can then connect to your cluster; however, you can run workloads only after Cluster Director creates the compute nodes in your cluster.
Created cluster [cluster000].
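For reference, a CONFIGURATION_FILE for the --config flag described in this section uses the same JSON structure as the REST request body shown later on this page. The following minimal sketch uses hypothetical names (democluster, net0, compute0, and so on) and shows only a subset of the available fields:

```json
{
  "name": "democluster",
  "networkResources": {
    "net0": {
      "config": {
        "newNetwork": {
          "network": "projects/example-project/global/networks/net0"
        }
      }
    }
  },
  "computeResources": {
    "compute0": {
      "config": {
        "newSpotInstances": {
          "zone": "us-central1-a",
          "machineType": "a3-megagpu-8g"
        }
      }
    }
  },
  "orchestrator": {
    "slurm": {
      "nodeSets": [
        { "id": "nodeset0", "computeId": "compute0", "staticNodeCount": "2" }
      ],
      "partitions": [
        { "id": "part0", "nodeSetIds": [ "nodeset0" ] }
      ],
      "defaultPartition": "part0"
    }
  }
}
```

For the full set of configuration details that you can specify, review the request body for creating a cluster by using REST.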
REST
To create a cluster from scratch, make a POST request to the clusters.create method.
Your request must include the following HTTP method and request URL:
POST https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME
In the request body, include the following fields:
- description: Optional. A description for your cluster.
- labels: Optional. Key-value pairs of labels to help you organize and filter your clusters and their associated resources. For more information, see Organize resources using labels.
- networkResources: the network configuration for your cluster. You can either create a new network or use an existing one.
- storageResources: the storage resources for your cluster. You can create Filestore instances, Managed Lustre instances, or Cloud Storage buckets, or use existing ones.
- computeResources: the compute resources for your cluster, including the machine types and provisioning models to use for the compute instances in the cluster.
- orchestrator: the settings for the Slurm workload scheduler for your cluster, as well as the configurations for the cluster nodesets and partitions.
For example, assume that you want to create a cluster with one partition that uses reserved compute instances, one partition that uses Spot VMs, a new Filestore instance, and a new network. To create the example cluster, include the following in a JSON file named request-body.json:
{
  "name": "CLUSTER_NAME",
  "networkResources": {
    "NETWORK_NAME": {
      "config": {
        "newNetwork": {
          "network": "projects/PROJECT_ID/global/networks/NETWORK_NAME"
        }
      }
    }
  },
  "storageResources": {
    "STORAGE_RESOURCE_CONFIGURATION": {
      "config": {
        "newFilestore": {
          "filestore": "projects/PROJECT_ID/locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",
          "fileShares": {
            "capacityGb": "CAPACITY",
            "fileShare": "SHARE_NAME"
          },
          "tier": "TIER",
          "protocol": "PROTOCOL"
        }
      }
    }
  },
  "computeResources": {
    "COMPUTE_RESOURCE_NAME_1": {
      "config": {
        "newReservedInstances": {
          "reservation": "projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME"
        }
      }
    },
    "COMPUTE_RESOURCE_NAME_2": {
      "config": {
        "newSpotInstances": {
          "zone": "SPOT_VMS_ZONE",
          "machineType": "SPOT_MACHINE_TYPE"
        }
      }
    }
  },
  "orchestrator": {
    "slurm": {
      "loginNodes": {
        "count": "LOGIN_NODES_COUNT",
        "zone": "LOGIN_NODE_ZONE",
        "machineType": "LOGIN_NODE_MACHINE_TYPE"
      },
      "nodeSets": [
        {
          "id": "NODESET_NAME_1",
          "computeId": "COMPUTE_RESOURCE_NAME_1",
          "storageConfigs": [
            {
              "id": "STORAGE_RESOURCE_CONFIGURATION",
              "localMount": "/home"
            }
          ],
          "staticNodeCount": "NODESET_1_STATIC_COUNT",
          "maxDynamicNodeCount": "NODESET_1_MAX_DYNAMIC_COUNT",
          "computeInstance": {
            "bootDisk": {
              "type": "projects/PROJECT_ID/zones/DISK_ZONE_1/diskTypes/DISK_TYPE_1",
              "sizeGb": "DISK_SIZE_1"
            }
          }
        },
        {
          "id": "NODESET_NAME_2",
          "computeId": "COMPUTE_RESOURCE_NAME_2",
          "storageConfigs": [
            {
              "id": "STORAGE_RESOURCE_CONFIGURATION",
              "localMount": "/home"
            }
          ],
          "staticNodeCount": "NODESET_2_STATIC_COUNT",
          "maxDynamicNodeCount": "NODESET_2_MAX_DYNAMIC_COUNT",
          "computeInstance": {
            "bootDisk": {
              "type": "projects/PROJECT_ID/zones/DISK_ZONE_2/diskTypes/DISK_TYPE_2",
              "sizeGb": "DISK_SIZE_2"
            }
          }
        }
      ],
      "partitions": [
        {
          "id": "PARTITION_NAME_1",
          "nodeSetIds": [ "NODESET_NAME_1" ]
        },
        {
          "id": "PARTITION_NAME_2",
          "nodeSetIds": [ "NODESET_NAME_2" ]
        }
      ],
      "defaultPartition": "PARTITION_NAME_1"
    }
  }
}
Replace the following:
- PROJECT_ID: the ID of the project where you want to create your cluster and its associated resources.
- REGION: the region where you want to create your cluster.
- CLUSTER_NAME: the name of the cluster. The name can contain up to 10 characters, and it can only use numbers or lowercase letters (a-z).
- NETWORK_NAME: the name of the network that you want to create.
- STORAGE_RESOURCE_CONFIGURATION: the name of the storage resource configuration.
- FILESTORE_INSTANCE_ZONE: the zone where you want to create your Filestore instance.
- FILESTORE_INSTANCE_NAME: the name for your Filestore instance.
- CAPACITY: the size, in GiB, that you want to allocate for the instance. The value must be between 1,024 GiB (1024) and 102,400 GiB (102400), and it must be in 256 GiB (256) increments. For more information about the supported service tiers and capacity for Filestore instances, see Service tiers.
- SHARE_NAME: the name for the NFS file share that is served from the instance.
- TIER: the service tier that you want to use for the instance and that Cluster Director supports. Specify one of the following values:
  - For the zonal tier: ZONAL
  - For the regional tier: REGIONAL
- PROTOCOL: the file system protocol for the instance. Specify one of the following values:
  - For NFSv3: NFSV3
  - For NFSv4.1: NFSV41
- COMPUTE_RESOURCE_NAME_1 and COMPUTE_RESOURCE_NAME_2: the names of the two compute resource configurations.
- RESERVATION_PROJECT_ID: the ID of the project where the reservation exists. If you want to use a reservation from a different project, then verify that your project is allowed to consume the reservation. For more information, see Allow and restrict projects from creating and modifying shared reservations.
- RESERVATION_ZONE: the zone where the reservation exists.
- RESERVATION_NAME: the name of the reservation that you want to use to create VMs. If you use a reservation of A4X VMs, then you can optionally specify the block or sub-block to control the placement of your VMs:
  - Block: RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
  - Sub-block: RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/reservationSubBlocks/RESERVATION_SUB_BLOCK_NAME
- SPOT_VMS_ZONE: the zone where you want to create your Spot VMs. To review the regions and zones where the machine type that you want to use is available, see Available regions and zones.
- SPOT_MACHINE_TYPE: the machine type to use for the Spot VMs. Specify one of the following machine types:
  - For an A4 machine type: a4-highgpu-8g
  - For an A3 Ultra machine type: a3-ultragpu-8g
  - For an A3 Mega machine type: a3-megagpu-8g
  - For an N2 machine type, see N2 machine series.
- LOGIN_NODES_COUNT: the number of compute instances to use for the login nodeset.
- LOGIN_NODE_ZONE: the zone where you want to create the compute instances in the login nodeset.
- LOGIN_NODE_MACHINE_TYPE: the machine type that you want the compute instances in the login nodeset to use. Specify an N2 standard machine type with 32 or fewer vCPUs.
- NODESET_NAME_1 and NODESET_NAME_2: the names of the two nodesets.
- NODESET_1_STATIC_COUNT and NODESET_2_STATIC_COUNT: the minimum number of compute instances that must always be running in each nodeset.
- NODESET_1_MAX_DYNAMIC_COUNT and NODESET_2_MAX_DYNAMIC_COUNT: the maximum number of compute instances that Cluster Director can add to each nodeset during increases in traffic.
- DISK_ZONE_1 and DISK_ZONE_2: the zones where you want to create the boot disks for the nodesets.
- DISK_TYPE_1 and DISK_TYPE_2: the type of boot disk for each nodeset. Based on the machine type in the node, specify one of the following values:
  - For A4X instances: hyperdisk-balanced
  - For A4 instances: hyperdisk-balanced
  - For A3 Ultra instances: hyperdisk-balanced
  - For A3 Mega instances: pd-balanced, pd-ssd, hyperdisk-balanced, hyperdisk-ml, hyperdisk-extreme, or hyperdisk-throughput
  - For N2 instances: pd-standard, pd-balanced, pd-ssd, pd-extreme, hyperdisk-extreme, or hyperdisk-throughput
  For an overview of the different types of boot disks that you can use, see Choose a disk type.
- DISK_SIZE_1 and DISK_SIZE_2: the size of the boot disks for the two nodesets, in GB. The value must be 10 or higher.
- PARTITION_NAME_1 and PARTITION_NAME_2: the names of the partitions for your cluster.
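Before you send the request, you can sanity-check the cross-references in your request body. The following sketch is an illustrative helper (not an official validator) that assumes the request-body.json layout shown above and verifies that nodesets, compute resources, and partitions reference each other consistently:

```python
# Illustrative pre-flight check for the request body shown above. It verifies
# that every nodeSet references a defined compute resource, every partition
# references a defined nodeSet, and the default partition exists.

def check_request_body(body: dict) -> list[str]:
    """Return a list of cross-reference errors in a cluster request body."""
    errors = []
    compute_ids = set(body.get("computeResources", {}))
    slurm = body.get("orchestrator", {}).get("slurm", {})
    node_sets = slurm.get("nodeSets", [])
    nodeset_ids = {ns["id"] for ns in node_sets}
    partition_ids = {p["id"] for p in slurm.get("partitions", [])}
    for ns in node_sets:
        if ns.get("computeId") not in compute_ids:
            errors.append(f"nodeSet {ns['id']!r} references unknown computeId")
    for part in slurm.get("partitions", []):
        for nid in part.get("nodeSetIds", []):
            if nid not in nodeset_ids:
                errors.append(f"partition {part['id']!r} references unknown nodeSet {nid!r}")
    if slurm.get("defaultPartition") not in partition_ids:
        errors.append("defaultPartition is not a defined partition")
    return errors

# A minimal, hypothetical body that passes the checks:
body = {
    "computeResources": {"compute0": {}},
    "orchestrator": {"slurm": {
        "nodeSets": [{"id": "nodeset0", "computeId": "compute0"}],
        "partitions": [{"id": "part0", "nodeSetIds": ["nodeset0"]}],
        "defaultPartition": "part0",
    }},
}
errors = check_request_body(body)
```

To check your actual file, load it with json.load and pass the resulting dict to check_request_body.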
To send your request, select one of the following options:
curl (Bash)
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request-body.json \
    "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME"
Powershell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request-body.json `
    -Uri "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME" | Select-Object -Expand Content
curl (cmd.exe)
curl -X POST ^
    -H "Authorization: Bearer $(gcloud auth print-access-token)" ^
    -H "Content-Type: application/json; charset=utf-8" ^
    -d @request-body.json ^
    "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME"
The response is similar to the following:
{
"name": "projects/example-project/locations/us-central1/operations/operation-1758842430697-63fa86a4c3030-028b6436-2fbda8e1",
"metadata": {
"@type": "type.googleapis.com/google.cloud.hypercomputecluster.v1.OperationMetadata",
"createTime": "2025-09-25T23:20:30.707315354Z",
"target": "projects/example-project/locations/us-central1/clusters/clusterp6a",
"verb": "update",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": false
}
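The response above is a long-running operation. As an illustrative sketch (the operation ID and project name here are hypothetical), you can parse the response to extract the operation name for later status checks and see whether the operation has completed:

```python
import json

# Parse a clusters.create response, which is a long-running operation (LRO).
# The operation name identifies the operation for later status polling.
response_text = """
{
  "name": "projects/example-project/locations/us-central1/operations/operation-123",
  "metadata": {"target": "projects/example-project/locations/us-central1/clusters/clusterp6a"},
  "done": false
}
"""
operation = json.loads(response_text)
op_name = operation["name"]           # full resource name of the operation
finished = operation.get("done", False)
op_id = op_name.rsplit("/", 1)[-1]    # short operation ID
```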
Creating the cluster can take some time to complete. The completion time depends on the number of compute instances that you request and resource availability in the compute instances' zone. If your requested resources are unavailable, then Cluster Director maintains the creation request until resources become available. When Cluster Director creates your login node, you can connect to your cluster. However, you can run workloads only after Cluster Director creates the compute nodes in your cluster.

