Review VM and cluster configurations

This document describes the configurations in AI Hypercomputer to consider before you create virtual machine (VM) instances and clusters. Reviewing the available configurations helps you optimize performance for your workloads and minimize downtime and performance issues.

Configuration factors for VM and cluster creation

Before you create VMs and clusters to run your workloads, consider which configuration to use:

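The provisioning model
The cluster deployment tools

If you use the reservation-bound provisioning model, then you must also consider the following factors:

The reservation block deployment type
The reservation operational mode
The maintenance scheduling type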

Provisioning models

Based on the consumption option that you choose for creating VMs or clusters, you can use one of the following provisioning models to obtain the necessary resources for creating VMs:

Reservation-bound: you can reserve resources at a discounted price for a future date and duration. At the start of your reservation period, you can use the reserved resources to create VMs or clusters. You have exclusive access to your reserved resources for the reservation period.
Flex-start: you can request discounted resources for up to seven days. Compute Engine makes best-effort attempts to schedule the provisioning of your requested resources as soon as they're available. You have exclusive access to your obtained resources for your requested period.
Spot: based on availability, you can immediately obtain deeply discounted resources. However, Compute Engine might stop or delete VMs at any time to reclaim capacity.

Reservation-bound provisioning model

The reservation-bound provisioning model links your created VMs to the capacity that you previously reserved. When you reserve capacity, Compute Engine creates an empty reservation. Then, at the reservation start time, the following occurs:

Compute Engine adds your reserved number of VMs to the reservation. You have exclusive access to the reserved capacity until the reservation end time.
Google Cloud charges you for the reserved capacity until the end of your reservation period, whether you use the capacity or not.

You can then use the reserved resources to create VMs without additional charges. You only pay for resources that aren't included in the reservation, such as disks or IP addresses.

To specify the reservation-bound provisioning model when you create VMs or MIGs, do the following:

In the Google Cloud console, in the Provisioning model list, select Reservation-bound.
In the Google Cloud CLI, include the --provisioning-model=RESERVATION_BOUND flag in the command.
In the Compute Engine API, include the "provisioningModel": "RESERVATION_BOUND" field in the request body.
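As a rough illustration of the API option above, the following Python sketch builds the request-body fields for a reservation-bound VM. Only the "provisioningModel": "RESERVATION_BOUND" pair comes from this document. Nesting it under "scheduling", the "reservationAffinity" structure, the helper name reservation_bound_fields, and the reservation name example-reservation are assumptions made for illustration, so verify them against the Compute Engine API reference before use.

```python
import json


def reservation_bound_fields(reservation_name):
    """Return illustrative request-body fields for a reservation-bound VM."""
    if not reservation_name:
        raise ValueError("reservation_name must not be empty")
    return {
        # Assumption: provisioningModel is nested under "scheduling".
        "scheduling": {"provisioningModel": "RESERVATION_BOUND"},
        # Assumption: the VM consumes one specific reservation, selected by name.
        "reservationAffinity": {
            "consumeReservationType": "SPECIFIC_RESERVATION",
            "key": "compute.googleapis.com/reservation-name",
            "values": [reservation_name],
        },
    }


if __name__ == "__main__":
    # "example-reservation" is a hypothetical reservation name.
    print(json.dumps(reservation_bound_fields("example-reservation"), indent=2))
```

The helper returns a plain dictionary so that it can be merged into a larger request body, whichever client library or HTTP tool sends the request.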

For more information about setting these parameters when you create VMs or MIGs after you reserve capacity, see VM and cluster creation overview. If you use Cluster Toolkit to deploy your clusters, then the cluster blueprint sets the provisioning model for you.

Flex-start provisioning model

The flex-start provisioning model lets you create standalone Flex-start VMs or add Flex-start VMs to a managed instance group (MIG) when your requested capacity is available. When you add Flex-start VMs to a MIG by using resize requests, the MIG creates the VMs all at once. This approach helps you avoid unnecessary charges for partial capacity that Compute Engine might deliver while you wait for the full capacity needed to start your workload. The flex-start provisioning model provisions resources from a secure capacity pool, which helps to increase your chances of obtaining high-demand resources like GPUs.

To specify the flex-start provisioning model when creating a standalone VM or an instance template for a MIG, do the following:

In the Google Cloud console, in the Provisioning model list, select Flex-start.
In the gcloud CLI, include the --provisioning-model=FLEX_START flag in the command.
In the Compute Engine API, include the "provisioningModel": "FLEX_START" field in the request body.
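The API option above can be sketched in Python as follows. The sketch also enforces the seven-day limit that this document states for flex-start requests. The helper name flex_start_scheduling and the use of a "maxRunDuration" field to express the requested period are assumptions made for illustration, not a confirmed request schema.

```python
from datetime import timedelta

# This document states that flex-start resources can be requested for up to seven days.
MAX_FLEX_START_DURATION = timedelta(days=7)


def flex_start_scheduling(run_duration):
    """Return an illustrative scheduling fragment for a Flex-start VM."""
    if run_duration <= timedelta(0):
        raise ValueError("run_duration must be positive")
    if run_duration > MAX_FLEX_START_DURATION:
        raise ValueError("flex-start requests are limited to seven days")
    return {
        "provisioningModel": "FLEX_START",
        # Assumption: the requested period is expressed as maxRunDuration.seconds.
        "maxRunDuration": {"seconds": int(run_duration.total_seconds())},
    }


if __name__ == "__main__":
    print(flex_start_scheduling(timedelta(days=2)))
```

Validating the duration on the client side is a design choice in this sketch: it surfaces an out-of-range request before any API call is made.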

For more information about setting these parameters, create VMs or clusters by using one of the following options:

Create a standalone VM
Create MIGs with resize requests
Create Slurm clusters
Create GKE clusters:
- Create a cluster with the default configuration
- Create a custom cluster

Spot provisioning model

The spot provisioning model lets you create deeply discounted VMs based on availability. However, Compute Engine might stop or delete the created VMs at any time to reclaim capacity. This process is called preemption.

To specify the spot provisioning model when you create VMs or MIGs, do the following:

In the Google Cloud console, in the Provisioning model list, select Spot.
In the gcloud CLI, include the --provisioning-model=SPOT flag in the command.
In the Compute Engine API, include the "provisioningModel": "SPOT" field in the request body.
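As a minimal sketch of the API option above, the following Python helper builds a scheduling fragment for a Spot VM. Because this document notes that Compute Engine might stop or delete a preempted VM, the sketch lets the caller pick one of those two outcomes. The helper name spot_scheduling and the use of an "instanceTerminationAction" field for that choice are assumptions made for illustration.

```python
VALID_TERMINATION_ACTIONS = ("STOP", "DELETE")


def spot_scheduling(termination_action="STOP"):
    """Return an illustrative scheduling fragment for a Spot VM."""
    action = termination_action.upper()
    if action not in VALID_TERMINATION_ACTIONS:
        raise ValueError("termination_action must be STOP or DELETE")
    return {
        "provisioningModel": "SPOT",
        # Assumption: this field selects what happens to the VM on preemption.
        "instanceTerminationAction": action,
    }


if __name__ == "__main__":
    print(spot_scheduling("delete"))
```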

For more information about setting these parameters when you create VMs or MIGs, see VM and cluster creation overview.

Cluster deployment tools

Cluster Toolkit is an open source deployment tool that is recommended for creating GPU-accelerated clusters. Cluster Toolkit can deploy both Google Kubernetes Engine (GKE) and Slurm clusters.

Alternatively, you can choose to provision your groups of VMs by using one of the following methods, and then incorporate your own workload scheduler as needed:

Reservation block deployment types

If you use the reservation-bound provisioning model when creating A4X, A4, and A3 Ultra VMs or clusters, the machines you receive are automatically deployed within blocks of densely allocated hosts. This deployment offers the following benefits:

Non-blocking networking for consistent high-bandwidth, low-latency VM connectivity by using dynamic machine learning (ML) network fabric from Google.
Access to network topology that provides a hierarchical view of the relative proximity among VMs. This feature is useful for advanced job scheduling use cases.
Fine-grained, topology-aware placement when you use orchestrators.
Fine-grained user control over maintenance schedules to maximize job scheduling and uptime, and minimize downtimes.

Reservation operational mode

If you use the reservation-bound provisioning model, then the machine type that you reserve determines the reservation operational mode for your reserved capacity. Each mode defines how host errors or faulty host reports are handled, as well as your level of visibility and control over the reservation's infrastructure.

Each reservation operational mode defines the following:

Who manages recovery: you or Google Cloud.
What capacity you use for recovery: only your reserved capacity, or capacity within or outside your reservations.
Your level of placement control: whether you can view specific reservation sub-blocks and start maintenance for them before the planned time.

When you reserve capacity to create VMs or clusters, you must choose one of the following reservation operational modes: managed mode or all capacity mode.

Managed mode

In managed mode, Google Cloud automatically manages the maintenance and recovery process of your VMs after host errors or faulty host reports. This approach is ideal when your workload requires high stability, and you prefer an automated process to minimize downtimes.

The managed mode has the following features:

Only use reserved capacity for recovery: Compute Engine only uses your reserved capacity to restart VMs. If there's no available capacity in your reservations, then Compute Engine only restarts VMs after you obtain more capacity.
Automated VM restarts: Google Cloud handles the entire recovery process for a VM. When host maintenance is required, Compute Engine automatically migrates your VMs to other available machines within your reservation and restarts the VMs.
Block management and visibility: you can view the topology, health, and maintenance status of individual reservations and reservation blocks. You can also receive maintenance notifications, and optionally start maintenance before the scheduled maintenance time, for these resources.
Potential API rate limits: calls to the report faulty host API may be rate-limited per reservation.

All capacity mode

In all capacity mode, you're responsible for managing the VM recovery process. You must manually start maintenance after host errors or faulty host reports. Unlike in managed mode, you can also view and start maintenance for your reservation sub-blocks. These features give you full, granular control over the maintenance and recovery process for your VMs.

The all capacity mode has the following features:

Use reserved and unreserved capacity for recovery: you can use your reserved resources, as well as any resources that are available outside of your reservation, to help you migrate and restart a VM when its host fails.
Manual VM restarts: you're responsible for the recovery process of a VM. When host maintenance is required because of a host error or faulty host report, Compute Engine stops your VM. You can only restart the VM after maintenance completes.
Block and sub-block management and visibility: you can view the topology, health, and maintenance status of individual reservations, reservation blocks, and reservation sub-blocks. You can also receive maintenance notifications, and optionally start maintenance before the scheduled maintenance time, for these resources.
No API rate limits: there are no rate limits when you make calls to the report faulty host API.
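To compare the two modes side by side, the following Python sketch records the features listed above in a small data structure and filters the modes by requirement. The class OperationalMode, its field names, and the helper candidate_modes are hypothetical names introduced for illustration. They don't correspond to any Google Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalMode:
    """Summary of one reservation operational mode, as described in this document."""
    name: str
    recovery_managed_by: str            # who handles VM recovery
    recovery_capacity: str              # which capacity can be used for recovery
    sub_block_management: bool          # can you view and start maintenance per sub-block?
    faulty_host_api_rate_limited: bool  # might report faulty host calls be rate-limited?


MANAGED = OperationalMode(
    name="managed",
    recovery_managed_by="Google Cloud",
    recovery_capacity="reserved only",
    sub_block_management=False,
    faulty_host_api_rate_limited=True,
)

ALL_CAPACITY = OperationalMode(
    name="all capacity",
    recovery_managed_by="you",
    recovery_capacity="reserved and unreserved",
    sub_block_management=True,
    faulty_host_api_rate_limited=False,
)


def candidate_modes(need_sub_block_management, need_automated_recovery):
    """Return the names of the modes that satisfy both requirements."""
    matches = []
    for mode in (MANAGED, ALL_CAPACITY):
        if need_sub_block_management and not mode.sub_block_management:
            continue
        if need_automated_recovery and mode.recovery_managed_by != "Google Cloud":
            continue
        matches.append(mode.name)
    return matches


if __name__ == "__main__":
    print(candidate_modes(need_sub_block_management=True, need_automated_recovery=False))
```

Asking for both sub-block management and automated recovery returns an empty list, which reflects the trade-off between the two modes.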

Maintenance scheduling types

If you use the reservation-bound provisioning model, then Cluster Director provides options for scheduling host maintenance for the running VMs in your cluster. When you reserve capacity, you can specify whether the VMs are grouped and have synchronized maintenance scheduling (grouped), or are loosely coupled and have independent maintenance scheduling (independent).

Grouped maintenance scheduling

The grouped maintenance scheduling type helps ensure that, no matter when Compute Engine provisions a VM, all VMs running the same workload have the same planned maintenance frequency. This tightly coupled maintenance lets you optimize your job's performance by giving you complete control over your used and unused capacity.

The grouped maintenance scheduling type is useful in the following cases:

Your environment uses a job scheduler, such as Slurm or GKE.
You want to run training or other highly parallelized computing workloads.

Independent maintenance scheduling

The independent maintenance scheduling type gives VMs different maintenance schedules. This configuration is ideal if you want to run inference or limited-scale training, where workloads run more efficiently when they have separate maintenance schedules.

What's next?

Reserve capacity