Create TPU Flex-start VMs with Compute Engine

TPU Flex-start VMs, powered by Dynamic Workload Scheduler, offer a flexible, cost-effective way to access TPU resources for AI workloads for up to 7 days without long-term reservations. When you request TPU Flex-start VMs, your request remains in a queue until capacity is available. Once provisioned, the TPU VMs run for your specified duration.

TPU Flex-start VMs are a good fit for quick experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than 7 days. For more information about other TPU consumption options, see Cloud TPU consumption options.

You can delete your TPU resources at any time to stop billing. For more information about TPU pricing, see Cloud TPU pricing.

Limitations

TPU Flex-start VMs have the following limitations:

  • You can request TPU Flex-start VMs for a duration of up to 7 days.
  • You can request the following Cloud TPU versions and zones:
    • TPU7x: us-central1-c
    • TPU v6e: asia-northeast1-b, us-east5-a, us-south1-ai1b
    • TPU v5p: us-east5-a
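
The version and zone list above can be checked before you submit a request. The following is a minimal sketch, not part of any Google Cloud tooling: the function names and the version keys (tpu7x, v6e, v5p) are labels invented for this example, and the zone data is copied from the list above, so it goes stale if the list changes.

```shell
#!/bin/sh
# Zones copied from the limitations list, keyed by TPU version.
# The keys tpu7x, v6e, and v5p are labels chosen for this sketch.
flex_start_zones() {
  case "$1" in
    tpu7x) echo "us-central1-c" ;;
    v6e)   echo "asia-northeast1-b us-east5-a us-south1-ai1b" ;;
    v5p)   echo "us-east5-a" ;;
    *)     return 1 ;;   # unknown version key
  esac
}

# Succeeds (exit status 0) only if ZONE is listed for VERSION.
zone_supports_flex_start() {
  version=$1
  zone=$2
  zones=$(flex_start_zones "$version") || return 1
  for z in $zones; do
    [ "$z" = "$zone" ] && return 0
  done
  return 1
}

if zone_supports_flex_start v6e us-east5-a; then
  echo "v6e is listed for us-east5-a"
fi
```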

MIGs with TPUs have the following limitations:

  • Lifecycle operations: You can't stop, start, resume, or suspend TPU instances. To change configurations that require a restart or to stop incurring charges, you must delete the instances.

  • Regional MIG zone distribution: You must set the target distribution shape to ANY_SINGLE_ZONE.

  • Configuration updates in a MIG:

    • You can't update a MIG that forms a multi-host TPU slice due to the defined accelerator topology.
    • You can update a MIG that forms single-host TPU slices by using the automatic or selective methods. However, updates for single-host TPU slices don't support the restart (RESTART) action. If an update requires a restart and the most disruptive action allowed is replace (REPLACE), the updater replaces the instance; otherwise, the update attempt fails with an error.
  • For a MIG that forms a multi-host TPU slice, the following limitations also apply:

    • Target size policy: You must set the target size policy mode to BULK . After you set this mode, you can't change it.

    • Target size: In bulk mode, you can set the target size to either 0 or the number of instances that are required to form the accelerator topology.

    • Workload policy: You must specify a workload policy in which the accelerator topology is defined. After you set the workload policy, you can't change or remove the policy from the MIG.

  • Unsupported features: MIGs with TPUs don't support the following features:

Before you begin

Before requesting TPU Flex-start VMs, you must:

  • Install the Google Cloud CLI
  • Create a Google Cloud project
  • Enable the Compute Engine API (compute.googleapis.com)
  • Ensure you have the required permissions:
    • roles/compute.instanceAdmin.v1
    • roles/iam.serviceAccountUser

For more information, see Set up a Google Cloud project for TPUs.

Ensure you have sufficient preemptible quota to use TPU Flex-start VMs. If your workload requires more cores than your current allocation, you can request a quota increase. For details, see Cloud TPU quotas.

Create TPU Flex-start VMs with MIGs

To use TPU Flex-start VMs, you create a managed instance group (MIG) with a specific instance template configuration.

For general instructions on creating Flex-start VMs, see Create Flex-start VMs.

Create TPU Flex-start VMs with a multi-host slice

Create an instance template

Create an instance template specifying the FLEX_START provisioning model and your chosen run duration.

    gcloud compute instance-templates create TEMPLATE_NAME \
        --machine-type=MACHINE_TYPE \
        --image-family=IMAGE_FAMILY \
        --image-project=IMAGE_PROJECT \
        --provisioning-model=FLEX_START \
        --instance-termination-action=DELETE \
        --max-run-duration=DURATION \
        --region=REGION \
        --maintenance-policy=TERMINATE

Replace the following placeholders:

  • TEMPLATE_NAME: The name of your instance template.
  • MACHINE_TYPE: The machine type for the TPU VM (for example, ct6e-standard-8t).
  • IMAGE_FAMILY: The OS image family for the TPU VM (for example, ubuntu-accelerator-2204-amd64-with-tpu-v6e).
  • IMAGE_PROJECT: The OS image project for the TPU VM (for example, ubuntu-os-accelerator-images).
  • DURATION: The maximum run duration (for example, 7d for 7 days).
  • REGION: The region in which to create the instance template.
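
Because a request can run for at most 7 days, it can help to check a DURATION value locally before using it. The sketch below is illustrative only. It assumes the value is a single integer followed by one unit letter (d, h, m, or s), as in the 7d example above, and it rejects compound values. The function names are hypothetical.

```shell
#!/bin/sh
# Converts a single-unit duration such as 7d, 36h, 90m, or 600s to seconds.
# Assumption: one integer followed by one unit letter; anything else fails.
duration_to_seconds() {
  value=${1%[dhms]}     # numeric part, with the unit letter stripped
  unit=${1#"$value"}    # what remains after the numeric part
  case "$value" in
    ''|*[!0-9]*|0?*) return 1 ;;   # empty, non-numeric, or leading zero
  esac
  case "$unit" in
    d) echo $((value * 86400)) ;;
    h) echo $((value * 3600)) ;;
    m) echo $((value * 60)) ;;
    s) echo "$value" ;;
    *) return 1 ;;                 # missing unit
  esac
}

MAX_SECONDS=$((7 * 86400))   # the 7-day limit stated in this document

# Succeeds only if the duration parses, is positive, and is at most 7 days.
within_flex_start_limit() {
  seconds=$(duration_to_seconds "$1") || return 1
  [ "$seconds" -gt 0 ] && [ "$seconds" -le "$MAX_SECONDS" ]
}

for d in 7d 36h 8d; do
  if within_flex_start_limit "$d"; then
    echo "$d: ok"
  else
    echo "$d: rejected"
  fi
done
```

With these inputs, the loop accepts 7d and 36h and rejects 8d.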

Create a workload policy

The following command creates a workload policy that defines the accelerator topology. A workload policy is required for a multi-host slice and optional for single-host slices.

    gcloud compute resource-policies create workload WORKLOAD_POLICY_NAME \
        --type=high-throughput \
        --accelerator-topology=TOPOLOGY

Replace the following placeholders:

  • WORKLOAD_POLICY_NAME: The name of your workload policy.
  • TOPOLOGY: The topology of the TPU VMs, for example, 4x4x8.
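
The topology determines how many instances the MIG needs. As a rough sketch, assuming the topology dimensions multiply out to the number of chips in the slice, the instance count is that product divided by the number of chips attached to each VM. The chips-per-VM figure depends on the machine type and is not given in this document, so the sketch takes it as an explicit argument; the value 4 in the usage line is only an illustration. The function names are hypothetical.

```shell
#!/bin/sh
# Multiplies the dimensions of a topology string such as 4x4x8.
topology_chips() {
  chips=1
  old_ifs=$IFS
  IFS=x
  set -- $1          # split the topology on "x" into positional parameters
  IFS=$old_ifs
  [ "$#" -gt 0 ] || return 1
  for dim in "$@"; do
    case "$dim" in
      ''|*[!0-9]*|0*) return 1 ;;   # empty, non-numeric, or zero-leading
    esac
    chips=$((chips * dim))
  done
  echo "$chips"
}

# Prints the number of VMs needed for TOPOLOGY given CHIPS_PER_VM.
# CHIPS_PER_VM is an assumption supplied by the caller.
instances_for_topology() {
  total=$(topology_chips "$1") || return 1
  per_vm=$2
  case "$per_vm" in
    ''|*[!0-9]*|0*) return 1 ;;
  esac
  [ $((total % per_vm)) -eq 0 ] || return 1   # must divide evenly
  echo $((total / per_vm))
}

# 4x4x8 is the example topology from the list above; 4 is illustrative.
instances_for_topology 4x4x8 4
```

Under those assumptions, a 4x4x8 topology has 128 chips, which at 4 chips per VM gives 32 instances.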

Create the MIG

Create the MIG using the template and the workload policy, with the target size policy mode set to bulk.

    gcloud compute instance-groups managed create MIG_NAME \
        --zone=ZONE \
        --template=TEMPLATE_NAME \
        --size=SIZE \
        --workload-policy=projects/PROJECT_ID/regions/WORKLOAD_POLICY_REGION/resourcePolicies/WORKLOAD_POLICY_NAME \
        --target-size-policy-mode=bulk

Replace the following placeholders:

  • MIG_NAME: The name of your MIG.
  • ZONE: The zone of your MIG.
  • TEMPLATE_NAME: The name of your instance template.
  • SIZE: The number of instances to create. In bulk mode, this must be either 0 or the number of instances required to form the accelerator topology.
  • PROJECT_ID: The ID of your Google Cloud project.
  • WORKLOAD_POLICY_REGION: The region where the workload policy is defined.
  • WORKLOAD_POLICY_NAME: The name of your workload policy.
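
The value passed to --workload-policy is a resource path assembled from three of the placeholders above. The following sketch builds that path and, as a convenience, derives a region from a zone name by dropping the final hyphen-separated segment. It assumes the workload policy lives in the same region as the MIG's zone, which this document does not state. The function names, my-project, and my-policy are hypothetical.

```shell
#!/bin/sh
# Derives a region from a zone name by dropping the last "-suffix" segment,
# for example us-east5-a -> us-east5. Fails if the input has no zone suffix.
region_from_zone() {
  case "$1" in
    *-*-*) echo "${1%-*}" ;;
    *)     return 1 ;;
  esac
}

# Builds the path used by --workload-policy from its three parts.
workload_policy_path() {
  project=$1
  region=$2
  policy=$3
  [ -n "$project" ] && [ -n "$region" ] && [ -n "$policy" ] || return 1
  echo "projects/$project/regions/$region/resourcePolicies/$policy"
}

# Assumption: the policy is in the same region as the MIG's zone.
policy_region=$(region_from_zone us-east5-a)
workload_policy_path my-project "$policy_region" my-policy
```

The last line prints projects/my-project/regions/us-east5/resourcePolicies/my-policy.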

Create TPU Flex-start VMs with single-host slices

Create an instance template

Create an instance template specifying the FLEX_START provisioning model and your chosen run duration.

    gcloud compute instance-templates create TEMPLATE_NAME \
        --machine-type=MACHINE_TYPE \
        --image-family=IMAGE_FAMILY \
        --image-project=IMAGE_PROJECT \
        --provisioning-model=FLEX_START \
        --instance-termination-action=DELETE \
        --max-run-duration=DURATION \
        --region=REGION \
        --maintenance-policy=TERMINATE

Replace the following placeholders:

  • TEMPLATE_NAME: The name of your instance template.
  • MACHINE_TYPE: The machine type for the TPU VM (for example, ct6e-standard-8t).
  • IMAGE_FAMILY: The OS image family for the TPU VM (for example, ubuntu-accelerator-2204-amd64-with-tpu-v6e).
  • IMAGE_PROJECT: The OS image project for the TPU VM (for example, ubuntu-os-accelerator-images).
  • DURATION: The maximum run duration (for example, 7d for 7 days).
  • REGION: The region in which to create the instance template.

Create a workload policy

The following command creates a workload policy. This is optional for single-host slices.

    gcloud compute resource-policies create workload WORKLOAD_POLICY_NAME \
        --type=high-throughput

Replace the following placeholders:

  • WORKLOAD_POLICY_NAME: A name for your workload policy.

Create the MIG

Create the MIG using the template.

    gcloud compute instance-groups managed create MIG_NAME \
        --zone=ZONE \
        --template=TEMPLATE_NAME \
        --size=SIZE \
        --workload-policy=projects/PROJECT_ID/regions/WORKLOAD_POLICY_REGION/resourcePolicies/WORKLOAD_POLICY_NAME

Replace the following placeholders:

  • MIG_NAME: The name of your MIG.
  • ZONE: The zone of your MIG.
  • TEMPLATE_NAME: The name of your instance template.
  • SIZE: The number of instances to create.
  • PROJECT_ID: The ID of your Google Cloud project.
  • WORKLOAD_POLICY_REGION: The region where the workload policy is defined.
  • WORKLOAD_POLICY_NAME: The name of your workload policy.
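
To see how the placeholders fit together, the following dry-run sketch prints the template and MIG commands for a single-host slice with values filled in. It does not call gcloud. The machine type, image family, image project, and 7d duration are the examples given in this document. The function name, my-template, my-mig, my-project, and my-policy are hypothetical, and deriving the template region from the zone is an assumption made for this example.

```shell
#!/bin/sh
# Dry run: prints the single-host commands with example values filled in.
# Nothing is sent to Google Cloud; review the output before running it.
print_single_host_commands() {
  template=$1          # hypothetical template name
  mig=$2               # hypothetical MIG name
  zone=$3              # for example, us-east5-a
  size=$4              # number of single-host instances
  policy=$5            # full workload policy resource path
  region=${zone%-*}    # assumption: the template uses the zone's region

  printf '%s\n' \
    "gcloud compute instance-templates create $template \\" \
    "    --machine-type=ct6e-standard-8t \\" \
    "    --image-family=ubuntu-accelerator-2204-amd64-with-tpu-v6e \\" \
    "    --image-project=ubuntu-os-accelerator-images \\" \
    "    --provisioning-model=FLEX_START \\" \
    "    --instance-termination-action=DELETE \\" \
    "    --max-run-duration=7d \\" \
    "    --region=$region \\" \
    "    --maintenance-policy=TERMINATE" \
    "" \
    "gcloud compute instance-groups managed create $mig \\" \
    "    --zone=$zone \\" \
    "    --template=$template \\" \
    "    --size=$size \\" \
    "    --workload-policy=$policy"
}

print_single_host_commands my-template my-mig us-east5-a 2 \
  projects/my-project/regions/us-east5/resourcePolicies/my-policy
```

Printing the commands first makes it easy to compare them against the placeholder lists before any resources are created.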