Run a small batch workload with TPUs and Flex-start VMs

This guide shows you how to optimize TPU provisioning for medium- and small-scale training workloads by using Flex-start VMs, which are created with the flex-start consumption option. In this guide, you use Flex-start VMs to provision a TPU slice node pool and run a batch workload on it.

This guide is intended for Machine learning (ML) engineers, Platform admins and operators, and Data and AI specialists who are interested in using Kubernetes container orchestration capabilities for running batch workloads. For more information about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.

Flex-start pricing

Flex-start is recommended if your workload needs resources that are dynamically provisioned as needed, for up to seven days with short-term reservations, no complex quota management, and cost-effective access. Flex-start is powered by Dynamic Workload Scheduler and is billed using Dynamic Workload Scheduler pricing:

  • Discounted (up to 53%) for vCPUs, GPUs, and TPUs.
  • You pay as you go.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
  • Verify that you have an Autopilot cluster or a Standard cluster that's running version 1.33.0-gke.1712000 or later.
  • Verify that you're familiar with limitations of flex-start .
  • When using a Standard cluster, verify that you maintain at least one node pool without flex-start enabled for the cluster to function correctly.
  • Verify that you have quota for preemptible TPUs in your node locations. A sample quota check is sketched after this list.
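
For example, you can list a region's quota metrics with the gcloud CLI and look for TPU entries. This is a minimal sketch; us-central1 is a placeholder region, and the exact quota metric names depend on the TPU version that you plan to use:

    # List quota metrics, limits, and current usage for a region, then filter for TPU entries.
    gcloud compute regions describe us-central1 \
        --flatten="quotas[]" \
        --format="table(quotas.metric,quotas.limit,quotas.usage)" | grep -i tpu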

Create a node pool with flex-start

If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.

To create a node pool with flex-start enabled on an existing Standard cluster, use the gcloud CLI.

Create a single-host TPU slice node pool

You can create a single-host TPU slice node pool with flex-start:

  1. Create a node pool with flex-start:

     gcloud container node-pools create NODE_POOL_NAME \
         --cluster=CLUSTER_NAME \
         --location=CONTROL_PLANE_LOCATION \
         --node-locations=NODE_ZONES \
         --machine-type=MACHINE_TYPE \
         --reservation-affinity=none \
         --enable-autoscaling \
         --flex-start \
         --num-nodes=0 \
         --min-nodes=0 \
         --max-nodes=1

    Replace the following:

    • NODE_POOL_NAME : the name you choose for your node pool.
    • CLUSTER_NAME : the name of the cluster.
    • CONTROL_PLANE_LOCATION : the compute region for the cluster control plane.
    • NODE_ZONES : the comma-separated list of one or more zones where GKE creates the node pool.
    • MACHINE_TYPE : the type of machine to use for nodes. For more information about TPU compatible machine types, use the table in Choose the TPU version .

      For example, your node pool creation command can include the following parameters:

       ...
       --machine-type=ct6e-standard-4t \
       --tpu-topology=4x4 \
       --enable-autoscaling \
       --num-nodes=0 \
       --max-nodes=4

      This command sets the --max-nodes field to 4 because a 4x4 topology consists of 16 chips and each ct6e-standard-4t VM has 4 chips.

      Cluster autoscaler scales up to the number of nodes that your workload requires. After your workload completes, cluster autoscaler scales down to zero nodes.

    • --reservation-affinity=none: flex-start doesn't use or require reservations.
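
    After the node pool is created, you can optionally confirm that flex-start is enabled on it. The following is a minimal sketch that lists the cluster's node pools together with the flexStart field; the table columns are assumptions that you can adjust:

     gcloud container node-pools list \
         --cluster=CLUSTER_NAME \
         --location=CONTROL_PLANE_LOCATION \
         --format="table(name,config.machineType,config.flexStart)"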

Create a multi-host TPU slice node pool

The steps to create a multi-host TPU slice node pool differ depending on whether you use Ironwood (TPU7x) or an earlier TPU version.

Ironwood (TPU7x)

Preview: This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

You can create a multi-host TPU slice node pool in version Ironwood (TPU7x) by using the Google Cloud CLI or Terraform:

gcloud

To create a multi-host TPU slice node pool with Ironwood (TPU7x), you must first create a workload policy.

  1. Create a workload policy:

     gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
         --type=HIGH_THROUGHPUT \
         --accelerator-topology=TPU_TOPOLOGY \
         --project=PROJECT_ID \
         --region=REGION

    Replace the following:

    • WORKLOAD_POLICY_NAME : a name for your workload policy.
    • TPU_TOPOLOGY : the TPU Ironwood (TPU7x) topology. For example, 2x2x2 . To see all supported Ironwood (TPU7x) topologies, see the topology section .
    • PROJECT_ID : your Google Cloud project ID.
    • REGION : the region for the workload policy. A workload policy is a regional resource and can be re-used across node pools that share the same topology.
  2. Create the node pool with the workload policy:

     gcloud container node-pools create NODE_POOL_NAME \
         --cluster=CLUSTER_NAME \
         --location=us-central1 \
         --node-locations=us-central1-c \
         --machine-type=tpu7x-standard-4t \
         --reservation-affinity=none \
         --enable-autoscaling \
         --num-nodes=0 \
         --min-nodes=0 \
         --max-nodes=MAX_NODES \
         --flex-start \
         --placement-policy=WORKLOAD_POLICY

    Replace the following:

    • NODE_POOL_NAME : the name of the new node pool.
    • WORKLOAD_POLICY : the name of the workload policy that you created.
    • MAX_NODES : The maximum size of the node pool. The --max-nodes flag is required if --enable-autoscaling is supplied and must be equal to the product of the values defined in TPU_TOPOLOGY ( {A}x{B}x{C} ) divided by the number of chips in each VM. For example, if TPU_TOPOLOGY is 2x2x2 , the product is 8. Since each VM in tpu7x-standard-4t has 4 chips, the number of nodes is 2.

    This command creates a node pool named NODE_POOL_NAME with the following characteristics:

    • --machine-type=tpu7x-standard-4t specifies the Ironwood (TPU7x) machine type.
    • --flex-start enables flex-start.
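
    If node pool creation fails because the placement policy can't be found, you can confirm that the workload policy exists. A minimal sketch, using the same placeholders as above:

     gcloud compute resource-policies describe WORKLOAD_POLICY_NAME \
         --project=PROJECT_ID \
         --region=REGION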

Terraform

  1. Ensure that you use version 4.84.0 or later of the google provider.
  2. Create a workload policy:

      # The resource label ("tpu_workload_policy" here) is arbitrary; choose any name.
      resource "google_compute_resource_policy" "tpu_workload_policy" {
        name   = "WORKLOAD_POLICY_NAME"
        region = CLUSTER_LOCATION

        workload_policy {
          type                 = "HIGH_THROUGHPUT"
          accelerator_topology = "TPU_TOPOLOGY"
        }
      }

    Replace the following:

    • WORKLOAD_POLICY_NAME : a name for your workload policy.
    • CLUSTER_LOCATION : Compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. For more information, see Select a TPU version and topology .
    • TPU_TOPOLOGY : the TPU Ironwood (TPU7x) topology. For example, 2x2x2 . To see all supported Ironwood (TPU7x) topologies, see Plan TPUs .

    For more information about the google_compute_resource_policy reference, see Terraform Provider .

  3. In your Terraform configuration, add the following block:

      resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" {
        provider           = google
        project            = PROJECT_ID
        cluster            = CLUSTER_NAME
        name               = POOL_NAME
        location           = CLUSTER_LOCATION
        node_locations     = [NODE_ZONES]
        initial_node_count = NUM_NODES

        autoscaling {
          max_node_count  = MAX_NODES
          location_policy = "ANY"
        }

        node_config {
          machine_type = MACHINE_TYPE

          reservation_affinity {
            consume_reservation_type = "SPECIFIC_RESERVATION"
            key                      = "compute.googleapis.com/reservation-name"
            values                   = [RESERVATION_LABEL_VALUES]
          }

          flex_start = true
        }

        placement_policy {
          policy_name = WORKLOAD_POLICY_NAME
        }
      }

    Replace the following:

    • NODE_POOL_RESOURCE_NAME : the name of the node pool resource in the Terraform template.
    • PROJECT_ID : your project ID.
    • CLUSTER_NAME : the name of the existing cluster to add the node pool to.
    • POOL_NAME : the name of the node pool to create.
    • NODE_ZONES : the comma-separated list of one or more zones where GKE creates the node pool.
    • NUM_NODES : the number of nodes in the node pool. It must be zero or the number of TPU chips divided by four, because in multi-host TPU slices each TPU slice node has four chips. For example, if TPU_TOPOLOGY is 4x8 , then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Choose the TPU version.
    • TPU_TOPOLOGY : this indicates the selected physical topology for the TPU slice. The format of the topology depends on the TPU version you are using. To learn more about TPU topologies, use the table in Choose a topology .

    Optionally, you can also use the following variables:

    • RESERVATION_NAME : if you use a TPU reservation , provide a list of reservation-resource labels to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider .
    • autoscaling : create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
      • MAX_NODES : the maximum size of the node pool. The value must be equal to the product of the values defined in TPU_TOPOLOGY ( {A}x{B}x{C} ) divided by the number of chips in each VM. For example, if TPU_TOPOLOGY is 2x2x2 , the product is 8. Since each VM in tpu7x-standard-4t has 4 chips, the number of nodes is 2.
    • spot : sets the node pool to use Spot VMs for the TPU slice nodes. This setting can't be changed after the node pool is created. For more information, see Spot VMs.
    • flex_start : sets the node pool to use the flex-start consumption option. This setting can't be set to true if spot is enabled.
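
    After you add both blocks, apply the configuration with the standard Terraform workflow. This is a minimal sketch; how you manage state and variables depends on your environment:

     terraform init    # download the google provider
     terraform plan    # preview the workload policy and node pool changes
     terraform apply   # create the resources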

Other TPU versions

You can create a multi-host TPU slice node pool with TPU versions v3, v4, v5p, v5e, and Trillium (v6e) by using the Google Cloud CLI, Terraform, or the Google Cloud console.

gcloud

   
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --node-locations=NODE_ZONES \
    --machine-type=MACHINE_TYPE \
    --tpu-topology=TPU_TOPOLOGY \
    --reservation-affinity=none \
    --enable-autoscaling \
    --num-nodes=0 \
    --min-nodes=0 \
    --max-nodes=MAX_NODES \
    --flex-start

Replace the following:

  • NODE_POOL_NAME : the name of the new node pool.
  • CLUSTER_NAME : the name of the cluster.
  • CONTROL_PLANE_LOCATION : the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE .
  • NODE_ZONES : the comma-separated list of one or more zones where GKE creates the node pool.
  • MACHINE_TYPE : the type of machine to use for nodes. For more information about TPU compatible machine types, use the table in Choose the TPU version .
  • TPU_TOPOLOGY : the TPU topology. For example, 2x2x2 . To see all supported TPU topologies, see the topology section .
  • MAX_NODES : the maximum size of the node pool. The --max-nodes flag is required if --enable-autoscaling is supplied and must be equal to the product of the values defined in TPU_TOPOLOGY ( {A}x{B}x{C} ) divided by the number of chips in each VM.

    This command creates a node pool named NODE_POOL_NAME with flex-start enabled.
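
    For example, reusing the Trillium (v6e) values shown earlier (ct6e-standard-4t with a 4x4 topology, which is 16 chips spread across four 4-chip VMs), the command might look like the following sketch. The node pool name tpu-flex-pool is arbitrary, and the cluster, location, and zone values remain placeholders:

     gcloud container node-pools create tpu-flex-pool \
         --cluster=CLUSTER_NAME \
         --location=CONTROL_PLANE_LOCATION \
         --node-locations=NODE_ZONES \
         --machine-type=ct6e-standard-4t \
         --tpu-topology=4x4 \
         --reservation-affinity=none \
         --enable-autoscaling \
         --num-nodes=0 \
         --min-nodes=0 \
         --max-nodes=4 \
         --flex-start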

Terraform

  1. Ensure that you use version 4.84.0 or later of the google provider.
  2. Add the following block to your Terraform configuration:

      resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" {
        provider           = google
        project            = PROJECT_ID
        cluster            = CLUSTER_NAME
        name               = POOL_NAME
        location           = CLUSTER_LOCATION
        node_locations     = [NODE_ZONES]
        initial_node_count = NUM_NODES

        autoscaling {
          max_node_count  = MAX_NODES
          location_policy = "ANY"
        }

        node_config {
          machine_type = MACHINE_TYPE

          reservation_affinity {
            consume_reservation_type = "SPECIFIC_RESERVATION"
            key                      = "compute.googleapis.com/reservation-name"
            values                   = [RESERVATION_LABEL_VALUES]
          }

          flex_start = true
        }

        placement_policy {
          type         = "COMPACT"
          tpu_topology = TPU_TOPOLOGY
        }
      }

    Replace the following:

    • NODE_POOL_RESOURCE_NAME : the name of the node pool resource in the Terraform template.
    • PROJECT_ID : your project ID.
    • CLUSTER_NAME : the name of the existing cluster to add the node pool to.
    • POOL_NAME : the name of the node pool to create.
    • CLUSTER_LOCATION : compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. To learn more, see Select a TPU version and topology .
    • NODE_ZONES : the comma-separated list of one or more zones where GKE creates the node pool.
    • NUM_NODES : the number of nodes in the node pool. It must be zero or the number of TPU chips divided by four, because in multi-host TPU slices each TPU slice node has four chips. For example, if TPU_TOPOLOGY is 4x8 , then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Choose the TPU version.
    • TPU_TOPOLOGY : this indicates the physical topology for the TPU slice. The format of the topology depends on the TPU version you are using. To learn more about TPU topologies, use the table in Choose a topology .

    Optionally, you can also use the following variables:

    • RESERVATION_NAME : if you use a TPU reservation , this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
    • autoscaling : create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
      • MAX_NODES : the maximum size of the node pool. It must be equal to the product of the values defined in TPU_TOPOLOGY ( {A}x{B}x{C} ) divided by the number of chips in each VM.
    • spot : lets the node pool use Spot VMs for the TPU slice nodes. This setting can't be changed after the node pool is created. For more information, see Spot VMs.
    • flex_start : sets the node pool to use the flex-start consumption option. This can't be set to true if spot is enabled.

Console

To create a node pool with TPUs:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

    Go to Google Kubernetes Engine

  2. In the cluster list, click the name of the cluster you want to modify.

  3. Click Add node pool.

  4. In the Node pool details section, check the Specify node locations box.

  5. Select the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE .

  6. From the navigation pane, click Nodes.

  7. In the Machine Configuration section, select TPUs.

  8. In the Series drop-down menu, select one of the following:

    • CT3: TPU v3, single-host device
    • CT3P: TPU v3, multi-host pod slice
    • CT4P: TPU v4
    • CT5LP: TPU v5e
    • CT5P: TPU v5p
    • CT6E: TPU Trillium (v6e)
  9. In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Choose the TPU version table to learn how to define the machine type and TPU topology that create a multi-host TPU slice node pool.

  10. In the TPU Topology drop-down menu, select the physical topology for the TPU slice.

  11. In the Changes needed dialog, click Make changes.

  12. Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.

  13. Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.

  14. Click Create.

Verify the status of flex-start in the node pool

Run the following command:

gcloud container node-pools describe NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location CONTROL_PLANE_LOCATION \
    --format="get(config.flexStart)"

If flex-start is enabled in the node pool, the field flexStart is set to True .
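
Because the node pool starts with zero nodes, flex-start nodes appear only after a workload triggers a scale-up. Once nodes exist, you can also check for them from inside the cluster. A minimal sketch, assuming kubectl is already configured for the cluster:

    # Flex-start nodes carry this label, which the Job manifests below also use as a nodeSelector.
    kubectl get nodes -l cloud.google.com/gke-flex-start=true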

Run a batch workload

In this section, you create a Job that schedules a TPU node with Flex-start VMs. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.
  1. In the Google Cloud console, launch a Cloud Shell session by clicking Activate Cloud Shell. A session opens in the bottom pane of the Google Cloud console.

  2. Create a file named dws-flex-start.yaml :

    Single-host

    Use the following manifest for the dws-flex-start.yaml file:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-1
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-flex-start: "true"
        cloud.google.com/gke-tpu-accelerator: ACCELERATOR_TYPE
        cloud.google.com/gke-tpu-topology: TPU_TOPOLOGY
      containers:
      - name: container-1
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["3600s"] # Sleep for 1 hour
        resources:
          requests:
            google.com/tpu: NUM_CHIPS
          limits:
            google.com/tpu: NUM_CHIPS
      restartPolicy: OnFailure

    Multi-host

    Use the following manifest for the dws-flex-start.yaml file:

apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None
  selector:
    job-name: job-1
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-1
spec:
  backoffLimit: 0
  completions: 2
  parallelism: 2
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-svc
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-flex-start: "true"
        cloud.google.com/gke-tpu-accelerator: ACCELERATOR_TYPE
        cloud.google.com/gke-tpu-topology: TPU_TOPOLOGY
      containers:
      - name: tpu-job
        image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
        ports:
        - containerPort: 8471 # Default port using which TPU VMs communicate
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        securityContext:
          privileged: true
        command:
        - bash
        - -c
        - |
          python -c 'import jax; print("TPU cores:", jax.device_count())'
        resources:
          requests:
            google.com/tpu: NUM_CHIPS
          limits:
            google.com/tpu: NUM_CHIPS

    Replace the following:

    • ACCELERATOR_TYPE : the type of TPU accelerator you used when you created the node pools. For example, tpu-v4-podslice or tpu-v5-lite-podslice .
    • TPU_TOPOLOGY : the physical topology for the TPU slice. For example, the value might be 4x4x4 or 2x2 , depending on the TPU version.
    • NUM_CHIPS : the number of TPU chips in each VM. This value can be one, four, or eight. To learn more, see TPU versions.
  3. Apply the dws-flex-start.yaml manifest:

     kubectl apply -f dws-flex-start.yaml
    
  4. Verify that the Jobs are running on the same node:

     kubectl get pods -o wide
    

    The output is similar to the following:

     NAME    READY   STATUS      RESTARTS   AGE   IP         NODE                NOMINATED NODE   READINESS GATES
     job-1   0/1     Completed   0          19m   10.(...)   gke-flex-zonal-a2   <none>           <none>
    
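
Because flex-start provisions capacity on demand, the Pods can remain in Pending until the node pool scales up. The following is a minimal sketch for watching progress and troubleshooting, assuming the single-host Job named job-1:

    # Wait for the Job to complete (the timeout matches the 3600s sleep used above).
    kubectl wait --for=condition=complete job/job-1 --timeout=3600s

    # If the Job does not start, inspect scheduling events and Pod logs.
    kubectl describe job job-1
    kubectl logs -l job-name=job-1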

Clean up

To avoid incurring charges to your Google Cloud account for the resources that you used on this page, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete .
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the individual resource

  1. Delete the Jobs:

     kubectl delete job -l "job-name in (job-1,job-2)"
     
    
  2. Delete the node pool:

     gcloud container node-pools delete NODE_POOL_NAME \
         --cluster CLUSTER_NAME \
         --location CONTROL_PLANE_LOCATION
     
    
  3. Delete the cluster:

     gcloud container clusters delete CLUSTER_NAME \
         --location CONTROL_PLANE_LOCATION
     
    

What's next
