Deploy a GKE TPU 7x cluster

This document shows you how to use a Cluster Toolkit blueprint to automate the deployment of a Google Kubernetes Engine (GKE) cluster that has a dedicated Cloud TPU 7x node pool. You can choose between the standard blueprint and the advanced blueprint. These blueprints let you rapidly provision repeatable, scalable infrastructure optimized to train and serve large-scale AI models.

Both blueprints provision the same underlying GKE cluster infrastructure. You can deploy the standard blueprint immediately by using default settings, or you can configure and deploy the advanced blueprint, which adds automatic bucket creation, performance-tuned storage mounts, and support for optional high-performance storage systems. For a detailed description of these differences, see Comparison of blueprint options.

For more information about the architecture of this TPU, see TPU 7x.

For more information about TPUs in GKE, see How TPUs in GKE work.

Before you begin

Before you begin, verify that you have completed the following tasks:

- Verify that you have a TPU-compatible version of GKE. For more information, see Validate TPU availability in GKE.
- Enable the Kubernetes Engine API.
- Install and initialize the Google Cloud CLI.
- Set up Cluster Toolkit.
- Verify that you have sufficient quota for Cloud TPU 7x in your target region.

Required roles

To get the permissions that you need to deploy the GKE Cloud TPU 7x cluster, ask your administrator to grant you the following IAM roles on your project:

- Editor (roles/editor)
- Kubernetes Engine Cluster Admin (roles/container.clusterAdmin)
- Service Account Admin (roles/iam.serviceAccountAdmin)
- For Google Cloud Managed Lustre configurations, you also need the following roles:
  - Lustre Admin (roles/lustre.admin)
  - Compute Network Admin (roles/compute.networkAdmin)
- For Hyperdisk Balanced configurations, you also need the following role: Compute Admin (roles/compute.admin)
- For Filestore configurations, you also need the following role: Cloud Filestore Editor (roles/file.editor)

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.
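
If you have permission to edit the project's IAM policy yourself, the three base roles can be granted from the command line. The following is a minimal sketch rather than an official procedure: grant_base_roles is a hypothetical helper name, and the principal format (for example, user:alice@example.com) is an assumption about how your identities are set up.

```shell
# Sketch: grant the three base roles listed above to one principal.
# grant_base_roles is a hypothetical helper name, not a Cluster Toolkit command.
grant_base_roles() {
  project_id="$1"   # your Google Cloud project ID
  member="$2"       # assumed format, for example: user:alice@example.com
  for role in roles/editor roles/container.clusterAdmin roles/iam.serviceAccountAdmin; do
    # One policy binding per role; stop at the first failure.
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="$member" \
      --role="$role" || return 1
  done
}
```

For example, grant_base_roles PROJECT_ID user:alice@example.com runs gcloud projects add-iam-policy-binding once per role. The storage-specific roles listed above could be added to the loop in the same way.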

Set up the cluster infrastructure

To set up the cluster infrastructure that's required for both blueprint deployments, do the following:

Create a Cloud Storage bucket to store the state of the Terraform deployment:

    gcloud storage buckets create gs://BUCKET_NAME \
        --default-storage-class=STANDARD \
        --location=COMPUTE_REGION \
        --uniform-bucket-level-access

Replace the following:

- BUCKET_NAME: the name of the new Cloud Storage bucket.
- COMPUTE_REGION: the compute region where you want to store the Terraform state.

Enable versioning on the bucket:

    gcloud storage buckets update gs://BUCKET_NAME --versioning

Open the examples/gke-tpu-7x/gke-tpu-7x-deployment.yaml file.
In the terraform_backend_defaults and vars sections, replace the placeholders to match your deployment:
```
terraform_backend_defaults:
  type: gcs
  configuration:
    bucket: BUCKET_NAME

vars:
  project_id: PROJECT_ID
  deployment_name: DEPLOYMENT_NAME
  region: REGION
  zone: ZONE
  num_slices: NUM_SLICES
  machine_type: MACHINE_TYPE
  tpu_topology: TPU_TOPOLOGY
  authorized_cidr: AUTHORIZED_CIDR
  reservation: RESERVATION_NAME
```
Replace the following:
- BUCKET_NAME: the Cloud Storage bucket used for storing Terraform state.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the Google Cloud region used for this deployment, for example us-east5.
- ZONE: the Google Cloud zone used for this deployment, for example us-east5-c.
- DEPLOYMENT_NAME: the name of your deployment.
- NUM_SLICES: the number of independent Cloud TPU slices to create.
- MACHINE_TYPE: the machine type for your Cloud TPU nodes.
- TPU_TOPOLOGY: the physical arrangement of the Cloud TPU chips in a slice.
- AUTHORIZED_CIDR: the CIDR block containing the IP address of the machine calling Terraform. To allow all IP addresses, use 0.0.0.0/0. Identity and Access Management (IAM) restrictions are still enforced. To allow only your IP address, use your IP address followed by /32.
- RESERVATION_NAME: the name of the Compute Engine reservation of Cloud TPU 7x nodes.
Cluster Toolkit automatically calculates the exact number of nodes required based on your selected tpu_topology and machine_type.
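
As a rough illustration of that calculation, the sketch below multiplies the dimensions of a topology string and divides the result by the number of chips per node. It assumes the general rule that a slice has (chips in the topology) / (chips per node) nodes. The function names are hypothetical, and the logic inside Cluster Toolkit may differ.

```shell
# Sketch: estimate the nodes in one slice from a topology string such as 2x2x1.
# Assumption: nodes = (product of topology dimensions) / (chips per node).

chips_in_topology() {
  # Accept only digits separated by single "x" characters, for example 4x4x4.
  case "$1" in
    ''|*[!0-9x]*|x*|*x|*xx*) echo "invalid topology: $1" >&2; return 1 ;;
  esac
  # Turn "2x2x1" into "2*2*1" and let the shell evaluate the product.
  echo "$(( $(printf '%s' "$1" | tr 'x' '*') ))"
}

nodes_per_slice() {
  chips="$(chips_in_topology "$1")" || return 1
  per_node="$2"   # chips per node for your machine type
  if [ "$(( $chips % $per_node ))" -ne 0 ]; then
    echo "topology $1 is not divisible by $per_node chips per node" >&2
    return 1
  fi
  echo "$(( $chips / $per_node ))"
}
```

With a topology of 2x2x1 and 4 chips per node, nodes_per_slice 2x2x1 4 prints 1. Multiplying the result by num_slices gives an estimate of the total node count.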
Generate Application Default Credentials (ADC) to provide access to Terraform:
```
gcloud auth application-default login
```
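
Before you deploy, it can help to confirm that no placeholder in the deployment file was left unreplaced. The following sketch assumes that the placeholders appear in the file exactly as named on this page; find_unreplaced_placeholders is a hypothetical helper, not a gcluster feature.

```shell
# Sketch: list lines whose value is still one of the placeholder names.
# find_unreplaced_placeholders is a hypothetical helper, not part of gcluster.
find_unreplaced_placeholders() {
  # -n prints line numbers; the pattern matches "key: PLACEHOLDER" at end of line.
  grep -n -E ':[[:space:]]*(BUCKET_NAME|PROJECT_ID|DEPLOYMENT_NAME|REGION|ZONE|NUM_SLICES|MACHINE_TYPE|TPU_TOPOLOGY|AUTHORIZED_CIDR|RESERVATION_NAME)[[:space:]]*$' "$1"
}
```

Running find_unreplaced_placeholders examples/gke-tpu-7x/gke-tpu-7x-deployment.yaml prints each offending line with its line number. When nothing is left to replace, it prints nothing and exits with status 1, which is the usual grep behavior for no matches.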

Deploy the standard blueprint

If you want to use the more advanced examples/gke-tpu-7x/gke-tpu-7x-advanced.yaml blueprint, skip this section and go to Deploy the advanced blueprint.

After you have set up the cluster infrastructure, deploy the standard blueprint to provision the GKE infrastructure:

    cd ~/cluster-toolkit
    ./gcluster deploy -d \
        examples/gke-tpu-7x/gke-tpu-7x-deployment.yaml \
        examples/gke-tpu-7x/gke-tpu-7x.yaml

Deploy the advanced blueprint

The Cluster Toolkit GitHub repository also includes an advanced blueprint that is optimized for production workloads: examples/gke-tpu-7x/gke-tpu-7x-advanced.yaml. To get a better understanding of the advanced blueprint functionality, see Advanced blueprints.

To deploy the advanced blueprint, do the following:

Set up the cluster infrastructure.
Configure advanced scheduling with Kueue.
Optional: Configure Managed Lustre.
Optional: Configure Hyperdisk Balanced.
Optional: Configure Filestore.
Deploy the advanced GKE Cloud TPU 7x cluster.

Configure advanced scheduling with Kueue

The advanced blueprint supports Kueue, a Kubernetes-native system for managing quotas and Job queuing. The advanced blueprint enables Kueue by default.

Submit a Job to the queue by adding the kueue.x-k8s.io/queue-name: user-queue label to your Job or JobSet manifest.

Create the resources by using the provided sample Job file:

    kubectl create -f ~/cluster-toolkit/examples/gke-tpu-7x/kueue-job-sample.yaml

Check the status of your workload:
```
kubectl get workloads
```
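
To show where the queue label goes in a manifest, the sketch below writes a minimal Job with the label under metadata.labels. Only the label key and the user-queue value come from this page. The helper name, Job name, image, and command are placeholders, and whether Kueue admits this exact Job depends on how the blueprint configures its queues.

```shell
# Sketch: write a minimal Job manifest that carries the Kueue queue label.
# The helper name, Job name, image, and command are placeholders for illustration.
write_kueue_sample_job() {
  cat > "$1" <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: kueue-label-demo
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  suspend: true   # Kueue unsuspends the Job once the workload is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox:1.36
          command: ["sh", "-c", "echo hello from the queue"]
EOF
}
```

After running write_kueue_sample_job /tmp/kueue-label-demo.yaml, you could submit the file with kubectl create -f /tmp/kueue-label-demo.yaml and watch it with kubectl get workloads.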

Configure Managed Lustre

Managed Lustre provides a fully managed parallel file system optimized for AI and HPC applications. To configure Managed Lustre for your GKE Cloud TPU 7x deployment, do the following:

Open the examples/gke-tpu-7x/gke-tpu-7x-advanced.yaml file.
In the vars section, uncomment the Managed Lustre variables.
Find the section commented # --- MANAGED LUSTRE ADDITIONS --- and uncomment the private_service_access, lustre_firewall_rule, managed-lustre, and lustre-pv modules.
Deploy the cluster by using the gcluster deploy command, as described in Deploy the advanced GKE Cloud TPU 7x cluster.

Configure Hyperdisk Balanced

Hyperdisk Balanced provides highly available and consistent performance across GKE nodes. To configure Hyperdisk Balanced for your GKE Cloud TPU 7x deployment, do the following:

Open the examples/gke-tpu-7x/gke-tpu-7x-advanced.yaml file.
In the gke-tpu-7x-cluster module, verify that the value enable_persistent_disk_csi: true is set.
Find the section commented # --- HYPERDISK BALANCED ADDITIONS --- and uncomment the hyperdisk-balanced-setup and fio-bench-job-hyperdisk modules.
Deploy the cluster by using the gcluster deploy command, as described in Deploy the advanced GKE Cloud TPU 7x cluster.

Configure Filestore

Filestore provides managed NFS capabilities that let multiple Cloud TPU hosts share logs, code, or datasets. To configure Filestore for your GKE Cloud TPU 7x deployment, do the following:

Open the examples/gke-tpu-7x/gke-tpu-7x-advanced.yaml file.
In the gke-tpu-7x-cluster module, ensure that enable_filestore_csi: true is set.
Find the section commented # --- FILESTORE ADDITIONS --- and uncomment the filestore, shared-filestore-pv, and shared-fs-job modules.
Deploy the cluster by using the gcluster deploy command, as described in Deploy the advanced GKE Cloud TPU 7x cluster.

Deploy the advanced GKE Cloud TPU 7x cluster

After you have set up the cluster infrastructure and configured your chosen storage option, deploy the blueprint to provision the GKE infrastructure:

    cd ~/cluster-toolkit
    ./gcluster deploy -d \
        examples/gke-tpu-7x/gke-tpu-7x-deployment.yaml \
        examples/gke-tpu-7x/gke-tpu-7x-advanced.yaml

After deployment, the blueprint prints instructions for running a FIO benchmark Job. This Job acts as a validation test to verify that your Cloud Storage FUSE mounts are working correctly for both reading and writing. Follow the printed instructions in the terminal to run the validation test.

Run the sample Job

The examples/gke-tpu-7x/gke-tpu-7x-job.yaml file creates a Service and a Job resource in Kubernetes. The workload returns the number of Cloud TPU chips across all of the nodes in a multi-host Cloud TPU slice.

Connect to your cluster:
```
gcloud container clusters get-credentials DEPLOYMENT_NAME \
    --region=REGION \
    --project=PROJECT_ID
```
Replace the following:
- DEPLOYMENT_NAME: the name of your deployment.
- REGION: the Google Cloud region used for this deployment.
- PROJECT_ID: your Google Cloud project ID.
Open the examples/gke-tpu-7x/gke-tpu-7x-job.yaml file and update the nodeSelector values under the template specification to match the accelerator and topology that you used in your blueprint.

For example, the nodeSelector section might look like the following:
```
nodeSelector:
  cloud.google.com/gke-tpu-accelerator: tpu7x
  cloud.google.com/gke-tpu-topology: 2x2x1
```
In the resources section of the container specification, update the values for the google.com/tpu field in both the requests and limits sections. Supply values that match the number of chips per node for your selected machine type:
```
resources:
  requests:
    google.com/tpu: CHIPS_PER_NODE
  limits:
    google.com/tpu: CHIPS_PER_NODE
```
Replace CHIPS_PER_NODE with the number of Cloud TPU chips per node in your machine type, such as 4.
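
If you are unsure which value to use, the chip count is often visible in the machine type name. The sketch below assumes that the name ends in -<N>t, where N is the number of chips per node (for example, a name of the form tpu7x-standard-4t would give 4). The function name is hypothetical, and you should confirm the value against the machine type documentation before relying on it.

```shell
# Sketch: read the chip count from a machine type name that ends in "-<N>t".
# Assumption: the trailing "<N>t" encodes the chips per node.
chips_per_node_from_machine_type() {
  suffix="${1##*-}"   # text after the last hyphen, for example "4t"
  case "$suffix" in
    # Reject anything that is not one or more digits followed by a single "t".
    ''|t|*[!t]|*[!0-9]*t) echo "unrecognized machine type: $1" >&2; return 1 ;;
  esac
  echo "${suffix%t}"  # strip the trailing "t"
}
```

For a name of that form, chips_per_node_from_machine_type tpu7x-standard-4t prints 4, which is the value to use for CHIPS_PER_NODE.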

Create the resources:

    kubectl create -f ~/cluster-toolkit/examples/gke-tpu-7x/gke-tpu-7x-job.yaml

Get a list of Pods, and identify the two Pods whose names start with multislice-job-slice:
```
kubectl get pods
```
Get the logs of either of the Pods:
```
kubectl logs POD_NAME
```
Replace POD_NAME with the name of one of the Pods that you identified in the previous step.

The logs display Global device count: 32 at the end, which is the number of Cloud TPU chips across all of the nodes in a multi-host Cloud TPU slice.
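
To check the reported count in a script instead of reading the log by eye, you could extract the number from the log text. The helper name below is hypothetical, and the sketch assumes that the log line has exactly the form Global device count: N.

```shell
# Sketch: pull the number out of a "Global device count: N" log line.
# global_device_count is a hypothetical helper; it reads log text on stdin.
global_device_count() {
  # Keep only the digits from matching lines, then take the last match.
  count="$(sed -n 's/.*Global device count: \([0-9][0-9]*\).*/\1/p' | tail -n 1)"
  # Print the count, or return a non-zero status if no line matched.
  [ -n "$count" ] && echo "$count"
}
```

For example, kubectl logs POD_NAME | global_device_count prints only the number, which you can then compare with the value that you expect for your topology.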

Verify storage integrations

If you configured any of the optional storage systems, verify that your storage integrations work correctly. To do so, perform the following steps:

Connect to your cluster:
```
gcloud container clusters get-credentials DEPLOYMENT_NAME \
    --region=REGION \
    --project=PROJECT_ID
```
Replace the following:
- DEPLOYMENT_NAME: the name of your deployment.
- REGION: the Google Cloud region used for this deployment.
- PROJECT_ID: your Google Cloud project ID.
Follow the relevant section for your storage option:

Test the Managed Lustre mount

To test the Managed Lustre mount, do the following:

Create a file named lustre-claim-pod.yaml with the following settings:

The storageClassName field must be empty to bind to the manually created PersistentVolumeClaim resource.
The storage field size must match the lustre_size_gib value from your blueprint.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-lustre-claim
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: ""
      resources:
        requests:
          storage: 36000Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: lustre-test-pod
    spec:
      containers:
        - name: test-container
          image: ubuntu:22.04
          command: ["/bin/sleep", "infinity"]
          volumeMounts:
            - name: lustre-storage
              mountPath: /mnt/lustre
      volumes:
        - name: lustre-storage
          persistentVolumeClaim:
            claimName: my-lustre-claim

Apply the manifest to your cluster:
```
kubectl apply -f lustre-claim-pod.yaml
```
The Pod starts, and the Managed Lustre file system is available inside the container at /mnt/lustre.
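
To confirm that the mount is writable and not only present, you can run a small round-trip check from inside the Pod. The following is a minimal sketch, assuming that you opened a shell with kubectl exec -it lustre-test-pod -- /bin/bash and pasted the hypothetical check_mount_rw helper into it.

```shell
# Sketch: write a file under a mount point, read it back, and clean up.
# check_mount_rw is a hypothetical helper name.
check_mount_rw() {
  dir="$1"
  probe="$dir/.rw-probe-$$"            # $$ keeps the file name unique per shell
  echo "probe" > "$probe" || return 1  # fails if the path is missing or read-only
  [ "$(cat "$probe")" = "probe" ] || return 1
  rm -f "$probe"
  echo "read/write OK on $dir"
}
```

Inside the Pod, check_mount_rw /mnt/lustre prints a confirmation line when the write and read both succeed, and returns a non-zero status otherwise.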

Test the Hyperdisk Balanced mount

To test the Hyperdisk Balanced mount, do the following:

Apply the generated Flexible I/O tester (FIO) Job manifest:
```
kubectl apply -f PATH_TO_FIO_BENCHMARK
```
Replace PATH_TO_FIO_BENCHMARK with the path to the generated fio-benchmark.yaml file. The path is displayed in the final instructions printed to the terminal after you deploy the blueprint.

The Job created in the cluster is named fio-benchmark.

Wait for the Job to complete, and then obtain the list of Pods:

    kubectl get jobs
    kubectl get pods
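
Instead of polling, you can block until the Job finishes by using kubectl wait. This is a sketch under assumptions: wait_for_job is a hypothetical helper, the 15-minute default timeout is arbitrary, and the Pod listing relies on the job-name label that Kubernetes adds to Pods created by a Job.

```shell
# Sketch: block until a Job reports the Complete condition, then list its Pods.
# wait_for_job is a hypothetical helper name.
wait_for_job() {
  job_name="$1"
  timeout="${2:-15m}"   # assumed default; tune it to your benchmark's duration
  kubectl wait --for=condition=complete "job/$job_name" --timeout="$timeout" || return 1
  # Pods created by a Job carry the job-name label.
  kubectl get pods --selector="job-name=$job_name"
}
```

For example, wait_for_job fio-benchmark returns once the benchmark Job completes or the timeout expires, and then prints the Pods that belong to it.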

View the logs of the completed Pod to check the benchmark results:
```
kubectl logs POD_NAME
```
Replace POD_NAME with the name of the completed benchmark Pod. You can find the Pod name in the output of the kubectl get pods command in the previous step.

The logs of the Pod verify that the disk is mounted successfully and show the results of a mixed input and output test that is used to validate the disk's provisioned performance.

Test the shared Filestore mount

The blueprint includes a sample Job named shared-fs-job that demonstrates how two different Pods can write to and read from the same file simultaneously.

To test the shared Filestore mount, do the following:

Apply the Filestore test manifest:
```
kubectl apply -f PATH_TO_SHARED_FS_JOB
```
Replace PATH_TO_SHARED_FS_JOB with the path to the generated shared-fs-job.yaml file. The path is displayed in the final instructions printed to the terminal after you deploy the blueprint.
Check the logs of the first Pod to verify that the first Pod is reading data written by the second Pod:
```
kubectl get pods
kubectl logs FIRST_POD_NAME
```
Replace FIRST_POD_NAME with the name of the first Pod in the output of the kubectl get pods command.

The logs display content from the shared_output.txt file, showing timestamps and hostnames from both Pods, confirming that the file system is shared.

Delete resources

To avoid recurring charges for the resources used on this page, delete the resources provisioned by Cluster Toolkit, including the Virtual Private Cloud (VPC) networks and GKE cluster:

    ./gcluster destroy DEPLOYMENT_NAME/

Replace DEPLOYMENT_NAME with the name of your deployment. You can find this name in the Set up the cluster infrastructure section, where you defined the deployment_name variable in the examples/gke-tpu-7x/gke-tpu-7x-deployment.yaml file.
