You can use the Accelerated Processing Kit (XPK)
to create pre-configured Google Kubernetes Engine (GKE) clusters for
Pathways-based workloads. You can also use gcloud to manually create
GKE clusters for Pathways-based workloads.
Before you begin
Make sure you have:
- Installed Kubernetes tools
- Installed XPK
- Enabled the TPU API
- Enabled the Google Kubernetes Engine API
- Confirmed that your Google Cloud project is allowlisted for Pathways
Set up your local environment
Log in with your Google Cloud credentials.
    gcloud auth application-default login
Define the following environment variables with values appropriate to your workload.
Required variables
Create a GKE cluster
In the following example, you create a cluster with two v5e 2x4 node pools.
You can create a cluster using XPK or the gcloud command.
XPK
- Set some environment variables:

      CLUSTER_NODEPOOL_COUNT=CLUSTER_NODEPOOL_COUNT
      PROJECT=PROJECT_ID
      ZONE=ZONE
      CLUSTER=GKE_CLUSTER_NAME
      TPU_TYPE="v5litepod-8"
      PW_CPU_MACHINE_TYPE="n2-standard-64"
      NETWORK=NETWORK
      SUBNETWORK=SUB_NETWORK
  Replace the following:

  - CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
  - PROJECT_ID: your Google Cloud project name
  - ZONE: the zone where you are creating resources
  - GKE_CLUSTER_NAME: the GKE cluster name
  - TPU_TYPE: the TPU type. For more information, see supported types in XPK
  - PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
  - NETWORK: [Optional] a Virtual Private Cloud network name, if using XPK. The network must be created before you create your cluster
  - SUB_NETWORK: [Optional] a subnetwork name, if using XPK. The subnetwork must be created before you create your cluster
- Use XPK to create a GKE Pathways cluster. This command can take several minutes to provision the capacity. Once it completes, your capacity is allocated and you start incurring charges.

      xpk cluster create-pathways \
        --num-slices=${CLUSTER_NODEPOOL_COUNT} \
        --tpu-type=${TPU_TYPE} \
        --pathways-gce-machine-type=${PW_CPU_MACHINE_TYPE} \
        --on-demand \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --cluster=${CLUSTER} \
        --custom-cluster-arguments="--network=${NETWORK} --subnetwork=${SUBNETWORK} --enable-ip-alias"
Once the cluster is created, you can create and delete workloads as needed. You don't need to re-provision the TPU capacity.
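Because the create command starts billing once capacity is provisioned, it can help to confirm that the variables it reads are non-empty before you run it. The following is a minimal sketch of such a check. The `require_vars` helper is hypothetical and not part of XPK, and the values assigned in the example are placeholders rather than recommendations.

```shell
# Hypothetical helper (not part of XPK): report every named variable that is
# unset or empty, and return non-zero if any were missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Placeholder values for illustration only; substitute your own.
CLUSTER_NODEPOOL_COUNT=2
PROJECT="my-project"
ZONE="my-zone"
CLUSTER="my-pathways-cluster"
TPU_TYPE="v5litepod-8"
PW_CPU_MACHINE_TYPE="n2-standard-64"

if require_vars CLUSTER_NODEPOOL_COUNT PROJECT ZONE CLUSTER TPU_TYPE PW_CPU_MACHINE_TYPE; then
  echo "all required variables are set"
fi
```

NETWORK and SUBNETWORK are left out of the check because they are described as optional.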
gcloud
- Set some environment variables:

      CLUSTER=GKE_CLUSTER_NAME
      PROJECT=PROJECT_ID
      ZONE=ZONE
      REGION=REGION
      CLUSTER_VERSION=GKE_CLUSTER_VERSION
      PW_CPU_MACHINE_TYPE="n2-standard-64"
      NETWORK=NETWORK
      SUBNETWORK=SUB_NETWORK
      CLUSTER_NODEPOOL_COUNT=3
      TPU_MACHINE_TYPE="ct5lp-hightpu-4t"
      WORKERS_PER_SLICE=2
      TOPOLOGY="2x4"
      NUM_CPU_NODES=1
  Replace the following:

  - GKE_CLUSTER_NAME: the GKE cluster name
  - PROJECT_ID: your Google Cloud project name
  - ZONE: the zone where you are creating resources
  - REGION: the region where you are creating resources
  - GKE_CLUSTER_VERSION: [Optional] the GKE cluster version. Use 1.32.2-gke.1475000 or later
  - PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
  - NETWORK: [Optional] a Virtual Private Cloud network name, if using XPK. The network must be created before you create your cluster
  - SUB_NETWORK: [Optional] a subnetwork name, if using XPK. The subnetwork must be created before you create your cluster
  - CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
  - TPU_MACHINE_TYPE: the TPU machine type you want to use
  - WORKERS_PER_SLICE: the number of nodes per node pool
  - GKE_ACCELERATOR_TYPE: the Google Kubernetes Engine accelerator type. See Choose a TPU version
  - TOPOLOGY: the TPU topology
  - NUM_CPU_NODES: the Pathways CPU node pool size
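The example values for TOPOLOGY, TPU_MACHINE_TYPE, and WORKERS_PER_SLICE are related: a 2x4 topology has 8 chips, and a ct5lp-hightpu-4t node hosts 4 chips, which gives 2 nodes per node pool. The sketch below reproduces that arithmetic. The helper names `topology_chips` and `chips_per_vm` are hypothetical, and reading the chip count from the machine type suffix is an assumption that holds for names ending in `<N>t`, such as the ct5lp-hightpu types.

```shell
# Number of chips in a slice: the product of the topology dimensions.
topology_chips() {
  chips=1
  # Split "2x4" on "x" and multiply the factors together.
  for dim in $(echo "$1" | tr 'x' ' '); do
    chips=$((chips * dim))
  done
  echo "${chips}"
}

# Chips hosted by one node, read from the machine type suffix.
# Assumption: the name ends in "<N>t" (for example, ct5lp-hightpu-4t).
chips_per_vm() {
  suffix="${1##*-}"    # keep the text after the last "-", e.g. "4t"
  echo "${suffix%t}"   # drop the trailing "t", e.g. "4"
}

TOPOLOGY="2x4"
TPU_MACHINE_TYPE="ct5lp-hightpu-4t"
WORKERS_PER_SLICE=$(( $(topology_chips "${TOPOLOGY}") / $(chips_per_vm "${TPU_MACHINE_TYPE}") ))
echo "WORKERS_PER_SLICE=${WORKERS_PER_SLICE}"   # prints WORKERS_PER_SLICE=2
```

If you change the topology or machine type, recompute WORKERS_PER_SLICE the same way and check the result against the supported configurations for your TPU version.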
The following steps explain how to create a GKE cluster and set it up for running Pathways workloads.
- Create a GKE cluster:

      gcloud beta container clusters create ${CLUSTER} \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --cluster-version=${CLUSTER_VERSION} \
        --scopes=storage-full,gke-default,cloud-platform \
        --machine-type ${PW_CPU_MACHINE_TYPE} \
        --network=${NETWORK} \
        --subnetwork=${SUBNETWORK}

- Create TPU node pools:

      for i in $(seq 1 ${CLUSTER_NODEPOOL_COUNT}); do
        gcloud container node-pools create "tpu-np-${i}" \
          --project=${PROJECT} \
          --zone=${ZONE} \
          --cluster=${CLUSTER} \
          --machine-type=${TPU_MACHINE_TYPE} \
          --num-nodes=${WORKERS_PER_SLICE} \
          --placement-type=COMPACT \
          --tpu-topology=${TOPOLOGY} \
          --scopes=storage-full,gke-default,cloud-platform \
          --workload-metadata=GCE_METADATA
      done

- Create a CPU node pool:

      gcloud container node-pools create "cpu-pathways-np" \
        --project ${PROJECT} \
        --zone ${ZONE} \
        --cluster ${CLUSTER} \
        --machine-type ${PW_CPU_MACHINE_TYPE} \
        --num-nodes ${NUM_CPU_NODES} \
        --scopes=storage-full,gke-default,cloud-platform \
        --workload-metadata=GCE_METADATA

- Install the JobSet and PathwaysJob APIs.

  Get credentials for the cluster and add them to your local kubectl context. Pass --zone for a zonal cluster, or replace it with --region=${REGION} for a regional cluster.

      gcloud container clusters get-credentials ${CLUSTER} \
        --zone=${ZONE} \
        --project=${PROJECT} \
        && kubectl config set-context --current --namespace=default

  To use the Pathways architecture on your GKE cluster, you need to install the JobSet API and the PathwaysJob API.

      kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
      kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.2/install.yaml
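After both manifests are applied, you may want to confirm that the cluster now serves the two resource kinds. The following is a minimal sketch, assuming the installed kinds are named JobSet and PathwaysJob as in the API names above. The `pathways_apis_installed` helper is hypothetical, and it only checks that the kinds are registered, not that their controllers are healthy.

```shell
# Hypothetical helper: succeed only if both resource kinds appear in the
# cluster's API resource list.
pathways_apis_installed() {
  # List every resource type the cluster serves; fail if kubectl fails.
  resources="$(kubectl api-resources --no-headers 2>/dev/null)" || return 1
  for kind in JobSet PathwaysJob; do
    # -w matches the kind as a whole word, so "Job" alone does not match.
    if ! echo "${resources}" | grep -qw "${kind}"; then
      echo "not found: ${kind}" >&2
      return 1
    fi
  done
  echo "JobSet and PathwaysJob APIs are registered"
}

# Run the check only where kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  pathways_apis_installed || echo "Pathways APIs are not ready yet" >&2
fi
```

If the check fails right after installation, wait a short time and run it again, since newly applied resource definitions can take a moment to register.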

