This document explains how to quickly deploy and configure a basic Slurm cluster on Google Kubernetes Engine (GKE) by using the open-source Slurm Helm chart and the Slurm Operator add-on for GKE. This setup includes a Slurm controller (slurmctld), a REST API (slurmrestd), a login node for user access, and a single worker node (slurmd) managed by the Slurm Operator add-on for GKE.
This document is for data administrators, operators, and developers who want to enable and configure a Slurm cluster on GKE.
Before reading this document, ensure that you're familiar with the Slurm Operator add-on for GKE.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
- Ensure that you have already generated an SSH key pair. This key pair is required only if you want to set up OS Login.
- Ensure that you have a running GKE cluster with the Slurm Operator enabled. If not, create one:
gcloud container clusters create CLUSTER_NAME \
    --cluster-version=VERSION \
    --location=LOCATION \
    --project=PROJECT_ID \
    --addons=SlurmOperator

Replace the following:
- CLUSTER_NAME: the name of the new cluster.
- VERSION: the GKE version, which must be 1.35.2-gke.1842000 or later. You can also use the --release-channel option to select a release channel. The release channel must have a default version of 1.35.2-gke.1842000 or later.
- LOCATION: the location of the cluster.
- PROJECT_ID: the ID of the project.
For more information, see how to enable the Slurm Operator add-on for GKE.
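If you created the cluster earlier and aren't sure whether the add-on is active, you can inspect the cluster's add-on configuration. This is a minimal sketch, assuming the Slurm Operator add-on is reported under addonsConfig in the cluster description; adjust the format expression if the field differs in your GKE version:

# Print the cluster's add-on configuration and look for the Slurm Operator entry.
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --format="yaml(addonsConfig)"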
(Optional) Configure OS Login
OS Login simplifies SSH access management by linking your Linux user account to your IAM identity. This configuration lets you manage access to Slurm nodes by using IAM permissions.
- Grant the necessary IAM roles. Ensure that your user account has the following IAM roles in the project (one way to grant them is sketched after these steps):
  - roles/compute.osLogin: lets you manage your own OS Login profile.
  - roles/compute.instanceAdmin.v1: provides permissions to manage compute instances.
  - roles/iam.serviceAccountUser: lets you act as a service account, which is often needed for node operations.

  For more information about the required roles, see the guide to set up OS Login.
- Add your SSH key to OS Login by uploading your public SSH key:
gcloud compute os-login ssh-keys add --key-file=PATH_TO_PUBLIC_KEY --project=PROJECT_ID

Alternatively, you can add a key that's loaded in your ssh-agent:

gcloud compute os-login ssh-keys add --key="$(ssh-add -L | grep publickey | head -n 1)" --project=PROJECT_ID

- Enable OS Login in your project metadata:
gcloud compute project-info add-metadata --metadata enable-oslogin=TRUE --project=PROJECT_ID
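As mentioned in the first step, one way to grant the OS Login-related IAM roles to your user account is with the gcloud CLI. This is a minimal sketch: PROJECT_ID and USER_EMAIL are placeholders, and your organization might instead manage these bindings through groups or infrastructure-as-code:

# Grant each OS Login-related role to your user account.
for role in roles/compute.osLogin roles/compute.instanceAdmin.v1 roles/iam.serviceAccountUser; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="user:USER_EMAIL" \
      --role="${role}"
done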
For managing OS Login across multiple projects in an organization, consider enforcing OS Login by using an Organization Policy Service constraint (compute.requireOsLogin). This is a recommended security best practice. For more information, see Enable and configure OS Login in GKE.
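As a sketch, an organization administrator can enforce this constraint with the gcloud CLI. This assumes that you have permission to manage organization policies; ORGANIZATION_ID is a placeholder for your organization's numeric ID:

# Enforce the OS Login requirement across all projects in the organization.
gcloud resource-manager org-policies enable-enforce compute.requireOsLogin \
    --organization=ORGANIZATION_ID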
(Optional) Add a compute node pool
If you want to run Slurm compute workloads on separate nodes, you can create a dedicated node pool for them.
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --machine-type=MACHINE_TYPE \
    --num-nodes=NUM_NODES \
    --node-taints=slurm-worker=true:NoSchedule

Replace the following:
- NODE_POOL_NAME: the name of the new node pool.
- CLUSTER_NAME: the name of your cluster.
- MACHINE_TYPE: the machine type for the nodes (for example, n2-standard-4).
- NUM_NODES: the number of nodes in the node pool.
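After you configure kubectl access to the cluster (shown in the next section), you can optionally confirm that the new nodes carry the expected taint. This sketch relies on the standard cloud.google.com/gke-nodepool node label; NODE_POOL_NAME is the name that you chose above:

# List the nodes in the new node pool along with their taint keys.
kubectl get nodes -l cloud.google.com/gke-nodepool=NODE_POOL_NAME \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'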
Deploy Slurm using Helm
This section guides you through deploying the Slurm cluster components by using the Slurm Helm chart. The Helm chart deploys the slurmctld, slurmrestd, and slurmd components within the GKE cluster.
- Configure kubectl to communicate with your cluster:

  gcloud container clusters get-credentials CLUSTER_NAME --location=LOCATION

  Replace CLUSTER_NAME with your cluster name and LOCATION with its location.
- Verify that you are running Helm 3.8.0 or later:

  helm version

  The output is similar to the following:

  version.BuildInfo{Version:"v3.17.3", GitCommit:"e4da49785aa6e6ee2b86efd5dd9e43400318262b", GitTreeState:"clean", GoVersion:"go1.23.7"}

  If needed, you can install Helm by following the official Helm documentation.
- Find an available image tag:
  - In the Google Cloud console, go to the Artifact Registry repository page that includes the slinky/slurmd package.
  - Note one of the image tag values, for example 25.11-ubuntu24.04-gke.4. You use this tag in the IMAGE_TAG placeholder in the following configuration file.
- Save the following configuration to a new file named values.yaml:
  controller:
    slurmctld:
      image:
        repository: gcr.io/gke-release/slinky/slurmctld
        tag: IMAGE_TAG
    reconfigure:
      image:
        repository: gcr.io/gke-release/slinky/slurmctld
        tag: IMAGE_TAG
  restapi:
    replicas: 1
    slurmrestd:
      image:
        repository: gcr.io/gke-release/slinky/slurmrestd
        tag: IMAGE_TAG
  nodesets:
    slinky:
      replicas: 1
      slurmd:
        image:
          repository: gcr.io/gke-release/slinky/slurmd
          tag: IMAGE_TAG
      # The podSpec block is optional and only required when using
      # a dedicated node pool for compute nodes.
      podSpec:
        nodeSelector:
          cloud.google.com/gke-nodepool: NODE_POOL_NAME
        tolerations:
          - key: "slurm-worker"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
  loginsets:
    slinky:
      enabled: true
      replicas: 1
      login:
        image:
          repository: gcr.io/gke-release/slinky/login
          tag: IMAGE_TAG

  Replace IMAGE_TAG with the tag that you copied in the previous step. For example, use 25.11-ubuntu24.04-gke.4.

- Install the Slurm Helm chart by using the values.yaml file:

  helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
      --namespace=slurm --create-namespace --version 1.0.2 -f values.yaml
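If you later edit values.yaml (for example, to point the worker Pods at a different node pool or image tag), you can apply the changes to the existing release. This is a sketch that reuses the same chart reference and version as the install command:

# Apply updated values to the existing Slurm release.
helm upgrade slurm oci://ghcr.io/slinkyproject/charts/slurm \
    --namespace=slurm --version 1.0.2 -f values.yaml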
Verify the Slurm installation
You can verify that Slurm is deployed on the cluster by using kubectl.
- Check the Pod status:

  kubectl get pods --namespace slurm

  The output should be similar to the following, and show the Running status for all Pods:

  NAME                                  READY   STATUS    RESTARTS   AGE
  slurm-controller-0                    3/3     Running   0          60s
  slurm-login-slinky-5d79cd755c-mf62z   1/1     Running   0          60s
  slurm-restapi-6b4ccb479f-njlp9        1/1     Running   0          60s
  slurm-worker-slinky-0                 2/2     Running   0          60s

- To see the registered nodes, run the sinfo command on the login node:

  kubectl exec -it deployment/slurm-login-slinky -n slurm -- sinfo

  The output should list the slinky partition and the worker node.
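If you want more detail than sinfo provides, you can query the controller from the login node with standard Slurm commands. This is a sketch; the output depends on your cluster state:

# Show detailed state for every registered Slurm node.
kubectl exec -it deployment/slurm-login-slinky -n slurm -- scontrol show nodes

# Confirm that the controller (slurmctld) is responding.
kubectl exec -it deployment/slurm-login-slinky -n slurm -- scontrol ping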
Run a Slurm job
- To run a job, you need to access the Slurm login node. The way you access the login node depends on whether you configured OS Login in the previous section.
- If you configured OS Login in the preceding section, access the login node by using SSH. To do this, get the external IP address of the slurm-login-slinky Service:

  kubectl get service --namespace slurm slurm-login-slinky

  The output looks like this:

  NAME                 TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
  slurm-login-slinky   LoadBalancer   10.X.X.X     X.X.X.X       22:30171/TCP   5m

  Copy the value of the EXTERNAL-IP column, and then connect to the login node:

  ssh OSLOGIN_USERNAME@EXTERNAL_IP

  Replace the following:
  - EXTERNAL_IP: the IP address obtained in the previous step.
  - OSLOGIN_USERNAME: your OS Login username.
- If you did not configure OS Login, you can still access the login node by using the kubectl exec command:

  kubectl exec -it deployment/slurm-login-slinky -n slurm -- bash
- Run an interactive job. After you're on the login node, you can run a command on a compute node by using the srun command-line utility:

  srun hostname

  The output includes the hostname of the slurm-worker-slinky-0 Pod.
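You can also submit work as a batch job instead of running it interactively. The following is a minimal sketch that assumes the default partition shown by sinfo; whether the output file is visible from the login node depends on whether the login and worker Pods share a filesystem, which this basic setup might not provide:

# Create a minimal batch script on the login node.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
#SBATCH --nodes=1
srun hostname
EOF

# Submit the job and check the queue.
sbatch hello.sbatch
squeue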
Clean up
To avoid incurring charges, clean up the resources created in this document.
- Uninstall the Helm deployment. This command removes all Kubernetes resources deployed by the Helm chart:

  helm uninstall slurm --namespace slurm

- Delete the Slurm namespace:

  kubectl delete namespace slurm

- Delete the GKE cluster:
  gcloud container clusters delete CLUSTER_NAME \
      --location=LOCATION

  Replace the following:
  - CLUSTER_NAME: the name of your cluster.
  - LOCATION: the location of the cluster.
What's next
- Explore the Slurm Project on GitHub.
- Learn Slurm basics.
- Learn how to enable or disable the Slurm Operator add-on for GKE.

