This tutorial shows you how to orchestrate a distributed training environment for reinforcement learning (RL) on Google Kubernetes Engine (GKE). You use Ray and the NVIDIA NeMo RL framework to set up a distributed training environment to fine-tune a model.
This tutorial focuses on the Group Relative Policy Optimization (GRPO) training pipeline on GKE with Ray and NeMo RL. GRPO is a reinforcement learning algorithm designed to improve a model's reasoning ability. This memory-efficient algorithm simplifies the RL process by eliminating the Critic, or value model, and using a relative group-based calculation instead.
Before you run this tutorial, we recommend that you complete the Fine-tune and scale reinforcement learning with verl on GKE tutorial. This tutorial uses the same cluster setup and configuration as that tutorial.
Background
The following sections provide a brief overview of the concepts used in this tutorial.
Reinforcement learning (RL)
RL teaches models through experience, exploration, and feedback rather than static imitation. Although pre-training teaches a model what to say, reinforcement learning from human feedback (RLHF) teaches it how to be helpful, safe, and logical. RL serves as the bridge between a base model and a fine-tuned model for a specialized use case.
For more information, see What is reinforcement learning?
Group Relative Policy Optimization (GRPO)
GRPO, an algorithm popularized by DeepSeek, offers a memory-efficient alternative to Proximal Policy Optimization (PPO) for LLM alignment by removing the Critic model. Instead of a Critic network, GRPO generates a group of responses for the same prompt and uses the average reward of that group as the baseline.
For more information, see GRPO.
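The group baseline described above can be written compactly. The following is a sketch under stated assumptions: the symbols G (group size), r_i (reward of response i), and \hat{A}_i (advantage of response i) are notation introduced here, and the division by the group's standard deviation follows the commonly published GRPO formulation rather than anything stated in this tutorial.

```latex
\[
\bar{r} = \frac{1}{G}\sum_{j=1}^{G} r_j,
\qquad
\hat{A}_i = \frac{r_i - \bar{r}}{\operatorname{std}(r_1, \dots, r_G)}
\]
```

Responses that score above the group mean get a positive advantage and responses below it get a negative one, so no separate value model is needed to estimate the baseline.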
NVIDIA NeMo RL
NeMo RL is NVIDIA's open-source post-training library designed for scalable RL. Part of the broader NeMo framework ecosystem, NeMo RL enables both small-scale experiments on a single GPU and multi-node deployments across thousands of GPUs.
For more information, see NVIDIA NeMo RL.
GSM8k dataset
In this tutorial, you use the GSM8k dataset, which contains 8,500 high-quality, linguistically diverse grade-school math word problems.
By using GSM8k and GRPO, the model generates a group of n different responses for the same problem. GRPO compares these responses against the group average. The model is rewarded more for paths that are consistently correct and logically sound compared to the rest of the group. Over time, the model learns that articulating its steps clearly is the most reliable way to maximize reward, effectively reducing the reward for low-performance answers.
For more information, see GSM8k.
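To make the comparison against the group average concrete, here is a minimal worked example. It assumes a hypothetical group of four responses to one problem, scored 1 when the final answer is correct and 0 otherwise. The helper name group_advantages and the reward values are illustrative only, and the sketch subtracts the group mean without any further normalization.

```shell
# Hypothetical rewards for four responses to one prompt:
# 1 = correct final answer, 0 = incorrect.
rewards="1 0 0 1"

# Illustrative helper: print each reward minus the group mean.
group_advantages() {
  echo "$1" | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    mean = sum / NF
    for (i = 1; i <= NF; i++) printf "%s%.2f", (i > 1 ? " " : ""), $i - mean
    printf "\n"
  }'
}

group_advantages "$rewards"   # prints: 0.50 -0.50 -0.50 0.50
```

When every response in a group gets the same reward, all the differences are zero, so that group contributes no preference between responses.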
Objectives
This tutorial shows you how to set up RL on GKE with NeMo RL by completing the following steps:
- Prepare your environment.
- Set up a GKE cluster with B200 or H200 GPUs.
- Configure KubeRay to manage a distributed Ray cluster.
- Use Managed Lustre for high-performance storage.
- Run a GRPO training job that uses NeMo RL.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
Install the Google Cloud CLI.
-
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Create or select a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
-
Create a Google Cloud project:
gcloud projects create PROJECT_ID
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
-
Select the Google Cloud project that you created:
gcloud config set project PROJECT_ID
Replace PROJECT_ID with your Google Cloud project name.
-
Verify that billing is enabled for your Google Cloud project.
-
Enable the required APIs:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
gcloud services enable container.googleapis.com storage.googleapis.com compute.googleapis.com
-
Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.admin, roles/iam.serviceAccountAdmin, roles/storage.admin
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
Replace the following:
  - PROJECT_ID: Your project ID.
  - USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
  - ROLE: The IAM role that you grant to your user account.
- Create a Hugging Face account, if you don't already have one.
- Ensure that you have a Hugging Face token.
- Ensure your project has sufficient quota for B200 and H200 GPUs. To learn more, see Plan GPU quota and GPU quota.
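Because the role-granting step repeats one command for each of three roles, a loop can help avoid typos. The following is only a sketch: the function name print_role_bindings is hypothetical, and it prints the commands instead of running them so that you can review them first.

```shell
# Illustrative helper: print (not run) one IAM binding command per role.
# Arguments: project ID, then the user identifier (for example, an email).
print_role_bindings() {
  project="$1"
  user="$2"
  for role in roles/container.admin roles/iam.serviceAccountAdmin roles/storage.admin; do
    echo "gcloud projects add-iam-policy-binding $project --member=\"user:$user\" --role=$role"
  done
}

# PROJECT_ID and USER_IDENTIFIER are the same placeholders used in the step.
print_role_bindings "PROJECT_ID" "USER_IDENTIFIER"
```

After you check the printed lines, you can paste them into your terminal or pipe them to a shell.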
Prepare your environment
In this tutorial, you use Cloud Shell.
-
Go to the Google Cloud console.
-
At the top of the Google Cloud console window, click the Activate Cloud Shell button.
-
Set the following environment variables:
export PROJECT_ID=$(gcloud config get project)
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
export CONTROL_PLANE_REGION=CONTROL_PLANE_REGION
export NODE_ZONE=NODE_ZONE
export CLUSTER_NAME=CLUSTER_NAME
export GPU_TYPE=GPU_TYPE
export MACHINE_TYPE=MACHINE_TYPE
export KSA_NAME=generic-ksa
export NAMESPACE=default
export RESERVATION=RESERVATION
export LUSTRE_NAME=LUSTRE_NAME
export HF_TOKEN=YOUR_HF_TOKEN
export WANDB_API_KEY=YOUR_WANDB_API_KEY
Replace the following values:
  - CLUSTER_NAME: the name of your GKE cluster.
  - CONTROL_PLANE_REGION: the Compute Engine region for the GKE cluster control plane.
  - NODE_ZONE: the zone for your nodes. Select a zone where NVIDIA B200 or H200 GPUs are available.
  - GPU_TYPE: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
    - nvidia-b200: NVIDIA B200 (180 GB)
    - nvidia-h200-141gb: NVIDIA H200 (141 GB)
  - MACHINE_TYPE: the type of machine to use:
    - For NVIDIA B200 (180 GB) GPUs, use a4-highgpu-8g or later.
    - For NVIDIA H200 (141 GB) GPUs, use a3-ultragpu-8g or later.
  - RESERVATION: the name of your GPU reservation.
  - LUSTRE_NAME: the name of your Lustre instance.
  - YOUR_HF_TOKEN: your Hugging Face token.
  - YOUR_WANDB_API_KEY: your WandB API key.
-
Create the following environment variables for the network:
export NETWORK="NETWORK-NAME"
export GVNIC_NETWORK_PREFIX="GVNIC-NAME"
export RDMA_NETWORK_PREFIX="RDMA-NAME"
Replace the following values:
  - NETWORK-NAME: the network name for GKE.
  - GVNIC-NAME: the prefix for the gVNIC network name. You can use any prefix you want.
  - RDMA-NAME: the prefix for the remote direct memory access (RDMA) network. You can use any prefix you want.
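Later commands expand these variables without checking them, so an unset variable can produce a confusing gcloud error. As an optional safeguard, the following sketch reports any variable that is unset or empty. The helper name missing_vars is hypothetical and isn't part of the tutorial's sample repository.

```shell
# Illustrative helper: print the names of variables that are unset or empty.
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup that works in POSIX shells as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Names are separated by single spaces, without a leading space.
  echo "${missing# }"
}

# Check the variables that this tutorial sets; an empty line means all are set.
missing_vars PROJECT_ID CLUSTER_NAME CONTROL_PLANE_REGION NODE_ZONE \
  GPU_TYPE MACHINE_TYPE RESERVATION LUSTRE_NAME NETWORK \
  GVNIC_NETWORK_PREFIX RDMA_NETWORK_PREFIX
```

The check only confirms that a value exists. It can't tell whether you replaced a placeholder such as CLUSTER_NAME with a real value.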
Set up infrastructure
In this section, you create VPC networks and a GKE cluster.
Create a VPC network
-
Create the primary VPC network for the cluster, and a dedicated VPC network, subnet, and firewall rule for the gVNIC interface:
gcloud compute networks create ${NETWORK} --subnet-mode=auto
gcloud compute networks create ${GVNIC_NETWORK_PREFIX}-net \
  --subnet-mode=custom
gcloud compute networks subnets create ${GVNIC_NETWORK_PREFIX}-sub \
  --network=${GVNIC_NETWORK_PREFIX}-net \
  --region=${CONTROL_PLANE_REGION} \
  --range=192.168.0.0/24
gcloud compute firewall-rules create ${GVNIC_NETWORK_PREFIX}-internal \
  --network=${GVNIC_NETWORK_PREFIX}-net \
  --action=ALLOW \
  --rules=tcp:0-65535,udp:0-65535,icmp \
  --source-ranges=192.168.0.0/16
-
Create a VPC network for RDMA with eight subnets, one for each of the eight GPUs on a node:
gcloud compute networks create ${RDMA_NETWORK_PREFIX}-net \
  --network-profile=${NODE_ZONE}-vpc-roce \
  --subnet-mode=custom
for N in $(seq 0 7); do
  gcloud compute networks subnets create ${RDMA_NETWORK_PREFIX}-sub-$N \
    --network=${RDMA_NETWORK_PREFIX}-net \
    --region=${CONTROL_PLANE_REGION} \
    --range=192.168.$((N + 1)).0/24 &
done
wait
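To see what the subnet loop produces before you run it, you can print the names and ranges that it would request. This is a dry-run sketch: the function name plan_rdma_subnets and the prefix value rdma are hypothetical, and the sketch creates no resources.

```shell
# Illustrative helper: print the subnet name and CIDR range for each of the
# eight RDMA subnets, using the same arithmetic as the creation loop.
plan_rdma_subnets() {
  prefix="$1"
  for N in $(seq 0 7); do
    echo "${prefix}-sub-$N 192.168.$((N + 1)).0/24"
  done
}

# "rdma" stands in for your RDMA_NETWORK_PREFIX value.
plan_rdma_subnets "rdma"
```

The N + 1 offset means the RDMA subnets use 192.168.1.0/24 through 192.168.8.0/24, which keeps them distinct from the 192.168.0.0/24 range assigned to the gVNIC subnet.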
Create the GKE cluster
You can set up NeMo RL in a GKE Standard cluster.
-
Create a Standard cluster:
gcloud container clusters create ${CLUSTER_NAME} \
  --location=${CONTROL_PLANE_REGION} \
  --workload-pool=${PROJECT_ID}.svc.id.goog \
  --enable-dataplane-v2 \
  --enable-ip-alias \
  --enable-multi-networking \
  --addons=RayOperator,LustreCsiDriver \
  --enable-legacy-lustre-port \
  --machine-type=n2-highmem-80 \
  --num-nodes=1 \
  --min-nodes=1 \
  --max-nodes=5 \
  --enable-autoscaling \
  --network=${NETWORK}
-
Get credentials for your cluster:
gcloud container clusters get-credentials ${CLUSTER_NAME} \
  --location=${CONTROL_PLANE_REGION}
-
Create the GPU node pool:
gcloud container node-pools create gpu-pool \
  --cluster=${CLUSTER_NAME} \
  --location=${CONTROL_PLANE_REGION} \
  --node-locations=${NODE_ZONE} \
  --machine-type=${MACHINE_TYPE} \
  --accelerator=type=${GPU_TYPE},count=8,gpu-driver-version=DEFAULT \
  --reservation-affinity=specific \
  --reservation=${RESERVATION} \
  --enable-autoscaling \
  --num-nodes=0 \
  --total-max-nodes=2 \
  --additional-node-network=network=${GVNIC_NETWORK_PREFIX}-net,subnetwork=${GVNIC_NETWORK_PREFIX}-sub \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-0 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-1 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-2 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-3 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-4 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-5 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-6 \
  --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-7
-
Install the NCCL RDMA installer:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-rdma-installer.yaml
Configure network mappings
-
Save the following manifest as network-mapping.yaml:
-
Apply the manifest:
envsubst < network-mapping.yaml > network-mapping-updated.yaml
kubectl apply -f network-mapping-updated.yaml
Prepare storage
In this section, you create a Managed Lustre instance, which provisions the high-performance storage required for your RL workload.
-
Allocate an IP address range for private services access:
gcloud compute addresses create ${LUSTRE_NAME}-range \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=20 \
  --network=${NETWORK}
-
Connect the peering:
gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges=${LUSTRE_NAME}-range \
  --network=${NETWORK}
-
Create a Managed Lustre instance:
gcloud lustre instances create ${LUSTRE_NAME} \
  --per-unit-storage-throughput=500 \
  --capacity-gib=18000 \
  --filesystem=lustrefs \
  --location=${NODE_ZONE} \
  --network=projects/${PROJECT_ID}/global/networks/${NETWORK} \
  --gke-support-enabled
-
Access an existing Managed Lustre instance by using the Managed Lustre CSI driver:
  -
  Extract the IP address of the Managed Lustre instance:
  export LUSTRE_IP=$(gcloud lustre instances describe ${LUSTRE_NAME} \
    --location=${NODE_ZONE} --format="value(mountPoint)" | awk -F '@' '{print $1}')
  -
  Inspect the lustre-pv.yaml manifest.
  -
  Apply the manifest:
  envsubst < lustre-pv.yaml > lustre-pv-updated.yaml
  kubectl apply -f lustre-pv-updated.yaml
  -
  Inspect the lustre-pvc.yaml manifest.
  -
  Apply the manifest:
  kubectl apply -f lustre-pvc.yaml
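The LUSTRE_IP extraction in the steps above depends on the mount point having an address, an @ separator, and then the network type and file system name. The following sketch shows that parsing on a sample string. The value 10.90.1.4@tcp:/lustrefs is a hypothetical mount point and its exact shape is an assumption; check the real value that gcloud lustre instances describe returns for your instance.

```shell
# Hypothetical mountPoint value; a real one comes from
# `gcloud lustre instances describe ... --format="value(mountPoint)"`.
mount_point="10.90.1.4@tcp:/lustrefs"

# Split on "@" and keep the first field, as the tutorial's command does.
lustre_ip=$(echo "$mount_point" | awk -F '@' '{print $1}')
echo "$lustre_ip"   # prints: 10.90.1.4
```

If LUSTRE_IP is empty after you run the real command, the instance might still be provisioning, and envsubst would then write an empty address into lustre-pv-updated.yaml.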
Deploy RayCluster
In this section, you clone the sample repository, prepare the manifests, and deploy the Ray cluster:
-
Clone the sample repository:
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples
-
Navigate to the working directory:
cd ai-ml/nemo-rl-on-gke/nemoRL
-
Inspect the values.yaml manifest:
Replace NCCL_TUNER_CONFIG_PATH with one of the following values, based on the accelerator that you use in this tutorial:
  - NVIDIA B200 (180 GB): /usr/local/gib/configs/tuner_config_a4.txtpb
  - NVIDIA H200 (141 GB): /usr/local/gib/configs/tuner_config_a3u.txtpb
In this manifest, the head node manages the Job and hosts the Ray Dashboard. The worker nodes run the training Jobs.
-
Deploy the Ray cluster:
export REPLICA_COUNT=2
helm install ray-cluster . \
  --set additionalWorkerGroups.worker-grp-0.replicas=$REPLICA_COUNT
For this tutorial, you use two worker nodes. If you want to change the number of worker nodes, change the REPLICA_COUNT value.
-
Verify that the worker and head nodes are running:
kubectl get pods
The output is similar to the following:
NAME                                            READY   STATUS    RESTARTS   AGE
ray-cluster-kuberay-head-sw7dp                  3/3     Running   0          33h
ray-cluster-kuberay-worker-grp-0-worker-gkbxw   3/3     Running   0          33h
ray-cluster-kuberay-worker-grp-0-worker-kdg62   3/3     Running   0          33h
-
Verify that the Ray cluster is running:
kubectl ray get cluster
The output is similar to the following:
NAME                  NAMESPACE   DESIRED WORKERS   AVAILABLE WORKERS   CPUS   GPUS   TPUS   MEMORY        CONDITION               STATUS   AGE
ray-cluster-kuberay   default     2                 2                   618    17     0      1573741824k   RayClusterProvisioned   ready    33h
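If you prefer a scripted check over reading the pod list, you can count the pods in the Running state and compare the result with REPLICA_COUNT plus one head node. The following is a sketch: the helper name count_running is hypothetical, and the sample input is the example kubectl get pods output shown above.

```shell
# Illustrative helper: count rows whose STATUS column (third field) is
# "Running", skipping the header row. Reads `kubectl get pods` output on stdin.
count_running() {
  awk 'NR > 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Sample input copied from the example output in this section.
sample='NAME READY STATUS RESTARTS AGE
ray-cluster-kuberay-head-sw7dp 3/3 Running 0 33h
ray-cluster-kuberay-worker-grp-0-worker-gkbxw 3/3 Running 0 33h
ray-cluster-kuberay-worker-grp-0-worker-kdg62 3/3 Running 0 33h'

echo "$sample" | count_running   # prints: 3
# In a live session: kubectl get pods | count_running
```

With two workers, a healthy cluster reports three Running pods. A lower number usually means that a GPU node is still being provisioned.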
Launch the GRPO Job
After your Ray cluster is ready, you can submit a Ray Job to your running Ray cluster on GKE. NeMo RL automatically downloads the model during the execution of the RL training Job.
To submit a Ray Job, start an interactive session to execute the Job.
-
To establish a local connection to your Ray cluster, run this command:
kubectl ray session ray-cluster-kuberay
This command initiates port forwarding between your local machine and the Ray head node in your GKE cluster. Your terminal is occupied while this session is active. To proceed, open a separate terminal instance.
-
Edit the gemma3-27b-gsm8k.sh file:
Replace the following values in the gemma3-27b-gsm8k.sh file:
  - YOUR_WANDB_API_KEY: your WandB API key.
  - YOUR_HF_TOKEN: your Hugging Face token.
In this file, you can see the configuration to run a Job with the gemma3-27b-it model on the GSM8k dataset. To complete the GRPO training pipeline, this script defines the following parameters:
  - num_prompts_per_step: 16 and num_generations_per_prompt: 32: the gemma3-27b-it model generates a large group of responses for every prompt. In this configuration, the model produces 512 total responses per step (16 × 32 = 512).
  - policy.generation.colocated.enabled=False: this parameter disables colocated generation, which means that the model doesn't generate responses on the same nodes as the training process. In a colocated setup, the same GPUs handle both training and generation. In this NeMo RL setup, you dedicate specific nodes (managed with the policy.generation.colocated.resources parameter) solely to vLLM inference, while the rest of the cluster handles the training computation. By separating these workloads, you prevent resource contention between the memory-intensive training buffers and the compute-intensive inference workloads.
-
To submit the Job, run the following command:
bash gemma3-27b-it/gemma3-27b-gsm8k.sh
When the Job is running, the output shows the training results, timing, and performance metrics.
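If you change either of the two group-size parameters described in this section, the number of responses generated per step changes with them. The following short sketch recomputes that number. The shell variable names mirror the parameter names for readability and aren't read by the training script.

```shell
# Values from the tutorial's configuration.
num_prompts_per_step=16
num_generations_per_prompt=32

# Each prompt gets its own group of generations, so the totals multiply.
responses_per_step=$((num_prompts_per_step * num_generations_per_prompt))
echo "$responses_per_step"   # prints: 512
```

Doubling either value doubles the generation work for each step, which is worth keeping in mind when you size the nodes dedicated to inference.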
Monitor the health of the GRPO Job
After Ray finishes the Job, NeMo RL stores the checkpoints in the configured path.
-
To check the output of the GRPO Job, open a shell session in the ray-head container:
kubectl exec -it $(kubectl get pods -l ray.io/node-type=head -o name) -c ray-head -- bash
-
Install the tree utility in the ray-head container's terminal:
apt update && apt install -y tree
-
Run the tree command on the checkpoint directory. The output is similar to the following:
root@ray-cluster-kuberay-worker-grp-0-worker-gkbxw:/opt/nemo-rl# tree /data/nemo_rl_gemma3_27b_3_17/
/data/nemo_rl_gemma3_27b_3_17/
`-- step_10
    |-- config.yaml
    |-- policy
    |   |-- optimizer
    |   |   |-- __0_0.distcp
    |   |   |-- __10_0.distcp
    |   |   |-- __11_0.distcp
    |   |   |-- __12_0.distcp
    |   |   |-- __13_0.distcp
    |   |   |-- __14_0.distcp
    |   |   |-- __15_0.distcp
    |   |   |-- __1_0.distcp
    |   |   |-- __2_0.distcp
    |   |   |-- __3_0.distcp
    |   |   |-- __4_0.distcp
    |   |   |-- __5_0.distcp
    |   |   |-- __6_0.distcp
    |   |   |-- __7_0.distcp
    |   |   |-- __8_0.distcp
    |   |   `-- __9_0.distcp
    |   |-- tokenizer
    |   |   |-- chat_template.jinja
    |   |   |-- special_tokens_map.json
    |   |   |-- tokenizer.json
    |   |   `-- tokenizer_config.json
    |   `-- weights
    |       |-- __0_0.distcp
    |       |-- __10_0.distcp
    |       |-- __11_0.distcp
    |       |-- __12_0.distcp
    |       |-- __13_0.distcp
    |       |-- __14_0.distcp
    |       |-- __15_0.distcp
    |       |-- __1_0.distcp
    |       |-- __2_0.distcp
    |       |-- __3_0.distcp
    |       |-- __4_0.distcp
    |       |-- __5_0.distcp
    |       |-- __6_0.distcp
    |       |-- __7_0.distcp
    |       |-- __8_0.distcp
    |       `-- __9_0.distcp
    |-- train_dataloader.pt
    `-- training_info.json
6 directories, 39 files
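As a lighter alternative to installing tree, you can count the checkpoint shard files with find. The sketch below assumes that shard files end in .distcp, as in the sample output, where the weights and optimizer directories each list 16 of them. The function name count_shards is hypothetical.

```shell
# Illustrative helper: count .distcp shard files under a directory.
count_shards() {
  find "$1" -type f -name '*.distcp' | wc -l | tr -d ' '
}

# Self-contained demo on a throwaway directory with two placeholder shards.
demo_dir=$(mktemp -d)
touch "$demo_dir/__0_0.distcp" "$demo_dir/__1_0.distcp" "$demo_dir/notes.txt"
count_shards "$demo_dir"   # prints: 2
rm -rf "$demo_dir"

# Inside the ray-head container, using the path from the sample output:
# count_shards /data/nemo_rl_gemma3_27b_3_17/step_10/policy/weights
```

A count that is lower than you expect for a step directory can indicate that the checkpoint was still being written when you looked.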
Clean up
To avoid incurring charges, delete the resources:
helm delete ray-cluster
gcloud container clusters delete ${CLUSTER_NAME} \
  --location=${CONTROL_PLANE_REGION} \
  --quiet
gcloud lustre instances delete ${LUSTRE_NAME} --location=${NODE_ZONE} --quiet
gcloud services vpc-peerings delete \
  --service=servicenetworking.googleapis.com \
  --network=${NETWORK}
gcloud compute addresses delete ${LUSTRE_NAME}-range --global --quiet
gcloud compute firewall-rules delete ${GVNIC_NETWORK_PREFIX}-internal --quiet
gcloud compute networks subnets delete ${GVNIC_NETWORK_PREFIX}-sub \
  --region=${CONTROL_PLANE_REGION} --quiet
gcloud compute networks delete ${GVNIC_NETWORK_PREFIX}-net --quiet
for N in $(seq 0 7); do
  gcloud compute networks subnets delete ${RDMA_NETWORK_PREFIX}-sub-$N \
    --region=${CONTROL_PLANE_REGION} --quiet &
done
wait
gcloud compute networks delete ${RDMA_NETWORK_PREFIX}-net --quiet
gcloud compute networks delete ${NETWORK} --quiet

