This tutorial shows you how to orchestrate a distributed training environment for reinforcement learning on Google Kubernetes Engine (GKE). You use Ray and the verl (Volcano Engine Reinforcement Learning) framework to fine-tune a Qwen2.5-32B-Instruct model.
This tutorial focuses on the Group Relative Policy Optimization (GRPO) training pipeline on GKE with Ray and verl. GRPO is a reinforcement learning algorithm designed to improve a model's reasoning ability. This memory-efficient algorithm simplifies the reinforcement learning (RL) process by eliminating the Critic, or value model, and using a relative group-based calculation instead.
This tutorial is a good starting point if you need to set up a distributed training environment where data, model weights, and the training engine are decoupled for efficiency.
Background
The following sections provide a brief overview of the concepts used in this tutorial.
Reinforcement learning
RL teaches models through experience, exploration, and feedback rather than static imitation. While pre-training teaches a model what to say, RL—specifically Reinforcement Learning from Human Feedback (RLHF)—teaches it how to be helpful, safe, and logical. RL serves as the bridge between a base model and a fine-tuned model for a specialized use case.
For more information, see What is reinforcement learning?
Volcano Engine Reinforcement Learning (verl)
verl is a high-performance framework designed to handle the complex memory and compute patterns of LLM-based RL.
For more information, see verl .
Group Relative Policy Optimization (GRPO)
GRPO, an algorithm popularized by DeepSeek, offers a memory-efficient alternative to Proximal Policy Optimization (PPO) for LLM alignment by removing the Critic model. Instead of a Critic network, GRPO generates a group of responses for the same prompt and uses the average reward of that group as the baseline.
For more information, see GRPO .
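To make the group-based baseline concrete, the following Python sketch computes group-relative advantages for one prompt's sampled responses. This is an illustration of the idea, not verl's actual implementation; the function name and the normalization by the group's standard deviation are assumptions that follow the common GRPO formulation.

```python
# Illustrative GRPO advantage calculation (not verl's internal code):
# each response's advantage is its reward minus the mean reward of its
# group, normalized by the group's standard deviation.
from statistics import mean, stdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Compute group-relative advantages for one prompt's sampled responses."""
    baseline = mean(group_rewards)  # the group mean replaces the Critic's value estimate
    scale = stdev(group_rewards) if len(group_rewards) > 1 else 1.0
    return [(r - baseline) / (scale + eps) for r in group_rewards]

# Four responses to the same prompt, scored 1.0 (correct) or 0.0 (incorrect):
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advs])
```

Correct responses receive positive advantages and incorrect ones negative, so the policy is pushed toward the better responses in each group without a separate value network.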
Objectives
This tutorial shows you how to set up reinforcement learning on GKE with verl by completing the following steps:
- Set up a GKE cluster with B200 or H200 GPUs.
- Configure KubeRay to manage a distributed Ray cluster.
- Use Cloud Storage FUSE to mount a Cloud Storage bucket across all nodes.
- Run a GRPO training job using verl to align the Qwen2.5-32B-Instruct model with the GSM8K dataset.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
Install the Google Cloud CLI.
-
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity .
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Create or select a Google Cloud project .
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
-
Create a Google Cloud project:
gcloud projects create PROJECT_ID
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
-
Select the Google Cloud project that you created:
gcloud config set project PROJECT_ID
Replace PROJECT_ID with your Google Cloud project name.
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the required APIs:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
gcloud services enable container.googleapis.com storage.googleapis.com compute.googleapis.com
-
Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.admin, roles/iam.serviceAccountAdmin, roles/storage.admin
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
Replace the following:
- PROJECT_ID: your project ID.
- USER_IDENTIFIER: the identifier for your user account. For example, myemail@example.com.
- ROLE: the IAM role that you grant to your user account.
-
- Create a Hugging Face account, if you don't already have one.
- Ensure that you have a Hugging Face token .
- Ensure your project has sufficient quota for B200 and H200 GPUs. To learn more, see Plan GPU quota and GPU quota .
Prepare your environment
In this tutorial, you use Cloud Shell .
-
Go to the Google Cloud console .
-
At the top of the Google Cloud console window, click the Activate Cloud Shell button.
-
Set the following environment variables:
export PROJECT_ID=$(gcloud config get project)
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
export GPU_TYPE=GPU_TYPE
export CONTROL_PLANE_LOCATION=CONTROL_PLANE_LOCATION
export NODE_LOCATION=NODE_LOCATION
export CLUSTER_NAME=CLUSTER_NAME
export KSA_NAME=CLUSTER_NAME
export GS_BUCKET=BUCKET_NAME-${PROJECT_ID}
export NAMESPACE=default
export HF_TOKEN=YOUR_HUGGING_FACE_TOKEN
export MACHINE_TYPE=MACHINE_TYPE
export GKE_VERSION=GKE_VERSION
Replace the following values:
-
CONTROL_PLANE_LOCATION: the Compute Engine region for the GKE cluster control plane.
-
GPU_TYPE: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
- nvidia-b200: NVIDIA B200 (180 GB)
- nvidia-h200-141gb: NVIDIA H200 (141 GB)
-
NODE_LOCATION: the zone for the GKE nodes. Select a zone where NVIDIA B200 or H200 GPUs are available.
-
CLUSTER_NAME: the name of your GKE cluster.
-
BUCKET_NAME: the base name for your Cloud Storage bucket. You don't need to specify the gs:// prefix.
-
YOUR_HUGGING_FACE_TOKEN: your Hugging Face token for model access.
-
MACHINE_TYPE: the type of machine to use. Valid options are c2-standard-8 or c2-standard-16.
-
GKE_VERSION: the version of GKE to use:
- For NVIDIA B200 (180 GB) GPUs, use 1.32.2-gke.1422000 or later.
- For NVIDIA H200 (141 GB) GPUs, use 1.31.4-gke.1183000 or later.
-
Create the following environment variables for the network:
export GVNIC_NETWORK_PREFIX="GVNIC-NAME"
export RDMA_NETWORK_PREFIX="RDMA-NAME"
Replace the following values:
-
GVNIC-NAME: the prefix for the gVNIC network name. You can use any prefix you want.
-
RDMA-NAME: the prefix for the remote direct memory access (RDMA) network name. You can use any prefix you want.
Set up infrastructure
In this section, you create an RDMA network and a GKE cluster.
Create RDMA network and subnets
-
Create a VPC network for the gVNIC interface:
gcloud compute networks create ${GVNIC_NETWORK_PREFIX}-net \
    --subnet-mode=custom \
    --project=${PROJECT_ID}

gcloud compute networks subnets create ${GVNIC_NETWORK_PREFIX}-sub \
    --network=${GVNIC_NETWORK_PREFIX}-net \
    --region=${CONTROL_PLANE_LOCATION} \
    --range=192.168.0.0/24

gcloud compute firewall-rules create ${GVNIC_NETWORK_PREFIX}-internal \
    --network=${GVNIC_NETWORK_PREFIX}-net \
    --action=ALLOW \
    --rules=tcp:0-65535,udp:0-65535,icmp \
    --source-ranges=192.168.0.0/16
-
Create a VPC network and subnets for RDMA, with eight subnets for eight GPUs:
gcloud beta compute networks create ${RDMA_NETWORK_PREFIX}-net \
    --network-profile=${CONTROL_PLANE_LOCATION}-vpc-roce \
    --subnet-mode=custom

for N in $(seq 0 7); do
  gcloud compute networks subnets create ${RDMA_NETWORK_PREFIX}-sub-$N \
      --network=${RDMA_NETWORK_PREFIX}-net \
      --region=${CONTROL_PLANE_LOCATION} \
      --range=192.168.$((N + 1)).0/24 &
done
wait
-
Clone the sample repository:
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples
-
Navigate to the working directory:
cd ai-ml/verl-on-gke
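As a sanity check on the addressing scheme, this Python sketch (illustrative only; it assumes the same 192.168.0.0/16 layout used in the commands above) enumerates the eight RDMA subnet ranges that the subnet-creation loop produces and verifies that they don't collide with the gVNIC subnet.

```python
# The RDMA loop creates one /24 subnet per GPU NIC, offset by one so that
# 192.168.0.0/24 stays reserved for the gVNIC subnet.
import ipaddress

gvnic = ipaddress.ip_network("192.168.0.0/24")
firewall_source = ipaddress.ip_network("192.168.0.0/16")

rdma_subnets = [ipaddress.ip_network(f"192.168.{n + 1}.0/24") for n in range(8)]
for n, net in enumerate(rdma_subnets):
    print(f"rdma-sub-{n}: {net}")

# Every RDMA range falls inside the firewall rule's source range and
# stays clear of the gVNIC subnet.
assert all(net.subnet_of(firewall_source) for net in rdma_subnets)
assert all(not net.overlaps(gvnic) for net in rdma_subnets)
```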
Create the GKE cluster
You can set up verl in a GKE Autopilot or Standard cluster. We recommend that you use an Autopilot cluster for a fully managed Kubernetes experience. To choose the GKE mode of operation that's the best fit for your workloads, see Choose a GKE mode of operation.
Autopilot
-
Create an Autopilot cluster:
gcloud container clusters create-auto ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --enable-multi-networking \
    --enable-ray-operator
-
Get credentials for your cluster:
gcloud container clusters get-credentials ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION}
-
Install the NCCL RDMA installer for Autopilot:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-rdma-installer-autopilot.yaml
Standard
-
Create a Standard cluster:
gcloud container clusters create ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --cluster-version=${GKE_VERSION} \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --enable-multi-networking \
    --addons=RayOperator,GcsFuseCsiDriver \
    --machine-type=${MACHINE_TYPE} \
    --num-nodes=1 \
    --min-nodes=1 \
    --max-nodes=5 \
    --enable-autoscaling
-
Get credentials for your cluster:
gcloud container clusters get-credentials ${CLUSTER_NAME} --location=${CONTROL_PLANE_LOCATION}
-
Create the GPU node pool. This example uses Spot VMs for cost efficiency:
gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --node-locations=${NODE_LOCATION} \
    --machine-type=${MACHINE_TYPE} \
    --accelerator=type=${GPU_TYPE},count=8,gpu-driver-version=DEFAULT \
    --spot \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=10 \
    --additional-node-network=network=${GVNIC_NETWORK_PREFIX}-net,subnetwork=${GVNIC_NETWORK_PREFIX}-sub \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-0 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-1 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-2 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-3 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-4 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-5 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-6 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-7
Install the NCCL RDMA installer for Standard clusters:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-rdma-installer.yaml
Configure network mappings
-
Inspect the network-mapping.yaml manifest:
-
Apply the manifest:
kubectl apply -f network-mapping.yaml
Prepare data and storage
-
Create a Cloud Storage bucket:
gcloud storage buckets create gs://${GS_BUCKET} \
    --location=${CONTROL_PLANE_LOCATION} \
    --enable-hierarchical-namespace \
    --uniform-bucket-level-access
-
Create a Kubernetes Service Account (KSA) and bind it to the bucket:
kubectl create serviceaccount ${KSA_NAME} --namespace ${NAMESPACE}

gcloud storage buckets add-iam-policy-binding gs://${GS_BUCKET} \
    --member "principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/${NAMESPACE}/sa/${KSA_NAME}" \
    --role "roles/storage.objectUser"
-
Create the Secret for Hugging Face:
kubectl create secret generic hf-secret --from-literal=hf_api_token=${HF_TOKEN}
-
Inspect the gcsfuse-storage.yaml manifest:
-
Apply the manifest:
kubectl apply -f gcsfuse-storage.yaml
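The Workload Identity binding above is sensitive to the exact shape of the principal string. This hypothetical Python helper (not part of the tutorial's tooling; all argument values are placeholders) assembles the same identifier so you can check each component before running the gcloud command.

```python
# Builds the Workload Identity Federation principal used in the
# add-iam-policy-binding command above. All argument values passed
# below are illustrative placeholders.
def workload_identity_principal(project_number, project_id, namespace, ksa_name):
    """Return the principal identifier for a Kubernetes ServiceAccount."""
    return (
        f"principal://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{project_id}.svc.id.goog"
        f"/subject/ns/{namespace}/sa/{ksa_name}"
    )

print(workload_identity_principal("123456789", "my-project", "default", "verl-cluster"))
```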
Prepare model and data
You can run these commands locally or on a GKE Pod to populate the bucket.
-
Clone the verl repository:
git clone https://github.com/volcengine/verl.git -
Download the Qwen2.5-32B-Instruct model using the Hugging Face CLI:
huggingface-cli download Qwen/Qwen2.5-32B-Instruct --local-dir Qwen2.5-32B-Instruct
-
Preprocess the GSM8K dataset:
python verl/examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k
-
Upload the model, data, and the verl code to your Cloud Storage bucket:
gcloud storage cp --recursive verl gs://${GS_BUCKET}/verl
gcloud storage cp --recursive Qwen2.5-32B-Instruct gs://${GS_BUCKET}/Qwen2.5-32B-Instruct
gcloud storage cp --recursive ~/data/gsm8k/* gs://${GS_BUCKET}/gsm8k
Deploy RayCluster custom resource
Deploy a RayCluster custom resource, which typically consists of one head Pod and multiple worker Pods.
Autopilot
-
Deploy the RayCluster. Save the following to
ray-cluster-auto.yaml: -
Apply the RayCluster:
kubectl apply -f ray-cluster-auto.yaml
Standard
-
Deploy the RayCluster. Save the following to
ray-cluster.yaml: -
Apply the RayCluster:
kubectl apply -f ray-cluster.yaml
Launch the GRPO Job
-
Set up port forwarding to the Ray head Service:
kubectl port-forward svc/b200-ray-cluster-head-svc 8265:8265
-
Inspect the runtime-env.yaml manifest. If you use H200 GPUs, change NCCL_TUNER_CONFIG_PATH to /usr/local/gib/configs/tuner_config_a3u.txtpb.
This file is used by the Ray client. You don't need to apply this manifest to the cluster.
-
Submit the Job using ray job submit:
ray job submit \
    --address "http://localhost:8265" \
    --runtime-env runtime-env.yaml \
    -- \
    bash -c "cd /data/verl && PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
      data.train_files=/data/gsm8k/train.parquet \
      data.val_files=/data/gsm8k/test.parquet \
      data.train_batch_size=256 \
      data.max_prompt_length=512 \
      data.max_response_length=512 \
      actor_rollout_ref.model.path=Qwen/Qwen2.5-32B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-5 \
      actor_rollout_ref.actor.ppo_mini_batch_size=256 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
      actor_rollout_ref.actor.strategy=fsdp2 \
      algorithm.kl_ctrl.kl_coef=0.001 \
      trainer.logger=console \
      trainer.val_before_train=False \
      trainer.n_gpus_per_node=8 \
      trainer.nnodes=2 \
      trainer.save_freq=10 \
      trainer.test_freq=10 \
      algorithm.adv_estimator=grpo \
      actor_rollout_ref.rollout.n=8 \
      trainer.total_epochs=2" 2>&1 | tee verl_demo.log
Monitor the logs in the Ray Dashboard or the command output. Look for critic/score/mean to increase, which indicates that the model is learning.
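If you prefer to track progress from the saved verl_demo.log instead of the dashboard, a small script can pull the metric out of the log. This sketch assumes the metric appears in console lines as critic/score/mean:<value>; the exact log format can vary between verl versions, so adjust the pattern if needed.

```python
# Extracts critic/score/mean values from verl console log lines.
# The line format matched below is an assumption about verl's console
# logger; tweak PATTERN if your version logs metrics differently.
import re

PATTERN = re.compile(r"critic/score/mean[:=]\s*([0-9.eE+-]+)")

def extract_scores(log_lines):
    """Return the sequence of critic/score/mean values found in the log."""
    scores = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            scores.append(float(match.group(1)))
    return scores

sample = [
    "step 10 critic/score/mean:0.21 critic/score/max:1.0",
    "step 20 critic/score/mean:0.34 critic/score/max:1.0",
]
print(extract_scores(sample))
```

An increasing sequence of values indicates that the model is learning.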
Clean up
To avoid incurring charges, delete the resources:
kubectl delete raycluster b200-ray-cluster
gcloud container clusters delete ${CLUSTER_NAME} --location=${CONTROL_PLANE_LOCATION}
gcloud storage rm -r gs://${GS_BUCKET}

