Scale Agent Sandboxes dynamically using HPA and Capacity Buffers

This page explains how to dynamically scale GKE Agent Sandbox environments using the Horizontal Pod Autoscaler (HPA) and standby capacity buffers on a GKE Standard cluster.

By default, Agent Sandbox Warm Pools keep a static number of pre-provisioned replicas ready to minimize Pod startup latency. However, when traffic is variable, maintaining a high number of static replicas can incur high compute costs.

You can balance capacity readiness and cost savings by using dynamic scaling. This approach adjusts the size of the SandboxWarmPool based on demand and uses standby capacity buffers (suspended VMs) to proactively provision infrastructure for fast scaling without the full cost of over-provisioning active nodes.

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

Use a GKE Standard cluster running version 1.36.0-gke.2208000 or later.

Note: Standby buffers are available in experimental GKE version 1.35.2-gke.1842002.
Enable the Agent Sandbox add-on on your cluster.

Create a cluster

To create a GKE Standard cluster with the required configurations for standby capacity buffers and Agent Sandbox, run the following command:

    gcloud container clusters create CLUSTER_NAME \
        --location=CONTROL_PLANE_LOCATION \
        --cluster-version=VERSION \
        --enable-autoscaling \
        --enable-autoprovisioning \
        --max-cpu=MAX_CPU \
        --max-memory=MAX_MEMORY \
        --enable-agent-sandbox \
        --enable-image-streaming \
        --workload-pool=PROJECT_ID.svc.id.goog \
        --monitoring=SYSTEM

Replace the following:

CLUSTER_NAME: the name of your new cluster.
VERSION: the GKE version, which must be 1.36.0-gke.2208000 or later.
CONTROL_PLANE_LOCATION: the Compute Engine location for your new cluster. Choose a region for regional clusters (for example, us-central1), or a zone for zonal clusters (for example, us-central1-a).
MAX_CPU: the maximum CPU limit for auto-provisioning, for example 4000.
MAX_MEMORY: the maximum memory limit for auto-provisioning in GB, for example 12000.
PROJECT_ID: your Google Cloud project ID.
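Before running the create command, it can help to check that the value you plan to use for VERSION meets the minimum. The following is a minimal sketch, not part of the official procedure: the helper name version_at_least is hypothetical, and it assumes a sort implementation that supports version sort (-V), such as GNU coreutils.

```shell
# Minimum version stated on this page.
MIN_VERSION="1.36.0-gke.2208000"

# Hypothetical helper: succeeds (exit 0) when $1 is the same as or newer than $2.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  # After an ascending version sort, the first line is the older version.
  # If that line equals the minimum, the candidate is at least the minimum.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

VERSION="1.36.0-gke.2208000"   # example value; substitute your own
if version_at_least "$VERSION" "$MIN_VERSION"; then
  echo "ok: $VERSION meets the minimum $MIN_VERSION"
else
  echo "too old: $VERSION is below $MIN_VERSION"
fi
```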

Configure Agent Sandbox components

You must define a SandboxTemplate and a SandboxWarmPool to manage your sandboxed workloads.

Save the following manifest as sandboxtemplate.yaml:

    apiVersion: extensions.agents.x-k8s.io/v1alpha1
    kind: SandboxTemplate
    metadata:
      name: agent-template
      namespace: NAMESPACE
    spec:
      podTemplate:
        metadata:
          labels:
            app: agent-sandbox-workload
        spec:
          restartPolicy: Never
          containers:
          - name: python-agent
            image: python:3.11-slim
            command: ["/bin/sh", "-c"]
            args: ["echo 'Hello from the Sandbox!' && sleep 3600"]
            resources:
              requests:
                cpu: "1000m"
                memory: "100Mi"

Replace NAMESPACE with your namespace, for example agent-sandbox-demo.
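Several manifests on this page use the same NAMESPACE placeholder, so you might prefer to fill it in with a small helper instead of editing each file by hand. The following is a minimal sketch under that assumption; render_manifest is a hypothetical name, and the sketch assumes the literal string NAMESPACE appears in the file only as the placeholder.

```shell
# Hypothetical helper: print a manifest with the NAMESPACE placeholder filled in.
# Usage: render_manifest FILE NAMESPACE
render_manifest() {
  # sed is case-sensitive, so the lowercase "namespace:" key is left untouched.
  sed "s/NAMESPACE/$2/g" "$1"
}
```

For example, `render_manifest sandboxtemplate.yaml agent-sandbox-demo | kubectl apply -f -` would apply the rendered manifest from standard input. The namespace itself must already exist in the cluster.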

Apply the manifest:

    kubectl apply -f sandboxtemplate.yaml

Save the following manifest as sandboxwarmpool.yaml. This establishes an initial static pool of replicas.

    apiVersion: extensions.agents.x-k8s.io/v1alpha1
    kind: SandboxWarmPool
    metadata:
      name: agent-warmpool
      namespace: NAMESPACE
    spec:
      replicas: 10
      sandboxTemplateRef:
        name: agent-template

Apply the manifest:

    kubectl apply -f sandboxwarmpool.yaml
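To confirm that the warm pool was created with the expected size, you can read back its replica count. The following sketch is illustrative only: warmpool_replicas is a hypothetical helper, the plural resource name sandboxwarmpools is taken from the RBAC rule later on this page, and the field path .spec.replicas mirrors the manifest above.

```shell
# Hypothetical helper: print the replica count requested by a SandboxWarmPool.
# Usage: warmpool_replicas NAMESPACE [NAME]
warmpool_replicas() {
  # NAME defaults to the warm pool created on this page.
  kubectl get sandboxwarmpools "${2:-agent-warmpool}" \
    --namespace "$1" \
    --output jsonpath='{.spec.replicas}'
}
```

With the manifest above applied unchanged, `warmpool_replicas agent-sandbox-demo` should report the static size of 10 until the HPA starts managing the pool.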

Configure metrics collection

The Agent Sandbox controller exposes a counter metric for the number of sandboxes claimed: agent_sandbox_claim_creation_total. You can configure a PodMonitoring resource to collect this metric and send it to Google Cloud Managed Service for Prometheus.

Save the following manifest as podmonitoring.yaml:

    apiVersion: monitoring.googleapis.com/v1
    kind: PodMonitoring
    metadata:
      name: agent-sandbox-controller-monitoring
      namespace: agent-sandbox-system  # Namespace where the controller is running
    spec:
      selector:
        matchLabels:
          app: agent-sandbox-controller
      endpoints:
      - port: 8080  # Port where metrics are exposed
        path: /metrics
        interval: 15s

Apply the manifest:

    kubectl apply -f podmonitoring.yaml

Enable custom metrics adapter

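To allow the HPA to read metrics from Google Cloud Managed Service for Prometheus, you must deploy the custom-metrics-stackdriver-adapter.

Grant your account the cluster-admin role, deploy the adapter, and then grant the adapter's service account the Monitoring Viewer role. Run the following commands: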

    kubectl create clusterrolebinding cluster-admin-binding \
        --clusterrole=cluster-admin --user="$(gcloud config get-value account)"
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --role=roles/monitoring.viewer \
        --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Replace PROJECT_NUMBER with your Google Cloud project number.
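The --member value is long and easy to mistype, so it can be convenient to assemble it from its two inputs. The following is a minimal sketch; the helper names project_number and adapter_principal are hypothetical, and the principal format is copied from the command above.

```shell
# Hypothetical helper: look up the project number for a project ID.
# Usage: project_number PROJECT_ID
project_number() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}

# Hypothetical helper: build the principal for the adapter's service account.
# Usage: adapter_principal PROJECT_NUMBER PROJECT_ID
adapter_principal() {
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter\n' "$1" "$2"
}
```

For example, `adapter_principal "$(project_number PROJECT_ID)" PROJECT_ID` prints a value that you can pass to --member.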

Configure RBAC permissions for SandboxWarmPool

The capacity buffer controller needs permission to read the scale subresource of the SandboxWarmPool custom resource.

Save the following manifest as capacity-buffer-rbac.yaml:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: sandbox-warmpool-scale-reader
    rules:
    - apiGroups: ["extensions.agents.x-k8s.io"]
      resources: ["sandboxwarmpools/scale"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: ca-sandbox-warmpool-scale-reader
    subjects:
    - kind: User
      name: "system:cluster-autoscaler"
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: sandbox-warmpool-scale-reader

Apply the manifest:

    kubectl apply -f capacity-buffer-rbac.yaml
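One way to check that the binding took effect is to ask the API server whether the bound user can read the scale subresource. The following is a sketch, not a required step: can_read_warmpool_scale is a hypothetical helper, and the check relies on impersonation (--as), which assumes your own account is allowed to impersonate users, as a cluster-admin typically is.

```shell
# Hypothetical helper: ask whether system:cluster-autoscaler may "get" the
# scale subresource of SandboxWarmPool objects in the given namespace.
# Usage: can_read_warmpool_scale NAMESPACE
can_read_warmpool_scale() {
  kubectl auth can-i get sandboxwarmpools.extensions.agents.x-k8s.io \
    --subresource=scale \
    --as=system:cluster-autoscaler \
    --namespace "$1"
}
```

The command prints yes or no. A no answer suggests that the ClusterRole or ClusterRoleBinding was not applied as written.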

Configure capacity buffer

Configure a CapacityBuffer to maintain an infrastructure buffer proportional to the size of the SandboxWarmPool. For more information, see Configure capacity buffers.

Save the following manifest as capacitybuffer.yaml. This example maintains a buffer equivalent to 200% of the SandboxWarmPool's replicas using standby capacity (suspended VMs).

    apiVersion: autoscaling.x-k8s.io/v1beta1
    kind: CapacityBuffer
    metadata:
      name: agent-warmpool-buffer
      namespace: NAMESPACE
    spec:
      percentage: 200
      scalableRef:
        apiGroup: extensions.agents.x-k8s.io
        kind: SandboxWarmPool
        name: agent-warmpool
      provisioningStrategy: "buffer.gke.io/standby-capacity"

Apply the manifest:

    kubectl apply -f capacitybuffer.yaml
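To make the percentage concrete, the following sketch works through the arithmetic for the values used on this page. It assumes the buffer size is simply the replica count multiplied by the percentage; how the controller rounds fractional results is not covered here, and the variable names are illustrative.

```shell
replicas=10      # spec.replicas from sandboxwarmpool.yaml
percentage=200   # spec.percentage from capacitybuffer.yaml

# Buffer size expressed in Pod-sized units of standby capacity.
buffer_units=$(( replicas * percentage / 100 ))
echo "buffer for $replicas replicas at $percentage%: $buffer_units"
```

With these values the buffer works out to 20 units. Because the buffer tracks the SandboxWarmPool, the same calculation applies to whatever replica count the HPA sets later.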

Configure Horizontal Pod Autoscaler

Connect the SandboxWarmPool to the HPA to dynamically scale replicas based on the custom metric.

Save the following manifest as hpa.yaml:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: agent-warmpool-hpa
      namespace: NAMESPACE
    spec:
      scaleTargetRef:
        apiVersion: extensions.agents.x-k8s.io/v1alpha1
        kind: SandboxWarmPool
        name: agent-warmpool
      minReplicas: 10
      maxReplicas: 100
      metrics:
      - type: External
        external:
          metric:
            name: "prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter"
            selector:
              matchLabels:
                metric.labels.warmpool_name: "agent-warmpool"
          target:
            type: Value
            value: 0.2

Apply the manifest:
```
kubectl apply -f hpa.yaml
```
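To reason about how the HPA reacts to the target of 0.2, it helps to apply the replica formula from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric value / target value), limited to the range between minReplicas and maxReplicas. The following is a simplified sketch of that calculation. The function name desired_replicas is hypothetical, and the sketch ignores the HPA's tolerance band and stabilization behavior.

```shell
# Hypothetical helper: simplified HPA calculation for a "Value" target.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_VALUE TARGET_VALUE MIN MAX
desired_replicas() {
  awk -v r="$1" -v cur="$2" -v tgt="$3" -v min="$4" -v max="$5" 'BEGIN {
    d = r * cur / tgt            # scale current replicas by the metric ratio
    n = int(d)
    if (n < d) n++               # round up (ceil) for positive values
    if (n < min) n = min         # clamp to minReplicas
    if (n > max) n = max         # clamp to maxReplicas
    print n
  }'
}
```

For example, with 10 current replicas and an assumed current metric value of 0.4, `desired_replicas 10 0.4 0.2 10 100` yields 20, which doubles the pool. The value 0.4 is illustrative; the actual value depends on how the adapter reports the counter.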

Monitor scaling events

You can monitor HPA and Capacity Buffer events to verify dynamic scaling.

Monitor HPA events

To watch HPA events, run the following command:

    kubectl get events -n NAMESPACE --watch \
        --field-selector involvedObject.kind=HorizontalPodAutoscaler

The sample output when scaling occurs looks similar to the following:

 SuccessfulRescale New size: 20; reason: external metric prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target

Monitor CapacityBuffer events

To watch Capacity Buffer events, run the following command:

    kubectl get events -n NAMESPACE --watch \
        --field-selector involvedObject.kind=CapacityBuffer

The sample output showing suspended VM resume or scale-up looks similar to the following:

 TriggeredScaleUp capacity buffer 20 fake pods triggered scale-up

What's next

Learn more about Agent Sandbox.
Learn more about Capacity buffers.
