Scale Agent Sandboxes dynamically using HPA and Capacity Buffers

This page explains how to dynamically scale GKE Agent Sandbox environments using the Horizontal Pod Autoscaler (HPA) and standby capacity buffers on a GKE Standard cluster.

By default, Agent Sandbox Warm Pools keep a static number of pre-provisioned replicas ready to minimize Pod startup latency. However, when traffic is variable, maintaining a high number of static replicas can incur high compute costs.

You can balance capacity readiness and cost savings by using dynamic scaling. This approach adjusts the size of the SandboxWarmPool based on demand and uses standby capacity buffers (suspended VMs) to proactively provision infrastructure for fast scaling without the full cost of over-provisioning active nodes.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

Create a cluster

To create a GKE Standard cluster with the required configurations for standby capacity buffers and Agent Sandbox, run the following command:

    gcloud container clusters create CLUSTER_NAME \
        --location=CONTROL_PLANE_LOCATION \
        --cluster-version=VERSION \
        --enable-autoscaling \
        --enable-autoprovisioning \
        --max-cpu=MAX_CPU \
        --max-memory=MAX_MEMORY \
        --enable-agent-sandbox \
        --enable-image-streaming \
        --workload-pool=PROJECT_ID.svc.id.goog \
        --monitoring=SYSTEM

Replace the following:

  • CLUSTER_NAME: the name of your new cluster.
  • VERSION: the GKE version, which must be 1.36.0-gke.2208000 or later.
  • CONTROL_PLANE_LOCATION: the Compute Engine location for your new cluster. Choose a region for regional clusters (for example, us-central1), or a zone for zonal clusters (for example, us-central1-a).
  • MAX_CPU: the maximum CPU limit for auto-provisioning, for example 4000.
  • MAX_MEMORY: the maximum memory limit for auto-provisioning in GB, for example 12000.
  • PROJECT_ID: your Google Cloud project ID.

Configure Agent Sandbox components

You must define a SandboxTemplate and a SandboxWarmPool to manage your sandboxed workloads.

  1. Save the following manifest as sandboxtemplate.yaml:

      apiVersion: extensions.agents.x-k8s.io/v1alpha1
      kind: SandboxTemplate
      metadata:
        name: agent-template
        namespace: NAMESPACE
      spec:
        podTemplate:
          metadata:
            labels:
              app: agent-sandbox-workload
          spec:
            restartPolicy: Never
            containers:
            - name: python-agent
              image: python:3.11-slim
              command: ["/bin/sh", "-c"]
              args: ["echo 'Hello from the Sandbox!' && sleep 3600"]
              resources:
                requests:
                  cpu: "1000m"
                  memory: "100Mi"

    Replace NAMESPACE with your namespace, for example agent-sandbox-demo.

  2. Apply the manifest:

      kubectl apply -f sandboxtemplate.yaml
    
  3. Save the following manifest as sandboxwarmpool.yaml. This establishes an initial static pool of replicas.

      apiVersion: extensions.agents.x-k8s.io/v1alpha1
      kind: SandboxWarmPool
      metadata:
        name: agent-warmpool
        namespace: NAMESPACE
      spec:
        replicas: 10
        sandboxTemplateRef:
          name: agent-template
    
  4. Apply the manifest:

      kubectl apply -f sandboxwarmpool.yaml
    

Configure metrics collection

The Agent Sandbox controller exposes a counter metric for the number of sandboxes claimed: agent_sandbox_claim_creation_total. You can configure a PodMonitoring resource to collect this metric and send it to Google Cloud Managed Service for Prometheus.

  1. Save the following manifest as podmonitoring.yaml:

      apiVersion: monitoring.googleapis.com/v1
      kind: PodMonitoring
      metadata:
        name: agent-sandbox-controller-monitoring
        namespace: agent-sandbox-system  # Namespace where the controller is running
      spec:
        selector:
          matchLabels:
            app: agent-sandbox-controller
        endpoints:
        - port: 8080  # Port where metrics are exposed
          path: /metrics
          interval: 15s
    
  2. Apply the manifest:

      kubectl apply -f podmonitoring.yaml
    

Enable custom metrics adapter

To allow the HPA to read metrics from Google Cloud Managed Service for Prometheus, you must deploy the custom-metrics-stackdriver-adapter.

Enable the required IAM bindings. Run the following commands:

    kubectl create clusterrolebinding cluster-admin-binding \
        --clusterrole=cluster-admin --user="$(gcloud config get-value account)"

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --role=roles/monitoring.viewer \
        --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Replace PROJECT_NUMBER with your Google Cloud project number.
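
The --member value is long and easy to mistype, so it can help to assemble it from its two variable parts. The following is a minimal sketch, not part of the official procedure: adapter_principal is a hypothetical helper name, and the project number and ID passed to it are placeholder values. The namespace and service account in the path are the ones used in the command above.

```shell
# Hypothetical helper (not a gcloud feature): builds the Workload Identity
# principal for the adapter's service account from a project number and ID.
# Usage: adapter_principal PROJECT_NUMBER PROJECT_ID
adapter_principal() {
  project_number="$1"
  project_id="$2"
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter\n' \
    "$project_number" "$project_id"
}

# Placeholder values for illustration; substitute your own.
# To look up the project number, you can run:
#   gcloud projects describe PROJECT_ID --format="value(projectNumber)"
adapter_principal 123456789012 my-project
```

You could then pass the printed string to the --member flag of the gcloud projects add-iam-policy-binding command.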

Configure RBAC permissions for SandboxWarmPool

The capacity buffer controller needs permission to read the scale subresource of the SandboxWarmPool custom resource.

  1. Save the following manifest as capacity-buffer-rbac.yaml:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: sandbox-warmpool-scale-reader
      rules:
      - apiGroups: ["extensions.agents.x-k8s.io"]
        resources: ["sandboxwarmpools/scale"]
        verbs: ["get"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: ca-sandbox-warmpool-scale-reader
      subjects:
      - kind: User
        name: "system:cluster-autoscaler"
        namespace: kube-system
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: sandbox-warmpool-scale-reader
    
  2. Apply the manifest:

      kubectl apply -f capacity-buffer-rbac.yaml
    

Configure capacity buffer

Configure a CapacityBuffer to maintain an infrastructure buffer proportional to the size of the SandboxWarmPool. For more information, see Configure capacity buffers.

  1. Save the following manifest as capacitybuffer.yaml. This example maintains a buffer equivalent to 200% of the SandboxWarmPool's replicas using standby capacity (suspended VMs).

      apiVersion: autoscaling.x-k8s.io/v1beta1
      kind: CapacityBuffer
      metadata:
        name: agent-warmpool-buffer
        namespace: NAMESPACE
      spec:
        percentage: 200
        scalableRef:
          apiGroup: extensions.agents.x-k8s.io
          kind: SandboxWarmPool
          name: agent-warmpool
        provisioningStrategy: "buffer.gke.io/standby-capacity"
    
  2. Apply the manifest:

      kubectl apply -f capacitybuffer.yaml
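
To get a feel for what a percentage-based buffer means in practice, the arithmetic can be sketched in a few lines of shell. This is an illustration only: buffer_units is a hypothetical helper, and rounding up to a whole unit is an assumption of this sketch rather than documented controller behavior.

```shell
# Hypothetical helper: estimates the buffer size for a given warm pool size.
# Usage: buffer_units REPLICAS PERCENTAGE
# Prints ceil(REPLICAS * PERCENTAGE / 100) using integer arithmetic.
# Rounding up is an assumption of this sketch.
buffer_units() {
  replicas="$1"
  percentage="$2"
  echo $(( (replicas * percentage + 99) / 100 ))
}

buffer_units 10 200   # initial pool size from sandboxwarmpool.yaml
buffer_units 25 200   # pool size after the HPA has scaled it to 25 replicas
```

Because the buffer references the SandboxWarmPool through scalableRef, the estimate grows and shrinks as the HPA changes the pool's replica count.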
    

Configure Horizontal Pod Autoscaler

Connect the SandboxWarmPool to the HPA to dynamically scale replicas based on the custom metric.

  1. Save the following manifest as hpa.yaml:

      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: agent-warmpool-hpa
        namespace: NAMESPACE
      spec:
        scaleTargetRef:
          apiVersion: extensions.agents.x-k8s.io/v1alpha1
          kind: SandboxWarmPool
          name: agent-warmpool
        minReplicas: 10
        maxReplicas: 100
        metrics:
        - type: External
          external:
            metric:
              name: "prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter"
              selector:
                matchLabels:
                  metric.labels.warmpool_name: "agent-warmpool"
            target:
              type: Value
              value: 0.2
    
  2. Apply the manifest:

      kubectl apply -f hpa.yaml
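
The HPA derives its replica count from the ratio of the observed metric value to the target, using the general Kubernetes formula desiredReplicas = ceil(currentReplicas × currentValue / targetValue), bounded by minReplicas and maxReplicas. The following sketch reproduces only that core calculation. The desired_replicas helper is hypothetical, values are passed in milli-units to stay within integer arithmetic (the target 0.2 becomes 200), and the sketch ignores the HPA tolerance, stabilization windows, and however the adapter converts the counter into the value the HPA sees.

```shell
# Hypothetical helper: core HPA replica calculation, clamped to [MIN, MAX].
# Usage: desired_replicas CURRENT METRIC_MILLI TARGET_MILLI MIN MAX
# METRIC_MILLI and TARGET_MILLI are the metric and target values times 1000.
desired_replicas() {
  current="$1"
  metric_milli="$2"
  target_milli="$3"
  min="$4"
  max="$5"
  # ceil(current * metric / target) with integer arithmetic
  desired=$(( (current * metric_milli + target_milli - 1) / target_milli ))
  if [ "$desired" -lt "$min" ]; then
    desired="$min"
  fi
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}

desired_replicas 10 400 200 10 100   # metric at twice the target: pool doubles
desired_replicas 10 50 200 10 100    # metric below target: held at minReplicas
desired_replicas 60 800 200 10 100   # large spike: capped at maxReplicas
```

With the manifest above, an observed value of 0.4 against the 0.2 target would double a 10-replica pool to 20, while quiet periods leave the pool at its minReplicas floor of 10.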
    

Monitor scaling events

You can monitor HPA and Capacity Buffer events to verify dynamic scaling.

Monitor HPA events

To watch HPA events, run the following command:

    kubectl get events -n NAMESPACE --watch \
        --field-selector involvedObject.kind=HorizontalPodAutoscaler

The sample output when scaling occurs looks similar to the following:

 SuccessfulRescale New size: 20; reason: external metric prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target 
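
If you want to track how the pool size changes over time, you could extract the new size from each rescale message. The sketch below assumes the message format matches the sample output above; rescale_size is a hypothetical helper, and lines that do not contain "New size: N;" are ignored.

```shell
# Hypothetical helper: reads event messages on stdin and prints the number
# that follows "New size:" for each matching line. Other lines print nothing.
rescale_size() {
  sed -n 's/.*New size: \([0-9][0-9]*\);.*/\1/p'
}

# Example using the sample message shown above.
echo 'SuccessfulRescale New size: 20; reason: external metric prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target' | rescale_size
```

In practice you might pipe the output of the kubectl get events command into the helper instead of the echoed sample line.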

Monitor CapacityBuffer events

To watch Capacity Buffer events, run the following command:

    kubectl get events -n NAMESPACE --watch \
        --field-selector involvedObject.kind=CapacityBuffer

The sample output showing suspended VM resume or scale-up looks similar to the following:

 TriggeredScaleUp capacity buffer 20 fake pods triggered scale-up 

What's next
