Scaling an application

This page explains how to scale a deployed application in Google Kubernetes Engine (GKE).

Overview

When you deploy an application in GKE, you define how many replicas of the application you'd like to run. Each replica of your application represents a Kubernetes Pod that encapsulates your application's container(s).

Scaling an application changes either how many replicas of the workload run or how many resources each replica can use. There are two methods to scale an application:

Horizontal scaling, where you increase or decrease the number of workload replicas.
Vertical scaling, where you adjust the resources available to replicas in-place.
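
To make the distinction concrete, the following minimal Python sketch models a workload as a replica count plus a per-replica CPU request. The `Workload` class, its field names, and the 400m starting value are hypothetical and are not part of any Kubernetes API; the sketch only illustrates which quantity each method changes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    """Hypothetical model of a workload; not a Kubernetes API type."""
    replicas: int        # number of Pods running the application
    cpu_millicores: int  # CPU requested by each replica

    @property
    def total_cpu_millicores(self) -> int:
        # Total CPU requested across all replicas.
        return self.replicas * self.cpu_millicores


def scale_horizontally(workload: Workload, replicas: int) -> Workload:
    """Change the number of replicas; per-replica resources stay the same."""
    return replace(workload, replicas=replicas)


def scale_vertically(workload: Workload, cpu_millicores: int) -> Workload:
    """Change per-replica resources; the number of replicas stays the same."""
    return replace(workload, cpu_millicores=cpu_millicores)


app = Workload(replicas=1, cpu_millicores=400)
wider = scale_horizontally(app, 4)   # 4 replicas x 400m each
taller = scale_vertically(app, 800)  # 1 replica x 800m
```

Both calls raise the total CPU available to the application, but only horizontal scaling adds Pods.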

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

Inspecting an application

Before scaling your application, you should inspect the application and ensure that it is healthy.

To see all applications deployed to your cluster, run the following command:

kubectl get CONTROLLER

Replace CONTROLLER with deployments, statefulsets, or another controller object type.

For example, if you run kubectl get deployments and you have created only one Deployment, the command's output should look similar to the following:

NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                1         1         1            1           10m

The output is similar for other object types, although the columns can differ slightly. For Deployments, the output has six columns:

NAME lists the names of the Deployments in the cluster.
DESIRED displays the desired number of replicas, or the desired state, of the application, which you define when you create the Deployment.
CURRENT displays how many replicas are currently running.
UP-TO-DATE displays the number of replicas that have been updated to achieve the desired state.
AVAILABLE displays how many replicas of the application are available to your users.
AGE displays the amount of time that the application has been running in the cluster.

In this example, there is only one Deployment, my-app, which has only one replica because its desired state is one replica. You define the desired state at the time of creation, and you can change it at any time by scaling the application.
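
If you want to check these columns in a script rather than by eye, the following Python sketch parses the sample output shown above and reports whether each Deployment has reached its desired state. The function names are hypothetical, and the parser assumes whitespace-separated columns with the headers from this example; column names can differ between kubectl versions.

```python
SAMPLE_OUTPUT = """\
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                1         1         1            1           10m
"""


def parse_table(text: str) -> list[dict[str, str]]:
    """Parse whitespace-separated kubectl table output into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def is_settled(row: dict[str, str]) -> bool:
    """True when current, up-to-date, and available replicas all match the desired count."""
    desired = int(row["DESIRED"])
    return all(int(row[col]) == desired for col in ("CURRENT", "UP-TO-DATE", "AVAILABLE"))


rows = parse_table(SAMPLE_OUTPUT)
for row in rows:
    print(row["NAME"], "settled" if is_settled(row) else "not settled")
```

For anything beyond a quick check, structured output such as `kubectl get deployments -o json` is likely a more robust input than the table format.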

Inspecting StatefulSets

Before scaling a StatefulSet, you should inspect it by running the following command:

kubectl describe statefulset my-app

In the output of this command, check the Pods Status field. If the Failed value is greater than 0, scaling might fail.

If a StatefulSet appears to be unhealthy, perform the following:

Get a list of Pods, and see which Pods are unhealthy:
```
kubectl get pods
```
Remove the unhealthy Pod, replacing POD_NAME with its name:
```
kubectl delete pod POD_NAME
```

Attempting to scale a StatefulSet while it is unhealthy may cause it to become unavailable.
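
The two steps above can be sketched as a small Python helper that reads `kubectl get pods` table output and prints a delete command for each Pod that looks unhealthy. The sample output, the Pod names, and the health rule (status is not Running, or not all containers are ready) are assumptions made for this illustration, not output from a real cluster.

```python
# Hypothetical `kubectl get pods` output, invented for this sketch.
SAMPLE_PODS = """\
NAME       READY   STATUS             RESTARTS   AGE
my-app-0   1/1     Running            0          10m
my-app-1   0/1     CrashLoopBackOff   7          9m
"""


def unhealthy_pods(text: str) -> list[str]:
    """Return names of Pods that are not Running or whose containers are not all ready."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    names = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        ready, total = row["READY"].split("/")
        if row["STATUS"] != "Running" or ready != total:
            names.append(row["NAME"])
    return names


for name in unhealthy_pods(SAMPLE_PODS):
    # Print the command rather than running it, so it can be reviewed first.
    print(f"kubectl delete pod {name}")
```

Printing the commands instead of executing them keeps the deletion a deliberate, reviewed step.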

Scaling an application

The following sections describe each method you can use to scale an application. The kubectl scale method is the fastest way to scale. However, you may prefer another method in some situations, like when updating configuration files or when performing in-place modifications.

kubectl scale

The kubectl scale command lets you immediately change the number of replicas of your running application.

To use kubectl scale, you specify the new number of replicas by setting the --replicas flag. For example, to scale my-app to four replicas, run the following command, replacing CONTROLLER with deployment, statefulset, or another controller object type:

kubectl scale CONTROLLER my-app --replicas 4

If successful, this command's output should be similar to deployment "my-app" scaled.

Next, run:

kubectl get CONTROLLER my-app

The output should look similar to the following:

NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                4         4         4            4           15m
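
If you drive kubectl from a script, it can help to build the command as an argument list instead of a shell string. The following Python sketch assembles the same kubectl scale invocation used above; the `scale_command` helper is hypothetical and only constructs the arguments without running them.

```python
import shlex


def scale_command(controller: str, name: str, replicas: int) -> list[str]:
    """Build the argument list for `kubectl scale`, without running it."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return ["kubectl", "scale", controller, name, "--replicas", str(replicas)]


argv = scale_command("deployment", "my-app", 4)
print(shlex.join(argv))
```

To execute the command against your current cluster context, you could pass `argv` to `subprocess.run(argv, check=True)`.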

kubectl patch

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Starting in Kubernetes version 1.33, you can use the kubectl patch command to vertically scale your workload by updating the resources that are assigned to a container, without recreating the Pod. For more information, including limitations, see the Kubernetes documentation for resizing CPU and memory resources.

To use the kubectl patch command, specify the updated resource request with the --patch flag. For example, to set the CPU request and limit of the pause container in the my-app Pod to 800 millicores (800m), run the following command:

kubectl patch pod my-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"pause", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'
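
Hand-written JSON inside shell quotes is easy to get wrong, so one option is to generate the patch body programmatically. The following Python sketch builds the same patch as the command above; the `resize_patch` helper is hypothetical, and the container name must match a container that actually exists in your Pod (pause is the name used in this example).

```python
import json
import shlex


def resize_patch(container: str, cpu: str) -> str:
    """Return a JSON patch body that sets the CPU request and limit of one container."""
    patch = {
        "spec": {
            "containers": [
                {
                    "name": container,
                    "resources": {
                        "requests": {"cpu": cpu},
                        "limits": {"cpu": cpu},
                    },
                }
            ]
        }
    }
    return json.dumps(patch)


patch = resize_patch("pause", "800m")
argv = ["kubectl", "patch", "pod", "my-app", "--subresource", "resize", "--patch", patch]
print(shlex.join(argv))
```

Because the patch is passed as a single list element, no manual shell quoting is needed if you run `argv` with `subprocess.run`.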

kubectl apply

You can use kubectl apply to apply a new configuration file to an existing controller object. kubectl apply is useful for making multiple changes to a resource, and may be useful for users who prefer to manage their resources in configuration files.

To scale using kubectl apply, the configuration file you supply should include a new number of replicas in the replicas field of the object's specification.

The following is an updated version of the configuration file for the example my-app object. The example shows a Deployment, so if you use another type of controller, such as a StatefulSet, change the kind accordingly. This example works best on a cluster with at least three Nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: my-container
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:2.0

In this file, the value of the replicas field is 3. When this configuration file is applied, the object my-app scales to three replicas.
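
If you generate configuration files from a script, the replicas field can be set programmatically. The following Python sketch represents the example Deployment as a dictionary and returns a copy with a different replica count; the `with_replicas` helper and the value 5 are hypothetical. The sketch emits JSON because kubectl accepts JSON manifests as well as YAML, which avoids a third-party YAML library.

```python
import copy
import json

# The Deployment from the example, expressed as a Python dict.
MANIFEST = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-app"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "app"}},
        "template": {
            "metadata": {"labels": {"app": "app"}},
            "spec": {
                "containers": [
                    {
                        "name": "my-container",
                        "image": "us-docker.pkg.dev/google-samples/containers/gke/hello-app:2.0",
                    }
                ]
            },
        },
    },
}


def with_replicas(manifest: dict, replicas: int) -> dict:
    """Return a copy of the manifest with spec.replicas changed."""
    updated = copy.deepcopy(manifest)
    updated["spec"]["replicas"] = replicas
    return updated


print(json.dumps(with_replicas(MANIFEST, 5), indent=2))
```

You could save the printed output to a file and pass that file to kubectl apply with the -f flag. The deep copy leaves the original dictionary unchanged, so the same base manifest can be reused for several variants.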

To apply an updated configuration file, run the following command:

kubectl apply -f config.yaml

Next, run:

kubectl get CONTROLLER my-app

The output should look similar to the following:

NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-app                3         3         3            3           15m

Console

To scale a workload in the Google Cloud console, perform the following steps:

Go to the Workloads page in the Google Cloud console.

In the workloads list, click the name of the workload you want to scale.
Click Actions > Scale > Edit replicas.
Enter the new number of Replicas for the workload.
Click Scale.

Autoscaling Deployments

You can autoscale Deployments based on CPU utilization of Pods using kubectl autoscale or from the GKE Workloads menu in the Google Cloud console.

kubectl autoscale

kubectl autoscale creates a HorizontalPodAutoscaler (or HPA) object that targets a specified resource (called the scale target) and scales it as needed. The HPA periodically adjusts the number of replicas of the scale target to match the average CPU utilization that you specify.

When you use kubectl autoscale, you specify a maximum and minimum number of replicas for your application, as well as a CPU utilization target. For example, to set the maximum number of replicas to six and the minimum to four, with a CPU utilization target of 50%, run the following command:

kubectl autoscale deployment my-app --max 6 --min 4 --cpu-percent 50

In this command, the --max flag is required. The --cpu-percent flag is the target average CPU utilization across all the Pods. This command does not immediately scale the Deployment to six replicas unless the current load already requires that many.
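
To see how the target and the replica bounds interact, the following Python sketch applies the basic calculation described in the Kubernetes HorizontalPodAutoscaler documentation: multiply the current replica count by the ratio of observed to target utilization, round up, and clamp the result to the configured bounds. It is a simplification; the function name is hypothetical, and the real controller applies further rules, such as a tolerance around the target and stabilization behavior, that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified HPA calculation: scale by the utilization ratio, round up, then clamp."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


# Values from the example command: --min 4 --max 6 --cpu-percent 50.
print(desired_replicas(4, 60, 50, 4, 6))  # moderate load
print(desired_replicas(4, 90, 50, 4, 6))  # heavy load, capped by --max
print(desired_replicas(4, 20, 50, 4, 6))  # light load, held up by --min
```

With these inputs the three calls print 5, 6, and 4, which shows why the Deployment reaches six replicas only under sustained load and never drops below four.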

After running kubectl autoscale, the HorizontalPodAutoscaler object is created and targets the application. When there is a change in load, the object increases or decreases the application's replicas.

To get a list of the HorizontalPodAutoscaler objects in your cluster, run:

kubectl get hpa

To see a specific HorizontalPodAutoscaler object in your cluster, run:

kubectl get hpa HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

To see the HorizontalPodAutoscaler configuration:

kubectl get hpa HPA_NAME -o yaml

The output of this command is similar to the following:

apiVersion: v1
items:
- apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    creationTimestamp: ...
    name: HPA_NAME
    namespace: default
    resourceVersion: "664"
    selfLink: ...
    uid: ...
  spec:
    maxReplicas: 10
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: HPA_NAME
    targetCPUUtilizationPercentage: 50
  status:
    currentReplicas: 0
    desiredReplicas: 0
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""

In this example output, the targetCPUUtilizationPercentage field holds the 50 percent target passed in from the kubectl autoscale example.

To see a detailed description of a specific HorizontalPodAutoscaler object in the cluster:

kubectl describe hpa HPA_NAME

You can modify the HorizontalPodAutoscaler by applying a new configuration file with kubectl apply, using kubectl edit, or using kubectl patch.
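
If you take the configuration-file route, the following Python sketch builds an autoscaling/v1 HorizontalPodAutoscaler manifest with the same fields that appear in the example output, using the values from the kubectl autoscale example (minimum 4, maximum 6, 50 percent CPU). The `hpa_manifest` helper is hypothetical, and two choices in it are assumptions of this sketch: naming the HPA after the Deployment, and requiring a minimum of at least one replica.

```python
import json


def hpa_manifest(deployment: str, min_replicas: int, max_replicas: int,
                 cpu_percent: int) -> dict:
    """Build an autoscaling/v1 HorizontalPodAutoscaler that targets a Deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},  # assumption: HPA shares the Deployment's name
        "spec": {
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "targetCPUUtilizationPercentage": cpu_percent,
        },
    }


print(json.dumps(hpa_manifest("my-app", 4, 6, 50), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply, in the same way as the Deployment configuration file.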

To delete a HorizontalPodAutoscaler object:

kubectl delete hpa HPA_NAME

Console

To autoscale a Deployment, perform the following steps:

Go to the Workloads page in the Google Cloud console.

In the workloads list, click the name of the Deployment you want to autoscale.
Click Actions > Autoscale.
Enter the Maximum number of replicas and, optionally, the Minimum number of replicas for the Deployment.
Under Autoscaling metrics, select and configure metrics as desired.
Click Autoscale.

Autoscaling with custom metrics

You can scale your Deployments based on custom metrics exported from Cloud Monitoring.

To learn how to use custom metrics to autoscale Deployments, refer to the Autoscaling deployments with custom metrics tutorial.

What's next

Learn about exposing your application externally.