Manual scaling

This page describes how to manually scale your service. It also provides instructions for a common use case: changing the instance count on a schedule using Cloud Scheduler jobs and the Cloud Run Admin API.

Overview

By default, Cloud Run automatically scales out to a specified or default maximum number of instances depending on traffic and CPU utilization. However, for some use cases, you might want to set a specific number of instances using manual scaling.

Manual scaling lets you set a specific instance count, regardless of traffic or utilization, and without requiring redeployment. This lets you write your own scaling logic in an external system. See the Schedule-based scaling example section for an example.

Revision-level minimum and maximum settings and manual scaling

If you set your service to manual scaling, any revision-level minimum and maximum instance settings are ignored.

Traffic splits for manual scaling

The following list describes how instances are allocated when splitting traffic under manual scaling. This includes behavior for traffic-tag-only revisions.

- During a traffic split, each revision is allocated instances in proportion to its share of traffic, similar to traffic splitting with service-level minimum instances.
- If the number of revisions receiving traffic exceeds the manual instance count, some of the revisions will have no instances. Traffic sent to those revisions gets the same error as if the revisions were disabled.
- For all revisions receiving traffic in a traffic split, any revision-level minimum and maximum instance settings are disabled.
- If a revision is active only due to traffic tags:
  - If revision-level minimum instances is set, the specified number of instances starts but does not count toward the service's total manual instance count. The revision will not autoscale.
  - If revision-level minimum instances is not set, the revision scales out to at most one instance, in response to traffic sent to the tag URL.
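
The proportional allocation described in this list can be pictured with a small bash sketch. The function name allocate_instances and the largest-remainder rounding rule are assumptions made for illustration; this page does not state how Cloud Run rounds fractional shares, so the actual allocation may differ.

```shell
# Illustrative only: split a manual instance count across revisions in
# proportion to their traffic percentages (which should sum to 100).
# ASSUMPTION: largest-remainder rounding; Cloud Run's real rule may differ.
# Usage: allocate_instances TOTAL PERCENT...
# Prints one allocated count per revision, space-separated, in input order.
allocate_instances() {
  local total=$1; shift
  local -a pct=("$@") alloc=() rem=()
  local i best n=${#pct[@]} assigned=0

  # First pass: floor of each proportional share, plus its remainder.
  for ((i = 0; i < n; i++)); do
    alloc[i]=$(( total * pct[i] / 100 ))
    rem[i]=$(( total * pct[i] % 100 ))
    assigned=$(( assigned + alloc[i] ))
  done

  # Second pass: hand leftover instances to the largest remainders.
  while (( assigned < total )); do
    best=-1
    for ((i = 0; i < n; i++)); do
      if (( rem[i] > 0 )) && { (( best < 0 )) || (( rem[i] > rem[best] )); }; then
        best=$i
      fi
    done
    if (( best < 0 )); then break; fi
    alloc[best]=$(( alloc[best] + 1 ))
    rem[best]=0
    assigned=$(( assigned + 1 ))
  done

  echo "${alloc[*]}"
}
```

For example, `allocate_instances 10 70 30` prints `7 3`, and `allocate_instances 2 34 33 33` prints `1 1 0`, which shows how a revision can end up with no instances when there are more revisions than instances.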

Billing behavior using manual scaling

When you use manual scaling, billing behavior is similar to the behavior when you use the minimum instances feature.

That is, with manual scaling and instance-based billing, manually scaled idle instances are billed as active instances.

If you use manual scaling with request-based billing, manually scaled idle instances are billed as idle minimum instances. For complete billing details, see the pricing page.

Required roles

To get the permissions that you need to deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

Cloud Run Developer (roles/run.developer) on the Cloud Run service
Service Account User (roles/iam.serviceAccountUser) on the service identity
Artifact Registry Reader (roles/artifactregistry.reader) on the Artifact Registry repository of the deployed container image (if applicable)

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Configure manual scaling

You can configure the scaling mode using the Google Cloud console, the Google Cloud CLI, a YAML file, the REST API, or Terraform when you create a service or update a revision:

Console

In the Google Cloud console, go to the Cloud Run Services page:

Go to Cloud Run
If you are configuring a new service, click Deploy container to display the Create service form. If you are configuring an existing service, click the service to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.
Locate the Service scaling form (for a new service) or the Edit scaling form (for an existing service).

In the field labeled Number of instances, specify the number of container instances for the service.
Click Create for a new service or Save for an existing service.

gcloud

To specify scaling for a new service, use the deploy command:

gcloud run deploy SERVICE \
  --scaling=INSTANCE_COUNT \
  --image IMAGE_URL

Replace the following:

SERVICE: the name of your service.
INSTANCE_COUNT: the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service. Specify a value of auto to use the default Cloud Run autoscaling behavior.
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
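
Because INSTANCE_COUNT accepts either a non-negative integer or the literal auto, a wrapper script may want to check the value before calling gcloud. The following bash sketch shows one way to do that; scaling_flag is a hypothetical helper, not part of the gcloud CLI.

```shell
# Build the --scaling flag after checking that the value is either "auto"
# or a non-negative integer (0 disables the service).
# scaling_flag is a hypothetical helper for illustration only.
scaling_flag() {
  local value=$1
  if [[ $value == auto || $value =~ ^[0-9]+$ ]]; then
    printf -- '--scaling=%s\n' "$value"
  else
    echo "invalid INSTANCE_COUNT: $value" >&2
    return 1
  fi
}

# Example (placeholders as in the deploy command):
# gcloud run deploy SERVICE "$(scaling_flag 3)" --image IMAGE_URL
```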

Specify scaling for an existing service by using the following update command:

gcloud run services update SERVICE \
  --scaling=INSTANCE_COUNT

YAML

If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
```
gcloud run services describe SERVICE --format export > service.yaml
```

Update the scalingMode and manualInstanceCount attributes:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/scalingMode: MODE
    run.googleapis.com/manualInstanceCount: INSTANCE_COUNT

Replace the following:

SERVICE: the name of your Cloud Run service.
MODE: manual for manual scaling, or automatic for the default Cloud Run autoscaling behavior.
INSTANCE_COUNT: the number of instances you are manually scaling for the service. Specify a value of 0 to disable the service.

Create or update the service using the following command:

gcloud run services replace service.yaml

REST API

To set the manual instance count for a given service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.

For example, using curl:

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{"scaling":{"manualInstanceCount":MANUAL_INSTANCE_COUNT}}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=scaling.manualInstanceCount"

Replace the following:

ACCESS_TOKEN: a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
MANUAL_INSTANCE_COUNT: the number of instances for the service. This sets the service to manual scaling. Specify a value of 0 to disable the service.
SERVICE: the name of the service.
REGION: the Google Cloud region that the service is deployed in.
PROJECT_ID: the Google Cloud project ID.
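
When scripting this request, it can help to assemble the URL and the JSON body in one place so the placeholders are filled consistently. The following bash sketch does that; service_url and scaling_body are hypothetical helper names, and the project, region, and service values in the example are made up.

```shell
# Hypothetical helpers that assemble the pieces of the PATCH request
# used to set the manual instance count.

# Usage: service_url PROJECT_ID REGION SERVICE
service_url() {
  local project=$1 region=$2 service=$3
  printf 'https://run.googleapis.com/v2/projects/%s/locations/%s/services/%s?update_mask=scaling.manualInstanceCount\n' \
    "$project" "$region" "$service"
}

# Usage: scaling_body MANUAL_INSTANCE_COUNT
scaling_body() {
  local count=$1
  if [[ ! $count =~ ^[0-9]+$ ]]; then
    echo "count must be a non-negative integer" >&2
    return 1
  fi
  printf '{"scaling":{"manualInstanceCount":%s}}\n' "$count"
}

# Example (illustrative values):
# curl -X PATCH -H "Content-Type: application/json" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -d "$(scaling_body 5)" \
#   "$(service_url my-project us-central1 my-service)"
```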

To switch the scaling mode from manual to automatic, send a PATCH request to the Cloud Run Admin API service endpoint and set the scalingMode field to AUTOMATIC.

For example, run the following curl command:

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{"launchStage":"BETA","scaling":{"scalingMode":"AUTOMATIC","manualInstanceCount":null}}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=launchStage,scaling.scalingMode,scaling.manualInstanceCount"

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:

resource "google_cloud_run_v2_service" "default" {
  name     = "SERVICE_NAME"
  location = "REGION"

  template {
    containers {
      image = "IMAGE_URL"
    }
  }

  scaling {
    scaling_mode          = "MANUAL"
    manual_instance_count = "INSTANCE_COUNT"
  }
}

Replace the following:

SERVICE_NAME: the name of your Cloud Run service.
REGION: the Google Cloud region. For example, europe-west1.
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
INSTANCE_COUNT: the number of instances you are manually scaling for the service. This number of instances is divided among all revisions with specified traffic, based on the percentage of traffic each one receives.

View scaling configuration for your service

To view the scaling configuration for your Cloud Run service:

Console

In the Google Cloud console, go to the Cloud Run Services page:

Go to Cloud Run
Click the service you are interested in to open the Service details panel.
The current scaling setting is shown at the upper right of the service details panel, after the Scaling label, next to the pen icon.

gcloud

Use the following command to view the current scaling configuration for the service:

gcloud run services describe SERVICE

Replace SERVICE with the name of your service.

Look for the Scaling: Manual (Instances: ) field near the top of the output returned by the describe command.

YAML

Use the following command to download the service YAML configuration:

gcloud run services describe SERVICE --format export > service.yaml

The scaling configuration is contained in the scalingMode and manualInstanceCount attributes.
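
To pick those two attributes out of the exported file quickly, a plain text match is usually enough. The following bash sketch assumes the annotations appear under the run.googleapis.com/ prefix as in the YAML shown earlier on this page; show_scaling is a hypothetical helper, and it does not parse YAML.

```shell
# Print the scaling annotations found in an exported service YAML file.
# show_scaling is a hypothetical helper; it does a plain text match and
# strips leading indentation. It does not validate or parse the YAML.
# Usage: show_scaling [FILE]   (FILE defaults to service.yaml)
show_scaling() {
  local file=${1:-service.yaml}
  grep -E 'run\.googleapis\.com/(scalingMode|manualInstanceCount):' "$file" \
    | sed -E 's/^[[:space:]]+//'
}

# Example:
# gcloud run services describe SERVICE --format export > service.yaml
# show_scaling service.yaml
```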

Disable a service

When you disable a service, any requests that are currently being processed will be allowed to complete. However, any further requests to the service URL will fail with a Service unavailable or Service disabled error.

Requests to service revisions that are only active due to traffic tags are not impacted because those revisions are not disabled.

To disable a service, set scaling to zero. You can do this using the Google Cloud console, the Google Cloud CLI, a YAML file, the REST API, or Terraform:

Console

In the Google Cloud console, go to the Cloud Run Services page:

Go to Cloud Run
Click the service you want to disable to display its detail panel, then click the pen icon next to Scaling at the top right of the detail panel.
Locate the Edit scaling form and select Manual scaling.

In the field labeled Number of instances, enter the value 0 (zero).
Click Save.

gcloud

To disable a service, use the following command to set scaling to zero:

gcloud run services update SERVICE --scaling=0

Replace SERVICE with the name of your service.

YAML

Download your service's YAML configuration:

gcloud run services describe SERVICE --format export > service.yaml

Set the manualInstanceCount attribute to zero (0):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: 0

Replace SERVICE with the name of your Cloud Run service.

Create or update the service using the following command:

gcloud run services replace service.yaml

REST API

To disable a service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.

For example, using curl:

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{"scaling":{"manualInstanceCount":0}}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=scaling.manualInstanceCount"

Replace the following:

ACCESS_TOKEN: a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
SERVICE: the name of the service.
REGION: the Google Cloud region that the service is deployed in.
PROJECT_ID: the Google Cloud project ID.

Terraform

To disable a service, set the manual_instance_count attribute to zero (0):

resource "google_cloud_run_v2_service" "default" {
  name     = "SERVICE_NAME"
  location = "REGION"

  template {
    containers {
      image = "IMAGE_URL"
    }
  }

  scaling {
    scaling_mode          = "MANUAL"
    manual_instance_count = "0"
  }
}

Replace the following:

SERVICE_NAME: the name of your Cloud Run service.
REGION: the Google Cloud region. For example, europe-west1.
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.

Schedule-based scaling example

A common use case of manual scaling is changing the instance count based on a predefined schedule. In this example, we use Cloud Scheduler to schedule two jobs, each of which invokes the Cloud Run Admin API to scale the number of instances. The first Cloud Scheduler job sets the service to manually scale to a specified number of instances during business hours (9am-5pm, M-F). The second job sets the service to scale down to a specified number of instances during off-hours.

In this example, we use the Cloud Run quickstart for simplicity, but you can use a service of your choice.
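
The schedule described above (business hours 9am-5pm, Monday to Friday) can be written down as a small bash function, which is handy for checking what instance count the service should have at a given moment. This is a sketch only: desired_instances is a hypothetical helper, and the two counts are whatever values you choose for business hours and off-hours.

```shell
# Return the instance count the schedule calls for at a given day and hour.
# Day numbering follows `date +%u` (1 = Monday, 7 = Sunday); the hour is
# 0-23 and may carry a leading zero, as printed by `date +%H`.
# Usage: desired_instances DAY_OF_WEEK HOUR BUSINESS_COUNT OFF_COUNT
desired_instances() {
  local dow=$1 hour=$(( 10#$2 )) business_count=$3 off_count=$4
  if (( dow >= 1 && dow <= 5 && hour >= 9 && hour < 17 )); then
    echo "$business_count"
  else
    echo "$off_count"
  fi
}

# Example: count expected right now, with 10 instances in business hours
# and 0 otherwise.
# desired_instances "$(date +%u)" "$(date +%H)" 10 0
```

The `10#` prefix forces base-10 parsing so that hours such as 08 and 09 are not rejected as invalid octal numbers.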

To set up schedule-based manual scaling:

Deploy your service using the following command:

gcloud run deploy SERVICE \
  --image=us-docker.pkg.dev/cloudrun/container/hello \
  --region=REGION \
  --project PROJECT_ID

Replace the following:

SERVICE: the name of the Cloud Run service.
REGION: the region the Cloud Run service is deployed to.
PROJECT_ID: the Google Cloud project ID.

Configure your service for manual scaling to 10 instances using the following command:

gcloud run services update SERVICE \
  --region=REGION \
  --scaling=10

Create a Cloud Scheduler job that manually scales to a specified number of service instances during business hours:
```
gcloud scheduler jobs create http hello-start-instances \
  --location=REGION \
  --schedule="0 9 * * MON-FRI" \
  --time-zone=America/Los_Angeles \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/hello?update_mask=launchStage,scaling.manualInstanceCount" \
  --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
  --http-method=PUT \
  --message-body='{"scaling":{"manualInstanceCount":INSTANCE_COUNT}}' \
  --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com
```
Replace the following:
- REGION: the region the Cloud Run service is deployed to.
- PROJECT_ID: the Google Cloud project ID.
- INSTANCE_COUNT: the number of instances you want to scale to, for example, 10.
- PROJECT_NUMBER: the Google Cloud project number.

This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting the number of instances to the number you specify. The example uses the Compute Engine default service account PROJECT_NUMBER-compute@developer.gserviceaccount.com for the Cloud Scheduler jobs. You can use any service account that has permissions to update Cloud Run services.

Note: This example uses the X-HTTP-Method-Override=PATCH header because the Cloud Scheduler CLI does not support setting http-method=PATCH. If you configure the Cloud Scheduler job using the Google Cloud console, you can set the HTTP method to PATCH and omit the header.

Create a Cloud Scheduler job that manually scales down service instances during off hours:

gcloud scheduler jobs create http hello-stop-instances \
  --location=REGION \
  --schedule="0 17 * * MON-FRI" \
  --time-zone=America/Los_Angeles \
  --uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/hello?update_mask=launchStage,scaling.manualInstanceCount" \
  --headers=Content-Type=application/json,X-HTTP-Method-Override=PATCH \
  --http-method=PUT \
  --message-body='{"scaling":{"manualInstanceCount":INSTANCE_COUNT}}' \
  --oauth-service-account-email=PROJECT_NUMBER-compute@developer.gserviceaccount.com

Replace the following:

REGION: the region the Cloud Run service is deployed to.
PROJECT_ID: the Google Cloud project ID.
INSTANCE_COUNT: the number of instances you want to scale to. To disable the service, set this to 0.
PROJECT_NUMBER: the Google Cloud project number.

This command creates a Cloud Scheduler job that makes an HTTP call to the Cloud Run Admin API, setting manual scaling instances to the number of instances you specified. Setting the instances to zero effectively disables the service, but not the Cloud Scheduler jobs. Those jobs continue to run and will reset (and re-enable) the service to an increased number of instances as scheduled.
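
Because the jobs keep running after the service is scaled to zero, you may want to pause them when the service should stay disabled and resume them later. The following bash sketch prints the corresponding gcloud scheduler jobs pause or resume commands for the two job names used in this example, rather than running them, so you can review them first. scheduler_job_commands is a hypothetical helper name.

```shell
# Print the commands that pause or resume both Cloud Scheduler jobs from
# this example. Nothing is executed; pipe the output to a shell to run it.
# Usage: scheduler_job_commands pause|resume REGION
scheduler_job_commands() {
  local action=$1 region=$2 job
  case $action in
    pause|resume) ;;
    *) echo "action must be pause or resume" >&2; return 1 ;;
  esac
  for job in hello-start-instances hello-stop-instances; do
    echo "gcloud scheduler jobs $action $job --location=$region"
  done
}

# Example (illustrative region):
# scheduler_job_commands pause us-central1
```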
