Set maximum instances for services

This page describes how to set the maximum number of instances that can be used for your Cloud Run service using the default Cloud Run autoscaling behavior. To manually scale your service, see Manual scaling.

Specifying maximum instances in Cloud Run lets you limit how far your service scales in response to incoming requests. This maximum can be exceeded for a brief period in circumstances such as traffic spikes.

You can use this setting to control your costs or to limit the number of connections to a backing service, such as a database.
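As a rough illustration of the connection-limiting use case, the following sketch derives a maximum instances value from a database connection budget. The function name, the per-instance pool size, and the reserved headroom are hypothetical and not part of Cloud Run. Because the maximum can be exceeded briefly, holding back some connections is a reasonable precaution.

```python
def max_instances_for_connection_budget(db_max_connections: int,
                                        pool_size_per_instance: int,
                                        reserved_connections: int = 0) -> int:
    """Return the largest instance count whose combined pools fit the budget.

    All parameters are illustrative assumptions: each instance is assumed to
    open at most `pool_size_per_instance` connections, and
    `reserved_connections` is headroom kept free for other clients or for a
    brief overshoot of the maximum.
    """
    if pool_size_per_instance < 1:
        raise ValueError("pool_size_per_instance must be at least 1")
    usable = db_max_connections - reserved_connections
    instances = usable // pool_size_per_instance
    if instances < 1:
        raise ValueError("connection budget is too small for a single instance")
    return instances


# Example: 100 connections, 20 held back, 5 connections per instance.
print(max_instances_for_connection_budget(100, 5, reserved_connections=20))  # 16
```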

For information about the maximum instance limits that might apply to your service, refer to Maximum instances limits.

For more information on the way Cloud Run autoscales container instances, refer to Instance autoscaling.

Apply maximum instances at service-level versus revision-level

You can configure maximum instances at the service level or at the revision level. Google recommends that you use service-level maximum instances unless you have a specific need to limit instances at the revision level.

When applying maximum instances, the settings go into effect as follows:

Service-level: immediately
Revision-level: upon deployment of the revision

Tagged revisions and service-level maximum instances

Tagged revisions are started, but only count toward the service-level maximum instances if they are a part of a traffic split.

Required roles

To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

Cloud Run Developer (roles/run.developer) on the Cloud Run service
Service Account User (roles/iam.serviceAccountUser) on the service identity

If you are deploying a service or function from source code, you must also have additional roles granted to you on your project and Cloud Build service account.

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Configure service-level maximum instances

You can change the maximum instances setting using the Google Cloud console, the Google Cloud CLI, or a YAML file when you create a new service or deploy a new revision.

Console

In the Google Cloud console, go to Cloud Run:

Go to Cloud Run
If you are configuring a new service, select Services and click Deploy container to display the Create service form. Locate the Service scaling form.
If you are configuring an existing service, click the service to display its detail panel, then click Edit service level scaling settings at the top right of the detail panel.
In the field labelled Maximum number of instances, specify the required maximum number of container instances, using any integer value from 1 to the maximum limit possible for your service.
Click Create for a new service or Deploy for an existing service.

gcloud

Note: If you are using the gcloud CLI to deploy a function, you must specify the required flags when running gcloud run deploy, and have the required roles granted to you.

You can update the maximum number of instances of a given service by using the following command:

gcloud beta run services update SERVICE --max MAX-VALUE

Replace the following:

SERVICE : the name of your service.
MAX-VALUE : the required maximum number of container instances, using any integer value from 1 to the maximum limit possible for your service.
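If you script this update, a minimal sketch such as the following can assemble the argument list and reject values below the documented minimum of 1 before calling the CLI. The function name and the sample values `my-service` and `20` are placeholders. The upper bound depends on your service's limit, so it is not checked here.

```python
import shlex


def build_update_command(service: str, max_value: int) -> list[str]:
    """Build the argument list for the service-level update command above."""
    # The documented range starts at 1; the upper bound depends on the
    # service's own limit, so it is left for the CLI to enforce.
    if not isinstance(max_value, int) or max_value < 1:
        raise ValueError("MAX-VALUE must be an integer of at least 1")
    return ["gcloud", "beta", "run", "services", "update", service,
            "--max", str(max_value)]


# "my-service" and 20 are placeholder values.
command = build_update_command("my-service", 20)
print(shlex.join(command))  # gcloud beta run services update my-service --max 20
```

If the gcloud CLI is installed and authenticated, the resulting list can be passed to `subprocess.run(command, check=True)`.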

You can also set the maximum number of instances during deployment using the command:

gcloud beta run deploy --image IMAGE_URL --max MAX-VALUE

Replace the following:

IMAGE_URL : a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
MAX-VALUE : the required maximum number of container instances, using any integer value from 1 to the maximum limit.

YAML

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
```
gcloud run services describe SERVICE --format export > service.yaml
```

Update the run.googleapis.com/maxScale: attribute:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/maxScale: 'MAX-INSTANCE'

Replace the following:

SERVICE : the name of your Cloud Run service
MAX-INSTANCE : the required maximum number of container instances, using any integer value from 1 to the maximum limit possible for your service.
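The following sketch shows the same change applied programmatically. It assumes the exported manifest has already been loaded into a Python dictionary (for example with a YAML library, which is not shown). The helper name and the sample service name are hypothetical.

```python
import copy


def with_service_max_scale(manifest: dict, max_instances: int) -> dict:
    """Return a copy of a service manifest with the service-level maximum set."""
    if max_instances < 1:
        raise ValueError("maximum instances must be at least 1")
    updated = copy.deepcopy(manifest)
    annotations = updated.setdefault("metadata", {}).setdefault("annotations", {})
    # Annotation values are strings, matching the quoted value in the YAML above.
    annotations["run.googleapis.com/maxScale"] = str(max_instances)
    return updated


manifest = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {
        "name": "my-service",  # placeholder service name
        "annotations": {"run.googleapis.com/launch-stage": "BETA"},
    },
}
updated = with_service_max_scale(manifest, 20)
print(updated["metadata"]["annotations"]["run.googleapis.com/maxScale"])  # 20
```

The helper returns a copy, so the original manifest stays available for comparison, and existing annotations are left untouched.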

Create or update the service using the following command:

gcloud run services replace service.yaml

View service-level maximum instances

To view the current service-level maximum instances settings for your Cloud Run service:

Console

In the Google Cloud console, go to Cloud Run:

Go to Cloud Run
Click the service to open its Service details panel.
View the current setting in the upper right of the service details panel, next to Scaling.

gcloud

Use the following command:

gcloud run services describe SERVICE

Locate the value for Scaling: Auto (Min: MIN_VALUE, Max: MAX_VALUE) in the returned configuration.
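If you capture the command output in a script, a sketch like the following can extract the two values. The pattern is based only on the line format quoted above. The exact output may differ between gcloud versions, and the sample line is made up.

```python
import re

# Matches a line of the form "Scaling: Auto (Min: 0, Max: 20)".
SCALING_PATTERN = re.compile(
    r"Scaling:\s*Auto\s*\(Min:\s*(\d+)\s*,\s*Max:\s*(\d+)\s*\)"
)


def parse_scaling(describe_output: str):
    """Return (min, max) from the scaling line, or None if it is absent."""
    match = SCALING_PATTERN.search(describe_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "Scaling: Auto (Min: 0, Max: 20)"  # made-up sample line
print(parse_scaling(sample))  # (0, 20)
```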

Configure revision-level maximum instances

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

By default, Cloud Run revisions are configured to scale up to a maximum of 100 instances.

You can change the maximum instances setting using the Google Cloud console, the Google Cloud CLI, or a YAML file when you create a new service or deploy a new revision.

Console

In the Google Cloud console, go to Cloud Run:

Go to Cloud Run
Find and click the service you want to update in the services list to open that service's details.
Click Edit and deploy new revision to display the revision deployment form.
Click the Container tab.
Locate the Revision scaling section. In the field labelled Maximum number of instances, specify the maximum number of container instances.
Click Deploy.

gcloud

Note: If you are using the gcloud CLI to deploy a function, you must specify the required flags when running gcloud run deploy, and have the required roles granted to you.

You can update the maximum number of instances of a given service by using the following command:

gcloud beta run services update SERVICE --max-instances MAX-VALUE

Replace the following:

SERVICE : the name of your service.
MAX-VALUE : the required maximum number of container instances, using any integer value from 1 to the maximum limit.

YAML

If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
```
gcloud run services describe SERVICE --format export > service.yaml
```

Update the autoscaling.knative.dev/maxScale: attribute:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/launch-stage: BETA
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: 'MAX-INSTANCE'
      name: REVISION

Replace the following:

SERVICE : the name of your Cloud Run service
MAX-INSTANCE : the required maximum number of container instances, using any integer value from 1 to the maximum limit.
REVISION : a new revision name, or delete this field if it is present. If you supply a new revision name, it must meet the following criteria:
- Starts with SERVICE -
- Contains only lowercase letters, numbers and -
- Does not end with a -
- Does not exceed 63 characters
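The four naming criteria above can be checked before deployment with a short sketch like this one. The function name and the sample names are illustrative only.

```python
import re


def is_valid_revision_name(service: str, revision: str) -> bool:
    """Check a revision name against the four criteria listed above."""
    return (
        revision.startswith(f"{service}-")                      # starts with SERVICE-
        and re.fullmatch(r"[a-z0-9-]+", revision) is not None   # allowed characters only
        and not revision.endswith("-")                          # does not end with -
        and len(revision) <= 63                                 # length limit
    )


print(is_valid_revision_name("my-service", "my-service-v2"))  # True
print(is_valid_revision_name("my-service", "my-service-"))    # False (ends with -)
print(is_valid_revision_name("my-service", "other-v2"))       # False (wrong prefix)
```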

Create or update the service using the following command:

gcloud run services replace service.yaml

View revision-level maximum instance settings

To view the current revision-level maximum instances settings for your Cloud Run service:

Console

In the Google Cloud console, go to Cloud Run:

Go to Cloud Run
Click the service to open its Service details panel.
Click the Revisions tab.
In the details panel at the right, view the Revision max. instances setting listed under the Container tab.

gcloud

Use the following command:

gcloud run services describe SERVICE

Locate the value for Max instances: in the returned configuration.

Use both service-level and revision-level minimum or maximum instances

The following table shows the behavior if you combine service-level maximum instances with revision-level minimum or maximum instances:

Service-level setting	Revision-level setting	Behavior
Service-level maximum instances set	Revision-level maximum instances set	Effective maximum instance limit is the lesser value between revision-level maximum instances and service-level maximum instances.
Service-level maximum instances set	Revision-level minimum instances set	If service-level maximum instances is set to a value lower than revision-level minimum instances, then the revision starts instances up to the service-level maximum instances, and won't reach the configured revision-level minimum instances.
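The two rows of the table reduce to taking the lesser of two values, as the following sketch illustrates. The function names and sample numbers are hypothetical.

```python
def effective_max_instances(service_max: int, revision_max: int) -> int:
    """Row 1: the effective limit is the lesser of the two maximums."""
    return min(service_max, revision_max)


def reachable_min_instances(service_max: int, revision_min: int) -> int:
    """Row 2: a revision cannot keep more instances than the service-level cap."""
    return min(service_max, revision_min)


print(effective_max_instances(service_max=100, revision_max=30))  # 30
print(reachable_min_instances(service_max=5, revision_min=10))    # 5
```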

Use service-level maximum instances with traffic splitting

If you use traffic splitting, the service-level maximum instances are distributed across the revisions based on the proportion of the traffic split. For example, if the service-level maximum instances = 100, a 50/50 traffic split allocates 50 service-level maximum instances to each revision. The following table shows a sample configuration scenario:

Sample configuration	Resulting behavior
Service-level maximum instances set (no revision-level settings): 100. Traffic split: Revision A 10%, Revision B 10%, Revision C 80%.	A portion of the service-level maximum instances is allocated to each revision. The effective maximum instances for each revision is fixed based on the traffic split: 10 for Revision A, 10 for Revision B, and 80 for Revision C.
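The allocation in the scenario above is a proportional share of the service-level maximum, which can be sketched as follows. The function name is hypothetical, and the use of floor division for shares that do not divide evenly is an assumption. The text above does not state how fractional shares are rounded.

```python
def allocate_max_instances(service_max: int,
                           split_percent: dict[str, int]) -> dict[str, int]:
    """Distribute a service-level maximum across revisions by traffic share."""
    if sum(split_percent.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    # Floor division is an assumption for shares that are not whole numbers.
    return {revision: service_max * percent // 100
            for revision, percent in split_percent.items()}


print(allocate_max_instances(100, {"A": 10, "B": 10, "C": 80}))
# {'A': 10, 'B': 10, 'C': 80}
```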
