This page describes how Gemini Enterprise Agent Platform manages quotas for Anthropic Claude models.
Overview
Anthropic Claude models on Agent Platform use one of the following quota systems:
- Models launched after May 26, 2026: Shared lineage quotas.
- Models launched before May 26, 2026: Per-model quotas.
Shared lineage quotas
Global and multi-region endpoints for Anthropic Claude models launched after May 26, 2026 use shared model lineage quotas. A single quota limit is shared across all model versions in a model lineage for a given location.
For example, if you call Claude Opus 4.8 through the global endpoint, then all requests, input tokens, and output tokens count against the same anthropic-claude-opus quota bucket for the global endpoint, regardless of the specific Opus version that you call.
Quotas on the global endpoint and each multi-region endpoint are independent buckets. Usage on the global endpoint doesn't consume quota on a multi-region endpoint, and similarly usage on a multi-region endpoint doesn't consume quota on the global endpoint.
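As a rough illustration of the bucket model described above, the following Python sketch tallies requests per (location, lineage) pair. The `lineage_of` helper, the `UsageTracker` class, and the model version strings are hypothetical and exist only for this example; only the lineage names come from this page.

```python
from collections import defaultdict

LINEAGE_FAMILIES = ("opus", "sonnet", "haiku", "mythos")


def lineage_of(model_version: str) -> str:
    """Map a model version name to its lineage (hypothetical naming rule)."""
    name = model_version.lower()
    for family in LINEAGE_FAMILIES:
        if family in name:
            return f"anthropic-claude-{family}"
    raise ValueError(f"unknown lineage for {model_version!r}")


class UsageTracker:
    """Tallies requests per (location, lineage) bucket."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)

    def record(self, location: str, model_version: str) -> None:
        # Every version in a lineage lands in the same bucket for a location.
        self.requests[(location, lineage_of(model_version))] += 1


tracker = UsageTracker()
tracker.record("global", "claude-opus-4-8")  # illustrative version string
tracker.record("global", "claude-opus-5")    # hypothetical later Opus version
tracker.record("us", "claude-opus-4-8")      # separate bucket: different location
```

In this sketch, the two global calls share one bucket even though they target different Opus versions, while the call through the us multi-region endpoint is counted in its own bucket.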
The following sections describe how shared lineage quotas work for global endpoints and multi-region endpoints.
Global endpoints
The following list describes the metrics for global endpoints and the base_model values that each metric supports:
- global_online_prediction_requests_per_base_model
  - base_model: anthropic-claude-opus, anthropic-claude-sonnet, anthropic-claude-haiku, anthropic-claude-mythos
- global_online_prediction_input_tokens_per_minute_per_base_model
  - base_model: anthropic-claude-opus, anthropic-claude-sonnet, anthropic-claude-haiku, anthropic-claude-mythos
- global_online_prediction_output_tokens_per_minute_per_base_model
  - base_model: anthropic-claude-opus, anthropic-claude-sonnet, anthropic-claude-haiku, anthropic-claude-mythos
Multi-region endpoints
The following list describes the metrics for multi-region endpoints, such as us and eu, and the base_model values that each metric supports. In each metric name, replace LOCATION with the multi-region location:
- LOCATION_multi_region_online_prediction_requests_per_base_model
  - base_model: anthropic-claude-opus, anthropic-claude-sonnet, anthropic-claude-haiku, anthropic-claude-mythos
- LOCATION_multi_region_online_prediction_input_tokens_per_minute_per_base_model
  - base_model: anthropic-claude-opus, anthropic-claude-sonnet, anthropic-claude-haiku, anthropic-claude-mythos
- LOCATION_multi_region_online_prediction_output_tokens_per_minute_per_base_model
  - base_model: anthropic-claude-opus, anthropic-claude-sonnet, anthropic-claude-haiku, anthropic-claude-mythos
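The metric names above follow a regular pattern, so they can be derived from the endpoint location. The following is a minimal sketch; the function name `shared_lineage_metrics` and the dictionary keys are assumptions made for this example, not part of any Google Cloud API.

```python
def shared_lineage_metrics(location: str) -> dict[str, str]:
    """Return the shared lineage quota metric names for an endpoint location.

    `location` is "global" or a multi-region location such as "us" or "eu".
    """
    prefix = "global" if location == "global" else f"{location}_multi_region"
    return {
        "requests": f"{prefix}_online_prediction_requests_per_base_model",
        "input_tokens": (
            f"{prefix}_online_prediction_input_tokens_per_minute_per_base_model"
        ),
        "output_tokens": (
            f"{prefix}_online_prediction_output_tokens_per_minute_per_base_model"
        ),
    }


eu_metrics = shared_lineage_metrics("eu")
global_metrics = shared_lineage_metrics("global")
```

Each returned metric is then paired with a base_model value, such as anthropic-claude-sonnet, to identify one quota bucket.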
Implications for your application
- Adding a new version of a lineage doesn't require a new quota request. When a new version of a model lineage, such as a future Sonnet release, launches on a public endpoint, it shares the existing model lineage quota bucket. You don't need to file a separate quota increase to use the new version.
- Mixing versions consumes the same quota bucket. Traffic split across multiple versions of the same model lineage draws from one shared quota. Plan capacity at the lineage level, not the version level.
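To make lineage-level planning concrete, the following sketch sums the expected request rate across versions of one lineage and compares the total to a single limit. The version labels, traffic figures, and the limit of 1,000 requests per minute are hypothetical placeholders, not values from your project.

```python
def lineage_demand(per_version_rpm: dict[str, int]) -> int:
    """Total expected requests per minute across all versions of a lineage."""
    return sum(per_version_rpm.values())


def fits_lineage_quota(per_version_rpm: dict[str, int], lineage_limit_rpm: int) -> bool:
    """True if the combined traffic of all versions stays within the shared limit."""
    return lineage_demand(per_version_rpm) <= lineage_limit_rpm


# Hypothetical traffic split across two versions of the Sonnet lineage.
sonnet_traffic = {"sonnet-version-a": 300, "sonnet-version-b": 500}

total_rpm = lineage_demand(sonnet_traffic)
ok = fits_lineage_quota(sonnet_traffic, lineage_limit_rpm=1_000)
```

Checking each version against the limit separately would overstate the available capacity, because both versions draw from the same bucket.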
View and manage your quotas
To view your current usage limits or request quota increases, go to the Quotas and system limits page in the Google Cloud console.
To filter quotas, use the model lineage value of base_model, such as anthropic-claude-opus.
Per-model quotas
Anthropic Claude models launched before May 26, 2026 have quotas based on the type of endpoint used: regional, multi-region, or global. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.
To maintain overall service performance and acceptable use, the maximum quotas might vary by account and, in some cases, access might be restricted. View your project's quotas on the Quotas and system limits page in the Google Cloud console. You must also have the following quotas available:
- Queries Per Minute (QPM):
  - For regional endpoints: online_prediction_requests_per_base_model
  - For the global endpoint: global_online_prediction_requests_per_base_model
  - For the US multi-region endpoint: us_multi_region_online_prediction_requests_per_base_model
  - For the EU multi-region endpoint: eu_multi_region_online_prediction_requests_per_base_model
- Tokens Per Minute (TPM):
  - Some models count input and output tokens together:
    - Regional: online_prediction_tokens_per_minute_per_base_model
    - Global: global_online_prediction_tokens_per_minute_per_base_model
  - Other models count input and output tokens separately:
    - Input TPM:
      - Regional: online_prediction_input_tokens_per_minute_per_base_model
      - Global: global_online_prediction_input_tokens_per_minute_per_base_model
      - US Multi-Region: us_multi_region_online_prediction_input_tokens_per_minute_per_base_model
      - EU Multi-Region: eu_multi_region_online_prediction_input_tokens_per_minute_per_base_model
    - Output TPM:
      - Regional: online_prediction_output_tokens_per_minute_per_base_model
      - Global: global_online_prediction_output_tokens_per_minute_per_base_model
      - US Multi-Region: us_multi_region_online_prediction_output_tokens_per_minute_per_base_model
      - EU Multi-Region: eu_multi_region_online_prediction_output_tokens_per_minute_per_base_model
    To see which models count input and output tokens separately, see Quotas by model and region.
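The difference between combined and separate token counting can be sketched as a client-side check of one minute of usage against a set of limits. The `MinuteLimits` and `MinuteUsage` types and the `within_limits` function are hypothetical names for this example; the sample limits are taken from rows of the default quota table on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinuteLimits:
    qpm: int
    combined_tpm: Optional[int] = None  # set for models that count tokens together
    input_tpm: Optional[int] = None     # set for models that count tokens separately
    output_tpm: Optional[int] = None


@dataclass
class MinuteUsage:
    requests: int
    input_tokens: int
    output_tokens: int


def within_limits(limits: MinuteLimits, usage: MinuteUsage) -> bool:
    """Check one minute of usage against QPM and the applicable TPM limits."""
    if usage.requests > limits.qpm:
        return False
    if limits.combined_tpm is not None:
        # Input and output tokens draw from a single TPM quota.
        return usage.input_tokens + usage.output_tokens <= limits.combined_tpm
    if limits.input_tpm is None or limits.output_tpm is None:
        raise ValueError("set combined_tpm, or both input_tpm and output_tpm")
    # Input and output tokens are checked against their own quotas.
    return (
        usage.input_tokens <= limits.input_tpm
        and usage.output_tokens <= limits.output_tpm
    )


combined = MinuteLimits(qpm=90, combined_tpm=540_000)
separate = MinuteLimits(qpm=1_000, input_tpm=10_000_000, output_tpm=1_000_000)
usage = MinuteUsage(requests=50, input_tokens=500_000, output_tokens=100_000)

combined_ok = within_limits(combined, usage)  # 600,000 total exceeds 540,000
separate_ok = within_limits(separate, usage)  # each side is under its own limit
```

This check is only a planning aid under the stated assumptions; the service enforces the actual quotas.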
Input tokens
The following list defines the input tokens that can count towards your input TPM quota. The input tokens that each model counts can vary. To see which input tokens a model counts, see Quotas by model and region.
- Input tokens includes all input tokens, including cache read and cache write tokens.
- Uncached input tokens includes only the input tokens that weren't read from a cache. That is, it excludes cache read tokens.
- Cache write tokens includes tokens that were used to create or update a cache.
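The following sketch shows how the counted input tokens for a request could differ depending on which categories a model counts. It assumes that a request's input tokens are reported as three disjoint counts (uncached, cache read, and cache write); that breakdown, the `InputUsage` type, and the `counts` labels are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass
class InputUsage:
    uncached: int     # tokens neither read from nor written to a cache (assumed disjoint)
    cache_read: int   # tokens read from a cache
    cache_write: int  # tokens used to create or update a cache


def counted_input_tokens(usage: InputUsage, counts: str) -> int:
    """Return the input tokens that count towards the input TPM quota.

    `counts` is a hypothetical label for the counting rule:
    "all" or "uncached_and_cache_write".
    """
    if counts == "all":
        return usage.uncached + usage.cache_read + usage.cache_write
    if counts == "uncached_and_cache_write":
        return usage.uncached + usage.cache_write
    raise ValueError(f"unknown counting rule: {counts!r}")


request = InputUsage(uncached=1_000, cache_read=4_000, cache_write=500)

all_tokens = counted_input_tokens(request, "all")
billable_to_quota = counted_input_tokens(request, "uncached_and_cache_write")
```

For a model whose quota lists "uncached and cache write", the 4,000 cache read tokens in this example would not count towards the input TPM quota.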
Quotas by model and region
The following table shows the default quotas and supported context length for each model in each region.
Multi-region
- QPM: 1,000
- Input TPM: 10,000,000 uncached and cache write
- Output TPM: 1,000,000
Multi-region
- QPM: 1,000
- Input TPM: 10,000,000 uncached and cache write
- Output TPM: 1,000,000
global endpoint
- QPM: 2,000
- Input TPM: 20,000,000 uncached and cache write
- Output TPM: 2,000,000
Multi-region
- QPM: 400
- Input TPM: 4,000,000 uncached and cache write
- Output TPM: 400,000
global endpoint
- QPM: 800
- Input TPM: 8,000,000 uncached and cache write
- Output TPM: 800,000
us-east5
- QPM: 200
- Input TPM: 2,000,000 uncached and cache write
- Output TPM: 200,000
asia-southeast1
- QPM: 200
- Input TPM: 2,000,000 uncached and cache write
- Output TPM: 200,000
global endpoint
- QPM: 400
- Input TPM: 4,000,000 uncached and cache write
- Output TPM: 400,000
us-east5
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
asia-southeast1
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
global endpoint
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
us-east5
- QPM: 200
- Input TPM: 2,000,000 uncached and cache write
- Output TPM: 200,000
us-east5
- QPM: 25
- Input TPM: 60,000 uncached and cache write
- Output TPM: 6,000
us-east5
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
europe-west1
- QPM: 1,800
- Input TPM: 1,800,000 uncached and cache write
- Output TPM: 180,000
asia-southeast1
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
global endpoint
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
us-east5
- QPM: 35
- Input TPM: 280,000 uncached and cache write
- Output TPM: 20,000
us-east5
- QPM: 90
- TPM: 540,000 (input and output)
europe-west1
- QPM: 55
- TPM: 330,000 (input and output)
global endpoint
- QPM: 25
- TPM: 140,000 (input and output)
us-east5
- QPM: 1,500
- Input TPM: 1,500,000 uncached and cache write
- Output TPM: 150,000
global endpoint
- QPM: 2,500
- Input TPM: 2,500,000 uncached and cache write
- Output TPM: 250,000
us-east5
- QPM: 80
- TPM: 350,000 (input and output)
europe-west1
- QPM: 90
- TPM: 400,000 (input and output)
us-east5
- QPM: 80
- TPM: 350,000 (input and output)
europe-west1
- QPM: 130
- TPM: 600,000 (input and output)
asia-southeast1
- QPM: 35
- TPM: 150,000 (input and output)
us-east5
- QPM: 20
- TPM: 105,000 (input and output)
us-east5
- QPM: 245
- TPM: 600,000 (input and output)
europe-west1
- QPM: 75
- TPM: 181,000 (input and output)
asia-southeast1
- QPM: 70
- TPM: 174,000 (input and output)
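When you read the default quotas above, it can help to estimate which limit a workload reaches first. The following sketch computes the highest sustainable request rate for a given average request size. The function name and the average token counts are hypothetical; the limits in the example are one row of defaults from this page (QPM 1,000, input TPM 10,000,000, output TPM 1,000,000).

```python
def sustainable_rpm(
    qpm: int,
    input_tpm: int,
    output_tpm: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
) -> int:
    """Highest requests per minute that stays within all three limits."""
    if avg_input_tokens <= 0 or avg_output_tokens <= 0:
        raise ValueError("average token counts must be positive")
    by_input = input_tpm // avg_input_tokens     # requests allowed by input TPM
    by_output = output_tpm // avg_output_tokens  # requests allowed by output TPM
    return min(qpm, by_input, by_output)


# Hypothetical workload: 20,000 counted input tokens and 500 output tokens per request.
rate = sustainable_rpm(
    qpm=1_000,
    input_tpm=10_000_000,
    output_tpm=1_000_000,
    avg_input_tokens=20_000,
    avg_output_tokens=500,
)
```

In this example, the input TPM quota is the binding limit: it allows 500 requests per minute, which is lower than both the QPM limit and the rate that the output TPM quota allows.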
If you want to increase any of your quotas for Agent Platform, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.

