Plan your Cloud TPU resources

This page describes how to plan your Tensor Processing Unit (TPU) usage.

Choose a consumption option

Consumption options are the ways you get and use compute resources. You can request Cloud TPU VM capacity based on your needs for speed, duration, cost, and preemption tolerance. Options include:

  • On-demand: Standard pay-as-you-go instances.
  • Spot VMs: Lower-cost, preemptible instances. Uses preemptible quota.
  • Flex-start VMs: Reserve capacity as needed, for up to 7 days, without long-term reservations or complex quota management.
  • Reservations: Reserve capacity for a specific duration (up to 90 days or 1 year+), guaranteeing availability. Uses on-demand quota.

For TPU v6e and later generations, you can also use GKE with TPU Cluster Director. This feature is available through an All Capacity mode reservation. It provides full access to your reserved capacity and complete visibility into the TPU's hardware layout, usage, and health. For more information, see All Capacity mode overview.

The following comparison describes each TPU consumption option: how it works, what it's best used for, the supported TPU versions and zones, and the quota type required for the Cloud TPU API.

Future reservations for one year or longer

  • How it works: You request TPU resources for one year or longer in advance. These resources are reserved for your exclusive use during that time. Reservations provide the highest level of assurance for capacity and provide a lower price than on-demand resources. Future TPU reservations include a committed use discount (CUD). CUDs provide discounted prices when you purchase a committed use contract. For more information, see Future reservations for one year or longer.
  • Best used for: Long-running training jobs and inference workloads.
  • Supported TPU versions and zones: All TPU versions. See TPU regions and zones.
  • Quota type for Cloud TPU API: On-demand quota

Future reservations for up to 90 days (calendar mode) (Preview)

  • How it works: You request TPU resources for a specific start time and duration, between one and 90 days. These resources are reserved for your exclusive use during that time. Reservations provide the highest level of assurance for capacity and provide a lower price than on-demand resources. For more information, see Future reservations for up to 90 days (in calendar mode).
  • Best used for: Training and experimentation workloads that require precise start times and have a defined duration.
  • Supported TPU versions and zones:
      • TPU7x (Ironwood) (Preview) for training and serving: us-central1-c
      • v6e (Trillium) for training and serving: asia-northeast1-b, us-east5-a
      • v5p for training and serving: us-east5-a
      • v5e for training: us-west4-a
      • v5e for serving: us-central1-a
  • Quota type for Cloud TPU API: No quota required

On-demand

  • How it works: You request TPU resources for immediate use, for as long as you need them. On-demand provides significant flexibility. On-demand resources aren't preempted, but there's no guarantee that there are enough available TPU resources to satisfy your request. On-demand is the default option when you create TPU resources. For more information about creating and using on-demand TPUs, see Create TPU VMs.
  • Best used for: Urgent jobs and workloads that require a flexible end time.
  • Supported TPU versions and zones: All TPU versions. See TPU regions and zones.
  • Quota type for Cloud TPU API: On-demand quota

Flex-start (Preview)

  • How it works: You request TPU resources for a specific amount of time, up to seven days, without reserving capacity in advance. TPU Flex-start VMs are delivered from a dedicated pool of capacity, so the availability of these resources is higher than on-demand. For more information, see Request TPU Flex-start VMs. For more information about using TPU Flex-start VMs with Google Kubernetes Engine (GKE), see About GPU and TPU provisioning with flex-start provisioning mode.
  • Best used for: Experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than seven days.
  • Supported TPU versions and zones:
      • TPU7x (Ironwood) (Preview): us-central1-c (using GKE only)
      • v6e (Trillium): asia-northeast1-b, us-east5-a
      • v5p: us-east5-a
      • v5e: us-west4-a
  • Quota type for Cloud TPU API: Preemptible quota

Spot

  • How it works: You request TPU resources that can be preempted. Spot VMs are available at a significantly lower price than on-demand resources. Spot VMs are often easier to obtain than on-demand resources but can be preempted (shut down) at any time. There is no limit on runtime duration. For more information about TPU Spot VMs, see Manage TPU Spot VMs.
  • Best used for: Scheduling lower priority workloads like model pre-training, model fine-tuning, and simulation jobs that are tolerant to availability disruptions.
  • Supported TPU versions and zones: All TPU versions. See TPU regions and zones.
  • Quota type for Cloud TPU API: Preemptible quota
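
The duration and preemption constraints described for each option can be expressed as a small filter. The following Python sketch is illustrative only: the function name, the option labels, and the 365-day threshold used for "one year or longer" are assumptions made for this example, and the result is a list of candidates to evaluate further, not a recommendation.

```python
from typing import Optional

# Duration limits taken from the option descriptions on this page. The
# 365-day threshold is an assumption standing in for "one year or longer".
FLEX_START_MAX_DAYS = 7
CALENDAR_MIN_DAYS = 1
CALENDAR_MAX_DAYS = 90
LONG_RESERVATION_MIN_DAYS = 365


def candidate_options(duration_days: Optional[float],
                      tolerates_preemption: bool) -> list[str]:
    """Return the consumption options compatible with the given constraints.

    duration_days=None means the workload has an open-ended end time.
    """
    # On-demand has no duration limit and isn't preempted, so it is always
    # a candidate (capacity is still not guaranteed).
    options = ["On-demand"]
    if tolerates_preemption:
        # Spot has no runtime limit but can be shut down at any time.
        options.append("Spot")
    if duration_days is not None:
        if duration_days <= FLEX_START_MAX_DAYS:
            options.append("Flex-start")
        if CALENDAR_MIN_DAYS <= duration_days <= CALENDAR_MAX_DAYS:
            options.append("Future reservation (calendar mode)")
        if duration_days >= LONG_RESERVATION_MIN_DAYS:
            options.append("Future reservation (one year or longer)")
    return options


if __name__ == "__main__":
    # A three-day fine-tuning run that can't be interrupted.
    print(candidate_options(3, tolerates_preemption=False))
```

The filter deliberately ignores price, zone availability, and quota, which also affect the final choice.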

Request TPU quota

To use TPU VMs, regardless of the consumption option, you need either on-demand or preemptible quota for Cloud TPU cores or chips. Make sure you have enough quota for your chosen option, TPU version, size, and zone. Quotas are specific to each TPU version and differ for on-demand versus preemptible use. Some TPU versions have default quotas; for others, you must request quota. For more information, see Cloud TPU quotas.

If you use TPUs with Google Kubernetes Engine (GKE), you need Compute Engine API quota instead of the standard TPU API quota. For more information about TPU quotas in GKE, see Ensure that you have TPU quota.
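
As a planning aid, the quota type listed for each consumption option on this page can be captured in a lookup and checked against your limits. This Python sketch makes several assumptions: the option keys, the `Quota` enum, and the `limits` dictionary are hypothetical names for this example, the limits are values you supply yourself (nothing is read from a Google Cloud API), and the mapping covers the Cloud TPU API only, not GKE.

```python
from enum import Enum
from typing import Optional


class Quota(Enum):
    ON_DEMAND = "on-demand"
    PREEMPTIBLE = "preemptible"


# Quota type per consumption option for the Cloud TPU API, as listed in the
# comparison of consumption options on this page. None means the comparison
# lists "No quota required". The keys are hypothetical labels.
QUOTA_BY_OPTION: dict[str, Optional[Quota]] = {
    "reservation-1y": Quota.ON_DEMAND,
    "reservation-calendar": None,
    "on-demand": Quota.ON_DEMAND,
    "flex-start": Quota.PREEMPTIBLE,
    "spot": Quota.PREEMPTIBLE,
}


def has_enough_quota(option: str, requested_chips: int,
                     limits: dict[Quota, int]) -> bool:
    """Check a request against limits for one TPU version and zone.

    `limits` is a hypothetical, user-supplied mapping of quota type to the
    number of chips allowed; a missing entry is treated as zero.
    """
    quota = QUOTA_BY_OPTION[option]
    if quota is None:
        return True
    return requested_chips <= limits.get(quota, 0)


if __name__ == "__main__":
    my_limits = {Quota.PREEMPTIBLE: 16}
    print(has_enough_quota("spot", 8, my_limits))       # uses preemptible quota
    print(has_enough_quota("on-demand", 8, my_limits))  # uses on-demand quota
```

Because quotas are specific to each TPU version and zone, a real check would keep one `limits` mapping per version and zone pair.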

Choose TPU version

Select the TPU version (for example, v5e, v5p, v6e, or TPU7x (Ironwood)) based on your model's training or inference needs. For more information, see TPU versions.
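
The version you choose also constrains where some consumption options are available. As a sketch, the following Python snippet records the Flex-start zones listed on this page and looks them up by version or by zone. The dictionary and function names are hypothetical, and the data is a snapshot of this page: Flex-start is in Preview, so check TPU regions and zones for current availability.

```python
# Snapshot of the Flex-start zones listed on this page. Treat this as
# illustrative data, not as a source of truth.
FLEX_START_ZONES: dict[str, list[str]] = {
    "TPU7x": ["us-central1-c"],  # Listed as available using GKE only.
    "v6e": ["asia-northeast1-b", "us-east5-a"],
    "v5p": ["us-east5-a"],
    "v5e": ["us-west4-a"],
}


def flex_start_zones(version: str) -> list[str]:
    """Return the listed Flex-start zones for a TPU version, or []."""
    return FLEX_START_ZONES.get(version, [])


def versions_in_zone(zone: str) -> list[str]:
    """Return the TPU versions listed for Flex-start in the given zone."""
    return [v for v, zones in FLEX_START_ZONES.items() if zone in zones]


if __name__ == "__main__":
    print(flex_start_zones("v6e"))
    print(versions_in_zone("us-east5-a"))
```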
